TraceDog

Trace health

Is the system healthy right now? Summary first — details in the table.

10 loaded

Admin: scoring smoke job

Runs a tiny deterministic scoring task on the server (async job, polled from the browser). Use the same secret as ADMIN_API_KEY on the API. Only share this with operators you trust; it grants access to admin routes.
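The "async job, polled from the browser" pattern can be sketched as a small polling loop. This is only an illustration: the function name `poll_job`, the `state` field, and its `done`/`failed` values are assumptions, not the TraceDog API, and the HTTP call is replaced by a stand-in callable so the sketch runs without a server.

```python
import time
from typing import Callable


def poll_job(fetch_status: Callable[[], dict], interval_s: float = 1.0,
             timeout_s: float = 30.0) -> dict:
    """Call fetch_status repeatedly until the job reaches a terminal state.

    fetch_status is a hypothetical callable that returns the job status as a
    dict with a "state" key; in a real client it would wrap an HTTP request
    authenticated with the admin key.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status.get("state") in ("done", "failed"):  # assumed terminal states
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish in time")
        time.sleep(interval_s)


# Stand-in for the server: reports "running" twice, then "done".
_responses = iter([
    {"state": "running"},
    {"state": "running"},
    {"state": "done", "result": "ok"},
])
final = poll_job(lambda: next(_responses), interval_s=0.0)
print(final)
```

Passing the status fetcher in as a callable keeps the loop independent of any particular HTTP library or route layout.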

Total traces: 10
Avg reliability: 0.54
Avg risk: 0.54
Failure rate: 0%

Reliability over time

Drift and regressions in the selected window.

Risk distribution

Count of traces by hallucination risk band.
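A minimal sketch of how traces could be counted by risk band. The thresholds `LOW_MAX` and `HIGH_MIN` are assumptions chosen for illustration (the page does not state the real cut-offs); the risk values are the ten shown in the trace table on this page.

```python
from collections import Counter

# Assumed band boundaries; the dashboard's actual thresholds are not documented here.
LOW_MAX = 0.30   # risk below this counts as Low
HIGH_MIN = 0.70  # risk at or above this counts as High


def risk_band(risk: float) -> str:
    """Map a hallucination risk score in [0, 1] to a band label."""
    if risk < LOW_MAX:
        return "Low"
    if risk >= HIGH_MIN:
        return "High"
    return "Med"


# Risk column of the ten traces listed on this page.
risks = [0.40, 0.83, 0.83, 0.83, 0.23, 0.35, 0.59, 0.52, 0.34, 0.45]
bands = Counter(risk_band(r) for r in risks)
print(bands["Low"], bands["Med"], bands["High"])
```

With these assumed thresholds the counts match the Low/Med/High figures shown for this slice, although other cut-offs would fit the same ten values equally well.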

Low: 1
Med: 6
High: 3

Success vs failure

GOOD+RISKY vs FAIL classification.
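The grouping above can be sketched as follows: GOOD and RISKY both count as OK, and only FAIL counts against health. The function name `health_summary` is hypothetical; the status list is the Status column of the trace table on this page.

```python
def health_summary(statuses: list[str]) -> dict:
    """Count OK (GOOD or RISKY) versus FAIL traces and the healthy percentage."""
    total = len(statuses)
    fails = sum(1 for s in statuses if s == "FAIL")
    ok = total - fails
    healthy_pct = 100.0 * ok / total if total else 0.0
    return {"ok": ok, "fail": fails, "healthy_pct": healthy_pct}


# Status column of the ten traces listed on this page.
statuses = ["GOOD", "RISKY", "RISKY", "RISKY", "GOOD",
            "GOOD", "RISKY", "RISKY", "GOOD", "GOOD"]
summary = health_summary(statuses)
print(summary)
```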

100% healthy
OK 10 Fail 0

No high-severity anomalies in the last 5 minutes.

By agent

squad-eval-runner
10 traces · High risk: 5 · Low rel: 3 · Fails: 0

Model comparison

Same evaluation slice — compare grounding, risk, latency, and consistency. One color per model in charts.

Models

All models included — click a chip to compare a subset.

Best overall

gpt-4o-mini (scored by the balanced metric)

Weighted by the metric selector above.

Fastest

gpt-4o-mini (1293 ms)

Lowest average latency in this slice.

Lowest risk

gpt-4o-mini (risk 0.54)

Lowest mean hallucination risk.

Most consistent

gpt-4o-mini (σ rel 0.193)

Lowest reliability σ (needs 2+ traces).
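The "2+ traces" requirement is what a sample standard deviation (dividing by n − 1) would need, so the sketch below assumes that estimator; the page does not confirm which one is used. The reliability values are the rounded figures from the trace table on this page, so the result only approximates the σ shown.

```python
import statistics
from typing import Optional


def reliability_sigma(values: list[float]) -> Optional[float]:
    """Sample standard deviation of reliability, or None with fewer than 2 traces."""
    if len(values) < 2:
        return None  # spread is undefined for a single trace
    return statistics.stdev(values)  # n - 1 denominator (assumed estimator)


# Reliability column of the ten traces listed on this page (rounded to 2 decimals).
reliabilities = [0.66, 0.29, 0.29, 0.29, 0.81, 0.70, 0.50, 0.56, 0.71, 0.62]
sigma = reliability_sigma(reliabilities)
print(round(sigma, 3))
```

On the rounded inputs this gives roughly 0.194, close to the 0.193 displayed; the population form (dividing by n) gives roughly 0.184, which is further from the displayed value.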

Reliability trend by model

Each line is one model’s reliability over time.

gpt-4o-mini

Quality vs latency

Upper-left is best — high reliability, low latency.

gpt-4o-mini: reliability 0.54, latency 1292.9 ms (x-axis: latency; y-axis: reliability)

Average metrics by model

Normalized bars — hover for raw values.

gpt-4o-mini (n = 10): bars for grounding, reliability, risk, and latency.
Model        n   Avg G  Avg rel  Avg risk  Avg ms  σ rel   Tradeoff
gpt-4o-mini  10  0.462  0.543    0.538     1293    0.1929  0.420
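As a rough cross-check, the per-model averages can be recomputed from the trace table on this page. This is a sketch, not the dashboard's code: the function name `averages_by_model` and the row layout are assumptions, and because the table values are rounded, the recomputed means can differ from the displayed ones in the last digit.

```python
from collections import defaultdict
from statistics import mean

# (model, reliability, risk, grounding, latency_ms) for the ten traces on this page.
rows = [
    ("gpt-4o-mini", 0.66, 0.40, 0.60, 872),
    ("gpt-4o-mini", 0.29, 0.83, 0.17, 1766),
    ("gpt-4o-mini", 0.29, 0.83, 0.17, 1373),
    ("gpt-4o-mini", 0.29, 0.83, 0.17, 881),
    ("gpt-4o-mini", 0.81, 0.23, 0.77, 1233),
    ("gpt-4o-mini", 0.70, 0.35, 0.65, 684),
    ("gpt-4o-mini", 0.50, 0.59, 0.41, 1702),
    ("gpt-4o-mini", 0.56, 0.52, 0.48, 1574),
    ("gpt-4o-mini", 0.71, 0.34, 0.66, 1462),
    ("gpt-4o-mini", 0.62, 0.45, 0.55, 1382),
]


def averages_by_model(rows):
    """Group rows by model and return the trace count and mean of each metric."""
    grouped = defaultdict(list)
    for model, rel, risk, grounding, ms in rows:
        grouped[model].append((rel, risk, grounding, ms))
    result = {}
    for model, items in grouped.items():
        rel, risk, grounding, ms = zip(*items)
        result[model] = {
            "n": len(items),
            "avg_rel": mean(rel),
            "avg_risk": mean(risk),
            "avg_grounding": mean(grounding),
            "avg_ms": mean(ms),
        }
    return result


stats = averages_by_model(rows)["gpt-4o-mini"]
print({k: round(v, 3) for k, v in stats.items()})
```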
Advanced: side-by-side trace diff

Open two trace detail pages in separate tabs and compare prompt, evidence, grounding, and verdict. A future release can link traces that share the same prompt hash or experiment case id.
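One way such linking might work is to hash a normalized form of each prompt and group traces whose hashes match. Everything here is hypothetical: the normalization rule, the truncated SHA-256 digest, the field names, and the example trace ids and prompts are illustrative assumptions, not part of TraceDog.

```python
import hashlib
from collections import defaultdict


def prompt_hash(prompt: str) -> str:
    """Hash a prompt after collapsing whitespace and lowercasing (assumed normalization)."""
    normalized = " ".join(prompt.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


def link_by_prompt(traces: list[dict]) -> dict:
    """Return {prompt_hash: [trace ids]} for prompts seen in more than one trace."""
    groups = defaultdict(list)
    for trace in traces:
        groups[prompt_hash(trace["prompt"])].append(trace["id"])
    return {h: ids for h, ids in groups.items() if len(ids) > 1}


# Made-up traces for illustration only.
traces = [
    {"id": "trace-a", "prompt": "Who wrote Hamlet?"},
    {"id": "trace-b", "prompt": "who wrote   hamlet?"},
    {"id": "trace-c", "prompt": "When was Hamlet written?"},
]
linked = link_by_prompt(traces)
print(list(linked.values()))
```

Traces in the same group are candidates for a side-by-side diff; an experiment case id, where available, could serve as the grouping key instead of the hash.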

All matching traces

Status  ID         Agent              Model        Reliability  Risk  Grounding  ms    When
GOOD    bd8d59b0…  squad-eval-runner  gpt-4o-mini  0.66         0.40  0.60       872   2026-03-26T04:47:23.401067+00:00
RISKY   097cfca9…  squad-eval-runner  gpt-4o-mini  0.29         0.83  0.17       1766  2026-03-26T04:47:22.190408+00:00
RISKY   908cc176…  squad-eval-runner  gpt-4o-mini  0.29         0.83  0.17       1373  2026-03-26T04:47:20.269925+00:00
RISKY   a739e196…  squad-eval-runner  gpt-4o-mini  0.29         0.83  0.17       881   2026-03-26T04:47:18.653557+00:00
GOOD    10ef6a96…  squad-eval-runner  gpt-4o-mini  0.81         0.23  0.77       1233  2026-03-26T04:47:17.439296+00:00
GOOD    d82b9243…  squad-eval-runner  gpt-4o-mini  0.70         0.35  0.65       684   2026-03-26T04:47:16.059313+00:00
RISKY   100f029f…  squad-eval-runner  gpt-4o-mini  0.50         0.59  0.41       1702  2026-03-26T04:47:15.015926+00:00
RISKY   3ff2841e…  squad-eval-runner  gpt-4o-mini  0.56         0.52  0.48       1574  2026-03-26T04:47:12.960999+00:00
GOOD    34e75c8c…  squad-eval-runner  gpt-4o-mini  0.71         0.34  0.66       1462  2026-03-26T04:47:10.975707+00:00
GOOD    59ce104f…  squad-eval-runner  gpt-4o-mini  0.62         0.45  0.55       1382  2026-03-26T04:47:09.141293+00:00