Is the system healthy right now? Summary first — details in the table.
Runs a tiny deterministic scoring task on the server (async job, polled from the browser).
Use the same secret as ADMIN_API_KEY on the API. Share it only with operators you trust; it grants access to admin routes.
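The async-job-plus-polling pattern mentioned above can be sketched roughly as follows. This is only an illustration: `poll_job`, the `fetch_status` callable, and the `"state"` values `"done"` and `"failed"` are hypothetical names, not taken from the actual API.

```python
import time
from typing import Callable, Dict


def poll_job(
    fetch_status: Callable[[], Dict],
    interval_s: float = 1.0,
    timeout_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Call fetch_status repeatedly until the job reaches a terminal state.

    fetch_status is a hypothetical callable returning a dict such as
    {"state": "running"}; the state names are assumptions for this sketch.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        # Stop as soon as the job reports a terminal state.
        if status.get("state") in ("done", "failed"):
            return status
        # Give up once the deadline has passed.
        if clock() >= deadline:
            raise TimeoutError("job did not finish in time")
        sleep(interval_s)
```

In a real client, `fetch_status` would wrap an authenticated HTTP request that carries the admin secret; injecting it as a callable keeps the loop testable without a server.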
Drift and regressions in the selected window.
Count of traces by hallucination risk band.
GOOD+RISKY vs FAIL classification.
No high-severity anomalies in the last 5 minutes.
Same evaluation slice — compare grounding, risk, latency, and consistency. One color per model in charts.
All models included — click a chip to compare a subset.
gpt-4o-mini (score by balanced)
Weighted by the metric selector above.
gpt-4o-mini (1293 ms)
Lowest average latency in this slice.
gpt-4o-mini (risk 0.54)
Lowest mean hallucination risk.
gpt-4o-mini (σ rel 0.193)
Lowest reliability σ (needs 2+ traces).
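The "needs 2+ traces" note is consistent with a sample standard deviation, which is undefined for a single value. A minimal sketch, assuming the dashboard uses the sample (n − 1) form; `reliability_sigma` is a hypothetical helper name.

```python
import statistics
from typing import Optional, Sequence


def reliability_sigma(scores: Sequence[float]) -> Optional[float]:
    """Sample standard deviation of reliability scores.

    Returns None when fewer than two traces are available, since the
    sample form (n - 1 denominator) is undefined in that case.
    """
    if len(scores) < 2:
        return None
    return statistics.stdev(scores)
```

A lower value means the model's reliability varies less from trace to trace within the slice.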
Each line is one model’s reliability over time.
Upper-left is best — high reliability, low latency.
| Model | n | Avg G | Avg rel | Avg risk | Avg ms | σ rel | Tradeoff |
|---|---|---|---|---|---|---|---|
| gpt-4o-mini | 10 | 0.462 | 0.543 | 0.538 | 1293 | 0.1929 | 0.420 |
Open two trace detail pages in separate tabs and compare prompt, evidence, grounding, and verdict. A future release can link traces that share the same prompt hash or experiment case id.
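One way the prompt-hash linking mentioned above might work is sketched below. Everything here is an assumption: the `prompt` and `id` field names, the whitespace normalization, and the choice of SHA-256 are illustrative, not the product's actual design.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List


def prompt_hash(prompt: str) -> str:
    """Hash a prompt after collapsing whitespace (hypothetical normalization)."""
    normalized = " ".join(prompt.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_by_prompt(traces: Iterable[Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each prompt hash to the trace ids that share it.

    Only hashes shared by two or more traces are returned, since those
    are the ones worth comparing side by side.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for trace in traces:
        groups[prompt_hash(trace["prompt"])].append(trace["id"])
    return {h: ids for h, ids in groups.items() if len(ids) > 1}
```

Grouping by hash rather than by raw prompt text keeps the index compact and avoids storing prompts twice.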
| Status | ID | Agent | Model | Reliability | Risk | Grounding | ms | When |
|---|---|---|---|---|---|---|---|---|
| GOOD | bd8d59b0… | squad-eval-runner | gpt-4o-mini | | | | 872 | 2026-03-26T04:47:23.401067+00:00 |
| RISKY | 097cfca9… | squad-eval-runner | gpt-4o-mini | | | | 1766 | 2026-03-26T04:47:22.190408+00:00 |
| RISKY | 908cc176… | squad-eval-runner | gpt-4o-mini | | | | 1373 | 2026-03-26T04:47:20.269925+00:00 |
| RISKY | a739e196… | squad-eval-runner | gpt-4o-mini | | | | 881 | 2026-03-26T04:47:18.653557+00:00 |
| GOOD | 10ef6a96… | squad-eval-runner | gpt-4o-mini | | | | 1233 | 2026-03-26T04:47:17.439296+00:00 |
| GOOD | d82b9243… | squad-eval-runner | gpt-4o-mini | | | | 684 | 2026-03-26T04:47:16.059313+00:00 |
| RISKY | 100f029f… | squad-eval-runner | gpt-4o-mini | | | | 1702 | 2026-03-26T04:47:15.015926+00:00 |
| RISKY | 3ff2841e… | squad-eval-runner | gpt-4o-mini | | | | 1574 | 2026-03-26T04:47:12.960999+00:00 |
| GOOD | 34e75c8c… | squad-eval-runner | gpt-4o-mini | | | | 1462 | 2026-03-26T04:47:10.975707+00:00 |
| GOOD | 59ce104f… | squad-eval-runner | gpt-4o-mini | | | | 1382 | 2026-03-26T04:47:09.141293+00:00 |
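As a cross-check, the status and latency values in the ten trace rows above can be aggregated by hand. The sketch below assumes Avg ms is a plain arithmetic mean and that GOOD and RISKY are counted together against FAIL, as the classification label suggests; the variable names are illustrative.

```python
from collections import Counter
from statistics import mean

# (status, latency in ms) copied from the ten trace rows above.
rows = [
    ("GOOD", 872), ("RISKY", 1766), ("RISKY", 1373), ("RISKY", 881),
    ("GOOD", 1233), ("GOOD", 684), ("RISKY", 1702), ("RISKY", 1574),
    ("GOOD", 1462), ("GOOD", 1382),
]

# Plain arithmetic mean of the latency column.
avg_ms = mean(ms for _, ms in rows)

# Count traces per status, then collapse GOOD + RISKY into one bucket.
counts = Counter(status for status, _ in rows)
passing = counts["GOOD"] + counts["RISKY"]
failing = counts["FAIL"]

print(round(avg_ms), dict(counts), passing, failing)
```

The rounded mean comes to 1293 ms, which matches the Avg ms reported for gpt-4o-mini in the model summary table.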