TraceDog

Trace decision

Looks grounded

Evidence aligns well with the response.

Grounding: 0.60 (Strong)
Risk: 0.40 (Medium)
Reliability: 0.66 (High)
Recommended action: Safe to use as signal
gpt-4o-mini · 872ms total · Normans · 2026-03-26T04:47:23.401067+00:00
Why it scored this way
Why this scored as grounded

Retrieved evidence aligns reasonably well with the response (best hybrid 0.60).

  • Best grounding score: 0.60
  • Sentence match: 0.47
  • Keyword overlap: 0.91
Full scorer narrative

Retrieved evidence aligns reasonably well with the response (best hybrid 0.60).

What we measured

  • Hybrid grounding (per chunk): best 0.60 (strong ≥ 0.52)
  • Mean hybrid across chunks: 0.60
  • Best raw sentence match: 0.47 (short answers often score low vs. whole paragraphs)
  • Lexical overlap with sources: 0.91
  • Blend: standard blend (70% best-sentence + 30% keyword)
  • Weak / strong cutoffs: 0.35 / 0.52
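The "standard blend" listed above can be reproduced by hand as a check on the reported 0.60. The sketch below assumes the blend is a plain weighted sum of the two measured values. The function name `hybrid_grounding` and its parameters are illustrative and are not taken from TraceDog's code.

```python
def hybrid_grounding(sentence_match: float,
                     keyword_overlap: float,
                     sentence_weight: float = 0.70) -> float:
    """Weighted sum of sentence match and keyword overlap.

    Assumption: the 70% / 30% "standard blend" is a simple linear
    combination; the actual scorer may differ.
    """
    keyword_weight = 1.0 - sentence_weight
    return sentence_weight * sentence_match + keyword_weight * keyword_overlap


# Values reported for this trace.
sentence_part = 0.70 * 0.47   # contribution from best sentence match
keyword_part = 0.30 * 0.91    # contribution from keyword overlap
score = hybrid_grounding(0.47, 0.91)
print(f"{score:.2f}")  # 0.60
```

Under this assumption the sentence term contributes about 0.33 and the keyword term about 0.27, which together match the displayed hybrid score.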
Evidence
Prompt & response
Grounding vs thresholds: 0.60
weak < 0.35 · review 0.35–0.52 · strong ≥ 0.52
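The three bands can be expressed as a small lookup. This is a minimal sketch that assumes the boundaries behave as the labels suggest: scores below 0.35 are weak, scores of 0.52 or more are strong, and everything in between is flagged for review. The name `grounding_band` is hypothetical.

```python
def grounding_band(score: float,
                   weak_cutoff: float = 0.35,
                   strong_cutoff: float = 0.52) -> str:
    """Map a grounding score to a band using the displayed cutoffs.

    Assumption: the lower bound of each band is inclusive, as implied
    by "strong >= 0.52" and "weak < 0.35" on the page.
    """
    if score < weak_cutoff:
        return "weak"
    if score < strong_cutoff:
        return "review"
    return "strong"


print(grounding_band(0.60))  # strong
```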
Blend contribution
Sentence 0.47 (70%) · Keyword 0.91 (30%)
Confidence trend

Series padded from this trace’s score — batch view shows drift.

Failure mix (this trace)

One bar — fleet-wide % needs aggregate metrics from the API.

  • Hallucination risk 0%
  • Low grounding 0%
  • Failures 0%
  • Healthy 100%
Execution runtime

Total 872ms

  • Retrieval: 218ms ✓
  • LLM: 654ms ✓