Why this needs review
The response is not strongly supported by retrieved evidence (best hybrid 0.41; strong threshold 0.52).
- Best grounding score: 0.41
- Sentence match: 0.18
- Keyword overlap: 0.60
Full scorer narrative
The response is not strongly supported by retrieved evidence (best hybrid 0.41; strong threshold 0.52).
Next steps
- Inspect retrieved docs below — scores blend best-sentence similarity with keyword overlap.
- Tighten retrieval or add citations in the agent prompt.
What we measured
- Hybrid grounding (per chunk): best 0.41 (strong ≥ 0.52)
- Mean hybrid across chunks: 0.41
- Best raw sentence match: 0.18 (short answers often score low vs. whole paragraphs)
- Lexical overlap with sources: 0.60
- Blend: short-answer blend (45% sentence + 55% keyword)
- Weak / strong cutoffs: 0.35 / 0.52
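As a rough check on the figures above, the sketch below recomputes the hybrid score from the reported sentence match (0.18) and keyword overlap (0.60) using the stated short-answer blend of 45% sentence and 55% keyword. The function names, the band labels, and the treatment of scores below the weak cutoff are assumptions made for illustration; this is not the scorer's actual code.

```python
from statistics import mean

# Weights and cutoffs as reported in the panel above.
W_SENTENCE = 0.45
W_KEYWORD = 0.55
WEAK_CUTOFF = 0.35
STRONG_CUTOFF = 0.52


def hybrid_score(sentence_match: float, keyword_overlap: float) -> float:
    """Blend sentence similarity and keyword overlap (short-answer weights)."""
    return W_SENTENCE * sentence_match + W_KEYWORD * keyword_overlap


def classify(score: float) -> str:
    """Map a hybrid score to a band. The label strings are hypothetical."""
    if score >= STRONG_CUTOFF:
        return "strong"
    if score >= WEAK_CUTOFF:
        return "needs review"
    return "weak"


# One (sentence_match, keyword_overlap) pair per retrieved chunk.
# A single chunk is assumed here because the reported best and mean are equal.
chunks = [(0.18, 0.60)]
scores = [hybrid_score(s, k) for s, k in chunks]

best = max(scores)
print(f"best hybrid: {best:.2f}")          # 0.45*0.18 + 0.55*0.60 = 0.411
print(f"mean hybrid: {mean(scores):.2f}")
print(f"band: {classify(best)}")
```

Under these assumptions the blend reproduces the reported 0.41, which sits between the 0.35 and 0.52 cutoffs. The high keyword overlap carries most of the score, while the low sentence match keeps it below the strong threshold.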
Evidence
Prompt & response