Why this needs review
Retrieved chunks align weakly with the response after answer-aware scoring (best hybrid 0.17 < 0.35).
- Best grounding score: 0.17
- Sentence match: 0.13
- Keyword overlap: 0.25
Full scorer narrative
Likely causes
- Retrieval returned irrelevant passages.
- The answer is worded differently from the sources, so no retrieved sentence matches it closely.
What we measured
- Hybrid grounding (per chunk): best 0.17 (strong ≥ 0.52)
- Mean hybrid across chunks: 0.17
- Best raw sentence match: 0.13 (short answers often score low vs. whole paragraphs)
- Lexical overlap with sources: 0.25
- Blend: standard blend (70% best-sentence + 30% keyword)
- Weak / strong cutoffs: 0.35 / 0.52
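The figures above are consistent with the stated blend: 0.70 × 0.13 + 0.30 × 0.25 = 0.166, which rounds to the reported 0.17. A minimal sketch of that arithmetic follows, assuming the hybrid score is a plain weighted sum of the two measurements. The function names and the "moderate" label for the middle band are hypothetical and are not taken from the scorer itself.

```python
# Sketch of the "standard blend" described above. Assumption: the hybrid
# score is a simple weighted sum; the real scorer may apply further steps.

SENTENCE_WEIGHT = 0.70   # weight on the best sentence match
KEYWORD_WEIGHT = 0.30    # weight on keyword (lexical) overlap
WEAK_CUTOFF = 0.35       # scores below this are flagged for review
STRONG_CUTOFF = 0.52     # scores at or above this count as strong


def hybrid_score(best_sentence_match: float, keyword_overlap: float) -> float:
    """Blend the two measurements into one grounding score."""
    return SENTENCE_WEIGHT * best_sentence_match + KEYWORD_WEIGHT * keyword_overlap


def classify(score: float) -> str:
    """Map a hybrid score to a band; 'moderate' is a hypothetical label."""
    if score < WEAK_CUTOFF:
        return "weak"
    if score >= STRONG_CUTOFF:
        return "strong"
    return "moderate"


if __name__ == "__main__":
    score = hybrid_score(0.13, 0.25)
    print(f"hybrid = {score:.2f}, band = {classify(score)}")
```

With the measurements from this report (0.13 sentence match, 0.25 keyword overlap), the blended score falls below the 0.35 weak cutoff, which is why the response is flagged for review.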
Evidence
Prompt & response