Your RAG system is hallucinating even though the correct context was retrieved. How do you debug it?
First confirm that the retrieved chunk actually contains the answer and reached the model intact, without being truncated to fit the context window. Then inspect how the prompt is assembled and whether it instructs the model to answer only from the supplied context. To reduce the hallucination, add grounding and citation requirements, lower the temperature, and use a faithfulness metric or a judge model to verify that the answer is entailed by the retrieved text. Also check for conflicting passages in the context and for the model's parametric knowledge overriding what was retrieved.
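Two of these checks are easy to automate as a first pass. The sketch below is a minimal illustration, not a production faithfulness metric: it tests whether a chunk appears verbatim in the final prompt (a truncation check) and flags answer sentences whose tokens mostly do not occur in the context. The function names, the token-overlap heuristic, and the 0.6 threshold are all assumptions made for illustration. Token overlap is only a crude stand-in for entailment, so a real pipeline would likely replace `support_score` with an NLI model or a judge model.

```python
import re


def chunk_survived(prompt, chunk):
    """Return True if the full chunk text appears verbatim in the final prompt.

    A False result suggests the chunk was truncated or altered during
    prompt assembly (hypothetical helper, exact-match only).
    """
    return chunk.strip() in prompt


def _tokens(text):
    """Lowercased alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def support_score(sentence, context):
    """Fraction of the sentence's tokens that also occur in the context.

    Crude lexical proxy for grounding; it does NOT measure entailment.
    """
    sent = _tokens(sentence)
    if not sent:
        return 1.0
    return len(sent & _tokens(context)) / len(sent)


def unsupported_sentences(answer, context, threshold=0.6):
    """Return answer sentences whose support score falls below the threshold.

    The threshold is an arbitrary assumption and should be tuned on real data.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    return [s for s in sentences if support_score(s, context) < threshold]


if __name__ == "__main__":
    chunk = "The warranty covers battery defects for 24 months from purchase."
    prompt = (
        "Answer only from the context.\n\nContext:\n"
        + chunk
        + "\n\nQuestion: How long is the battery warranty?"
    )
    answer = (
        "The warranty covers battery defects for 24 months. "
        "It also includes free annual servicing worldwide."
    )
    print(chunk_survived(prompt, chunk))
    print(unsupported_sentences(answer, chunk))
```

Running both checks on logged prompt and answer pairs helps separate the two failure modes: a chunk that never reached the model points to prompt assembly or truncation, while a flagged sentence with an intact chunk points to the model ignoring or overriding the context.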