Skip to content
GAIEA
Evaluation and Monitoring
hard
Question 5 of 24

An ML team runs Agent Evaluation on their RAG chatbot's evaluation set. 30% of responses fail quality checks, and the root cause analysis shows failures concentrated in the groundedness judge. The team confirms the LLM is following the system prompt grounding instructions correctly. What is the most likely root cause and recommended fix?

AThe LLM's temperature setting is too high, causing random fabrications despite grounding instructions.
BThe retrieval step is returning documents that do not contain information relevant to the query, so the LLM cannot be grounded and may hallucinate or decline to answer.
CThe evaluation set's expected_response fields are too strict, causing legitimate answers to be marked as failing.
DThe MLflow experiment is logging to the wrong workspace, causing metric calculation errors.

Educational Content — CertQnA practice questions are written against official exam objectives, covering the same domains tested on the real exam. All content is original and independent — not actual exam questions, not affiliated with any certification vendor. Learn more about our content policy

Discussion

Be the first to share your understanding of this concept

⚠️ Discussion is for concept clarification only. Do not share or request actual exam questions or answers.

Sign in to join the discussion