An ML team runs Agent Evaluation on their RAG chatbot's evaluation set. 30% of responses fail quality checks, and the root cause analysis shows failures concentrated in the groundedness judge. The team confirms the LLM is following the system prompt grounding instructions correctly. What is the most likely root cause and recommended fix?
Temperature setting is too high, causing random fabrications despite grounding instructions.
expected_response fields are too strict, causing legitimate answers to be marked as failing.