Evaluation and Monitoring Questions
Practice questions for the Evaluation and Monitoring topic of the Databricks Certified Generative AI Engineer Associate exam. 24 questions cover this domain.
An ML team is building an evaluation set for their RAG customer support agent. Which optional fields in the evaluation schema unlock additional LLM ju...
A team wants domain experts to review and provide quality feedback on an AI agent's responses before deploying to production. Which Databricks feature...
In Mosaic AI Agent Evaluation, which quality dimension does the "groundedness" LLM judge evaluate?
A developer wants to use `mlflow.evaluate()` to assess a Databricks AI agent using Mosaic AI Agent Evaluation's LLM judges. Which `model_type` value m...
An ML team runs Agent Evaluation on their RAG chatbot's evaluation set. 30% of responses fail quality checks, and the root cause analysis shows failur...
What is the key difference between offline evaluation and online monitoring in Mosaic AI Agent Evaluation?
A team has run Agent Evaluation with an evaluation set of 200 questions. Analysis of the `eval_results` table shows that the `response/llm_judged/chun...
In Mosaic AI Agent Evaluation, which LLM judge assesses whether the retrieved context chunks contain enough information to answer the user's question?
A team building a document Q&A agent wants to evaluate whether their retrieval pipeline is returning the right documents, independent of the quality o...
A team wants to evaluate whether their RAG agent's responses adhere to a set of organizational communication guidelines (e.g., "Responses must be prof...
A team is iterating on their RAG application and wants to compare the quality of two different system prompts across the same evaluation set. They hav...
A production AI agent has been deployed for three weeks. The team notices that the `response/llm_judged/groundedness/percentage` metric in their monit...
In Mosaic AI Agent Evaluation, which judge is always run regardless of whether the evaluation record includes a ground-truth expected response?
An ML team uses `mlflow.evaluate()` with `model_type="databricks-agent"` and discovers that the `response/llm_judged/correctness/percentage` metric is...
A developer wants to access the per-request judge assessment results after running `mlflow.evaluate()` with `model_type="databricks-agent"`. Which att...
A practitioner has already written a scorer they want to run continuously on production traces. According to Databricks MLflow 3 monitoring, what is t...
Which MLflow capability underpins both development-time evaluation and production monitoring for GenAI apps on Databricks?
A team stores its traces in Unity Catalog and wants to turn on MLflow 3 production monitoring. What additional prerequisite must be configured?
A team wants a custom production scorer to run reliably in MLflow 3 monitoring. Which implementation style is supported?
A team wants subject matter experts to rate real responses from an agent and turn that feedback into evaluation data for future iterations. Which Data...