Evaluation and Monitoring Questions

Practice questions for the Evaluation and Monitoring topic of the Databricks Certified Generative AI Engineer Associate exam. 24 questions cover this domain.

24 questions: 5 easy, 15 medium, 4 hard

Q1

medium

An ML team is building an evaluation set for their RAG customer support agent. Which optional fields in the evaluation schema unlock additional LLM ju...

Q2

medium

A team wants domain experts to review and provide quality feedback on an AI agent's responses before deploying to production. Which Databricks feature...

Q3

easy

In Mosaic AI Agent Evaluation, which quality dimension does the "groundedness" LLM judge evaluate?

Q4

medium

A developer wants to use `mlflow.evaluate()` to assess a Databricks AI agent using Mosaic AI Agent Evaluation's LLM judges. Which `model_type` value m...

Q5

hard

An ML team runs Agent Evaluation on their RAG chatbot's evaluation set. 30% of responses fail quality checks, and the root cause analysis shows failur...

Q6

medium

What is the key difference between offline evaluation and online monitoring in Mosaic AI Agent Evaluation?

Q7

hard

A team has run Agent Evaluation with an evaluation set of 200 questions. Analysis of the `eval_results` table shows that the `response/llm_judged/chun...

Q8

easy

In Mosaic AI Agent Evaluation, which LLM judge assesses whether the retrieved context chunks contain enough information to answer the user's question?

Q9

hard

A team building a document Q&A agent wants to evaluate whether their retrieval pipeline is returning the right documents, independent of the quality o...

Q10

medium

A team wants to evaluate whether their RAG agent's responses adhere to a set of organizational communication guidelines (e.g., "Responses must be prof...

Q11

medium

A team is iterating on their RAG application and wants to compare the quality of two different system prompts across the same evaluation set. They hav...

Q12

medium

A production AI agent has been deployed for three weeks. The team notices that the `response/llm_judged/groundedness/percentage` metric in their monit...

Q13

easy

In Mosaic AI Agent Evaluation, which judge is always run regardless of whether the evaluation record includes a ground-truth expected response?

Q14

hard

An ML team uses `mlflow.evaluate()` with `model_type="databricks-agent"` and discovers that the `response/llm_judged/correctness/percentage` metric is...

Q15

medium

A developer wants to access the per-request judge assessment results after running `mlflow.evaluate()` with `model_type="databricks-agent"`. Which att...

Q16

medium

A practitioner has already written a scorer they want to run continuously on production traces. According to Databricks MLflow 3 monitoring, what is t...

Q17

easy

Which MLflow capability underpins both development-time evaluation and production monitoring for GenAI apps on Databricks?

Q18

medium

A team stores its traces in Unity Catalog and wants to turn on MLflow 3 production monitoring. What additional prerequisite must be configured?

Q19

medium

A team wants a custom production scorer to run reliably in MLflow 3 monitoring. Which implementation style is supported?

Q20

medium

A team wants subject matter experts to rate real responses from an agent and turn that feedback into evaluation data for future iterations. Which Data...
