Implement generative AI quality assurance and observability: Questions

Practice questions for the "Implement generative AI quality assurance and observability" topic in Microsoft Certified: Machine Learning Operations (MLOps) Engineer Associate. 30 questions covering this domain.

30 questions: 7 easy, 15 medium, 8 hard

medium

A team wants automated evaluation runs that use built-in metrics but also include one company-specific scoring rule. Which design best matches AI-300?

medium

A safety reviewer needs to check whether an application produces harmful or unsafe content in response to prompts. Which evaluation area should be con...

easy

A generative AI team wants to measure whether answers are actually supported by the retrieved source material rather than invented by the model. Which...

hard

A platform owner wants to control generative AI spending in production by measuring usage patterns tied directly to inference activity. Which metrics ...

hard

An agent occasionally fails only on certain prompt paths, and the team needs a way to follow model calls, logging details, and debugging traces across...

medium

A production support team wants to watch request performance trends for an agent service. Which metrics are explicitly in scope for AI-300 observabili...

medium

Before launching a new agent, a team wants evaluation results that reflect realistic prompts and expected mappings between test input and reference co...

medium

A team wants distributed tracing for a prompt flow application so they can inspect each tool call and span across requests. Which observability stack ...

easy

Which built-in evaluator measures whether the generated answer addresses the user's question?

Q10

medium

A safety team wants to systematically test a deployed model against adversarial prompts that probe for jailbreaks and harmful content. Which Microsoft...

Q11

medium

A production support team wants quality metrics computed on real production traffic, not just on offline test sets, so regressions are detected after ...

Q12

hard

An MLOps engineer must export prompt flow telemetry (traces, request/response, tool spans) to a centralized log store for cross-team analysis. Which c...

Q13

hard

An evaluator job reports inconsistent groundedness scores across reruns of the same dataset. Which configuration most often causes this and should be ...

Q14

easy

Which built-in Azure AI evaluation metric judges how natural and grammatically correct generated text reads, independent of factual support?

Q15

medium

A team wants their CI pipeline to fail automatically when a new prompt change causes the average coherence score on a held-out dataset to drop below 3...

Q16

medium

A platform team wants to count the total number of tokens consumed by a deployed Azure OpenAI model over the past week to validate budget assumptions....

Q17

easy

Which Azure AI Foundry capability runs a batch of test cases through a flow and computes quality metrics across the entire dataset in a single automat...

Q18

easy

Which built-in Azure AI evaluation metric measures how closely a generated answer matches a ground-truth reference answer?

Q19

hard

A production RAG application's groundedness scores have declined over the past two weeks despite the prompt and model remaining unchanged. The team su...

Q20

medium

A safety team wants to proactively test an agent for jailbreak vulnerabilities by generating adversarial test prompts at scale without manually writin...
