Implement generative AI quality assurance and observability: Questions
Practice questions for the "Implement generative AI quality assurance and observability" topic in the Microsoft Certified: Machine Learning Operations (MLOps) Engineer Associate certification. 30 questions cover this domain.
A team wants automated evaluation runs that use built-in metrics but also include one company-specific scoring rule. Which design best matches AI-300?
A safety reviewer needs to check whether an application produces harmful or unsafe content in response to prompts. Which evaluation area should be con...
A generative AI team wants to measure whether answers are actually supported by the retrieved source material rather than invented by the model. Which...
A platform owner wants to control generative AI spending in production by measuring usage patterns tied directly to inference activity. Which metrics ...
An agent occasionally fails only on certain prompt paths, and the team needs a way to follow model calls, logging details, and debugging traces across...
A production support team wants to watch request performance trends for an agent service. Which metrics are explicitly in scope for AI-300 observabili...
Before launching a new agent, a team wants evaluation results that reflect realistic prompts and expected mappings between test input and reference co...
A team wants distributed tracing for a prompt flow application so they can inspect each tool call and span across requests. Which observability stack ...
Which built-in evaluator measures whether the generated answer addresses the user's question?
A safety team wants to systematically test a deployed model against adversarial prompts that probe for jailbreaks and harmful content. Which Microsoft...
A production support team wants quality metrics computed on real production traffic, not just on offline test sets, so regressions are detected after ...
An MLOps engineer must export prompt flow telemetry (traces, request/response, tool spans) to a centralized log store for cross-team analysis. Which c...
An evaluator job reports inconsistent groundedness scores across reruns of the same dataset. Which configuration most often causes this and should be ...
Which built-in Azure AI evaluation metric judges how natural and grammatically correct generated text reads, independent of factual support?
A team wants their CI pipeline to fail automatically when a new prompt change causes the average coherence score on a held-out dataset to drop below 3...
A platform team wants to count the total number of tokens consumed by a deployed Azure OpenAI model over the past week to validate budget assumptions....
Which Azure AI Foundry capability runs a batch of test cases through a flow and computes quality metrics across the entire dataset in a single automat...
Which built-in Azure AI evaluation metric measures how closely a generated answer matches a ground-truth reference answer?
A production RAG application's groundedness scores have declined over the past two weeks despite the prompt and model remaining unchanged. The team su...
A safety team wants to proactively test an agent for jailbreak vulnerabilities by generating adversarial test prompts at scale without manually writin...