Perform evaluation, error analysis, and tuning Questions
Practice questions for the "Perform evaluation, error analysis, and tuning" topic in GitHub Agentic AI Developer. 36 questions cover this domain.
In the VS Code Chat Debug view, which section is the best place to verify that custom instructions or an agent description were actually included in t...
Evaluation shows an agent repeatedly picks the wrong external tool for a task. Which tuning action best matches GH-600?
A team wants objective evaluation signals for an agent that updates dependencies. Which signal source best matches GH-600?
Which root-cause category is explicitly listed in the GH-600 study guide for agent failures?
During a long agent session, the model's answer appears cut off. According to VS Code troubleshooting guidance, what is the best next step?
Which evaluation setup best aligns with GH-600 guidance for agent tasks?
An enterprise administrator wants to measure pull request outcomes for work created by Copilot cloud agent. Which metric is explicitly available in Co...
An agent produced the wrong change, and you need to identify why. Which evidence sources does GH-600 say to inspect?
In the VS Code Chat Debug view, which section lets you verify the inputs and outputs of tools that were invoked during a request?
You want to use the /troubleshoot command to ask why a session was slow and which customizations loaded. What must be enabled first?
A path-specific .instructions.md file seems to have no effect. According to VS Code troubleshooting guidance, what should you check first?
Which debug view is designed to visualize interactions between agents and subagents during a complex run?
Evaluation shows an agent keeps following obsolete intermediate notes and misses the current task direction late in long runs. Which tuning action bes...
What is the primary purpose of the Logs view in the Agent Debug panel?
Which Chat Debug section lets you confirm the exact text that was sent as your request, including resolved # mentions?
The AI answers generically and seems unaware of repository files. Which Chat Debug section should you inspect first?
An expected MCP tool never runs during a request. Which check best distinguishes 'the tool was unavailable' from 'the model chose not to use it'?
Which Agent Debug view shows aggregate statistics such as total tool calls, token usage, error count, and overall duration?
A task says, 'Update logging only under src/api and do not change runtime behavior.' Which evaluation setup best matches that development intent?
A failure might involve reasoning mistakes, tool behavior, and workflow state. Which evidence set provides the strongest basis for root-cause analysis...