Troubleshooting Methodology — IT Fundamentals (CompTIA A+ Prep) | CertQnA

The single biggest predictor of a technician's effectiveness isn't what they know — it's how they approach problems they don't know yet. This lesson teaches you the methodology that makes you reliable at the unknown.

The Six CompTIA Troubleshooting Steps

This model appears, with minor variations, in several CompTIA exams, including A+ and Network+. Memorise the order:

1. Identify the problem (gather information, identify symptoms, ask what changed)
2. Establish a theory of probable cause (question the obvious; consider multiple)
3. Test the theory to determine cause; if not confirmed, establish a new one or escalate
4. Establish a plan of action to resolve and identify potential effects
5. Implement the solution or escalate
6. Verify full system functionality and, if applicable, implement preventive measures

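Throughout: document findings, actions, and outcomes. Note that the official A+ objectives combine planning and implementing into one step and count documentation as the sixth step; this lesson separates planning from implementing and treats documentation as something you do along the way.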

Step 1: Identify the Problem

The wrong question is "what's wrong?" — users describe symptoms, not problems. The right questions:

What exactly happens? Walk me through it step by step.
What did you expect to happen?
When did this start?
What changed recently? Updates, new hardware, software install, location change, network change
Has it ever worked? (If never, it's more likely a setup or configuration problem than a failure)
Does it happen for anyone else? (Scope = user, machine, or environment)
Can you reproduce it on demand? If yes, you're most of the way there, because you can test each theory directly.

Most intermittent or "weird" problems become obvious once you know what changed.

Step 2: Establish a Theory

Two principles guide good theory:

Question the obvious. Is the power cable seated? Is the network cable plugged in? Did the user just close the lid? Senior techs ask "is it on?" because a surprising share of "broken" things aren't.
Occam's razor: The simpler explanation is usually correct. A single recent driver update is more likely to be the cause than a CPU dying.

Brainstorm 2-3 candidate causes; rank by likelihood and ease of testing.
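
One way to make the ranking concrete is to score each theory by likelihood per minute of testing effort. The sketch below is illustrative only: the Hypothesis class, the rank function, the 0 to 1 likelihood scale, and the sample numbers are all assumptions, not part of the CompTIA model.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    cause: str
    likelihood: float    # rough guess between 0 and 1 (assumed scale)
    test_minutes: float  # estimated time to confirm or rule out


def rank(hypotheses):
    """Order theories so that likely, cheap-to-test ones come first."""
    # Score = likelihood per minute of testing effort; higher is better.
    return sorted(
        hypotheses,
        key=lambda h: h.likelihood / max(h.test_minutes, 0.1),
        reverse=True,
    )


# Made-up estimates for illustration.
candidates = [
    Hypothesis("Failing CPU", 0.05, 60),
    Hypothesis("Recent driver update", 0.60, 10),
    Hypothesis("Loose cable", 0.20, 1),
]

for h in rank(candidates):
    print(h.cause)
```

With these numbers the loose cable is checked first even though the driver update is more likely, because ruling it out takes about a minute.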

Step 3: Test the Theory

Each test should be designed to confirm or rule out one hypothesis. Cheap tests first:

Reproduce the issue in a clean state (restart, new browser tab, different user)
Swap one variable: cable, port, user account, network connection
Roll back the most recent change (uninstall the recent update; revert a setting)
Check logs for the exact time the user reports the problem started
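
For the log check, it helps to narrow the log to a few minutes either side of the reported time. The following is a minimal sketch, assuming each line starts with a "YYYY-MM-DD HH:MM:SS" timestamp; real log formats vary, and the lines_near function and the sample log lines are invented for illustration.

```python
from datetime import datetime, timedelta


def lines_near(log_lines, reported, window_minutes=5):
    """Return log lines stamped within window_minutes of the reported time."""
    window = timedelta(minutes=window_minutes)
    matches = []
    for line in log_lines:
        # Assumed format: "YYYY-MM-DD HH:MM:SS message"
        try:
            stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # skip lines without a parseable timestamp
        if abs(stamp - reported) <= window:
            matches.append(line)
    return matches


# Invented sample log for illustration.
sample_log = [
    "2024-03-01 09:02:11 INFO  service started",
    "2024-03-01 09:41:57 WARN  driver reloaded",
    "2024-03-01 09:43:02 ERROR print job failed",
    "2024-03-01 11:15:40 INFO  user logged off",
]

reported_time = datetime(2024, 3, 1, 9, 45)
for line in lines_near(sample_log, reported_time):
    print(line)
```

Users' recollection of the time is often approximate, so widen the window if nothing turns up.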

If your theory tests negative, go back to Step 2 — don't keep escalating effort on the same theory.

Step 4: Plan

Before implementing, think:

Will this fix affect anyone other than the user?
Is there risk of data loss?
Do I need a backup before I proceed?
Do I need a change ticket / approval?
How will I roll back if the fix makes things worse?
What's the impact if I'm wrong? (Reformat = catastrophic; restart Print Spooler = trivial)

For trivial fixes the plan is one line. For high-impact fixes, write it down.

Step 5: Implement

Follow the plan. Note exactly what you did — including commands and timestamps — so you can retrace if something breaks. If a step doesn't go as expected, stop and reassess rather than improvising deeper.
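
A running action log can be as simple as a list of timestamped entries. This is a sketch under assumptions: the record helper and the entry layout are hypothetical, and the Windows commands are only examples of what a Print Spooler restart might look like.

```python
from datetime import datetime, timezone

action_log = []


def record(action, command=""):
    """Append a UTC-timestamped entry describing one step taken."""
    entry = {
        "time": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "action": action,
        "command": command,
    }
    action_log.append(entry)
    return entry


# Example entries; the commands are illustrative.
record("Checked spooler state", "sc query spooler")
record("Restarted Print Spooler", "net stop spooler && net start spooler")

for entry in action_log:
    print(entry["time"], entry["action"], entry["command"], sep=" | ")
```

Recording the exact command alongside the time makes it possible to retrace or reverse a step later.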

Step 6: Verify

The fix is not done when the error message goes away — it's done when the user can perform their original task end-to-end. Verify:

The originally failing action now succeeds
Related functionality wasn't broken as a side-effect
The user agrees the issue is resolved

Then: consider preventive action. Was this avoidable? Should monitoring catch it next time? Is there a related update or documentation gap?

Document

Every ticket should leave a record:

Symptoms reported
Diagnostic steps performed
Root cause
Solution applied
Verification
Preventive recommendations
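
The six sections above map naturally onto a small record type. The sketch below is one possible shape, not a standard ticket schema: the TicketRecord class, its field names, and the sample ticket are assumptions made for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class TicketRecord:
    symptoms: str = ""
    diagnostic_steps: str = ""
    root_cause: str = ""
    solution: str = ""
    verification: str = ""
    preventive_recommendations: str = ""

    def missing_fields(self):
        """Names of sections that are still blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Invented example ticket.
ticket = TicketRecord(
    symptoms="Print jobs stay queued and never print",
    diagnostic_steps="Reproduced from a second account; checked spooler service state",
    root_cause="Print Spooler service stopped",
    solution="Restarted Print Spooler",
)
print(ticket.missing_fields())
```

Checking for blank sections before closing is a cheap way to catch a ticket that was fixed but never verified.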

This is how "we've seen this before" knowledge accumulates in a team. A good ticket history is the cheapest training tool a help desk has.

Practical Patterns

Divide and conquer

When you don't know where in a chain a problem lies, test at a point that splits the chain, then move toward the failing side. If a webpage doesn't load: can you ping the gateway (LAN OK)? Can you ping 8.8.8.8 (internet routing OK)? Can you reach example.com by name (DNS OK)? Each test rules out one segment of the path.
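
The webpage example can be sketched as an ordered list of layer checks that stops at the first failure. The probes here are stand-ins with hard-coded results so the example runs anywhere; in practice each would wrap a real test such as a ping or a call to socket.gethostbyname. The first_failure function and the layer labels are hypothetical names.

```python
def first_failure(checks):
    """Run ordered (layer, probe) pairs and return the first layer whose probe fails."""
    for layer, probe in checks:
        if not probe():
            return layer
    return None  # every layer passed


# Stand-in probes with hard-coded results (hypothetical outcome).
checks = [
    ("LAN (gateway)", lambda: True),
    ("Internet (8.8.8.8)", lambda: True),
    ("DNS (example.com)", lambda: False),
]
print(first_failure(checks))
```

Here the first two layers pass, so attention moves straight to name resolution instead of cables or the router.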

Swap to isolate

"Is it the keyboard or the laptop?" → plug the keyboard into another laptop. "Is it the cable or the monitor?" → try a known-good cable.

Read the actual error message

Users say "nothing works" when the error message says exactly what's wrong. Pull up the actual dialog or log line; search the exact text.

Check the log

Almost every OS, app, and device produces logs. Event Viewer on Windows. journalctl / /var/log on Linux. Console.app on macOS. Browser DevTools console. App-specific logs. The logs usually say what's wrong before any guessing is needed.

Reboot is not cheating

A restart fixes a remarkable proportion of issues by clearing leaked state. It's not always the right answer (you want to understand the cause for recurring problems) — but as a first action when time is short, it's effective.

Reproduce on a different account / device

If the problem only happens for one user, the cause is most likely in their profile, settings, or permissions. If it follows the device rather than the user, suspect that machine. If it happens for everyone, it's systemic.

Bisect when there are many changes

If you can't tell which of N recent changes broke things, revert half and test, then half of the remaining, until you isolate the culprit. (This is how git bisect works — same logic for everything else.)
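
The halving procedure can be written as a binary search over the ordered list of changes. This is a minimal sketch, assuming the system works with no changes applied, fails with all of them, and exactly one change is responsible. The find_culprit function, the works_with callback, and the change names are hypothetical.

```python
def find_culprit(changes, works_with):
    """Binary-search an ordered list of changes for the one that breaks things.

    works_with(applied) must return True when the system works with only
    the changes in `applied` in place.
    """
    # Invariant: works with changes[:lo], fails with changes[:hi].
    lo, hi = 0, len(changes)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if works_with(changes[:mid]):
            lo = mid  # culprit is later in the list
        else:
            hi = mid  # culprit is among the first `mid` changes
    return changes[hi - 1]


# Invented list of recent changes, oldest first.
recent = [
    "driver update",
    "browser extension",
    "VPN client",
    "group policy change",
    "firmware update",
]


def works_with(applied):
    # Stand-in for a manual test; here the policy change is the hidden culprit.
    return "group policy change" not in applied


print(find_culprit(recent, works_with))
```

Five changes need three tests here rather than up to five. If changes interact or more than one is at fault, the single-culprit assumption breaks and the result needs a second look.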

When to Escalate

Escalate when:

You've reached the limits of your knowledge
The fix requires access or rights you don't have
The blast radius of the proposed fix exceeds your authority
SLA is at risk
Multiple users are affected and you can't quickly identify the cause

When you escalate, give the next tier everything: symptoms, what you've tried, what you ruled out, the logs you've collected. Don't make them start over.

Don't Do These

Don't change multiple things at once — you won't know which fixed it
Don't ignore the user's account of what they did — even when wrong, it points to where they got confused, which is part of the fix
Don't apply a fix you found online without understanding it — especially registry edits, dism / sfc, BIOS resets
Don't tell the user "it's a known issue" without a ticket reference and ETA
Don't close the ticket until verified — re-opens hurt CSAT and team metrics

The 80/20 of IT Troubleshooting

Did you restart it? (Power-cycle the device, restart the service)
Is it plugged in / connected / charged?
What changed?
Does it work for someone else?
What does the log / error message actually say?

Five questions, ~80% of problems solved. The remaining 20% are why you keep learning.