
Security Operations and Incident Response

How a SOC works: monitoring, detection, the NIST incident response lifecycle, and the SIEM/SOAR/XDR/EDR tooling stack.

Security Operations is where prevention meets reality: it's the team and tooling that detects, investigates, and responds to threats day in, day out. This lesson walks through how a Security Operations Centre (SOC) is structured and how an incident actually unfolds.

The SOC

A SOC is the team responsible for continuously monitoring an organisation's security and responding to incidents. Typical tiered structure:

Tier | Responsibilities
T1 Analyst | First-line triage of alerts; close false positives; escalate real ones
T2 Analyst | Deeper investigation; correlate across systems; contain incidents
T3 / Threat Hunter / Detection Engineer | Proactive hunting, building new detections, reverse-engineering malware
SOC Manager | Runs the team, reports to CISO, owns metrics and process

SOCs run 24×7 — internally (follow-the-sun across regions) or via a Managed Security Service Provider (MSSP) / Managed Detection & Response (MDR) vendor.

Incident Response Lifecycle (NIST SP 800-61)

NIST SP 800-61 Revision 2 defines four phases: Preparation; Detection & Analysis; Containment, Eradication & Recovery; and Post-Incident Activity. The list below splits the third phase into its three steps, which is why it has six entries.

  1. Preparation. Build runbooks, train responders, maintain contact lists, run tabletop exercises, deploy detection tooling.
  2. Detection & Analysis. An alert fires (or someone reports something). Validate it. Classify severity. Open an incident ticket. Start a chain-of-custody record for any evidence collected.
  3. Containment. Stop the bleeding. Isolate the host from the network, disable compromised accounts, block known indicators of compromise (IOCs). Short-term containment buys time; long-term containment prevents reinfection.
  4. Eradication. Remove the attacker — wipe and rebuild systems, rotate credentials, close the entry point.
  5. Recovery. Bring systems back to production, monitor closely for recurrence.
  6. Post-incident Activity. Blameless retrospective: what happened, what we did well, what we'd change. Update detections and runbooks.
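
The steps above can be sketched as a small state machine that an incident ticket moves through. This is a minimal illustration, not part of NIST SP 800-61: the `Incident` class and the `ALLOWED` transition table are hypothetical, and the loops back to analysis reflect the common case where containment or recovery uncovers further attacker activity.

```python
from enum import Enum


class Phase(Enum):
    PREPARATION = 1
    DETECTION_ANALYSIS = 2
    CONTAINMENT = 3
    ERADICATION = 4
    RECOVERY = 5
    POST_INCIDENT = 6


# Hypothetical transition table: mostly forward, with loops back to
# analysis when a later step reveals new attacker activity.
ALLOWED = {
    Phase.PREPARATION: {Phase.DETECTION_ANALYSIS},
    Phase.DETECTION_ANALYSIS: {Phase.CONTAINMENT, Phase.POST_INCIDENT},
    Phase.CONTAINMENT: {Phase.ERADICATION, Phase.DETECTION_ANALYSIS},
    Phase.ERADICATION: {Phase.RECOVERY, Phase.DETECTION_ANALYSIS},
    Phase.RECOVERY: {Phase.POST_INCIDENT, Phase.DETECTION_ANALYSIS},
    Phase.POST_INCIDENT: {Phase.PREPARATION},
}


class Incident:
    def __init__(self, title: str) -> None:
        self.title = title
        # A ticket is opened once an alert has fired, so it starts here.
        self.phase = Phase.DETECTION_ANALYSIS
        self.history = [self.phase]

    def advance(self, new_phase: Phase) -> None:
        """Move to new_phase, rejecting transitions the table does not allow."""
        if new_phase not in ALLOWED[self.phase]:
            raise ValueError(f"{self.phase.name} -> {new_phase.name} not allowed")
        self.phase = new_phase
        self.history.append(new_phase)
```

Recording `history` gives the retrospective a timeline for free, and rejecting skipped steps (for example, jumping from eradication straight to the retrospective) keeps recovery monitoring from being forgotten.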

The discipline of running this loop matters more than any single tool. A team that has never practised will fumble a real incident, regardless of how much it spent on the SIEM.

The Tooling Stack

SIEM — Security Information and Event Management

Aggregates logs from everywhere (endpoints, servers, firewalls, cloud, apps), correlates them, and runs detection rules. The eyes of the SOC. Examples: Splunk, Microsoft Sentinel, Elastic Security, Sumo Logic, Chronicle, Panther.
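
To make "correlates them and runs detection rules" concrete, here is a minimal sketch of one classic correlation rule: a successful login that follows a burst of failures from the same user and source IP. The event field names, the five-minute window, and the threshold of five are assumptions for illustration; real SIEMs express this in their own query languages.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)   # assumed correlation window
THRESHOLD = 5                   # assumed number of failures before a success


def brute_force_then_success(events):
    """Yield (user, src_ip, time) when a successful login follows
    THRESHOLD or more failures from the same user and IP within WINDOW.

    events: iterable of dicts with 'time', 'user', 'src_ip', 'outcome',
    sorted by time.
    """
    failures = defaultdict(deque)
    for ev in events:
        key = (ev["user"], ev["src_ip"])
        recent = failures[key]
        # Drop failures that have aged out of the window.
        while recent and ev["time"] - recent[0] > WINDOW:
            recent.popleft()
        if ev["outcome"] == "failure":
            recent.append(ev["time"])
        elif ev["outcome"] == "success":
            if len(recent) >= THRESHOLD:
                yield (ev["user"], ev["src_ip"], ev["time"])
            recent.clear()
```

Neither event type is alarming on its own; the signal comes from correlating the two within a time window, which is the kind of logic a SIEM rule encodes.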

EDR — Endpoint Detection and Response

Agents on every endpoint that record process, file, network, and registry activity, run behavioural detections, and let analysts respond remotely (kill process, isolate host). Examples: CrowdStrike Falcon, Microsoft Defender for Endpoint, SentinelOne.
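
As a rough illustration of a behavioural detection paired with a response, the sketch below flags an office application spawning a script interpreter and queues the two remote actions mentioned above. The process lists, event fields, and action names are hypothetical and do not correspond to any vendor's API.

```python
# Hypothetical rule: office applications rarely need to launch a shell.
SUSPICIOUS_PARENTS = {"winword.exe", "excel.exe", "outlook.exe"}
SUSPICIOUS_CHILDREN = {"powershell.exe", "cmd.exe", "wscript.exe"}


def is_suspicious_spawn(event: dict) -> bool:
    """Return True when an office process starts a script interpreter."""
    # Keep only the file name from a full Windows path, case-insensitively.
    parent = event.get("parent_image", "").lower().rsplit("\\", 1)[-1]
    child = event.get("image", "").lower().rsplit("\\", 1)[-1]
    return parent in SUSPICIOUS_PARENTS and child in SUSPICIOUS_CHILDREN


def respond(event: dict) -> list:
    """Return the response actions an analyst (or automation) might queue."""
    if not is_suspicious_spawn(event):
        return []
    return [("kill_process", event["pid"]), ("isolate_host", event["host"])]
```

The rule keys on behaviour (who launched what) rather than on a file hash, so it still fires when the payload itself has never been seen before.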

XDR — Extended Detection and Response

EDR plus correlation across email, identity, cloud, and network. Tries to give one unified view rather than a stack of disconnected consoles.

SOAR — Security Orchestration, Automation, and Response

Runs playbooks. When a phishing alert fires: automatically pull the email headers, check the attachment in a sandbox, search for who else received it, and pre-fill a containment ticket. Examples: Palo Alto XSOAR, Splunk SOAR, Tines.
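
The phishing playbook described above might look like the following sketch. The `mail`, `sandbox`, and `ticketing` objects are hypothetical integrations injected by the caller, and their method names are assumptions rather than any product's API; commercial SOAR tools usually express the same flow as a visual or YAML workflow.

```python
from dataclasses import dataclass, field


@dataclass
class PhishingCase:
    message_id: str
    headers: dict = field(default_factory=dict)
    verdict: str = "unknown"
    recipients: list = field(default_factory=list)
    ticket: dict = field(default_factory=dict)


def run_phishing_playbook(message_id, mail, sandbox, ticketing):
    """Run the four playbook steps against injected integrations.

    mail, sandbox and ticketing are hypothetical client objects; any object
    exposing the methods called here will work.
    """
    case = PhishingCase(message_id)
    # 1. Pull the email headers.
    case.headers = mail.get_headers(message_id)
    # 2. Check the attachment in a sandbox.
    case.verdict = sandbox.detonate(mail.get_attachment(message_id))
    # 3. Find who else received the same message.
    case.recipients = mail.find_recipients(case.headers["Subject"])
    # 4. Pre-fill a containment ticket for a human to review.
    case.ticket = ticketing.draft(
        title=f"Phishing: {case.headers['Subject']}",
        severity="high" if case.verdict == "malicious" else "low",
        affected=case.recipients,
    )
    return case
```

Note that the playbook only drafts the ticket. Leaving the final containment decision to an analyst is a common design choice while trust in the automation is still being built.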

Threat Intelligence Platforms

Ingest feeds of IOCs (IPs, domains, file hashes) and attacker TTPs (tactics, techniques, and procedures) from commercial and open sources (MISP, AlienVault OTX) and feed them into detections.
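
A minimal sketch of how a feed of indicators can be matched against telemetry is shown below. The feed record shape and the event fields (`dst_ip`, `domain`, `sha256`) are assumptions for illustration and are not the MISP or OTX data formats.

```python
def load_iocs(feed: list) -> dict:
    """Index a feed of {'type': ..., 'value': ...} records by indicator type."""
    iocs = {"ip": set(), "domain": set(), "sha256": set()}
    for item in feed:
        if item["type"] in iocs:
            iocs[item["type"]].add(item["value"].lower())
    return iocs


def match_event(event: dict, iocs: dict) -> list:
    """Return the (type, value) indicators that an event matches."""
    hits = []
    dst_ip = event.get("dst_ip", "")
    if dst_ip in iocs["ip"]:
        hits.append(("ip", dst_ip))
    # Match the domain itself or any parent domain listed in the feed,
    # stopping before the bare top-level label.
    domain = event.get("domain", "").lower().rstrip(".")
    labels = domain.split(".") if domain else []
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in iocs["domain"]:
            hits.append(("domain", candidate))
            break
    file_hash = event.get("sha256", "").lower()
    if file_hash in iocs["sha256"]:
        hits.append(("sha256", file_hash))
    return hits
```

Indexing indicators into sets keeps each lookup constant-time, which matters when every network and process event is checked against a large feed.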

Other tooling

  • UEBA — user/entity behavioural analytics; spots impossible travel, unusual data access
  • NDR — network detection and response; analyses traffic flow and metadata
  • Deception — honeypots, canary tokens that scream when touched
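
The "impossible travel" check mentioned under UEBA comes down to simple arithmetic: the distance between two login locations divided by the time between them. The sketch below uses the haversine formula; the 900 km/h ceiling and the login record fields are assumptions, and real products also account for VPNs and imprecise IP geolocation.

```python
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
MAX_SPEED_KMH = 900.0  # assumed ceiling, roughly airliner cruising speed


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (
        sin((lat2 - lat1) / 2) ** 2
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def is_impossible_travel(login_a, login_b):
    """login_*: dicts with 'time' (datetime), 'lat', 'lon'."""
    distance = haversine_km(
        login_a["lat"], login_a["lon"], login_b["lat"], login_b["lon"]
    )
    hours = abs((login_b["time"] - login_a["time"]).total_seconds()) / 3600
    if hours == 0:
        # Simultaneous logins from two different places.
        return distance > 0
    return distance / hours > MAX_SPEED_KMH
```

A London login followed an hour later by a Sydney login implies a speed far above the ceiling and is flagged; London followed by Paris two hours later is not.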

Detection Engineering

Modern SOCs treat detections like code. Practices:

  • Detections live in a Git repo (Sigma rules, Splunk SPL, KQL queries) with code review
  • Each detection links to the MITRE ATT&CK technique it covers
  • Tests run against synthetic and historical data before deployment
  • False-positive rates are tracked; noisy detections are tuned or retired
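
The practices above can be sketched in a few lines: a detection carries its ATT&CK technique ID alongside its logic, and a helper measures how noisy it is against labelled data. The `Detection` class and event fields are hypothetical, and the rule is deliberately simplified (it ignores abbreviated forms of the flag such as `-enc`).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Detection:
    name: str
    attack_technique: str              # MITRE ATT&CK technique ID
    matches: Callable[[dict], bool]    # the detection logic itself


encoded_powershell = Detection(
    name="PowerShell launched with an encoded command",
    attack_technique="T1059.001",  # Command and Scripting Interpreter: PowerShell
    matches=lambda ev: (
        ev.get("image", "").lower().endswith("powershell.exe")
        and "-encodedcommand" in ev.get("command_line", "").lower()
    ),
)


def false_positive_rate(detection, labelled_events):
    """Share of the detection's alerts that fired on benign events.

    labelled_events: list of (event, is_malicious) pairs.
    """
    alerts = [(ev, bad) for ev, bad in labelled_events if detection.matches(ev)]
    if not alerts:
        return 0.0
    return sum(1 for _, bad in alerts if not bad) / len(alerts)
```

Running `false_positive_rate` against historical data in CI is one way to stop a noisy rule before it reaches the T1 queue.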

Threat Hunting

Don't wait for alerts. Threat hunting starts from a hypothesis:

"If an attacker compromised our CI/CD service account, they would clone unusual repositories and download large amounts of data outside business hours."

The hunter then queries telemetry to find evidence — or absence — of that activity. Findings either become new detections, or improve the team's mental model of what normal looks like.
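
The CI/CD hypothesis above could be tested with a query along these lines. This is a sketch only: the account name `svc-cicd`, the event fields, the business-hours range, and the 500 MB threshold are all assumptions, and in practice the query would run in the SIEM against source-control audit logs.

```python
from collections import defaultdict
from datetime import datetime

BUSINESS_HOURS = range(8, 18)          # assumed 08:00-17:59 local time
SERVICE_ACCOUNT = "svc-cicd"           # hypothetical account name


def hunt_unusual_clones(events, baseline_repos, min_bytes=500_000_000):
    """Return repos cloned by the service account outside business hours
    that are not in its baseline, or whose off-hours transfer is large.

    events: dicts with 'user', 'action', 'repo', 'bytes', 'time'.
    Returns a dict mapping repo name to off-hours bytes transferred.
    """
    off_hours_bytes = defaultdict(int)
    for ev in events:
        if ev["user"] != SERVICE_ACCOUNT or ev["action"] != "clone":
            continue
        # Skip activity during weekday business hours.
        if ev["time"].hour in BUSINESS_HOURS and ev["time"].weekday() < 5:
            continue
        off_hours_bytes[ev["repo"]] += ev["bytes"]
    return {
        repo: total
        for repo, total in off_hours_bytes.items()
        if repo not in baseline_repos or total >= min_bytes
    }
```

An empty result is still a finding: it records that the behaviour was looked for and not seen, and the query itself is a candidate for promotion to a scheduled detection.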

Forensics and Evidence

If an incident may go to legal action, evidence handling matters from minute one:

  • Preserve before you investigate — image disks, snapshot memory; don't reboot
  • Chain of custody — log who touched what artefact, when
  • Hash everything so you can prove evidence wasn't altered
  • Use write blockers for physical media

Tools: Volatility (memory), Autopsy (disk), Velociraptor (live response), KAPE (artefact collection).
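
The "hash everything" and chain-of-custody points can be illustrated with Python's standard `hashlib`. The record layout below is an assumption for illustration; organisations usually have their own custody form, and the log itself should be stored somewhere tamper-evident.

```python
import hashlib
from datetime import datetime, timezone


def sha256_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large disk images do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def custody_entry(path, handler, action):
    """Build one chain-of-custody record for an artefact."""
    return {
        "artefact": str(path),
        "sha256": sha256_file(path),
        "handler": handler,
        "action": action,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def verify(path, entry):
    """True if the artefact still matches the hash recorded in the entry."""
    return sha256_file(path) == entry["sha256"]
```

Hashing at acquisition and again before analysis lets an examiner show that the copy being analysed is bit-for-bit the one that was collected.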

Key Metrics

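  • MTTD: Mean Time To Detect, measured from initial compromise to first alert
  • MTTR: Mean Time To Respond (or Resolve), measured from detection to containment or full resolution; state which definition you use
  • Dwell time: total time the attacker was inside, from compromise to eviction (industry reports used to put the median above 200 days; better detection tooling and faster-moving attacks have pushed it much lower)
  • False positive rate per detection
  • Coverage: share of MITRE ATT&CK techniques for which you have at least one working detection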
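
As a worked example of the timing metrics, the sketch below computes them from incident records. The field names are assumptions, and "resolved" is used here as a stand-in for the moment the attacker was evicted.

```python
from datetime import datetime
from statistics import mean


def mean_hours(incidents, start_key, end_key):
    """Mean elapsed hours between two timestamps across incidents."""
    return mean(
        (inc[end_key] - inc[start_key]).total_seconds() / 3600
        for inc in incidents
    )


def soc_metrics(incidents):
    """incidents: dicts with 'compromised', 'detected', 'resolved' datetimes."""
    return {
        "mttd_hours": mean_hours(incidents, "compromised", "detected"),
        "mttr_hours": mean_hours(incidents, "detected", "resolved"),
        # Assumption: 'resolved' approximates when the attacker was evicted.
        "mean_dwell_hours": mean_hours(incidents, "compromised", "resolved"),
    }
```

Means are sensitive to a single long-running incident, so many teams report the median alongside them.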

Tabletops and Exercises

Run regular exercises so muscle memory exists when a real incident hits:

  • Tabletop — discussion-only, walk through a scenario in a meeting room
  • Purple team — red attackers and blue defenders work together to test detections
  • Full-scale simulation — replay a realistic attack against a non-prod environment with real tooling

The lessons from these exercises feed back into detections, runbooks, and architecture — closing the loop with the rest of the IR lifecycle.

Key Takeaways

  • A SOC operates in tiers: T1 triages alerts, T2 investigates, T3 hunts and engineers detections.
  • NIST SP 800-61 IR lifecycle: Preparation → Detection & Analysis → Containment, Eradication & Recovery → Post-incident Activity.
  • SIEM aggregates logs and runs detections; SOAR automates response; EDR watches endpoints and XDR extends that view to email, identity, cloud, and network; together they form the SOC stack.
  • Threat hunting is hypothesis-driven: assume compromise and look for it, rather than waiting for alerts.
  • MTTD and MTTR are the headline metrics — measure them, drive them down.
