Security Operations and Incident Response — Cybersecurity Fundamentals | CertQnA

Security Operations is where prevention meets reality: it is the team and tooling that detect, investigate, and respond to threats day in, day out. This lesson walks through how a Security Operations Centre (SOC) is structured and how an incident actually unfolds.

The SOC

A SOC is the team responsible for continuously monitoring an organisation's security and responding to incidents. Typical tiered structure:

Tier	Responsibilities
T1 Analyst	First-line triage of alerts; close false positives; escalate real ones
T2 Analyst	Deeper investigation; correlate across systems; contain incidents
T3 / Threat Hunter / Detection Engineer	Proactive hunting, building new detections, reverse-engineering malware
SOC Manager	Runs the team, reports to CISO, owns metrics and process

SOCs run 24×7 — internally (follow-the-sun across regions) or via a Managed Security Service Provider (MSSP) / Managed Detection & Response (MDR) vendor.

Incident Response Lifecycle (NIST SP 800-61)

Preparation. Build runbooks, train responders, maintain contact lists, run tabletop exercises, deploy detection tooling.
Detection & Analysis. An alert fires (or someone reports something). Validate it. Classify severity. Open an incident ticket. Start the chain-of-custody log for any evidence collected.
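Containment. Stop the bleeding. Isolate the host from the network, disable compromised accounts, block IOCs. Short-term containment buys time; long-term containment prevents reinfection. (NIST SP 800-61 Rev. 2 groups containment, eradication, and recovery into a single phase; they are listed separately here.)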
Eradication. Remove the attacker — wipe and rebuild systems, rotate credentials, close the entry point.
Recovery. Bring systems back to production, monitor closely for recurrence.
Post-incident Activity. Blameless retrospective: what happened, what we did well, what we'd change. Update detections and runbooks.
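
One way to make the lifecycle concrete is to model an incident as a small state machine that refuses to skip phases. The sketch below is illustrative only: the `Phase` and `Incident` names and the table of allowed transitions are assumptions made for this example, not something NIST prescribes.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Phase(Enum):
    PREPARATION = "preparation"
    DETECTION_ANALYSIS = "detection_analysis"
    CONTAINMENT = "containment"
    ERADICATION = "eradication"
    RECOVERY = "recovery"
    POST_INCIDENT = "post_incident"


# Assumed transition table. Moving back to an earlier phase is allowed
# because new findings often widen the scope of an incident.
ALLOWED = {
    Phase.PREPARATION: {Phase.DETECTION_ANALYSIS},
    Phase.DETECTION_ANALYSIS: {Phase.CONTAINMENT},
    Phase.CONTAINMENT: {Phase.ERADICATION, Phase.DETECTION_ANALYSIS},
    Phase.ERADICATION: {Phase.RECOVERY, Phase.DETECTION_ANALYSIS},
    Phase.RECOVERY: {Phase.POST_INCIDENT, Phase.CONTAINMENT},
    Phase.POST_INCIDENT: {Phase.PREPARATION},
}


@dataclass
class Incident:
    title: str
    severity: str
    phase: Phase = Phase.DETECTION_ANALYSIS
    timeline: list = field(default_factory=list)

    def advance(self, new_phase, note):
        """Move to new_phase and record who-did-what-when for the retrospective."""
        if new_phase not in ALLOWED[self.phase]:
            raise ValueError(
                f"cannot move from {self.phase.name} to {new_phase.name}"
            )
        self.timeline.append(
            (datetime.now(timezone.utc), self.phase, new_phase, note)
        )
        self.phase = new_phase
```

The timestamped `timeline` is what a blameless retrospective would later read back; a real ticketing system would hold the same information in its own fields.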

The discipline of running this loop matters more than any single tool. A team that has never practised will fumble a real incident, however much it spent on the SIEM.

The Tooling Stack

SIEM — Security Information and Event Management

Aggregates logs from everywhere (endpoints, servers, firewalls, cloud, apps), correlates them, and runs detection rules. The eyes of the SOC. Examples: Splunk, Microsoft Sentinel, Elastic Security, Sumo Logic, Chronicle, Panther.
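
To illustrate what "correlates them and runs detection rules" can mean in practice, here is a minimal sketch of one classic correlation: several failed logins followed by a success for the same user and source IP. The event field names (`time`, `user`, `src_ip`, `outcome`), the function name, and the thresholds are assumptions for this example; real SIEMs express the same idea in their own query languages.

```python
from collections import defaultdict, deque
from datetime import timedelta


def detect_bruteforce(events, threshold=5, window=timedelta(minutes=10)):
    """Flag a successful login preceded by >= threshold recent failures.

    `events` is an iterable of dicts sorted by time, each with the
    hypothetical keys 'time' (datetime), 'user', 'src_ip', and 'outcome'
    ('failure' or 'success').
    """
    recent_failures = defaultdict(deque)  # (user, src_ip) -> failure times
    alerts = []
    for ev in events:
        key = (ev["user"], ev["src_ip"])
        failures = recent_failures[key]
        # Forget failures that fell out of the sliding window.
        while failures and ev["time"] - failures[0] > window:
            failures.popleft()
        if ev["outcome"] == "failure":
            failures.append(ev["time"])
        elif ev["outcome"] == "success":
            if len(failures) >= threshold:
                alerts.append({
                    "user": ev["user"],
                    "src_ip": ev["src_ip"],
                    "failures": len(failures),
                    "time": ev["time"],
                })
            failures.clear()
    return alerts
```

Keying on both user and source IP keeps the rule quiet for a user who simply mistypes a password once; it also means a distributed attack from many IPs would need a separate rule.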

EDR — Endpoint Detection and Response

Agents on every endpoint that record process, file, network, and registry activity, run behavioural detections, and let analysts respond remotely (kill process, isolate host). Examples: CrowdStrike Falcon, Microsoft Defender for Endpoint, SentinelOne.

XDR — Extended Detection and Response

EDR plus correlation across email, identity, cloud, and network. Tries to give one unified view rather than a stack of disconnected consoles.

SOAR — Security Orchestration, Automation, and Response

Runs playbooks. When a phishing alert fires: automatically pull the email headers, check the attachment in a sandbox, search for who else received it, and pre-fill a containment ticket. Examples: Palo Alto Networks Cortex XSOAR, Splunk SOAR, Tines.
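
The phishing playbook described above can be sketched as an ordered list of steps that share a context. Everything here is a stand-in: the step functions, the `resources` dictionary, and the "sandbox" (a set of known-bad hashes) are hypothetical and do not represent any vendor's API.

```python
def extract_headers(ctx):
    """Pull the fields later steps need from the reported email."""
    email = ctx["alert"]["email"]
    return {"from": email["from"], "subject": email["subject"]}


def check_attachment(ctx):
    """Stand-in for a sandbox verdict: a lookup in a set of known-bad hashes."""
    sha256 = ctx["alert"]["email"]["attachment_sha256"]
    known_bad = ctx["resources"]["known_bad_hashes"]
    return {"sha256": sha256,
            "verdict": "malicious" if sha256 in known_bad else "unknown"}


def find_other_recipients(ctx):
    """Search a (hypothetical) mail log for the same sender and subject."""
    headers = ctx["results"]["extract_headers"]
    return sorted({
        m["to"] for m in ctx["resources"]["mail_log"]
        if m["from"] == headers["from"] and m["subject"] == headers["subject"]
    })


def draft_ticket(ctx):
    """Pre-fill a containment ticket; a human still approves the action."""
    results = ctx["results"]
    return {
        "title": f"Phishing: {results['extract_headers']['subject']}",
        "attachment_verdict": results["check_attachment"]["verdict"],
        "affected_mailboxes": results["find_other_recipients"],
        "status": "awaiting analyst approval",
    }


PHISHING_PLAYBOOK = [
    extract_headers, check_attachment, find_other_recipients, draft_ticket,
]


def run_playbook(alert, steps, resources):
    """Run each step in order, storing its result and an audit trail."""
    ctx = {"alert": alert, "resources": resources, "results": {}, "audit": []}
    for step in steps:
        ctx["results"][step.__name__] = step(ctx)
        ctx["audit"].append(step.__name__)
    return ctx
```

Keeping each step a plain function makes it possible to test steps in isolation and to reuse them across playbooks, which is the main appeal of the playbook model.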

Threat Intelligence Platforms

Ingest feeds of IOCs (IPs, domains, hashes) and TTPs from commercial and open sources (MISP, AlienVault OTX) and feed them into detections.
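
As a rough illustration of how IOC feeds reach detections, the sketch below normalises indicators into an index and checks events against it. The feed format, the event field names in `FIELD_TYPES`, and the function names are assumptions for this example; platforms such as MISP have their own schemas and APIs, which are not shown here.

```python
import ipaddress
from collections import defaultdict

# Hypothetical mapping from event fields to indicator types.
FIELD_TYPES = {"dest_ip": "ip", "domain": "domain", "file_sha256": "sha256"}


def normalise(ioc_type, value):
    """Bring an indicator into one canonical form so lookups are exact."""
    value = value.strip().replace("[.]", ".")  # undo common "defanging"
    if ioc_type == "ip":
        return str(ipaddress.ip_address(value))  # raises ValueError if invalid
    if ioc_type == "domain":
        return value.lower().rstrip(".")
    if ioc_type == "sha256":
        return value.lower()
    raise ValueError(f"unsupported indicator type: {ioc_type}")


def build_index(feed):
    """Map (type, value) -> list of sources; malformed entries are skipped."""
    index = defaultdict(list)
    for entry in feed:
        try:
            key = (entry["type"], normalise(entry["type"], entry["value"]))
        except ValueError:
            continue
        index[key].append(entry["source"])
    return index


def match_event(event, index):
    """Return one hit per (field, source) where the event matches an IOC."""
    hits = []
    for field_name, ioc_type in FIELD_TYPES.items():
        raw = event.get(field_name)
        if not raw:
            continue
        try:
            key = (ioc_type, normalise(ioc_type, raw))
        except ValueError:
            continue
        for source in index.get(key, []):
            hits.append({"field": field_name, "value": key[1], "source": source})
    return hits
```

Normalising both the feed and the event with the same function is the design point: most missed matches in practice come from case, trailing dots, or defanged notation rather than from missing intelligence.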

Other tooling

UEBA — user/entity behavioural analytics; spots impossible travel, unusual data access
NDR — network detection and response; analyses traffic flow and metadata
Deception — honeypots, canary tokens that scream when touched
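
The "impossible travel" check mentioned under UEBA is simple enough to sketch: compute the great-circle distance between two logins and ask whether any plausible journey could cover it in the time between them. The login record shape (`time`, `lat`, `lon`), the 900 km/h speed cap, and the 50 km noise floor are assumptions chosen for this example.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def is_impossible_travel(login_a, login_b, max_kmh=900.0, min_km=50.0):
    """True if the implied speed between two logins exceeds max_kmh.

    Each login is a dict with hypothetical keys 'time' (datetime),
    'lat', and 'lon'. Distances under min_km are ignored because IP
    geolocation is too coarse to trust at that scale.
    """
    distance = haversine_km(login_a["lat"], login_a["lon"],
                            login_b["lat"], login_b["lon"])
    if distance < min_km:
        return False
    hours = abs((login_b["time"] - login_a["time"]).total_seconds()) / 3600
    if hours == 0:
        return True  # far apart at the same instant
    return distance / hours > max_kmh
```

In practice such a rule also needs allow-lists for VPN exit nodes and cloud proxies, which routinely make a user appear to jump between countries.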

Detection Engineering

Modern SOCs treat detections like code. Practices:

Detections live in a Git repo (Sigma rules, Splunk SPL, KQL queries) with code review
Each detection links to the MITRE ATT&CK technique it covers
Tests run against synthetic and historical data before deployment
False-positive rates are tracked; noisy detections are tuned or retired
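
A minimal sketch of "detections as code" follows: a detection carries its ATT&CK technique ID and its own test events, so a CI job can refuse to deploy a rule that misses its positives or fires on its negatives. The `Detection` class, the event fields, and the deliberately simplistic regex are assumptions for illustration; T1059.001 is the ATT&CK ID for PowerShell.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

# Simplistic on purpose: matches only the "-enc" and "-EncodedCommand"
# spellings, not every abbreviation PowerShell accepts.
_ENCODED_FLAG = re.compile(r"\s-(enc|encodedcommand)\s", re.IGNORECASE)


def encoded_powershell(event):
    """True when powershell.exe is launched with an encoded command."""
    image = event.get("image", "").lower()
    command_line = event.get("command_line", "")
    return image.endswith("powershell.exe") and bool(
        _ENCODED_FLAG.search(command_line)
    )


@dataclass
class Detection:
    name: str
    attack_technique: str            # MITRE ATT&CK technique ID
    logic: Callable[[dict], bool]
    should_fire: list = field(default_factory=list)
    should_not_fire: list = field(default_factory=list)

    def self_test(self):
        """Return a list of failures; an empty list means safe to deploy."""
        failures = []
        for event in self.should_fire:
            if not self.logic(event):
                failures.append(("missed", event))
        for event in self.should_not_fire:
            if self.logic(event):
                failures.append(("false positive", event))
        return failures


PS_PATH = r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe"

ENCODED_PS = Detection(
    name="PowerShell encoded command",
    attack_technique="T1059.001",
    logic=encoded_powershell,
    should_fire=[
        {"image": PS_PATH,
         "command_line": "powershell.exe -NoProfile -enc SQBFAFgA"},
    ],
    should_not_fire=[
        {"image": PS_PATH,
         "command_line": "powershell.exe -File backup.ps1 -Encoding utf8"},
        {"image": r"C:\Windows\System32\cmd.exe",
         "command_line": "cmd /c echo -enc test"},
    ],
)
```

Every false positive an analyst closes is a candidate for the `should_not_fire` list, which is how tuning stays visible in code review instead of living in someone's head.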

Threat Hunting

Don't wait for alerts. Threat hunting starts from a hypothesis:

"If an attacker compromised our CI/CD service account, they would clone unusual repositories and download large amounts of data outside business hours."

The hunter then queries telemetry to find evidence — or absence — of that activity. Findings either become new detections, or improve the team's mental model of what normal looks like.
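
The CI/CD hypothesis above translates fairly directly into a query over audit events. This sketch assumes a hypothetical event shape (`actor`, `action`, `repo`, `bytes`, `time`), a baseline set of repositories the account normally touches, and arbitrary thresholds; a real hunt would run the equivalent query in the SIEM.

```python
def hunt_ci_account(events, account, baseline_repos,
                    business_hours=(8, 18), large_bytes=1_000_000_000):
    """Return clone events by `account` that look unusual, with reasons.

    An event is reported if it touches a repo outside the baseline,
    happens outside business hours, or transfers more than large_bytes.
    All field names and thresholds are assumptions for this example.
    """
    start_hour, end_hour = business_hours
    findings = []
    for ev in events:
        if ev["actor"] != account or ev["action"] != "git.clone":
            continue
        reasons = []
        if ev["repo"] not in baseline_repos:
            reasons.append("repo outside baseline")
        if not (start_hour <= ev["time"].hour < end_hour):
            reasons.append("outside business hours")
        if ev.get("bytes", 0) > large_bytes:
            reasons.append("large transfer")
        if reasons:
            findings.append({**ev, "reasons": reasons})
    return findings
```

An empty result is still a result: it either supports the view that the account is behaving normally or shows that the telemetry needed to test the hypothesis is missing.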

Forensics and Evidence

If an incident may go to legal action, evidence handling matters from minute one:

Preserve before you investigate: capture memory first (it is lost on power-off), then image disks; don't reboot
Chain of custody: log who touched what artefact, when
Hash everything so you can prove evidence wasn't altered
Use write blockers for physical media

Tools: Volatility (memory), Autopsy (disk), Velociraptor (live response), KAPE (artefact collection).
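
Hashing and chain of custody can be sketched with the standard library alone. The example below hashes an artefact in chunks (so large disk images do not need to fit in memory) and appends a custody entry to a JSON Lines log. The log format and function names are assumptions for illustration, not a forensic standard.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_file(path, chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_custody(log_path, artefact_path, handler, action):
    """Append who did what to which artefact, when, and its current hash."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "artefact": str(artefact_path),
        "sha256": sha256_file(artefact_path),
        "handler": handler,
        "action": action,
    }
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


def verify(artefact_path, expected_sha256):
    """True if the artefact still matches the hash recorded at acquisition."""
    return sha256_file(artefact_path) == expected_sha256
```

The log itself is evidence too, so in practice it would live on write-once storage or be signed; an append-only text file only demonstrates the shape of the record.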

Key Metrics

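MTTD (Mean Time To Detect): average time from compromise to first alert
MTTR (Mean Time To Respond / Resolve): average time from detection to containment or full resolution; state which endpoint you measure, because the two differ widely
Dwell time: total time the attacker was inside (industry-reported medians exceeded 200 days in the mid-2010s; better detection, and attacks such as ransomware that reveal themselves quickly, have since pushed this much lower)
False positive rate per detection
Coverage: % of MITRE ATT&CK techniques you can detect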
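
These metrics are straightforward arithmetic once incidents carry consistent timestamps. The sketch below assumes hypothetical field names (`compromised_at`, `detected_at`, `resolved_at`, `attack_technique`) and measures MTTR from detection to resolution; a team that measures to containment instead would swap the end key.

```python
from datetime import timedelta
from statistics import mean


def mean_delta(incidents, start_key, end_key):
    """Average time between two timestamps, skipping incomplete records."""
    deltas = [
        (inc[end_key] - inc[start_key]).total_seconds()
        for inc in incidents
        if inc.get(start_key) and inc.get(end_key)
    ]
    return timedelta(seconds=mean(deltas)) if deltas else None


def mttd(incidents):
    return mean_delta(incidents, "compromised_at", "detected_at")


def mttr(incidents):
    return mean_delta(incidents, "detected_at", "resolved_at")


def attack_coverage(detections, techniques_in_scope):
    """Percentage of in-scope ATT&CK techniques with at least one detection."""
    scope = set(techniques_in_scope)
    if not scope:
        return 0.0
    covered = {d["attack_technique"] for d in detections} & scope
    return 100.0 * len(covered) / len(scope)
```

Coverage computed this way counts a technique as covered if any detection names it, which says nothing about how well it is detected; treat the percentage as a floor for discussion rather than a score.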

Tabletops and Exercises

Run regular exercises so muscle memory exists when a real incident hits:

Tabletop — discussion-only, walk through a scenario in a meeting room
Purple team — red attackers and blue defenders work together to test detections
Full-scale simulation — replay a realistic attack against a non-prod environment with real tooling

The lessons from these exercises feed back into detections, runbooks, and architecture — closing the loop with the rest of the IR lifecycle.