Structured Logging

Why JSON logs beat text every time. Field conventions, correlation IDs, secrets hygiene, and how to write logs that future you will thank you for.

The single highest-leverage change you can make to your logs is structuring them. Text logs have to be parsed with regular expressions that break whenever the message format changes. JSON logs are queryable on day one.

Bad vs Good

Bad:

2024-04-12 14:32:01 INFO User 42 logged in from 10.1.1.4 in 230ms

Good:

{
  "ts": "2024-04-12T14:32:01.123Z",
  "level": "info",
  "service": "auth-api",
  "env": "prod",
  "msg": "user_login",
  "user_id": 42,
  "ip": "10.1.1.4",
  "duration_ms": 230,
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "request_id": "req_abc123"
}

Now you can answer "all logins for user 42 in the last hour" with a single filter — no regex.
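As a rough illustration of that single filter, here is a minimal Python sketch that runs the query over newline-delimited JSON. The sample data, `parse_ts`, and `logins_for_user` are hypothetical names made up for this example; a real log tool would express the same filter in its own query language.

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical sample: one JSON object per line, as a log shipper might store them.
RAW_LOGS = """\
{"ts": "2024-04-12T14:32:01.123Z", "msg": "user_login", "user_id": 42, "duration_ms": 230}
{"ts": "2024-04-12T14:40:10.000Z", "msg": "user_login", "user_id": 7, "duration_ms": 120}
{"ts": "2024-04-12T12:05:00.000Z", "msg": "user_login", "user_id": 42, "duration_ms": 310}
"""


def parse_ts(value: str) -> datetime:
    # Swap the trailing "Z" for an explicit offset so older Pythons can parse it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def logins_for_user(raw: str, user_id: int, now: datetime, window: timedelta) -> list:
    """Return user_login events for one user inside the time window."""
    events = (json.loads(line) for line in raw.splitlines() if line.strip())
    return [
        e for e in events
        if e["msg"] == "user_login"
        and e["user_id"] == user_id
        and now - window <= parse_ts(e["ts"]) <= now
    ]


# Fixed "now" so the example is reproducible.
now = datetime(2024, 4, 12, 15, 0, tzinfo=timezone.utc)
recent = logins_for_user(RAW_LOGS, 42, now, timedelta(hours=1))
print(recent)
```

Every condition is a comparison on a named field, so nothing depends on the wording or word order of the message.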

The Canonical Field Set

Adopt a small core set across every service. Suggested:

Field                  Purpose
ts                     RFC 3339 / ISO 8601 timestamp with timezone
level                  debug, info, warn, error, fatal
service                Service name (auth-api, payments)
env                    prod / staging / dev
version                Build SHA or semver
msg                    Short, low-cardinality event name
trace_id / span_id     OpenTelemetry context for pivoting to traces
request_id             Correlation across services for one user request
user_id / tenant_id    Business context

Add domain-specific fields per log line, but the core fields above should always be present.
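One way to guarantee the core fields is to attach them in the formatter rather than at each call site. The sketch below uses only Python's standard `logging` module. `BASE_FIELDS`, `LEVEL_NAMES`, `JsonFormatter`, and the `extra={"fields": ...}` convention are assumptions of this example, not an established API.

```python
import json
import logging
from datetime import datetime, timezone

# Hypothetical static context; a real service would read these from config or env vars.
BASE_FIELDS = {"service": "auth-api", "env": "dev", "version": "0.0.0-example"}

# Map stdlib level names onto the short names used in this lesson.
LEVEL_NAMES = {"WARNING": "warn", "CRITICAL": "fatal"}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object that always carries the core fields."""

    def format(self, record: logging.LogRecord) -> str:
        event = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": LEVEL_NAMES.get(record.levelname, record.levelname.lower()),
            **BASE_FIELDS,
            "msg": record.getMessage(),
        }
        # Per-event fields arrive via extra={"fields": {...}} (a convention of this sketch).
        event.update(getattr(record, "fields", {}))
        return json.dumps(event)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("core_fields_example")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False  # avoid duplicate output through the root logger

log.info("user_login", extra={"fields": {"user_id": 42, "request_id": "req_abc123"}})
```

Because the formatter owns `ts`, `level`, `service`, `env`, and `version`, a call site cannot forget them.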

One Log Per Request: The Canonical Log Line

Stripe popularised this pattern: emit one richly structured log line per request that summarises everything. Instead of 30 small logs, you have one wide event.

{
  "ts": "2024-04-12T14:32:01Z",
  "level": "info",
  "service": "checkout",
  "msg": "request_complete",
  "method": "POST",
  "path": "/checkout",
  "status": 200,
  "duration_ms": 412,
  "db_calls": 3,
  "db_total_ms": 78,
  "downstream_ms": { "auth": 60, "payment": 145, "inventory": 30 },
  "user_id": 4242,
  "tenant_id": "acme",
  "trace_id": "4bf92...",
  "request_id": "req_abc123",
  "feature_flags": ["new_checkout=true", "mfa=on"],
  "build": "9d4f1ab"
}

Search "all checkout requests with status=200 and duration_ms>1000 last hour" with one filter. Faster, cheaper, more useful than 30 scattered DEBUG lines.
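A minimal sketch of how such a line might be assembled: collect fields on one object while the request runs, then emit once at the end. `CanonicalLogLine` and `handle_checkout` are hypothetical names for illustration, and the loop stands in for real database calls.

```python
import json
import time
from datetime import datetime, timezone


class CanonicalLogLine:
    """Collect fields during one request and emit them as a single JSON line."""

    def __init__(self, **fields):
        self.fields = dict(fields)
        self._start = time.monotonic()

    def add(self, **fields):
        self.fields.update(fields)

    def incr(self, key, amount=1):
        self.fields[key] = self.fields.get(key, 0) + amount

    def emit(self, status):
        # Status and total duration are only known when the request finishes.
        self.fields["status"] = status
        self.fields["duration_ms"] = round((time.monotonic() - self._start) * 1000)
        line = json.dumps({
            "ts": datetime.now(timezone.utc).isoformat(),
            "level": "info",
            "msg": "request_complete",
            **self.fields,
        })
        print(line)
        return line


def handle_checkout(request_id, user_id):
    """Hypothetical handler that emits exactly one log line per request."""
    event = CanonicalLogLine(
        service="checkout", method="POST", path="/checkout", request_id=request_id
    )
    event.add(user_id=user_id, tenant_id="acme")
    for _ in range(3):  # stand-in for three database calls
        event.incr("db_calls")
    return event.emit(status=200)


line = handle_checkout("req_abc123", 4242)
```

In a real service the `emit` call would usually live in middleware or a `finally` block so the line is written even when the handler raises.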

Correlation IDs

A correlation ID is a request ID generated at the edge (load balancer, API gateway) and propagated via headers (X-Request-Id, traceparent) so that every downstream log line carries it. With OpenTelemetry, the trace_id serves the same purpose for free.

The pivot from "user reported error at 14:32" to "every log line for that request across 8 services" is what turns logs from forensic guesswork into a powerful tool.
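The propagation step can be sketched with Python's `contextvars`, which keeps one value per thread or asyncio task. The helper names (`start_request`, `outbound_headers`, `log_event`) are hypothetical, and plain dicts stand in for a framework's header objects, which are normally case-insensitive.

```python
import contextvars
import json
import uuid

# Holds the current request's ID; each thread or asyncio task sees its own value.
request_id_var = contextvars.ContextVar("request_id", default=None)


def start_request(headers: dict) -> str:
    """Reuse the inbound X-Request-Id, or mint one if this service is the edge."""
    request_id = headers.get("X-Request-Id") or f"req_{uuid.uuid4().hex}"
    request_id_var.set(request_id)
    return request_id


def outbound_headers() -> dict:
    """Headers to attach to every downstream call so the ID keeps propagating."""
    return {"X-Request-Id": request_id_var.get()}


def log_event(msg: str, **fields) -> str:
    """Emit one JSON line that always carries the current request ID."""
    line = json.dumps({"msg": msg, "request_id": request_id_var.get(), **fields})
    print(line)
    return line


start_request({"X-Request-Id": "req_abc123"})
first = log_event("payment_authorized", amount_cents=1999)
downstream = outbound_headers()
```

Call sites never pass the ID explicitly, which is what makes it hard to forget.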

Log Levels — The Four You Need

DEBUG    Verbose for local development. Off in prod by default.
INFO     Significant events you would want during an incident.
WARN     Something unexpected, recoverable, worth investigating later.
ERROR    Something failed, alerting may be appropriate.

Log levels are a contract with future you. WARN means "something to look at"; if you cry wolf at WARN, no one will look.
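To make "DEBUG off in prod by default" concrete, here is a small sketch that picks the threshold from the environment. The variable names `APP_ENV` and `LOG_LEVEL` and the `resolve_level` helper are assumptions of this example, not a standard.

```python
import logging
import os

# The four levels from the list above, mapped to stdlib constants.
LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warn": logging.WARNING,
    "error": logging.ERROR,
}


def resolve_level(env, override=None):
    """INFO in prod, DEBUG elsewhere, unless an explicit override is given."""
    if override:
        # Unknown override values fall back to INFO rather than failing at startup.
        return LEVELS.get(override.lower(), logging.INFO)
    return logging.INFO if env == "prod" else logging.DEBUG


# APP_ENV and LOG_LEVEL are hypothetical environment variable names.
logging.basicConfig(
    level=resolve_level(os.environ.get("APP_ENV", "dev"), os.environ.get("LOG_LEVEL"))
)
```

The override lets you turn DEBUG on temporarily during an incident without a code change.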

Secrets and PII

Logs are the easiest place to leak. Common offenders:

  • Authorization headers, API keys, JWTs
  • Full credit card numbers, CVVs (PCI violation)
  • Email addresses, phone numbers, government IDs (PII / GDPR / HIPAA)
  • Stack traces with environment variables
  • SQL queries containing parameters

Three defences:

  1. Field allow-lists in your logger — only known fields are emitted.
  2. Redaction at the shipper (Fluent Bit/Vector regex masks).
  3. Pre-prod tests that scan logs for credit-card and JWT patterns.
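The first and third defences can be sketched in a few lines of Python. `ALLOWED_FIELDS`, `allow_listed`, and `find_leaks` are hypothetical, and the two regexes are deliberately rough: the card pattern, for instance, would also flag any other 13 to 19 digit number, so treat this as a starting point rather than a complete detector.

```python
import json
import re

# Hypothetical allow-list: only these keys may ever be emitted.
ALLOWED_FIELDS = {
    "ts", "level", "service", "env", "msg", "user_id", "request_id", "duration_ms",
}

# Rough illustrative patterns, not exhaustive detectors.
JWT_PATTERN = re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def allow_listed(event: dict) -> dict:
    """Defence 1: drop every field that is not explicitly allowed."""
    return {k: v for k, v in event.items() if k in ALLOWED_FIELDS}


def find_leaks(log_text: str) -> list:
    """Defence 3: scan captured log output for secret-shaped strings."""
    findings = []
    if JWT_PATTERN.search(log_text):
        findings.append("jwt")
    if CARD_PATTERN.search(log_text):
        findings.append("card_number")
    return findings


raw_event = {
    "msg": "user_login",
    "user_id": 42,
    # Fake token for illustration only.
    "authorization": "Bearer eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiI0MiJ9.c2lnbmF0dXJl",
}
unsafe_line = json.dumps(raw_event)
safe_line = json.dumps(allow_listed(raw_event))
print(find_leaks(unsafe_line), find_leaks(safe_line))
```

A pre-prod test could run the service against sample traffic, capture its log output, and fail the build if `find_leaks` returns anything.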

Logging in Code: Examples

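Node (pino):

import pino from 'pino';

// Fields in `base` are attached to every line; paths in `redact` are masked before output.
const log = pino({
  base: { service: 'checkout', env: process.env.NODE_ENV, version: process.env.GIT_SHA },
  redact: ['req.headers.authorization', 'password', '*.creditCard'],
});

// `req` is the incoming request object provided by your HTTP framework.
log.info({ user_id: 42, duration_ms: 230, request_id: req.id }, 'user_login');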

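Python (structlog):

import structlog

# structlog's default renderer prints console text; switch to JSON explicitly.
structlog.configure(processors=[
    structlog.processors.add_log_level,
    structlog.processors.TimeStamper(fmt="iso"),
    structlog.processors.JSONRenderer(),
])
log = structlog.get_logger()

# `req_id` comes from the current request context.
log.info("user_login", user_id=42, duration_ms=230, request_id=req_id)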

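Go (slog, in the standard library since Go 1.21; zap is a common alternative):

// The default slog handler writes text; install the JSON handler once at startup.
// Requires the "log/slog" and "os" imports.
slog.SetDefault(slog.New(slog.NewJSONHandler(os.Stdout, nil)))

slog.Info("user_login",
    "user_id", 42,
    "duration_ms", 230,
    "request_id", reqID,
)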

Every modern language has a structured logger. Use it.

Anti-Patterns

  • Concatenating into msg: "User " + id + " logged in" — defeats the whole point.
  • Putting JSON inside JSON — flatten before logging.
  • Stack traces split across many lines — emit as a single field.
  • Logging in tight loops — sample or aggregate.
  • Logging at ERROR for non-errors (404s, validation failures) — wakes people for nothing.
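For the stack-trace item, a minimal Python sketch of emitting the whole traceback as one field. `error_event` is a hypothetical helper. JSON encoding escapes the newlines, so the event stays on a single line.

```python
import json
import traceback


def error_event(msg: str, exc: BaseException, **fields) -> str:
    """Serialize an exception into one JSON line with the traceback in one field."""
    event = {
        "level": "error",
        "msg": msg,
        "error_type": type(exc).__name__,
        "error_message": str(exc),
        # format_exception returns a list of strings; join them into one value.
        "stack": "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        ),
        **fields,
    }
    return json.dumps(event)


try:
    {}["missing"]
except KeyError as exc:
    line = error_event("lookup_failed", exc, request_id="req_abc123")
    print(line)
```

The traceback is still subject to the secrets rules above, so it should pass through the same redaction as any other field.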

The Practical Bar

If you can answer these in your log tool today, you are doing well:

  • "Show me everything that happened for request_id X across all services."
  • "Show me all errors for tenant Y in the last hour."
  • "Show me requests slower than 1 second by route."
  • "Show me everything that happened during the deploy at 14:30 ± 5 minutes."

If any of those is hard, fix the structure first.
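As a quick self-check, the third question reduces to a filter plus a group-by once the fields exist. The sketch below runs it in plain Python over hypothetical sample lines; `slow_requests_by_route` is an illustrative helper, not part of any log tool.

```python
import json
from collections import Counter

# Hypothetical canonical log lines, one JSON object per line.
SAMPLE = """\
{"msg": "request_complete", "path": "/checkout", "status": 200, "duration_ms": 1412}
{"msg": "request_complete", "path": "/checkout", "status": 200, "duration_ms": 412}
{"msg": "request_complete", "path": "/search", "status": 200, "duration_ms": 2300}
{"msg": "request_complete", "path": "/checkout", "status": 500, "duration_ms": 1890}
"""


def slow_requests_by_route(raw: str, threshold_ms: int = 1000) -> Counter:
    """Count completed requests slower than the threshold, grouped by path."""
    counts = Counter()
    for line in raw.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        is_request = event.get("msg") == "request_complete"
        if is_request and event.get("duration_ms", 0) > threshold_ms:
            counts[event["path"]] += 1
    return counts


slow = slow_requests_by_route(SAMPLE)
print(slow.most_common())
```

If the equivalent query in your log tool needs a regex to extract the route or the duration, that is the structural gap to close first.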

Key Takeaways

  • Structured (JSON) logs are queryable; text logs are forensic-only.
  • Always include level, timestamp, service, env, message, and a correlation ID.
  • Use canonical fields (trace_id, span_id, user_id, request_id) consistently across services.
  • Never log secrets, tokens, full payment details, or unredacted PII.
  • Log at INFO for things you would want to see during an incident, not for every variable.
