Synchronous request/response is simple — until one downstream call slows everything down, fails, or has different scaling needs. Queues and event streams are the glue that lets services move at different speeds and survive each other's failures.
Why Queues
- Decoupling — producers don't know consumers; consumers can change without producer changes.
- Buffering — absorb spikes; consumers process at their own rate.
- Async processing — return 202 fast; do the slow work later.
- Retries — failed work goes back on the queue or to a dead-letter queue.
- Fan-out — one event, many consumers.
Two Models
Work queue (point-to-point)
Each message is processed by exactly one consumer in a group. Used for tasks: send email, resize image, charge card.
[ Producer ] → [ Queue ] → [ Worker ]
→ [ Worker ]
→ [ Worker ]
each message goes to exactly one worker
Tools: SQS, RabbitMQ, BullMQ (Redis), Celery, Sidekiq.
Pub/sub (topic / event stream)
Each message is broadcast to all subscribers. Used for events: OrderPlaced, UserSignedUp, downstream services react.
[ Producer ] → [ Topic ] → [ Subscriber A ] (e.g. email service)
→ [ Subscriber B ] (e.g. analytics)
→ [ Subscriber C ] (e.g. recommendations)
Tools: Kafka, Pub/Sub, SNS+SQS, EventBridge, NATS, RabbitMQ topic exchanges.
Delivery Guarantees
| Guarantee | Behaviour | When |
|---|---|---|
| At-most-once | May lose messages | Telemetry where loss is acceptable |
| At-least-once | May deliver duplicates | The practical default for most queues |
| Exactly-once | Each message delivered once | Requires transactional broker + idempotent consumer; expensive |
The pragmatic posture: assume at least once, make consumers idempotent. A duplicate charge_card is a refund waiting to happen — guard it with a deduplication key.
Idempotent Consumers
def handle(message):
msg_id = message["id"]
if already_processed(msg_id):
return ack()
do_work(message)
mark_processed(msg_id)
ack()
Track processed message IDs in a database, Redis, or a "processed" table with a TTL. Or design operations so re-applying is naturally safe (UPSERT, MERGE, conditional update).
Ordering
- Most queues guarantee ordering only within a partition / shard.
- Order events for the same user/order on the same partition (key by user_id).
- Global ordering across all events is rarely possible at scale, and usually unnecessary.
Dead-Letter Queues (DLQs)
After N retries, send the failing message to a dedicated DLQ instead of looping forever. Alert on DLQ depth; investigate, fix, replay.
Without a DLQ, one poison message can clog the queue and block all other work behind it.
Backpressure
If consumers cannot keep up, queues grow. Eventually disk fills, or memory runs out, or messages exceed retention. Mitigations:
- Auto-scale consumers based on queue depth.
- Reject new producer requests when the queue exceeds a threshold (load shedding).
- Prioritise critical messages (separate queue or priority queue).
- Monitor lag (Kafka consumer lag, SQS visibility timeout violations) as a first-class SLO.
Event-Driven Architecture
Beyond simple queues, event-driven design treats events as the primary integration medium. Services emit events about facts in their domain; other services subscribe to react.
order-service ──emits──→ OrderPlaced ──→ payment-service
──→ inventory-service
──→ email-service
──→ analytics
Benefits: services don't know each other; new consumers added without changing producers; replay history to bootstrap a new service.
Costs: harder to reason about end-to-end flow; eventual consistency by default; observability requires correlation IDs across many hops.
Event Sourcing
Store the sequence of events as the system of record; current state is derived by replaying events.
- + Full audit trail; can rebuild any view; time travel.
- + Natural fit with CQRS (separate write model and read models).
- − Schema evolution and event versioning are hard.
- − Steeper learning curve; debugging takes practice.
Don't event-source everything. Use it where audit / regulatory / multi-view requirements justify the complexity.
Sagas: Long-Lived Distributed Transactions
You can't use a single ACID transaction across services. Sagas decompose a business transaction into a sequence of local transactions, each with a compensating action if a later step fails.
Place order:
1. ReserveInventory ← compensate: ReleaseInventory
2. ChargePayment ← compensate: RefundPayment
3. CreateShipment ← compensate: CancelShipment
If step 3 fails: run compensations in reverse.
Orchestration vs choreography
- Orchestration — a central saga coordinator drives the flow, calling each service. Easier to see; the coordinator becomes a hot spot.
- Choreography — each service reacts to events and emits the next event; no coordinator. More autonomous; harder to follow at a glance.
Most teams start with orchestration (clearer); some scale to choreography for autonomy.
Choosing a Broker
| Need | Pick |
|---|---|
| Simple work queue, AWS | SQS |
| Pub/sub fan-out, AWS | SNS → SQS / EventBridge |
| High-throughput log, replay, streaming | Kafka / MSK / Confluent / Redpanda |
| Managed pub/sub, GCP | Pub/Sub |
| Managed pub/sub, Azure | Service Bus / Event Hubs / Event Grid |
| Lightweight, self-host | RabbitMQ, NATS, Redis Streams |
Cert Mapping
| Cert | Messaging scope |
|---|---|
| AWS SAA / DEA | SQS, SNS, EventBridge, MSK, Kinesis |
| Azure AZ-204 / AZ-305 | Service Bus, Event Grid, Event Hubs |
| GCP PCA / PDE | Pub/Sub, Eventarc, Workflows |
Heuristic
- Start synchronous. Add async only when latency, throughput, or coupling demands it.
- Default to at-least-once + idempotent consumers.
- Always have a DLQ and an alert on its depth.
- Track end-to-end correlation IDs from the first request through every event.
- Keep events small and immutable; carry IDs, not entire objects, when possible.
The next lesson covers the resilience layer that protects all of this: CDNs, rate limiting, and circuit breakers.