Logs flow from many places (containers, hosts, lambdas, browsers) to one searchable home. The pipeline is conceptually the same regardless of vendor.
The Four Stages of a Log Pipeline
[ Producer ] → [ Shipper / Agent ] → [ Storage / Index ] → [ Query / UI ]
| Stage | Examples |
|---|---|
| Producer | app, syslog, journald, k8s |
| Shipper / Agent | Fluent Bit, Vector, Filebeat, Fluentd, Logstash, Promtail |
| Storage / Index | Elasticsearch, Loki, OpenSearch, S3, GCS |
| Query / UI | Kibana, Grafana, CloudWatch console |
Each stage has many implementations; mix and match.
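To make the four stages concrete, here is a minimal in-memory sketch in Python. It is not any vendor's API: the `producer`, `shipper`, and `Store` names, the sample records, and the `host` field are all hypothetical, chosen only to show where each stage's responsibility starts and ends.

```python
import json
from typing import Iterable, Iterator


def producer() -> Iterator[str]:
    """Stage 1: an application emitting raw log lines (hypothetical sample data)."""
    yield '{"service": "api", "level": "info", "msg": "request ok"}'
    yield '{"service": "api", "level": "error", "msg": "upstream timeout"}'
    yield '{"service": "worker", "level": "info", "msg": "job done"}'


def shipper(lines: Iterable[str], host: str) -> Iterator[dict]:
    """Stage 2: parse each line and enrich it with metadata before forwarding."""
    for line in lines:
        record = json.loads(line)
        record["host"] = host
        yield record


class Store:
    """Stage 3: storage; a plain list stands in for a real index."""

    def __init__(self) -> None:
        self.records: list[dict] = []

    def ingest(self, records: Iterable[dict]) -> None:
        self.records.extend(records)

    def query(self, **filters: str) -> list[dict]:
        """Stage 4: the query layer; exact-match filtering on fields."""
        return [
            r for r in self.records
            if all(r.get(k) == v for k, v in filters.items())
        ]


store = Store()
store.ingest(shipper(producer(), host="node-1"))
errors = store.query(level="error")
print(errors)
```

Because each stage only depends on the shape of the data passed to it, any one of them can be swapped without touching the others, which is the property that makes mixing implementations possible.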
ELK / Elastic Stack
The classic open-source log stack:
- Elasticsearch — full-text search index over logs
- Logstash / Beats — collection and transformation
- Kibana — UI for search, dashboards, and analytics
OpenSearch is the open-source fork of Elasticsearch and Kibana that AWS started in 2021 after Elastic's licence change. Functionally similar.
Strengths: extremely powerful queries, full-text search, mature dashboards, many integrations.
Trade-offs: indexing every field is expensive — Elasticsearch is RAM- and disk-hungry. Operating a cluster at scale (sharding, hot/warm/cold tiers, snapshots) is real work. Most teams now use a managed offering (Elastic Cloud, AWS OpenSearch Service).
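The cost of indexing comes from the inverted index that full-text search relies on (Elasticsearch is built on Lucene, which uses this structure). The Python below is a toy model only, assuming a naive lowercase tokenizer and an in-memory dictionary; real engines add analyzers, compression, and on-disk segments. It shows why search is fast and why the index grows with every distinct token.

```python
import re
from collections import defaultdict


def build_inverted_index(lines):
    """Map every token to the set of line ids that contain it."""
    index = defaultdict(set)
    for line_id, line in enumerate(lines):
        for token in re.findall(r"[a-z0-9]+", line.lower()):
            index[token].add(line_id)
    return index


def search(index, *terms):
    """Return ids of lines containing all terms, without scanning any line."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*sets)) if sets else []


logs = [
    "GET /orders 200 12ms",
    "GET /orders 504 upstream timeout",
    "POST /login 200 48ms",
]
index = build_inverted_index(logs)
hits = search(index, "orders", "timeout")
print(hits, "distinct tokens indexed:", len(index))
```

Lookups are set intersections rather than scans, but every new token adds an entry, so high-cardinality fields inflate memory and disk use.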
Grafana Loki
Loki was designed as "Prometheus, but for logs." It indexes only a small set of labels (service, env, level) — the log content itself is stored as compressed chunks in object storage (S3, GCS, Azure Blob).
{service="api", env="prod"} |= "timeout" | json | duration_ms > 1000
The query language (LogQL) mirrors PromQL: the same curly-brace label selectors, the same label matchers (=, !=, =~, !~), and the same metric-style aggregations over time ranges.
Strengths: very cheap to run, minimal indexing, integrates beautifully with Grafana and Prometheus, scales horizontally.
Trade-offs: because log content is not indexed, queries that must scan large volumes of chunks are slower than Elasticsearch full-text search. Best when you can narrow by labels first, then grep.
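The "narrow by labels, then grep" model can be sketched in a few lines of Python. This is an illustration, not Loki's implementation: the `streams` dictionary, the sample lines, and the function names are hypothetical, and real chunks are compressed objects in object storage rather than in-memory lists. The steps mirror the stages of the LogQL query shown earlier.

```python
import json

# Hypothetical stand-in: each stream is keyed by its label set, and only
# those labels are "indexed". Line content is opaque until it is scanned.
streams = {
    (("env", "prod"), ("service", "api")): [
        '{"msg": "upstream timeout", "duration_ms": 1500}',
        '{"msg": "ok", "duration_ms": 20}',
        '{"msg": "cache timeout", "duration_ms": 300}',
    ],
    (("env", "prod"), ("service", "worker")): [
        '{"msg": "job timeout", "duration_ms": 9000}',
    ],
}


def select_streams(streams, **matchers):
    """Step 1: cheap label lookup; no log content is read here."""
    for labels, lines in streams.items():
        label_map = dict(labels)
        if all(label_map.get(k) == v for k, v in matchers.items()):
            yield from lines


def query(streams, contains, min_duration_ms, **matchers):
    """Steps 2-4: substring filter, JSON parse, then field comparison."""
    results = []
    for line in select_streams(streams, **matchers):
        if contains not in line:            # like |= "timeout"
            continue
        record = json.loads(line)           # like | json
        if record.get("duration_ms", 0) > min_duration_ms:  # like | duration_ms > 1000
            results.append(record)
    return results


slow = query(streams, "timeout", 1000, service="api", env="prod")
print(slow)
```

The label selector decides how many lines get scanned at all, which is why broad selectors over long time ranges are the slow case.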
Cloud-Native Log Services
| Service | Notes |
|---|---|
| AWS CloudWatch Logs | Default for Lambda, ECS, EKS. CloudWatch Logs Insights query language. |
| Azure Monitor Logs (Log Analytics) | KQL (Kusto Query Language). Powerful, well-integrated. |
| Google Cloud Logging | Auto-collects from GKE, GCE, Cloud Run. Strong filter UI. |
Pros: zero ops, integrated IAM, instant ingestion of platform-emitted logs, and natural links to billing and audit data.
Cons: vendor lock-in, per-GB pricing can hurt at high volume, query languages are platform-specific.
Tip: ship a copy to S3/GCS using built-in export for long-term cheap retention; keep the hot index small.
Commercial APMs
Datadog, Splunk, Sumo Logic, New Relic Logs all offer hosted log management bundled with metrics and traces. Strong UX, expensive at scale, but you pay one bill instead of integrating four tools.
Common Shippers
| Shipper | Strength |
|---|---|
| Fluent Bit | Tiny, fast, written in C; a common default choice in Kubernetes |
| Vector | Rust, modern, supports complex transforms |
| Filebeat | Elastic's official shipper for files / journald |
| Promtail | Loki's shipper, label-aware |
| Fluentd | Older but still common, large plugin ecosystem |
| Logstash | Heavy but powerful transformation language |
In Kubernetes, Fluent Bit DaemonSets are the most common pattern: one pod per node tails container logs from /var/log/containers/*.log and forwards them.
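As a rough illustration of what a node-level shipper does with those files, the Python sketch below extracts Kubernetes metadata from a file name and splits one log line. It assumes the usual symlink naming `<pod>_<namespace>_<container>-<container-id>.log` and the CRI line format `<timestamp> <stream> <P|F> <message>`; the sample path and line are made up, and real shippers also handle rotation, partial-line reassembly, and the older Docker JSON format.

```python
import re
from pathlib import PurePosixPath

# Assumed naming: <pod>_<namespace>_<container>-<64-hex container id>.log
FILENAME_RE = re.compile(
    r"^(?P<pod>[^_]+)_(?P<namespace>[^_]+)_"
    r"(?P<container>.+)-(?P<container_id>[0-9a-f]{64})\.log$"
)


def parse_path(path):
    """Extract pod, namespace, and container names from a log file path."""
    match = FILENAME_RE.match(PurePosixPath(path).name)
    return match.groupdict() if match else None


def parse_cri_line(line):
    """Split a CRI-format line: <timestamp> <stream> <P|F> <message>."""
    parts = line.rstrip("\n").split(" ", 3)
    if len(parts) < 3:
        raise ValueError(f"not a CRI log line: {line!r}")
    timestamp, stream, tag = parts[:3]
    message = parts[3] if len(parts) == 4 else ""
    return {
        "time": timestamp,
        "stream": stream,
        "partial": tag == "P",  # P = partial line, F = full line
        "message": message,
    }


# Hypothetical sample inputs.
sample_path = "/var/log/containers/api-7d9f_default_app-" + "a" * 64 + ".log"
sample_line = "2024-05-01T12:00:00.123456789Z stdout F GET /orders 200\n"

meta = parse_path(sample_path)
record = {**meta, **parse_cri_line(sample_line)}
print(record)
```

Attaching the pod, namespace, and container names at this stage is what later allows queries to filter by workload rather than by file.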
Volume Control
Logs are often the biggest line item on observability bills. Three controls:
- Log levels — INFO in prod, DEBUG only when needed.
- Sampling — keep all errors; sample 10% of successful request logs.
- Filters at the shipper — drop noisy paths (health checks) before ingest.
# fluent-bit example: drop health checks and rate-limit successful requests
# (throttle caps throughput; it does not sample a fixed percentage)
[FILTER]
    Name     grep
    Match    app.*
    Exclude  $path /healthz

[FILTER]
    Name      throttle
    Match     app.success.*
    Rate      100
    Window    1
    Interval  1s
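Percentage sampling, as opposed to a throughput cap, can be sketched in application or custom shipper code. The Python below is a minimal sketch of the "keep all errors, sample 10% of successes, drop health checks" rule. The record fields (`path`, `level`, `status`, `request_id`) are hypothetical, and hashing the request id is one assumed way to make the decision deterministic so that all lines for one request are kept or dropped together.

```python
import hashlib

SAMPLE_PERCENT = 10          # keep roughly 10% of successful requests
DROP_PATHS = {"/healthz"}    # noisy paths dropped outright


def should_keep(record):
    """Decide whether a log record is forwarded to the log store."""
    if record.get("path") in DROP_PATHS:
        return False
    # Errors are always kept.
    if record.get("level") == "error" or record.get("status", 200) >= 500:
        return True
    # Deterministic sampling: hash the request id into 100 buckets.
    digest = hashlib.sha256(record["request_id"].encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < SAMPLE_PERCENT


records = [
    {"path": "/orders", "status": 200, "level": "info", "request_id": f"req-{i}"}
    for i in range(10_000)
]
kept = sum(should_keep(r) for r in records)
print(f"kept {kept} of {len(records)} successful requests")
```

When sampling like this, it helps to record the sample rate alongside the kept lines so that counts derived from logs can be scaled back up.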
Retention Strategy
| Tier | Storage | Retention |
|---|---|---|
| Hot | Indexed (Elastic, Loki, CloudWatch) | 7–30 days |
| Warm | Indexed but cheaper hardware | 30–90 days |
| Cold | Object storage (S3, GCS), unindexed | 1–7 years for compliance |
Most queries hit the last 24 hours. Sizing the hot tier for that, with cheap cold storage behind it, dramatically reduces cost.
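The tiering table can be read as a simple age-based routing rule. The sketch below assumes the upper end of each range in the table as the boundary (30 days, 90 days, 7 years); these cut-offs and the `tier_for` name are illustrative, and real systems usually express the same idea as lifecycle policies rather than application code.

```python
from datetime import datetime, timedelta, timezone

# Assumed boundaries: the upper end of each retention range in the table.
TIERS = [
    ("hot", timedelta(days=30)),
    ("warm", timedelta(days=90)),
    ("cold", timedelta(days=7 * 365)),
]


def tier_for(log_time, now):
    """Return the storage tier for a log entry based on its age."""
    age = now - log_time
    for name, max_age in TIERS:
        if age <= max_age:
            return name
    return "expired"  # past all retention windows; eligible for deletion


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
for days in (1, 45, 400, 8 * 365):
    print(days, "days old ->", tier_for(now - timedelta(days=days), now))
```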
The Pragmatic Choice
- Already on Grafana for metrics? Add Loki — zero new UI, cheap, simple.
- Need full-text search and complex analytics? Use Elasticsearch / OpenSearch.
- Single-cloud and want zero ops? Use the native service.
- Already paying for Datadog? Just use Datadog Logs.
The next lesson covers structured logging, the application-side practice that makes any of these tools far more useful.