Prometheus is the de facto open-source metrics system. It is simple, reliable, and the foundation of the Grafana stack and of most Kubernetes monitoring.
Architecture
[ targets: app, node_exporter, kube-state-metrics ]
        │  exposes /metrics
        ▼
[ Prometheus server ]  ← scrapes every 15–60s
        │
        ├──→ [ TSDB on disk (or remote_write) ]
        │
        ├──→ [ Grafana ]       ← dashboards
        │
        └──→ [ Alertmanager ]  ← routes alerts to Slack/PagerDuty/email
Key idea: Prometheus pulls. Each target exposes an HTTP endpoint at /metrics. Prometheus discovers targets (via static config, Kubernetes API, Consul, EC2, etc.) and scrapes them on a fixed interval.
Pull model benefits:
- You always know whether a target is up — scrape failure = down.
- No need for a fragile push gateway in the data path.
- Easy to test: curl http://app:8080/metrics
For short-lived jobs (cron, batch), use the Pushgateway — but only as the exception.
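To make the pull model concrete, here is a minimal sketch of what a scraper does with the body returned by /metrics: split it into lines and turn each sample line into a name, a label set, and a value. The function names (parseValue, parseSampleLine, parseMetrics) and the sample body are illustrative assumptions, and the parser is deliberately simplified: it ignores timestamps, escaped quotes in label values, and HELP/TYPE metadata.

```javascript
// Convert the value token of a sample line to a number.
// The text format spells infinities as +Inf / -Inf.
function parseValue(raw) {
  if (raw === '+Inf') return Infinity;
  if (raw === '-Inf') return -Infinity;
  return Number(raw); // also handles "NaN"
}

// Parse one sample line into { name, labels, value }.
// Returns null for blank lines, comments, and lines that do not match.
function parseSampleLine(line) {
  const trimmed = line.trim();
  if (trimmed === '' || trimmed.startsWith('#')) return null;

  const match = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)/.exec(trimmed);
  if (!match) return null;

  const [, name, rawLabels, rawValue] = match;
  const labels = {};
  if (rawLabels) {
    // Matches key="value" pairs; assumes values contain no escaped quotes.
    for (const [, key, value] of rawLabels.matchAll(/([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"/g)) {
      labels[key] = value;
    }
  }
  return { name, labels, value: parseValue(rawValue) };
}

// Parse a whole /metrics response body into a list of samples.
function parseMetrics(text) {
  return text.split('\n').map(parseSampleLine).filter((s) => s !== null);
}

// Hypothetical response body, as curl would print it.
const body = [
  '# HELP http_requests_total Total HTTP requests',
  '# TYPE http_requests_total counter',
  'http_requests_total{method="GET",status="200"} 12345',
  'queue_depth 42',
].join('\n');

const samples = parseMetrics(body);
console.log(samples.length); // 2
```

Because the format is plain text over HTTP, anything that can serve a string can be a target, which is a large part of why the pull model is easy to test.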
Metric Types
Counter
A monotonically increasing value that resets to 0 when the process restarts. Used for counting events: requests, errors, bytes processed.
http_requests_total{method="GET",status="200"} 12345
Always wrap counters in rate() or increase(); the raw value mostly tells you how long the process has been running.
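The reason a counter is only useful through a rate is easier to see with numbers. The sketch below is a simplified model of the idea, not Prometheus's actual implementation: it sums the increases between consecutive samples and treats any drop as a restart from zero. Real rate() additionally extrapolates to the edges of the window, which is omitted here. The function names and the scrape data are hypothetical.

```javascript
// Total increase across a series of counter samples.
// A decrease means the process restarted and the counter began again at 0,
// so the whole current value counts as new increase.
function counterIncrease(samples) {
  let increase = 0;
  for (let i = 1; i < samples.length; i++) {
    const prev = samples[i - 1].value;
    const curr = samples[i].value;
    increase += curr >= prev ? curr - prev : curr;
  }
  return increase;
}

// Average per-second rate over the sampled window.
function perSecondRate(samples) {
  // Prometheus returns no result with fewer than two samples;
  // NaN stands in for "no result" in this sketch.
  if (samples.length < 2) return NaN;
  const elapsed = samples[samples.length - 1].t - samples[0].t;
  return counterIncrease(samples) / elapsed;
}

// Hypothetical scrapes 15s apart; the process restarts between t=30 and t=45.
const scrapes = [
  { t: 0, value: 1000 },
  { t: 15, value: 1030 },
  { t: 30, value: 1060 },
  { t: 45, value: 20 },
  { t: 60, value: 50 },
];

console.log(counterIncrease(scrapes)); // 30 + 30 + 20 + 30 = 110
console.log(perSecondRate(scrapes)); // 110 / 60
```

A naive "last minus first" calculation on the same data would give 50 - 1000 = -950, which is why restarts have to be handled explicitly.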
Gauge
A value that goes up and down: queue depth, connections in use, temperature, memory bytes.
queue_depth 42
memory_bytes 5368709120
Histogram
Counts observations into pre-defined buckets. The right choice for latency.
http_request_duration_seconds_bucket{le="0.1"} 8203
http_request_duration_seconds_bucket{le="0.25"} 9410
http_request_duration_seconds_bucket{le="0.5"} 9512
http_request_duration_seconds_bucket{le="+Inf"} 9530
http_request_duration_seconds_sum 451.2
http_request_duration_seconds_count 9530
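Lets you estimate quantiles such as p95 across many instances using histogram_quantile(). The result is an estimate because only bucket counts are stored, not individual observations.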
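As a rough illustration of how a quantile comes out of bucket counts, the sketch below finds the bucket containing the target rank and interpolates linearly inside it, using the bucket values from the example above. It is a simplified model under stated assumptions: buckets are sorted by upper bound, the last one is +Inf, and the lowest bucket starts at 0. The function name estimateQuantile is hypothetical.

```javascript
// Estimate quantile q (0 < q <= 1) from cumulative histogram buckets.
function estimateQuantile(q, buckets) {
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total; // position of the target observation

  for (let i = 0; i < buckets.length; i++) {
    if (buckets[i].count >= rank) {
      const upper = buckets[i].le;
      // The +Inf bucket has no usable upper bound, so fall back to the
      // highest finite bound (or NaN if there is none).
      if (upper === Infinity) return i > 0 ? buckets[i - 1].le : NaN;

      const lower = i === 0 ? 0 : buckets[i - 1].le;
      const below = i === 0 ? 0 : buckets[i - 1].count;
      const inBucket = buckets[i].count - below;
      // Assume observations are spread evenly within the bucket.
      return lower + (upper - lower) * ((rank - below) / inBucket);
    }
  }
  return NaN;
}

// Cumulative bucket counts from the exposition example.
const latencyBuckets = [
  { le: 0.1, count: 8203 },
  { le: 0.25, count: 9410 },
  { le: 0.5, count: 9512 },
  { le: Infinity, count: 9530 },
];

const p95 = estimateQuantile(0.95, latencyBuckets);
console.log(p95.toFixed(4)); // about 0.2057 seconds
```

The p95 rank (9053.5 of 9530) lands in the 0.1 to 0.25 bucket, so the estimate can never be more precise than that bucket's width. Choosing bucket boundaries near your latency targets matters more than having many buckets.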
Summary
Like a histogram, but quantiles are pre-computed per instance. Cheaper to query, but you cannot aggregate quantiles across instances. Prefer histograms in modern setups.
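A small numeric sketch shows why pre-computed quantiles cannot be combined. The two instances and their latencies below are made up for illustration: one busy and fast, one quiet and slow. Averaging their individual p95 values gives a number that describes neither instance nor the service as a whole.

```javascript
// Nearest-rank quantile over raw observations.
function quantile(q, values) {
  const sorted = [...values].sort((a, b) => a - b);
  const index = Math.max(0, Math.ceil(q * sorted.length) - 1);
  return sorted[index];
}

// Hypothetical latencies in seconds.
const fastInstance = new Array(1000).fill(0.1); // 1000 requests at 100ms
const slowInstance = new Array(10).fill(2.0); // 10 requests at 2s

// What you get by averaging per-instance summary quantiles.
const averagedP95 = (quantile(0.95, fastInstance) + quantile(0.95, slowInstance)) / 2;

// What the service actually experienced across all requests.
const trueP95 = quantile(0.95, [...fastInstance, ...slowInstance]);

console.log(averagedP95.toFixed(2)); // 1.05
console.log(trueP95); // 0.1
```

Histogram buckets avoid this problem because bucket counts are plain counters: summing them across instances is valid, and the quantile is estimated afterwards from the combined counts.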
PromQL in Three Examples
Requests per second over the last 5 minutes:
rate(http_requests_total[5m])
Error rate as a percentage:
sum(rate(http_requests_total{status=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m])) * 100
p95 latency by service:
histogram_quantile(
0.95,
sum by (service, le) (
rate(http_request_duration_seconds_bucket[5m])
)
)
Most queries are some combination of rate(), sum by (), and histogram_quantile(). Master those and you can answer the large majority of day-to-day questions.
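If the error-rate query reads as opaque, it can help to spell out what each piece does to the data. The sketch below mimics it in plain JavaScript over a hypothetical list of per-series rates: the regex matcher selects series, sum() collapses them, and the division produces the percentage. The series values and function names are illustrative assumptions, not output from a real server.

```javascript
// Hypothetical per-series output of rate(http_requests_total[5m]),
// in requests per second.
const requestRates = [
  { labels: { service: 'api', status: '200' }, perSecond: 95 },
  { labels: { service: 'api', status: '500' }, perSecond: 3 },
  { labels: { service: 'api', status: '503' }, perSecond: 2 },
];

// PromQL regex matchers are fully anchored, so status=~"5.." behaves
// like the anchored JavaScript regex below.
const isServerError = (series) => /^(?:5..)$/.test(series.labels.status);

// Equivalent of sum(...) with no grouping: collapse all series into one number.
const sumRates = (seriesList) => seriesList.reduce((total, s) => total + s.perSecond, 0);

function errorPercentage(seriesList) {
  const total = sumRates(seriesList);
  if (total === 0) return NaN; // 0 / 0 is NaN, as in PromQL float division
  return (100 * sumRates(seriesList.filter(isServerError))) / total;
}

console.log(errorPercentage(requestRates)); // 5
```

Note that both sides are summed before dividing. Dividing per series and then averaging would weight a low-traffic series the same as a high-traffic one.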
Exporters
For things that do not speak Prometheus natively, an exporter translates. Common ones:
| Exporter | Exposes |
|---|---|
| node_exporter | Linux host metrics (CPU, mem, disk, net) |
| cAdvisor | Container resource usage |
| kube-state-metrics | Kubernetes object state (deployments, pods, nodes) |
| blackbox_exporter | Probes (HTTP, ICMP, DNS) for endpoint checks |
| postgres_exporter / mysqld_exporter / redis_exporter | Database internals |
| cloudwatch_exporter / Yet Another CloudWatch Exporter | AWS metrics in Prom format |
Instrumenting Your App
Most languages have a Prometheus client library. Example in Node:
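// Assumes an Express app; adapt the middleware for other frameworks.
import express from 'express';
import client from 'prom-client';

const app = express();

// Histogram of request durations, labeled with bounded values only.
const httpDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status'],
  buckets: [0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
});

// Time every request and record the observation once the response is sent.
app.use((req, res, next) => {
  const end = httpDuration.startTimer();
  res.on('finish', () => {
    // req.route is only set when a route matched, so fall back to 'unknown'.
    end({ method: req.method, route: req.route?.path ?? 'unknown', status: res.statusCode });
  });
  next();
});

// Expose all registered metrics for Prometheus to scrape.
app.get('/metrics', async (_req, res) => {
  res.set('Content-Type', client.register.contentType);
  res.end(await client.register.metrics());
});

app.listen(8080);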
Recording Rules and Alerting Rules
Recording rules pre-compute expensive queries on a schedule:
groups:
  - name: api
    interval: 30s
    rules:
      - record: api:request_rate5m
        expr: sum by (service) (rate(http_requests_total[5m]))
Alerting rules fire when an expression keeps returning results for a set duration:
- alert: HighErrorRate
  expr: |
    sum(rate(http_requests_total{status=~"5.."}[5m]))
    /
    sum(rate(http_requests_total[5m])) > 0.02
  for: 10m
  labels: { severity: page }
  annotations:
    summary: "Error rate above 2%"
    runbook: "https://runbooks/api/high-errors"
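The for clause is what keeps a brief spike from paging anyone. The sketch below is a simplified model of that behavior, not Prometheus's rule evaluator: the alert is pending while the condition holds and becomes firing only once it has held continuously for the full duration. The function name, the evaluation timeline, and the 5-minute evaluation spacing are hypothetical.

```javascript
// Walk through rule evaluations and report the alert state at each one.
// Any evaluation where the condition is false resets the timer.
function evaluateAlert(evaluations, forSeconds) {
  let activeSince = null; // time at which the condition first became true
  return evaluations.map(({ t, conditionTrue }) => {
    if (!conditionTrue) {
      activeSince = null;
      return { t, state: 'inactive' };
    }
    if (activeSince === null) activeSince = t;
    const state = t - activeSince >= forSeconds ? 'firing' : 'pending';
    return { t, state };
  });
}

// Hypothetical evaluations every 5 minutes against `for: 10m` (600 seconds).
const timeline = evaluateAlert(
  [
    { t: 0, conditionTrue: true },
    { t: 300, conditionTrue: true },
    { t: 600, conditionTrue: false }, // brief recovery resets the timer
    { t: 900, conditionTrue: true },
    { t: 1200, conditionTrue: true },
    { t: 1500, conditionTrue: true },
  ],
  600,
);

console.log(timeline.map((e) => e.state).join(' '));
// pending pending inactive pending pending firing
```

The trade-off is detection latency: a longer for value filters more noise but delays the page by at least that long after the problem starts.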
Scaling Prometheus
A single Prometheus server can handle millions of active series, but eventually you need one or more of:
- Federation: child Prometheus servers scrape services, and a parent scrapes a subset of aggregated series from the children.
- remote_write: Prometheus forwards samples to a long-term store such as Mimir, Thanos, Cortex, VictoriaMetrics, or a managed service (Grafana Cloud, Amazon Managed Service for Prometheus).
- Sharding: multiple Prometheus instances each scrape a slice of the targets.
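Sharding usually comes down to hashing each target and taking the result modulo the number of Prometheus instances, so every target has exactly one owner. Prometheus relabeling provides a hashmod action for this; the sketch below only illustrates the idea in JavaScript. The FNV-1a hash is a stand-in chosen because it is short and dependency-free, not the hash Prometheus uses, and the target addresses are hypothetical.

```javascript
// FNV-1a 32-bit hash (stand-in; not the hash Prometheus's hashmod uses).
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep as unsigned 32-bit
  }
  return hash;
}

// Index of the shard that owns a given target.
function shardFor(targetAddress, shardCount) {
  return fnv1a(targetAddress) % shardCount;
}

// Split a target list into shardCount disjoint slices.
function assignTargets(targets, shardCount) {
  const shards = Array.from({ length: shardCount }, () => []);
  for (const target of targets) {
    shards[shardFor(target, shardCount)].push(target);
  }
  return shards;
}

// Hypothetical scrape targets.
const targets = ['app-1:8080', 'app-2:8080', 'app-3:8080', 'db-1:9187', 'node-1:9100'];
const shards = assignTargets(targets, 3);
shards.forEach((owned, index) => console.log(`shard ${index}:`, owned.join(', ')));
```

The assignment is deterministic, so each instance can compute its own slice without coordination. The cost is that changing the shard count reassigns most targets, and queries spanning shards need a layer above them such as federation or a remote store.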
Cardinality, Again
The single most common production problem with Prometheus is exploding cardinality. Avoid putting these in labels: user_id, email, request_id, full URL, IP address, error message string. Use a small set of bounded values: service, route template (/users/:id), region, status code.
If you have to ask "is this label safe?" — assume not, and put it in a log instead.
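Two small helpers make the cardinality advice concrete. The first estimates the worst-case number of series for a metric as the product of its label cardinalities; the second collapses raw request paths into a bounded route template before they are used as a label value. Both are illustrative sketches: the function names are hypothetical, the label counts are invented, and the ID detection (all-digit segments and UUIDs) is an assumption that will not fit every URL scheme.

```javascript
// Upper bound on series for one metric: the product of distinct values per label.
function estimateSeries(labelCardinalities) {
  return Object.values(labelCardinalities).reduce((product, n) => product * n, 1);
}

console.log(estimateSeries({ method: 5, route: 40, status: 8 })); // 1600
console.log(estimateSeries({ method: 5, route: 40, status: 8, user_id: 100000 })); // 160000000

const UUID = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const NUMERIC = /^\d+$/;

// Turn a raw URL into a route template suitable for a label value.
function routeLabel(rawUrl) {
  // Drop the query string and fragment; they are unbounded.
  const path = rawUrl.split('?')[0].split('#')[0];
  return path
    .split('/')
    .map((segment) => (NUMERIC.test(segment) || UUID.test(segment) ? ':id' : segment))
    .join('/');
}

console.log(routeLabel('/users/12345?tab=orders')); // /users/:id
console.log(routeLabel('/users/12345/orders/9')); // /users/:id/orders/:id
```

When the web framework already knows the matched route template, as in the instrumentation example, prefer that over pattern matching on the raw path.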