6 min read·Lesson 3 of 10

Prometheus Fundamentals

Prometheus architecture, the pull model, metric types, PromQL basics, exporters, and how to design metrics that scale.

Prometheus is the de facto standard open-source metrics system. It is simple, reliable, and the foundation of most Grafana-based and Kubernetes monitoring stacks.

Architecture

[ targets: app, node_exporter, kube-state-metrics ]
            │  exposes /metrics
            ▼
       [ Prometheus server ]   ← scrapes every 15–60s
            │
            ├──→ [ TSDB on disk (or remote_write) ]
            │
            ├──→ [ Grafana ]   ← dashboards
            │
            └──→ [ Alertmanager ]   ← routes alerts to Slack/PagerDuty/email

Key idea: Prometheus pulls. Each target exposes an HTTP endpoint at /metrics. Prometheus discovers targets (via static config, Kubernetes API, Consul, EC2, etc.) and scrapes them on a fixed interval.

Pull model benefits:

  • You always know whether a target is up — scrape failure = down.
  • No need for a fragile push gateway in the data path.
  • Easy to test: curl http://app:8080/metrics.

For short-lived jobs (cron, batch), use the Pushgateway — but only as the exception.
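
To make the pull model concrete, here is a rough sketch of the plain-text payload a target returns when Prometheus scrapes /metrics. The renderMetrics helper and the sample values are illustrative assumptions, not part of any client library, and label-value escaping is left out for brevity. Real applications should use an official client library.

```javascript
// Minimal sketch of the text a target serves at /metrics.
// renderMetrics and the sample data are hypothetical, for illustration only.
// Note: escaping of quotes, backslashes, and newlines in label values is omitted.
function renderMetrics(metrics) {
  const lines = [];
  for (const m of metrics) {
    lines.push(`# HELP ${m.name} ${m.help}`);
    lines.push(`# TYPE ${m.name} ${m.type}`);
    for (const s of m.samples) {
      const labels = Object.entries(s.labels || {})
        .map(([k, v]) => `${k}="${v}"`)
        .join(',');
      lines.push(labels ? `${m.name}{${labels}} ${s.value}` : `${m.name} ${s.value}`);
    }
  }
  return lines.join('\n') + '\n';
}

const body = renderMetrics([
  {
    name: 'http_requests_total',
    help: 'Total HTTP requests',
    type: 'counter',
    samples: [{ labels: { method: 'GET', status: '200' }, value: 12345 }],
  },
  {
    name: 'queue_depth',
    help: 'Jobs waiting in the queue',
    type: 'gauge',
    samples: [{ value: 42 }],
  },
]);
console.log(body);
```

Because the payload is just text over HTTP, anything that can fetch a URL can inspect it, which is what makes the curl check above possible.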

Metric Types

Counter

A monotonically increasing value that resets to 0 when the process restarts. Use it for events: requests, errors, bytes processed.

http_requests_total{method="GET",status="200"} 12345

Query counters through rate() or increase(); the raw value only tells you the total since the last restart.
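
The following is a simplified sketch of what rate() computes from counter samples, including how a reset after a restart is handled. The real function also extrapolates to the window boundaries, which this version skips. The simpleRate name and the sample data are made up for illustration.

```javascript
// Simplified per-second rate over a window of counter samples.
// Assumptions: samples are sorted by time; boundary extrapolation is omitted.
function simpleRate(samples) {
  if (samples.length < 2) return null; // at least two points are needed
  const seconds = samples[samples.length - 1].t - samples[0].t;
  if (seconds <= 0) return null;
  let increase = 0;
  for (let i = 1; i < samples.length; i++) {
    const prev = samples[i - 1].value;
    const curr = samples[i].value;
    // A drop means the process restarted and the counter began again from 0,
    // so the whole current value counts as new increase.
    increase += curr >= prev ? curr - prev : curr;
  }
  return increase / seconds;
}

const samples = [
  { t: 0, value: 100 },
  { t: 60, value: 160 },
  { t: 120, value: 20 }, // restart between t=60 and t=120
  { t: 180, value: 80 },
];
const perSecond = simpleRate(samples);
console.log(perSecond.toFixed(3)); // 140 requests over 180 s
```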

Gauge

A value that goes up and down: queue depth, connections in use, temperature, memory bytes.

queue_depth 42
memory_bytes 5368709120

Histogram

Counts observations into pre-defined buckets. The right choice for latency.

http_request_duration_seconds_bucket{le="0.1"} 8203
http_request_duration_seconds_bucket{le="0.25"} 9410
http_request_duration_seconds_bucket{le="0.5"} 9512
http_request_duration_seconds_bucket{le="+Inf"} 9530
http_request_duration_seconds_sum 451.2
http_request_duration_seconds_count 9530

Buckets are cumulative counters, so you can sum them across many instances and then estimate quantiles such as p95 with histogram_quantile().
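
As a rough sketch of how a quantile is estimated from cumulative buckets, the function below finds the bucket that contains the target rank and interpolates linearly inside it. The estimateQuantile name is hypothetical, and the code assumes sorted buckets ending in +Inf with at least one finite bucket and 0 < q <= 1. It uses the bucket counts from the example above.

```javascript
// Sketch of histogram_quantile-style estimation from cumulative buckets.
// Assumptions: buckets sorted by upper bound, last bucket is +Inf,
// at least one finite bucket, and 0 < q <= 1.
function estimateQuantile(q, buckets) {
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;
  const i = buckets.findIndex((b) => b.count >= rank);
  // Rank falls in the +Inf bucket: report the highest finite upper bound.
  if (i === buckets.length - 1) return buckets[buckets.length - 2].le;
  const lowerBound = i === 0 ? 0 : buckets[i - 1].le;
  const countBelow = i === 0 ? 0 : buckets[i - 1].count;
  const inBucket = buckets[i].count - countBelow;
  // Linear interpolation inside the bucket that contains the rank.
  return lowerBound + (buckets[i].le - lowerBound) * ((rank - countBelow) / inBucket);
}

const buckets = [
  { le: 0.1, count: 8203 },
  { le: 0.25, count: 9410 },
  { le: 0.5, count: 9512 },
  { le: Infinity, count: 9530 },
];
const p95 = estimateQuantile(0.95, buckets);
console.log(p95.toFixed(3)); // about 0.206 s with the bucket counts above
```

The estimate can only be as precise as the bucket boundaries allow, so it helps to place boundaries near the latencies you care about.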

Summary

Like a histogram, but quantiles are pre-computed per instance. Cheaper to query, but you cannot aggregate quantiles across instances. Prefer histograms in modern setups.
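
A small made-up example may help show why per-instance quantiles cannot be combined. Averaging two p95 values treats both instances as equal, even when one served ten times the traffic. The data and the percentile helper below are illustrative assumptions.

```javascript
// Why per-instance quantiles cannot be averaged: a made-up two-instance example.
function percentile(values, q) {
  const sorted = [...values].sort((a, b) => a - b);
  // Nearest-rank method: the smallest value with at least q of the data at or below it.
  const rank = Math.max(1, Math.ceil(q * sorted.length));
  return sorted[rank - 1];
}

const instanceA = Array(100).fill(0.1); // busy instance, fast responses
const instanceB = Array(10).fill(1.0); // quiet instance, slow responses

// Averaging the two per-instance p95 values...
const averaged = (percentile(instanceA, 0.95) + percentile(instanceB, 0.95)) / 2;
// ...differs from the p95 of all requests taken together.
const actual = percentile([...instanceA, ...instanceB], 0.95);
console.log({ averaged, actual });
```

Histogram buckets avoid this problem because they are counts, and counts from different instances can be added before the quantile is estimated.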

PromQL in Three Examples

Requests per second over the last 5 minutes:

rate(http_requests_total[5m])

Error rate as a percentage:

sum(rate(http_requests_total{status=~"5.."}[5m]))
  /
sum(rate(http_requests_total[5m])) * 100

p95 latency by service:

histogram_quantile(
  0.95,
  sum by (service, le) (
    rate(http_request_duration_seconds_bucket[5m])
  )
)

Most queries are some combination of rate(), sum by(), and histogram_quantile(). Master those three and you can answer most day-to-day questions.
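
To illustrate the aggregation step, here is a minimal sketch of what sum by (service) does: it groups series by one label, drops all other labels, and adds the values. The sumBy helper and the per-second rates are hypothetical.

```javascript
// Sketch of `sum by (service)`: group series by one label and add their values.
// The series below are hypothetical per-second rates.
function sumBy(label, series) {
  const totals = new Map();
  for (const { labels, value } of series) {
    // Series without the label are grouped together under an empty value.
    const key = labels[label] ?? '';
    totals.set(key, (totals.get(key) ?? 0) + value);
  }
  return totals;
}

const rates = [
  { labels: { service: 'api', instance: 'a', status: '200' }, value: 40 },
  { labels: { service: 'api', instance: 'b', status: '500' }, value: 2 },
  { labels: { service: 'web', instance: 'c', status: '200' }, value: 15 },
];
const byService = sumBy('service', rates);
console.log(Object.fromEntries(byService)); // { api: 42, web: 15 }
```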

Exporters

For things that do not speak Prometheus natively, an exporter translates. Common ones:

Exporter                                               Exposes
node_exporter                                          Linux host metrics (CPU, memory, disk, network)
cAdvisor                                               Container resource usage
kube-state-metrics                                     Kubernetes object state (deployments, pods, nodes)
blackbox_exporter                                      Probes (HTTP, ICMP, DNS) for endpoint checks
postgres_exporter / mysqld_exporter / redis_exporter   Database internals
cloudwatch_exporter / Yet Another CloudWatch Exporter  AWS CloudWatch metrics in Prometheus format
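
The core of an exporter is a translation step. As a rough sketch, the function below turns "key:value" stats, in the style of Redis INFO output, into Prometheus text format. The toExposition helper, the sample input, and the resulting metric names are illustrative assumptions and do not reproduce the real redis_exporter's output.

```javascript
// Sketch of an exporter's core job: translate a system's native stats into
// Prometheus text format. Hypothetical helper and made-up input.
function toExposition(infoText, prefix) {
  const lines = [];
  for (const raw of infoText.split('\n')) {
    const line = raw.trim();
    if (!line || line.startsWith('#')) continue; // skip blanks and section headers
    const idx = line.indexOf(':');
    if (idx === -1) continue;
    const key = line.slice(0, idx);
    const rawValue = line.slice(idx + 1);
    // Only numeric fields become metrics; strings such as versions are skipped.
    if (rawValue === '' || Number.isNaN(Number(rawValue))) continue;
    lines.push(`${prefix}_${key} ${Number(rawValue)}`);
  }
  return lines.join('\n') + '\n';
}

const info = '# Clients\nconnected_clients:12\n# Memory\nused_memory:1048576\nredis_version:7.2.4\n';
const exposition = toExposition(info, 'redis');
console.log(exposition);
```

A real exporter wraps this kind of translation in an HTTP server and adds HELP and TYPE metadata for each metric.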

Instrumenting Your App

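Most languages have a Prometheus client library. Example in Node.js, using prom-client with Express:

import express from 'express';
import client from 'prom-client';

const app = express();

// Histogram of request durations, labeled with bounded values only.
const httpDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status'],
  buckets: [0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
});

// Time every request; record the observation once the response has been sent.
app.use((req, res, next) => {
  const end = httpDuration.startTimer();
  res.on('finish', () => {
    // Use the route template (e.g. /users/:id), not the raw URL, to keep cardinality bounded.
    end({ method: req.method, route: req.route?.path ?? 'unknown', status: res.statusCode });
  });
  next();
});

// Endpoint that Prometheus scrapes.
app.get('/metrics', async (_req, res) => {
  res.set('Content-Type', client.register.contentType);
  res.end(await client.register.metrics());
});

app.listen(8080);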

Recording Rules and Alerting Rules

Recording rules pre-compute expensive queries on a schedule:

groups:
  - name: api
    interval: 30s
    rules:
      - record: api:request_rate5m
        expr: sum by (service) (rate(http_requests_total[5m]))

Alerting rules fire when an expression stays true for the full "for" duration:

groups:
  - name: api
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            /
          sum(rate(http_requests_total[5m])) > 0.02
        for: 10m
        labels: { severity: page }
        annotations:
          summary: "Error rate above 2%"
          runbook: "https://runbooks/api/high-errors"
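
The effect of the for clause can be sketched as a small state machine: an alert is pending while its condition has been true for less than the configured duration, and any false evaluation resets the timer. The alertStates helper and the evaluation times below are hypothetical.

```javascript
// Sketch of how a `for:` clause delays firing. Hypothetical helper;
// times are in seconds and evaluations arrive in time order.
function alertStates(evaluations, forSeconds) {
  let activeSince = null;
  return evaluations.map(({ t, conditionTrue }) => {
    if (!conditionTrue) {
      activeSince = null; // any false evaluation resets the timer
      return 'inactive';
    }
    if (activeSince === null) activeSince = t;
    return t - activeSince >= forSeconds ? 'firing' : 'pending';
  });
}

// Condition evaluated every 5 minutes with `for: 10m` (600 s).
const states = alertStates(
  [
    { t: 0, conditionTrue: true },
    { t: 300, conditionTrue: true },
    { t: 600, conditionTrue: true },
    { t: 900, conditionTrue: false },
    { t: 1200, conditionTrue: true },
  ],
  600
);
console.log(states); // [ 'pending', 'pending', 'firing', 'inactive', 'pending' ]
```

A longer duration trades slower detection for fewer pages caused by brief spikes.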

Scaling Prometheus

A single Prometheus server can handle millions of active series, but beyond that you need one or more of:

  • Federation: child Prometheus servers scrape services, and a parent Prometheus scrapes a subset of aggregated series from the children.
  • remote_write: Prometheus forwards samples to a long-term store such as Mimir, Thanos, Cortex, VictoriaMetrics, or a managed service (Grafana Cloud, Amazon Managed Service for Prometheus).
  • Sharding: multiple Prometheus instances each scrape a slice of the targets.
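
As one possible illustration of sharding, targets can be assigned to instances by hashing the target address and taking the result modulo the shard count. The sketch below uses FNV-1a only because it is short to write; it is not the hash Prometheus uses, and the target names are made up.

```javascript
// Sketch of hash-based sharding: each target is owned by exactly one shard.
// FNV-1a is an assumption chosen for brevity, not Prometheus's own hash.
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the value an unsigned 32-bit integer
  }
  return hash;
}

function shardFor(target, shardCount) {
  return fnv1a(target) % shardCount;
}

const targets = ['app-1:8080', 'app-2:8080', 'app-3:8080', 'node-1:9100'];
const shardCount = 2;
for (const target of targets) {
  console.log(target, '-> shard', shardFor(target, shardCount));
}
```

Because the assignment depends only on the target address and the shard count, every instance can compute it independently without coordination.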

Cardinality, Again

The single most common production problem with Prometheus is exploding cardinality. Avoid putting these in labels: user_id, email, request_id, full URL, IP address, error message string. Use a small set of bounded values: service, route template (/users/:id), region, status code.

If you have to ask "is this label safe?" — assume not, and put it in a log instead.
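
A quick way to reason about label safety is to multiply the number of distinct values each label can take, since that product is the worst-case series count for one metric. The sketch below does that arithmetic and also shows one simple way to collapse raw paths into route templates. All counts and both helper names are hypothetical.

```javascript
// Worst-case series count for one metric: the product of distinct values per label.
// All counts below are hypothetical.
function estimateSeries(labelCardinalities) {
  return Object.values(labelCardinalities).reduce((product, n) => product * n, 1);
}

// Collapse numeric path segments so /users/123 and /users/456 share one label value.
function routeTemplate(path) {
  return path.replace(/\/\d+(?=\/|$)/g, '/:id');
}

const bounded = { service: 10, route: 50, status: 8, region: 4 };
const withUserId = { ...bounded, user_id: 100000 };

console.log(estimateSeries(bounded)); // 16000
console.log(estimateSeries(withUserId)); // 1600000000
console.log(routeTemplate('/users/123')); // /users/:id
```

For a histogram, the estimate is multiplied again by the number of buckets, which is why unbounded labels on latency metrics are especially costly.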

Key Takeaways

  • Prometheus pulls metrics over HTTP from /metrics endpoints — no agent required.
  • Four core metric types: counter, gauge, histogram, summary.
  • PromQL is the query language; rate() over counters is the workhorse.
  • Exporters expose metrics for systems that do not speak Prometheus natively.
  • Cardinality is the main scaling constraint — design labels carefully.