Kubernetes Cost Management — Cloud FinOps and Cost Management | CertQnA

Kubernetes is FinOps on hard mode. The cloud bill says "$120,000 for EC2 m5.4xlarge." The cluster runs 200 microservices across 30 teams. Mapping from one to the other is exactly what Kubernetes cost management has to solve.

Why It Is Hard

Many workloads share each node — the cloud bill cannot see them.
Pods reserve capacity through requests and are capped by limits; neither one is actual usage.
Cost depends on scheduling, bin-packing, autoscaling, and Spot interruption, none of which is visible on the bill.
Shared services (ingress controllers, monitoring, service mesh) need to be allocated somehow.
Multi-tenant clusters mix prod/non-prod, team A and team B, dev and CI workloads.

You need an in-cluster tool that observes pods, requests, usage, and node prices, and produces an allocation.

OpenCost and Kubecost

OpenCost is the CNCF-incubating, vendor-neutral standard for Kubernetes cost monitoring. It pulls cloud pricing data (AWS / Azure / GCP / on-prem) and joins it with resource requests and usage metrics from the Kubernetes API, cAdvisor (via the kubelet), and kube-state-metrics, typically stored in Prometheus.

Kubecost (the commercial product from OpenCost's original authors) extends OpenCost with multi-cluster aggregation, longer retention, anomaly detection, optimisation recommendations, and a richer UI. It has a free self-hosted tier that covers most use cases.

Both report cost per namespace, deployment, pod, and label, in dollars per hour, day, or month.

Alternatives: CAST AI, Densify, native cloud features (GKE Cost Allocation, EKS in Cost Explorer with Split Cost Allocation Data). For most teams, OpenCost with Prometheus and a Grafana dashboard, or the free Kubecost tier, is the right start.

Allocation Mechanics

The dominant model: cost is allocated by resource requests, not usage. If a pod requests 1 CPU and 2 GiB and the node it lands on costs $0.20/hour for 4 CPU and 8 GiB, the pod is allocated 25% of the node-hour, or $0.05/hour.
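The arithmetic in that example can be sketched in a few lines. This is a minimal illustration, not how OpenCost or Kubecost are implemented: the function name and the `cpu_weight` parameter (an assumed 50/50 split of the node price between CPU and memory) are hypothetical, and real tools derive per-resource prices from cloud pricing data.

```python
def pod_hourly_cost(pod_cpu, pod_mem_gib, node_cpu, node_mem_gib,
                    node_price_per_hour, cpu_weight=0.5):
    """Allocate part of a node's hourly price to one pod, based on its requests.

    cpu_weight is an ASSUMED split of the node price between CPU and memory.
    """
    cpu_share = pod_cpu / node_cpu            # fraction of the node's CPU requested
    mem_share = pod_mem_gib / node_mem_gib    # fraction of the node's memory requested
    cpu_cost = node_price_per_hour * cpu_weight * cpu_share
    mem_cost = node_price_per_hour * (1.0 - cpu_weight) * mem_share
    return cpu_cost + mem_cost


if __name__ == "__main__":
    # The pod from the text: 1 CPU / 2 GiB on a $0.20/hour node with 4 CPU / 8 GiB.
    print(f"${pod_hourly_cost(1, 2, 4, 8, 0.20):.3f}/hour")
```

When the CPU and memory shares are equal, as in the text, the weight does not matter and the result is the same 25% of the node-hour. The split only changes the answer for pods that request the two resources in different proportions.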

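Why requests, not usage? Because requests reserve capacity from the scheduler: the cluster cannot pack another workload into that slot whether you use it or not. Usage-based allocation under-charges teams that reserve far more than they use and shifts the cost of their idle capacity onto efficient teams. (OpenCost and Kubecost charge the greater of request and usage, so a pod that bursts above its request still pays for the burst.)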

Shared costs (control plane, idle headroom, system pods, ingress) are usually distributed across tenants in proportion to each tenant's directly allocated cost; an even split is the simpler alternative.
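A proportional split of shared cost can be sketched as follows. The function name and the tenant figures are hypothetical, and the fallback to an even split when nobody has direct cost is an assumption made for this sketch.

```python
def distribute_shared_cost(direct_costs, shared_cost):
    """Return each tenant's total cost after adding its share of shared_cost.

    direct_costs maps tenant name -> directly allocated cost.
    Shares are proportional to direct cost; if no tenant has any direct
    cost, the shared cost is split evenly (an assumption of this sketch).
    """
    if not direct_costs:
        return {}
    total_direct = sum(direct_costs.values())
    if total_direct == 0:
        even = shared_cost / len(direct_costs)
        return {tenant: even for tenant in direct_costs}
    return {
        tenant: cost + shared_cost * cost / total_direct
        for tenant, cost in direct_costs.items()
    }


if __name__ == "__main__":
    # Hypothetical monthly figures in dollars.
    print(distribute_shared_cost({"payments": 600, "search": 300, "ml": 100}, 200))
```

The totals always add up to direct plus shared cost, which makes the result easy to reconcile against the cloud bill.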

The Tagging-Equivalent

In Kubernetes the equivalent of a cloud "tag" is the label. Start with the Kubernetes Recommended Labels:

app.kubernetes.io/name
app.kubernetes.io/part-of
app.kubernetes.io/managed-by
Plus your organisation's: team, cost-center, product, environment

Enforce via admission policy (Kyverno, OPA Gatekeeper): pods without required labels are rejected. Without enforcement you will see most pods unlabelled within months.
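The check that such a policy performs is simple. The sketch below expresses it in Python purely for illustration; it is not Kyverno or Gatekeeper policy syntax, and the `REQUIRED_LABELS` set and function names are hypothetical choices.

```python
# Hypothetical required set: one recommended label plus organisation labels.
REQUIRED_LABELS = ("app.kubernetes.io/name", "team", "cost-center", "environment")


def missing_labels(pod_labels, required=REQUIRED_LABELS):
    """Return the required label keys that are absent or empty on a pod."""
    return [key for key in required if not pod_labels.get(key)]


def admit(pod_labels):
    """Mimic an admission decision: (allowed, message)."""
    missing = missing_labels(pod_labels)
    if missing:
        return False, "missing required labels: " + ", ".join(missing)
    return True, "ok"


if __name__ == "__main__":
    print(admit({"app.kubernetes.io/name": "checkout", "team": "payments"}))
```

Treating an empty value as missing matters in practice: a label such as `team: ""` passes a pure presence check but is useless for allocation.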

Rightsizing Kubernetes Workloads

Three knobs:

CPU and memory requests — the floor and the billing basis.
CPU and memory limits — the ceiling.
Replica count — managed by HPA.

Common patterns:

Vertical Pod Autoscaler (VPA) in recommendation mode — produces per-workload suggested requests. Apply them in IaC.
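VPA in auto mode applies recommendations automatically by evicting and recreating pods, which some users find aggressive; it is safer with workloads that tolerate restarts. Avoid pairing it with an HPA that scales on the same CPU or memory metric.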
Horizontal Pod Autoscaler (HPA) on CPU and / or custom metrics — adds replicas under load.
KEDA — event-driven autoscaling (Kafka lag, queue depth, scheduled). Includes scale-to-zero for non-prod.
Goldilocks (from Fairwinds) — a VPA wrapper that visualises recommendations clearly.
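For intuition about how the HPA picks a replica count, its documented core rule is desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to the configured minimum and maximum. The sketch below shows only that rule; the real controller also applies a tolerance band, stabilisation windows, and handling for pods that are not ready, all of which are omitted here. The function and parameter names are illustrative.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Core HPA scaling rule, without tolerance or stabilisation."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to the configured bounds, as minReplicas/maxReplicas do.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # 4 replicas averaging 90% CPU against a 60% target.
    print(desired_replicas(4, 90, 60))
```

The clamp is where cost hides: a generous `min_replicas` keeps capacity reserved even when the rule would scale the workload down to almost nothing.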

The most consistent finding across deployments: pods request 3-10x more memory than they use. Requests are conservatively set on day one and never revisited. A single sweep of memory rightsizing often saves 20-40% of node cost.
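A rightsizing sweep boils down to comparing each request with observed usage. The following is a rough sketch of one common heuristic, a high percentile of usage plus headroom. It is not the VPA recommender's algorithm, and the 95th percentile and 20% headroom defaults are assumptions to tune per workload.

```python
import math


def recommend_request_mib(usage_samples_mib, percentile=95, headroom_pct=20):
    """Suggest a memory request: nearest-rank percentile of usage plus headroom."""
    if not usage_samples_mib:
        raise ValueError("need at least one usage sample")
    ordered = sorted(usage_samples_mib)
    # Nearest-rank percentile: smallest value with at least `percentile`%
    # of the samples at or below it.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    base = ordered[rank - 1]
    return math.ceil(base * (100 + headroom_pct) / 100)


def over_request_ratio(current_request_mib, usage_samples_mib):
    """How many times larger the current request is than the suggestion."""
    return current_request_mib / recommend_request_mib(usage_samples_mib)


if __name__ == "__main__":
    samples = list(range(1, 101))  # hypothetical usage samples in MiB
    print(recommend_request_mib(samples), round(over_request_ratio(512, samples), 1))
```

Memory is less forgiving than CPU because exceeding a memory limit gets the container killed, so a high percentile over a long window is the safer starting point.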

Node-Level Optimisation

Once pods are right-sized, the node autoscaler can do its job: running the workload on the fewest, cheapest nodes that fit the requests.

Karpenter (AWS, now multi-cloud) — replaced Cluster Autoscaler for many users. Provisions nodes based on actual pod requirements, consolidates aggressively, integrates with Spot.
Cluster Autoscaler — the standard predecessor; less efficient bin-packing but works everywhere.
CA on AKS / GKE — managed variants with provider-specific features.

Karpenter with Spot adoption commonly reduces node cost by 50-70% on stateless workloads. Pod Disruption Budgets, topology spread constraints, and graceful shutdown handlers are the safeguards that make this production-grade.

Spot in Kubernetes

Spot instances in Kubernetes are nearly painless if your workloads are stateless and handle SIGTERM cleanly:

Multi-AZ, multi-instance-type spot fleets (Karpenter does this natively)
PodDisruptionBudgets so a node drain respects minimum availability
Termination handlers that translate Spot-interrupt warnings into graceful shutdowns (AWS Node Termination Handler, Azure Spot scheduled events)
Spot for stateless services, on-demand for stateful (databases, brokers, primary leader pods)
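Handling SIGTERM cleanly, as required above, usually means: stop taking new work, finish what is in flight, then exit before the pod's termination grace period runs out. A minimal sketch of that pattern follows; the function names and the `process_one` callback are hypothetical.

```python
import signal
import threading

# Set when the kubelet (or a node drain) asks the process to stop.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Signal handler: only set a flag; the worker loop does the cleanup."""
    shutdown_requested.set()


def install_handler():
    """Register the handler. Must be called from the main thread."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def serve(process_one, poll_seconds=0.1):
    """Run process_one() until shutdown is requested; return how many ran.

    The item in flight always finishes before the loop exits, which is
    the "graceful" part of graceful shutdown.
    """
    handled = 0
    while not shutdown_requested.is_set():
        process_one()
        handled += 1
        shutdown_requested.wait(poll_seconds)
    return handled
```

A service would call `install_handler()` once at startup and then `serve(...)` with its own unit of work. The drain must complete within the pod's `terminationGracePeriodSeconds`, and on Spot also within the provider's interruption notice.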

Target 60-80% of cluster compute on Spot for stateless workloads.
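To see what such a target is worth, the blended saving is simply the Spot share multiplied by the Spot discount. The sketch below assumes a flat 65% Spot discount, a hypothetical figure since real discounts vary by instance type, region, and time; it reuses the $120,000 bill from the introduction as the on-demand baseline.

```python
def blended_savings(spot_fraction, spot_discount):
    """Fractional saving versus running everything on-demand."""
    if not 0 <= spot_fraction <= 1 or not 0 <= spot_discount <= 1:
        raise ValueError("fractions must be between 0 and 1")
    return spot_fraction * spot_discount


def blended_cost(on_demand_cost, spot_fraction, spot_discount):
    """Cost after moving spot_fraction of compute to Spot."""
    return on_demand_cost * (1 - blended_savings(spot_fraction, spot_discount))


if __name__ == "__main__":
    # 70% of compute on Spot at an ASSUMED 65% discount.
    print(round(blended_cost(120_000, 0.70, 0.65)))
```

The calculation ignores the cost of interruptions (retries, extra replicas held for availability), so treat it as an upper bound on the saving.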

Idle and Forgotten Workloads

The most common Kubernetes waste:

Namespaces from old projects nobody owns
HPA at minReplicas: 10 when traffic is now zero
CronJobs that never run but keep dedicated capacity alive through affinity rules
Forgotten dev / staging clusters
Unused PersistentVolumes (the cloud still bills for the underlying disk)
Container image layers (stored via OverlayFS) filling the disk on every node, inflating node volume size and IOPS charges

Kubecost / OpenCost surface idle workloads. Set a policy: flag any namespace that stays below $10/month of usage for 90 days for review.
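That policy is easy to automate once daily cost per namespace has been exported. The sketch below assumes the data is already in a plain dictionary of daily dollar figures (oldest first); that input shape, the function name, and the rule of skipping namespaces with less than a full window of history are all assumptions, not part of any OpenCost or Kubecost API.

```python
def flag_idle_namespaces(daily_costs, threshold_per_month=10.0, window_days=90):
    """Return namespaces whose monthly run-rate stayed under the threshold.

    daily_costs maps namespace -> list of daily costs in dollars, oldest first.
    Namespaces with less than window_days of history are skipped, so new
    projects are not flagged before they have had time to ramp up.
    """
    flagged = []
    for namespace, costs in daily_costs.items():
        if len(costs) < window_days:
            continue
        recent = costs[-window_days:]
        monthly_rate = sum(recent) / window_days * 30
        if monthly_rate < threshold_per_month:
            flagged.append(namespace)
    return sorted(flagged)


if __name__ == "__main__":
    data = {
        "legacy-reports": [0.10] * 90,  # about $3/month: flagged
        "checkout": [5.00] * 90,        # about $150/month: active
        "new-project": [0.00] * 30,     # too little history: skipped
    }
    print(flag_idle_namespaces(data))
```

Routing the flagged list to namespace owners, identified through the enforced `team` label, turns the report into an action rather than a dashboard nobody reads.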

Shared Cluster Trade-offs

Should each team have its own cluster? Trade-offs:

Shared cluster	Cluster per team
Better utilisation (40-70% common)	Lower utilisation (10-30% common)
Allocation is non-trivial (needs OpenCost or Kubecost)	Allocation is trivial (it is the cloud bill)
Noisy-neighbour risk; needs ResourceQuotas, PriorityClasses	Strong isolation by default
One control plane to upgrade	Many control planes — significant ops
One large blast radius	Smaller blast radius per cluster

Most mature platform teams settle on a small number of large shared clusters per environment, with namespaces and quotas for tenants. Total cost is usually lower; ops overhead is lower; the cost-allocation tooling becomes essential.

What Mature Looks Like

OpenCost / Kubecost in every cluster, exporting metrics to Prometheus.
Per-namespace cost dashboards available to each team.
Required labels enforced by admission policy.
Goldilocks / VPA recommendations reviewed monthly per workload.
Karpenter (or equivalent) with Spot pools; on-demand pools for stateful.
HPA / KEDA on every stateless workload; scale-to-zero for batch.
Idle-namespace alerts auto-routed to owners.
Compute Savings Plan / CUD coverage on the on-demand floor.

That stack (visibility, rightsizing, Karpenter/Spot, and commitments) routinely delivers a 50-70% cost reduction compared with an untuned cluster. It is also the single richest area in FinOps for engineers to specialise in.

The next lesson covers the other category that often dominates the bill once compute is optimised: data storage and egress.