The first rule of cloud optimisation: most things are too big. CPU utilisation across enterprise cloud fleets typically averages 10-20%. Memory averages 30-40%. Storage usage often falls below 50% of provisioned capacity. The gap between provisioned and used is the largest single category of cloud waste.
Rightsizing means closing that gap. It is the second optimisation after killing the obviously idle.
Step 1: Kill the Idle
Before rightsizing, eliminate resources that are not used at all:
- EC2 instances with <5% CPU and <5% network for 7+ days
- Unattached EBS volumes — they bill regardless of attachment
- Old EBS snapshots beyond retention policy
- Idle load balancers (zero traffic)
- Unused Elastic IPs (billable when not associated)
- RDS / databases with no connections
- Empty S3 buckets in expensive classes
- NAT Gateways in subnets with no outbound traffic
- VM scale sets / Auto Scaling Groups at min=N when usage justifies min=0
Each provider has native tooling: AWS Trusted Advisor and AWS Cost Optimization Hub, Azure Advisor, GCP Recommender. They will surface idle resources in minutes. Third-party tools (Vantage, CloudHealth, ProsperOps, Kion, Apptio Cloudability) add automation.
Make this an automated weekly job, not a one-off campaign. Idle resources reappear constantly.
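As a rough illustration of the first rule in the list above, the weekly job could apply a check like the following to exported metrics. This is a minimal sketch: the `DailyUsage` shape, the instance IDs, and the reading of "network %" as a share of the instance's baseline throughput are all assumptions for illustration, not any provider's API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class DailyUsage:
    """One day of averaged metrics for an instance (hypothetical shape)."""
    cpu_pct: float      # average CPU utilisation, 0-100
    network_pct: float  # network use as % of baseline throughput (assumed meaning)


def is_idle(days: Iterable[DailyUsage], threshold_pct: float = 5.0, min_days: int = 7) -> bool:
    """True if each of the most recent `min_days` days is below the threshold on both metrics."""
    history = list(days)
    if len(history) < min_days:
        return False  # not enough data to call it idle
    recent = history[-min_days:]
    return all(d.cpu_pct < threshold_pct and d.network_pct < threshold_pct for d in recent)


def idle_instances(fleet: Dict[str, List[DailyUsage]]) -> List[str]:
    """Return the IDs of instances that look idle, sorted for stable reports."""
    return sorted(iid for iid, days in fleet.items() if is_idle(days))


# Illustrative data only: one quiet instance, one that woke up on the last day.
fleet = {
    "i-aaa": [DailyUsage(2.0, 1.0)] * 7,
    "i-bbb": [DailyUsage(2.0, 1.0)] * 6 + [DailyUsage(40.0, 12.0)],
}
candidates = idle_instances(fleet)  # only "i-aaa" qualifies
```

Treat the result as a list of candidates for review rather than an automatic termination list.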
Step 2: Schedule Non-Prod
Development, staging, QA, and demo environments are typically used 8 hours a day, 5 days a week: 40 hours out of 168, or 24% of the week. Running them 24/7 means 76% of the hours paid for go unused.
Solutions:
- Auto-shutdown. AWS Instance Scheduler, Azure DevTest Labs auto-shutdown, GCP scheduled instance group resizing. Tag-driven: `schedule=weekday-9-7` shuts down outside hours.
- Spin-up on demand. Ephemeral preview environments per PR; tear down on merge. Qovery, Shipyard, GitHub Codespaces and similar make this practical.
- Spot/preemptible for non-prod. Acceptable if interruption is recoverable.
Expected savings on non-prod: 60-70%. Often the largest single FinOps win.
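The 40-out-of-168 arithmetic, and the decision a scheduler makes at any given moment, can be sketched as below. The `WeekdaySchedule` class is hypothetical and does not model any specific scheduler's tag format; it assumes a simple weekday window in local time.

```python
from dataclasses import dataclass
from datetime import datetime

HOURS_PER_WEEK = 24 * 7  # 168


@dataclass(frozen=True)
class WeekdaySchedule:
    """Hypothetical Monday-to-Friday running window, in local time."""
    start_hour: int  # inclusive
    end_hour: int    # exclusive

    def is_active(self, when: datetime) -> bool:
        # weekday(): Monday is 0, Sunday is 6
        return when.weekday() < 5 and self.start_hour <= when.hour < self.end_hour

    def weekly_fraction(self) -> float:
        """Share of the week the environment is running."""
        return (self.end_hour - self.start_hour) * 5 / HOURS_PER_WEEK


office_hours = WeekdaySchedule(start_hour=9, end_hour=17)  # 8 hours a day
fraction_on = office_hours.weekly_fraction()   # 40 / 168, about 24%
fraction_off = 1 - fraction_on                 # about 76%, the ceiling on savings
```

The 76% is an upper bound on what scheduling can recover; the 60-70% expectation quoted above sits below that ceiling.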
Step 3: Rightsize the Survivors
For workloads that genuinely run continuously, match instance size to observed usage.
The methodology
- Capture 2-4 weeks of CloudWatch / Azure Monitor / Cloud Monitoring metrics: CPU, memory, network, disk IO, IOPS.
- Compute P95 of each metric.
- Pick the smallest instance type whose limits exceed P95 by a target buffer (commonly 25-40%).
- Check instance family — newer generations are usually cheaper and faster.
- Consider Graviton (AWS), Ampere (Azure / Oracle), Tau (GCP) ARM instances for 20-40% additional savings.
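The first three steps of the methodology can be sketched as follows. This is one possible reading, not a definitive implementation: it uses a nearest-rank P95, takes "smallest" to mean the cheapest type that fits, and works against a made-up catalogue whose names, sizes, and prices are placeholders rather than real offerings.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


def p95(samples: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of the samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


@dataclass(frozen=True)
class InstanceType:
    """Hypothetical catalogue entry; values are illustrative, not real pricing."""
    name: str
    vcpus: float
    memory_gib: float
    hourly_usd: float


def pick_instance(cpu_used: Sequence[float], mem_used_gib: Sequence[float],
                  catalogue: Sequence[InstanceType],
                  buffer: float = 0.30) -> Optional[InstanceType]:
    """Cheapest type whose vCPU and memory both exceed P95 plus the buffer."""
    need_cpu = p95(cpu_used) * (1 + buffer)
    need_mem = p95(mem_used_gib) * (1 + buffer)
    fits = [t for t in catalogue if t.vcpus >= need_cpu and t.memory_gib >= need_mem]
    return min(fits, key=lambda t: t.hourly_usd, default=None)


catalogue = [
    InstanceType("small", 2, 4, 0.05),
    InstanceType("medium", 2, 8, 0.10),
    InstanceType("large", 4, 16, 0.20),
]
# 100 samples each: mostly steady, with a few short bursts above P95.
cpu_used = [1.0] * 95 + [3.5] * 5        # vCPUs in use
mem_used_gib = [5.0] * 95 + [7.0] * 5    # GiB in use
choice = pick_instance(cpu_used, mem_used_gib, catalogue)  # picks "medium"
```

Note that the bursts above P95 are deliberately ignored here, so a workload that must absorb them needs a larger buffer or a different percentile.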
Recommendation engines
Use them but verify:
- AWS Compute Optimizer
- Azure Advisor cost recommendations
- GCP Recommender
- Third-party: Vantage, ProsperOps, Cloudability, Kubecost (Kubernetes)
They are accurate for steady-state workloads. They miss burst patterns, batch jobs, periodic load tests, and warm pools for failover. Always overlay product-team knowledge.
Don't forget memory and IO
Many recommendations focus on CPU. A right-CPU but wrong-memory choice still wastes money. R instance families exist because some workloads are memory-bound; M-family is general-purpose; C-family is CPU-bound. Pick the family that matches the dominant resource.
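One simple way to find the dominant resource is the ratio of memory to vCPU the workload needs. The sketch below assumes the usual AWS ratios of roughly 2, 4, and 8 GiB per vCPU for the C, M, and R families; treat those numbers, and the `suggest_family` helper itself, as illustrative and check them against the current instance catalogue.

```python
# Approximate GiB of memory per vCPU, by family (assumed typical values).
FAMILY_GIB_PER_VCPU = [("C", 2.0), ("M", 4.0), ("R", 8.0)]


def suggest_family(need_vcpus: float, need_memory_gib: float) -> str:
    """Pick the leanest family whose memory-per-vCPU ratio covers the workload."""
    if need_vcpus <= 0:
        raise ValueError("need_vcpus must be positive")
    ratio = need_memory_gib / need_vcpus
    for family, gib_per_vcpu in FAMILY_GIB_PER_VCPU:
        if ratio <= gib_per_vcpu:
            return family
    # Above 8 GiB per vCPU the memory-optimised family is still the closest fit here.
    return "R"


cpu_bound = suggest_family(need_vcpus=4, need_memory_gib=6)      # ratio 1.5
balanced = suggest_family(need_vcpus=2, need_memory_gib=8)       # ratio 4.0
memory_bound = suggest_family(need_vcpus=2, need_memory_gib=12)  # ratio 6.0
```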
RDS, ElastiCache, Redshift
Same approach. Many database instances are sized for peak. Consider moving from provisioned IOPS to general-purpose storage, Aurora Serverless v2 for variable workloads, and a reader/writer split rather than upsizing the primary.
Step 4: Storage Class and Lifecycle
Storage tiers offer dramatic cost differences:
| S3 class | Approx $/GB-month | Access pattern |
|---|---|---|
| S3 Standard | $0.023 | Active |
| S3 Standard-IA | $0.0125 | Infrequent |
| S3 One Zone-IA | $0.01 | Infrequent, lower availability OK |
| S3 Glacier Instant Retrieval | $0.004 | Archived, occasional access |
| S3 Glacier Flexible Retrieval | $0.0036 | Archived, hours to retrieve |
| S3 Glacier Deep Archive | $0.00099 | Cold, 12-hour retrieve |
| S3 Intelligent-Tiering | tiered | Auto-moves; small monitoring fee |
Equivalent tiers exist on Azure (Hot/Cool/Cold/Archive) and GCP (Standard/Nearline/Coldline/Archive).
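To make the spread concrete, here is a small calculation using only the approximate per-GB prices from the table. It is a sketch of storage cost alone: retrieval charges, request fees, and minimum storage durations on the colder classes are left out, so real bills will differ.

```python
# Approximate $/GB-month, copied from the table above.
PRICE_PER_GB_MONTH = {
    "S3 Standard": 0.023,
    "S3 Standard-IA": 0.0125,
    "S3 One Zone-IA": 0.01,
    "S3 Glacier Instant Retrieval": 0.004,
    "S3 Glacier Flexible Retrieval": 0.0036,
    "S3 Glacier Deep Archive": 0.00099,
}


def monthly_storage_cost(gb: float, storage_class: str) -> float:
    """Storage-only monthly cost in USD for `gb` gigabytes in the given class."""
    return gb * PRICE_PER_GB_MONTH[storage_class]


data_gb = 10_000  # illustrative 10 TB data set
standard = monthly_storage_cost(data_gb, "S3 Standard")                 # about $230
infrequent = monthly_storage_cost(data_gb, "S3 Standard-IA")            # about $125
deep_archive = monthly_storage_cost(data_gb, "S3 Glacier Deep Archive")  # about $9.90
```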
- Enable lifecycle rules on every bucket: transition to IA after 30 days, Glacier after 90, Deep Archive after 365.
- Use S3 Intelligent-Tiering for unknown access patterns: almost always net savings, with a monitoring fee of only about $0.0025 per 1,000 objects per month.
- Set incomplete-multipart-upload abort rules — orphaned multi-part uploads accumulate invisibly.
- Delete old object versions if versioning is enabled.
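The rules above can be expressed as a single lifecycle configuration. The sketch below builds the dictionary in the shape that boto3's `put_bucket_lifecycle_configuration` accepts; the rule ID, the 7-day multipart abort window, and the 90-day noncurrent-version expiry are assumptions chosen for illustration, not values from this lesson.

```python
def lifecycle_configuration(abort_multipart_days: int = 7,
                            noncurrent_days: int = 90) -> dict:
    """Build an S3 lifecycle configuration: tier down over time and clean up debris."""
    return {
        "Rules": [
            {
                "ID": "tier-and-clean",          # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},        # empty prefix: applies to the whole bucket
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},        # Glacier Flexible Retrieval
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
                ],
                # Abort multipart uploads that were started but never completed.
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": abort_multipart_days},
                # Only has an effect on buckets with versioning enabled.
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
            }
        ]
    }


config = lifecycle_configuration()

# Applying it (not executed here; requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-bucket",               # placeholder bucket name
#       LifecycleConfiguration=config,
#   )
```

Building the configuration as plain data makes it easy to review and to reuse across buckets before anything is applied.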
Step 5: Modernise the Architecture
Bigger gains come from changing what you run, not how big it is.
- Spot / preemptible instances — up to 90% off for interruptible workloads (batch, CI, stateless workers, fault-tolerant training). Karpenter and Cluster Autoscaler make this routine on Kubernetes.
- Serverless — pay per request. For low-volume or bursty endpoints often cheaper than always-on containers. Lambda, Cloud Run, Container Apps.
- Managed services over self-hosted — DynamoDB vs Cassandra on EC2; managed Redis vs self-managed. Usually higher unit cost but lower total cost when ops, patching, and on-call are included.
- Consolidation — multi-tenant where it fits; one well-utilised cluster beats five half-empty ones.
- Region choice — same workload can be 20% cheaper in one region than another; especially relevant for batch / training.
Step 6: Continuous Rightsizing
Rightsizing once is a campaign; rightsizing continuously is operating discipline:
- Monthly review of top 50 spenders.
- Automated reports of P95 utilisation per resource, flagging anything below 30%.
- Kubernetes Vertical Pod Autoscaler in recommendation mode; HPA on the right metric.
- Storage lifecycle rules in place by default in your IaC modules.
- Spot adoption tracked as a percentage of compute hours.
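Two of these checks reduce to a few lines each. The sketch below assumes utilisation samples have already been exported per resource as percentages; the resource names and hour counts are made up, and the nearest-rank P95 is one reasonable definition among several.

```python
import math
from typing import Dict, List, Sequence


def p95(samples: Sequence[float]) -> float:
    """Nearest-rank 95th percentile; assumes a non-empty sequence."""
    ordered = sorted(samples)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


def underutilised(utilisation_pct: Dict[str, Sequence[float]],
                  threshold_pct: float = 30.0) -> List[str]:
    """Names of resources whose P95 utilisation is below the threshold."""
    return sorted(
        name for name, samples in utilisation_pct.items()
        if samples and p95(samples) < threshold_pct
    )


def spot_adoption_pct(spot_hours: float, total_hours: float) -> float:
    """Spot compute hours as a percentage of all compute hours."""
    if total_hours <= 0:
        return 0.0
    return 100.0 * spot_hours / total_hours


# Illustrative inputs only.
report = underutilised({
    "api": [55, 60, 72, 80],
    "batch-old": [5, 8, 12, 20],
})                                            # flags "batch-old"
adoption = spot_adoption_pct(250, 1000)       # 25.0
```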
What to Expect
| Optimisation | Typical savings | Effort |
|---|---|---|
| Idle resource cleanup | 5-15% | Low |
| Non-prod scheduling | 10-20% | Medium |
| Storage tiering | 5-10% | Low |
| Rightsizing | 10-20% | Medium |
| Spot adoption | 10-25% | Medium-High |
| ARM / Graviton | 10-20% of compute | Medium (compat testing) |
| Architecture modernisation | varies, often largest | High |
These compound. Combined, a first-year FinOps programme commonly delivers a 25-40% reduction on like-for-like workloads, and the gains keep growing as new workloads land in the optimised pattern by default.
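Compounding here means the savings multiply rather than add. A minimal sketch, under the simplifying assumption that each percentage applies to whatever bill remains after the previous lever (in practice some levers touch only part of the spend, so this is an approximation); the three input fractions are hypothetical.

```python
import math
from typing import Iterable


def combined_saving(fractions: Iterable[float]) -> float:
    """Overall saving when each fraction is applied to the remaining bill in turn."""
    remaining = math.prod(1 - f for f in fractions)
    return 1 - remaining


levers = [0.10, 0.15, 0.10]         # three hypothetical levers
total = combined_saving(levers)     # about 0.31, versus 0.35 if simply added
```

The combined figure is always a little less than the sum of the individual percentages, which is worth remembering when reading the table.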
One major rate-based lever remains: commitments. That is the next lesson.