A real cost audit uncovered idle load balancers, oversized RDS instances, and forgotten snapshots. Here's what we found and how we fixed each one.
After our AWS bill crossed $18,000/month for a 15-person startup, we did a proper audit. We found $6,200 in monthly waste, about a third of the bill. Here's every item.
Three ALBs were still running from decommissioned staging environments. Each costs ~$16/month base plus LCU charges.
Fix: We added a Terraform check that requires every ALB to carry two tags: the owning team and a TTL. A weekly Lambda deletes any ALB that is past its TTL and has zero healthy targets.
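The deletion rule is simple enough to sketch. This is a minimal illustration, not the Lambda we run: the `ttl` tag name and its ISO 8601 format are assumptions, and the healthy-target count is assumed to have been fetched already (for example from the ELBv2 target health API).

```python
from datetime import datetime, timezone


def is_expired_and_idle(tags, healthy_targets, now):
    """Return True when a load balancer is past its TTL tag and has no healthy targets."""
    ttl = tags.get("ttl")  # hypothetical tag name holding an ISO 8601 date
    if ttl is None:
        return False  # untagged resources are left for manual review
    expires_at = datetime.fromisoformat(ttl)
    if expires_at.tzinfo is None:
        # Treat date-only or naive values as UTC so the comparison is well defined.
        expires_at = expires_at.replace(tzinfo=timezone.utc)
    return now > expires_at and healthy_targets == 0
```

Requiring both conditions is deliberate: an expired TTL on a load balancer that still serves healthy targets is a tagging problem, not a cleanup candidate.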
Our production database was on db.r6g.2xlarge. CloudWatch showed average CPU at 12% and memory at 35%.
Fix: Downsized to db.r6g.large during a maintenance window. We set up a CloudWatch alarm for CPU > 70% so we'll know when to scale back up.
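Before downsizing, it helps to project current utilization onto the smaller instance. The sketch below assumes load stays the same and scales linearly with capacity, which is a simplification. The instance specs are the published ones: db.r6g.2xlarge has 8 vCPUs and 64 GiB, db.r6g.large has 2 vCPUs and 16 GiB.

```python
def projected_utilization(current_pct, current_capacity, target_capacity):
    """Scale a utilization percentage from the current capacity onto a different one."""
    return current_pct * current_capacity / target_capacity


# 12% average CPU on 8 vCPUs, projected onto 2 vCPUs.
cpu_after = projected_utilization(12, 8, 2)
# 35% memory on 64 GiB, projected onto 16 GiB.
mem_after = projected_utilization(35, 64, 16)

print(f"projected CPU: {cpu_after:.0f}%")
print(f"projected memory: {mem_after:.0f}%")
```

CPU lands at 48%, comfortably under the 70% alarm. The memory projection comes out above 100%, which is worth pausing on: much of a database's memory use is cache that shrinks with the instance, so treat that number as a prompt to watch freeable memory, swap, and cache hit ratio after the change rather than as a hard blocker.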
Fourteen EBS volumes were sitting with status "available", leftovers from terminated EC2 instances.
Fix: Scripted a check:
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query 'Volumes[*].{ID:VolumeId,Size:Size,Created:CreateTime}' \
  --output table
For any unattached volume older than 30 days, we take a final snapshot and then delete the volume.
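The age filter can be applied to the CLI's JSON output (`--output json` without the `--query` projection). The sketch below is one way to do it, not our exact script. One caveat: `CreateTime` is when the volume was created, not when it was detached, so this approximates "unused for 30 days".

```python
import json
from datetime import datetime, timedelta, timezone


def stale_volume_ids(describe_volumes_json, now, min_age_days=30):
    """Return IDs of unattached volumes created more than min_age_days ago."""
    volumes = json.loads(describe_volumes_json)["Volumes"]
    cutoff = now - timedelta(days=min_age_days)
    stale = []
    for volume in volumes:
        if volume["State"] != "available":
            continue  # still attached or in transition
        # Older CLI versions emit a trailing "Z"; normalize it for fromisoformat.
        created = datetime.fromisoformat(volume["CreateTime"].replace("Z", "+00:00"))
        if created < cutoff:
            stale.append(volume["VolumeId"])
    return stale
```

Each returned ID would then be snapshotted and, once the snapshot completes, deleted.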
We had 2,400 EBS snapshots going back 3 years. Most were from AMIs we no longer use.
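Fix: Implemented AWS Data Lifecycle Manager with a 90-day retention policy. Note that Data Lifecycle Manager only applies retention to snapshots it creates itself, so an existing backlog needs a one-off cleanup. Snapshots that back a registered AMI cannot be deleted until the AMI is deregistered.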
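For a backlog like this, the selection logic might look like the sketch below. It is an illustration under assumptions: the input dictionaries mirror the shape of boto3's `describe_snapshots` and `describe_images` responses, and the 90-day cutoff is borrowed from the retention policy above.

```python
from datetime import datetime, timedelta, timezone


def deletable_snapshots(snapshots, images, now, retention_days=90):
    """Return IDs of snapshots older than the retention window that no registered AMI uses."""
    in_use = {
        mapping["Ebs"]["SnapshotId"]
        for image in images
        for mapping in image.get("BlockDeviceMappings", [])
        if "SnapshotId" in mapping.get("Ebs", {})
    }
    cutoff = now - timedelta(days=retention_days)
    return [
        snap["SnapshotId"]
        for snap in snapshots
        if snap["SnapshotId"] not in in_use and snap["StartTime"] < cutoff
    ]
```

Snapshots still referenced by an AMI are skipped rather than forced, so the unused AMIs have to be deregistered first for their snapshots to show up here.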
Our NAT Gateway was processing 800 GB/month. Much of it was S3 traffic from private subnets.
Fix: Added a VPC Gateway Endpoint for S3. Gateway endpoints carry no charge, and this one cut NAT traffic by 60%.
resource "aws_vpc_endpoint" "s3" {
  vpc_id          = aws_vpc.main.id
  service_name    = "com.amazonaws.us-east-1.s3"
  route_table_ids = [aws_route_table.private.id]
}
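To put a number on the 60% reduction, here is a rough calculation of the data-processing charge avoided. The $0.045/GB rate is an assumption based on the us-east-1 list price for NAT Gateway data processing; check current pricing for your region.

```python
NAT_PROCESSING_USD_PER_GB = 0.045  # assumed us-east-1 list price, not from our bill


def nat_processing_savings(monthly_gb, fraction_removed, price_per_gb=NAT_PROCESSING_USD_PER_GB):
    """Estimate monthly NAT data-processing charges avoided by rerouting traffic."""
    return monthly_gb * fraction_removed * price_per_gb


savings = nat_processing_savings(800, 0.60)
print(f"${savings:.2f}/month")
```

At that rate, 480 GB of rerouted traffic works out to about $21.60 a month. The hourly NAT Gateway charge is unaffected, since the gateway still handles the remaining traffic.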
Every Lambda function was set to 1024 MB as a blanket default. AWS Lambda Power Tuning showed most needed only 256 MB.
Fix: Ran Power Tuning on our top 10 functions and right-sized them.
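Lambda compute is billed in GB-seconds, so memory and duration multiply. The sketch below shows why the measured duration at the lower setting matters. The invocation count and durations are made-up illustration values, and the price is an assumed x86 list price in us-east-1; request charges and the free tier are ignored.

```python
GB_SECOND_USD = 0.0000166667  # assumed x86 list price in us-east-1


def monthly_compute_cost(memory_mb, avg_duration_ms, invocations, price=GB_SECOND_USD):
    """Estimate monthly Lambda compute cost from memory size, duration, and call volume."""
    gb_seconds = (memory_mb / 1024) * (avg_duration_ms / 1000) * invocations
    return gb_seconds * price


# Hypothetical function: 5 million invocations a month, 200 ms at 1024 MB.
before = monthly_compute_cost(1024, 200, 5_000_000)
# Same duration at 256 MB: cost drops to a quarter.
after_same_speed = monthly_compute_cost(256, 200, 5_000_000)
# If the function slows to 320 ms at 256 MB, the saving is smaller.
after_slower = monthly_compute_cost(256, 320, 5_000_000)

print(f"{after_same_speed / before:.0%} of original cost at the same duration")
print(f"{after_slower / before:.0%} of original cost if duration rises to 320 ms")
```

Less memory also means less CPU, so duration often rises. That trade-off is what Power Tuning measures for each function instead of leaving it to guesswork.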
We were paying on-demand for 4 EC2 instances that had been running for 2 years.
Fix: Purchased 1-year, no-upfront Reserved Instances for these predictable workloads.
The $6,200/month we saved took about 8 hours of work. That's $74,400 a year in savings for one day of effort.