We run ~200 Lambda functions. Cold starts, memory tuning, and the cost-vs-latency trade-offs that actually move the bill.
We run ~200 Lambda functions across various pipelines, event handlers, and APIs. Lambda is great when it fits and frustrating when it doesn't. After a couple of years of tuning, this is the working playbook for the optimizations that actually moved our metrics — and the ones that turned out to be folklore.
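A Lambda invocation costs two things: a per-request charge, and a duration charge for the memory you allocate multiplied by how long the invocation runs (billed in GB-seconds).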
For most workloads, memory-time dominates. The request charge only matters at very high invocation rates (millions/day).
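To see why, here's a rough monthly estimator for a single function. It's a sketch that assumes x86 on-demand list prices in us-east-1 ($0.20 per million requests, $0.0000166667 per GB-second) and ignores the free tier. Check current pricing for your Region and architecture.

```python
# Assumed x86 on-demand list prices (us-east-1); verify against current AWS pricing.
PRICE_PER_REQUEST = 0.20 / 1_000_000
PRICE_PER_GB_SECOND = 0.0000166667


def monthly_cost(invocations, avg_duration_ms, memory_mb):
    """Return (request_charge, duration_charge) in USD for one function-month."""
    request_charge = invocations * PRICE_PER_REQUEST
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    duration_charge = gb_seconds * PRICE_PER_GB_SECOND
    return request_charge, duration_charge


# 10M invocations/month at 200 ms and 512 MB: the duration charge dominates.
req, dur = monthly_cost(10_000_000, 200, 512)
print(f"requests ${req:.2f}, duration ${dur:.2f}")  # requests $2.00, duration $16.67
```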
Lambda gives you a single knob: memory. CPU and network bandwidth scale linearly with memory. So a 1024MB Lambda has roughly 2x the CPU of a 512MB one.
Two patterns:
For CPU-bound work, more memory is often cheaper. Sounds backwards. Example: a function that does PDF rendering. At 512MB it took 8s; at 2048MB it took 1.5s. The GB-seconds: 0.5 GB × 8 s = 4 GB-s at 512MB, versus 2 GB × 1.5 s = 3 GB-s at 2048MB.
The 2GB version is 25% cheaper despite more memory, because it finishes much faster. Plus, the user-facing latency is 5x better.
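The general rule falls out of the same arithmetic. A bigger size pays off whenever its speedup beats the memory multiplier. Going from 512MB to 2048MB is a 4x multiplier, and the PDF renderer sped up about 5.3x. Here is a small sketch of that check. The helper names are hypothetical and exist only for illustration.

```python
def cost_ratio(mem_a_mb, dur_a_s, mem_b_mb, dur_b_s):
    """Cost of config B relative to config A; below 1.0 means B is cheaper.

    The MB-to-GB conversion cancels out, so raw MB values are fine here.
    """
    return (mem_b_mb * dur_b_s) / (mem_a_mb * dur_a_s)


def break_even_speedup(mem_a_mb, mem_b_mb):
    """Speedup B needs over A just to cost the same: the memory multiplier."""
    return mem_b_mb / mem_a_mb


ratio = cost_ratio(512, 8.0, 2048, 1.5)
print(f"ratio={ratio:.2f}, break-even speedup={break_even_speedup(512, 2048):.0f}x")
# ratio=0.75, break-even speedup=4x
```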
For I/O-bound work, less memory is often cheaper. A function that mostly waits on an external API doesn't get faster with more memory. When the bottleneck is network round-trip time, the 128MB version runs about as long as the 512MB version and costs roughly a quarter as much.
We use AWS Lambda Power Tuning (an open-source tool built on Step Functions) to find the optimal memory size for each function. It runs a function at multiple memory sizes and reports the cost-vs-latency curve. We re-run it quarterly for every function.
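Power Tuning does the measuring. The choice it informs looks roughly like the sketch below. The sweep numbers are hypothetical. The latency budget is one illustrative way to pick the cheapest size that still meets a target; it is not the tool's own selection logic.

```python
PRICE_PER_GB_SECOND = 0.0000166667  # assumed x86 list price


def pick_memory(measurements, max_latency_ms=None):
    """Return the cheapest memory size (MB) from {memory_mb: avg_duration_ms}.

    If max_latency_ms is given, sizes slower than that budget are skipped.
    Returns None if no size qualifies.
    """
    best_mb, best_cost = None, float("inf")
    for memory_mb, duration_ms in sorted(measurements.items()):
        if max_latency_ms is not None and duration_ms > max_latency_ms:
            continue
        cost = (memory_mb / 1024) * (duration_ms / 1000) * PRICE_PER_GB_SECOND
        if cost < best_cost:
            best_mb, best_cost = memory_mb, cost
    return best_mb


# Hypothetical sweep results for a CPU-bound function.
sweep = {512: 8000, 1024: 3900, 2048: 1500, 3008: 1200}
print(pick_memory(sweep))                       # 2048 (cheapest overall)
print(pick_memory(sweep, max_latency_ms=1300))  # 3008 (cheapest within 1.3 s)
```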
Result: ~$1,200/month savings from memory tuning across our fleet.
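A cold start happens when Lambda has to create a new execution environment. It takes time: Lambda downloads your code, starts the runtime, and runs your initialization code before the handler ever sees the event.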
For a Node.js function with 100MB of dependencies, cold start can be 2-5 seconds. For a small Go function, often < 200ms.
Things that help:
Smaller deployment package. Less code = less to load. We tree-shake and bundle (esbuild for Node, dataclass-only Python where possible). One function went from 80MB to 6MB, and its cold start dropped by 1.8s.
Provisioned concurrency for latency-sensitive functions. Pre-warmed environments. It costs ~$8.50/month per provisioned instance at the memory sizes we use (the charge scales with memory), but cold starts go to ~10ms. We use this for ~5 functions where p99 latency matters.
SnapStart (Java), if you're on Java 11 or later. Lambda snapshots the initialized JVM and restores from it, which drops cold starts from seconds to milliseconds. We have a few Java functions; SnapStart was a significant improvement.
Avoid heavy module-level work. Code that runs at the top level of the file runs once per cold start. Move expensive setup (DB connection pools, large config loads) to lazy initialization where possible.
What doesn't help much (from our testing):
Lambda is great for:
Lambda is bad for:
We've moved a few functions back to ECS when they outgrew Lambda's economics. The crossover for our shape of services is around 100 req/s sustained. Below that, Lambda is cheaper. Above, ECS wins.
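You can estimate the crossover for your own services. This sketch uses only hypothetical inputs. The ECS figure stands in for whatever your always-on capacity costs per month, and the Lambda prices are assumed x86 list prices.

```python
PRICE_PER_REQUEST = 0.20 / 1_000_000  # assumed
PRICE_PER_GB_SECOND = 0.0000166667     # assumed
SECONDS_PER_MONTH = 730 * 3600         # ~average month


def lambda_cost_per_request(avg_duration_ms, memory_mb):
    gb_s = (avg_duration_ms / 1000) * (memory_mb / 1024)
    return PRICE_PER_REQUEST + gb_s * PRICE_PER_GB_SECOND


def crossover_rps(ecs_monthly_usd, avg_duration_ms, memory_mb):
    """Sustained req/s at which Lambda's monthly bill equals a flat ECS bill."""
    per_request = lambda_cost_per_request(avg_duration_ms, memory_mb)
    return ecs_monthly_usd / (per_request * SECONDS_PER_MONTH)


# Hypothetical: 120 ms average at 1024 MB vs. $600/month of ECS capacity.
print(f"{crossover_rps(600, 120, 1024):.0f} req/s")  # 104 req/s
```

The model treats ECS as a flat monthly cost. That only holds until you need more tasks. It also ignores API Gateway or load balancer charges on either side.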
Specific cases that cost us money:
Lambda invoking Lambda invoking Lambda. Each layer of indirection multiplies cost. We had a chain where one event triggered three Lambdas in series, billed in full for each. Restructured to run the work in one Lambda. Saved $400/month.
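Chains multiply cost rather than add to it when each function invokes the next synchronously and waits for the reply. That wiring is an assumption; the text above doesn't say how our chain was built. In that setup, every caller is billed while it waits. The sketch compares billed duration for such a chain against doing the same work in one function. The stage durations are hypothetical.

```python
def billed_ms_sync_chain(stage_ms):
    """Each function is billed for its own work plus everything downstream it waits on."""
    return sum(sum(stage_ms[i:]) for i in range(len(stage_ms)))


def billed_ms_single(stage_ms):
    """Same work done in-process in one invocation."""
    return sum(stage_ms)


stages = [200, 500, 300]  # hypothetical per-stage work, in milliseconds
print(billed_ms_sync_chain(stages), billed_ms_single(stages))  # 2100 1000
```

On top of the extra duration, the chain also pays three request charges instead of one.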
API Gateway in front of low-traffic functions. API Gateway REST APIs cost ~$3.50 per million requests (HTTP APIs are cheaper). For a short-running Lambda doing 50 req/s, the API Gateway bill is bigger than the Lambda bill. For some internal APIs we use Lambda Function URLs instead (no extra charge, but none of API Gateway's features).
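Whether the front door costs more than the function comes down to simple arithmetic. This sketch assumes REST API pricing of $3.50 per million requests, x86 Lambda list prices, and a hypothetical short function (50 ms at 128 MB).

```python
SECONDS_PER_MONTH = 730 * 3600
APIGW_REST_PER_REQUEST = 3.50 / 1_000_000  # assumed REST API price
LAMBDA_PER_REQUEST = 0.20 / 1_000_000      # assumed
LAMBDA_PER_GB_SECOND = 0.0000166667        # assumed


def monthly_bills(rps, duration_ms, memory_mb):
    """Return (api_gateway_usd, lambda_usd) for a month of sustained traffic."""
    requests = rps * SECONDS_PER_MONTH
    gb_seconds = requests * (duration_ms / 1000) * (memory_mb / 1024)
    lam = requests * LAMBDA_PER_REQUEST + gb_seconds * LAMBDA_PER_GB_SECOND
    apigw = requests * APIGW_REST_PER_REQUEST
    return round(apigw, 2), round(lam, 2)


print(monthly_bills(50, 50, 128))  # (API Gateway USD, Lambda USD)
```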
Forgotten Lambdas left running. Old experiments, deprecated features. We have a quarterly review: any Lambda with < 100 invocations in 30 days gets its owner asked, "still needed?" About 20% get deleted each round.
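Each review can start from a report like the one below. This sketch assumes boto3 and sums the standard `AWS/Lambda` `Invocations` metric over the last 30 days. The clients are passed in so the logic can be tried without AWS. Usage would be `review_candidates(boto3.client("lambda"), boto3.client("cloudwatch"))`. It makes one CloudWatch call per function, which is fine for a fleet of a few hundred.

```python
from datetime import datetime, timedelta, timezone


def invocations_last_30_days(cloudwatch, function_name, now=None):
    """Sum the AWS/Lambda Invocations metric for one function over 30 days."""
    now = now or datetime.now(timezone.utc)
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName="Invocations",
        Dimensions=[{"Name": "FunctionName", "Value": function_name}],
        StartTime=now - timedelta(days=30),
        EndTime=now,
        Period=86400,  # one datapoint per day
        Statistics=["Sum"],
    )
    # A function with no invocations returns no datapoints, which sums to 0.
    return sum(dp["Sum"] for dp in resp["Datapoints"])


def review_candidates(lambda_client, cloudwatch, threshold=100):
    """List (function_name, invocations) for functions below the threshold."""
    candidates = []
    for page in lambda_client.get_paginator("list_functions").paginate():
        for fn in page["Functions"]:
            count = invocations_last_30_days(cloudwatch, fn["FunctionName"])
            if count < threshold:
                candidates.append((fn["FunctionName"], int(count)))
    return candidates
```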
SQS triggers without batching. With BatchSize set to 1, each SQS message triggers its own Lambda invocation. For high-volume queues, raising the BatchSize on the trigger (the default is 10) lets each invocation handle 10 messages and cuts invocation count by 10x.
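Once a handler receives batches, one bad message shouldn't force the whole batch to retry. This is a sketch of an SQS batch handler that reports partial failures. It assumes `ReportBatchItemFailures` is enabled on the event source mapping. `process_message` is a hypothetical stand-in for your per-message work.

```python
import json


def process_message(body):
    """Hypothetical per-message work; raises on bad input."""
    payload = json.loads(body)
    if "order_id" not in payload:
        raise ValueError("missing order_id")


def handler(event, context):
    failures = []
    for record in event["Records"]:
        try:
            process_message(record["body"])
        except Exception:
            # Only this message is retried; Lambda deletes the rest of the batch.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```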
CloudWatch Logs retention. The default retention is "Never expire." Some Lambdas had years of logs piling up at $0.03/GB-month in storage, on top of the $0.50/GB paid once at ingestion. We set 30-day retention by policy. That single change saved ~$200/month.
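Applying that policy across existing log groups takes a short script. This sketch assumes boto3 and Lambda's default `/aws/lambda/` log group prefix. It tightens groups that never expire or that keep logs longer than the target. Shorter ones, such as 7-day noisy groups, are left alone. Usage: `enforce_log_retention(boto3.client("logs"))`.

```python
def enforce_log_retention(logs_client, days=30, prefix="/aws/lambda/"):
    """Cap retention on Lambda log groups; return the names that were updated."""
    updated = []
    paginator = logs_client.get_paginator("describe_log_groups")
    for page in paginator.paginate(logGroupNamePrefix=prefix):
        for group in page["logGroups"]:
            # Groups that never expire have no retentionInDays key at all.
            current = group.get("retentionInDays")
            if current is None or current > days:
                logs_client.put_retention_policy(
                    logGroupName=group["logGroupName"], retentionInDays=days
                )
                updated.append(group["logGroupName"])
    return updated
```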
Lambda has an account-wide, per-Region concurrency limit (1,000 concurrent executions by default). When you hit it, Lambdas get throttled. Synchronous invocations fail with "TooManyRequestsException" (HTTP 429), while asynchronous ones are retried.
For burst-prone workloads, this matters. Mitigations:
We ran into this once with a fanout that fired 5,000 Lambdas in one second. Half failed. We added reserved concurrency on the source function and SQS for the downstream.
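Here is roughly what routing the fanout through SQS, instead of invoking functions directly, looks like. The producer enqueues in batches of 10, the `SendMessageBatch` limit. The downstream function's reserved concurrency, or the SQS trigger's maximum-concurrency setting, then caps how fast the queue drains. This sketch assumes boto3; the queue URL and item shape are hypothetical.

```python
import json


def enqueue_fanout(sqs_client, queue_url, items):
    """Send work items to SQS in batches of 10; return items that failed to enqueue."""
    failed = []
    for start in range(0, len(items), 10):
        chunk = items[start:start + 10]
        entries = [
            {"Id": str(i), "MessageBody": json.dumps(item)}  # Ids unique per batch
            for i, item in enumerate(chunk)
        ]
        resp = sqs_client.send_message_batch(QueueUrl=queue_url, Entries=entries)
        # Partial failures come back in "Failed"; collect them for a retry pass.
        for f in resp.get("Failed", []):
            failed.append(chunk[int(f["Id"])])
    return failed
```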
Lambda's built-in observability is minimal — CloudWatch Logs and a few metrics. For real visibility:
Structured logging. Every Lambda emits JSON logs with consistent fields (function, request_id, duration, status). We use the AWS Lambda Powertools libraries (Python, TypeScript, etc.) which do this well.
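Powertools' Logger handles this for us. For illustration, here is a stdlib-only sketch of the same idea: it emits one JSON line per invocation with the fields listed above. `context.function_name` and `context.aws_request_id` are standard attributes of the Lambda context object. The handler's work is a placeholder.

```python
import json
import time


def log_event(context, status, started_at, **extra):
    """Emit one structured JSON log line and return it."""
    record = {
        "function": context.function_name,
        "request_id": context.aws_request_id,
        "duration_ms": round((time.monotonic() - started_at) * 1000, 1),
        "status": status,
        **extra,
    }
    line = json.dumps(record)
    print(line)  # Lambda forwards stdout to CloudWatch Logs
    return line


def handler(event, context):
    started = time.monotonic()
    try:
        result = {"ok": True}  # placeholder for the real work
        log_event(context, "success", started)
        return result
    except Exception as exc:
        log_event(context, "error", started, error=str(exc))
        raise
```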
X-Ray tracing. Optional. We turn it on for functions in critical paths. Adds visibility into what's slow within an invocation. Cost is real — X-Ray traces aren't free — so we don't enable it by default.
Metric filters on logs. Custom metrics derived from log patterns. E.g., "count of error logs containing 'database timeout'" becomes a metric we can alert on.
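Here is a sketch of that metric filter using boto3's `put_metric_filter`. The log group follows Lambda's default `/aws/lambda/<function>` naming. The filter name, metric name, and `Custom/Lambda` namespace are hypothetical. The quoted pattern matches log lines containing that exact phrase.

```python
def add_timeout_metric(logs_client, function_name):
    """Count log lines containing 'database timeout' as a custom CloudWatch metric."""
    return logs_client.put_metric_filter(
        logGroupName=f"/aws/lambda/{function_name}",
        filterName="database-timeouts",           # hypothetical name
        filterPattern='"database timeout"',       # exact-phrase match
        metricTransformations=[
            {
                "metricName": "DatabaseTimeouts",     # hypothetical name
                "metricNamespace": "Custom/Lambda",   # hypothetical namespace
                "metricValue": "1",                   # each matching line counts once
                "defaultValue": 0,                    # report 0 when nothing matches
            }
        ],
    )
```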
Lambda Insights (an AWS feature). It adds enhanced metrics, including CPU time, memory utilization, and network throughput. We enable it on Lambdas where we suspect resource contention.
Init outside the handler. Heavy setup (DB clients, SDK clients) goes outside the handler so it's reused across invocations within the same execution environment.
```python
import boto3

# Top of file: runs once per cold start
db_client = boto3.client("dynamodb")

def handler(event, context):
    # Reuses db_client across warm invocations (TableName/Key arguments omitted)
    return db_client.get_item(...)
```
Connection pooling carefully. Each Lambda execution environment is independent. A connection pool inside the Lambda doesn't share across instances; if you have 100 concurrent Lambdas each with 10 connections, that's 1000 connections to your database. RDS Proxy helps mitigate this for relational databases.
Idempotency keys. Lambda might invoke your handler twice for the same event (rarely, but it happens — SQS at-least-once delivery, retries on failure). We pass an idempotency key (typically the event ID) and use it to dedupe at the destination (DynamoDB conditional write, etc.).
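Here is a minimal sketch of the conditional-write dedupe. It assumes a DynamoDB table, accessed as a boto3 `Table` resource, keyed on a hypothetical `pk` attribute that holds the event ID. The first delivery claims the key. A duplicate hits `ConditionalCheckFailedException` and is skipped.

```python
def process_once(table, event_id, do_work):
    """Run do_work at most once per event_id using a DynamoDB conditional write.

    `table` is assumed to be a boto3 DynamoDB Table resource whose partition
    key is the hypothetical attribute "pk". Returns False for duplicates.
    """
    try:
        table.put_item(
            Item={"pk": event_id, "status": "claimed"},
            ConditionExpression="attribute_not_exists(pk)",
        )
    except Exception as exc:
        # botocore's ClientError carries the service error code in exc.response.
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code == "ConditionalCheckFailedException":
            return False  # duplicate delivery: already handled
        raise
    try:
        do_work()
    except Exception:
        # Release the claim so a later retry can run the work again.
        table.delete_item(Key={"pk": event_id})
        raise
    return True
```

Releasing the claim on failure keeps retries working. It doesn't cover a duplicate that arrives while the first attempt is still running. Powertools' idempotency utility tracks in-progress and expired records for that case.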
Graceful failure handling. Failed Lambda invocations go to dead-letter queues (or destinations, the newer feature). DLQs collect the failures so we can investigate without losing data.
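For asynchronous invocations, setting an on-failure destination is a single API call. Here is a sketch assuming boto3; the function name and target ARN are placeholders you'd supply. The target can be an SQS queue, SNS topic, Lambda function, or EventBridge bus. SQS-triggered functions don't use this setting. Their failed messages go to the source queue's own dead-letter queue via its redrive policy.

```python
def route_async_failures(lambda_client, function_name, failure_target_arn):
    """Send async events that exhaust their retries to an on-failure destination."""
    return lambda_client.put_function_event_invoke_config(
        FunctionName=function_name,
        MaximumRetryAttempts=2,         # the async maximum
        MaximumEventAgeInSeconds=3600,  # events older than an hour stop being retried
        DestinationConfig={"OnFailure": {"Destination": failure_target_arn}},
    )
```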
Run Power Tuning on every function. The memory sweet spot isn't intuitive. Tune empirically.
Watch for "Lambda invoking Lambda" patterns. They multiply cost and latency. Restructure to do the work in fewer invocations.
Set CloudWatch Logs retention by policy. 30 days for most, 7 days for noisy ones. Don't pay to store logs you'll never read.
Track cost per function, not just total Lambda cost. Tag every function with team and feature; the AWS bill becomes much more actionable.
Lambda's not always the answer. When traffic outgrows Lambda economics or latency requirements, move to ECS/EKS. The crossover is real.
Lambda done well is one of AWS's best products. Lambda done badly is a cost surprise waiting to happen. The difference is whether you spent a couple of hours tuning each function or just deployed defaults. The tuning pays for itself quickly.