We run both ECS and EKS in production. Here is which one we use for what, and the actual decision criteria rather than the marketing comparison.
We have services running on both ECS and EKS. The two platforms can both do the basic job: run containerized services with rolling deploys, autoscaling, and load balancing. They differ in operational shape, ecosystem, and where the rough edges show up. This is the comparison from someone who's run both for a few years.
ECS is the AWS-native container platform. Things it does well:
Tight AWS integration. ECS task roles are IAM roles attached to tasks. CloudWatch Logs come for free. Service discovery via Cloud Map is built in. ALB integration is one config block.
Lower operational burden. No control plane to manage. No upgrades to schedule. ECS just exists; you deploy services to it.
Fargate. Run tasks without managing nodes. You pay per-task CPU/memory. We use this for bursty workloads where node management would be overhead.
Predictable cost. Per-task pricing on Fargate, plain EC2 instance pricing on the EC2 launch type. There is no separate control plane fee; on Fargate the platform cost is bundled into the task price. Easy to model and explain to finance.
The flip side:
The ecosystem is small. Outside AWS, ECS-specific tools barely exist. There's no "ECS operator" community, no helm-charts equivalent, no rich set of CRDs.
Configuration is verbose. Task definitions are JSON with lots of repeated boilerplate. We use Terraform modules to abstract this; the underlying definitions are still verbose.
Service-to-service communication is awkward. Service discovery via Cloud Map works but requires DNS-based resolution. There's no native concept of a service mesh or mTLS without bolt-on tools.
Limited scheduler flexibility. ECS's scheduler is OK but not as sophisticated as Kubernetes'. The closest equivalent to pod affinity/anti-affinity is task placement strategies (binpack, spread, random) plus placement constraints, which are coarser.
Vendor lock-in. ECS task definitions don't translate to anything else. If you wanted to move off AWS, you'd be rewriting deployment configs.
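To make the verbosity point concrete, here is a minimal sketch of the kind of wrapper that hides task-definition boilerplate. We do this with Terraform modules; the Python version below only illustrates the shape of the abstraction. The function name `fargate_task_definition`, its defaults, and the log group naming are hypothetical. The field names follow the ECS task definition schema.

```python
import json


def fargate_task_definition(name, image, port, cpu=256, memory=512,
                            region="us-east-1", execution_role_arn=None):
    """Build a Fargate task definition dict from a handful of inputs.

    Hypothetical helper: the defaults and the "/ecs/<name>" log group
    convention are assumptions for illustration.
    """
    container = {
        "name": name,
        "image": image,
        "essential": True,
        "portMappings": [{"containerPort": port, "protocol": "tcp"}],
        # This logging block is the kind of thing repeated in every service.
        "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
                "awslogs-group": f"/ecs/{name}",
                "awslogs-region": region,
                "awslogs-stream-prefix": name,
            },
        },
    }
    task_def = {
        "family": name,
        "networkMode": "awsvpc",  # required for Fargate tasks
        "requiresCompatibilities": ["FARGATE"],
        "cpu": str(cpu),          # task-level sizes are strings in the API
        "memory": str(memory),
        "containerDefinitions": [container],
    }
    if execution_role_arn:
        task_def["executionRoleArn"] = execution_role_arn
    return task_def
```

Calling `json.dumps(fargate_task_definition("orders-api", "example/orders-api:1.0", 8080), indent=2)` turns three arguments into roughly thirty lines of JSON, which is the ratio that makes a wrapper worth having.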
EKS is AWS's managed Kubernetes:
The ecosystem is huge. Helm charts, operators, CRDs for everything. New tools target Kubernetes first; ECS support is an afterthought or absent.
Portable. Workloads on EKS run on GKE / AKS / on-prem K8s with minor modifications. Real exit ramp.
Mature scheduling. Pod affinity, taints, tolerations, topology constraints. Can express complex placement requirements.
Service mesh, GitOps, observability all have rich tools that work across K8s clusters.
The K8s API itself. Once you know it, it's a powerful abstraction across many clouds and on-prem.
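As an illustration of the scheduling point, the sketch below builds a Deployment manifest that forces replicas onto distinct nodes and asks the scheduler to spread them across zones. The function name `spread_deployment` and the `app` label convention are assumptions; the manifest fields (`podAntiAffinity`, `topologySpreadConstraints`, and the well-known topology keys) are standard Kubernetes.

```python
import json


def spread_deployment(name, image, replicas=3):
    """Deployment manifest: one replica per node, spread evenly over zones.

    Hypothetical helper for illustration; a real manifest would also carry
    resources, probes, and so on.
    """
    labels = {"app": name}
    selector = {"matchLabels": labels}
    pod_spec = {
        "containers": [{"name": name, "image": image}],
        "affinity": {
            "podAntiAffinity": {
                # Hard rule: never co-locate two replicas on the same node.
                "requiredDuringSchedulingIgnoredDuringExecution": [
                    {"labelSelector": selector,
                     "topologyKey": "kubernetes.io/hostname"}
                ]
            }
        },
        "topologySpreadConstraints": [
            {
                # Soft rule: keep the per-zone replica count within 1.
                "maxSkew": 1,
                "topologyKey": "topology.kubernetes.io/zone",
                "whenUnsatisfiable": "ScheduleAnyway",
                "labelSelector": selector,
            }
        ],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": selector,
            "template": {"metadata": {"labels": labels}, "spec": pod_spec},
        },
    }
```

`kubectl apply -f` accepts JSON as well as YAML, so the output of `json.dumps(spread_deployment(...))` can be applied directly. On ECS the nearest expression of the same intent is a spread placement strategy, which cannot combine a hard per-node rule with a soft per-zone one in this way.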
The cost:
Operational complexity. Even with a managed control plane, you operate worker nodes, networking (CNI tuning, load balancers, ingress), storage classes, RBAC, namespaces, etc. There's a lot to know.
Upgrade cadence. Kubernetes ships a new minor version roughly every 4 months. AWS ends standard support for old versions after ~14 months. We spend ~1 quarter per year on EKS / addon upgrades.
Resource overhead. Kubelet, kube-proxy, CNI agents, monitoring agents all run on every node. ~20-30% of node resources are platform overhead. ECS has less.
Cost is harder to attribute. With multi-tenant pods on shared nodes, "what does this service cost" requires kubecost or similar tooling. ECS Fargate's per-task pricing is simpler.
Failure modes are deeper. When something is wrong with K8s, the fault could be in the API server, etcd, the kubelet, the scheduler, a controller, the CNI, your CRD, your operator, or your pod. Many places to look.
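The cost attribution problem is easier to see with a small sketch. The function below splits one node's hourly price across pods in proportion to their CPU and memory requests and reports whatever nobody requested as idle. This is a simplification of what tools like kubecost do, not their actual algorithm; the function name, the 50/50 CPU/memory price weighting, and the example price are all assumptions.

```python
def attribute_node_cost(node_hourly_cost, node_cpu_m, node_mem_mib, pods,
                        cpu_weight=0.5):
    """Split a node's hourly cost across pods by resource requests.

    pods: dict of pod name -> (cpu request in millicores, memory request in MiB)
    cpu_weight: assumed share of the node price attributed to CPU;
                the remainder is attributed to memory.
    Unrequested capacity is reported under the "_idle" key.
    """
    cpu_price = node_hourly_cost * cpu_weight / node_cpu_m
    mem_price = node_hourly_cost * (1 - cpu_weight) / node_mem_mib
    costs = {
        name: cpu * cpu_price + mem * mem_price
        for name, (cpu, mem) in pods.items()
    }
    costs["_idle"] = node_hourly_cost - sum(costs.values())
    return costs


# Hypothetical node: 4 vCPU / 16 GiB at a made-up $0.20 per hour.
example = attribute_node_cost(
    0.20, 4000, 16384,
    {"api": (1000, 4096), "worker": (2000, 4096)},
)
```

Even in this toy version, someone has to decide who pays for the `_idle` share and for the per-node agents. With Fargate's per-task pricing that question does not come up.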
Our actual split:
On ECS Fargate:
On EKS:
The split is roughly: EKS for the "platform" workloads where we want consistency across many services and rich tooling; ECS Fargate for the workloads where simplicity wins.
When a new service comes up, the decision tree:
The decision tree handles ~95% of cases. The remaining 5% are judgment calls.
A common claim: "ECS Fargate is more expensive than EKS." It's complicated.
For a single service running 24/7 at known capacity:
For bursty / scale-to-zero workloads:
For a fleet of 50+ services with varying load:
Our calculation: at our scale (~600 pods on EKS, ~80 ECS tasks), the workloads we keep on EKS cost ~30% less there than they would on Fargate, while the bursty/intermittent workloads are ~5x cheaper on ECS Fargate than they would be on EKS. We use both for different reasons.
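A rough way to reason about where the crossover sits is to compare paying for a Fargate task only while it runs against reserving a slice of an always-on node. The sketch below does that arithmetic. Every price in it is a placeholder rather than an AWS list price, and the helper names and the `utilization` parameter (the fraction of node capacity actually sold to workloads) are assumptions.

```python
HOURS_PER_MONTH = 730


def fargate_monthly(vcpu, mem_gb, hours_running, vcpu_hour_price, gb_hour_price):
    """Fargate bills per vCPU-hour and GB-hour, only while the task runs."""
    return hours_running * (vcpu * vcpu_hour_price + mem_gb * gb_hour_price)


def node_share_monthly(vcpu, mem_gb, node_vcpu, node_mem_gb, node_hour_price,
                       utilization=0.8):
    """Monthly cost of a slice of an always-on node.

    The slice is sized by the workload's dominant resource, and the unsold
    part of the node (overhead plus idle) is spread over what is sold.
    """
    share = max(vcpu / node_vcpu, mem_gb / node_mem_gb)
    return HOURS_PER_MONTH * node_hour_price * share / utilization


def break_even_hours(vcpu, mem_gb, vcpu_hour_price, gb_hour_price, node_cost):
    """Monthly running hours below which Fargate is the cheaper option."""
    hourly = vcpu * vcpu_hour_price + mem_gb * gb_hour_price
    return node_cost / hourly


# Placeholder inputs: a 1 vCPU / 4 GB task, a 4 vCPU / 16 GB node.
node_cost = node_share_monthly(1, 4, 4, 16, node_hour_price=0.12)
always_on = fargate_monthly(1, 4, HOURS_PER_MONTH, 0.04, 0.004)
nightly_job = fargate_monthly(1, 4, 60, 0.04, 0.004)  # ~2 hours per day
crossover = break_even_hours(1, 4, 0.04, 0.004, node_cost)
```

With these made-up inputs the always-on task is cheaper on the node and the 60-hour job is several times cheaper on Fargate. The exact ratio depends entirely on the prices, discounts, and utilization you plug in, so treat the shape of the result as the takeaway, not the numbers.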
Going from ECS to EKS:
Effort: ~1-2 days per service for the conversion, longer if the service has unusual patterns.
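Much of the mechanical part of an ECS to EKS conversion is field mapping. The sketch below maps one ECS container definition to a Kubernetes container spec, as an illustration of that mapping rather than the tooling we actually used. The function name is hypothetical. The unit conversions rely on ECS CPU units (1024 per vCPU) and ECS memory values being in MiB.

```python
def ecs_container_to_k8s(container):
    """Map one ECS container definition (dict) to a Kubernetes container spec.

    Covers only image, ports, env, command, and resources.
    """
    k8s = {"name": container["name"], "image": container["image"]}
    if container.get("portMappings"):
        k8s["ports"] = [
            {"containerPort": p["containerPort"]}
            for p in container["portMappings"]
        ]
    if container.get("environment"):
        k8s["env"] = [
            {"name": e["name"], "value": e["value"]}
            for e in container["environment"]
        ]
    # ECS entryPoint/command correspond to Kubernetes command/args.
    if container.get("entryPoint"):
        k8s["command"] = list(container["entryPoint"])
    if container.get("command"):
        k8s["args"] = list(container["command"])

    requests, limits = {}, {}
    if container.get("cpu"):
        # 1024 ECS CPU units = 1 vCPU = 1000 Kubernetes millicores.
        requests["cpu"] = f"{container['cpu'] * 1000 // 1024}m"
    if container.get("memoryReservation"):
        requests["memory"] = f"{container['memoryReservation']}Mi"  # soft limit
    if container.get("memory"):
        limits["memory"] = f"{container['memory']}Mi"  # hard limit
    resources = {}
    if requests:
        resources["requests"] = requests
    if limits:
        resources["limits"] = limits
    if resources:
        k8s["resources"] = resources
    return k8s
```

What this sketch deliberately leaves out is where the remaining time goes: task roles, secrets, health checks, log routing, and load balancer wiring all have Kubernetes equivalents, but none of them is a one-line field rename.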
Going from EKS to ECS:
We've done a few migrations in each direction. ECS → EKS is more common (services outgrowing ECS's simplicity). EKS → ECS is rarer but happens for specific workloads (often Fargate for bursty isolated tasks).
Some teams run K8s on ECS-managed nodes (e.g., EKS Anywhere on EC2 + ECS for the control plane). This is unusual; we don't.
The point of ECS is "you don't need K8s." The point of EKS is "you do need K8s." Trying to combine them muddles the choice.
If you have one service or a small handful, use ECS. The lower setup cost wins. EKS is overkill until you have a fleet.
If you have 20+ services or anticipate K8s ecosystem dependencies, use EKS. Operator/CRD/service-mesh tooling exists on K8s; on ECS you'd be rebuilding from scratch.
If your team doesn't know K8s, ECS Fargate is friendlier. Less to learn, less to break.
If portability matters, use EKS. ECS task definitions don't go anywhere else.
Use both if they fit. Fargate for bursty/cron workloads; EKS for the platform. Don't force one to do everything.
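The recommendations above can be encoded as a small function, which is a handy way to make the criteria explicit in a design review. This is one possible encoding, not our internal decision tree: the field names, the order in which the rules are checked, and the "small handful" threshold of 5 are all assumptions.

```python
from dataclasses import dataclass

SMALL_HANDFUL = 5   # assumed cutoff for "one service or a small handful"
FLEET_SIZE = 20     # "20+ services" from the recommendations


@dataclass
class ServiceProfile:
    service_count: int
    bursty_or_cron: bool = False
    needs_k8s_ecosystem: bool = False
    portability_matters: bool = False
    team_knows_k8s: bool = True


def recommend_platform(p):
    """Return a platform suggestion; the rule order is an assumption."""
    if p.bursty_or_cron:
        return "ECS Fargate"
    if p.portability_matters or p.needs_k8s_ecosystem:
        return "EKS"
    if not p.team_knows_k8s:
        return "ECS Fargate"
    if p.service_count >= FLEET_SIZE:
        return "EKS"
    if p.service_count <= SMALL_HANDFUL:
        return "ECS"
    # Between a handful and a fleet, the rules don't decide for you.
    return "judgment call"
```

For example, `recommend_platform(ServiceProfile(service_count=3))` returns `"ECS"`, and a profile with `needs_k8s_ecosystem=True` returns `"EKS"` regardless of fleet size. The explicit `"judgment call"` branch is intentional: a rule set like this should say so when it runs out of rules.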
The ECS vs EKS choice is mostly a fit question, not a quality question. Both work. Match the platform to the operational shape of your team and the workload patterns you have. The teams that struggle are the ones that picked one for ideological reasons (we love AWS / we love K8s) without checking the actual fit.