When You Don't Need K8s
There is a pattern I see repeatedly: a team of four engineers, two microservices, and an ambitious CTO who just came back from KubeCon. Six months later, those four engineers are now principally occupied with cluster maintenance, Helm chart debugging, and arguing about whether to use Flux or Argo. The two microservices still exist. They are not meaningfully more reliable.
Kubernetes is genuinely excellent for specific problems. The mistake is treating it as the default answer for container orchestration rather than a tool with a sharp cost-benefit tradeoff.
The Honest Cost Inventory
Before you provision a cluster, the cost inventory needs to be honest. Kubernetes costs are not just the EC2 or GKE bill.
Operational surface area you now own:
- Control plane upgrades (each minor version is supported for roughly 12–14 months, so you are upgrading at least annually or running unsupported)
- Node pool management, AMI patches, kernel updates
- CNI, CSI, and admission webhook compatibility across every upgrade
- Certificate rotation and etcd health
- Ingress controller maintenance
- Debugging networking failures that are invisible from application code
A managed control plane (GKE Autopilot, EKS with Fargate, AKS) eliminates some of this. Not all of it. The application layer — deployments, services, ingress rules, RBAC, NetworkPolicies, resource quotas, PodDisruptionBudgets — remains yours entirely.
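To make that concrete: even on a fully managed control plane, every service still needs objects like the following, written and kept current by your team. A minimal sketch — the names and values are illustrative, not from any particular cluster:

# Two of the objects you still own on a managed control plane
# (names and values here are illustrative)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi

Multiply by every object kind in the list above, then by the number of services, and the size of the maintenance surface becomes clear.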
The Alternatives That Actually Work
AWS ECS (Fargate)
ECS with Fargate is underrated by engineers who have drunk the K8s Kool-Aid. You get:
- No node management. Zero. Fargate provisions compute per task.
- Native IAM task roles (no IRSA complexity)
- Service Connect for service-to-service discovery
- Deep CloudWatch integration with no custom metrics pipeline
- ALB weighted routing for canary deployments without Argo Rollouts
# ECS Task Definition — the K8s Pod equivalent, minus 80% of the fields
{
  "family": "api-service",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024",
  "taskRoleArn": "arn:aws:iam::123456789:role/api-service-task-role",
  "containerDefinitions": [
    {
      "name": "api",
      "image": "123456789.dkr.ecr.us-east-1.amazonaws.com/api:latest",
      "portMappings": [{ "containerPort": 8080 }],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/api-service",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}

No Ingress resource, no Service object, no HPA YAML. ALB does the load balancing. CloudWatch does the alerting. IAM does the identity. You can be productive on day one.
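The canary routing mentioned above is one API call against the ALB listener, not a progressive-delivery controller. A hedged sketch — the listener and target-group ARNs are supplied as placeholder shell variables:

# Sketch: shift 10% of ALB traffic to a canary target group.
# LISTENER_ARN, STABLE_TG_ARN, and CANARY_TG_ARN are placeholders.
aws elbv2 modify-listener \
  --listener-arn "$LISTENER_ARN" \
  --default-actions '[{
    "Type": "forward",
    "ForwardConfig": {
      "TargetGroups": [
        {"TargetGroupArn": "'"$STABLE_TG_ARN"'", "Weight": 90},
        {"TargetGroupArn": "'"$CANARY_TG_ARN"'", "Weight": 10}
      ]
    }
  }]'

Promote by moving the canary weight to 100; roll back by setting it to 0.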
Fly.io
Fly is what Heroku should have become. You deploy from a Dockerfile or a fly.toml, and it runs in Fly's global anycast network. For latency-sensitive APIs serving global users, this is genuinely hard to replicate in Kubernetes without a multi-region cluster setup that will consume your entire quarter.
# fly.toml — entire deployment config
app = "my-api"
primary_region = "iad"
[build]
dockerfile = "Dockerfile"
[http_service]
internal_port = 8080
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 1
[[vm]]
memory = "512mb"
cpu_kind = "shared"
cpus = 1

Scale to zero, automatic TLS, private networking between apps. The operational complexity is near zero.
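Operating it is equally small. A sketch of the day-two commands, assuming current flyctl (verify subcommands against fly help on your version):

# Sketch: day-two operations for the app above
fly deploy        # build the Dockerfile, push, roll out with health checks
fly status        # machine state and regions at a glance
fly scale count 2 # run two machines instead of one
fly logs          # tail the app's logs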
Render
Render occupies the space between Heroku and full ECS. Zero-downtime deploys, managed PostgreSQL, Redis, cron jobs, and private services. If you are building an internal API or a B2B SaaS MVP, Render removes an entire category of infrastructure decisions.
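Render's infrastructure-as-code story is a single blueprint file. A minimal sketch, assuming a Dockerized web service — field names follow Render's render.yaml blueprint spec, and the service names are illustrative:

# render.yaml — minimal blueprint sketch (names are illustrative)
services:
  - type: web
    name: my-api
    env: docker
    plan: starter
    healthCheckPath: /healthz
databases:
  - name: my-api-db
    plan: starter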
When the Equation Flips
None of this is to say Kubernetes is always wrong. The inflection point comes when:
- You have more than 15–20 services and need consistent deployment patterns across teams
- You need fine-grained network policies between services (see the sketch after this list)
- You have GPU workloads or custom scheduling requirements
- You're running batch workloads alongside long-running services and need bin-packing
- Compliance requires you to run in your own VPC with locked-down node images
- You have a dedicated platform team (at minimum 2 engineers) whose job is the cluster
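The network-policy case is where Kubernetes genuinely earns its keep: a default-deny posture with explicit per-service allowances is a few declarative, label-selected objects. A minimal sketch with illustrative names:

# Sketch: deny all ingress in a namespace, then allow only the gateway
# to reach the api pods (namespace and labels are illustrative)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: prod
spec:
  podSelector: {}
  policyTypes: ["Ingress"]
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-gateway-to-api
  namespace: prod
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: gateway
      ports:
        - protocol: TCP
          port: 8080

ECS can approximate this with per-task security groups, but the declarative, label-selected form is a real Kubernetes strength.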
The Honest Conversation With Your CTO
The migration from a simple deployment model to Kubernetes is not free. There is a cost in engineering hours, a learning curve tax, and a sustained operational burden. If your team does not have the capacity to absorb that cost, you will end up with a half-configured cluster that provides neither the simplicity of ECS nor the power of a well-run Kubernetes deployment.
The question to ask is not "should we use Kubernetes?" The question is: "Do we have the people and time to run it well, or will it run us?"
If the honest answer is no — reach for ECS, Fly, Render, or Cloud Run. Ship product. Revisit the question when you have a platform team.
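Cloud Run is the same one-command story on GCP: a container image in, a scaled, TLS-terminated HTTPS service out. A sketch — the project, image, and service names are illustrative:

# Sketch: deploy a container image to Cloud Run (names are illustrative)
gcloud run deploy my-api \
  --image=us-docker.pkg.dev/my-project/apps/api:1.4.2 \
  --region=us-east1 \
  --allow-unauthenticated \
  --min-instances=1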
Key Takeaways
- Kubernetes has real operational costs beyond the compute bill: upgrades, CNI, RBAC, and networking complexity are ongoing.
- ECS Fargate eliminates node management and integrates natively with AWS IAM, ALB, and CloudWatch — it is a serious platform, not a stepping stone.
- Fly.io and Render solve global deployment and developer experience problems with near-zero operational overhead.
- The inflection point for Kubernetes is roughly 15–20 services, a dedicated platform team, and requirements that managed PaaS cannot satisfy.
- The worst outcome is a Kubernetes cluster that consumes your entire platform engineering capacity while your product team waits.
- Make the choice deliberately. The YAML can always come later.
Part 2 →
The Cluster You Actually Want