The Value Proposition, Honestly
Every few months a team somewhere rewrites their CI pipeline, packages it into a Slack bot, calls it a platform, and presents at a conference about developer experience. None of that is wrong, but it does explain why "platform engineering" means wildly different things in different organisations — and why the ROI conversation is so hard to have honestly.
Let me try.
What Platform Engineering Actually Promises
The pitch is simple: instead of every product team solving the same infrastructure, deployment, observability, and security problems independently, one team builds it once and everyone else consumes it through a clean interface. You get economies of scale. Product teams ship faster. The platform team compounds knowledge rather than scattering it.
The problem is that "clean interface" part. It costs real effort to build, and if you build the wrong interface you've just created a new bottleneck instead of removing one.
On a slide, that division of labour looks clean. The reality is messier — the platform itself needs design, development, testing, on-call, and roadmap management. That is overhead before a single product team ships a line of business logic.
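What a "clean interface" looks like varies by organisation, but the common shape is a small declarative contract: the product team owns a few lines, the platform owns everything behind them. A hypothetical manifest (every field name here is illustrative, not any real tool's schema):

```yaml
# service.yaml — the entire surface a product team touches
name: checkout-service
runtime: python-3.12
replicas: 3
database: postgres-small   # platform provisions, patches, and backs it up
alerts: default            # platform wires dashboards and paging
```

Everything not in this file — cluster config, IAM, TLS, log shipping — is the platform team's problem. That is the economy of scale, and also the bottleneck risk if the schema can't express what teams actually need.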
When It Earns Its Keep
Platform investment pays off when you can say yes to most of these:
You have repeated toil. If more than two teams are solving the same problem independently, you have duplication that will drift apart. The third time someone writes a GitHub Actions workflow to build a Docker image and push to ECR, something has gone wrong.
Your org is growing. A 10-engineer startup does not need a platform team. A 100-engineer org where onboarding takes three weeks because nobody owns the golden path definitely does.
Compliance and security have teeth. Regulated industries — finance, healthcare, anything with SOC 2 — need consistent controls. A platform that bakes security policies into the deployment path is cheaper than auditing 12 different teams' implementations.
You can measure cognitive load. If product engineers regularly context-switch into Terraform, Kubernetes YAML, and IAM policies to ship a feature, that is lost throughput. Platform engineering's job is to make the pit of success the path of least resistance.
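That third duplicated ECR workflow is also the cheapest place to start: a reusable workflow owned once by the platform, consumed in one line. A sketch using GitHub Actions' `workflow_call` mechanism (the repo name, credential wiring, and versioning are all illustrative):

```yaml
# platform-workflows/.github/workflows/build-push.yml — owned once by the platform team
on:
  workflow_call:
    inputs:
      image_name:
        required: true
        type: string

jobs:
  build-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # assumes AWS credentials are already configured for the runner
      - uses: aws-actions/amazon-ecr-login@v2
        id: ecr
      - run: |
          docker build -t "${{ steps.ecr.outputs.registry }}/${{ inputs.image_name }}:${{ github.sha }}" .
          docker push "${{ steps.ecr.outputs.registry }}/${{ inputs.image_name }}:${{ github.sha }}"
```

A product team's CI file then shrinks to a `uses: your-org/platform-workflows/.github/workflows/build-push.yml@v1` reference — the duplication, and the drift, disappear together.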
A rough break-even calculation worth doing:
```python
# Back-of-envelope platform ROI
product_teams = 8
toil_hours_per_team_per_month = 20  # infra, CI, incidents they shouldn't own
platform_team_size = 3
platform_eng_cost_per_month = platform_team_size * 15_000  # fully loaded
toil_reclaimed = product_teams * toil_hours_per_team_per_month  # 160 h/month
toil_value = toil_reclaimed * 150  # $150/h blended rate
net = toil_value - platform_eng_cost_per_month
print(f"Net monthly: ${net:,.0f}")  # $-21,000 at these numbers
```

At these numbers the platform runs at a loss: $24,000 of toil reclaimed against $45,000 of fully-loaded platform cost. The arithmetic only turns positive as the organisation grows — more teams, more toil per team — and only if the platform actually gets adopted. That "only if" is doing a lot of work.
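Inverting the same arithmetic gives a more useful number: how many toil hours per team per month the platform must reclaim just to cover its own cost. A sketch using the figures above:

```python
def break_even_toil_hours(teams: int, team_size: int,
                          cost_per_eng: float = 15_000,
                          blended_rate: float = 150) -> float:
    """Toil hours each product team must shed monthly for the platform to break even."""
    platform_cost = team_size * cost_per_eng
    return platform_cost / (teams * blended_rate)

print(break_even_toil_hours(teams=8, team_size=3))  # 37.5 h/team/month
```

Nearly double the 20 hours assumed above — which is exactly the kind of gap an honest ROI conversation has to surface before the hiring starts.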
When It Doesn't
Here is where conferences get dishonest.
The org is too small. Three teams do not need an internal developer platform. They need a README and a shared Terraform module. Adding a platform layer adds coordination without adding value.
The platform team owns the critical path. If product engineers must file tickets and wait for the platform team to provision a database, you've replaced self-service cloud with a slower IT helpdesk. This is the most common failure mode.
Nobody dogfoods it. Platform teams that don't use their own platform lose feedback loops fast. Within six months they're building features their users don't need and missing the ones they do.
You're measuring outputs instead of outcomes. "We migrated 40 services to the new pipeline" is an output. "Deployment frequency increased 3x and incident rate dropped 40%" is an outcome. Teams that chase migration counts instead of developer experience metrics will ship the wrong platform.
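The output/outcome distinction is easy to make concrete. A sketch with hypothetical before/after figures — the point is which quantities get tracked, not these particular values:

```python
# Outputs: easy to count, silent on developer experience
services_migrated = 40

# Outcomes: what changed for product teams (hypothetical measurements)
deploys_per_week = {"before": 4, "after": 12}
incidents_per_month = {"before": 10, "after": 6}

frequency_gain = deploys_per_week["after"] / deploys_per_week["before"]           # 3.0x
incident_drop = 1 - incidents_per_month["after"] / incidents_per_month["before"]  # 40% fewer

print(f"{frequency_gain:.0f}x deployment frequency, {incident_drop:.0%} fewer incidents")
```

If the second set of numbers isn't moving, the first set doesn't matter, no matter how impressive the migration count looks in the quarterly review.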
The Honest Starting Point
Before you hire a platform engineer, answer three questions:
- What specific toil are you eliminating, and for whom?
- How will product teams consume the platform without filing a ticket?
- How will you know in six months whether it's working?
If you can answer all three concretely, you're ready. If the answers are "we'll figure it out," delay until they're not.
The rest of this series assumes you've decided to build. Each post covers a concrete layer of the platform — golden paths, scaffolding, service catalogs, self-service infra, secrets management, observability defaults, cost attribution, adoption metrics, and the failure modes that kill platform teams from the inside.
No hype. Just the work.
Key Takeaways
- Platform engineering pays off through economies of scale, but the break-even requires honest accounting of platform team cost versus toil reclaimed.
- The most common failure mode is replacing cloud self-service with a slower internal ticket queue — the opposite of the goal.
- Small organisations (fewer than ~50 engineers) are almost always better served by shared modules and documented conventions than by a dedicated platform team.
- Measure outcomes (deployment frequency, incident rate, onboarding time) not outputs (services migrated, tickets closed).
- Three questions must be answered before forming a platform team: what toil, how self-serve, how measured.
- The platform team that doesn't dogfood its own platform loses the feedback loop that keeps it relevant.
Part 2 →
Golden Paths