
Adoption Metrics

Ravinder · 7 min read
Platform Engineering · DevOps · IDP · Platform Metrics · Developer Productivity

Platform teams that can't measure their impact have a harder time getting headcount, getting budget, and surviving the next reorg. But the obvious metrics — services migrated, features shipped, tickets closed — are easy to game and tell you nothing about whether engineers are actually better off.

This post is about building a measurement system that's honest, that reveals problems before they become crises, and that gives you the language to have a productive conversation with leadership about platform value.

The Measurement Framework

Think in three layers:

  1. Platform health metrics — is the platform itself reliable and fast?
  2. Adoption metrics — are teams using it, and how broadly?
  3. Outcome metrics — are teams shipping better because of it?

Most platform teams measure the first layer and claim the third. The second layer — adoption depth and breadth — is where the real leading indicators live.

graph TD
  subgraph "Outcome (lagging)"
    O1[Deployment frequency]
    O2[Change failure rate]
    O3[Mean time to restore]
    O4[Lead time for changes]
  end
  subgraph "Adoption (leading)"
    A1[Golden path coverage %]
    A2[Services on supported stack]
    A3[Self-service infra usage]
    A4[Catalog completeness]
  end
  subgraph "Platform health"
    H1[Golden path uptime]
    H2[CI pipeline p95 duration]
    H3[Module registry availability]
    H4[Vault/secret read latency]
  end
  H1 & H2 & H3 & H4 -->|enables| A1 & A2 & A3 & A4
  A1 & A2 & A3 & A4 -->|drives| O1 & O2 & O3 & O4

Platform Health Metrics

Before measuring adoption, measure reliability. If the CI pipeline is flaky, if the Terraform module registry goes down, if secret injection fails randomly — adoption will not grow no matter how good your marketing is.

Track:

  • Golden path uptime — treat your platform components as services with SLOs. If your reusable CI workflow has a 97% success rate in a given week, that's 3% of build runs failing for platform reasons. That's your problem to fix.
  • CI pipeline p50/p95 duration — slow CI is a tax on every developer every day. A PR pipeline that takes 25 minutes is a context-switch factory.
  • Module registry availability — if terraform init fails because your module registry is down, product teams are blocked.
  • Feedback loop speed — how quickly can a product team report a platform bug and get a fix? Track this as a service metric.
# SLO definition for golden path CI
slo:
  name: golden-path-ci-success-rate
  description: "Golden path CI workflow completes successfully"
  target: 0.98   # 98% of runs succeed
  window: 7d
  metric:
    name: github_workflow_run_success_rate
    filter:
      workflow: "service-ci.yml"
      type: "reusable"

Adoption Metrics: What Actually Matters

Golden path coverage. What percentage of services are using the golden path for each dimension (CI, observability, secrets, infra)? Track separately, not as a single score. A service can be on the golden path for CI but using static credentials — that's useful to know.

-- Pseudo-query against your catalog data
SELECT
    dimension,
    COUNT(CASE WHEN on_golden_path THEN 1 END) AS on_path,
    COUNT(*) AS total_services,
    ROUND(100.0 * COUNT(CASE WHEN on_golden_path THEN 1 END) / COUNT(*), 1) AS coverage_pct
FROM service_platform_adoption
WHERE lifecycle = 'production'
GROUP BY dimension
ORDER BY coverage_pct;

Expected output might look like:

dimension            on_path   total_services   coverage_pct
ci_pipeline               45               52           86.5
observability             38               52           73.1
secrets_management        29               52           55.8
self_service_infra        21               52           40.4

This tells you where to focus platform investment. Secrets management coverage at 56% and self-service infra at 40% are the current gaps, not CI.

Time to first deploy for new services. How long from "engineer creates service repo" to "service is running in staging"? Track this as a cohort metric — services created in Q1 vs Q2. Platform improvements should show up as a shrinking median.
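
A sketch of the cohort view, assuming your catalog can give you each service's repo creation time and first successful staging deploy (the field names are illustrative):

# Sketch: median time-to-first-deploy per quarterly cohort
from collections import defaultdict
from statistics import median

def time_to_first_deploy_by_cohort(services: list[dict]) -> dict[str, float]:
    """
    Median hours from repo creation to first staging deploy, per quarterly cohort.
    services: list of {service, created_at, first_staging_deploy} with datetime values
    """
    cohorts: dict[str, list[float]] = defaultdict(list)
    for s in services:
        created = s["created_at"]
        cohort = f"{created.year}-Q{(created.month - 1) // 3 + 1}"
        hours = (s["first_staging_deploy"] - created).total_seconds() / 3600
        cohorts[cohort].append(hours)
    return {c: round(median(h), 1) for c, h in sorted(cohorts.items())}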

Escape hatch usage. Post 2 covered escape hatches. Track which ones are used, how often, and whether the same teams keep using them. High escape hatch usage in one area is a signal that the golden path doesn't fit that use case — the path needs updating, not the team.
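
A sketch of that tracking, assuming escape hatch exceptions are logged somewhere queryable (the record shape is illustrative):

# Sketch: escape hatch usage by hatch type, flagging teams that keep coming back
from collections import Counter, defaultdict

def escape_hatch_summary(exceptions: list[dict]) -> dict:
    """
    exceptions: list of {hatch, team, filed_at}
    Returns per-hatch totals plus the teams that have filed the same exception more than once.
    """
    by_hatch = Counter(e["hatch"] for e in exceptions)
    by_team = defaultdict(Counter)
    for e in exceptions:
        by_team[e["hatch"]][e["team"]] += 1
    return {
        hatch: {
            "total": count,
            "repeat_teams": [t for t, n in by_team[hatch].items() if n > 1],
        }
        for hatch, count in by_hatch.most_common()
    }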

Module version lag. For services using platform Terraform modules, what's the distribution of module versions in use? If 60% of services are on v1.x and the current version is v3.x, your upgrade communication isn't working.
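
One way to see that lag, assuming you can pull the pinned module version per service out of your catalog or Terraform state (the input shape is illustrative):

# Sketch: distribution of platform module major versions in use
from collections import Counter

CURRENT_MAJOR = 3  # latest published major version of the module

def version_distribution(pinned: dict[str, str]) -> dict[str, int]:
    """pinned: {service_name: version_string}, e.g. {"payments-api": "1.4.2"}."""
    return dict(sorted(Counter(f"v{v.split('.')[0]}" for v in pinned.values()).items()))

def lagging_services(pinned: dict[str, str], max_lag: int = 1) -> list[str]:
    """Services more than max_lag major versions behind the current release."""
    return sorted(
        svc for svc, v in pinned.items()
        if CURRENT_MAJOR - int(v.split(".")[0]) > max_lag
    )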

Outcome Metrics: DORA and Beyond

DORA's four metrics (deployment frequency, lead time for changes, change failure rate, mean time to restore) are the standard for measuring software delivery performance. They're lagging indicators — they reflect past behaviour, not current platform adoption. But they're the metrics leadership cares about.

# Simplified DORA metric calculation from deployment events
from datetime import datetime, timedelta
from collections import defaultdict
 
def deployment_frequency(deployments: list[dict], days: int = 30) -> dict:
    """
    Returns deployments per day per service for the given window.
    deployments: list of {service, env, timestamp, success}
    """
    cutoff = datetime.utcnow() - timedelta(days=days)
    prod_deploys = [
        d for d in deployments
        if d["env"] == "production"
        and d["timestamp"] >= cutoff
        and d["success"]
    ]
 
    by_service = defaultdict(int)
    for d in prod_deploys:
        by_service[d["service"]] += 1
 
    return {
        service: round(count / days, 2)
        for service, count in by_service.items()
    }
 
def change_failure_rate(deployments: list[dict], days: int = 30) -> float:
    """Fraction of production deployments in the window that failed."""
    cutoff = datetime.utcnow() - timedelta(days=days)
    prod = [d for d in deployments
            if d["env"] == "production" and d["timestamp"] >= cutoff]
    if not prod:
        return 0.0
    failures = sum(1 for d in prod if not d["success"])
    return round(failures / len(prod), 3)

Connect DORA metrics to platform adoption cohorts. Services on the golden path should show higher deployment frequency and lower change failure rate than services not on it. If that correlation doesn't exist, your platform isn't delivering the value you think it is.
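
A sketch of that comparison, reusing the functions above and assuming you can derive the set of on-path service names from your catalog adoption data:

# Sketch: split DORA metrics by adoption cohort using the functions defined above
def dora_by_cohort(deployments: list[dict], on_path: set[str], days: int = 30) -> dict:
    def summarise(deploys: list[dict]) -> dict:
        freq = deployment_frequency(deploys, days)
        return {
            "avg_deploys_per_day": round(sum(freq.values()) / max(len(freq), 1), 2),
            "change_failure_rate": change_failure_rate(deploys, days),
        }
    return {
        "on_path": summarise([d for d in deployments if d["service"] in on_path]),
        "off_path": summarise([d for d in deployments if d["service"] not in on_path]),
    }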

The Dashboard That Tells the Story

A single dashboard for platform leadership conversations:

graph LR
  subgraph "Monthly Platform Review"
    M1["Golden path coverage: 74% (+8% vs last month)"]
    M2["New service time-to-deploy: 4h median (was 2 days)"]
    M3["Escape hatches filed: 3 (down from 11)"]
    M4["DORA - On-path services: Elite/High"]
    M5["DORA - Off-path services: Medium/Low"]
    M6["Incidents caused by platform: 1 (SLO breach)"]
  end

That last metric — incidents caused by the platform — is the one platform teams resist tracking. Track it anyway. A platform team that doesn't count the outages it causes has no credibility when claiming the productivity gains.

Vanity Metrics to Avoid

  • Tickets closed. Closing tickets is output, not outcome.
  • Services migrated. A migration with no improvement in DORA metrics is just churn.
  • Features shipped. Unless you have a credible comparison baseline, this is noise.
  • NPS score alone. Useful directionally but easy to game and hard to act on.

Key Takeaways

  • Measure in three layers: platform health (is it reliable?), adoption (are teams using it?), and outcomes (are teams shipping better?) — most teams skip the middle layer, which is where leading indicators live.
  • Golden path coverage per dimension (CI, observability, secrets, infra) tells you where to invest, not just how far along you are overall.
  • Time-to-first-deploy for new services is the most direct measure of onboarding friction; track it as a cohort metric so improvements show up as trends.
  • Connect DORA metrics to adoption cohorts: on-path services should outperform off-path services, and if they don't, the platform isn't delivering the claimed value.
  • Track incidents caused by the platform alongside incidents prevented — a team that only claims credit for wins and ignores failures has no credibility.
  • Vanity metrics (tickets closed, services migrated) are easy to report and hard to act on; resist the temptation to report them instead of outcome metrics.