Cost Attribution
Cloud costs are one of the few engineering problems that get harder as you scale. When you're small, the bill is visible and small enough that everyone informally knows who's spending what. At 50 engineers, the bill is large and nobody really knows. At 200 engineers, the bill is enormous, finance is asking questions, and the answer "we'll look into it" stops working.
Cost attribution is the practice of connecting cloud spend to the teams and services that generate it. Done well, it creates accountability without blame, surfaces waste before it compounds, and makes budget conversations fact-based instead of political. Done poorly, it becomes a monthly finger-pointing exercise that breeds resentment and still doesn't reduce the bill.
Why Attribution Is Hard
The technical problem is straightforward: tag every resource, aggregate costs by tag. The actual problems are:
Shared costs. The Kubernetes control plane costs money. The NAT gateway costs money. The platform team's own tooling costs money. How do you split these across product teams fairly?
Incomplete tagging. Resources created outside your Terraform modules (someone used the console, a third-party operator created resources, the cloud provider created resources on your behalf) don't have your cost tags.
Granularity mismatch. AWS bills per resource-hour. You want to know cost per service per day. The mapping between the two is non-trivial, especially in Kubernetes where multiple services share nodes.
Lagged data. AWS Cost Explorer data is 24–48 hours behind. You're looking at what happened, not what's happening.
None of these are reasons not to do attribution. They're reasons to be honest with your stakeholders about what the numbers mean and what margin of error to expect.
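To put a number on that margin of error, it helps to measure how much spend carries no attribution tag at all. The sketch below sums cost per tag value from a response shaped like the output of Cost Explorer's `get_cost_and_usage` call grouped by a tag. It assumes group keys arrive as `team$<value>` with an empty value for untagged resources; the sample response is hand-written for illustration, not real billing data.

```python
from collections import defaultdict


def cost_by_tag(response: dict, tag_key: str = "team") -> dict:
    """Sum cost per tag value from a Cost Explorer-style response.

    Assumes the response was requested with Metrics=["UnblendedCost"]
    and GroupBy=[{"Type": "TAG", "Key": tag_key}].
    """
    totals = defaultdict(float)
    prefix = f"{tag_key}$"
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            key = group["Keys"][0]
            # Assumed key format: "team$payments-team"; nothing after "$"
            # means the resource had no such tag.
            value = key[len(prefix):] if key.startswith(prefix) else key
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[value or "(untagged)"] += amount
    return dict(totals)


def untagged_share(totals: dict) -> float:
    """Fraction of spend that cannot be attributed to any tag value."""
    total = sum(totals.values())
    return totals.get("(untagged)", 0.0) / total if total else 0.0


# Hand-written sample in the assumed response shape.
sample = {
    "ResultsByTime": [
        {"Groups": [
            {"Keys": ["team$payments-team"],
             "Metrics": {"UnblendedCost": {"Amount": "120.0", "Unit": "USD"}}},
            {"Keys": ["team$"],
             "Metrics": {"UnblendedCost": {"Amount": "30.0", "Unit": "USD"}}},
        ]},
        {"Groups": [
            {"Keys": ["team$payments-team"],
             "Metrics": {"UnblendedCost": {"Amount": "50.0", "Unit": "USD"}}},
        ]},
    ]
}

totals = cost_by_tag(sample)
print(totals)
print(f"untagged share: {untagged_share(totals):.0%}")
```

Reporting the untagged share next to every attribution number gives stakeholders a direct view of how much of the bill is still unexplained.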
The Tagging Strategy
Start with the minimum tag set that enables attribution. Every resource that can be tagged should have:
| Tag | Example value | Purpose |
|---|---|---|
| `service` | `payments-api` | Primary attribution unit |
| `team` | `payments-team` | Rollup to team level |
| `environment` | `production` | Separate prod/staging spend |
| `managed-by` | `terraform` | Identify unmanaged resources |
| `cost-center` | `CC-1042` | Finance integration |
Enforce these tags in your Terraform modules (as shown in post 5) and in CI policy checks. Untagged resources should fail the plan.
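One possible shape for the CI policy check is a script that inspects the JSON form of a plan (as produced by `terraform show -json`) and reports resources whose planned tags are incomplete. This is a minimal sketch under a few assumptions: the function name and sample plan are hypothetical, resource types without a `tags` attribute are treated as not taggable, and tags whose values are only known at apply time are not handled.

```python
REQUIRED_TAGS = frozenset(
    {"service", "team", "environment", "cost-center", "managed-by"}
)


def find_untagged(plan: dict, required=REQUIRED_TAGS) -> dict:
    """Map resource address -> sorted list of missing tags.

    Only resources that the plan creates or updates are checked.
    """
    violations = {}
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        actions = set(change.get("actions", []))
        if not actions & {"create", "update"}:
            continue  # deletes and no-ops cannot introduce untagged resources
        after = change.get("after") or {}
        if "tags" not in after:
            continue  # assumption: no tags attribute means not taggable
        tags = after.get("tags") or {}
        missing = required - set(tags)
        if missing:
            violations[rc["address"]] = sorted(missing)
    return violations


# Hypothetical, trimmed-down plan document for illustration.
sample_plan = {
    "resource_changes": [
        {"address": "aws_instance.api",
         "change": {"actions": ["create"],
                    "after": {"tags": {"service": "payments-api",
                                       "team": "payments-team"}}}},
        {"address": "aws_s3_bucket.logs",
         "change": {"actions": ["create"],
                    "after": {"tags": {"service": "payments-api",
                                       "team": "payments-team",
                                       "environment": "production",
                                       "cost-center": "CC-1042",
                                       "managed-by": "terraform"}}}},
        {"address": "aws_instance.old",
         "change": {"actions": ["delete"], "after": None}},
    ]
}

for address, missing in find_untagged(sample_plan).items():
    print(f"{address}: missing {', '.join(missing)}")
```

In a pipeline, the script would load the real plan JSON with `json.load` and exit non-zero when the returned dict is not empty, which is what makes the plan fail.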
```hcl
# modules/common/tagging.tf: required tags enforced by every module
variable "required_tags" {
  description = "Required tags applied to all resources"
  type = object({
    service     = string
    team        = string
    environment = string
    cost_center = string
  })
}

locals {
  # Build the map explicitly so the emitted key is "cost-center",
  # matching the tag table, rather than the variable's "cost_center".
  standard_tags = {
    "service"     = var.required_tags.service
    "team"        = var.required_tags.team
    "environment" = var.required_tags.environment
    "cost-center" = var.required_tags.cost_center
    "managed-by"  = "terraform"
  }
}
```

```python
# scripts/tag-audit.py: weekly job to find untagged resources
import json
import sys

import boto3

REQUIRED_TAGS = {"service", "team", "environment", "managed-by"}


def audit_untagged_ec2():
    """Return EC2 instances that are missing one or more required tags."""
    ec2 = boto3.client("ec2")
    # describe_instances is paginated; a single call can miss instances.
    paginator = ec2.get_paginator("describe_instances")
    untagged = []
    for page in paginator.paginate():
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                tags = {t["Key"] for t in instance.get("Tags", [])}
                missing = REQUIRED_TAGS - tags
                if missing:
                    untagged.append({
                        "id": instance["InstanceId"],
                        "missing_tags": sorted(missing),
                        "launch_time": instance["LaunchTime"].isoformat(),
                    })
    return untagged


if __name__ == "__main__":
    results = audit_untagged_ec2()
    print(json.dumps(results, indent=2))
    if results:
        sys.exit(1)  # fail CI, alert on-call
```

Kubernetes: The Hard Part
Container workloads complicate attribution because a node runs multiple pods from multiple teams. You need to allocate node cost to pods, and pod cost to services.
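To make the allocation idea concrete, here is a minimal sketch that splits one node's cost across its pods in proportion to their resource requests and then rolls the result up by namespace. It is an illustration of the principle only, not the method Kubecost or OpenCost use. The function name, the 50/50 CPU-to-memory weighting, and the sample pods are assumptions, and idle capacity is simply spread across pods in the same proportion.

```python
from collections import defaultdict


def allocate_node_cost(node_cost: float, pods: list, cpu_weight: float = 0.5) -> dict:
    """Split a node's cost across namespaces by requested CPU and memory.

    Each pod's share blends its fraction of requested CPU with its
    fraction of requested memory. Assumes every pod declares requests.
    """
    total_cpu = sum(p["cpu"] for p in pods)
    total_mem = sum(p["mem_gib"] for p in pods)
    by_namespace = defaultdict(float)
    for p in pods:
        cpu_frac = p["cpu"] / total_cpu if total_cpu else 0.0
        mem_frac = p["mem_gib"] / total_mem if total_mem else 0.0
        share = cpu_weight * cpu_frac + (1 - cpu_weight) * mem_frac
        by_namespace[p["namespace"]] += node_cost * share
    return {ns: round(cost, 2) for ns, cost in by_namespace.items()}


# Hypothetical pods scheduled on a node that costs 10.00 per day.
sample_pods = [
    {"namespace": "payments", "cpu": 2.0, "mem_gib": 4.0},
    {"namespace": "payments", "cpu": 1.0, "mem_gib": 2.0},
    {"namespace": "search", "cpu": 1.0, "mem_gib": 6.0},
]

print(allocate_node_cost(10.0, sample_pods))
```

Here `search` requests a quarter of the CPU but half of the memory, so it ends up with 37.5% of the node cost. Weighting by requests rather than usage is a design choice: it charges teams for what they reserve, which is what drives node count.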
The practical approach combines two pieces:
Kubecost (or OpenCost, its open-source core) analyzes CPU and memory requests alongside actual usage per namespace and deployment, and produces per-workload cost estimates.
Namespace-to-team mapping in your service catalog connects the Kubecost output to your team structure.
```yaml
# Cost allocation annotations on namespaces
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  annotations:
    cost-center: "CC-1042"
    team: "payments-team"
    cost.platform.io/allocation-group: "payments"
```

Showback vs Chargeback
Two models for communicating costs to teams:
Showback: "Here is how much your services cost. This is for your information." No money actually moves. Teams see the data, feel some accountability, but have no hard incentive to optimise.
Chargeback: "Your team's cloud budget is $X. Your actual spend is $Y. The delta affects your budget for next quarter." Money moves (at least notionally). Much stronger incentive. Also much more political.
Start with showback. It builds the muscle of looking at cost data and developing intuition for what's expensive. After six months, most teams will have optimised voluntarily because engineers dislike waste once they can see it.
If showback doesn't move behaviour after two quarters, move to chargeback — but only with a fair shared-cost model agreed in advance.
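The chargeback arithmetic itself is small. As a rough sketch, the budget-versus-actual delta per team could be computed like this; the function name, field names, and figures are all hypothetical.

```python
def budget_variance(budgets: dict, actuals: dict) -> dict:
    """Compare each team's actual spend against its budget."""
    report = {}
    for team, budget in budgets.items():
        actual = actuals.get(team, 0.0)
        delta = actual - budget  # positive means over budget
        report[team] = {
            "budget": budget,
            "actual": actual,
            "delta": round(delta, 2),
            "pct_of_budget": round(actual / budget * 100, 1) if budget else None,
            "over_budget": delta > 0,
        }
    return report


# Hypothetical quarterly figures.
budgets = {"payments-team": 10000.0, "search-team": 8000.0}
actuals = {"payments-team": 11500.0, "search-team": 6400.0}

for team, row in budget_variance(budgets, actuals).items():
    print(team, row)
```

The same report works for showback if the `over_budget` flag is treated as informational rather than as an input to next quarter's budget.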
Shared Cost Allocation
The hard question: how do you split the Kubernetes control plane, the NAT gateway, the platform's own tooling across product teams?
Common models:
Equal split. Every team pays 1/N of shared costs. Simple, feels fair, wrong — a team running five microservices pays the same as a team running one.
Proportional to direct spend. Each team's share of shared costs equals their share of total direct spend. If payments-team generates 30% of direct compute cost, they pay 30% of shared costs.
Tiered. Costs are bucketed. Teams in higher tiers (more services, more traffic) pay more. Requires defining tiers, which requires a conversation.
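A side-by-side comparison makes the differences between the three models easier to discuss. The sketch below applies each one to the same inputs. The team names, spend figures, and tier thresholds are invented for illustration, and the tiered variant uses direct spend as a stand-in for "more services, more traffic", which is an assumption rather than a recommendation.

```python
def equal_split(direct: dict, shared: float) -> dict:
    """Every team pays 1/N of shared costs."""
    n = len(direct)
    return {team: round(shared / n, 2) for team in direct} if n else {}


def proportional_split(direct: dict, shared: float) -> dict:
    """Each team's share of shared costs equals its share of direct spend."""
    total = sum(direct.values())
    if total <= 0:
        return {team: 0.0 for team in direct}
    return {team: round(shared * cost / total, 2) for team, cost in direct.items()}


def tiered_split(direct: dict, shared: float, tiers: list) -> dict:
    """Teams pay by tier weight; tiers is a list of (min_direct_spend, weight)."""
    def weight(cost):
        w = 0
        for threshold, tier_weight in sorted(tiers):
            if cost >= threshold:
                w = tier_weight  # keep the highest tier the team qualifies for
        return w

    weights = {team: weight(cost) for team, cost in direct.items()}
    total_weight = sum(weights.values())
    if total_weight <= 0:
        return {team: 0.0 for team in direct}
    return {team: round(shared * w / total_weight, 2) for team, w in weights.items()}


# Hypothetical monthly figures.
direct = {"payments-team": 6000.0, "search-team": 3000.0, "growth-team": 1000.0}
shared = 2000.0
tiers = [(0, 1), (2500, 2), (5000, 3)]  # hypothetical thresholds and weights

print("equal:       ", equal_split(direct, shared))
print("proportional:", proportional_split(direct, shared))
print("tiered:      ", tiered_split(direct, shared, tiers))
```

With these numbers the smallest team pays about 667 under an equal split, 200 under a proportional split, and about 333 under the tiered model, which is the kind of spread worth showing teams before the model is agreed.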
Proportional is the most defensible. Document the model before you show the first bill. "Here is how shared costs are allocated, and here is the calculation" preempts most objections.
```python
# Monthly showback report generator
def calculate_team_costs(direct_costs: dict, shared_costs: float) -> dict:
    """Allocate shared costs to teams in proportion to their direct spend."""
    total_direct = sum(direct_costs.values())
    if total_direct <= 0:
        # Proportional allocation is undefined without any direct spend.
        raise ValueError("total direct cost must be positive")
    grand_total = total_direct + shared_costs
    report = {}
    for team, direct in direct_costs.items():
        share_of_shared = (direct / total_direct) * shared_costs
        total = direct + share_of_shared
        report[team] = {
            "direct_cost": direct,
            "shared_cost_allocation": round(share_of_shared, 2),
            "total": round(total, 2),
            "pct_of_total": round(total / grand_total * 100, 1),
        }
    return report
```

Key Takeaways
- Cost attribution connects cloud spend to accountable teams — without it, the bill grows and nobody knows why, or who to ask.
- A minimum tag set (`service`, `team`, `environment`, `managed-by`) enforced by Terraform modules and CI policy is the foundation. Untagged resources should fail the build.
- Kubernetes attribution requires dedicated tooling (Kubecost/OpenCost) to allocate node cost to workloads; namespace annotations bridge workload costs to team ownership.
- Start with showback before chargeback — teams that can see their costs reduce them voluntarily; chargeback adds political complexity that requires agreed shared-cost models.
- Proportional allocation of shared costs (each team's share of shared costs equals its share of total direct spend) is the most defensible model.
- Lagged data and incomplete tagging mean attribution numbers always have a margin of error — communicate this to finance upfront rather than defending individual data points.