Tags, Tags, Tags
Series: Cloud Cost Engineering
Tags are the load-bearing columns of cost allocation. When they are missing or inconsistent, every cost conversation devolves into finger-pointing. When they are right, showback reports run themselves and rightsizing recommendations land in the correct Slack channel. The difference is architecture, not discipline.
Design the Taxonomy First
A tag taxonomy is a contract between engineering and finance. Design it before you enforce it.
Minimum required keys for any non-trivial cloud footprint:
| Tag Key | Format | Example | Required? |
|---|---|---|---|
| env | lowercase enum | prod, staging, dev | Yes |
| team | slug | platform, payments, data | Yes |
| product | slug | checkout, analytics | Yes |
| cost-center | prefixed numeric ID | CC-1042 | Yes |
| managed-by | enum | terraform, cdk, manual | Yes |
| repo | short repo name | infra-core | Recommended |
Keep keys lowercase with hyphens. Tag keys are case-sensitive, so mixed case and underscores (Team vs. team, cost_center vs. cost-center) show up as separate dimensions in Cost Explorer. Enforce a short allowlist of values for grouping keys like team and env; free-text values splinter into near-duplicates and are unusable for grouping.
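The taxonomy rules above can be checked mechanically before a change ever reaches the cloud. The following is a minimal sketch, not a definitive implementation: the names `validate_tags`, `ALLOWED_VALUES`, and `KEY_PATTERN` are hypothetical, and the allowlists simply reuse the example values from the table.

```python
import re

# Hypothetical allowlists built from the example values in the table;
# real values would come from the shared taxonomy document.
ALLOWED_VALUES = {
    "env": {"prod", "staging", "dev"},
    "team": {"platform", "payments", "data"},
    "managed-by": {"terraform", "cdk", "manual"},
}
REQUIRED_KEYS = {"env", "team", "product", "cost-center", "managed-by"}

# Lowercase words separated by single hyphens, e.g. "cost-center".
KEY_PATTERN = re.compile(r"^[a-z][a-z0-9]*(-[a-z0-9]+)*$")


def validate_tags(tags):
    """Return a list of violations; an empty list means the tag set passes."""
    problems = []
    for key in sorted(REQUIRED_KEYS - set(tags)):
        problems.append(f"missing required key: {key}")
    for key, value in sorted(tags.items()):
        if not KEY_PATTERN.match(key):
            problems.append(f"key not lowercase-hyphenated: {key}")
        allowed = ALLOWED_VALUES.get(key)
        if allowed is not None and value not in allowed:
            problems.append(f"value not in allowlist: {key}={value}")
    return problems
```

A check like this fits naturally in CI or a pre-commit hook, where a non-empty result fails the build before the tags reach an account.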
Enforcement via Service Control Policies
SCPs block resource creation when mandatory tags are absent. This is the enforcement layer. Add it to every non-root OU.
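{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RequireTeamTag",
      "Effect": "Deny",
      "Action": [
        "ec2:RunInstances", "rds:CreateDBInstance", "eks:CreateCluster",
        "elasticloadbalancing:CreateLoadBalancer", "s3:CreateBucket"
      ],
      "Resource": "*",
      "Condition": { "Null": { "aws:RequestTag/team": "true" } }
    },
    {
      "Sid": "RequireEnvTag",
      "Effect": "Deny",
      "Action": [
        "ec2:RunInstances", "rds:CreateDBInstance", "eks:CreateCluster",
        "elasticloadbalancing:CreateLoadBalancer", "s3:CreateBucket"
      ],
      "Resource": "*",
      "Condition": { "Null": { "aws:RequestTag/env": "true" } }
    },
    {
      "Sid": "RequireCostCenterTag",
      "Effect": "Deny",
      "Action": [
        "ec2:RunInstances", "rds:CreateDBInstance", "eks:CreateCluster",
        "elasticloadbalancing:CreateLoadBalancer", "s3:CreateBucket"
      ],
      "Resource": "*",
      "Condition": { "Null": { "aws:RequestTag/cost-center": "true" } }
    }
  ]
}

Each tag gets its own statement because keys listed under a single condition operator are combined with AND: one shared Null block would deny only requests that are missing all three tags. Before rollout, confirm that every listed action supports aws:RequestTag, and expect to narrow Resource for ec2:RunInstances to instance and volume ARNs, since that call also touches resources such as subnets and AMIs that never carry request tags.
SCPs are preventive controls. They do not fix the past; that is a separate problem.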
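IAM combines multiple keys under one condition operator with a logical AND, so a Deny that should fire when any required tag is missing needs one statement per tag. Generating those statements keeps the policy in sync with the taxonomy. The sketch below assumes the tag keys and actions from this section; `build_tag_scp` is a hypothetical helper name.

```python
import json

# Actions and tag keys mirror the policy in this section; adjust to your footprint.
ACTIONS = [
    "ec2:RunInstances",
    "rds:CreateDBInstance",
    "elasticloadbalancing:CreateLoadBalancer",
    "eks:CreateCluster",
    "s3:CreateBucket",
]
REQUIRED_TAG_KEYS = ["team", "env", "cost-center"]


def build_tag_scp(tag_keys, actions):
    """Build an SCP document with one Deny statement per required tag key."""
    statements = []
    for key in tag_keys:
        # Sid values must be alphanumeric, so "cost-center" becomes "CostCenter".
        sid = "Require" + "".join(p.capitalize() for p in key.split("-")) + "Tag"
        statements.append({
            "Sid": sid,
            "Effect": "Deny",
            "Action": list(actions),
            "Resource": "*",
            # A single key per Null block: the Deny fires when this tag is absent.
            "Condition": {"Null": {f"aws:RequestTag/{key}": "true"}},
        })
    return {"Version": "2012-10-17", "Statement": statements}
```

Calling `json.dumps(build_tag_scp(REQUIRED_TAG_KEYS, ACTIONS), indent=2)` yields a document that can be pasted into the Organizations console or passed to your deployment tooling.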
Retroactive Cleanup with AWS Config
AWS Config can identify untagged resources. Pair it with a Lambda remediation.
import json

import boto3

REQUIRED_KEYS = ['team', 'env', 'cost-center']
PLACEHOLDER = 'UNTAGGED-NEEDS-REMEDIATION'


def lambda_handler(event, context):
    """
    Config remediation: tag resources that are missing required keys.
    Triggered by AWS Config rule 'required-tags'.
    Expects the Config rule event shape, where 'invokingEvent' is a JSON
    string that carries the configuration item.
    """
    ec2 = boto3.client('ec2')

    # Parse the Config event
    invoking_event = json.loads(event['invokingEvent'])
    resource = invoking_event.get('configurationItem')
    if not resource:
        # Scheduled and oversized notifications carry no inline item
        return {'status': 'skipped', 'reason': 'no configuration item'}

    resource_id = resource['resourceId']
    if resource['resourceType'] != 'AWS::EC2::Instance':
        return {'status': 'skipped', 'reason': 'unsupported resource type'}
    if resource.get('configurationItemStatus') == 'ResourceDeleted':
        return {'status': 'skipped', 'reason': 'resource deleted'}

    # Config delivers tags as a {key: value} map, not a list of pairs
    existing_tags = resource.get('tags') or {}
    missing = [k for k in REQUIRED_KEYS if k not in existing_tags]
    if not missing:
        return {'status': 'compliant'}

    # Apply placeholder tags; a human must fill in real values
    placeholder_tags = [{'Key': k, 'Value': PLACEHOLDER} for k in missing]
    ec2.create_tags(Resources=[resource_id], Tags=placeholder_tags)
    print(f"Tagged {resource_id} with placeholders for: {missing}")
    return {'status': 'remediated', 'resource': resource_id, 'keys': missing}

Placeholder values are intentional. They surface in Cost Explorer as UNTAGGED-NEEDS-REMEDIATION, which is embarrassing enough to trigger human action.
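The decision of which placeholders to apply is plain data manipulation, so it can be pulled out of the handler and unit-tested without AWS credentials. This is a sketch under assumptions: `normalize_tags` and `plan_placeholder_tags` are hypothetical helpers, and accepting both a key/value map and a list of pairs is a defensive choice because different AWS APIs return tags in different shapes.

```python
REQUIRED_KEYS = ["team", "env", "cost-center"]
PLACEHOLDER = "UNTAGGED-NEEDS-REMEDIATION"


def normalize_tags(raw):
    """Accept tags as a {key: value} map or as a list of key/value dicts."""
    if not raw:
        return {}
    if isinstance(raw, dict):
        return dict(raw)
    tags = {}
    for item in raw:
        # Tolerate both lowercase and capitalized field names.
        key = item.get("key", item.get("Key"))
        value = item.get("value", item.get("Value"))
        if key is not None:
            tags[key] = value
    return tags


def plan_placeholder_tags(raw_tags, required=REQUIRED_KEYS):
    """Return EC2-style tag dicts for every required key that is absent."""
    existing = normalize_tags(raw_tags)
    return [{"Key": k, "Value": PLACEHOLDER} for k in required if k not in existing]
```

The returned list has the shape that `ec2.create_tags(..., Tags=...)` expects, so the handler only needs to pass it through when it is non-empty.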
Tag Propagation Architecture
Terraform: Default Tags at Provider Level
Stop tagging every resource manually. Set defaults at the provider.
# providers.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      env         = var.environment
      team        = var.team
      cost-center = var.cost_center
      managed-by  = "terraform"
      repo        = var.repo_name
    }
  }
}

# variables.tf
variable "environment" { type = string }
variable "team"        { type = string }
variable "cost_center" { type = string }
variable "repo_name"   { type = string }

Every resource created by this provider inherits the default tags. Individual resources can add keys but cannot remove required ones without triggering the SCP.
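One detail worth keeping in mind: in the AWS provider, a resource-level `tags` map takes precedence over `default_tags` when both define the same key, and the merged result is what the provider exposes as `tags_all`. The sketch below models that merge in Python so the effect is easy to see; `effective_tags` and `overridden_keys` are hypothetical names, not provider functions.

```python
def effective_tags(default_tags, resource_tags):
    """Approximate tags_all: provider defaults overlaid by resource-level tags."""
    merged = dict(default_tags)
    merged.update(resource_tags)  # resource-level values win on key collisions
    return merged


def overridden_keys(default_tags, resource_tags):
    """Keys where a resource replaces a provider default with a different value."""
    return sorted(
        key
        for key, value in resource_tags.items()
        if key in default_tags and default_tags[key] != value
    )


defaults = {"env": "prod", "team": "platform", "cost-center": "CC-1042"}
resource = {"product": "checkout", "cost-center": "CC-9999"}

merged = effective_tags(defaults, resource)
changed = overridden_keys(defaults, resource)
```

Here `merged` carries `product` as an added key and `cost-center` as `CC-9999`, and `changed` lists `cost-center`. A silent override like that passes the SCP, since the key is still present, so it is worth flagging in code review or a plan-time check.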
Measuring Tag Coverage
Query CUR to measure coverage — run this weekly and track the trend.
-- Empty strings are treated as untagged: CUR exports may populate
-- missing tags as '' rather than NULL.
SELECT
  DATE_TRUNC('week', line_item_usage_start_date) AS week,
  COUNT(DISTINCT line_item_resource_id) AS total_resources,
  COUNT(DISTINCT CASE
    WHEN NULLIF(resource_tags_user_team, '') IS NOT NULL
     AND NULLIF(resource_tags_user_env, '') IS NOT NULL
     AND NULLIF(resource_tags_user_cost_center, '') IS NOT NULL
    THEN line_item_resource_id
  END) AS tagged_resources,
  ROUND(
    100.0 * COUNT(DISTINCT CASE
      WHEN NULLIF(resource_tags_user_team, '') IS NOT NULL
       AND NULLIF(resource_tags_user_env, '') IS NOT NULL
       AND NULLIF(resource_tags_user_cost_center, '') IS NOT NULL
      THEN line_item_resource_id END)
    / NULLIF(COUNT(DISTINCT line_item_resource_id), 0)
  , 1) AS coverage_pct
FROM cur_db.cur_table
WHERE line_item_usage_start_date >= CURRENT_DATE - INTERVAL '90' DAY
  AND line_item_line_item_type = 'Usage'
GROUP BY 1
ORDER BY 1;

Set a team OKR: tag coverage above 95%. Below 80%, allocation reports are unreliable.
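The two thresholds can be turned into a status label for each weekly row, which makes the trend easy to alert on. This is a minimal sketch: the function names and status labels are hypothetical, and treating exactly 95% as on target is an assumption.

```python
def coverage_pct(tagged, total):
    """Percentage of resources carrying all required tags, or None with no data."""
    if total == 0:
        return None
    return round(100.0 * tagged / total, 1)


def coverage_status(pct, target=95.0, floor=80.0):
    """Classify a coverage percentage against the OKR target and reliability floor."""
    if pct is None:
        return "no-data"
    if pct >= target:  # assumption: hitting the target exactly counts as met
        return "on-target"
    if pct < floor:
        return "unreliable"
    return "needs-attention"


def weekly_report(rows):
    """rows: iterable of (week, total_resources, tagged_resources) tuples."""
    report = []
    for week, total, tagged in rows:
        pct = coverage_pct(tagged, total)
        report.append((week, pct, coverage_status(pct)))
    return report
```

Feeding the query result into `weekly_report` gives one labeled row per week; any week labeled `unreliable` is a signal to treat allocation numbers for that period with suspicion.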
Key Takeaways
- A tag taxonomy is a contract — finalize keys and allowed values in a shared document before enforcement, not after.
- SCPs are the only reliable preventive enforcement mechanism; rely on convention alone and coverage will drift within weeks.
- Retroactive cleanup with placeholder values is better than no cleanup — visible badness motivates teams faster than invisible badness.
- Terraform default_tags at the provider level eliminates the most common source of missing tags: engineers forgetting per-resource blocks.
- Tag coverage below 80% makes cost allocation guesswork; treat it as a P2 incident with an owner and an SLA.
- Measure coverage weekly from CUR, not from AWS Config alone — CUR reflects what is actually being billed, which is the only number that matters.