Kubernetes Cost Visibility
Series: Cloud Cost Engineering
Kubernetes is an excellent way to lose track of where your money goes. Nodes are shared. Pods are ephemeral. The AWS bill shows EC2 instance cost, not which application consumed it. Without an attribution layer, a $200k/month EKS bill is a single line item that nobody owns. The fix is cost allocation at the cluster level, not the cloud-account level.
The Allocation Problem
The AWS bill knows the node cost. OpenCost knows the pod cost. Joining them gives you namespace-level allocation.
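The join described above can be sketched in a few lines. This is a simplified illustration, not OpenCost's actual pricing model: the function name `allocate_node_cost`, the 50/50 CPU-to-memory price split (`cpu_weight`), and the sample pods are all assumptions made for the example. It takes one node's hourly price (what the bill knows), splits it across pods by their resource requests (what the cluster knows), and sums per namespace.

```python
from collections import defaultdict


def allocate_node_cost(node_hourly_cost, node_cpu, node_mem_gib, pods, cpu_weight=0.5):
    """Split one node's hourly cost across namespaces by pod resource requests.

    cpu_weight is an assumed share of the node price attributed to CPU;
    the remainder is attributed to memory.
    """
    cpu_rate = node_hourly_cost * cpu_weight / node_cpu              # $ per core-hour
    mem_rate = node_hourly_cost * (1 - cpu_weight) / node_mem_gib    # $ per GiB-hour

    by_namespace = defaultdict(float)
    for pod in pods:
        by_namespace[pod["namespace"]] += pod["cpu"] * cpu_rate + pod["mem_gib"] * mem_rate

    # Capacity that no pod requested is still paid for; report it as idle.
    by_namespace["__idle__"] = node_hourly_cost - sum(by_namespace.values())
    return dict(by_namespace)


# Hypothetical node: 8 cores, 32 GiB, $0.40/hour
pods = [
    {"namespace": "checkout", "cpu": 2.0, "mem_gib": 4.0},
    {"namespace": "checkout", "cpu": 1.0, "mem_gib": 4.0},
    {"namespace": "search", "cpu": 1.0, "mem_gib": 8.0},
]
costs = allocate_node_cost(0.40, node_cpu=8, node_mem_gib=32, pods=pods)
for namespace, cost in costs.items():
    print(f"{namespace:12s} ${cost:.3f}/hour")
```

The idle bucket matters: in this example half of the node's price is requested by nobody, and that share has to be either reported separately or spread across namespaces.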
OpenCost: Open Standard Cost Attribution
OpenCost is a CNCF project that implements the OpenCost specification — a vendor-neutral model for Kubernetes cost allocation. It runs as a deployment in your cluster and exposes cost metrics via Prometheus.
# OpenCost installation via Helm
# helm repo add opencost https://opencost.github.io/opencost-helm-chart
# Optional manifest: the helm install below also passes --create-namespace
apiVersion: v1
kind: Namespace
metadata:
  name: opencost
---
# values-opencost.yaml
opencost:
  exporter:
    defaultClusterId: "eks-prod-us-east-1"
    cloudProviderApiKey: ""  # optional for enhanced cloud pricing
  prometheus:
    internal:
      enabled: false  # use existing Prometheus
    external:
      enabled: true
      url: "http://prometheus-operated.monitoring.svc.cluster.local:9090"
  ui:
    enabled: true
    resources:
      requests:
        cpu: 10m
        memory: 55Mi
  metrics:
    serviceMonitor:
      enabled: true
      namespace: monitoring

helm install opencost opencost/opencost \
  --namespace opencost \
  --create-namespace \
  --values values-opencost.yaml

Querying OpenCost via the API
OpenCost exposes a REST API. Query it from your cost reporting pipeline:
import requests
from datetime import datetime, timedelta, timezone

# In-cluster service address of the OpenCost API
OPENCOST_URL = "http://opencost.opencost.svc.cluster.local:9003"


def get_namespace_costs(window_days=30):
    """
    Returns cost breakdown by namespace for the given window,
    sorted by total cost (highest first).
    """
    end = datetime.now(timezone.utc)
    start = end - timedelta(days=window_days)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    params = {
        "window": f"{start.strftime(fmt)},{end.strftime(fmt)}",
        "aggregate": "namespace",
        "accumulate": "true",  # one accumulated set for the whole window
    }
    resp = requests.get(f"{OPENCOST_URL}/allocation", params=params, timeout=30)
    resp.raise_for_status()
    # With accumulate=true, "data" holds a single set keyed by namespace;
    # guard against an empty or null result for windows with no data.
    sets = resp.json().get("data") or [{}]
    data = sets[0] or {}
    results = []
    for namespace, alloc in data.items():
        results.append({
            "namespace": namespace,
            "cpu_cost": round(alloc.get("cpuCost", 0), 2),
            "memory_cost": round(alloc.get("ramCost", 0), 2),
            "storage_cost": round(alloc.get("pvCost", 0), 2),
            "network_cost": round(alloc.get("networkCost", 0), 2),
            "total_cost": round(alloc.get("totalCost", 0), 2),
            "efficiency": round(alloc.get("totalEfficiency", 0) * 100, 1),
        })
    return sorted(results, key=lambda x: x["total_cost"], reverse=True)


if __name__ == "__main__":
    for ns in get_namespace_costs():
        print(f"{ns['namespace']:30s} ${ns['total_cost']:8.2f} efficiency={ns['efficiency']}%")

Prometheus Alerts for Namespace Overspend
Wire OpenCost metrics into your alerting stack:
# PrometheusRule for namespace cost alerts
# Check the metric names against your OpenCost /metrics output before deploying
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: opencost-namespace-budget
  namespace: monitoring
spec:
  groups:
    - name: namespace-cost-budget
      interval: 1h
      rules:
        # Alert if a namespace exceeds its monthly budget
        - alert: NamespaceCostBudgetExceeded
          expr: |
            sum by (namespace) (
              opencost_allocation_cost_total{window="30d"}
            ) > on(namespace) group_left()
              kube_namespace_labels * 0  # placeholder threshold of 0: replace with a per-namespace budget metric
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: "Namespace {{ $labels.namespace }} has exceeded budget"
            description: "Current cost: ${{ $value | humanize }}"
        # Alert on high idle cost ratio
        - alert: HighIdleCostRatio
          expr: |
            sum by (namespace) (opencost_allocation_idle_cost_total)
              / sum by (namespace) (opencost_allocation_cost_total) > 0.40
          for: 6h
          labels:
            severity: warning
          annotations:
            summary: "Namespace {{ $labels.namespace }} has >40% idle cost"

Kubecost: Enhanced Attribution with Business Context
Kubecost extends OpenCost with team/department attribution, anomaly detection, and Savings Plans/RI coverage visibility. For multi-cluster environments it is the practical choice.
Key Kubecost features beyond OpenCost:
- Cluster-level aggregation across multiple EKS clusters
- Network cost breakdown by pod pair
- Container request/limit rightsizing recommendations
- Savings Plans and RI allocation to namespaces
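Team attribution comes down to grouping allocation records by a workload label rather than by namespace. The sketch below illustrates that rollup outside of Kubecost. It is not Kubecost's implementation: the `rollup_by_label` helper, the record shape, and the sample figures are hypothetical, and the `team` label is assumed to be set consistently on workloads.

```python
from collections import defaultdict


def rollup_by_label(allocations, label="team", fallback="unallocated"):
    """Sum total_cost per value of a workload label.

    Records without the label are collected under `fallback` so that
    untagged spend stays visible instead of silently disappearing.
    """
    totals = defaultdict(float)
    for alloc in allocations:
        owner = alloc.get("labels", {}).get(label, fallback)
        totals[owner] += alloc["total_cost"]
    # Highest spend first
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical allocation records (for example, one per workload)
allocations = [
    {"name": "checkout-api", "labels": {"team": "payments"}, "total_cost": 120.0},
    {"name": "ledger", "labels": {"team": "payments"}, "total_cost": 30.5},
    {"name": "indexer", "labels": {"team": "search"}, "total_cost": 80.0},
    {"name": "debug-pod", "labels": {}, "total_cost": 12.25},
]
by_team = rollup_by_label(allocations)
for team, cost in by_team.items():
    print(f"{team:12s} ${cost:8.2f}")
```

The size of the fallback bucket is a useful health metric for the labeling policy itself.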
# values-kubecost.yaml (key settings)
kubecostProductConfigs:
  clusterName: "eks-prod-us-east-1"
  currencyCode: "USD"
  labelMappingConfigs:
    enabled: true
    owner_label: "team"  # maps k8s label 'team' to Kubecost owner
    product_label: "product"
    environment_label: "env"
global:
  prometheus:
    fqdn: "http://prometheus-operated.monitoring.svc.cluster.local:9090"
    enabled: false
networkCosts:
  enabled: true
  podMonitor:
    enabled: true

Cost Efficiency Benchmarks
Use efficiency as a signal, not just raw cost:
Efficiency = actual usage / requested resources, computed for CPU and memory and combined into one figure (weighting each by its cost keeps the units comparable). Below 40% efficiency, requests are over-provisioned relative to the actual workload: a signal to reduce resource requests so the scheduler can pack pods tighter and the cluster autoscaler can remove nodes.
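CPU cores and memory bytes cannot be added directly, so a single efficiency number needs a common unit. One reasonable choice, assumed here, is to weight each resource's usage-to-request ratio by its cost. The function names, the sample figures, and the 40% threshold constant are illustrative.

```python
OVER_PROVISIONED_THRESHOLD = 0.40  # benchmark from the text above


def resource_efficiency(usage, request):
    """Usage-to-request ratio for one resource; 0.0 when nothing is requested."""
    return usage / request if request > 0 else 0.0


def total_efficiency(cpu_usage, cpu_request, cpu_cost, ram_usage, ram_request, ram_cost):
    """Cost-weighted blend of CPU and memory efficiency."""
    total_cost = cpu_cost + ram_cost
    if total_cost == 0:
        return 0.0
    cpu_eff = resource_efficiency(cpu_usage, cpu_request)
    ram_eff = resource_efficiency(ram_usage, ram_request)
    return (cpu_eff * cpu_cost + ram_eff * ram_cost) / total_cost


def is_over_provisioned(efficiency):
    return efficiency < OVER_PROVISIONED_THRESHOLD


# Hypothetical namespace: uses 0.5 of 2 requested cores ($60 of CPU cost)
# and 3 of 4 requested GiB ($20 of memory cost).
eff = total_efficiency(
    cpu_usage=0.5, cpu_request=2.0, cpu_cost=60.0,
    ram_usage=3.0, ram_request=4.0, ram_cost=20.0,
)
print(f"efficiency={eff:.1%} over_provisioned={is_over_provisioned(eff)}")
```

In this example memory is well used (75%), but CPU dominates the cost and is only 25% used, so the blended figure lands at 37.5% and the namespace is flagged.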
Key Takeaways
- The AWS bill alone cannot tell you which application in a shared cluster owns which dollar; an in-cluster attribution layer like OpenCost is the minimum viable cost visibility for Kubernetes.
- OpenCost is a CNCF project, vendor-neutral, and free; deploy it in every cluster before you invest in commercial tooling.
- Namespace efficiency below 40% indicates over-provisioned resource requests; fix this before purchasing more nodes or Reserved Instances.
- PrometheusRule resources can wire OpenCost metrics directly into your alerting stack without any additional tooling; namespace budget alerts should be standard in every engineering team's runbook.
- Label-to-team mapping in Kubecost is the Kubernetes equivalent of tag-based cost allocation; establish the mapping at cluster provisioning time, not retroactively.
- Network cost breakdown by pod pair reveals cross-AZ traffic at the application level — the data needed to make topology-aware scheduling decisions concrete.