Cloud Cost Engineering

Kubernetes Cost Visibility

Ravinder · 5 min read
Cloud Cost, FinOps, AWS, Kubernetes, OpenCost, Kubecost

Kubernetes is an excellent way to lose track of where your money goes. Nodes are shared. Pods are ephemeral. The AWS bill shows EC2 instance cost — not which application consumed it. Without an attribution layer, a $200k/month EKS bill is a single line item that nobody owns. The fix is cost allocation at the cluster level, not the cloud-account level.

The Allocation Problem

flowchart TD
    subgraph AWS["AWS Bill"]
        EC2[EC2 Node Cost]
        LB[Load Balancer Cost]
        EBS[EBS Volume Cost]
        DT[Data Transfer Cost]
    end
    subgraph EKS["EKS Cluster"]
        N1[Node 1 - 8 vCPU, 32 GiB]
        N2[Node 2 - 8 vCPU, 32 GiB]
        N3[Node 3 - 8 vCPU, 32 GiB]
        subgraph NS1["namespace: payments"]
            P1[payment-api - 2 vCPU, 4 GiB]
            P2[worker - 1 vCPU, 2 GiB]
        end
        subgraph NS2["namespace: analytics"]
            P3[spark-driver - 4 vCPU, 16 GiB]
        end
        subgraph NS3["namespace: platform"]
            P4[prometheus - 1 vCPU, 8 GiB]
        end
    end
    EC2 -->|no native mapping| EKS
    N1 --> NS1
    N2 --> NS2
    N3 --> NS3

The AWS bill knows the node cost. OpenCost knows what each pod requests and uses on those nodes, and prices it accordingly. Joining the two gives you namespace-level allocation.
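
To make the join concrete, here is a minimal Python sketch that prices the pods from the diagram against a node's hourly cost. The node price and the even CPU/memory cost split are assumptions chosen for illustration; OpenCost derives its own rates from cloud pricing data, and this is not its implementation.

```python
# Sketch: split node cost across pods by requested resources.
# NODE_HOURLY_USD and CPU_COST_SHARE are assumed values for illustration only.

NODE_HOURLY_USD = 0.384   # assumed hourly price of one 8 vCPU / 32 GiB node
NODE_VCPU = 8
NODE_GIB = 32
CPU_COST_SHARE = 0.5      # assumed: half the node price is CPU, half is memory

# Pods from the diagram: (namespace, name, requested vCPU, requested GiB)
PODS = [
    ("payments",  "payment-api",  2, 4),
    ("payments",  "worker",       1, 2),
    ("analytics", "spark-driver", 4, 16),
    ("platform",  "prometheus",   1, 8),
]

def pod_hourly_cost(vcpu, gib):
    """Price a pod's requests at the node's per-vCPU and per-GiB rates."""
    cpu_rate = NODE_HOURLY_USD * CPU_COST_SHARE / NODE_VCPU
    ram_rate = NODE_HOURLY_USD * (1 - CPU_COST_SHARE) / NODE_GIB
    return vcpu * cpu_rate + gib * ram_rate

def namespace_hourly_costs(pods):
    """Sum pod costs per namespace."""
    totals = {}
    for namespace, _name, vcpu, gib in pods:
        totals[namespace] = totals.get(namespace, 0.0) + pod_hourly_cost(vcpu, gib)
    return totals

def idle_hourly_cost(costs, node_count):
    """Node spend that no pod request accounts for."""
    return node_count * NODE_HOURLY_USD - sum(costs.values())

if __name__ == "__main__":
    costs = namespace_hourly_costs(PODS)
    for ns, cost in sorted(costs.items()):
        print(f"{ns:12s} ${cost:.4f}/hour")
    print(f"{'idle':12s} ${idle_hourly_cost(costs, node_count=3):.4f}/hour")
```

The idle figure is the part of the bill that no namespace owns by request, which is why idle cost gets its own treatment in any allocation model.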

OpenCost: Open Standard Cost Attribution

OpenCost is a CNCF project that implements the OpenCost specification — a vendor-neutral model for Kubernetes cost allocation. It runs as a deployment in your cluster and exposes cost metrics via Prometheus.

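# OpenCost installation via Helm
# helm repo add opencost https://opencost.github.io/opencost-helm-chart
#
# namespace.yaml (optional: helm install --create-namespace also creates it)
apiVersion: v1
kind: Namespace
metadata:
  name: opencost
---
# values-opencost.yaml (a separate file; pass it to helm install with --values)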
opencost:
  exporter:
    defaultClusterId: "eks-prod-us-east-1"
    cloudProviderApiKey: ""  # optional for enhanced cloud pricing
 
  prometheus:
    internal:
      enabled: false  # use existing Prometheus
    external:
      enabled: true
      url: "http://prometheus-operated.monitoring.svc.cluster.local:9090"
 
  ui:
    enabled: true
    resources:
      requests:
        cpu: 10m
        memory: 55Mi
 
  metrics:
    serviceMonitor:
      enabled: true
      namespace: monitoring

# Shell: install the chart with the values file above
helm install opencost opencost/opencost \
  --namespace opencost \
  --create-namespace \
  --values values-opencost.yaml

Querying OpenCost via the API

OpenCost exposes a REST API. Query it from your cost reporting pipeline:

import requests
from datetime import datetime, timedelta, timezone

OPENCOST_URL = "http://opencost.opencost.svc.cluster.local:9003"

def get_namespace_costs(window_days=30):
    """
    Returns cost breakdown by namespace for the given window.
    """
    # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC time
    end   = datetime.now(timezone.utc)
    start = end - timedelta(days=window_days)

    params = {
        "window":     f"{start.strftime('%Y-%m-%dT%H:%M:%SZ')},{end.strftime('%Y-%m-%dT%H:%M:%SZ')}",
        "aggregate":  "namespace",
        "accumulate": "true",  # collapse the window into a single allocation set
    }

    resp = requests.get(f"{OPENCOST_URL}/allocation", params=params, timeout=30)
    resp.raise_for_status()
    # With accumulate=true, "data" holds one dict keyed by namespace;
    # fall back to an empty dict if the window returned nothing.
    sets = resp.json().get("data") or [{}]
    data = sets[0] or {}
 
    results = []
    for namespace, alloc in data.items():
        results.append({
            "namespace":    namespace,
            "cpu_cost":     round(alloc.get("cpuCost", 0), 2),
            "memory_cost":  round(alloc.get("ramCost", 0), 2),
            "storage_cost": round(alloc.get("pvCost", 0), 2),
            "network_cost": round(alloc.get("networkCost", 0), 2),
            "total_cost":   round(alloc.get("totalCost", 0), 2),
            "efficiency":   round(alloc.get("totalEfficiency", 0) * 100, 1),
        })
 
    return sorted(results, key=lambda x: x["total_cost"], reverse=True)
 
if __name__ == "__main__":
    for ns in get_namespace_costs():
        print(f"{ns['namespace']:30s}  ${ns['total_cost']:8.2f}  efficiency={ns['efficiency']}%")

Prometheus Alerts for Namespace Overspend

Wire OpenCost metrics into your alerting stack:

# PrometheusRule for namespace cost alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: opencost-namespace-budget
  namespace: monitoring
spec:
  groups:
    - name: namespace-cost-budget
      interval: 1h
      rules:
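        # Alert if a namespace exceeds its monthly budget.
        # NOTE: the right-hand side below is a placeholder that evaluates to 0,
        # so as written this fires for any namespace with non-zero cost.
        # Replace it with a per-namespace budget series before enabling the rule,
        # and confirm the metric names against your OpenCost /metrics endpoint.
        - alert: NamespaceCostBudgetExceeded
          expr: |
            sum by (namespace) (
              opencost_allocation_cost_total{window="30d"}
            ) > on(namespace) group_left()
            kube_namespace_labels * 0  # placeholder: join with budget ConfigMap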
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: "Namespace {{ $labels.namespace }} has exceeded budget"
            description: "Current cost: ${{ $value | humanize }}"
 
        # Alert on high idle cost ratio
        - alert: HighIdleCostRatio
          expr: |
            sum by (namespace) (opencost_allocation_idle_cost_total)
            / sum by (namespace) (opencost_allocation_cost_total) > 0.40
          for: 6h
          labels:
            severity: warning
          annotations:
            summary: "Namespace {{ $labels.namespace }} has >40% idle cost"
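
Until a budget series exists in Prometheus, the same comparison can be run in the reporting pipeline. The sketch below checks rows shaped like the output of get_namespace_costs() against a budget table. The `BUDGETS_USD` mapping, its values, and the `over_budget` helper are hypothetical; in practice the budgets might be loaded from a ConfigMap.

```python
# Sketch: compare namespace costs against per-namespace monthly budgets.
# BUDGETS_USD and its values are hypothetical examples.

BUDGETS_USD = {
    "payments": 12000.0,
    "analytics": 30000.0,
    "platform": 8000.0,
}

def over_budget(costs, budgets, default_budget=None):
    """Return (namespace, cost, budget) tuples where cost exceeds budget.

    costs: iterable of dicts with "namespace" and "total_cost" keys.
    Namespaces without a budget are skipped unless default_budget is set.
    """
    breaches = []
    for row in costs:
        budget = budgets.get(row["namespace"], default_budget)
        if budget is not None and row["total_cost"] > budget:
            breaches.append((row["namespace"], row["total_cost"], budget))
    return breaches

if __name__ == "__main__":
    # Sample rows; real input would come from get_namespace_costs()
    sample = [
        {"namespace": "payments", "total_cost": 9500.0},
        {"namespace": "analytics", "total_cost": 41250.0},
        {"namespace": "staging", "total_cost": 700.0},
    ]
    for ns, cost, budget in over_budget(sample, BUDGETS_USD):
        print(f"{ns}: ${cost:,.2f} exceeds budget ${budget:,.2f}")
```

Skipping namespaces with no budget by default keeps the check quiet during rollout; passing `default_budget` flips that to a catch-all limit once every team has been onboarded.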

Kubecost: Enhanced Attribution with Business Context

Kubecost extends OpenCost with team/department attribution, anomaly detection, and Savings Plans/RI coverage visibility. For multi-cluster environments it is the practical choice.

Key Kubecost features beyond OpenCost:

  • Cluster-level aggregation across multiple EKS clusters
  • Network cost breakdown by pod pair
  • Container request/limit rightsizing recommendations
  • Savings Plans and RI allocation to namespaces

# values-kubecost.yaml (key settings)
kubecostProductConfigs:
  clusterName: "eks-prod-us-east-1"
  currencyCode: "USD"
  labelMappingConfigs:
    enabled: true
    owner_label: "team"          # maps k8s label 'team' to Kubecost owner
    product_label: "product"
    environment_label: "env"
 
global:
  prometheus:
    fqdn: "http://prometheus-operated.monitoring.svc.cluster.local:9090"
    enabled: false
 
networkCosts:
  enabled: true
  podMonitor:
    enabled: true
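
How a label mapping turns into team-level cost can be sketched in a few lines of Python. This illustrates the idea behind `owner_label: "team"` and is not Kubecost's implementation; the workload records, the cost values, and the `__unallocated__` bucket name are hypothetical.

```python
# Sketch: roll workload costs up to an owner via a Kubernetes label.
# The record shape and the UNALLOCATED bucket name are assumptions.

from collections import defaultdict

UNALLOCATED = "__unallocated__"

def cost_by_owner(workloads, owner_label="team"):
    """Sum cost per value of owner_label; unlabeled workloads go to one bucket."""
    totals = defaultdict(float)
    for w in workloads:
        labels = w.get("labels") or {}
        owner = labels.get(owner_label) or UNALLOCATED
        totals[owner] += w["cost"]
    return dict(totals)

if __name__ == "__main__":
    workloads = [
        {"name": "payment-api", "labels": {"team": "payments"}, "cost": 120.0},
        {"name": "worker",      "labels": {"team": "payments"}, "cost": 30.0},
        {"name": "debug-pod",   "labels": {},                   "cost": 40.0},
    ]
    for owner, cost in sorted(cost_by_owner(workloads).items()):
        print(f"{owner:16s} ${cost:.2f}")
```

The size of the unallocated bucket is a useful health metric in its own right: it shows how much spend the labeling convention has not yet reached.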

Cost Efficiency Benchmarks

Use efficiency as a signal, not just raw cost:

quadrantChart
    title Namespace Cost vs Efficiency
    x-axis Low Cost --> High Cost
    y-axis Low Efficiency --> High Efficiency
    quadrant-1 Optimize Requests
    quadrant-2 Well Tuned
    quadrant-3 Investigate and Shrink
    quadrant-4 Rightsize Aggressively
    payments: [0.3, 0.75]
    analytics: [0.8, 0.4]
    platform: [0.4, 0.8]
    staging: [0.2, 0.2]
    feature-flags: [0.1, 0.6]

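Efficiency is the ratio of actual usage to requested resources, computed separately for CPU and memory and then combined; OpenCost weights the two by their cost to produce totalEfficiency. Efficiency below 40 % means requests are over-provisioned relative to the actual workload: a signal to reduce resource requests so the scheduler can pack pods more tightly and the cluster autoscaler can remove nodes.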
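
The efficiency ratio can be sketched as follows. CPU cores and GiB of memory cannot be added directly, so this sketch weights each resource's usage ratio by its cost; that weighting is an assumption made here, and the function names and sample figures are hypothetical.

```python
# Sketch: cost-weighted efficiency for one namespace.
# The weighting scheme and all sample numbers are illustrative assumptions.

def usage_ratio(used, requested):
    """Usage divided by request; 0.0 when nothing was requested."""
    return used / requested if requested > 0 else 0.0

def total_efficiency(cpu_used, cpu_requested, ram_used, ram_requested,
                     cpu_cost, ram_cost):
    """Combine CPU and memory usage ratios, weighted by each resource's cost."""
    total_cost = cpu_cost + ram_cost
    if total_cost <= 0:
        return 0.0
    weighted = (cpu_cost * usage_ratio(cpu_used, cpu_requested)
                + ram_cost * usage_ratio(ram_used, ram_requested))
    return weighted / total_cost

def needs_rightsizing(efficiency, threshold=0.40):
    """True when efficiency falls below the 40 % threshold used in this article."""
    return efficiency < threshold

if __name__ == "__main__":
    # Hypothetical namespace: uses 1 of 4 requested vCPU and 6 of 16 GiB
    eff = total_efficiency(cpu_used=1, cpu_requested=4,
                           ram_used=6, ram_requested=16,
                           cpu_cost=70.0, ram_cost=70.0)
    print(f"efficiency={eff:.1%} rightsizing={needs_rightsizing(eff)}")
```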

Key Takeaways

  • The AWS bill alone cannot tell you which application in a shared cluster owns which dollar; an in-cluster attribution layer like OpenCost is the minimum viable cost visibility for Kubernetes.
  • OpenCost is a CNCF project, vendor-neutral, and free; deploy it in every cluster before you invest in commercial tooling.
  • Namespace efficiency below 40 % indicates over-provisioned resource requests — fix this before purchasing more nodes or Reserved Instances.
  • PrometheusRule resources can wire OpenCost metrics directly into your alerting stack without any additional tooling; namespace budget alerts should be standard in every engineering team's runbook.
  • Label-to-team mapping in Kubecost is the Kubernetes equivalent of tag-based cost allocation; establish the mapping at cluster provisioning time, not retroactively.
  • Network cost breakdown by pod pair reveals cross-AZ traffic at the application level — the data needed to make topology-aware scheduling decisions concrete.