Kubernetes Pragmatic

Cost Visibility

Ravinder · 6 min read
Kubernetes · Cloud Native · DevOps · FinOps · OpenCost · Cost Optimization

A cluster without cost visibility is a cluster where engineering teams request whatever resources they want, the cloud bill arrives at the end of the month, and finance asks a question that nobody can answer: which team owns that $40,000 line item?

The default Kubernetes experience gives you a total cluster bill from your cloud provider. It tells you nothing about which namespace, which workload, or which team drove that cost. The provider's cost explorer is not much better — it shows you EC2 instance costs without any mapping to the Kubernetes workloads running on top of them.

Cost visibility in Kubernetes requires deliberate setup. Here is what that looks like in practice.

The Core Problem: Shared Nodes

The reason Kubernetes cost attribution is hard is that multiple workloads share the same nodes. A node running 15 pods from 6 different namespaces has one cloud bill line item — the EC2 instance — but the cost should be distributed across all 15 workloads proportionally.

```mermaid
graph TD
  NODE["m6i.2xlarge node — $0.384/hr"] --> P1["payments-api: 1 CPU, 2Gi"]
  NODE --> P2["orders-api: 500m CPU, 1Gi"]
  NODE --> P3["notifications: 200m CPU, 512Mi"]
  NODE --> P4["audit-logger: 100m CPU, 256Mi"]
  NODE --> P5["metrics-agent: 100m CPU, 128Mi"]
  NODE --> P6["...10 more pods"]
  P1 --> C1["Cost share: ~$0.11/hr"]
  P2 --> C2["Cost share: ~$0.055/hr"]
  P3 --> C3["Cost share: ~$0.022/hr"]
```

OpenCost and Kubecost both solve this problem the same way: they allocate node cost to pods in proportion to their resource requests, then aggregate by namespace, label, or annotation. The difference between the two is tooling and commercial support.
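To make the proportional model concrete, here is a minimal sketch in Python. The 50/50 split of node cost between CPU and memory is an assumption for illustration only; OpenCost derives its actual weighting from the provider's pricing data, so real shares (including the ones in the diagram above) will differ.

```python
# Sketch of request-based cost allocation across pods sharing one node.
# ASSUMPTION: node cost is split 50/50 between CPU and memory; the real
# tools weight this split using provider pricing data.

NODE_COST_PER_HR = 0.384  # m6i.2xlarge on-demand, illustrative
NODE_CPU = 8              # vCPUs on the node
NODE_MEM_GIB = 32         # GiB of memory on the node

def pod_cost_per_hr(cpu_request: float, mem_request_gib: float) -> float:
    """Allocate node cost to a pod in proportion to its requests."""
    cpu_share = (cpu_request / NODE_CPU) * (NODE_COST_PER_HR / 2)
    mem_share = (mem_request_gib / NODE_MEM_GIB) * (NODE_COST_PER_HR / 2)
    return cpu_share + mem_share

pods = {
    "payments-api": (1.0, 2.0),   # 1 CPU, 2Gi
    "orders-api": (0.5, 1.0),     # 500m CPU, 1Gi
    "notifications": (0.2, 0.5),  # 200m CPU, 512Mi
}
for name, (cpu, mem) in pods.items():
    print(f"{name}: ${pod_cost_per_hr(cpu, mem):.3f}/hr")
```

A useful sanity check on any allocation model: a pod requesting the whole node should be charged the whole node.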

OpenCost: The Open Source Option

OpenCost is a CNCF project (sandbox level) that runs as a pod in your cluster and exposes cost allocation via a REST API and a basic UI. For most teams, it is the right starting point.

# Install OpenCost
kubectl create namespace opencost
kubectl apply --namespace opencost \
  -f https://raw.githubusercontent.com/opencost/opencost/develop/kubernetes/opencost.yaml
 
# Expose the UI locally
kubectl port-forward -n opencost svc/opencost 9090:9090

OpenCost uses cloud provider pricing APIs to get current on-demand and spot prices for node instance types. It then allocates those costs to workloads based on CPU and memory requests.

The REST API is useful for building cost reports in your own tooling:

# Cost allocation for the last 7 days, aggregated by namespace
curl -G http://localhost:9090/model/allocation \
  --data-urlencode "window=7d" \
  --data-urlencode "aggregate=namespace" \
  --data-urlencode "step=1d" | jq '.data[] | to_entries[] | {namespace: .key, totalCost: .value.totalCost}'

Label and Tag Discipline: The Foundation

Cost visibility is only as good as your labeling strategy. OpenCost and Kubecost can aggregate by any label on your pods. Without consistent labels, you get namespace-level attribution at best.

Define the label schema once, enforce it with an admission controller, and never deviate.

# Required labels on every workload
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api
  namespace: payments
  labels:
    # Billing dimensions — every workload must have these
    team: "payments"               # Owning team
    product: "payments-platform"  # Product line
    environment: "production"     # Environment tier
    cost-center: "eng-payments"   # Finance cost center code
spec:
  template:
    metadata:
      labels:
        app: payments-api
        team: "payments"
        product: "payments-platform"
        environment: "production"
        cost-center: "eng-payments"

Enforce these labels with Kyverno:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-cost-labels
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-required-labels
      match:
        any:
          - resources:
              kinds: ["Deployment", "StatefulSet", "DaemonSet"]
              namespaces: ["*"]
      exclude:
        any:
          - resources:
              namespaces: ["kube-system", "kube-public", "monitoring", "opencost"]
      validate:
        message: "Workloads must have team, product, environment, and cost-center labels"
        pattern:
          metadata:
            labels:
              team: "?*"
              product: "?*"
              environment: "?*"
              cost-center: "?*"

This policy blocks any Deployment, StatefulSet, or DaemonSet that is missing the required labels from being applied to the cluster. The `?*` pattern requires a non-empty value, so an empty label string also fails validation. Non-compliant workloads get an admission error with a clear message.

Showback: The Report That Changes Behavior

The goal of cost visibility is not the dashboard — it is the behavioral change that comes from showing teams their actual spend. Showback is internal billing: you report costs to teams without actually charging back to department budgets. Chargeback (actual budget transfers) is the next step and usually requires organizational alignment that takes longer to achieve.

```mermaid
graph LR
  OC[OpenCost API] --> ETL[Cost ETL Job - weekly]
  ETL --> DS[Data Store - S3 / BigQuery]
  DS --> RPT[Weekly Report - Slack / Email]
  RPT --> T1[payments-team: $3,420/wk ↑12%]
  RPT --> T2[orders-team: $1,890/wk ↓5%]
  RPT --> T3[platform-team: $2,100/wk →0%]
  RPT --> T4[unattributed: $890/wk ⚠️]
```

The unattributed line — workloads without cost labels — is the most important number. It is the measure of how much of your cluster cost you cannot explain. Drive it toward zero.
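The aggregation step of that pipeline is simple enough to sketch. The entry shape below is illustrative, not the exact OpenCost API schema; the point is bucketing every workload without a `team` label into an explicit unattributed line rather than silently dropping it.

```python
# Sketch of the showback aggregation step: roll up cost allocation
# entries by the `team` label, bucketing unlabeled workloads under
# "unattributed". Entry shape is illustrative, not the real API schema.
from collections import defaultdict

def showback(allocations: list[dict]) -> dict[str, float]:
    totals: dict[str, float] = defaultdict(float)
    for entry in allocations:
        team = entry.get("labels", {}).get("team") or "unattributed"
        totals[team] += entry["totalCost"]
    return dict(totals)

weekly = [
    {"labels": {"team": "payments"}, "totalCost": 3420.0},
    {"labels": {"team": "orders"}, "totalCost": 1890.0},
    {"labels": {}, "totalCost": 890.0},  # missing cost labels
]
report = showback(weekly)
print(report)  # "unattributed" is the number to drive to zero
```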

Right-Sizing: Where the Money Actually Is

Cost visibility reveals right-sizing opportunities. Most clusters have significant over-provisioning because teams set resource requests defensively and never revisit them.

# Show top CPU consumers by current usage (compare against their requests
# to spot right-sizing candidates)
kubectl top pod -A --sort-by=cpu | head -30
 
# More useful: VPA recommendations (updateMode: Off produces
# recommendations without acting on them)
kubectl get vpa -A -o json | jq -r '
  .items[]
  | select(.status.recommendation != null)
  | {
      name: .metadata.name,
      namespace: .metadata.namespace,
      cpu_recommended: .status.recommendation.containerRecommendations[0].target.cpu,
      memory_recommended: .status.recommendation.containerRecommendations[0].target.memory
    }
'

If a pod requests 2 CPU and VPA recommends 200m, it is requesting 10x what it uses. On a node with 8 CPU, you are effectively wasting capacity that could run 9 more replicas of that pod or reduce your node count.
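The arithmetic behind that ratio is worth making explicit, because VPA reports CPU in Kubernetes quantity notation. A small sketch that converts quantities and computes the over-provisioning ratio (the `parse_cpu` helper is hypothetical and handles only plain cores and millicores):

```python
# Sketch: compute the request-to-recommendation ratio from Kubernetes
# CPU quantities (e.g. a request of "2" vs a VPA target of "200m").
# A ratio of 10 means the pod requests 10x what the recommender
# thinks it needs.

def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity to cores ("200m" -> 0.2)."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)

def overprovision_ratio(requested: str, recommended: str) -> float:
    return parse_cpu(requested) / parse_cpu(recommended)

print(overprovision_ratio("2", "200m"))  # the 10x example from the text
```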

Namespace Quotas: The Governance Layer

Once you have visibility, add governance. ResourceQuotas prevent any single namespace from consuming unlimited cluster capacity.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: payments-team-quota
  namespace: payments
spec:
  hard:
    # Compute
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    # Workload counts
    count/deployments.apps: "20"
    count/pods: "100"
    # Storage
    persistentvolumeclaims: "10"
    requests.storage: "500Gi"

When a team hits their quota, they get a clear error from the Kubernetes API. This is a forcing function for the conversation about whether they need more capacity or whether they are over-provisioned and need to right-size first.
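The headroom math behind that error is trivial but worth automating, so teams can see how close they are before admission fails. A rough sketch with illustrative numbers (the `headroom` helper is hypothetical; in a real cluster the inputs come from the ResourceQuota's `status.hard` and `status.used` fields):

```python
# Sketch: remaining quota headroom per resource for a namespace.
# ASSUMPTION: inputs are pre-parsed into floats (cores, GiB); real
# quota status values are Kubernetes quantity strings.

def headroom(hard: dict[str, float], used: dict[str, float]) -> dict[str, float]:
    """Subtract current usage from the quota's hard limits."""
    return {res: hard[res] - used.get(res, 0.0) for res in hard}

quota = {"requests.cpu": 20.0, "requests.memory_gib": 40.0}
current = {"requests.cpu": 18.5, "requests.memory_gib": 31.0}

for res, free in headroom(quota, current).items():
    print(f"{res}: {free} remaining")
```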

Key Takeaways

  • Kubernetes cost attribution requires explicit tooling because the cloud bill shows node costs, not workload costs. OpenCost and Kubecost both solve this by allocating node cost proportionally to pod resource requests.
  • Label discipline is the foundation of meaningful cost visibility. Define a required label schema (team, product, environment, cost-center) and enforce it with an admission controller before workloads proliferate.
  • Showback — reporting costs to teams without charging back — is the behavioral lever. The unattributed cost line is the most important number; drive it toward zero.
  • VPA recommendations in Off mode are the fastest way to identify over-provisioned workloads. A pod requesting 10x its actual usage is a right-sizing ticket.
  • ResourceQuotas per namespace add governance: teams get clear errors when they exceed their allocation, which starts the right conversation about capacity versus over-provisioning.
  • The cloud bill arriving at month-end with no workload attribution is a solved problem. The cost is installing OpenCost and enforcing labels. The return is visibility that actually drives engineering behavior.