Cloud Cost Engineering

Compute: Rightsizing and Graviton

Ravinder · 5 min read

Compute is typically 40–60 % of an AWS bill. It is also the easiest category to overprovision, because engineers default to "same as prod" for every environment and "one size up" whenever anything is slow. The result is a fleet where average CPU utilization sits at 8 % while the bill reflects 100 % of provisioned capacity.

Finding Oversized Instances

The signal is simple: CPU utilization over a 14-day window. Anything averaging below 20 % and peaking below 50 % is a candidate for downsizing.

import boto3
from datetime import datetime, timedelta, timezone
 
def get_low_utilization_instances(threshold_avg=20, threshold_max=50):
    ec2 = boto3.client('ec2')
    cw  = boto3.client('cloudwatch')
 
    paginator = ec2.get_paginator('describe_instances')
    instances = []
    for page in paginator.paginate(
        Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
    ):
        for reservation in page['Reservations']:
            for inst in reservation['Instances']:
                instances.append({
                    'id':   inst['InstanceId'],
                    'type': inst['InstanceType'],
                    'name': next((t['Value'] for t in inst.get('Tags', [])
                                  if t['Key'] == 'Name'), 'unnamed'),
                })
 
    end   = datetime.now(timezone.utc)
    start = end - timedelta(days=14)
    candidates = []
 
    for inst in instances:
        metrics = cw.get_metric_statistics(
            Namespace='AWS/EC2',
            MetricName='CPUUtilization',
            Dimensions=[{'Name': 'InstanceId', 'Value': inst['id']}],
            StartTime=start, EndTime=end,
            Period=86400, Statistics=['Average', 'Maximum'],
        )
        if not metrics['Datapoints']:
            continue
        avg_cpu = sum(d['Average'] for d in metrics['Datapoints']) / len(metrics['Datapoints'])
        max_cpu = max(d['Maximum'] for d in metrics['Datapoints'])
 
        if avg_cpu < threshold_avg and max_cpu < threshold_max:
            candidates.append({**inst, 'avg_cpu': round(avg_cpu, 1), 'max_cpu': round(max_cpu, 1)})
 
    return sorted(candidates, key=lambda x: x['avg_cpu'])
 
if __name__ == '__main__':
    for c in get_low_utilization_instances():
        print(f"{c['id']:20s}  {c['type']:15s}  avg={c['avg_cpu']:5.1f}%  max={c['max_cpu']:5.1f}%  ({c['name']})")

Run this across all accounts. The output is your rightsizing backlog.

Instance Family Decision Tree

flowchart TD
    A[Instance in scope] --> B{Workload type?}
    B -->|General purpose| C{Memory pressure?}
    B -->|Memory intensive| D[r7g / r8g family]
    B -->|Compute intensive| E[c7g / c8g family]
    B -->|GPU / ML| F[Stay x86 or use Inferentia]
    C -->|No| G[m7g / m8g family]
    C -->|Yes| D
    G --> H{avg CPU < 20%?}
    H -->|Yes| I[Downsize within family]
    H -->|No| J{Burstable OK?}
    J -->|Yes| K[t4g — spot candidate]
    J -->|No| G
    D --> L[Validate with load test]
    E --> L
    G --> L
    I --> L
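The "downsize within family" branch can be mechanized: given an instance type, suggest the next size down in the same family. A minimal sketch (the size ladder below is deliberately partial and only covers common sizes):

```python
# Ordered size ladder for common EC2 sizes; extend as needed.
SIZES = ["medium", "large", "xlarge", "2xlarge", "4xlarge", "8xlarge"]

def downsize(instance_type):
    """Suggest the next size down within the same instance family.

    Returns None when the instance is already at the bottom of the
    ladder, or when its size is not in the (partial) ladder above.
    """
    family, _, size = instance_type.partition(".")
    if size not in SIZES or SIZES.index(size) == 0:
        return None
    return f"{family}.{SIZES[SIZES.index(size) - 1]}"
```

Pair this with the low-utilization scan and each candidate comes out of the backlog with a concrete proposed target type.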

The Graviton Math

Graviton3 (m7g) is ~15 % cheaper than the equivalent x86 (m6i) at On-Demand prices. With a Compute Savings Plan on top, the effective discount reaches 35–46 % against On-Demand x86, depending on term and payment option.

Instance                           vCPU   RAM      On-Demand/hr   vs m6i.xlarge
m6i.xlarge (x86)                   4      16 GiB   $0.192         baseline
m7g.xlarge (Graviton3)             4      16 GiB   $0.1632        −15 %
m7g.xlarge + 1yr no-upfront CSP    4      16 GiB   ~$0.103        −46 %

For a fleet of 100 m6i.xlarge running continuously:

x86 baseline:  100 × $0.192 × 8760 = $168,192/yr
Graviton3 CSP: 100 × $0.103 × 8760 = $90,228/yr
Annual saving:  $77,964  (~46%)

That is not a rounding error. That is a hiring decision.
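The arithmetic is worth scripting so it can be rerun for any fleet size or rate. A small helper reproducing the figures above (hourly rates are taken from the table, with the CSP rate approximate):

```python
HOURS_PER_YEAR = 8760  # 24 x 365

def annual_cost(count, hourly_rate, hours=HOURS_PER_YEAR):
    """Annual cost for a fleet of `count` instances running 24/7."""
    return count * hourly_rate * hours

x86_baseline = annual_cost(100, 0.192)   # m6i.xlarge On-Demand
graviton_csp = annual_cost(100, 0.103)   # m7g.xlarge + 1yr no-upfront CSP
saving = x86_baseline - graviton_csp     # ~$77,964/yr, ~46 %
```

Swap in your own fleet counts and the negotiated rates from Cost Explorer to get the number for your bill.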

Migration Playbook

A safe Graviton migration follows four stages.

sequenceDiagram
    participant Dev as Dev Environment
    participant CI as CI Pipeline
    participant Staging as Staging (Graviton)
    participant Prod as Production
    Dev->>CI: Build multi-arch Docker image (amd64 + arm64)
    CI->>CI: Run test suite on both architectures
    CI->>Staging: Deploy arm64 image to Graviton staging
    Staging->>Staging: 7-day soak — latency, error rate, memory
    Staging->>Prod: Canary 5% traffic on Graviton nodes
    Prod->>Prod: Monitor 24h — auto-rollback on error spike
    Prod->>Prod: Promote to 100% Graviton
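The auto-rollback decision in the canary stage reduces to a threshold check over a monitoring window. A sketch of the pure decision logic; the per-period counts would come from your metrics source (for example ALB 5xx and request-count metrics in CloudWatch, which is an assumption about the serving stack):

```python
def should_rollback(error_counts, request_counts, max_error_rate=0.01):
    """Decide whether the canary has breached its error budget.

    error_counts / request_counts are per-period samples, e.g. one
    value per 5-minute monitoring window. The 1% default threshold
    is illustrative; tune it to the service's SLO.
    """
    total_requests = sum(request_counts)
    if total_requests == 0:
        return False  # no traffic yet: don't roll back on no data
    return sum(error_counts) / total_requests > max_error_rate
```

Wire this into the deployment tool: if it returns True at any point during the 24-hour window, shift traffic back to the x86 node group.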

Build the multi-arch image in CI. The Dockerfile itself doesn't change — the build pipeline adds the platform flag:

# Dockerfile (unchanged — works on both architectures)
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
# GitHub Actions — multi-arch build
# (QEMU + Buildx setup is required for cross-platform builds)
- name: Set up QEMU
  uses: docker/setup-qemu-action@v3
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v3
- name: Build and push multi-arch image
  uses: docker/build-push-action@v5
  with:
    platforms: linux/amd64,linux/arm64
    push: true
    tags: ${{ env.IMAGE_URI }}:${{ github.sha }}

Terraform: EKS Node Group with Graviton

resource "aws_eks_node_group" "graviton" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "graviton-general"
  node_role_arn   = aws_iam_role.node.arn
  subnet_ids      = var.private_subnet_ids
 
  ami_type       = "AL2023_ARM_64_STANDARD"
  instance_types = ["m7g.xlarge", "m7g.2xlarge"]  # multi-type for Spot fallback
 
  scaling_config {
    desired_size = 3
    min_size     = 1
    max_size     = 20
  }
 
  labels = {
    "kubernetes.io/arch" = "arm64"
    "node.kubernetes.io/instance-family" = "graviton"
  }
 
  tags = {
    managed-by = "terraform"
    team       = var.team
    env        = var.environment
  }
}

Common Migration Blockers

Blocker                               Resolution
Native x86 binaries in Docker image   Rebuild from source; remove pre-built amd64 wheels
JVM running in 32-bit mode            Pass -XX:+UseCompressedOops (on by default in modern JDKs)
Node.js native addons                 npm rebuild on an arm64 base image
Python C extensions                   Use multi-arch wheels from PyPI or build in CI
Lambda functions                      Change architecture to arm64 in the function config — no rebuild needed for interpreted runtimes

Key Takeaways

  • Average CPU below 20 % over 14 days is a reliable rightsizing signal; add memory utilization from CloudWatch agent for a complete picture.
  • Graviton3 is 10–15 % cheaper at On-Demand and 35–46 % cheaper when combined with Compute Savings Plans — this is the single largest lever on the compute line.
  • Multi-arch Docker images are the prerequisite; build them in CI before touching any production infrastructure.
  • Canary deployment with automatic rollback eliminates the migration risk that teams fear; the technical risk of Graviton is near zero for containerized workloads.
  • Lambda arm64 is the easiest Graviton win — one config change, no Dockerfile, immediate 20 % cost reduction.
  • Rightsizing and Graviton are independent optimizations that compound; apply both and the savings are multiplicative, not additive.