Spot, Savings Plans, RIs
Series: Cloud Cost Engineering
AWS gives you four ways to pay for compute: On-Demand, Spot, Savings Plans, and Reserved Instances. Most teams use only one, On-Demand, and overpay by 40–70 % for the privilege of flexibility they do not need. The correct approach is a layered coverage strategy where each pricing model covers a different band of workload characteristics.
The Pricing Model Hierarchy
The target: Spot covers batch and stateless burst. Savings Plans cover the predictable baseline. On-Demand covers peak and anything not yet characterized. Reserved Instances are increasingly narrow in scope: use them only when you need specific instance-type guarantees (e.g., bare metal, specific tenancy) or for services that Compute Savings Plans do not cover, such as RDS.
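The layering above can be sketched as a small decision function. This is a simplified illustration, not a sizing tool: the `Workload` fields, the rule order, and the sample fleet are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    interruptible: bool         # tolerates a Spot interruption
    predictable: bool           # steady baseline with usage history
    needs_fixed_instance: bool  # e.g. bare metal or specific tenancy


def pricing_layer(w: Workload) -> str:
    """Map a workload to one pricing layer (illustrative rules only)."""
    if w.needs_fixed_instance:
        return "Reserved Instance"
    if w.interruptible:
        return "Spot"
    if w.predictable:
        return "Savings Plan"
    return "On-Demand"  # peak and anything not yet characterized


# Hypothetical fleet used only to exercise the rules.
fleet = [
    Workload("nightly-batch", True, False, False),
    Workload("api-baseline", False, True, False),
    Workload("new-service", False, False, False),
    Workload("bare-metal-host", False, True, True),
]
for w in fleet:
    print(f"{w.name}: {pricing_layer(w)}")
```

Real fleets rarely sort this cleanly; a single service often splits across layers, with its baseline under a Savings Plan and its burst on Spot or On-Demand.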
Spot Instances: Workloads That Qualify
Spot works when interruption is tolerable. AWS gives a two-minute interruption notice before it reclaims a Spot Instance.
| Workload | Spot-suitable? | Strategy |
|---|---|---|
| EKS stateless pods | Yes | Karpenter with multi-family spot pools |
| Batch / EMR / Glue | Yes | Spot with On-Demand master |
| CI/CD runners | Yes | Spot with EBS-backed state |
| WebSocket / stateful sessions | No | On-Demand or Reserved |
| Databases | No | RDS Reserved Instances |
| Leader-elected services | Partial | Spot workers, On-Demand leader |
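For workloads that qualify, the interruption notice is what a drain hook reacts to. On EC2 the notice is exposed through the instance metadata path `spot/instance-action` as a small JSON document with `action` and `time` fields. The sketch below only parses such a payload and computes the time remaining; fetching it from the metadata service is left out so the example runs anywhere, and the sample payload and function name are made up for illustration.

```python
import json
from datetime import datetime, timezone


def seconds_until_interruption(notice_body: str, now: datetime) -> float:
    """Return seconds left before the action time in a Spot interruption notice."""
    notice = json.loads(notice_body)
    action_time = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    action_time = action_time.replace(tzinfo=timezone.utc)  # notice times are UTC
    return (action_time - now).total_seconds()


# Hypothetical payload in the documented shape.
sample = '{"action": "terminate", "time": "2024-05-01T12:02:00Z"}'
now = datetime(2024, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
remaining = seconds_until_interruption(sample, now)
print(f"{remaining:.0f} seconds left to drain")
```

Anything that cannot checkpoint or drain inside that window belongs in the "No" rows of the table.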
Karpenter makes Spot operationally simple. Configure multiple instance families to minimize interruption probability:
```yaml
# Karpenter NodePool: Spot with fallback to On-Demand
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-general
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]  # spot preferred, on-demand fallback
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64", "amd64"]  # Graviton and x86 both allowed; Karpenter picks by price and availability
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["m7g", "m7i", "m6g", "m6i", "c7g", "c7i"]
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["xlarge", "2xlarge", "4xlarge"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized  # v1 value; v1beta1 called this WhenUnderutilized
    consolidateAfter: 30s
  limits:
    cpu: "1000"
    memory: 4000Gi
```

Savings Plans: The Math
Compute Savings Plans apply to EC2, Fargate, and Lambda. You commit to a dollar-per-hour spend, not a specific instance type.
Example: a production fleet averaging $8,000/month in EC2 On-Demand.
```python
def savings_plan_analysis(
    monthly_on_demand: float,
    commitment_pct: float = 0.70,  # cover 70% of baseline
    discount_1yr: float = 0.33,    # ~33% off with 1-year no-upfront
    discount_3yr: float = 0.50,    # ~50% off with 3-year no-upfront
):
    """
    Simple Savings Plan sizing model.
    Commit to predictable baseline, leave headroom for On-Demand burst.
    """
    annual_on_demand = monthly_on_demand * 12
    committed_monthly = monthly_on_demand * commitment_pct  # On-Demand value covered
    # The commitment is stated in discounted dollars, so apply the discount.
    hourly_commitment = committed_monthly * (1 - discount_1yr) / 730  # 730 hours/month

    cost_1yr = (committed_monthly * (1 - discount_1yr) * 12
                + (monthly_on_demand - committed_monthly) * 12)
    cost_3yr = (committed_monthly * (1 - discount_3yr) * 12 * 3
                + (monthly_on_demand - committed_monthly) * 12 * 3)
    saving_1yr = annual_on_demand - cost_1yr
    saving_3yr_annual = (annual_on_demand * 3 - cost_3yr) / 3

    print(f"On-Demand baseline: ${monthly_on_demand:,.0f}/mo ${annual_on_demand:,.0f}/yr")
    print(f"Hourly commitment (1-yr): ${hourly_commitment:.2f}/hr")
    print(f"1-yr Savings Plan saving: ${saving_1yr:,.0f}/yr")
    print(f"3-yr Savings Plan saving: ${saving_3yr_annual:,.0f}/yr (annualized)")
    return {"hourly_commitment": hourly_commitment, "saving_1yr": saving_1yr}


savings_plan_analysis(monthly_on_demand=8000)
# Output:
# On-Demand baseline: $8,000/mo $96,000/yr
# Hourly commitment (1-yr): $5.14/hr
# 1-yr Savings Plan saving: $22,176/yr
# 3-yr Savings Plan saving: $33,600/yr (annualized)
```

Do not commit 100 % of baseline. Commit 70–80 % and let On-Demand cover variance. Overcommitment is worse than undercommitment, because unused commitment is still charged.
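The asymmetry between over- and undercommitment can be checked with a few lines of arithmetic. The sketch below is a toy model under stated assumptions: usage is expressed in On-Demand-equivalent dollars per hour, a flat 33 % discount applies to everything covered, and the steady $10/hr fleet is hypothetical.

```python
def monthly_cost(hourly_usage, covered_capacity, discount):
    """
    Total cost over a list of hourly On-Demand-equivalent usage values.

    covered_capacity is the On-Demand-equivalent dollars per hour the plan
    covers. The plan costs covered_capacity * (1 - discount) every hour,
    whether or not the usage shows up.
    """
    commitment = covered_capacity * (1 - discount)
    total = 0.0
    for usage in hourly_usage:
        overflow = max(0.0, usage - covered_capacity)  # billed at On-Demand
        total += commitment + overflow
    return total


# Hypothetical fleet: steady $10/hr On-Demand equivalent for 730 hours.
usage = [10.0] * 730
on_demand = sum(usage)
for capacity in (8.0, 10.0, 12.0):
    cost = monthly_cost(usage, capacity, discount=0.33)
    print(f"cover ${capacity:.0f}/hr -> ${cost:,.0f} (On-Demand: ${on_demand:,.0f})")
```

In this model, covering $2/hr too little forgoes $0.33 of savings per uncovered dollar, while covering $2/hr too much wastes $0.67 per unused dollar, so the overcommitted case costs roughly twice as much relative to the exact fit.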
Reserved Instances: When They Still Make Sense
RIs outperform Savings Plans in specific cases:
- RDS databases: Compute Savings Plans do not cover RDS. RDS RIs save 40–60 %.
- Specific instance-type requirement: If you need r6i.32xlarge and will always need exactly that, an RI is more predictable than a Compute Savings Plan (CSP).
- Convertible RIs: When you want commitment flexibility without an hourly-dollar commitment.
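"Will always need exactly that" can be made concrete as a break-even utilization: the share of the term an instance must actually run before the RI beats paying On-Demand for the same hours. This is a minimal sketch; the function name is illustrative and the rates are hypothetical, not quoted AWS prices.

```python
def ri_break_even_utilization(on_demand_hourly, ri_upfront, ri_hourly, term_years=1):
    """Fraction of the term an instance must run for the RI to beat On-Demand."""
    hours = term_years * 8760
    ri_total = ri_upfront + ri_hourly * hours  # owed whether or not the instance runs
    break_even_hours = ri_total / on_demand_hourly
    return break_even_hours / hours


# Hypothetical rates chosen for round numbers.
u = ri_break_even_utilization(on_demand_hourly=1.00, ri_upfront=1000, ri_hourly=0.50)
print(f"Break-even utilization: {u:.0%}")
```

If expected utilization sits below that fraction, On-Demand is the cheaper choice despite the higher hourly rate.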
```python
def ri_payback_analysis(
    on_demand_hourly: float,
    ri_upfront: float,
    ri_hourly: float,
    term_years: int = 1,
):
    hours = term_years * 8760
    on_demand_total = on_demand_hourly * hours
    ri_total = ri_upfront + (ri_hourly * hours)
    saving = on_demand_total - ri_total
    # Months of hourly savings needed to recover the upfront payment
    payback_months = ri_upfront / ((on_demand_hourly - ri_hourly) * 730) if ri_upfront > 0 else 0
    print(f"On-Demand {term_years}yr: ${on_demand_total:,.0f}")
    print(f"RI {term_years}yr total: ${ri_total:,.0f}")
    print(f"Net saving: ${saving:,.0f} ({saving/on_demand_total*100:.0f}%)")
    if payback_months:
        print(f"Upfront payback: {payback_months:.1f} months")


# RDS db.r6g.2xlarge: On-Demand $0.96/hr, 1yr partial-upfront RI
ri_payback_analysis(on_demand_hourly=0.96, ri_upfront=1500, ri_hourly=0.506)
```

Coverage Target Framework
Optimal coverage rates by workload maturity:
| Maturity | Spot % | CSP % | On-Demand % |
|---|---|---|---|
| Early / unpredictable | 0 | 0 | 100 |
| Growing, 3 months data | 10 | 30 | 60 |
| Stable, 6+ months data | 25 | 55 | 20 |
| Optimized | 35 | 55 | 10 |
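To turn the table into dollar targets, the percentages can be applied to a monthly compute figure. The sketch below assumes the percentages are shares of total compute spend, which the table does not state explicitly; the dictionary keys and function name are illustrative.

```python
# Target mix from the table above, as fractions of compute spend (assumed basis).
COVERAGE_TARGETS = {
    "early": {"spot": 0.00, "csp": 0.00, "on_demand": 1.00},
    "growing": {"spot": 0.10, "csp": 0.30, "on_demand": 0.60},
    "stable": {"spot": 0.25, "csp": 0.55, "on_demand": 0.20},
    "optimized": {"spot": 0.35, "csp": 0.55, "on_demand": 0.10},
}


def target_mix(monthly_compute_usd: float, maturity: str) -> dict:
    """Split a monthly compute budget across pricing models for a maturity stage."""
    shares = COVERAGE_TARGETS[maturity]
    return {model: round(monthly_compute_usd * share, 2) for model, share in shares.items()}


print(target_mix(8000, "stable"))
# {'spot': 2000.0, 'csp': 4400.0, 'on_demand': 1600.0}
```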
CUR Query: Current Coverage
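```sql
-- Spot usage has a SpotUsage usage type, and On-Demand rows carry the
-- line item type 'Usage', so purchase options are labeled explicitly.
-- Shares are approximate: Savings Plan rows carry On-Demand-equivalent cost.
SELECT
  CASE
    WHEN line_item_usage_type LIKE '%SpotUsage%' THEN 'Spot'
    WHEN line_item_line_item_type = 'SavingsPlanCoveredUsage' THEN 'SavingsPlanCoveredUsage'
    WHEN line_item_line_item_type = 'DiscountedUsage' THEN 'Reserved'
    ELSE 'OnDemand'
  END AS purchase_option,
  ROUND(SUM(line_item_unblended_cost), 0) AS cost_usd,
  ROUND(
    100.0 * SUM(line_item_unblended_cost)
    / SUM(SUM(line_item_unblended_cost)) OVER ()
  , 1) AS pct_of_compute
FROM cur_db.cur_table
WHERE line_item_product_code = 'AmazonEC2'
  AND (line_item_usage_type LIKE '%BoxUsage%'
       OR line_item_usage_type LIKE '%SpotUsage%')
  AND line_item_line_item_type IN ('Usage', 'DiscountedUsage', 'SavingsPlanCoveredUsage')
  AND line_item_usage_start_date >= DATE_TRUNC('month', CURRENT_DATE)
GROUP BY 1
ORDER BY cost_usd DESC;
```

If OnDemand is above 60 %, you have an immediate Savings Plan opportunity. If SavingsPlanCoveredUsage approaches 100 % of your baseline, stop buying: you are at risk of overcommitment.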
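The two rules of thumb can be applied mechanically to the query totals. This is a hedged sketch: the function name is hypothetical, and the `near_full` cutoff of 0.95 is an assumed reading of "approaches 100 %", not a figure from AWS.

```python
def coverage_advice(on_demand_pct: float, sp_covered_usd: float,
                    baseline_usd: float, near_full: float = 0.95) -> list:
    """
    Apply the two coverage rules to one month of CUR totals.

    on_demand_pct: On-Demand share of compute cost, 0-100.
    sp_covered_usd / baseline_usd: Savings Plan covered usage vs stable baseline.
    near_full: assumed cutoff for "approaches 100 %".
    """
    advice = []
    if on_demand_pct > 60:
        advice.append("On-Demand share above 60%: size a Savings Plan")
    if baseline_usd > 0 and sp_covered_usd / baseline_usd >= near_full:
        advice.append("Savings Plan coverage near baseline: stop buying")
    return advice or ["No action from these two rules"]


print(coverage_advice(on_demand_pct=72.0, sp_covered_usd=1000, baseline_usd=5600))
print(coverage_advice(on_demand_pct=15.0, sp_covered_usd=5500, baseline_usd=5600))
```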
Key Takeaways
- Spot, Savings Plans, and On-Demand are complementary layers — applying all three to appropriate workload bands is how mature teams achieve 50–65 % effective discounts on compute.
- Commit 70–80 % of your stable baseline to Savings Plans, not 100 % — over-commitment means paying for compute you are not using, which is worse than staying On-Demand.
- Karpenter with multi-family Spot pools reduces interruption probability below 5 % for most regions; this makes Spot operationally viable for stateless production workloads.
- RDS Reserved Instances remain the best way to discount database spend — Savings Plans do not apply to RDS, so this is not optional if databases represent significant spend.
- Use the CUR `line_item_line_item_type` field weekly to track coverage trends; the goal is to watch the On-Demand percentage decrease each month as commitments are added.
- Never purchase a Savings Plan or RI during the first 60 days of a new workload; wait until utilization patterns stabilize before committing.