
Capacity Estimation That Holds Up

Ravinder · 6 min read
System Design · Interviews · Architecture · Capacity Planning

Capacity estimation has a bad reputation. Candidates treat it as an arithmetic ritual — write big numbers, divide by other big numbers, nod sagely, move on. Interviewers see through this immediately. Real estimation is about building intuition, catching contradictions, and making numbers inform architecture. Get it right and your subsequent design choices look purposeful. Get it wrong and everything downstream is unmoored.

Why Numbers Shape Architecture

A system handling 1,000 requests per second is a different problem than one handling 100,000. At 1K RPS, a single well-tuned application server probably handles it. At 100K RPS, you are making choices about horizontal sharding, connection pooling, cache topology, and write amplification before you've drawn a single box. The number is not trivia — it is the architecture's first constraint.

The same applies to storage. An event log at 1KB per event at 10 million events per day is 10GB/day — around 3.6TB/year. That fits on a single NVMe drive. At 100 million events per day, you're at 36TB/year. Now you're talking distributed storage, tiered archival, and compaction strategies. Same design prompt, completely different systems.
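These growth figures are easy to verify in a few lines. A minimal sketch, using the event sizes and rates from the paragraph above (and 1 TB = 10^12 bytes, per the unit table below):

```python
def yearly_storage_tb(events_per_day: float, bytes_per_event: int) -> float:
    """Raw (unreplicated) storage accumulated per year, in terabytes."""
    return events_per_day * bytes_per_event * 365 / 1e12

# 10M events/day at 1 KB each: ~3.65 TB/year (single-NVMe territory)
print(yearly_storage_tb(10e6, 1000))   # ~3.65
# 100M events/day: ~36.5 TB/year (distributed-storage territory)
print(yearly_storage_tb(100e6, 1000))  # ~36.5
```

Note this is raw data only; replication and backups (covered below) multiply it further.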

The Numbers You Must Memorize

A short table of latency and throughput anchors that interviewers expect you to know:

| Operation | Approximate Latency |
| --- | --- |
| L1 cache reference | ~1 ns |
| Main memory read | ~100 ns |
| SSD random read | ~100 µs |
| HDD seek | ~10 ms |
| Network round-trip (same DC) | ~0.5 ms |
| Network round-trip (cross-continent) | ~150 ms |

| Unit | Value |
| --- | --- |
| 1 million seconds | ~11.6 days |
| 1 billion requests/day | ~11,600 RPS |
| 1 TB | 10^12 bytes |
| Average web request | 1–10 KB |
| Average image | 100 KB–1 MB |
| HD video frame | ~1 MB |

Internalizing these means you can sanity-check on the fly rather than pausing to calculate from scratch.
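Two of these conversions come up constantly and are worth being able to do reflexively. A sketch of both:

```python
SECONDS_PER_DAY = 86_400

def rps_from_daily(requests_per_day: float) -> float:
    """Average requests/second from a daily total."""
    return requests_per_day / SECONDS_PER_DAY

def seconds_to_days(seconds: float) -> float:
    """Convert a duration in seconds to days."""
    return seconds / SECONDS_PER_DAY

print(round(rps_from_daily(1e9)))      # 11574 -> the "1B/day ≈ 11,600 RPS" anchor
print(round(seconds_to_days(1e6), 1))  # 11.6  -> "1 million seconds ≈ 11.6 days"
```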

A Worked Example: Ride-Sharing Dispatch

Say you're designing a ride-sharing platform with 10 million daily active riders whose traffic concentrates sharply into commute-hour peaks.

Step 1 — Daily events: Assume each rider takes ~1 trip every 3 days on average. That's 10M × (1/3) ≈ 3.3M trips/day. Each trip generates location pings, dispatch calls, and status updates; call it 20 backend events per trip. That's ~66M events/day.

Step 2 — QPS: 66M / 86,400 seconds ≈ 764 QPS average. Traffic concentrates into commute hours, so assume the peak runs ~3× the average, giving ~2,300 QPS at peak.

Step 3 — Storage: Location pings at 200 bytes each, 66M per day = ~13 GB/day of raw event data. Retention for 90 days = ~1.17 TB. Manageable on a moderate cluster, but you'll want time-series partitioning.
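The three steps above can be reproduced as a short script. All inputs are the assumptions stated in the walkthrough (the 3× peak factor is a rough concentration estimate, not a measured value); the text's "~1.17 TB" uses 13 GB/day rounded down, so the script lands slightly higher:

```python
SECONDS_PER_DAY = 86_400

trips_per_day = 10e6 / 3                 # 10M riders, ~1 trip per 3 days -> ~3.3M
events_per_day = 66e6                    # ~3.3M trips * 20 events, rounded as above
avg_qps = events_per_day / SECONDS_PER_DAY
peak_qps = avg_qps * 3                   # assumed 3x peak concentration
daily_gb = events_per_day * 200 / 1e9    # 200 bytes per location ping/event
retained_tb = daily_gb * 90 / 1000       # 90-day retention, raw (unreplicated)

print(round(avg_qps), round(peak_qps))            # 764 2292
print(round(daily_gb, 1), round(retained_tb, 2))  # 13.2 1.19
```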

```mermaid
flowchart TD
    A[DAU: 10M] --> B[Trips/day: 3.3M]
    B --> C[Events/trip: 20]
    C --> D[Total events/day: 66M]
    D --> E[Avg QPS: ~764]
    E --> F[Peak QPS: ~2300 at 3x multiplier]
    D --> G[Storage: 200B/event]
    G --> H[Daily: ~13GB]
    H --> I[90-day: ~1.17TB]
```

Common Errors and How to Catch Them

Error 1 — Forgetting the replication factor: You calculate 10 TB of raw storage. But with 3x replication that's 30 TB. And if you're keeping 7-day backups, add another multiplier. Always state your replication assumption.
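The compounding is easy to see in two lines. The 3x replication and single full backup set here are illustrative assumptions (real backup sizing depends on incrementals and compression):

```python
raw_tb = 10          # raw figure you first quoted
replication = 3      # stated replication assumption
backup_sets = 1      # e.g. one full 7-day backup copy, assumed uncompressed

total_tb = raw_tb * replication + raw_tb * backup_sets
print(total_tb)  # 40, already 4x the headline number
```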

Error 2 — Confusing RPS and concurrent connections: 10,000 RPS does not mean 10,000 concurrent connections. If average request latency is 50ms, concurrent connections ≈ RPS × latency = 10,000 × 0.05 = 500 concurrent. This matters for connection pool sizing.
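This is Little's Law, and it makes the check a one-liner (numbers from the example above):

```python
def concurrent_connections(rps: float, avg_latency_s: float) -> float:
    """Little's Law: average concurrency = arrival rate * time in system."""
    return rps * avg_latency_s

print(concurrent_connections(10_000, 0.050))  # 500.0
```

The same formula sizes thread pools and database connection pools; if latency degrades under load, concurrency climbs with it, which is why pools exhaust during incidents.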

Error 3 — Ignoring write amplification: If your write path touches a primary DB, two replicas, a CDC stream, and a cache invalidation — your "1 write" is really 5 I/O operations. Factor this into server and network capacity.

Error 4 — Linear scaling assumptions: Doubling traffic rarely means that doubling your database capacity keeps you safe. Hotspots, lock contention, and index scan costs make database load non-linear. Plan for 2–3x headroom, not 1.1x.

Sanity Checks That Actually Work

After you produce a number, always check it against human intuition:

  • "Netflix has ~250M users. Our social app has 50M DAUs. So we're roughly 20% of Netflix's scale. Netflix reportedly serves billions of streaming minutes per day. Does my estimate put me in the right ballpark?"
  • "A modern server can handle 10,000–50,000 HTTP requests per second depending on the task. My peak QPS is 8,000. So a single server could technically handle it, but I'd want at least 3 for redundancy. Does that match the team size and operational complexity implied?"
  • "I calculated 50ms average read latency from my cache. But cache hit rates depend on working set size. If my working set is 100GB and I allocate 10GB of cache, my hit rate is maybe 60–70%, not 99%. Does that change my storage read QPS significantly?"

Presenting the Numbers

The presentation matters almost as much as the math. Good candidates write numbers on the whiteboard (or in the shared doc) rather than speaking them into the air. Structure it:

```
Assumptions:
- 100M DAU
- 5 reads per user per session, 1 write
- Average read payload: 2KB
- Average write payload: 500B
- Peak factor: 3x

Read QPS:  100M * 5 / 86,400 ≈ 5,787 avg → ~17,000 peak
Write QPS: 100M * 1 / 86,400 ≈ 1,157 avg → ~3,500 peak
Storage (writes only, 1 year, 3x replication):
  1,157 * 500B * 86,400 * 365 * 3 ≈ 54.8 TB
```

Written down, it's reviewable. Verbal estimation is forgettable and harder to critique constructively.

Key Takeaways

  • Numbers anchor architecture — the same prompt produces radically different systems at 1K vs. 100K RPS.
  • Memorize a small table of latency anchors and unit conversions so you can sanity-check in real time without pausing.
  • Always state your replication factor, backup policy, and peak multiplier — they are multipliers on everything.
  • Concurrent connections and RPS are different; understand Little's Law: concurrency = RPS × latency.
  • Sanity-check estimates against known public systems (Netflix, Stripe, Twitter) to catch order-of-magnitude errors.
  • Write numbers on the board — reviewable math signals rigor; verbal-only estimation signals improvisation.