
The Five-Box Drawing and What Each Box Hides

Ravinder · 8 min read
System Design · Interviews · Architecture · Infrastructure

Walk into any system design interview and within ten minutes you will see the same picture: client on the left, load balancer, app servers in the middle, cache, database on the right. Interviewers have seen this diagram ten thousand times. The diagram itself is not wrong — it is often correct. But candidates who draw it and move on are failing silently. The boxes are not the design; what you say about the inside of each box is.

The Standard Drawing and Its Lie

The five-box sketch is a starting sketch, not a final answer. It is the equivalent of writing "sort the array" as the solution to a complex algorithmic problem. The shape is correct. Everything that matters is missing.

```mermaid
flowchart LR
    A[Client] --> B[Load Balancer]
    B --> C[App Servers]
    C --> D[Cache]
    C --> E[Database]
```

Every senior candidate draws this and immediately starts excavating each box. What kind of load balancer? What algorithm? Stateless or sticky sessions? What is cached — full responses or objects? What is the cache eviction policy? What database — relational, document, wide-column, graph? What consistency level? What does the replication topology look like? This is where interviews are won.

Box 1: The Client

"Client" is usually shorthand for a browser, a mobile app, or another service. Each behaves radically differently under your architecture.

A browser enforces CORS, caches aggressively via service workers, and has a per-host connection limit. A mobile client on a 3G network may see 150ms+ base latency and may lose connections mid-request. A machine-to-machine service client may batch requests and handle retries differently.

What to surface: connection protocol (HTTP/2 vs WebSockets vs gRPC), retry logic ownership, timeout expectations, and whether the client can tolerate eventual consistency or demands linearizable reads.
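To make retry ownership concrete, here is a minimal sketch of client-side retries with exponential backoff and full jitter; `request_fn`, the attempt count, and the timeout budget are illustrative placeholders, not values from any particular system:

```python
import random
import time

def call_with_retries(request_fn, max_attempts=4, base_delay_s=0.1, timeout_budget_s=2.0):
    """Retry a request with exponential backoff and full jitter.

    request_fn is any zero-argument callable that raises on failure,
    a stand-in for whatever HTTP/gRPC client the real system uses.
    """
    start = time.monotonic()
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except Exception:
            # Give up once the attempt budget or total timeout budget is spent.
            if attempt == max_attempts - 1 or time.monotonic() - start > timeout_budget_s:
                raise
            # Full jitter: sleep a random fraction of the capped backoff window
            # so a fleet of clients does not retry in lockstep.
            time.sleep(random.uniform(0, base_delay_s * 2 ** attempt))
```

Whether this logic lives in the browser, a mobile SDK, or a service mesh sidecar is exactly the kind of ownership question worth raising aloud.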

The lie: "client" implies a single known entity. In practice, your client tier is heterogeneous — web, iOS, Android, internal services — each with different retry budgets, payload tolerances, and auth mechanisms.

Box 2: The Load Balancer

Load balancers hide multitudes. The relevant decisions:

Layer 4 vs Layer 7: L4 (TCP-level) is faster and simpler; it forwards packets based on IP/port. L7 (HTTP-level) can inspect headers, route by URL path, handle SSL termination, and make content-aware routing decisions. For most API systems, you want L7.

Algorithm: Round-robin is the default. Least-connections is better when requests have variable latency. Consistent hashing is essential when you need sticky routing to stateful backends (session servers, WebSocket handlers).
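To make the consistent-hashing option concrete, here is a toy ring in Python; the node names and virtual-node count are hypothetical:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Toy consistent-hash ring for sticky routing to stateful backends."""

    def __init__(self, nodes, vnodes=100):
        # Each physical node gets many virtual points on the ring,
        # which smooths the key distribution across nodes.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        # Walk clockwise to the first virtual point at or past the key's hash.
        idx = bisect.bisect(self._ring, (self._hash(key), "")) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["ws-1", "ws-2", "ws-3"])
print(ring.node_for("session:abc123"))  # the same session always maps to the same node
```

Adding or removing a node only moves the keys adjacent to its virtual points, which is the property that makes this suitable for stateful backends.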

Health checks: Active vs passive. Active health checks poll backends on a fixed interval; passive ones infer failure from errors in real traffic. The interval and threshold matter — a 30-second health check interval means a failed node can keep receiving traffic for up to 30 seconds before it is ejected.
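A minimal sketch of an active checker, where `probe` stands in for an HTTP request to each backend's health endpoint and the interval and threshold are illustrative:

```python
import time

def health_check_loop(backends, probe, interval_s=5, fail_threshold=3):
    """Poll every backend on a fixed interval; eject a backend only after
    fail_threshold consecutive failures, so one dropped probe does not
    flap the pool. probe(backend) returns True when the backend is healthy."""
    failures = {b: 0 for b in backends}
    healthy = set(backends)
    while True:
        for b in backends:
            if probe(b):
                failures[b] = 0
                healthy.add(b)
            else:
                failures[b] += 1
                if failures[b] >= fail_threshold:
                    healthy.discard(b)  # stop routing traffic to this backend
        # Worst-case detection time is roughly interval_s * fail_threshold.
        time.sleep(interval_s)
```

The threshold trades detection speed against flapping: a lower threshold ejects failed nodes faster but also reacts to transient blips.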

The lie: The load balancer box hides whether you have a single point of failure. An LB without a standby is itself a bottleneck and failure domain. In production, you run an LB cluster (HAProxy pairs, AWS ALB with built-in redundancy, etc.). In the interview, you should mention it.

Box 3: App Servers

"App servers" hides one of the most consequential decisions: stateless vs. stateful.

A stateless app server holds no session data. Any instance can serve any request. This is ideal — you can scale horizontally without coordination, replace instances freely, and let the load balancer use any routing algorithm. Build stateless wherever possible.

A stateful server is a trap. If session data lives in-process, you must use sticky sessions at the LB, which negates most scaling benefits. Move state to an external store (Redis for sessions, a DB for durable state) and reclaim your horizontal scale.
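A sketch of the externalized-session pattern, assuming the redis-py client; the host name and TTL are hypothetical:

```python
import json
import uuid

import redis  # assumes the redis-py client; any external store works

# Hypothetical session store endpoint.
r = redis.Redis(host="sessions.internal", port=6379)

SESSION_TTL_S = 3600

def create_session(user_id):
    """Store the session externally so any app instance can serve any request."""
    session_id = str(uuid.uuid4())
    r.setex(f"session:{session_id}", SESSION_TTL_S, json.dumps({"user_id": user_id}))
    return session_id

def load_session(session_id):
    raw = r.get(f"session:{session_id}")
    return json.loads(raw) if raw else None
```

With sessions in Redis, the load balancer needs no stickiness, and any instance can be killed or added at will.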

The lie: "app servers" does not reveal the deployment model. Are these containers in Kubernetes with autoscaling? EC2 instances with an ASG? Bare metal? The model affects cold-start latency, provisioning speed during traffic spikes, and operational toil during on-call.

```mermaid
flowchart TD
    A[Stateless App Server] -->|Session stored externally| B[Redis Session Store]
    A --> C[Any request routable to any instance]
    D[Stateful App Server] -->|Session in-process| E[Sticky LB routing required]
    E --> F[Scaling bottleneck]
```

Box 4: The Cache

The cache box hides the most dangerous assumptions in system design interviews.

What is being cached? Full HTTP responses (simple, coarse-grained), serialized objects (flexible, requires invalidation), or computed aggregates (fast reads, hard invalidation). Each has a different invalidation surface.

Cache-aside vs write-through vs write-behind: Cache-aside puts invalidation responsibility on the application — lazy population on miss. Write-through ensures the cache is always warm but adds write latency. Write-behind (write-back) acknowledges writes to cache first, persists asynchronously — fast writes, risk of data loss on cache crash.
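A minimal cache-aside sketch; `cache` and `db` are hypothetical stand-ins for a Redis-like client and a data-access layer:

```python
import json

CACHE_TTL_S = 300  # illustrative TTL

def get_user_profile(user_id, cache, db):
    """Cache-aside: the application owns population and invalidation."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # hit: serve straight from cache
    profile = db.fetch_user(user_id)  # miss: fall through to the database
    cache.setex(key, CACHE_TTL_S, json.dumps(profile))  # lazily populate
    return profile

def update_user_profile(user_id, fields, cache, db):
    db.update_user(user_id, fields)
    cache.delete(f"user:{user_id}")  # invalidate rather than update in place
```

Deleting on write rather than updating the cached value in place keeps invalidation simple and avoids racing concurrent writers.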

Eviction policy: LRU (least recently used) is the default and suitable for most workloads. LFU (least frequently used) is better when access frequency matters more than recency — e.g., a hot product catalog. FIFO is usually wrong for caches.

Hit rate math: If your cache hit rate is 90% and you handle 10,000 RPS, the cache absorbs 9,000 RPS. The remaining 1,000 RPS hit your database. If your database can handle 2,000 RPS safely, you have headroom. But if the hit rate drops to 70% (cold start, cache eviction storm, deployment), 3,000 RPS hit the DB — a potential cascade.
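The arithmetic is worth stating explicitly, since interviewers often push on it:

```python
def db_load_rps(total_rps, hit_rate):
    """RPS that misses the cache and falls through to the database."""
    return total_rps * (1 - hit_rate)

print(db_load_rps(10_000, 0.90))  # 1000.0 -- within a 2,000 RPS database budget
print(db_load_rps(10_000, 0.70))  # 3000.0 -- exceeds the budget: cascade risk
```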

The lie: The cache box does not reveal cache stampede risk. On a cache miss for a hot key, 100 simultaneous requests may all miss, all query the database, and all try to write back to cache. This thundering herd must be mitigated — use probabilistic early expiration, mutex locks on cache repopulation, or short negative TTLs.
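One of those mitigations, a per-key mutex on repopulation, sketched for a single process (a multi-process fleet would need a distributed lock instead, such as a Redis SET NX key):

```python
import threading

_repopulation_locks = {}         # per-key mutexes; single-process only
_locks_guard = threading.Lock()  # protects the lock dictionary itself

def get_with_stampede_guard(key, cache, recompute, ttl_s=300):
    """On a miss for a hot key, only one caller recomputes; the rest block
    on the mutex and then re-read the freshly repopulated cache."""
    value = cache.get(key)
    if value is not None:
        return value
    with _locks_guard:
        lock = _repopulation_locks.setdefault(key, threading.Lock())
    with lock:
        value = cache.get(key)  # double-check: the winner may have filled it
        if value is None:
            value = recompute()
            cache.setex(key, ttl_s, value)
    return value
```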

Box 5: The Database

The database box is where most candidates leave the most value on the table.

The choice itself is a signal: "I'd use PostgreSQL here because the data is relational, we need ACID transactions for payment records, and our write volume of ~3,500 QPS is within a well-tuned PG primary's capacity" is an impressive answer. "I'd use a database" is not.

Primary/replica topology: A single primary with read replicas offloads read traffic but introduces replication lag. If your application reads immediately after a write and gets stale data, that is a consistency bug — mitigated by reading from primary for sensitive reads or using synchronous replication (with latency cost).
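The read-from-primary mitigation can be as simple as a per-user sticky window after each write; a sketch, where the window length is an assumed upper bound on replication lag:

```python
import time

RECENT_WRITE_WINDOW_S = 2.0  # assumed upper bound on replication lag

_last_write_at = {}  # user_id -> monotonic timestamp of that user's last write

def record_write(user_id):
    _last_write_at[user_id] = time.monotonic()

def choose_read_db(user_id, primary, replicas):
    """Route reads that closely follow a write to the primary, so the
    user never sees data older than their own write."""
    last = _last_write_at.get(user_id)
    if last is not None and time.monotonic() - last < RECENT_WRITE_WINDOW_S:
        return primary
    return replicas[hash(user_id) % len(replicas)]
```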

Sharding: When a single primary cannot absorb write volume, you shard. Key choices: partition key (user ID, tenant ID, geo region), shard count (affects rebalancing cost), and cross-shard query strategy (scatter-gather, denormalization, or avoiding cross-shard queries entirely).
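A sketch of hash-based shard routing on a user-ID partition key; the shard count is illustrative, and note that plain modulo makes adding shards expensive, which is why consistent hashing or a directory service is often preferred:

```python
import hashlib

NUM_SHARDS = 16  # assumed; changing this later forces a rebalance

def shard_for(user_id):
    """Hash the partition key so all of a user's rows land on one shard,
    keeping that user's queries single-shard."""
    h = int(hashlib.sha256(str(user_id).encode()).hexdigest(), 16)
    return h % NUM_SHARDS

print(shard_for(42), shard_for(1042))  # stable shard assignment per user
```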

```mermaid
flowchart TD
    W[Write] --> P[Primary DB]
    P -->|Async replication| R1[Read Replica 1]
    P -->|Async replication| R2[Read Replica 2]
    R[Read - non-critical] --> R1
    R --> R2
    RS[Read - post-write] --> P
```

The lie: The database box does not show the connection pool. App servers do not open a raw TCP connection to the database per request — they draw from a connection pool. Pool exhaustion is a common production incident: a pool of 100 connections facing 10,000 concurrent requests queues the other 9,900. Size the pool with Little's law: pool_size ≈ target_qps × avg_query_latency_ms / 1000, plus headroom.
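Expressed as code, with illustrative numbers:

```python
def pool_size(target_qps, avg_query_latency_ms):
    """Little's law: connections in flight ~= throughput * latency."""
    return target_qps * avg_query_latency_ms / 1000

# 2,000 queries/s at 20 ms average latency keeps ~40 connections busy,
# so the pool should sit somewhat above 40 to absorb latency spikes.
print(pool_size(2_000, 20))  # 40.0
```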

Turning the Drawing Into a Design

The five-box diagram becomes a real design when you annotate each box with the decisions you made and why. Interviewers are evaluating your reasoning process, not just the picture. A good verbal pattern:

"I'm drawing the LB here — I'd go L7 with least-connections routing since our requests vary significantly in processing time. The app servers are stateless; sessions live in Redis here. For the cache I'm using cache-aside with a 5-minute TTL on user profile reads, which are high-frequency and low-staleness-sensitivity. The DB is PostgreSQL — we have strong consistency requirements for order records. I've got a primary with two read replicas; order confirmation reads go to primary, feed queries go to replicas."

That is a design. The other version — five boxes, no annotation — is a placeholder.

Key Takeaways

  • The five-box diagram is a starting sketch, not a design — what matters is what you say about the inside of each box.
  • The load balancer hides algorithm choice, health check strategy, and whether it is itself a single point of failure.
  • App servers should be stateless by default; state belongs in external stores to enable true horizontal scaling.
  • Cache design includes eviction policy, invalidation strategy, hit rate math, and thundering herd mitigation — not just "add a Redis."
  • The database box hides topology (primary/replica, sharding), connection pooling, and consistency model for post-write reads.
  • Annotate your diagram with reasoning as you draw — the annotation is the interview, not the boxes.