
Chat Walkthrough

Ravinder · 7 min read
System Design · Interviews · Architecture · Real-Time

Chat is one of the most common system design prompts and one of the most frequently underestimated. A simple diagram — client, WebSocket server, message store — appears to answer the question. It does not. Real chat systems are four distinct subsystems that must be composed carefully: a delivery layer (WebSocket management), a fanout layer (routing messages to the right recipients), a storage layer (ordering and durability), and a presence layer (online/offline signaling). Most candidates design one of these and gesture at the others.

Scope and Constraints

  • 50M DAU. 1 billion messages/day ≈ 11,574 messages/second average.
  • Peak factor 3×: ~35,000 messages/second during peak hours.
  • Message types: 1:1 (DM) and group chat (up to 1,000 members).
  • Latency: delivery < 200ms p99 for online recipients.
  • Storage: 7-year message history. Compliance requirement: messages must be stored and retrievable.
  • Presence: online/offline/away, updated within 5 seconds of a status change.

Derived: 1 billion messages/day × 500 bytes/message = 500 GB/day ≈ 183 TB/year, or roughly 1.3 PB over 7 years. This calls for a time-series-partitioned, append-only storage system, not a general-purpose relational database.
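Spelled out as arithmetic (a minimal sketch; 500 bytes/message is the assumed average from the estimate above):

```python
# Back-of-envelope storage math for the constraints above.
MSGS_PER_DAY = 1_000_000_000
BYTES_PER_MSG = 500                           # assumed average message size

bytes_per_day = MSGS_PER_DAY * BYTES_PER_MSG  # 500 GB/day
per_year_tb = bytes_per_day * 365 / 1e12      # ~182.5 TB/year
seven_year_pb = per_year_tb * 7 / 1000        # ~1.28 PB over 7 years
```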

Delivery Layer: WebSocket Management

Users maintain persistent WebSocket connections to chat servers. The challenge: a message sent by User A must reach User B's WebSocket connection, which may be on a different server.

Connection routing: each chat server holds a map of {user_id → connection} for currently connected users. A central connection registry (Redis hash or a dedicated service) maps each user_id to the server that holds their connection.

user_connection_registry (Redis):
  HSET connections user_id_1234 server_id_7
  HSET connections user_id_5678 server_id_3

When User A sends a message:

  1. Chat server receives message over A's WebSocket.
  2. Server writes message to Message Store (Kafka → Cassandra).
  3. Server looks up all recipients from the conversation roster.
  4. For each recipient, looks up their server ID in the connection registry.
  5. Sends the message via server-to-server internal RPC to the server holding each recipient's connection.
  6. That server pushes the message over the recipient's WebSocket.
flowchart LR
  UA[User A] -->|WebSocket| S1[Chat Server 1]
  S1 --> MQ[Kafka\nMessage Queue]
  S1 --> CR[(Connection Registry\nRedis)]
  MQ --> MS[(Message Store\nCassandra)]
  CR --> S1
  S1 -->|Internal RPC| S2[Chat Server 2]
  S2 -->|WebSocket| UB[User B - online]
  S1 -->|Push notification| PN[Push Notification Service]
  PN --> UC[User C - offline]
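The steps above can be sketched in Python. Everything here (registry, rosters, send_rpc, enqueue_push) is a hypothetical in-memory stand-in for Redis, the roster store, the internal RPC client, and the push-notification path:

```python
# Sketch of the send path (steps 3-6); step 2, the Kafka -> Cassandra
# write, is omitted. All names are hypothetical stand-ins.
registry = {"user_id_1234": "server_id_7",    # user_id -> server holding connection
            "user_id_5678": "server_id_3"}
rosters = {"conversation_456":                 # conversation -> member user_ids
           ["user_id_1234", "user_id_5678", "user_id_9999"]}
rpc_log = []

def send_rpc(server_id, user_id, message):
    rpc_log.append((server_id, user_id, message))  # stand-in for internal RPC

def enqueue_push(user_id, message):
    rpc_log.append(("push", user_id, message))     # stand-in for APNs/FCM path

def deliver(sender_id, conversation_id, message):
    for recipient in rosters[conversation_id]:     # step 3: look up roster
        if recipient == sender_id:
            continue
        server = registry.get(recipient)           # step 4: registry lookup
        if server:
            send_rpc(server, recipient, message)   # steps 5-6: route to holder
        else:
            enqueue_push(recipient, message)       # offline: fall back to push

deliver("user_id_1234", "conversation_456", "hello")
```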

Fanout: 1:1 vs. Group Chat

1:1 messages: fanout factor is 1. Sender → single recipient. Simple.

Group chat with 1,000 members: fanout factor is 999, so one message generates 999 delivery operations. At 35,000 messages/second with an average group size of 10, that is ~315,000 delivery operations/second overall; you cannot do this synchronously in the request path.

Group message delivery must be asynchronous. The sender's write is acknowledged after writing to the message queue. Background fanout workers consume from the queue, look up group membership, and fan out to each member's delivery path.

Large group optimization: for groups above a threshold (e.g., 100 members), switch from push fanout to pull-on-connect. When a user connects or reconnects, they pull unread messages from a conversation store rather than receiving them via individual server push. This avoids 999 delivery operations per message for large groups.
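A sketch of that threshold decision in a fanout worker, with a hypothetical PUSH_THRESHOLD of 100 and in-memory stand-ins for the membership store and the two delivery paths:

```python
PUSH_THRESHOLD = 100  # assumed cutoff between push fanout and pull-on-connect

members = {"small_group": [f"user_{i}" for i in range(10)],
           "large_group": [f"user_{i}" for i in range(500)]}
pushed = []          # deliveries performed eagerly (push fanout)
marked_unread = []   # conversations flagged for pull-on-connect

def fan_out(conversation_id, message):
    group = members[conversation_id]
    if len(group) <= PUSH_THRESHOLD:
        for user in group:
            pushed.append((user, message))     # small group: O(size) pushes now
    else:
        marked_unread.append(conversation_id)  # large group: members pull unread
                                               # messages when they (re)connect

fan_out("small_group", "hi")
fan_out("large_group", "hi")
```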

Message Ordering: The Subtle Hard Problem

Messages must appear in a consistent order for all participants. In a distributed system, this is non-trivial.

Per-conversation sequence numbers: each conversation has a monotonically increasing sequence counter. Every message in a conversation receives the next sequence number atomically. Recipients render messages in sequence order, not arrival order.

conversation_sequences (Redis):
  INCR seq:conversation_id_456  → returns 1, 2, 3, ...
 
message record:
  {conversation_id, sequence_number, sender_id, content, timestamp}

For a single-region system, Redis INCR is atomic and provides total ordering per conversation. For multi-region, you need a distributed sequence generator — either a globally consistent store (Spanner, CockroachDB) or a timestamp-based approach with conflict resolution.

Why timestamps alone fail: two messages sent 1ms apart from different servers may arrive with the same millisecond timestamp, or in the wrong order due to clock skew. Always use logical sequence numbers, not wall clocks, for ordering.
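A minimal sketch of per-conversation sequencing; the in-process counter below stands in for the shared Redis INCR shown above (a real deployment needs the shared counter so all chat servers agree on the next number):

```python
import itertools
from collections import defaultdict

# In-memory stand-in for "INCR seq:{conversation_id}": one monotonically
# increasing sequence per conversation, starting at 1.
_counters = defaultdict(lambda: itertools.count(1))

def next_sequence(conversation_id):
    return next(_counters[conversation_id])

def make_record(conversation_id, sender_id, content):
    return {"conversation_id": conversation_id,
            "sequence_number": next_sequence(conversation_id),
            "sender_id": sender_id,
            "content": content}

a = make_record("conversation_456", "user_1", "first")
b = make_record("conversation_456", "user_2", "second")
c = make_record("conversation_789", "user_1", "other conversation")
# Sequences are per-conversation: a gets 1, b gets 2, c gets 1.
```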

Storage: Cassandra for Chat History

Cassandra is the standard choice for chat message storage, and for good reason:

  • Write throughput: Cassandra's LSM-tree storage turns every write into a sequential append (commit log plus memtable), sustaining extremely high write rates; compaction cost is paid in the background, off the write path.
  • Time-range queries: messages within a conversation are retrieved by conversation_id and sequence_number range — exactly the access pattern Cassandra's partition key + clustering key model supports.
  • Time-to-live (TTL): per-row TTL for automatic expiration of old messages without expensive deletes.
CREATE TABLE messages (
    conversation_id UUID,
    sequence_number BIGINT,
    sender_id UUID,
    content TEXT,
    sent_at TIMESTAMP,
    PRIMARY KEY (conversation_id, sequence_number)
) WITH CLUSTERING ORDER BY (sequence_number DESC);

For 7-year retention at roughly 1.3 PB, you need a tiered storage strategy: hot data (recent 30 days) on SSD-backed Cassandra nodes; warm data (30 days – 2 years) on HDD-backed nodes; cold data (2+ years) archived to object storage (S3/GCS) with a retrieval path for compliance queries.

Presence: The Often-Skipped Subsystem

Presence — knowing who is online — requires its own design. The naive approach (querying the WebSocket connection registry on demand) does not scale for reads.

Heartbeat-based presence: connected clients send a heartbeat every 10 seconds. The chat server updates a presence record in Redis with a TTL of 15 seconds:

SETEX presence:{user_id} 15 "online"

If the TTL expires (no heartbeat received), the user is marked offline. This handles connection drops gracefully.
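A sketch of the heartbeat/TTL logic, with an in-memory dict standing in for Redis SETEX; the explicit `now` parameter is only there to make the example deterministic:

```python
import time

TTL_SECONDS = 15  # matches the SETEX TTL above

_presence = {}  # user_id -> (status, expires_at); stand-in for Redis SETEX

def heartbeat(user_id, now=None):
    now = time.time() if now is None else now
    _presence[user_id] = ("online", now + TTL_SECONDS)

def get_presence(user_id, now=None):
    now = time.time() if now is None else now
    entry = _presence.get(user_id)
    if entry is None or entry[1] <= now:  # no record, or TTL expired
        return "offline"
    return entry[0]

heartbeat("user_id_1234", now=100.0)
get_presence("user_id_1234", now=110.0)  # "online": within the 15s TTL
get_presence("user_id_1234", now=116.0)  # "offline": TTL lapsed at t=115
```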

Presence subscriptions: User A should see User B's presence only if they are in a conversation together. Pushing all presence updates to all users is O(users²) — not feasible. Instead, subscribe to presence for a specific set of user IDs when a conversation is opened, and unsubscribe when it is closed.

A dedicated Presence Service maintains subscriptions and fans out presence updates only to interested subscribers via the WebSocket delivery path.
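One way to sketch subscription-based presence fanout, with a hypothetical in-memory subscription table (`notify` stands in for the WebSocket delivery path):

```python
from collections import defaultdict

# Hypothetical subscription table: watched user_id -> set of watcher user_ids.
_subscribers = defaultdict(set)

def subscribe(watcher, watched):      # called when a conversation is opened
    _subscribers[watched].add(watcher)

def unsubscribe(watcher, watched):    # called when it is closed
    _subscribers[watched].discard(watcher)

def on_presence_change(user_id, status, notify):
    # Fan out only to interested subscribers, never to all users.
    for watcher in _subscribers[user_id]:
        notify(watcher, user_id, status)

events = []
subscribe("user_A", "user_B")
on_presence_change("user_B", "away", lambda *args: events.append(args))
unsubscribe("user_A", "user_B")
on_presence_change("user_B", "online", lambda *args: events.append(args))
# events holds only the update delivered while the subscription was active
```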

Push Notifications for Offline Users

When a recipient is offline (no active WebSocket connection), the message must be delivered via push notification (APNs for iOS, FCM for Android).

The delivery path:

  1. Fanout worker detects recipient is offline (connection registry lookup returns no server).
  2. Worker enqueues a push notification job.
  3. Push Notification Service fetches the job, calls APNs/FCM with device token + payload.
  4. On reconnect, user fetches unread messages from Cassandra using their last-read sequence number.
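Step 4 can be sketched as a range read keyed by the client's last-read sequence number; `stored` is a hypothetical in-memory stand-in for the Cassandra messages table:

```python
# Catch-up on reconnect: fetch everything after the client's last-read
# sequence number. `stored` stands in for the messages table, ordered
# by (conversation_id, sequence_number).
stored = {
    "conversation_456": [
        {"sequence_number": 1, "content": "hi"},
        {"sequence_number": 2, "content": "how are you?"},
        {"sequence_number": 3, "content": "ping"},
    ]
}

def fetch_unread(conversation_id, last_read_seq):
    return [m for m in stored[conversation_id]
            if m["sequence_number"] > last_read_seq]

unread = fetch_unread("conversation_456", last_read_seq=1)
# returns the messages with sequence numbers 2 and 3
```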

Messages delivered via push should not include full message content if the conversation is end-to-end encrypted — the notification triggers a fetch, not a display.

Key Takeaways

  • Chat is four distinct subsystems: delivery (WebSocket management), fanout (routing to recipients), storage (ordering and durability), and presence — design each explicitly.
  • Server-to-server internal RPC is required to deliver messages to recipients whose WebSocket connection lives on a different server than the sender's.
  • Group chat fanout must be asynchronous; for large groups (100+ members), switch to pull-on-connect to avoid per-message delivery amplification.
  • Use per-conversation sequence numbers for total ordering, not wall-clock timestamps — clock skew and concurrent writes make timestamps unreliable for ordering.
  • Cassandra's partition-key + clustering-key model is purpose-built for the (conversation_id, sequence_number) access pattern at chat scale.
  • Presence tracking requires heartbeat TTLs and subscription-based fanout — avoid pushing all presence events to all users.