
Ride-Sharing Walkthrough

Ravinder · 6 min read
System Design · Interviews · Architecture · Geospatial

Ride-sharing is one of the most information-dense system design prompts. It packs geo-indexing, real-time data pipelines, matching algorithms, and location broadcast into a single problem. Most candidates try to design all of it at once and produce a vague sketch of everything. The right approach is to decompose it into three distinct subsystems, solve each deliberately, and then show how they interact.

Scope and Constraints

  • 10M daily riders, 2M daily drivers. Peak load: 30% of trips occur in 10% of the day (roughly 2.4 hours).
  • Trips per day: assume 5M trips (some riders make 2+ trips, some make 0 on a given day).
  • Average trip: 20 minutes. Active drivers send location pings every 4 seconds.
  • Matching latency: < 5 seconds from ride request to driver assignment.
  • Location update ingestion: 2M drivers × 15 pings/minute = 30M pings/minute ≈ 500K writes/second at peak (an upper bound that treats every daily driver as online at once).
  • Rider feed (map showing nearby drivers): read every 2 seconds per active rider-app session.

The ~500K writes/second for driver location is the dominant write load. It drives the architecture.
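
To make that arithmetic explicit, here is the same estimate as a quick sketch, using the assumptions above:

drivers_online  = 2_000_000   # upper bound: every daily driver treated as online at once
ping_interval_s = 4           # one location ping per driver every 4 seconds
writes_per_sec  = drivers_online / ping_interval_s   # = 500,000 GEOADDs/second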

Subsystem 1: Location Ingestion

Driver location updates are high-frequency, low-value-per-event writes. You do not need to persist every ping durably; you need the latest position per driver, queryable in near real time.

The wrong approach: write every ping to a relational database. At hundreds of thousands of writes per second, a single PostgreSQL primary saturates quickly, and you are persisting stale data you will never query.

The right approach: store latest position per driver in Redis as a geospatial index. Redis supports GEOADD (insert/update a lat/lon for a named member) and GEORADIUS (find all members within N km of a point). Updates are O(log N) per driver, and GEORADIUS queries are fast within a bounded area.

Driver sends ping:
  POST /location {driver_id, lat, lon, timestamp}
 
Location Service:
  GEOADD drivers:{city_shard} lon lat driver_id
  HSET driver_meta:{driver_id} status available timestamp <ts>

Shard the geo index by city or grid cell (geohash prefix). A single Redis node can hold millions of driver positions, but partitioning by city also reduces GEORADIUS scan scope.
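
A minimal sketch of this ingestion path, assuming redis-py 4.x (which takes a flat lon/lat/member sequence for GEOADD); resolve_shard is a hypothetical helper that maps coordinates to a city or geohash-prefix shard key:

import time
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def resolve_shard(lat: float, lon: float) -> str:
    # Hypothetical: map the ping to a city or geohash-prefix shard key.
    return "sf"

def handle_location_ping(driver_id: str, lat: float, lon: float) -> None:
    shard = resolve_shard(lat, lon)
    # Upsert the driver's latest position in that shard's geo index.
    # Note the Redis argument order: longitude before latitude.
    r.geoadd(f"drivers:{shard}", (lon, lat, driver_id))
    # Availability and freshness live in a separate hash with a TTL,
    # so drivers that stop pinging age out of the metadata.
    r.hset(f"driver_meta:{driver_id}",
           mapping={"status": "available", "timestamp": int(time.time())})
    r.expire(f"driver_meta:{driver_id}", 60)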

Subsystem 2: Geo-Indexing for Matching

When a rider requests a ride, you need to find nearby available drivers quickly. GEORADIUS on the live Redis geo index returns candidate drivers within radius R.

But what is the right radius? Start with 2 km. If fewer than K candidates are returned, expand to 5 km. If still empty, return "no drivers available."

flowchart TD
  RR[Rider Requests Ride] --> GL[Get rider lat/lon]
  GL --> GR["GEORADIUS drivers:{city} lon lat 2km"]
  GR --> C{Candidates found?}
  C -->|< 3 candidates| EX[Expand radius to 5km]
  EX --> GR2[GEORADIUS at 5km]
  GR2 --> C2{Candidates found?}
  C2 -->|None| NA[Return no drivers available]
  C2 -->|Found| MA[Run matching algorithm]
  C -->|>= 3 candidates| MA
  MA --> AS[Assign driver<br/>Send push notification]

Geohash vs. S2 vs. H3: GEORADIUS internally uses a geohash-based index. For more complex spatial queries (polygon search, road-network distance vs. straight-line), companies like Uber use S2 geometry or H3 hexagonal indexing. For the interview, GEORADIUS is sufficient; mention that production systems switch to H3 or S2 for road-network-aware proximity.
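
The expanding-radius lookup from the flowchart above fits in a few lines. A sketch, again assuming redis-py (GEORADIUS is used to match the commands above; Redis 6.2+ exposes the same query as GEOSEARCH):

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
MIN_CANDIDATES = 3

def find_candidates(city_shard: str, lat: float, lon: float):
    candidates = []
    # Try a tight radius first, then widen once before giving up.
    for radius_km in (2, 5):
        candidates = r.georadius(
            f"drivers:{city_shard}", lon, lat, radius_km,
            unit="km", withdist=True, sort="ASC", count=20,
        )
        if len(candidates) >= MIN_CANDIDATES:
            break
    return candidates  # [(driver_id, distance_km), ...]; empty means "no drivers available"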

Subsystem 3: Matching Algorithm

Matching is not just "nearest driver." Real matching considers:

  • Estimated time of arrival (ETA): computed from road network, not straight-line distance.
  • Driver heading: a driver facing away from the pickup is a worse match than an equally distant driver heading toward it.
  • Acceptance rate history: drivers who frequently reject are scored lower.
  • Fare optimization: surge pricing zones affect which drivers are available and willing.

For the interview, a weighted score is enough:

def match_score(driver, rider_location):
    # Road-network ETA to the pickup (minutes); weighted negatively, so nearer is better.
    eta = road_network_eta(driver.location, rider_location)
    # How well the driver's heading points toward the pickup, in [-1, 1].
    # Cosine similarity is scale-invariant, so the raw offset vector suffices.
    direction_to_rider = rider_location - driver.location
    heading_alignment = cosine_similarity(driver.heading, direction_to_rider)
    # Weighted score: ETA dominates; heading and acceptance history break ties.
    return (
        -0.6 * eta
        + 0.3 * heading_alignment
        + 0.1 * driver.acceptance_rate
    )

Matching runs on a small candidate set (3–10 drivers from the geo query), so this is cheap computation — not a big-data problem at the matching step.
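
Wiring the two steps together, assignment is just a max over the scored candidates. In this sketch, get_driver, send_ride_offer, and the fallback-on-decline path are placeholders:

def assign_driver(candidates, rider_location):
    # candidates: (driver_id, distance_km) pairs from the geo query.
    drivers = [get_driver(driver_id) for driver_id, _dist in candidates]
    best = max(drivers, key=lambda d: match_score(d, rider_location))
    # Offer the ride; if the driver declines or the offer times out,
    # retry with the next-best candidate (not shown).
    send_ride_offer(best.id, rider_location)
    return best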

Real-Time Location Broadcast (The Rider's Map)

Riders see a map of nearby drivers updating every few seconds. This is a read-heavy, low-latency broadcast problem — separate from the matching subsystem.

Push via WebSocket: an active rider app holds a WebSocket connection to a location broadcast server. The server pushes nearby driver positions every 2 seconds without the client polling.

How the server knows what to push: a background process queries GEORADIUS for each active rider's location every 2 seconds and pushes the result set over the rider's WebSocket. At 5M concurrent active riders at peak, this is expensive — you need broadcast sharding (each server handles a geographic region or a shard of rider IDs).

flowchart LR
  DS[Driver pings location<br/>HTTP POST] --> LS[Location Service]
  LS --> RD[Redis GeoIndex]
  RD --> BS[Broadcast Service<br/>Background loop every 2s]
  BS --> WS[WebSocket connections<br/>per active rider]
  WS --> RM[Rider Map Updates]

An alternative for rider-map updates: Server-Sent Events (SSE) instead of WebSocket. SSE is unidirectional (server to client), simpler to implement, and adequate for map updates where the client does not need to send messages back on the same connection.
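
A sketch of one broadcast shard's background loop, assuming an asyncio server where each connection handler registers its rider in active_riders; the WebSocket send call and the synchronous Redis client are simplifications, not a production-ready loop:

import asyncio
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
BROADCAST_INTERVAL_S = 2

async def broadcast_loop(active_riders):
    # active_riders: rider_id -> (websocket, city_shard, lat, lon),
    # maintained by the connection handlers on this broadcast shard.
    while True:
        for ws, shard, lat, lon in list(active_riders.values()):
            nearby = r.georadius(f"drivers:{shard}", lon, lat, 2,
                                 unit="km", withcoord=True, count=50)
            # Each entry is (driver_id, (lon, lat)) with withcoord=True.
            await ws.send(json.dumps(
                [{"driver_id": d, "lon": c[0], "lat": c[1]} for d, c in nearby]
            ))
        await asyncio.sleep(BROADCAST_INTERVAL_S)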

Capacity Sanity Check

Component | Load
Location write (Redis GEOADD) | ~500K writes/sec peak
Rider map reads (GEORADIUS) | 5M riders × 0.5 reads/sec = 2.5M reads/sec across all servers
Matching requests | 5M trips/day ÷ 86,400 sec ≈ 58 matches/sec avg
WebSocket connections | 5M concurrent during peak — requires ~5,000 server instances at 1,000 conns/server

The WebSocket connection count is the infrastructure scale challenge. This is why location broadcast servers are a specialized tier, not just more app servers.

Trip State Machine

A trip has clear state transitions that must be persisted durably (not just in Redis):

REQUESTED → MATCHED → DRIVER_EN_ROUTE → PICKUP → IN_PROGRESS → COMPLETED

Any non-terminal state → CANCELLED

This state machine lives in a relational database (PostgreSQL), not in Redis. Redis is for ephemeral location data. Trip records need ACID guarantees — especially payment finalization at COMPLETED.
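
One compact way to enforce those transitions in the trip service is a table of allowed moves. A sketch, with persist_trip_state standing in for a transactional UPDATE against the trips table:

ALLOWED_TRANSITIONS = {
    "REQUESTED":       {"MATCHED", "CANCELLED"},
    "MATCHED":         {"DRIVER_EN_ROUTE", "CANCELLED"},
    "DRIVER_EN_ROUTE": {"PICKUP", "CANCELLED"},
    "PICKUP":          {"IN_PROGRESS", "CANCELLED"},
    "IN_PROGRESS":     {"COMPLETED", "CANCELLED"},
    "COMPLETED":       set(),   # terminal
    "CANCELLED":       set(),   # terminal
}

def transition(trip, new_state):
    if new_state not in ALLOWED_TRANSITIONS[trip.state]:
        raise ValueError(f"illegal transition {trip.state} -> {new_state}")
    # Persist with a conditional UPDATE (WHERE id = ... AND state = <old state>)
    # so two concurrent transitions cannot both win.
    persist_trip_state(trip.id, trip.state, new_state)
    trip.state = new_state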

Key Takeaways

  • Decompose ride-sharing into three distinct subsystems: location ingestion, matching, and real-time broadcast — designing them together produces an incoherent sketch.
  • Driver location lives in a Redis geospatial index (GEOADD/GEORADIUS), not a relational DB — it is ephemeral, high-frequency, and needs fast proximity queries.
  • Matching runs on a small geo-filtered candidate set; the per-request scoring is cheap computation, not a big-data problem.
  • WebSocket connection count (5M+ during peak) is the infrastructure scaling challenge for real-time map updates — plan for a dedicated broadcast tier.
  • Trip state machine persists durably in a relational database; Redis is not appropriate for data that needs ACID guarantees at payment time.
  • Geo index sharding by city or geohash prefix both reduces GEORADIUS scan scope and enables horizontal scaling of the location service.