API Design Mastery

Webhooks

Ravinder · 5 min read
API Design, REST, GraphQL, gRPC, Webhooks

Webhooks invert the polling model: instead of clients asking "did anything change?", your API pushes notifications the moment something does. Done well, they are one of the most powerful integration primitives you can offer. Done poorly, they become a distributed reliability problem: missed events, duplicate deliveries, spoofed payloads, and ordering violations that corrupt your integrators' data. The gap between a webhook that works in testing and one that holds up in production is almost entirely in the delivery contract.

Webhook Payload Design

Every webhook event should be self-describing, versioned, and carry enough context to be actionable without a follow-up API call:

{
  "id": "evt_01J8K2MNPQ3RS4TUV5WX6YZ7A",
  "type": "order.shipped",
  "apiVersion": "2025-11-01",
  "createdAt": "2025-11-19T14:23:45Z",
  "data": {
    "object": {
      "id": "ord_789",
      "status": "shipped",
      "customerId": "cust_123",
      "items": [
        {"productId": "prod_A", "quantity": 2}
      ],
      "trackingNumber": "1Z999AA1012345678",
      "shippedAt": "2025-11-19T14:20:00Z"
    },
    "previousAttributes": {
      "status": "processing"
    }
  }
}

Key fields:

  • id — a unique event ID, used for deduplication.
  • type — namespaced event type (resource.action).
  • apiVersion — pinned to the API version when the subscription was created.
  • data.previousAttributes — what changed (the delta), so receivers do not need to call back immediately.

Embed enough data for the most common use case. If the full object would make the payload too large, send only the object ID and let receivers fetch the details, and document that behavior explicitly.
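As a rough illustration of how a sender might assemble this envelope, the sketch below derives previousAttributes by comparing the resource before and after a change. The function name build_event and the uuid-based ID scheme are assumptions made for the example, not part of any particular API.

```python
import json
import uuid
from datetime import datetime, timezone


def build_event(event_type: str, current: dict, previous: dict, api_version: str) -> dict:
    """Wrap a resource snapshot in the event envelope described above."""
    # previousAttributes carries the OLD values of only the fields that changed.
    changed = {
        key: previous[key]
        for key in current
        if key in previous and previous[key] != current[key]
    }
    return {
        "id": f"evt_{uuid.uuid4().hex}",  # hypothetical ID scheme; any unique ID works
        "type": event_type,
        "apiVersion": api_version,
        "createdAt": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "data": {"object": current, "previousAttributes": changed},
    }


before = {"id": "ord_789", "status": "processing"}
after = {"id": "ord_789", "status": "shipped", "trackingNumber": "1Z999AA1012345678"}
event = build_event("order.shipped", after, before, "2025-11-01")
print(json.dumps(event["data"]["previousAttributes"]))  # {"status": "processing"}
```

Fields that are new in the current snapshot (trackingNumber here) have no previous value, so this sketch leaves them out of the delta.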

HMAC Signing

Without signing, any entity on the internet can POST to your customers' webhook endpoints pretending to be you. HMAC-SHA256 signing prevents this:

X-Signature-256: sha256=a1b2c3d4e5f6...
X-Signature-Timestamp: 1700395425

Signing procedure:

import hmac
import hashlib
import time
 
def sign_payload(secret: str, payload: bytes, timestamp: int) -> str:
    signed_content = f"{timestamp}.".encode() + payload
    signature = hmac.new(
        secret.encode(),
        signed_content,
        hashlib.sha256
    ).hexdigest()
    return f"sha256={signature}"

Receiver verification:

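def verify_webhook(request, secret: str) -> bool:
    # Missing or malformed headers mean the request cannot be authenticated
    try:
        timestamp = int(request.headers["X-Signature-Timestamp"])
        signature = request.headers["X-Signature-256"]
    except (KeyError, TypeError, ValueError):
        return False
 
    # Reject stale requests (replay window: 5 minutes)
    if abs(time.time() - timestamp) > 300:
        return False
 
    # Sign the raw request body bytes, not a re-serialized copy of the JSON
    expected = sign_payload(secret, request.body, timestamp)
    return hmac.compare_digest(expected, signature)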

Include the timestamp in the signed content to prevent replay attacks: a payload replayed 24 hours later fails the staleness check, and an attacker cannot substitute a fresh timestamp without invalidating the signature. Use hmac.compare_digest (constant-time comparison) to prevent timing attacks.
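To make the failure modes concrete, here is a small self-contained round trip. It restates the signing scheme in compact form and takes the current time as a parameter so the staleness check can be exercised deterministically. The helper names sign and is_valid and the secret value are illustrative assumptions.

```python
import hashlib
import hmac
import time


def sign(secret: str, payload: bytes, timestamp: int) -> str:
    # Same scheme as above: HMAC-SHA256 over "<timestamp>." + raw body
    mac = hmac.new(secret.encode(), f"{timestamp}.".encode() + payload, hashlib.sha256)
    return f"sha256={mac.hexdigest()}"


def is_valid(secret: str, payload: bytes, timestamp: int, signature: str,
             now: float, tolerance: int = 300) -> bool:
    # Staleness check first, then constant-time signature comparison
    if abs(now - timestamp) > tolerance:
        return False
    return hmac.compare_digest(sign(secret, payload, timestamp), signature)


secret = "whsec_example"  # hypothetical shared secret
body = b'{"id":"evt_1","type":"order.shipped"}'
sent_at = int(time.time())
sig = sign(secret, body, sent_at)

fresh = is_valid(secret, body, sent_at, sig, now=sent_at + 10)
tampered = is_valid(secret, body + b" ", sent_at, sig, now=sent_at + 10)
replayed = is_valid(secret, body, sent_at, sig, now=sent_at + 86400)
# Rewriting the timestamp header to look fresh breaks the signature instead
forged_ts = is_valid(secret, body, sent_at + 86400, sig, now=sent_at + 86400)
print(fresh, tampered, replayed, forged_ts)  # True False False False
```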

Delivery and Retry

sequenceDiagram
    participant API as Your API
    participant Queue as Event Queue
    participant Worker as Delivery Worker
    participant Endpoint as Customer Endpoint
    API->>Queue: Enqueue event evt_A
    Worker->>Endpoint: POST /webhook (attempt 1)
    Endpoint-->>Worker: 500 Internal Server Error
    Worker->>Worker: Wait 30s (backoff)
    Worker->>Endpoint: POST /webhook (attempt 2)
    Endpoint-->>Worker: 200 OK
    Worker->>Queue: Mark evt_A delivered

Retry policy: exponential backoff with jitter, bounded by a max retry count and deadline.

Attempt 1: immediately
Attempt 2: 30s
Attempt 3: 5m
Attempt 4: 30m
Attempt 5: 2h
Attempt 6: 8h
Attempt 7: 24h
Give up: move to dead letter queue

Jitter prevents a thundering herd: without it, an endpoint recovering from downtime would receive every retried event at the same moment.
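One way to apply jitter to the schedule above is to scale each base delay by a random factor. The sketch below assumes a ±20% spread and the hypothetical name delay_for_attempt; both are illustrative choices rather than recommended values.

```python
import random
from typing import Optional

# Base delays in seconds, taken from the schedule above (attempts 1 through 7)
BASE_DELAYS = [0, 30, 300, 1800, 7200, 28800, 86400]


def delay_for_attempt(attempt: int, rng: random.Random, jitter: float = 0.2) -> Optional[float]:
    """Seconds to wait before a 1-based attempt, or None once retries are exhausted."""
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    if attempt > len(BASE_DELAYS):
        return None  # give up: the caller moves the event to the dead letter queue
    base = BASE_DELAYS[attempt - 1]
    # Spread each delay across [base * (1 - jitter), base * (1 + jitter)]
    return base * rng.uniform(1 - jitter, 1 + jitter)


rng = random.Random()
for attempt in range(1, 9):
    print(attempt, delay_for_attempt(attempt, rng))
```

Passing the random generator in explicitly keeps the function easy to test with a seeded random.Random.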

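A delivery is successful only when the endpoint returns 2xx within a timeout (20–30 seconds). Treat 5xx responses and timeouts as transient (retry). Treat most 4xx responses as permanent failures (stop retrying), with the exception of 408 Request Timeout and 429 Too Many Requests, which signal a temporary condition and should be retried.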
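A delivery worker could encode this rule as a small classifier. The sketch below treats 408 and 429 as retryable, since those codes signal a timeout or rate limiting rather than a malformed request, and it treats 3xx as a failure on the assumption that the worker does not follow redirects. The names Outcome and classify are hypothetical.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    DELIVERED = "delivered"
    RETRY = "retry"
    FAILED = "failed"


# 4xx codes that describe a temporary condition (assumption of this sketch)
RETRYABLE_4XX = {408, 429}


def classify(status: Optional[int]) -> Outcome:
    """Map an HTTP status (None for a timeout or connection error) to an outcome."""
    if status is None:
        return Outcome.RETRY
    if 200 <= status < 300:
        return Outcome.DELIVERED
    if status >= 500 or status in RETRYABLE_4XX:
        return Outcome.RETRY
    # Remaining 3xx and 4xx responses: retrying the same request will not help
    return Outcome.FAILED


for status in (200, 404, 429, 500, None):
    print(status, classify(status).value)
```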

Ordering Guarantees

Webhooks do not guarantee ordering by default — network conditions, retries, and parallel workers can invert event sequences. Design receivers to be order-tolerant:

{
  "type": "order.status_changed",
  "data": {
    "object": {
      "id": "ord_789",
      "status": "delivered",
      "sequenceNumber": 7
    }
  }
}

Include a monotonic sequenceNumber per resource so receivers can detect out-of-order delivery. If event 7 arrives before event 5, the receiver can either buffer event 7, request a replay of the gap, or re-fetch the current resource state.
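On the receiving side, the check can be as small as remembering the last applied sequence number per resource. This sketch assumes sequence numbers start at 1 and increase by exactly 1 per event; the SequenceTracker class is a hypothetical in-memory stand-in for whatever store the receiver actually uses.

```python
from typing import Dict


class SequenceTracker:
    """Tracks the last applied sequenceNumber for each resource ID."""

    def __init__(self) -> None:
        self._last: Dict[str, int] = {}

    def check(self, resource_id: str, sequence_number: int) -> str:
        last = self._last.get(resource_id, 0)
        if sequence_number <= last:
            return "stale"  # already applied this event or a newer one
        if sequence_number > last + 1:
            return "gap"    # an earlier event is missing: buffer, replay, or re-fetch
        self._last[resource_id] = sequence_number
        return "apply"


tracker = SequenceTracker()
results = [tracker.check("ord_789", n) for n in (1, 2, 4, 2, 3, 4)]
print(results)  # ['apply', 'apply', 'gap', 'stale', 'apply', 'apply']
```

Event 4 is held back until event 3 has been applied, and the second delivery of event 2 is recognized as stale.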

For strict ordering requirements, offer a per-resource ordered queue (one goroutine/worker per resource ID). Events for different resources can still be processed in parallel, but events for a single resource are delivered in order.
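A dedicated worker per resource ID does not scale to millions of resources, so a common variation is to hash each resource ID onto a fixed pool of workers. The sketch below shows only the routing step, with a hypothetical worker_for function; every event for a given resource lands on the same queue, which preserves its relative order.

```python
import hashlib
from typing import List, Tuple


def worker_for(resource_id: str, worker_count: int) -> int:
    """Pick a stable worker index for a resource ID."""
    # hashlib gives the same result across processes, unlike the built-in hash()
    digest = hashlib.sha256(resource_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % worker_count


WORKERS = 4  # illustrative pool size
events: List[Tuple[str, int]] = [("ord_1", 1), ("ord_2", 1), ("ord_1", 2), ("ord_2", 2)]
queues: List[List[Tuple[str, int]]] = [[] for _ in range(WORKERS)]

for resource_id, sequence_number in events:
    queues[worker_for(resource_id, WORKERS)].append((resource_id, sequence_number))

for index, queue in enumerate(queues):
    print(index, queue)
```

Each queue is then drained by a single worker that delivers one event at a time, while different queues proceed in parallel.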

Deduplication

At-least-once delivery is the realistic guarantee. Receivers will see duplicate events. Design for it:

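import redis

r = redis.Redis()  # connection settings omitted

def handle_webhook(event: dict):
    key = f"webhook:seen:{event['id']}"
 
    # Idempotency check: SET with NX succeeds only the first time the key is written.
    # The 7-day TTL covers the sender's retry schedule and replay window.
    if r.set(key, 1, nx=True, ex=7 * 86400):
        try:
            # First time seeing this event: process it
            process_event(event)
        except Exception:
            # Release the key so the sender's retry is not skipped as a duplicate
            r.delete(key)
            raise
    # Otherwise it is a duplicate: acknowledge but skip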

Acknowledge the duplicate with 200 OK so the sender's delivery system knows not to retry it.

Replay API

Provide a replay mechanism for events your customers missed:

POST /webhooks/subscriptions/{subId}/replay HTTP/1.1
Content-Type: application/json
 
{
  "since": "2025-11-18T00:00:00Z",
  "until": "2025-11-19T00:00:00Z",
  "types": ["order.shipped", "order.delivered"]
}

Store all events (with their original payload) for at least 7 days. Event replay is the escape hatch when a customer's endpoint was down for maintenance, their deploy was broken, or their database migration ate some records.
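On the server side, a replay request boils down to filtering the stored events by time window and type, then re-queueing them in their original order. This is a minimal in-memory sketch; select_for_replay is a hypothetical name, and treating the window as half-open (since inclusive, until exclusive) is an assumption the real API should document either way.

```python
from datetime import datetime
from typing import Iterable, List, Optional


def parse_ts(value: str) -> datetime:
    # fromisoformat does not accept a trailing "Z" on older Python versions
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def select_for_replay(events: Iterable[dict], since: str, until: str,
                      types: Optional[List[str]] = None) -> List[dict]:
    """Return stored events inside [since, until), oldest first."""
    start, end = parse_ts(since), parse_ts(until)
    wanted = set(types) if types else None
    matches = [
        e for e in events
        if start <= parse_ts(e["createdAt"]) < end
        and (wanted is None or e["type"] in wanted)
    ]
    return sorted(matches, key=lambda e: parse_ts(e["createdAt"]))


stored = [
    {"id": "evt_3", "type": "order.delivered", "createdAt": "2025-11-18T20:00:00Z"},
    {"id": "evt_1", "type": "order.shipped", "createdAt": "2025-11-18T09:00:00Z"},
    {"id": "evt_2", "type": "order.created", "createdAt": "2025-11-18T10:00:00Z"},
    {"id": "evt_4", "type": "order.shipped", "createdAt": "2025-11-19T00:00:00Z"},
]
replay = select_for_replay(stored, "2025-11-18T00:00:00Z", "2025-11-19T00:00:00Z",
                           ["order.shipped", "order.delivered"])
print([e["id"] for e in replay])  # ['evt_1', 'evt_3']
```

Replayed events keep their original IDs, so receivers that deduplicate on the event ID skip anything they already processed.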

Key Takeaways

  • Every webhook event needs a unique ID for deduplication, a namespaced type, a pinned API version, and enough payload data to be actionable without a follow-up call.
  • Sign all payloads with HMAC-SHA256 including a timestamp; receivers must verify the signature and reject requests older than 5 minutes to prevent replay attacks.
  • Use exponential backoff with jitter for retries; treat 5xx and timeouts as transient and most 4xx as permanent failures (408 and 429 are the retryable exceptions); route undeliverable events to a dead-letter queue.
  • Include a per-resource sequence number so receivers can detect and handle out-of-order delivery.
  • Guarantee at-least-once delivery and design receiver logic to be idempotent using the event ID as a deduplication key.
  • Provide a replay API backed by at least 7 days of event history — it is your customers' safety net when things go wrong.