Build WhatsApp Message Delivery with Offline Queuing

System Design Deep Dive

WhatsApp Message Delivery

Delivering end-to-end encrypted messages in under 100ms to online users while durably queuing them for offline ones

Imagine a postal service where every letter is sealed with a lock that only the recipient can open - and you, the postal worker, never have a key. That’s end-to-end encryption: the server routes millions of sealed envelopes per second without being able to read any of them. Now add the requirement that letters addressed to someone currently away from home must be held safely, delivered exactly once when they return, and expire after 30 days if they never come back.

That’s the WhatsApp delivery problem. At 100 billion messages per day across 2+ billion users, the scale is staggering - but the real engineering challenge is not throughput, it’s the combination of guarantees that must hold simultaneously. Messages must arrive in order. Each must be delivered exactly once. End-to-end encryption must remain intact across the routing layer. Offline users can reconnect days later and must receive all queued messages in the right sequence. And push notifications must fire for the offline case without creating a delivery race condition with the WebSocket reconnect.

A naive approach - store messages in a database, poll for new ones - fails on latency (polling every 100ms adds 50ms of delay on average and up to 100ms in the worst case before the query even runs, and billions of clients polling ten times a second is an enormous load in itself), consistency (two concurrent poll queries might deliver the same message twice), and encryption (storing messages that the central server can decrypt breaks E2E guarantees). A more sophisticated attempt - a central message queue per user - fixes latency but breaks under multi-device scenarios: if the same user is connected on both phone and laptop, which device drains the queue?

The forces in tension are: delivery latency vs. durability of the offline queue, exactly-once delivery vs. horizontal scaling of delivery servers, multi-device fan-out vs. the single-recipient model of E2E encryption, and push notification timing vs. the ACK-based delivery state machine. We need to solve for end-to-end encryption key management, the message queue per recipient device, the ACK protocol, push notification fallback, and re-delivery on reconnect.

Requirements and Constraints

Functional Requirements

Send a message from sender to recipient in under 100ms when both are online
Queue messages durably when the recipient is offline (up to 30 days)
Deliver all queued messages in order when the recipient reconnects
Support multi-device: a user’s messages arrive on all their registered devices
Provide delivery receipts: sent, delivered, read
Support message expiry: configurable self-destruct timers (30s to 7 days)
End-to-end encrypt every message - the server must never see plaintext

Non-Functional Requirements

100 billion messages per day = ~1.16 million messages/second average, ~4M/second at peak
Online delivery latency: < 100ms p99
Offline queue durability: messages survive server crashes, available after reconnect
Queue depth: up to 10,000 messages per device during offline period
Offline message retention: 30 days, then expired and sender notified
Availability: 99.99% (under 53 minutes of downtime per year)
Concurrent WebSocket connections: 2 billion sustained

Constraints

Encryption keys are managed by clients; the server stores only public keys
Messages contain only encrypted ciphertext from the server’s perspective
Each device has its own set of encryption keys (Signal Protocol)
Multi-device fan-out happens at the sender client, not the server

High-Level Architecture

The system has six major components: a Connection Layer that maintains WebSocket connections for online users, a Message Router that dispatches inbound messages to the right delivery path, an Offline Queue that durably stores messages for disconnected devices, a Key Distribution Service that serves public keys for encryption, an ACK Protocol Engine that tracks delivery state, and a Push Notification Gateway that wakes offline devices.

Data flows in two distinct paths. Online path: sender client sends an encrypted message to the connection server via WebSocket; the connection server forwards to the Message Router; the router finds the recipient’s active connection server and delivers directly via server-to-server RPC; the recipient’s connection server pushes the message down the WebSocket; the recipient client sends an ACK; the ACK propagates back to the sender. Total: < 100ms. Offline path: Message Router finds no active connection for the recipient’s device; message is written to the Offline Queue with a 30-day TTL; Push Notification Gateway fires a silent push to wake the device; when the device reconnects, it drains its offline queue in order.

Key Insight

The server never routes plaintext - it routes opaque binary blobs with a recipient device ID and a sequence number. E2E encryption is not a layer on top of the routing system; it IS the data model the routing system operates on.

The Connection Layer

The Connection Layer maintains long-lived WebSocket connections for 2 billion concurrent users. Each connection server (a stateful process) holds hundreds of thousands of WebSocket connections. A global registry maps {user_id, device_id} to the connection server handling that device’s current session.

The connection registry is a distributed in-memory store (backed by Cassandra for durability). Each entry has a TTL of 90 seconds, refreshed by a client heartbeat every 30 seconds. If a device fails to send a heartbeat for 90 seconds, its registry entry expires and it is considered offline.

# Connection registry - Redis per-cluster with Cassandra backing
import redis
import time

# decode_responses=True makes redis-py return str instead of bytes
r = redis.Redis(host='connection-registry', decode_responses=True)

REGISTRY_TTL_S = 90  # entry expires if no heartbeat arrives within 90s

def register_connection(user_id: str, device_id: str, server_id: str):
    key = f"conn:{user_id}:{device_id}"
    r.set(key, server_id, ex=REGISTRY_TTL_S)

def get_connection_server(user_id: str, device_id: str) -> str | None:
    key = f"conn:{user_id}:{device_id}"
    return r.get(key)  # None if the device is offline or the entry expired

def heartbeat(user_id: str, device_id: str, server_id: str):
    register_connection(user_id, device_id, server_id)  # reset TTL

def get_all_devices(user_id: str) -> dict[str, str]:
    """Returns {device_id: server_id} for all online devices."""
    # SCAN iterates incrementally; KEYS would block Redis while it walks the
    # whole keyspace. A per-user set of device ids would avoid the scan entirely.
    keys = list(r.scan_iter(match=f"conn:{user_id}:*"))
    if not keys:
        return {}
    values = r.mget(keys)
    # A key can expire between SCAN and MGET, in which case its value is None
    return {
        key.rsplit(':', 1)[1]: val
        for key, val in zip(keys, values) if val is not None
    }

The server-to-server path uses gRPC with mutual TLS between connection servers. When server A needs to deliver a message to a user connected to server B, it makes a unary RPC call to server B’s DeliverMessage endpoint. The message payload is the E2E-encrypted ciphertext - server A passes it through without decrypting.

// Connection server gRPC service
syntax = "proto3";

service ConnectionServer {
  rpc DeliverMessage(DeliverRequest) returns (DeliverResponse);
  rpc GetConnectionStatus(StatusRequest) returns (StatusResponse);
}

message DeliverRequest {
  string user_id = 1;
  string device_id = 2;
  bytes ciphertext = 3;         // E2E encrypted payload - opaque to server
  string message_id = 4;        // globally unique UUID
  int64 server_timestamp_ms = 5;
  int32 message_type = 6;       // 1=text, 2=media_pointer, 3=control
}

message DeliverResponse {
  bool delivered = 1;
  string error = 2;  // empty on success
}

Real World

WhatsApp originally built on XMPP and later evolved it into a custom binary protocol. The connection layer design - stateful servers holding WebSocket connections with a registry for routing - is confirmed in Meta’s engineering blog posts on WhatsApp infrastructure. The key architectural choice of not retaining message content on the server once it has been delivered is a deliberate privacy design, not just an encryption convenience.

End-to-End Encryption and Key Distribution

The encryption model is based on the Signal Protocol: each device has a long-term identity key pair, a signed pre-key, and a set of one-time pre-keys. To send a message, the sender performs an X3DH key agreement using the recipient’s public keys to derive a shared secret, then uses the Double Ratchet algorithm to generate per-message encryption keys.

The Key Distribution Service (KDS) stores only public keys. It serves two requests: “give me the public key bundle for device X so I can initiate a session” and “here are new one-time pre-keys from device Y to replenish its supply.”

# Key Distribution Service - public key storage
from cassandra.cluster import Cluster

cluster = Cluster(['kds-cassandra'])
session = cluster.connect('whatsapp')

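def get_key_bundle(user_id: str, device_id: str, max_attempts: int = 5) -> dict:
    """Returns public key bundle for X3DH key agreement."""
    for _ in range(max_attempts):
        row = session.execute(
            "SELECT identity_key, signed_prekey, signed_prekey_sig, "
            "one_time_prekey FROM device_keys "
            "WHERE user_id = %s AND device_id = %s LIMIT 1",
            [user_id, device_id]
        ).one()
        if not row:
            raise KeyError(f"No keys for {user_id}/{device_id}")

        claimed = True  # nothing to claim if the one-time pre-keys are exhausted
        if row.one_time_prekey is not None:
            # Claim the one-time pre-key with a lightweight transaction:
            # IF EXISTS makes the delete conditional, so exactly one of
            # several concurrent callers sees was_applied == True.
            claimed = session.execute(
                "DELETE FROM device_keys WHERE user_id = %s AND device_id = %s "
                "AND one_time_prekey = %s IF EXISTS",
                [user_id, device_id, row.one_time_prekey]
            ).was_applied
        if claimed:
            return {
                'identity_key': row.identity_key,
                'signed_prekey': row.signed_prekey,
                'signed_prekey_sig': row.signed_prekey_sig,
                'one_time_prekey': row.one_time_prekey,
            }
        # Another sender claimed this pre-key first; re-read and try the next one
    raise RuntimeError(f"Could not claim a pre-key for {user_id}/{device_id}")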

def replenish_one_time_keys(user_id: str, device_id: str, keys: list[bytes]):
    """Device uploads new one-time pre-keys."""
    for key in keys:
        session.execute(
            "INSERT INTO device_keys (user_id, device_id, one_time_prekey) "
            "VALUES (%s, %s, %s)",
            [user_id, device_id, key]
        )

Watch Out

One-time pre-keys must be consumed atomically - if two senders are handed the same pre-key, both sessions depend on the same one-time secret, which weakens the forward secrecy and replay protection that one-time keys exist to provide. The KDS uses a Cassandra conditional delete (DELETE ... IF EXISTS, a lightweight transaction) to ensure each pre-key is claimed exactly once.

When a device runs low on pre-keys (below 20 remaining), the KDS sends a replenishment request to the device via its WebSocket. Devices upload 100 new one-time pre-keys in response. If pre-keys are completely exhausted, the KDS returns only the signed pre-key, which is reusable - this is less secure but allows session establishment to proceed.
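The replenishment rule above can be sketched in memory. This is an illustration only: `PreKeyPool`, `claim`, and `replenish` are hypothetical names, and the thresholds (20 and 100) are the ones quoted in the text. A real KDS would keep this state in Cassandra and claim keys with a conditional delete.

```python
from dataclasses import dataclass, field

LOW_WATERMARK = 20     # replenish when fewer than this remain (from the text)
REPLENISH_BATCH = 100  # keys a device uploads per replenishment (from the text)


@dataclass
class PreKeyPool:
    """In-memory stand-in for one device's public pre-keys (hypothetical)."""
    signed_prekey: bytes
    one_time_prekeys: list[bytes] = field(default_factory=list)

    def claim(self) -> dict:
        """Hand out a bundle, consuming one one-time pre-key if any remain."""
        otk = self.one_time_prekeys.pop(0) if self.one_time_prekeys else None
        return {
            "signed_prekey": self.signed_prekey,
            "one_time_prekey": otk,  # None means exhausted: signed pre-key only
            "needs_replenish": len(self.one_time_prekeys) < LOW_WATERMARK,
        }

    def replenish(self, keys: list[bytes]) -> None:
        """Device uploads a fresh batch of one-time pre-keys."""
        self.one_time_prekeys.extend(keys)


pool = PreKeyPool(signed_prekey=b"spk", one_time_prekeys=[b"k1", b"k2"])
first = pool.claim()
second = pool.claim()
third = pool.claim()  # pool is empty: only the signed pre-key comes back
pool.replenish([bytes([i]) for i in range(REPLENISH_BATCH)])
```

The `needs_replenish` flag is what would trigger the request to the device; returning it alongside the bundle keeps the check on the same code path as the claim.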

Key Insight

The one-time pre-key mechanism strengthens the forward secrecy of the initial handshake - even if an attacker later compromises a device’s long-term keys, they cannot recompute the initial shared secret of a session that was set up with a one-time key that has since been deleted. The Double Ratchet then extends forward secrecy to every subsequent message.

The Offline Queue

The Offline Queue stores messages for devices that are not currently connected. It is the durability heart of the system. Each queue is owned by a {user_id, device_id} pair and is strictly ordered by sequence number.

-- Offline queue schema - Cassandra CQL
CREATE TABLE offline_messages (
    user_id       TEXT,
    device_id     TEXT,
    seq_num       BIGINT,         -- monotonically increasing per device
    message_id    UUID,
    ciphertext    BLOB,           -- E2E encrypted payload
    message_type  INT,
    sender_id     TEXT,
    enqueued_at   TIMESTAMP,
    expires_at    TIMESTAMP,      -- enqueued_at + 30 days
    PRIMARY KEY ((user_id, device_id), seq_num)
) WITH CLUSTERING ORDER BY (seq_num ASC)
  AND default_time_to_live = 2592000;  -- 30 days

-- Sequence counter table (one row per device)
CREATE TABLE device_seq (
    user_id   TEXT,
    device_id TEXT,
    last_seq  BIGINT,
    PRIMARY KEY (user_id, device_id)
);

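Sequence numbers are assigned by the Message Router using a compare-and-set loop built on Cassandra lightweight transactions:

# Sequence number assignment - compare-and-set via lightweight transaction
import uuid
from datetime import datetime, timedelta, timezone

def next_seq(user_id: str, device_id: str) -> int:
    # "SET last_seq = last_seq + 1" is only valid on counter columns, and
    # reading the value back afterwards races with concurrent writers.
    # A conditional write lets exactly one router win each sequence number.
    while True:
        # For strict correctness this read should use SERIAL consistency
        row = session.execute(
            "SELECT last_seq FROM device_seq WHERE user_id = %s AND device_id = %s",
            [user_id, device_id]
        ).one()
        if row is None:
            # First message for this device: create the row at 1
            new_seq = 1
            applied = session.execute(
                "INSERT INTO device_seq (user_id, device_id, last_seq) "
                "VALUES (%s, %s, 1) IF NOT EXISTS",
                [user_id, device_id]
            ).was_applied
        else:
            new_seq = row.last_seq + 1
            applied = session.execute(
                "UPDATE device_seq SET last_seq = %s "
                "WHERE user_id = %s AND device_id = %s IF last_seq = %s",
                [new_seq, user_id, device_id, row.last_seq]
            ).was_applied
        if applied:
            return new_seq
        # Lost the race to another router; re-read and retry

def enqueue_message(user_id: str, device_id: str, message_id: str,
                    ciphertext: bytes, message_type: int, sender_id: str) -> int:
    seq = next_seq(user_id, device_id)
    # The driver expects a datetime for TIMESTAMP columns and a uuid.UUID
    # for UUID columns, not a float or a str
    expires_at = datetime.now(timezone.utc) + timedelta(days=30)
    session.execute(
        "INSERT INTO offline_messages "
        "(user_id, device_id, seq_num, message_id, ciphertext, message_type, "
        "sender_id, enqueued_at, expires_at) "
        "VALUES (%s, %s, %s, %s, %s, %s, %s, toTimestamp(now()), %s)",
        [user_id, device_id, seq, uuid.UUID(message_id), ciphertext,
         message_type, sender_id, expires_at]
    )
    return seq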

When a device reconnects, it tells the server its last acknowledged sequence number (last_ack_seq). The server queries for all messages with seq_num > last_ack_seq and streams them to the client:

# Reconnect sync - drain offline queue from last ACK
def sync_offline_messages(user_id: str, device_id: str,
                           last_ack_seq: int, limit: int = 100) -> list[dict]:
    rows = session.execute(
        "SELECT seq_num, message_id, ciphertext, message_type, sender_id "
        "FROM offline_messages "
        "WHERE user_id = %s AND device_id = %s AND seq_num > %s "
        "LIMIT %s",
        [user_id, device_id, last_ack_seq, limit]
    )
    return [
        {
            'seq_num': r.seq_num,
            'message_id': str(r.message_id),
            'ciphertext': bytes(r.ciphertext),
            'message_type': r.message_type,
            'sender_id': r.sender_id,
        }
        for r in rows
    ]
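Since each sync call returns at most `limit` rows, a reconnecting device has to page until the queue is empty. The loop below is a minimal sketch of that drain under stated assumptions: `drain_queue` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a call to sync_offline_messages over an in-memory list.

```python
from typing import Callable


def drain_queue(fetch_page: Callable[[int, int], list[dict]],
                last_ack_seq: int,
                page_size: int = 100) -> tuple[list[dict], int]:
    """Page through the queue until empty; return (messages, new cursor)."""
    received: list[dict] = []
    while True:
        page = fetch_page(last_ack_seq, page_size)
        if not page:
            return received, last_ack_seq
        received.extend(page)
        # Advance the cursor only after the page has been handled, so a crash
        # mid-page re-fetches that page instead of skipping it.
        last_ack_seq = page[-1]["seq_num"]


# Hypothetical queue with a gap at seq 4; gaps must not stall the drain.
QUEUE = [{"seq_num": s, "message_id": f"m{s}"} for s in (1, 2, 3, 5, 6, 7, 8)]


def fake_fetch(after_seq: int, limit: int) -> list[dict]:
    """Stand-in for the server query: seq_num > after_seq, ascending, limited."""
    return [m for m in QUEUE if m["seq_num"] > after_seq][:limit]


messages, cursor = drain_queue(fake_fetch, last_ack_seq=2, page_size=2)
print([m["seq_num"] for m in messages], cursor)
```

Because the cursor is the last sequence number seen rather than a count, the missing seq 4 is simply skipped.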

Real World

The reconnect sync pattern - client sends last-seen sequence number, server streams everything after - mirrors how Apache Kafka consumers resume from a committed offset. Sequence-number sync is generally preferred over timestamp-based sync because sequence numbers are monotonic and unaffected by clock skew between servers; a filter of the form seq_num > last_ack_seq also tolerates gaps in the sequence.

The ACK Protocol

The ACK protocol tracks the delivery state of each message through a state machine with four states: SENT (server received), DELIVERED (recipient device received), READ (user opened the conversation), and FAILED (delivery failed after max retries).

The state machine transitions:

SENT -> DELIVERED: recipient device sends server-ACK after receiving message
DELIVERED -> READ: recipient app sends read-receipt when user opens conversation
SENT -> FAILED: delivery not confirmed after 30 days (offline queue expired)
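The three transitions listed above can be enforced with a small lookup table. This is a sketch, not the article's implementation: `DeliveryState`, `ALLOWED`, and `advance` are illustrative names, and silently ignoring an illegal update (rather than raising) is an assumption.

```python
from enum import Enum


class DeliveryState(Enum):
    SENT = 1
    DELIVERED = 2
    READ = 3
    FAILED = 4


# Exactly the transitions listed in the text; anything else is rejected.
ALLOWED = {
    (DeliveryState.SENT, DeliveryState.DELIVERED),
    (DeliveryState.DELIVERED, DeliveryState.READ),
    (DeliveryState.SENT, DeliveryState.FAILED),
}


def advance(current: DeliveryState, new: DeliveryState) -> DeliveryState:
    """Return the state after applying `new`, ignoring stale or illegal updates."""
    if current == new:   # duplicate ACK: idempotent no-op
        return current
    if (current, new) in ALLOWED:
        return new
    return current       # e.g. a late DELIVERED arriving after READ is dropped


state = DeliveryState.SENT
state = advance(state, DeliveryState.DELIVERED)
state = advance(state, DeliveryState.READ)
state = advance(state, DeliveryState.DELIVERED)  # out-of-order ACK, ignored
print(state)
```

With only the listed transitions, a READ receipt that overtakes its DELIVERED receipt is dropped; a production system might instead treat READ as implying DELIVERED, which would mean adding a (SENT, READ) entry to the table.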

The sender receives delivery receipts via the same WebSocket channel used for inbound messages. Receipts are small control messages routed back through the connection layer:

# Delivery state tracking - Redis with Cassandra fallback
import json
import uuid

DELIVERY_STATES = {1: 'sent', 2: 'delivered', 3: 'read', 4: 'failed'}

def update_delivery_state(message_id: str, new_state: int,
                           sender_id: str, recipient_device: str):
    # Write to Redis for fast sender queries
    r.hset(f"msg_state:{message_id}", mapping={
        'state': new_state,
        'recipient': recipient_device,
        'ts': time.time()
    })
    r.expire(f"msg_state:{message_id}", 2592000)  # 30d TTL

    # Persist to Cassandra for durability (UUID columns need a uuid.UUID)
    session.execute(
        "INSERT INTO message_delivery_log "
        "(message_id, state, recipient_device, sender_id, updated_at) "
        "VALUES (%s, %s, %s, %s, toTimestamp(now()))",
        [uuid.UUID(message_id), DELIVERY_STATES[new_state],
         recipient_device, sender_id]
    )

def send_delivery_receipt(message_id: str, state: int,
                           sender_id: str, sender_device: str):
    """Route receipt back to sender's active connection, or queue it."""
    receipt_payload = {
        'type': 'delivery_receipt',
        'message_id': message_id,
        'state': state,
    }
    server_id = get_connection_server(sender_id, sender_device)
    if server_id:
        deliver_to_server(server_id, sender_id, sender_device, receipt_payload)
    else:
        # Sender is offline: queue the receipt as a control message (type 3)
        # so it is delivered on reconnect instead of being dropped. Giving the
        # receipt its own message_id and the 'server' sender are assumptions.
        enqueue_message(sender_id, sender_device, str(uuid.uuid4()),
                        json.dumps(receipt_payload).encode(), 3, 'server')

Watch Out

If the sender is offline when a delivery receipt arrives, the receipt itself must be queued in the sender’s offline queue - not discarded. A common mistake is treating receipts as fire-and-forget. Receipts that don’t make it back to the sender leave the UI stuck on “sent” indefinitely, even though the message was delivered or read.

Push Notification Fallback

When a message is enqueued to an offline device’s queue, the Push Notification Gateway fires a silent push to wake the device. A silent push does not display a notification to the user; it wakes the app in the background so it can establish a WebSocket connection and drain its queue.

# Push notification - silent push to wake device for queue drain
import httpx

APNS_URL = "https://api.push.apple.com/3/device/{token}"
FCM_URL = "https://fcm.googleapis.com/v1/projects/{project}/messages:send"

async def send_silent_push(device_token: str, platform: str,
                            user_id: str, queue_depth: int):
    if platform == 'ios':
        payload = {
            'aps': {
                'content-available': 1,   # silent push flag
            },
            'user_id': user_id,
            'queue_depth': queue_depth,
        }
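        headers = {
            # With token-based auth, APNs rejects requests that lack a provider
            # token. get_apns_jwt() is an assumed helper, like get_fcm_token() below.
            'authorization': f'bearer {get_apns_jwt()}',
            'apns-push-type': 'background',
            'apns-priority': '5',  # low priority for silent push
            'apns-topic': 'com.whatsapp.WhatsApp',
        }
        url = APNS_URL.format(token=device_token)
        async with httpx.AsyncClient(http2=True) as client:
            resp = await client.post(url, json=payload, headers=headers)
            resp.raise_for_status()  # surface 4xx/5xx so the caller can retry or alert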

    elif platform == 'android':
        payload = {
            'message': {
                'token': device_token,
                'data': {
                    'type': 'silent_sync',
                    'user_id': user_id,
                    'queue_depth': str(queue_depth),
                },
                'android': {
                    'priority': 'normal',  # not high - silent sync
                },
            }
        }
        async with httpx.AsyncClient() as client:
            resp = await client.post(FCM_URL.format(project='whatsapp-prod'),
                                     json=payload,
                                     headers={'Authorization': f'Bearer {get_fcm_token()}'})
            resp.raise_for_status()  # surface 4xx/5xx so the caller can retry or alert

The push notification and the WebSocket reconnect can race: the device might reconnect before the silent push arrives (e.g., user opened the app manually). This race is harmless because the sync protocol is idempotent - the device always syncs from its last_ack_seq, so a second sync returns only messages it has not yet acknowledged, and any message that arrives twice before its ACK lands is dropped by client-side message_id deduplication.
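To make the race concrete, here is a minimal client-side sketch in which two overlapping syncs deliver the same messages. `DeviceInbox` and its fields are hypothetical; a real client would persist this state and bound the size of the seen-ID set.

```python
class DeviceInbox:
    """Hypothetical client-side inbox that tolerates duplicate deliveries."""

    def __init__(self) -> None:
        self.seen_ids: set[str] = set()
        self.last_ack_seq = 0
        self.messages: list[str] = []

    def receive(self, seq_num: int, message_id: str) -> bool:
        """Apply a message at most once; return True if it was new."""
        is_new = message_id not in self.seen_ids
        if is_new:
            self.seen_ids.add(message_id)
            self.messages.append(message_id)
        # Move the ACK cursor forward even for duplicates, so the server
        # stops re-sending them on the next sync.
        self.last_ack_seq = max(self.last_ack_seq, seq_num)
        return is_new


inbox = DeviceInbox()
# Sync triggered by the user opening the app manually...
first_sync = [inbox.receive(1, "a"), inbox.receive(2, "b")]
# ...then the silent push lands and triggers a second sync of the same range.
second_sync = [inbox.receive(1, "a"), inbox.receive(2, "b"), inbox.receive(3, "c")]
print(inbox.messages, inbox.last_ack_seq)
```

The second sync contributes only the genuinely new message, which is why the race needs no coordination between the push gateway and the connection layer.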

Key Insight

Silent push is used specifically to avoid showing a notification for every queued message on reconnect - if you used visible pushes, a user returning from 3 days offline would see thousands of notification banners. Silent push wakes the app once to drain the queue silently, then the app shows one aggregate unread count.

Data Model

-- Device registration - Cassandra
CREATE TABLE devices (
    user_id           TEXT,
    device_id         TEXT,
    push_token        TEXT,
    platform          TEXT,    -- ios | android | web
    last_seen_at      TIMESTAMP,
    registration_id   TEXT,    -- stable ID for key distribution
    PRIMARY KEY (user_id, device_id)
);

-- Message delivery log (for sender receipts)
CREATE TABLE message_delivery_log (
    message_id        UUID,
    state             TEXT,
    recipient_device  TEXT,
    sender_id         TEXT,
    updated_at        TIMESTAMP,
    PRIMARY KEY (message_id, updated_at)
) WITH CLUSTERING ORDER BY (updated_at DESC)
  AND default_time_to_live = 2592000;

-- Device public keys (KDS)
CREATE TABLE device_keys (
    user_id          TEXT,
    device_id        TEXT,
    identity_key     BLOB STATIC,   -- stored once per device partition
    signed_prekey    BLOB STATIC,
    signed_prekey_sig BLOB STATIC,
    one_time_prekey  BLOB,          -- one row per unclaimed one-time pre-key
    PRIMARY KEY ((user_id, device_id), one_time_prekey)
);

-- Message expiry tracking
CREATE TABLE message_expiry (
    conversation_id  TEXT,
    message_id       UUID,
    expires_at       TIMESTAMP,
    PRIMARY KEY (conversation_id, expires_at, message_id)
) WITH CLUSTERING ORDER BY (expires_at ASC)
  AND default_time_to_live = 604800;  -- 7-day max self-destruct window

Partitioning strategy: offline_messages is partitioned by (user_id, device_id) - all messages for a device are co-located, making reconnect sync a single-partition scan. device_keys is also partitioned by (user_id, device_id), with one_time_prekey as the clustering column, so each unclaimed pre-key is its own row that can be claimed with a conditional delete. The identity key and signed pre-key are static columns: they are stored once per device, so rows inserted during replenishment do not need to repeat them.

Key Algorithms and Protocols

Double Ratchet - Per-Message Key Derivation

The Double Ratchet algorithm ensures each message uses a unique encryption key, derived from the previous key using a KDF chain. This means compromising one message key does not compromise past or future messages.

# Double Ratchet - simplified ratchet step
import hmac
import hashlib
import os

def kdf_chain_step(chain_key: bytes) -> tuple[bytes, bytes]:
    """Returns (new_chain_key, message_key) for one ratchet step."""
    message_key = hmac.new(chain_key, b'\x01', hashlib.sha256).digest()
    new_chain_key = hmac.new(chain_key, b'\x02', hashlib.sha256).digest()
    return new_chain_key, message_key

def encrypt_message(plaintext: bytes, chain_key: bytes) -> tuple[bytes, bytes]:
    """Returns (ciphertext, new_chain_key)."""
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM
    new_chain_key, message_key = kdf_chain_step(chain_key)
    nonce = os.urandom(12)
    cipher = AESGCM(message_key)
    ciphertext = nonce + cipher.encrypt(nonce, plaintext, None)
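    # Drop the message key after use and never persist it. Note that this does
    # not scrub memory: Python bytes are immutable, so production code keeps
    # keys in a mutable buffer that it can overwrite with zeros.
    del message_key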
    return ciphertext, new_chain_key

Time complexity: O(1) per message. Each ratchet step is two HMAC operations: one derives the message key and one derives the next chain key. The chain key advances forward-only - there is no way to derive a previous message key from a current chain key.
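A short check makes the chain property visible: two parties that start from the same chain key derive the same sequence of message keys, and no key repeats. The sketch covers only the symmetric KDF chain and omits the Diffie-Hellman ratchet that the full Double Ratchet layers on top. `derive_message_keys` is a hypothetical helper, the all-zero starting key is a placeholder for a secret that would come from X3DH, and kdf_chain_step is repeated so the snippet is self-contained.

```python
import hashlib
import hmac


def kdf_chain_step(chain_key: bytes) -> tuple[bytes, bytes]:
    """One symmetric-ratchet step: returns (new_chain_key, message_key)."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    new_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return new_chain_key, message_key


def derive_message_keys(initial_chain_key: bytes, count: int) -> list[bytes]:
    """Advance the chain `count` times and collect the message keys."""
    keys = []
    chain_key = initial_chain_key
    for _ in range(count):
        chain_key, message_key = kdf_chain_step(chain_key)
        keys.append(message_key)
    return keys


shared_chain_key = b"\x00" * 32  # placeholder; in practice derived via X3DH
sender_keys = derive_message_keys(shared_chain_key, 3)
receiver_keys = derive_message_keys(shared_chain_key, 3)
print(sender_keys == receiver_keys, len(set(sender_keys)))
```

Both sides stay in lockstep without exchanging any key material per message, which is why the server only ever needs to carry ciphertext.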

Message Ordering Guarantee

The offline queue sequence number guarantees ordered delivery on reconnect. For online delivery, WebSocket connections maintain TCP ordering. The risk is reordering between the offline and online paths:

# Delivery path selection with ordering guarantee
def route_message(user_id: str, device_id: str,
                  message_id: str, ciphertext: bytes,
                  message_type: int, sender_id: str) -> str:
    server_id = get_connection_server(user_id, device_id)

    if server_id:
        # Reserve the seq_num before attempting delivery, so a crash between
        # delivery and bookkeeping cannot leave a delivered message unnumbered
        seq = next_seq(user_id, device_id)
        result = deliver_to_server(server_id, user_id, device_id,
                                   message_id, ciphertext, message_type)
        if result.delivered:
            record_delivered_seq(user_id, device_id, seq, message_id)
            return 'online'
        # Delivery failed: the reserved seq_num is abandoned. The gap is
        # harmless because sync filters on seq_num > last_ack_seq.

    # Offline path (or online delivery failed - fallback);
    # enqueue_message allocates its own seq_num
    seq = enqueue_message(user_id, device_id, message_id, ciphertext,
                          message_type, sender_id)
    trigger_silent_push(user_id, device_id)
    return f'queued:{seq}'

The critical detail: even for online delivery, we assign and record a sequence number. This ensures the reconnect sync doesn’t re-deliver messages that were already received via the online path.

Key Insight

The sequence number must be assigned atomically with routing - not after delivery confirmation. If you assign it only on successful online delivery, a crash between delivery and assignment creates a gap in the sequence, causing the reconnect sync to re-deliver already-delivered messages.

Scaling and Performance

Given:
  - 100B messages/day = 1.16M messages/second average
  - Peak: 4M messages/second
  - Average ciphertext size: 500 bytes (Signal-encrypted short message)
  - Offline queue: 30% of messages go offline (30B/day offline)

Cassandra offline queue writes:
  - 30B/day offline = ~347,000 writes/second
  - Each write: ~500 bytes ciphertext + ~100 bytes metadata = 600 bytes
  - Storage (upper bound, every message held for the full retention): 30B/day * 600B * 30 days = ~540 TB offline queue
  - With RF=3: ~1.6 PB total Cassandra storage for offline queue

Connection registry (Redis):
  - 2B active devices, ~60% online at peak = 1.2B entries
  - Each entry: 48 bytes (user_id:device_id -> server_id)
  - Total: ~58 GB Redis - needs cluster mode

gRPC server-to-server throughput:
  - 1.16M messages/second / 10,000 connection servers = 116 messages/server/second
  - Each server handles 200K WebSocket connections
  - Each WebSocket handles ~0.0006 messages/second average (about one message every 29 minutes)
  - Peak: 4M/10K = 400 messages/server/second

Push notification volume:
  - 30% offline = 30B offline messages/day / avg queue depth 4 = 7.5B pushes/day
  - APNs/FCM combined: ~87,000 pushes/second sustained
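These estimates are easy to get wrong by a factor of ten, so it can help to recompute them from the stated inputs (100B messages/day, 30% offline, 600 bytes per row, 30-day retention, RF=3, queue depth 4, 10,000 servers with 200K connections each). The script below is a back-of-envelope sketch; the variable names are illustrative and the storage figure is an upper bound that assumes every queued message sits for the full retention window.

```python
SECONDS_PER_DAY = 86_400

messages_per_day = 100e9
offline_fraction = 0.30
bytes_per_row = 600             # ~500 B ciphertext + ~100 B metadata
retention_days = 30
replication_factor = 3
avg_queue_depth_per_push = 4    # messages drained per wake-up
connection_servers = 10_000
connections_per_server = 200_000
online_devices = 1.2e9
registry_entry_bytes = 48

avg_msgs_per_sec = messages_per_day / SECONDS_PER_DAY
offline_per_day = messages_per_day * offline_fraction
offline_writes_per_sec = offline_per_day / SECONDS_PER_DAY
queue_tb = offline_per_day * bytes_per_row * retention_days / 1e12
queue_tb_replicated = queue_tb * replication_factor
pushes_per_sec = offline_per_day / avg_queue_depth_per_push / SECONDS_PER_DAY
msgs_per_server = avg_msgs_per_sec / connection_servers
msgs_per_connection = msgs_per_server / connections_per_server
registry_gb = online_devices * registry_entry_bytes / 1e9

print(f"average throughput:   {avg_msgs_per_sec:,.0f} msg/s")
print(f"offline queue writes: {offline_writes_per_sec:,.0f} /s")
print(f"offline queue size:   {queue_tb:,.0f} TB ({queue_tb_replicated:,.0f} TB at RF=3)")
print(f"silent pushes:        {pushes_per_sec:,.0f} /s")
print(f"per server:           {msgs_per_server:,.1f} msg/s")
print(f"per connection:       {msgs_per_connection:.5f} msg/s")
print(f"registry memory:      {registry_gb:,.1f} GB")
```

Changing a single input, such as the offline fraction, shows immediately which downstream figures move with it.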

The dominant scaling challenge is the connection registry. At 1.2 billion online devices, the registry needs to serve ~4 million lookups per second at peak (one per routed message). Redis Cluster with 32 shards can handle this: each shard serves ~125,000 lookups/second, which is within the range a single Redis node can typically sustain for simple GET operations, though the exact ceiling depends on hardware and pipelining.

Real World

Erlang’s process model was a major reason WhatsApp originally chose it - each client connection is a lightweight process, and Erlang can sustain millions of concurrent processes on a single machine. A single WhatsApp Erlang server handled 2 million concurrent connections in their famous 2012 benchmark. The connection-per-process model maps naturally to long-lived connections such as WebSockets.

Failure Modes and Recovery

Failure	Detection	Impact	Recovery
Connection server crash	Registry TTL expires (90s), health check failure	All users on that server disconnected; pending ACKs lost	Clients auto-reconnect (exponential backoff); offline queue ensures no message loss
Cassandra node failure	Gossip failure detector (node shown as DN in nodetool status), health probes	Offline queue writes/reads degraded	RF=3 means quorum (2 nodes) sufficient; failed node replaced within 1h
Push notification delivery failure (APNs/FCM)	4xx/5xx response, HTTP/2 stream error	Device not woken; messages accumulate in queue undelivered	App reconnects on next foreground open; queue drains then; 30-day TTL provides long recovery window
Pre-key exhaustion on KDS	KDS returns count < 20, alert fires	New sessions use reusable signed pre-key (less forward secrecy)	Device receives replenishment request on next heartbeat; uploads 100 new keys
Message Router split-brain (two servers route same message)	Duplicate message_id detected at delivery	Message delivered twice	Client deduplicates by message_id; server-side idempotent insert via Cassandra message_id check
Offline queue expired (30 days)	expires_at timestamp, Cassandra TTL	Sender notified via delivery receipt state ‘failed’	No recovery - by design; notify sender with FAILED receipt

Watch Out

The most dangerous operational failure is a Redis cluster failure that wipes the connection registry. Without the registry, the Message Router cannot route online deliveries and falls back to queuing everything offline - causing massive queue accumulation. Always replicate the connection registry to a secondary cluster and implement a circuit breaker that switches to the secondary on primary failure.
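The circuit breaker mentioned above can be sketched as a thin wrapper around two lookup functions. Everything here is hypothetical: the class name `RegistryFailover`, the threshold of consecutive failures, the cooldown, and the choice of `ConnectionError` as the failure signal are assumptions for illustration, not a description of a production setup.

```python
import time
from typing import Callable, Optional

Lookup = Callable[[str], Optional[str]]


class RegistryFailover:
    """Read from the primary registry; fail over to the secondary when it is down."""

    def __init__(self, primary: Lookup, secondary: Lookup,
                 failure_threshold: int = 3, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.primary = primary
        self.secondary = secondary
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def lookup(self, key: str) -> Optional[str]:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown_s:
                return self.secondary(key)  # circuit open: skip the primary
            # Cooldown over: allow one probe; a single failure reopens the circuit
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            value = self.primary(key)
            self.failures = 0
            return value
        except ConnectionError:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return self.secondary(key)


def broken_primary(key: str) -> Optional[str]:
    raise ConnectionError("primary registry unreachable")


def healthy_secondary(key: str) -> Optional[str]:
    return "server-42"


now = [0.0]  # fake clock so the example runs instantly
registry = RegistryFailover(broken_primary, healthy_secondary,
                            failure_threshold=2, cooldown_s=30.0,
                            clock=lambda: now[0])
results = [registry.lookup("conn:u1:d1") for _ in range(3)]
circuit_open = registry.opened_at is not None
```

While the circuit is open, routing keeps working from the secondary instead of treating every device as offline, which is what prevents the mass queue accumulation described above.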

Comparison of Approaches

Approach	Online Latency	Offline Durability	E2E Encryption	Complexity	Best Fit
Polling (HTTP long-poll)	100ms-2s	Database-backed	Possible	Low	< 1M users, infrequent messaging
Server-Sent Events	50ms	Limited	Possible	Medium	Read-heavy; no bidirectional need
WebSocket + persistent queue (our approach)	< 50ms	Strong (Cassandra RF=3)	Native	High	100M+ users, bidirectional, offline-first
XMPP federation	50-200ms	Server-dependent	Add-on	Medium	Federated messaging, open standards
Peer-to-peer (WebRTC DataChannel)	< 20ms	None	Native	Very High	Real-time only; both online required

The WebSocket + persistent queue approach wins for WhatsApp’s requirements because it combines low online latency with strong offline guarantees. The key tradeoff vs. polling: connection servers are stateful, requiring careful handling of server crashes and connection redistribution. The tradeoff vs. peer-to-peer: requires server infrastructure, but gains offline queuing and works through NAT/firewalls.

Key Takeaways

End-to-end encryption: The server routes opaque ciphertext identified by recipient device ID - plaintext never touches the routing layer.
Message queue per recipient device: Scoping the offline queue to (user_id, device_id) rather than just user_id is what enables correct multi-device delivery without fan-out complexity on the server.
Delivery receipts: ACK propagation must be queued for offline senders - receipts are first-class messages, not fire-and-forget signals.
ACK protocol: The four-state delivery receipt (sent, delivered, read, failed) maps directly to the user-facing check marks that users rely on to know their messages are received.
Push notification fallback: Silent push (content-available:1) wakes the app without disturbing the user, allowing the queue drain to happen before the user consciously opens the app.
Message expiry: Self-destruct timers require the expiry state to be tracked per-conversation on both client and server - client enforces UI expiry, server enforces queue deletion.
Re-delivery on reconnect: Sequence-number-based sync from last_ack_seq is idempotent and handles partial failures, network partitions, and server crashes transparently.
Sequence number atomicity: Assign the sequence number atomically at routing time, not after delivery confirmation - this is the guard against reordering on the reconnect path.

The counter-intuitive lesson: the hardest part of end-to-end encrypted messaging is not the encryption itself - it’s the delivery guarantee. You can encrypt a message in microseconds; ensuring it arrives exactly once, in order, on every device, whether online or offline, across server crashes and network partitions, is where the real engineering complexity lives.

Frequently Asked Questions

Q: How does multi-device work if E2E encryption is device-specific? A: In the Signal Protocol used by WhatsApp, the sender encrypts a separate copy of each message for each of the recipient’s registered devices. If a user has 3 devices, the sender’s client creates 3 encrypted payloads using the public key bundle for each device. The server delivers each payload independently. This is why “linked devices” increases encryption work on the sender client linearly with the number of recipient devices.
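The per-device fan-out described in this answer reduces to a loop on the sender's client. The sketch below assumes a hypothetical `encrypt_for(device_id, plaintext)` callable that wraps the per-device Signal session; `fake_encrypt` is a placeholder that performs no real encryption and exists only to make the copies distinguishable.

```python
from typing import Callable


def fan_out(plaintext: bytes, recipient_devices: list[str],
            encrypt_for: Callable[[str, bytes], bytes]) -> dict[str, bytes]:
    """Encrypt one copy per recipient device; returns {device_id: ciphertext}."""
    return {device_id: encrypt_for(device_id, plaintext)
            for device_id in recipient_devices}


def fake_encrypt(device_id: str, plaintext: bytes) -> bytes:
    """Placeholder, NOT encryption: tags the payload with the device id."""
    return device_id.encode() + b"|" + plaintext[::-1]


payloads = fan_out(b"hello", ["phone", "laptop", "tablet"], fake_encrypt)
print(len(payloads))
```

Each entry in the result becomes its own delivery request, so the server treats the three payloads as three unrelated messages.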

Q: How do you handle messages sent to a device that has been factory-reset? A: Factory reset invalidates the device’s key material. When the user reinstalls WhatsApp and re-registers, a new device_id and key set are created. The old device_id’s offline queue orphans (it has no device to deliver to) and its messages expire after 30 days. The sender receives FAILED receipts for messages addressed to the old device_id after the queue TTL expires.

Q: Why not use Kafka for the offline queue instead of Cassandra? A: Kafka preserves order only within a partition. Keying by user keeps each user’s messages in order, but many users then share a partition, so draining one device’s backlog means reading past everyone else’s messages, and a dedicated partition per device does not scale to billions of devices. Kafka’s retention is also governed by topic-level policy rather than per-message acknowledgement. Cassandra’s partition key (user_id, device_id) gives you per-device ordering within a partition, with horizontal scaling across partitions. Kafka is better for fan-out to multiple consumers; Cassandra is better for point-to-point per-entity queues with TTL and sparse access patterns.

Q: What happens if a Cassandra write to the offline queue fails? A: The Message Router uses a retry loop with exponential backoff (100ms, 200ms, 400ms, 800ms, 1.6s). If all retries fail, the message is written to a dead-letter queue in a durable store (another Cassandra cluster or S3) and an alert fires. The sender receives no ACK (the WebSocket connection’s write will time out) and retries at the application level. WhatsApp’s client-side retry logic re-sends unacknowledged messages on reconnect, making the retry idempotent via message_id deduplication.
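The retry schedule in this answer can be sketched as follows. The function and parameter names are hypothetical, and the sketch assumes one initial attempt followed by five retries at the quoted delays. The `sleep` parameter is injectable so the example runs instantly.

```python
import time
from typing import Callable

BACKOFF_SCHEDULE_S = [0.1, 0.2, 0.4, 0.8, 1.6]  # delays quoted in the answer above


def write_with_retry(write: Callable[[], None],
                     dead_letter: Callable[[], None],
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Attempt the write, retrying after each delay; dead-letter if all fail."""
    for delay in [0.0] + BACKOFF_SCHEDULE_S:
        if delay:
            sleep(delay)
        try:
            write()
            return True
        except Exception:
            # Sketch only: a real router would catch just the driver's
            # transient errors (timeouts, unavailable replicas).
            continue
    dead_letter()
    return False


attempts = {"count": 0}


def flaky_write() -> None:
    """Fails twice, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("write timed out")


slept: list[float] = []
dead_lettered: list[bool] = []
ok = write_with_retry(flaky_write, lambda: dead_lettered.append(True),
                      sleep=slept.append)
print(ok, slept, dead_lettered)
```

The worst case blocks for about 3.1 seconds before dead-lettering, which is the window in which the sender's own timeout and retry are expected to kick in.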

Q: How are group messages handled differently? A: Group messages use sender-key encryption (a variant of Signal Protocol). The sender generates a single sender key for the group and encrypts the message once using that key. The sender key itself is distributed to group members encrypted with each member’s individual key. For delivery, the server maintains a “group member device set” and fans out individual delivery to each member’s devices - but the message payload is the same for all recipients in the same group session. This avoids O(members * devices) message copies for large groups.

Q: How does the system handle the case where a user is switching phones mid-conversation? A: During a device migration, the old device’s session keys are marked as deprecated and the new device registers fresh key material. The old device’s offline queue continues to accumulate messages addressed to its old device_id (which the old device, now reset, cannot decrypt). The new device gets a fresh queue. WhatsApp’s linked-devices flow has the user scan a QR code on the old device before reset, which triggers key re-registration for the new device and re-encrypts pending messages for the new key material.

Interview Questions

Q: How would you design the connection registry to handle 1.2 billion online devices without a single point of failure?

Expected depth: Discuss Redis Cluster with consistent hashing across 32+ shards, each shard replicated (1 primary + 2 replicas), 90-second TTL with heartbeat refresh, Cassandra fallback for registry persistence on Redis failure, and the tradeoff between registry lookup latency (Redis: 0.5ms) and consistency (Redis eventual replication vs. Cassandra quorum reads).
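The TTL-plus-heartbeat part of that answer can be illustrated with an in-memory model. This is only a sketch: `ConnectionRegistry` is a hypothetical single-process stand-in for one registry shard, and it leaves out sharding, replication, and the Cassandra fallback.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ConnectionRegistry:
    """Maps device_id -> delivery server, with entries expiring after a TTL."""

    def __init__(self, ttl_s: float = 90.0, clock: Callable[[], float] = time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def heartbeat(self, device_id: str, server_id: str) -> None:
        # Each heartbeat (re)registers the device and pushes expiry out by the TTL.
        self._entries[device_id] = (server_id, self._clock() + self._ttl_s)

    def lookup(self, device_id: str) -> Optional[str]:
        entry = self._entries.get(device_id)
        if entry is None:
            return None
        server_id, expires_at = entry
        if self._clock() >= expires_at:
            # Missed heartbeats: treat the device as offline.
            del self._entries[device_id]
            return None
        return server_id
```

A device that stops sending heartbeats disappears from the registry within 90 seconds, at which point the router falls back to the offline queue.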

Q: The sequence number assignment and message enqueue must be atomic. How do you achieve this without a distributed transaction?

Expected depth: Discuss Cassandra’s lightweight transactions (IF NOT EXISTS) and counter columns, the tradeoff of LWT latency (~10ms for a Paxos round trip) vs. correctness, alternative approaches using a dedicated sequence service with a Redis INCR (sub-millisecond), and how you’d handle the case where seq assignment succeeds but queue write fails (gap in sequence).
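One possible way to handle the gap case is to make sequence assignment idempotent per message_id, so a retried enqueue reuses the number it was already given. The sketch below assumes this approach; `SequenceAllocator` is a hypothetical in-process stand-in for the sequence service, and a real deployment would need the lookup and increment to be atomic in the backing store.

```python
from typing import Dict, Tuple


class SequenceAllocator:
    """Assigns per-device sequence numbers, idempotently per message_id."""

    def __init__(self) -> None:
        self._last_seq: Dict[str, int] = {}               # device_key -> last seq issued
        self._assigned: Dict[Tuple[str, str], int] = {}   # (device_key, message_id) -> seq

    def assign(self, device_key: str, message_id: str) -> int:
        key = (device_key, message_id)
        if key not in self._assigned:
            # Equivalent of an INCR on the per-device counter.
            self._last_seq[device_key] = self._last_seq.get(device_key, 0) + 1
            self._assigned[key] = self._last_seq[device_key]
        # A retry after a failed queue write gets the same seq back,
        # so the failed attempt leaves no hole in the sequence.
        return self._assigned[key]
```

The tradeoff is extra state: the (device, message_id) mapping has to be kept at least as long as the sender may retry.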

Q: How would you add support for message search without breaking E2E encryption?

Expected depth: Cover client-side search index (search runs on-device, not server), the tradeoff of index storage on device vs. server-side keyword hash matching, Apple’s approach with end-to-end encrypted iCloud backups for search indexes, and why server-side full-text search is fundamentally incompatible with true E2E encryption.

Q: Design the push notification system to handle 300M silent pushes per day while respecting APNs rate limits.

Expected depth: Cover APNs per-device rate limits (not documented but empirically ~1/second per device), coalescing multiple queued messages into a single silent push per device (send one push when queue goes from 0 to N, not one per message), the Push Notification Gateway as a separate service with APNs/FCM connection pools, and handling invalid/expired device tokens (clean up device table on 410 response from APNs).
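The coalescing rule (one silent push when a device's queue goes from 0 to N) can be sketched as below. `PushCoalescer` and its callback are hypothetical names; the real gateway would sit in front of APNs/FCM connection pools and also handle token cleanup.

```python
from typing import Callable, Dict


class PushCoalescer:
    """Sends one silent push per device per non-empty queue episode."""

    def __init__(self, send_push: Callable[[str], None]) -> None:
        self._send_push = send_push
        self._queue_depth: Dict[str, int] = {}

    def on_enqueue(self, device_id: str) -> None:
        depth = self._queue_depth.get(device_id, 0)
        self._queue_depth[device_id] = depth + 1
        if depth == 0:
            # Queue just went from empty to non-empty: wake the device once.
            self._send_push(device_id)

    def on_drained(self, device_id: str) -> None:
        # Device reconnected and emptied its queue; the next message pushes again.
        self._queue_depth.pop(device_id, None)


sent = []
coalescer = PushCoalescer(sent.append)
for _ in range(3):
    coalescer.on_enqueue("device-a")   # only the first call triggers a push
coalescer.on_enqueue("device-b")
```

Three queued messages for `device-a` produce a single push, which keeps per-device push volume well below one per message.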

Q: How would you implement “last seen” and “online” status without revealing precise presence information to all contacts?

Expected depth: Discuss the heartbeat-based liveness model (WebSocket keepalive drives last_seen), privacy settings (show to contacts only, show to nobody), the staleness window for “online” display (show “online” if the last_seen timestamp is less than 60s old), and why you can’t derive exact online/offline status from WebSocket connection state alone (background app refresh, NAT keepalives).
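A small sketch of how the staleness window and privacy settings might combine. The names here are hypothetical, and the `EVERYONE` setting is an assumption added for completeness; the text above only names "contacts only" and "nobody".

```python
from enum import Enum
from typing import Optional


class Visibility(Enum):
    EVERYONE = "everyone"   # assumed default, not stated in the article
    CONTACTS = "contacts"
    NOBODY = "nobody"


def presence_for(
    viewer_is_contact: bool,
    visibility: Visibility,
    last_seen_ts: float,
    now_ts: float,
    online_window_s: float = 60.0,
) -> Optional[str]:
    # Privacy check first: hidden presence returns nothing at all.
    if visibility is Visibility.NOBODY:
        return None
    if visibility is Visibility.CONTACTS and not viewer_is_contact:
        return None
    # "Online" is a staleness window on last_seen, not raw socket state.
    if now_ts - last_seen_ts < online_window_s:
        return "online"
    return "last seen"
```

Returning `None` rather than "offline" for hidden users avoids leaking presence through the shape of the response.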
