Build a Flash Sale System for 500K Concurrent Checkouts



System Design Deep Dive

Flash Sale: 500K Concurrent Checkouts

Selling out 10,000 items in seconds without overselling a single unit across 500K simultaneous buyers


Think of a concert ticket office on opening day: one window, thousands of people queued around the block, and exactly 1,000 seats available. The clerk must move fast enough that nobody waits forever, accurate enough that seat 1001 is never sold, and fair enough that the person who arrived first gets served first. Oversell by even one ticket and you have a furious customer at the door with no seat. The problem is making sure you never sell seat 1001 - at any scale.

Flash sales are the digital equivalent. When a brand drops a limited-edition product at noon, 500,000 users all click “Buy Now” simultaneously. The inventory is 10,000 units. Every component in the system - the application servers, the cache, the database - gets a wall of traffic that arrives in the same second and drops off just as suddenly. Your system must absorb the spike without overselling a single unit, without a thundering herd collapsing your database, and without making users wait so long they give up.

The tension is three-way: speed (checkout must feel instant, or users abandon), correctness (no overselling - ever, even under partial failure), and fairness (the person who clicked first should get the item, not the person whose request happened to land on a lightly-loaded pod). These three goals pull against each other. The fastest approach - a simple database decrement - collapses under concurrent load. The most correct approach - serialized global locking - is too slow. The fairest approach - a sorted queue - adds latency. The design must find the point where all three are satisfied simultaneously.

Requirements and Constraints

Functional Requirements

  • Handle burst traffic at the moment of sale start, with requests arriving at 50,000+ per second
  • Enforce strict inventory limits: if 10,000 units are available, exactly 10,000 orders may be confirmed - no more
  • Confirm every successfully placed order durably so it can be fulfilled, even if the confirming service crashes
  • Return a clear “sold out” response to users who arrive after inventory is exhausted
  • Allow users to retry a failed checkout without creating duplicate orders

Non-Functional Requirements

  • 500,000 concurrent users at peak, with the first 10 seconds being the most intense
  • Checkout response latency under 500ms at p99 during peak burst
  • Zero oversells: inventory correctness is the hard constraint, not a soft target
  • 99.99% availability for the checkout endpoint during the sale window
  • Graceful degradation: when inventory is exhausted, the system returns fast “sold out” responses rather than queueing new requests

Constraints and Assumptions

  • The database inventory row is the source of truth; the Redis cache counter is a derived value and must be reconciled against it
  • Payment processing is asynchronous: an order is confirmed before payment is captured, with fulfillment gated on payment success
  • Users are authenticated before reaching the checkout flow; auth is not in scope
  • A user may only purchase one unit per sale event (per-user rate limiting enforced upstream)

High-Level Architecture

The system has seven major layers. The Load Balancer absorbs the burst and enforces per-IP rate limits. The Queue Gateway (virtual waiting room) is the admission control layer that converts a simultaneous spike into a manageable stream. The Checkout Service drives the checkout state machine. The Inventory Service wraps the Redis atomic counter and the reservation table. The Order Service creates durable order records with idempotency guarantees. The Payment Service handles async payment capture. The Cache Layer (Redis) is the primary inventory counter, backed by the database as the source of truth.

Figure: Flash sale system architecture showing Load Balancer, Queue Gateway virtual waiting room, Checkout Service, Inventory Service with Redis, Order Service with database, and Payment Service

The key architectural decision is the placement of the Queue Gateway before the Checkout Service. Without it, 500K concurrent requests hit the Checkout Service simultaneously, causing thundering herd on Redis and the database. With it, the Queue Gateway assigns each incoming request a queue position and holds the HTTP connection open (or returns a polling token), then allows checkout workers to drain the queue at a controlled rate - matching the throughput of the inventory and order systems.

Key Insight

The virtual waiting room is the architectural decision that makes everything else tractable. By serializing the burst at the edge, the downstream inventory and order systems see steady-state load rather than a tsunami. The queue absorbs the spike so that the atomic inventory counter and idempotent order creation can work correctly without being overwhelmed.
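
As a rough illustration of the admission step described above, the sketch below models a Queue Gateway in memory: each request gets a monotonically increasing position and a polling token, and a drain step admits a bounded batch per tick. The names (`WaitingRoom`, `join`, `admit_batch`, `status`) are hypothetical, and a real gateway would keep this state in a shared store rather than in process memory.

```python
import itertools
import uuid
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Ticket:
    token: str      # opaque polling token handed back to the client
    user_id: str
    position: int   # arrival order, starting at 1


class WaitingRoom:
    """In-memory model of a virtual waiting room (illustrative only)."""

    def __init__(self):
        self._seq = itertools.count(1)
        self._waiting = deque()
        self._by_user = {}
        self._admitted = set()

    def join(self, user_id):
        # A repeat click returns the same ticket, so refreshing never loses a place in line.
        existing = self._by_user.get(user_id)
        if existing is not None:
            return existing
        ticket = Ticket(token=uuid.uuid4().hex, user_id=user_id, position=next(self._seq))
        self._by_user[user_id] = ticket
        self._waiting.append(ticket)
        return ticket

    def admit_batch(self, max_admits):
        # Called once per tick; max_admits caps what reaches the checkout workers.
        batch = []
        while self._waiting and len(batch) < max_admits:
            ticket = self._waiting.popleft()
            self._admitted.add(ticket.token)
            batch.append(ticket)
        return batch

    def status(self, token):
        return "admitted" if token in self._admitted else "waiting"
```

Calling `admit_batch` on a timer with `max_admits` set to the downstream throughput ceiling is what turns the spike into a stream; clients poll `status` with their token until they are admitted.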

The Inventory Counter Design

The naive approach to inventory management is a database row with a count column and a WHERE count > 0 check before decrement. Under 500K concurrent requests, this creates a massive lock contention problem: every transaction races to read-modify-write the same row, throughput collapses, and the database primary becomes the bottleneck.

The correct approach is a cache-backed inventory counter built on Redis’s DECR command, which is atomic at the single-key level. Redis executes commands one at a time, so DECR inventory:product:42 is guaranteed to be serialized: no two decrements happen simultaneously. A bare DECR will happily go negative, though, so the decrement is wrapped in a Lua script that checks the floor first.

# Atomic inventory decrement with floor check using Redis Lua script
import redis

r = redis.Redis(host='redis-primary', port=6379, decode_responses=True)

# Load the atomic decrement Lua script
ATOMIC_DECR = r.register_script("""
local key = KEYS[1]
local current = tonumber(redis.call('GET', key))
if current == nil then
    return -2  -- key does not exist (sale not initialized)
end
if current <= 0 then
    return -1  -- sold out
end
return redis.call('DECR', key)
""")

def try_decrement_inventory(product_id: str) -> int:
    """
    Returns:
      >= 0: new inventory level after decrement (success)
      -1: sold out
      -2: sale not initialized
    Raises on Redis connection failure.
    """
    key = f"inventory:product:{product_id}"
    result = ATOMIC_DECR(keys=[key])
    return int(result)

# Initialize inventory at sale start
def initialize_inventory(product_id: str, count: int, ttl_seconds: int = 3600):
    key = f"inventory:product:{product_id}"
    r.set(key, count, ex=ttl_seconds, nx=True)  # nx=True: only set if not exists
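
For unit tests that should not depend on a running Redis, the floor-checked decrement can be modelled locally. The sketch below is a stand-in, not Redis: a `threading.Lock` plays the role of Redis's one-command-at-a-time execution, and the return codes follow the Lua script above. `LocalInventoryCounter` and `simulate` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LocalInventoryCounter:
    """Test double for the floor-checked decrement (not a Redis client)."""

    def __init__(self):
        self._lock = threading.Lock()   # stands in for Redis's serialized execution
        self._counts = {}

    def initialize(self, product_id, count):
        with self._lock:
            self._counts.setdefault(product_id, count)   # same intent as SET ... NX

    def try_decrement(self, product_id):
        with self._lock:
            current = self._counts.get(product_id)
            if current is None:
                return -2          # sale not initialized
            if current <= 0:
                return -1          # sold out
            self._counts[product_id] = current - 1
            return current - 1     # new level after decrement


def simulate(buyers, stock):
    """Fire `buyers` concurrent attempts at `stock` units; return (sold, rejected)."""
    counter = LocalInventoryCounter()
    counter.initialize("42", stock)
    with ThreadPoolExecutor(max_workers=16) as pool:
        results = list(pool.map(lambda _: counter.try_decrement("42"), range(buyers)))
    sold = sum(1 for r in results if r >= 0)
    rejected = sum(1 for r in results if r == -1)
    return sold, rejected
```

Because the check and the decrement happen under one lock, `simulate(200, 10)` always sells exactly 10 units regardless of thread scheduling, which is the property the Lua script provides on the server.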

The optimistic locking pattern applies when writing the final inventory state back to the database. Rather than using a database-level lock, we attach a version number to the inventory row and reject updates where the version has changed since we read it - detecting concurrent modifications without holding a lock.

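# Optimistic locking for DB inventory sync
class OptimisticLockError(Exception):
    """Raised when the inventory row changed since it was read."""

def sync_inventory_to_db(product_id: str, expected_version: int, new_count: int, db_conn):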
    result = db_conn.execute("""
        UPDATE inventory
        SET quantity = %s,
            version = version + 1,
            updated_at = NOW()
        WHERE product_id = %s
          AND version = %s
    """, (new_count, product_id, expected_version))
    if result.rowcount == 0:
        raise OptimisticLockError(f"Inventory row for {product_id} was modified concurrently")
    return True
Watch Out

The Redis counter and the database inventory count can diverge if Redis crashes between a DECR and the corresponding order commit. You need a periodic reconciliation job (covered in the Oversell Reconciliation section) that compares the committed order count against the database inventory row and resets the Redis counter to match. Never trust the Redis counter alone as the final source of truth - it is a fast-path gate, not the ledger.

Distributed Locking and Reservation

Decrementing the Redis counter claims a unit, but the user has not yet completed payment. Between the DECR and the order being marked “confirmed,” the inventory unit is in a limbo state - decremented from the counter but not yet durably committed. If the checkout service crashes at this point, the unit is effectively lost.

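The solution is a soft inventory reservation with a TTL. When a user passes through the Queue Gateway, the system creates a reservation record: “user X has 90 seconds to complete checkout for product Y.” The reservation decrements the Redis counter and sets a reservation key in one atomic step; a matching reservation row is then written to the database as a separate, non-atomic step, since a Redis script cannot span both stores. If the user completes checkout, the reservation converts to an order. If the TTL expires, the reservation must be released and the unit returned to the counter. Redis does not do this on its own when the key expires, so a release job (driven by the expires_at column of the reservations table) increments the counter for each lapsed hold.

The atomic reserve-and-decrement operation is a Redis Lua script, not a distributed lock: because Redis runs a script to completion before serving any other command, the two operations (counter DECR + reservation key write) behave as a single transaction: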

-- Lua script: atomic reserve (DECR counter + write reservation key)
-- KEYS[1] = inventory counter key
-- KEYS[2] = reservation key for this user+product
-- ARGV[1] = reservation TTL in seconds
-- ARGV[2] = reservation value (user_id:product_id:timestamp)
local inv_key = KEYS[1]
local res_key = KEYS[2]
local ttl = tonumber(ARGV[1])
local res_val = ARGV[2]

-- Check if user already has an active reservation (idempotent)
if redis.call('EXISTS', res_key) == 1 then
    return 0  -- already reserved, return success without double-decrementing
end

-- Check inventory
local current = tonumber(redis.call('GET', inv_key))
if current == nil or current <= 0 then
    return -1  -- sold out
end

-- Atomic: decrement inventory and set reservation with TTL
redis.call('DECR', inv_key)
redis.call('SET', res_key, res_val, 'EX', ttl)
return 1  -- success
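
# Python wrapper around the Lua script above (saved as reserve.lua)
import time

# r is the redis.Redis client created in the earlier example
with open('reserve.lua') as f:
    RESERVE_SCRIPT = r.register_script(f.read())

def reserve_inventory(product_id: str, user_id: str, ttl: int = 90) -> bool:
    """True if the user holds a reservation (new or pre-existing), False if sold out."""
    # Note: in Redis Cluster, both keys must hash to the same slot for one script call
    inv_key = f"inventory:product:{product_id}"
    res_key = f"reservation:{product_id}:{user_id}"
    res_val = f"{user_id}:{product_id}:{int(time.time())}"
    result = RESERVE_SCRIPT(keys=[inv_key, res_key], args=[ttl, res_val])
    return int(result) >= 0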

Figure: Virtual waiting room queue design showing HTTP admission control, queue position assignment, and fixed-rate worker pool draining the queue with first-come-first-served ordering

Real World

Amazon uses virtual queues (waiting rooms) for high-demand product launches. Ticketmaster’s “Verified Fan” presale system uses a queue with randomized ordering to prevent bot advantage, then drains through a fixed worker pool. Both systems use reservation TTLs: if you hold a concert ticket in your cart for more than 10 minutes without completing purchase, it is released back to the pool. The TTL is the safety valve that prevents inventory from being permanently locked by abandoned carts.
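
Redis removes an expired key but does not increment the inventory counter when it does, so returning a unit needs an explicit release step. The sketch below models that step in memory with an injectable clock. `ReservationBook`, `reserve`, `convert`, and `sweep_expired` are hypothetical names, and a production version would perform the release against Redis and the reservations table rather than a Python dict.

```python
import time


class ReservationBook:
    """In-memory model of soft reservations with a TTL (illustrative only)."""

    def __init__(self, stock, ttl_seconds=90, clock=time.monotonic):
        self.available = stock
        self._ttl = ttl_seconds
        self._clock = clock
        self._holds = {}            # user_id -> expiry time

    def reserve(self, user_id):
        if user_id in self._holds:
            return True             # idempotent: an existing hold is not decremented twice
        if self.available <= 0:
            return False            # sold out
        self.available -= 1
        self._holds[user_id] = self._clock() + self._ttl
        return True

    def convert(self, user_id):
        # Checkout completed: the hold becomes an order and the unit stays sold.
        return self._holds.pop(user_id, None) is not None

    def sweep_expired(self):
        # Return units whose holds lapsed without a completed checkout.
        now = self._clock()
        expired = [u for u, deadline in self._holds.items() if deadline <= now]
        for user_id in expired:
            del self._holds[user_id]
            self.available += 1
        return expired
```

Running `sweep_expired` on a short interval bounds how long an abandoned cart can keep a unit out of circulation to roughly the TTL plus one sweep period.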

Queue-Based Checkout

The queue-based checkout pattern converts the spike of 500K simultaneous HTTP requests into a bounded stream of checkout operations. The Queue Gateway accepts every incoming request immediately (returning a queue position token), then a pool of checkout worker processes drains the queue at a controlled rate - typically matching the throughput ceiling of the inventory and order systems.

The queue is implemented with Kafka or SQS. Each checkout attempt becomes a message. Consumer workers pull from the queue, attempt the full checkout flow (reserve inventory, create order, trigger payment), and ack the message only on success. On failure, the message is retried with backoff up to a configurable maximum.
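
The retry policy in the paragraph above can be sketched as a small helper. This is an illustration under assumptions: `process_with_retry`, `max_attempts`, `base_delay`, and the `on_give_up` callback are hypothetical names, and the sleep function is injectable so the helper can be exercised without waiting.

```python
import time


def process_with_retry(handler, message, max_attempts=5, base_delay=0.1,
                       on_give_up=None, sleep=time.sleep):
    """Run handler(message), retrying failures with exponential backoff.

    Returns True if the handler succeeded (safe to acknowledge the message),
    False if every attempt failed and the message was handed to on_give_up.
    """
    for attempt in range(max_attempts):
        try:
            handler(message)
            return True
        except Exception:
            if attempt == max_attempts - 1:
                break                             # out of attempts
            sleep(base_delay * (2 ** attempt))    # 0.1s, 0.2s, 0.4s, ...
    if on_give_up is not None:
        on_give_up(message)                       # e.g. publish to a dead-letter queue
    return False
```

Capping the attempts matters as much as the backoff: a message that can never succeed would otherwise block everything queued behind it.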

Figure: Data flow diagram showing user clicking Buy, entering the checkout queue, worker picking up the message, checking Redis inventory, atomically reserving, creating an idempotent order, triggering payment, and returning confirmation

# Kafka consumer worker: processes checkout queue messages
from confluent_kafka import Consumer, KafkaError, TopicPartition
import json
import logging

log = logging.getLogger(__name__)

KAFKA_CONFIG = {
    'bootstrap.servers': 'kafka-1:9092,kafka-2:9092,kafka-3:9092',
    'group.id': 'checkout-workers',
    'auto.offset.reset': 'earliest',
    'enable.auto.commit': False,          # manual commit after processing
    'max.poll.interval.ms': 30000,
    'session.timeout.ms': 10000,
}

def notify_user(user_id, status, **details):
    # Placeholder: the notification channel is not specified in this design
    log.info("notify user=%s status=%s details=%s", user_id, status, details)

def run_checkout_worker(inventory_svc, order_svc, payment_svc):
    consumer = Consumer(KAFKA_CONFIG)
    consumer.subscribe(['checkout-queue'])

    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            if msg.error().code() == KafkaError._PARTITION_EOF:
                continue
            log.error("Kafka error: %s", msg.error())
            continue

        payload = json.loads(msg.value().decode('utf-8'))
        user_id = payload['user_id']
        product_id = payload['product_id']
        idempotency_key = payload['idempotency_key']

        try:
            # Step 1: Reserve inventory (atomic Lua script)
            reserved = inventory_svc.reserve(product_id, user_id, ttl=90)
            if not reserved:
                notify_user(user_id, status='sold_out')
                consumer.commit(message=msg, asynchronous=False)
                continue

            # Step 2: Create order (idempotent); returns a dict with 'order_id'
            order = order_svc.create_order(
                user_id=user_id,
                product_id=product_id,
                idempotency_key=idempotency_key,
            )

            # Step 3: Trigger async payment
            payment_svc.enqueue_payment(order_id=order['order_id'], user_id=user_id)

            notify_user(user_id, status='confirmed', order_id=order['order_id'])
            consumer.commit(message=msg, asynchronous=False)

        except Exception as e:
            log.error("Checkout failed for user %s: %s", user_id, e)
            # Kafka does not redeliver an uncommitted message on its own: the next
            # poll() returns the following offset, and committing that offset would
            # silently skip this message. Rewind the partition so it is read again.
            # Production code should cap retries and route to a dead-letter topic.
            # The reservation TTL bounds how long a failed attempt can hold a unit.
            consumer.seek(TopicPartition(msg.topic(), msg.partition(), msg.offset()))
Key Insight

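The fairness guarantee of queue-based checkout comes from Kafka partition ordering: messages within a partition are consumed in the order they were produced. That guarantee is per partition only. If checkout requests for one product are spread across many partitions, workers on different partitions progress independently and first-come-first-served holds only approximately. For strict ordering, key all messages for a product to a single partition (for example by product_id) and carry the sequential token assigned at the Queue Gateway so ordering can be verified; the trade-off is that one partition limits that product to one consumer at a time. Either way, ordering is enforced structurally at the queue layer rather than by application logic.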

Idempotent Order Creation

When a checkout worker creates an order, it may fail mid-way: the database write succeeds, but the network drops before the worker gets the acknowledgment. The worker retries, potentially creating a duplicate order. The user ends up paying twice for one item - a catastrophic UX and accounting failure.

Idempotent order creation prevents this by attaching a unique idempotency_key to every checkout attempt. The key is generated by the Queue Gateway when the user submits their checkout - it is stable across retries for the same checkout attempt. The Order Service enforces uniqueness on this key at the database layer, turning any retry into a no-op that returns the existing order.

-- Orders table with idempotency key enforcement
CREATE TABLE orders (
    order_id        UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    idempotency_key TEXT NOT NULL,
    user_id         TEXT NOT NULL,
    product_id      TEXT NOT NULL,
    status          TEXT NOT NULL DEFAULT 'pending'
                    CHECK (status IN ('pending', 'confirmed', 'cancelled', 'fulfilled')),
    quantity        INTEGER NOT NULL DEFAULT 1,
    unit_price_paise INTEGER NOT NULL,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),

    -- Unique constraint enforces idempotency at the DB layer
    CONSTRAINT uq_orders_idempotency UNIQUE (idempotency_key),

    -- One order per user per product, in any status (a cancelled order also
    -- blocks a new one); deferred, so a violation surfaces at COMMIT
    CONSTRAINT uq_user_product_active UNIQUE (user_id, product_id)
        DEFERRABLE INITIALLY DEFERRED
);

CREATE INDEX idx_orders_user ON orders (user_id, created_at DESC);
CREATE INDEX idx_orders_product ON orders (product_id, status);
CREATE INDEX idx_orders_status ON orders (status, created_at) WHERE status = 'pending';
# Idempotent order creation using INSERT ... ON CONFLICT DO NOTHING
def create_order(db_conn, user_id: str, product_id: str,
                 idempotency_key: str, unit_price_paise: int) -> dict:
    """
    Creates an order or returns the existing one if idempotency_key already used.
    Never raises on duplicate - always returns the canonical order.
    """
    result = db_conn.execute("""
        INSERT INTO orders (idempotency_key, user_id, product_id, unit_price_paise)
        VALUES (%s, %s, %s, %s)
        ON CONFLICT (idempotency_key) DO NOTHING
        RETURNING order_id, status, created_at
    """, (idempotency_key, user_id, product_id, unit_price_paise))

    if result.rowcount == 0:
        # ON CONFLICT path: fetch the existing order
        existing = db_conn.execute("""
            SELECT order_id, status, created_at
            FROM orders WHERE idempotency_key = %s
        """, (idempotency_key,)).fetchone()
        return {'order_id': existing[0], 'status': existing[1], 'duplicate': True}

    row = result.fetchone()
    return {'order_id': row[0], 'status': row[1], 'duplicate': False}
Watch Out

The idempotency key must be generated before the checkout attempt enters the queue, not inside the worker. If the worker generates the key, a retry (new worker invocation) generates a new key and the duplicate protection fails. Generate the key at the Queue Gateway when the user submits checkout, embed it in the queue message, and keep it stable for the lifetime of that checkout attempt.
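
One way to keep the key stable across retries is to derive it deterministically from the identity of the checkout attempt rather than generating a random value. The sketch below assumes one purchase per user per sale (as stated in the constraints) and uses a name-based UUID. The namespace string and the function name `idempotency_key_for` are illustrative choices, not part of the original design.

```python
import uuid

# Arbitrary, fixed namespace for this service (an assumption; any constant UUID works).
CHECKOUT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "checkout.flash-sale.example")


def idempotency_key_for(sale_id: str, product_id: str, user_id: str) -> str:
    """Same inputs always yield the same key, so a retry reuses the original key."""
    # Length-prefix each field so ("ab", "c") and ("a", "bc") cannot collide.
    name = "|".join(f"{len(part)}:{part}" for part in (sale_id, product_id, user_id))
    return str(uuid.uuid5(CHECKOUT_NAMESPACE, name))
```

A derived key also survives a Queue Gateway restart, since nothing has to be remembered to reproduce it. If a user may legitimately start a second, distinct attempt in the same sale, an attempt counter would need to be part of the input.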

Oversell Reconciliation

Even with atomic Redis counters and idempotent order creation, oversells can happen in edge cases: Redis crashes and is restored from a backup that predates several DECRs; a client times out waiting for a Lua script reply and cannot tell whether the decrement was applied; a failover promotes a replica that has not yet received the latest DECRs, because Redis replication is asynchronous.

Oversell reconciliation is the safety net that detects and corrects these anomalies after the fact. A periodic reconciliation job compares the number of confirmed orders against the inventory ceiling for each product and flags any product where orders exceed inventory.

-- Detect oversells: products where confirmed orders exceed inventory
SELECT
    o.product_id,
    i.quantity AS inventory_ceiling,
    COUNT(o.order_id) AS confirmed_order_count,
    COUNT(o.order_id) - i.quantity AS oversell_count
FROM orders o
JOIN inventory i ON i.product_id = o.product_id
WHERE o.status IN ('confirmed', 'fulfilled')
  AND o.created_at >= i.sale_start_at
GROUP BY o.product_id, i.quantity
HAVING COUNT(o.order_id) > i.quantity
ORDER BY oversell_count DESC;

When an oversell is detected, the system must decide between two compensation strategies:

  • Cancellation: Cancel the most recent excess orders (last-in, first-cancelled), notify users with a full refund. Simpler operationally but causes user trust damage.
  • Fulfillment-first: Source additional inventory from a buffer pool or a secondary supplier to fulfill all confirmed orders. Requires pre-arrangement but maintains user trust.
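
The cancellation strategy (last-in, first-cancelled) reduces to sorting confirmed orders by creation time and cancelling everything past the ceiling. A minimal sketch, assuming each order is a dict with `order_id` and `created_at` fields; `pick_orders_to_cancel` is a hypothetical helper and the sample data is made up.

```python
from datetime import datetime, timedelta, timezone


def pick_orders_to_cancel(confirmed_orders, ceiling):
    """Return the order_ids to cancel so that at most `ceiling` orders remain.

    The earliest orders are kept. Ties on created_at are broken by order_id
    so the result is deterministic.
    """
    ranked = sorted(confirmed_orders, key=lambda o: (o["created_at"], o["order_id"]))
    excess = ranked[ceiling:]
    # Most recent first, matching last-in, first-cancelled.
    return [o["order_id"] for o in reversed(excess)]


start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
orders = [
    {"order_id": f"o{i}", "created_at": start + timedelta(milliseconds=i)}
    for i in range(5)
]
to_cancel = pick_orders_to_cancel(orders, ceiling=3)   # ["o4", "o3"]
```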

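The reconciliation job also resets the Redis counter to match the actual remaining inventory, healing any counter drift. Remaining inventory is the ceiling minus every unit that is already spoken for: confirmed and fulfilled orders, pending orders, and active reservations that have not yet expired. Subtracting only confirmed orders would hand held units back to the counter and cause the very oversell the job is meant to catch. The query treats inventory.quantity as the fixed ceiling for the sale.

# Reconciliation job: detect oversells and reset Redis counter
def reconcile_inventory(product_id: str, db_conn, redis_client):
    row = db_conn.execute("""
        SELECT i.quantity,
               COUNT(o.order_id) FILTER (WHERE o.status IN ('confirmed', 'fulfilled'))
                   AS confirmed_count,
               COUNT(o.order_id) FILTER (WHERE o.status = 'pending') AS pending_count
        FROM inventory i
        LEFT JOIN orders o
            ON o.product_id = i.product_id
            AND o.created_at >= i.sale_start_at
        WHERE i.product_id = %s
        GROUP BY i.quantity
    """, (product_id,)).fetchone()
    if row is None:
        raise ValueError(f"No inventory row for {product_id}")
    ceiling, confirmed, pending = row

    # Units held by live reservations that have not yet become orders
    held = db_conn.execute("""
        SELECT COUNT(*) FROM reservations
        WHERE product_id = %s AND status = 'active' AND expires_at > NOW()
    """, (product_id,)).fetchone()[0]

    remaining = max(0, ceiling - confirmed - pending - held)
    redis_key = f"inventory:product:{product_id}"

    # Reset Redis counter to the DB-derived value, keeping the key's existing TTL
    redis_client.set(redis_key, remaining, keepttl=True)
    return {'oversell': confirmed > ceiling, 'excess': max(0, confirmed - ceiling)}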
Real World

Shopify runs reconciliation jobs after every major flash sale to verify that Redis-backed inventory counters match the order ledger. When discrepancies are found, Shopify’s default strategy is to honor all confirmed orders and absorb the overstock cost, which is typically less than the reputational damage of cancelling orders. For very large oversells (more than 5% above ceiling), a manual review process is triggered before any cancellation decisions are made.

Data Model

The data model has three core tables: inventory (the ceiling and state for each sale), reservations (soft holds with TTL, mirroring the Redis reservation keys), and orders (the durable ledger of confirmed purchases).

-- Inventory table: one row per product per sale event
CREATE TABLE inventory (
    product_id      TEXT NOT NULL,
    sale_id         TEXT NOT NULL,
    quantity        INTEGER NOT NULL CHECK (quantity >= 0),
    version         BIGINT NOT NULL DEFAULT 1,
    sale_start_at   TIMESTAMPTZ NOT NULL,
    sale_end_at     TIMESTAMPTZ,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (product_id, sale_id)
);

-- Sharding key: product_id routes all inventory reads/writes to one shard,
-- avoiding cross-shard joins during checkout.
CREATE INDEX idx_inventory_sale ON inventory (sale_id, sale_start_at);

-- Reservations table: soft holds during checkout (TTL enforced by app layer)
CREATE TABLE reservations (
    reservation_id  UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id         TEXT NOT NULL,
    product_id      TEXT NOT NULL,
    sale_id         TEXT NOT NULL,
    status          TEXT NOT NULL DEFAULT 'active'
                    CHECK (status IN ('active', 'converted', 'expired', 'released')),
    expires_at      TIMESTAMPTZ NOT NULL,
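    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- One active reservation per user per product per sale.
-- PostgreSQL does not accept WHERE on a table constraint, so this is a partial unique index.
CREATE UNIQUE INDEX uq_reservation_user_product
    ON reservations (user_id, product_id, sale_id)
    WHERE status = 'active';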

CREATE INDEX idx_reservations_expiry ON reservations (expires_at)
    WHERE status = 'active';
CREATE INDEX idx_reservations_user ON reservations (user_id, created_at DESC);

-- Orders table (shown in full in Idempotent Order Creation section above)
-- Sharding key: user_id for orders (user queries by their own orders)
-- product_id index for inventory reconciliation queries

The sharding strategy separates concerns: inventory shards by product_id to keep all inventory operations for one product on one shard, avoiding distributed transactions. orders shards by user_id to keep “my orders” queries fast, with a secondary index on product_id for the reconciliation job. The reservations table is small enough to fit on a single shard per sale event.
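
A sketch of the routing rule implied by this strategy: inventory operations hash on `product_id`, order operations hash on `user_id`. The shard counts and helper names below are assumptions for illustration. A stable hash (here SHA-256) is used because Python's built-in `hash()` for strings varies between processes.

```python
import hashlib

INVENTORY_SHARDS = 4   # assumed shard counts, for illustration only
ORDER_SHARDS = 16


def _stable_bucket(key: str, buckets: int) -> int:
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def inventory_shard(product_id: str) -> int:
    # Every read and write for one product lands on the same shard.
    return _stable_bucket(product_id, INVENTORY_SHARDS)


def order_shard(user_id: str) -> int:
    # All of a user's orders live together, so "my orders" is a single-shard query.
    return _stable_bucket(user_id, ORDER_SHARDS)
```

Plain modulo routing reshuffles most keys when the shard count changes; a deployment that expects to add shards would more likely use consistent hashing or a lookup table.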

Key Algorithms and Protocols

Atomic DECR-Check Pattern (Redis Lua)

The core inventory operation must be atomic: check that inventory is greater than zero AND decrement, with no window for another process to slip in between. The Lua approach shown earlier achieves this. The Go equivalent is:

// Go: atomic inventory decrement via Redis Lua
package inventory

import (
    "context"
    "errors"
    "github.com/redis/go-redis/v9"
)

var ErrSoldOut = errors.New("inventory: sold out")
var ErrNotInit = errors.New("inventory: sale not initialized")

var atomicDecr = redis.NewScript(`
    local current = tonumber(redis.call('GET', KEYS[1]))
    if current == nil then return -2 end
    if current <= 0 then return -1 end
    return redis.call('DECR', KEYS[1])
`)

func AtomicDecrement(ctx context.Context, rdb *redis.Client, productID string) (int64, error) {
    key := "inventory:product:" + productID
    result, err := atomicDecr.Run(ctx, rdb, []string{key}).Int64()
    if err != nil {
        return 0, err
    }
    switch result {
    case -2:
        return 0, ErrNotInit
    case -1:
        return 0, ErrSoldOut
    default:
        return result, nil
    }
}
Key Insight

Redis Lua scripts execute atomically on the Redis server: no other command can interleave during script execution. This is stronger than a compare-and-swap because it fuses the read and write into one uninterruptible operation. The cost is that every other client waits while the script runs, because Redis executes the script to completion before handling the next command in the queue. Keep scripts short (a few commands, no loops over large data) so that wait stays negligible.

Two-Phase Reservation (Reserve Then Commit)

The reservation protocol has two phases. Phase 1 (reserve): atomically decrement the Redis counter and write a Redis reservation key with TTL. This is the fast path and typically completes in a few milliseconds. Phase 2 (commit): after order creation succeeds in the database, mark the reservation as “converted” and write the final order record. If Phase 2 fails, Phase 1 has to be reversed: the unit is returned to the counter either by an explicit release (as in the code below) or by an expiry sweep, since the reservation key expiring does not by itself increment the counter.

# Two-phase reservation with automatic rollback via TTL
import time
from dataclasses import dataclass
from enum import Enum

class ReservationStatus(Enum):
    ACTIVE = "active"
    CONVERTED = "converted"
    EXPIRED = "expired"

@dataclass
class Reservation:
    user_id: str
    product_id: str
    expires_at: float
    status: ReservationStatus = ReservationStatus.ACTIVE

def two_phase_checkout(user_id, product_id, idempotency_key,
                       price_paise, inventory_svc, order_svc, db_conn):
    # Phase 1: Reserve (fast, atomic, in-memory)
    reserved = inventory_svc.reserve(product_id, user_id, ttl=90)
    if not reserved:
        return {'status': 'sold_out'}

    try:
        # Phase 2: Commit (durable, idempotent)
        order = order_svc.create_order(
            db_conn, user_id, product_id, idempotency_key, price_paise
        )
        inventory_svc.mark_reservation_converted(product_id, user_id)
        return {'status': 'confirmed', 'order_id': order['order_id']}

    except Exception as e:
        # Phase 2 failed: reservation will auto-expire via TTL
        # Explicitly release early to return inventory faster
        inventory_svc.release_reservation(product_id, user_id)
        raise

Optimistic Locking with a Version Counter

For multi-product flash sales where the inventory table is updated by a reconciliation job, optimistic locking with a version counter prevents stale reads from overwriting fresh data.

# Optimistic locking with retry on version conflict
import time

MAX_RETRIES = 3
RETRY_BASE_DELAY = 0.05  # 50ms

def update_inventory_with_retry(db_conn, product_id, delta, max_retries=MAX_RETRIES):
    for attempt in range(max_retries):
        row = db_conn.execute("""
            SELECT quantity, version FROM inventory
            WHERE product_id = %s FOR UPDATE SKIP LOCKED
        """, (product_id,)).fetchone()

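        if row is None:
            # SKIP LOCKED returns no row when another transaction holds the lock
            # (or when the row does not exist): back off and try again
            time.sleep(RETRY_BASE_DELAY * (2 ** attempt))
            continue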

        current_qty, version = row
        new_qty = current_qty + delta

        if new_qty < 0:
            raise ValueError("Insufficient inventory")

        updated = db_conn.execute("""
            UPDATE inventory
            SET quantity = %s, version = version + 1, updated_at = NOW()
            WHERE product_id = %s AND version = %s
        """, (new_qty, product_id, version))

        if updated.rowcount == 1:
            return new_qty  # success

        # Optimistic lock conflict: back off and retry
        time.sleep(RETRY_BASE_DELAY * (2 ** attempt))

    raise RuntimeError(f"Failed to update inventory after {max_retries} retries")
Key Insight

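The FOR UPDATE SKIP LOCKED clause is what allows multiple reconciliation workers to run in parallel without blocking each other. A worker that finds the row locked by another worker gets an empty result immediately instead of waiting, so it can move on to a different product and come back later. This is the same pattern commonly used to build job queues on PostgreSQL, and it is far more efficient than plain FOR UPDATE (which blocks on locked rows) or no locking (which risks lost updates). Note that FOR UPDATE is itself a pessimistic row lock; the version check in the UPDATE is a second, optimistic guard against writers that do not take the lock.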

Scaling and Performance

Figure: Scaling diagram showing multiple checkout worker pods consuming from a partitioned Kafka queue, Redis cluster sharding inventory counters per product, and read replica order databases

Back-of-envelope calculations:

Given 500K concurrent users and 10K inventory units:

  • Queue depth at peak: In the worst case all 500,000 messages arrive within the first second. The queue has to accept them without back-pressure; a 3-broker Kafka cluster can typically absorb this because the messages are small and producer writes are batched.
  • Worker throughput: Each worker processes 1 checkout in ~50ms (5ms Redis + 10ms DB write + 35ms overhead). At 100 workers, throughput is 2,000 checkouts/second.
  • Time to sell out: 10,000 units at 2,000/second = 5 seconds. The first 10K messages in the queue get inventory; the remaining 490K get “sold out” responses.
  • Time to drain queue: 490K “sold out” responses at 2,000/second = 245 seconds. To drain faster, scale workers to 1,000 - then 490K/20,000 = 24.5 seconds to clear the queue.
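  • Redis DECR throughput: A single Redis instance handles on the order of 100,000 simple commands/second. Because workers drain the queue at 2,000 to 20,000 checkouts/second, one primary has ample headroom, even measured against the 50K requests/second arrival rate. Adding shards would not help a single product anyway, since its counter is one key on one shard.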
  • Redis Cluster sharding: Shard by product_id. For a single product flash sale, one Redis primary handles all DECRs. For multi-product sales, distribute products across shards.
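  • Database write throughput: Order inserts at 2,000/second need a write-tuned Postgres setup (batched inserts, fast WAL storage) or a horizontally scalable store such as CockroachDB. Setting synchronous_commit=off raises throughput but can lose the most recent commits in a crash, which conflicts with the durable-confirmation requirement; it is also a per-session or per-transaction setting, not a per-table one.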
  • Memory for 500K queue entries: Each Kafka message is ~500 bytes (user_id, product_id, idempotency_key, timestamp). 500K messages = 250 MB. Trivial for Kafka.
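
The arithmetic above is easy to re-run with different assumptions. The sketch below reproduces the figures from the list; the function name and parameter names are illustrative.

```python
def flash_sale_estimates(buyers=500_000, stock=10_000, workers=100,
                         checkout_ms=50, message_bytes=500):
    per_worker = 1000 / checkout_ms                 # checkouts per second per worker
    throughput = workers * per_worker               # checkouts per second overall
    return {
        "throughput_per_s": throughput,
        "sellout_s": stock / throughput,
        "drain_s": (buyers - stock) / throughput,   # time to answer everyone else
        "queue_mb": buyers * message_bytes / 1_000_000,
    }


base = flash_sale_estimates()
scaled = flash_sale_estimates(workers=1000)
```

With the defaults this gives 2,000 checkouts/second, a 5-second sell-out, a 245-second drain, and 250 MB of queued messages; raising `workers` to 1,000 cuts the drain to 24.5 seconds.
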
Real World

Shopify’s infrastructure team documented handling flash sales with over 40,000 checkouts per minute for major brands like Supreme. Their approach uses a queue-based checkout system with Redis atomic counters and Kafka for durability. During the 2022 Adidas Yeezy restocks, Shopify processed tens of thousands of concurrent checkout attempts by using virtual waiting rooms that throttled the checkout rate to match backend capacity, returning queue position tokens to users rather than letting all requests hit the inventory system simultaneously.

Failure Modes and Recovery

Failure | Detection | Impact | Recovery
--- | --- | --- | ---
Redis primary crash | Health check fails, Redis Sentinel promotes replica | Counter lost or stale; potential oversell window | Reconciliation job resets counter from DB; reservation TTLs prevent permanent lock
Queue overflow (Kafka partition full) | Consumer lag metric spikes | New checkout requests rejected at Queue Gateway | Scale Kafka brokers; Queue Gateway returns “try again” with backoff
DB primary failure | Connection errors on order writes | Order creation fails; workers retry with idempotency | Postgres failover to replica (60-120s); workers retry until DB available
Payment gateway timeout | Payment request times out after 30s | Order confirmed but payment not captured | Async retry job retries payment capture up to 3 times; cancel order after 3 failures
Worker pod crash mid-checkout | Kafka message not committed; message redelivered | Duplicate checkout attempt on redeliver | Idempotency key prevents duplicate order; Redis reservation check prevents double-DECR
Clock skew on reservation TTL | Reservation expires prematurely | User loses reservation before completing checkout | Use Redis TTL (server clock) not client clock for all TTL decisions
Watch Out

The most dangerous failure mode is Redis recovering from a stale snapshot after a crash. If Redis restores from a backup that predates 1,000 DECRs, the counter is now 1,000 higher than reality. Every subsequent checkout will succeed against this inflated counter until the next reconciliation run. To minimize the window, run the reconciliation job on Redis startup as part of the recovery procedure, and configure Redis to use AOF (append-only file) persistence so the counter can be replayed from the log rather than restored from a point-in-time snapshot.

Comparison of Approaches

| Approach | Throughput | Oversell Risk | Complexity | Best Fit |
| --- | --- | --- | --- | --- |
| Optimistic DB locking (version check on UPDATE) | Low - conflicting updates fail and retry at 500K concurrent | Low if implemented correctly | Medium | Low-traffic sales where DB is not the bottleneck |
| Pessimistic DB locking (SELECT FOR UPDATE row lock) | Very low - serializes all checkouts at DB | Very low | Low | Testing and admin tools only; not production flash sales |
| Redis atomic counter (this design) | Very high - 100K ops/sec per Redis node | Near-zero with Lua script + reconciliation | Medium-High | Flash sales with burst traffic and tight inventory |
| Queue-based serialization (this design) | High and controllable - tuned by worker count | Very low - sequential processing | High | Any flash sale where fairness and correctness are required |
| Distributed lock (Redlock across 5 nodes) | Medium - lock acquisition adds 5-15ms | Very low - strict mutual exclusion | High | Multi-step operations that need strong consistency |

The Redis atomic counter and queue-based serialization are complementary, not competing. The Redis counter is the fast gate that answers “is there inventory left?” in under 1ms. The queue is the fairness and durability layer that ensures the answer is processed correctly. Together they form the core of a production flash sale system.
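The gate's contract can be illustrated with an in-process model. This is not Redis and not the Lua script itself: a `threading.Lock` stands in for Redis executing one command at a time, and `InventoryGate` and `run_sale` are hypothetical names. The point is only that when the check and the decrement happen as one step, concurrent buyers cannot push the counter below zero.

```python
import threading


class InventoryGate:
    """In-process model of an atomic check-and-decrement counter."""

    def __init__(self, units: int) -> None:
        self._remaining = units
        self._lock = threading.Lock()  # stands in for Redis' serialized execution

    def try_reserve(self) -> bool:
        """Decrement only if stock remains; check and write happen as one step."""
        with self._lock:
            if self._remaining <= 0:
                return False
            self._remaining -= 1
            return True


def run_sale(units: int, buyers: int) -> int:
    """Fire `buyers` concurrent attempts and return how many succeeded."""
    gate = InventoryGate(units)
    results: list[bool] = []
    results_lock = threading.Lock()

    def attempt() -> None:
        ok = gate.try_reserve()
        with results_lock:
            results.append(ok)

    threads = [threading.Thread(target=attempt) for _ in range(buyers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sum(results)


print(run_sale(units=10, buyers=200))  # 10
```

If the check and the decrement were two separate calls, two buyers could both observe one remaining unit and both decrement, which is the oversell the gate exists to prevent.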

Key Takeaways

  • Cache-backed inventory counter is the fast gate - Redis DECR via Lua script is atomic, sub-millisecond, and handles 100K ops/second per node. It is the only realistic mechanism for inventory checks under 500K concurrent load.
  • Virtual waiting room converts a spike into a stream - without a queue-based checkout, 500K simultaneous requests create thundering herd on every downstream system. The queue absorbs the burst so workers see steady-state load.
  • Inventory reservation with TTL is the safety net - soft reserves prevent inventory from being permanently locked by abandoned carts. The TTL automatically releases unclaimed units back to the pool.
  • Idempotency keys are non-negotiable - every checkout attempt must carry a stable idempotency key generated before the request enters the queue. Without it, Kafka redelivery creates duplicate orders and double charges.
  • Optimistic locking protects the database layer - version-based conflict detection on the inventory row prevents reconciliation writes from overwriting each other when multiple jobs run in parallel.
  • Distributed locks via Lua scripts are stronger than application-level locks - they execute atomically inside Redis with no window for concurrent modification, and they self-expire via TTL if the holding process crashes.
  • Oversell reconciliation is the integrity backstop - even with atomic counters, edge cases (Redis restart, network partition) can cause counter drift. A periodic job comparing order count to inventory ceiling detects and heals all anomalies.
  • Fairness requires queue position assignment at admission - to guarantee first-come-first-served, queue position must be assigned when the request enters the Queue Gateway, not when it is processed by a worker. Kafka partition ordering preserves this sequence through to processing.

Frequently Asked Questions

Q: Why not just use a database transaction with SELECT FOR UPDATE for inventory instead of Redis?

A: At 50,000 checkouts per second, a SELECT FOR UPDATE on the inventory row creates a global bottleneck: every checkout serializes through one database row lock. Postgres can handle roughly 5,000-10,000 row-level lock acquisitions per second under ideal conditions, so the sale demands 5-10x more than the database can deliver, and because every request contends for the same row, adding database capacity does not remove the queue. Redis atomic operations sidestep this by executing in memory on a single thread - no lock manager, no transaction overhead, just sequential command processing.

Q: Why not use a distributed lock (Redlock) for the entire checkout flow?

A: Redlock acquires a lock across 5 Redis nodes, which takes 5-15ms in the best case. At 50K checkouts/second with a 15ms lock window, you’d need 750 concurrent lock holders - which defeats the serialization goal. Redlock is appropriate for protecting a critical section that runs in under 1ms. For anything longer, a queue-based approach where messages are processed sequentially provides the same mutual exclusion without the lock timeout overhead.
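The 750 figure follows from Little's law (average number in flight = arrival rate x time each one spends holding the lock). A small sketch of that arithmetic, with hypothetical function names, also shows the ceiling a single strictly exclusive lock would impose:

```python
def concurrent_holders(arrival_rate_per_s: int, hold_time_ms: int) -> float:
    """Little's law: requests in flight = arrival rate x time each one is held."""
    return arrival_rate_per_s * hold_time_ms / 1000


def max_serial_throughput(hold_time_ms: int) -> float:
    """Upper bound on checkouts/second if only one holder may run at a time."""
    return 1000 / hold_time_ms


print(concurrent_holders(50_000, 15))          # 750.0
print(round(max_serial_throughput(15), 1))     # 66.7
```

With a 15ms hold time, one exclusive lock admits at most about 67 checkouts per second, which is why sustaining 50K per second would require hundreds of simultaneous holders and therefore no real mutual exclusion.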

Q: How do you handle the case where a user’s payment fails after their order is confirmed?

A: Order confirmation and payment capture are decoupled. Confirmation means “you have a reservation and an order record.” Payment capture is async. If payment fails after 3 retries (configurable), the order transitions to “cancelled,” the reservation is released, and the inventory unit is returned to the Redis counter via an explicit INCR. This is a compensating transaction: it reverses the DECR that happened during reservation.
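The retry-then-compensate flow can be sketched as a small state transition. Everything here is illustrative: `InventoryCounter` is an in-memory stand-in for the Redis counter (its `release()` plays the role of INCR), and `settle_payment` and the `capture` callback are hypothetical names, not a payment gateway API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class OrderStatus(Enum):
    CONFIRMED = "confirmed"
    PAID = "paid"
    CANCELLED = "cancelled"


@dataclass
class Order:
    order_id: str
    status: OrderStatus = OrderStatus.CONFIRMED


class InventoryCounter:
    """In-memory stand-in for the Redis counter; release() plays the role of INCR."""

    def __init__(self, remaining: int) -> None:
        self.remaining = remaining

    def release(self) -> None:
        self.remaining += 1


def settle_payment(
    order: Order,
    counter: InventoryCounter,
    capture: Callable[[str], bool],
    max_attempts: int = 3,
) -> Order:
    """Try to capture payment; after repeated failure, cancel and return the unit."""
    if order.status is not OrderStatus.CONFIRMED:
        return order  # already settled, so never compensate twice
    for _ in range(max_attempts):
        if capture(order.order_id):
            order.status = OrderStatus.PAID
            return order
    order.status = OrderStatus.CANCELLED
    counter.release()  # compensating step: reverses the earlier DECR
    return order


counter = InventoryCounter(remaining=0)
failed = settle_payment(Order("order-1"), counter, capture=lambda _order_id: False)
print(failed.status.value, counter.remaining)  # cancelled 1
```

The status guard at the top matters as much as the INCR: if the retry job itself is redelivered, a second compensation would add a unit that does not exist.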

Q: Why use Kafka instead of SQS for the checkout queue?

A: Kafka preserves message ordering within a partition, which is critical for the fairness guarantee (first-come-first-served). SQS standard queues offer no ordering guarantee. SQS FIFO queues offer ordering, but without high-throughput mode they are limited to roughly 300 API calls per second, or 3,000 messages per second with batching - far below the 500K messages in the first second of a flash sale. A small Kafka cluster can sustain much higher write rates while keeping ordering per partition.

Q: What happens if the Queue Gateway is down when the sale starts?

A: The Queue Gateway is the admission control bottleneck, so it must be highly available. Run it as a horizontally scaled stateless service behind a load balancer, with multiple replicas across availability zones. The queue position counter (used for fairness ordering) is stored in Redis with replication. If the Queue Gateway loses Redis connectivity, it falls back to a simple timestamp-based ordering from the request arrival time. If the Queue Gateway itself goes down, new checkout requests queue at the load balancer until the service recovers.

Q: How do you prevent bots from holding all reservations without completing checkout?

A: Three layers of defense. First, per-user rate limiting enforced at the load balancer (one checkout attempt per user per sale). Second, CAPTCHA or browser fingerprinting at the Queue Gateway to filter automated traffic before it enters the queue. Third, short reservation TTLs (90 seconds) so any reservation not converted to an order within 90 seconds is automatically released - bots that hold reservations without completing payment lose them quickly.
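The third layer can be modeled in a few lines. In the design the expiry is done by Redis key TTLs; the sketch below is a pure-Python stand-in with an injectable clock so the 90-second behavior is easy to test. `ReservationStore` and its one-live-reservation-per-user rule are assumptions made for illustration.

```python
import time
from typing import Callable, Dict


class ReservationStore:
    """Tracks soft reservations that lapse after a fixed TTL."""

    def __init__(
        self,
        ttl_seconds: float = 90.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at: Dict[str, float] = {}

    def reserve(self, user_id: str) -> bool:
        """Grant a reservation unless the user already holds a live one."""
        self.release_expired()
        if user_id in self._expires_at:
            return False
        self._expires_at[user_id] = self._clock() + self._ttl
        return True

    def release_expired(self) -> int:
        """Drop lapsed reservations; the count is how many units return to the pool."""
        now = self._clock()
        expired = [user for user, deadline in self._expires_at.items() if deadline <= now]
        for user in expired:
            del self._expires_at[user]
        return len(expired)


now = [0.0]  # fake clock so the example does not need to sleep
store = ReservationStore(ttl_seconds=90.0, clock=lambda: now[0])
print(store.reserve("bot-1"))      # True
print(store.reserve("bot-1"))      # False
now[0] = 90.0
print(store.release_expired())     # 1
```

A bot that never pays therefore ties up a unit for at most one TTL window, after which the unit is available to the next buyer in the queue.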

Interview Questions

Q: Walk me through how you prevent overselling when 500K users click Buy simultaneously.

Expected depth: Describe the two-layer protection: first, the Redis Lua script that atomically checks-and-decrements the counter (preventing concurrent DECRs from going below zero); second, the queue-based checkout that serializes processing so only one worker handles each inventory unit at a time. Mention that the Redis counter is the fast gate but not the final source of truth - the database inventory row with optimistic locking is the ledger. Discuss the reconciliation job as the backstop that detects counter drift caused by Redis restarts or network partitions.

Q: What happens if your Redis inventory counter crashes mid-sale and restarts with a stale count?

Expected depth: Explain that a stale Redis restart creates an inflated counter (the Redis backup predates some DECRs, so the counter shows more inventory than actually remains). This creates an oversell window until the next reconciliation run. To minimize this: configure Redis AOF persistence (append-only log) so the counter is replayed from a complete command log rather than a point-in-time snapshot; run reconciliation immediately on Redis startup as part of the recovery procedure; alert on any discrepancy between Redis count and DB confirmed order count that exceeds a threshold.

Q: How would you ensure a user who clicks Buy at 12:00:00.001 gets priority over one who clicks at 12:00:00.500?

Expected depth: The Queue Gateway must assign a sequence number or timestamp at the moment the HTTP request arrives - not when it enters the Kafka queue or when a worker picks it up. The sequence number travels in the queue message. Kafka itself assigns offsets in append order, so the gateway must publish to each partition in sequence order for offset order to match arrival order. Kafka guarantees ordering within a partition but not across partitions, so messages with earlier sequence numbers are processed first only relative to others in the same partition. Workers must process messages in offset order (no parallel processing within a partition) to preserve the fairness guarantee. Discuss the tradeoff: strict ordering requires single-threaded partition consumption, which limits partition throughput. Adding partitions with smaller user segments per partition raises throughput, at the cost of guaranteeing order only within each segment.
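A rough sketch of admission-time sequencing, under stated assumptions: the counter is an in-process `itertools.count` (the design keeps it in Redis), and the partition is chosen by a CRC32 hash of the user ID, which is one possible way to form user segments rather than the article's specified scheme. `QueueGateway`, `Admission`, and `group_by_partition` are hypothetical names.

```python
import itertools
import zlib
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Admission:
    sequence: int   # assigned on arrival, before any queueing
    user_id: str
    partition: int


class QueueGateway:
    """Stamps each request with a sequence number at the moment it is admitted."""

    def __init__(self, partitions: int) -> None:
        self._partitions = partitions
        self._sequence = itertools.count(1)  # stand-in for a counter kept in Redis

    def admit(self, user_id: str) -> Admission:
        sequence = next(self._sequence)
        partition = zlib.crc32(user_id.encode("utf-8")) % self._partitions
        return Admission(sequence, user_id, partition)


def group_by_partition(admissions: List[Admission]) -> Dict[int, List[int]]:
    """Collect sequence numbers per partition in the order they were published."""
    grouped: Dict[int, List[int]] = defaultdict(list)
    for admission in admissions:
        grouped[admission.partition].append(admission.sequence)
    return dict(grouped)


gateway = QueueGateway(partitions=4)
admitted = [gateway.admit(f"user-{n}") for n in range(12)]
for partition, sequences in sorted(group_by_partition(admitted).items()):
    print(partition, sequences)
```

Each partition's list comes out in ascending sequence order, which is the property a single-threaded consumer per partition relies on. Nothing orders one partition relative to another, so global first-come-first-served holds only with a single partition.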

Q: How would you design the system to handle a multi-product flash sale where 50 products are released simultaneously?

Expected depth: Shard the Redis inventory counters by product_id - each product gets its own Redis key, potentially on a different Redis node in a cluster. Shard the Kafka checkout queue by product_id as well - each product gets its own set of partitions, with dedicated worker groups consuming each product’s queue. This prevents a “hot product” from blocking checkout for other products. The Order Service can handle all products on the same database (with product_id indexed), but the inventory and checkout layers are fully parallel. Discuss the capacity implications: 50 products at 500K users each would require 50x the Redis and Kafka capacity, which in practice is mitigated by users distributing their interest across products.

Q: Your idempotency key deduplication uses a UNIQUE constraint. What happens under high write load when many workers hit the same constraint?

Expected depth: The ON CONFLICT DO NOTHING pattern is efficient because the conflict check is a B-tree index lookup on the unique constraint - O(log n) per insert. Under a high insert rate, the B-tree becomes write-contended at the leaf nodes. A hash index is not an option here: PostgreSQL enforces uniqueness only through B-tree indexes, so a unique index declared USING hash is rejected. Mitigation: partition the orders table by created_at date so each index is smaller and less contended, keeping in mind that a unique constraint on a partitioned table must include the partition key, so idempotency_key uniqueness is then enforced per partition rather than globally. For extreme write rates (over 50K inserts/second), consider pre-sharding by hashing the idempotency_key and routing to different database shards, so the unique constraint is enforced per-shard; a given key always routes to the same shard, so a cross-shard check is only needed if the shard map changes mid-sale.
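The deduplication behavior can be tried locally with SQLite as a stand-in for Postgres. SQLite's `INSERT OR IGNORE` is used here in place of Postgres' `ON CONFLICT DO NOTHING`; the table layout and the `create_order_once` helper are assumptions for illustration, not the article's schema.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Orders table whose UNIQUE constraint is the deduplication mechanism."""
    conn.execute(
        """
        CREATE TABLE orders (
            order_id        INTEGER PRIMARY KEY,
            idempotency_key TEXT NOT NULL UNIQUE,
            user_id         TEXT NOT NULL
        )
        """
    )


def create_order_once(conn: sqlite3.Connection, idempotency_key: str, user_id: str) -> bool:
    """Insert the order; return False if this key was already recorded."""
    cursor = conn.execute(
        "INSERT OR IGNORE INTO orders (idempotency_key, user_id) VALUES (?, ?)",
        (idempotency_key, user_id),
    )
    conn.commit()
    return cursor.rowcount == 1  # 0 rows changed means the insert was ignored


conn = sqlite3.connect(":memory:")
create_schema(conn)
print(create_order_once(conn, "key-abc", "user-1"))  # True
print(create_order_once(conn, "key-abc", "user-1"))  # False (redelivered message)
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 1
```

The worker treats a `False` result as "already processed" and skips payment, which is what makes Kafka redelivery safe.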
