Build a Flash Sale System for 500K Concurrent Checkouts
scalability distributed-systems reliability
System Design Deep Dive
Flash Sale: 500K Concurrent Checkouts
Selling out 10,000 items in seconds without overselling a single unit across 500K simultaneous buyers
Think of a concert ticket office on opening day: one window, thousands of people queued around the block, and exactly 1,000 seats available. The clerk must move fast enough that nobody waits forever, accurate enough that seat 1001 is never sold, and fair enough that the person who arrived first gets served first. Oversell by even one ticket and you have a furious customer at the door with no seat. The problem is making sure you never sell seat 1001 - at any scale.
Flash sales are the digital equivalent. When a brand drops a limited-edition product at noon, 500,000 users all click “Buy Now” simultaneously. The inventory is 10,000 units. Every component in the system - the application servers, the cache, the database - gets a wall of traffic that arrives in the same second and drops off just as suddenly. Your system must absorb the spike without overselling a single unit, without a thundering herd collapsing your database, and without making users wait so long they give up.
The tension is three-way: speed (checkout must feel instant, or users abandon), correctness (no overselling - ever, even under partial failure), and fairness (the person who clicked first should get the item, not the person whose request happened to land on a lightly-loaded pod). These three goals pull against each other. The fastest approach - a simple database decrement - collapses under concurrent load. The most correct approach - serialized global locking - is too slow. The fairest approach - a sorted queue - adds latency. The design must find the point where all three are satisfied simultaneously.
Requirements and Constraints
Functional Requirements
- Handle burst traffic at the moment of sale start, with requests arriving at 50,000+ per second
- Enforce strict inventory limits: if 10,000 units are available, exactly 10,000 orders may be confirmed - no more
- Confirm every successfully placed order durably so it can be fulfilled, even if the confirming service crashes
- Return a clear “sold out” response to users who arrive after inventory is exhausted
- Allow users to retry a failed checkout without creating duplicate orders
Non-Functional Requirements
- 500,000 concurrent users at peak, with the first 10 seconds being the most intense
- Checkout response latency under 500ms at p99 during peak burst
- Zero oversells: inventory correctness is the hard constraint, not a soft target
- 99.99% availability for the checkout endpoint during the sale window
- Graceful degradation: when inventory is exhausted, the system returns fast “sold out” responses rather than queueing new requests
Constraints and Assumptions
- Inventory count is the absolute source of truth; the cache counter must be reconciled against it
- Payment processing is asynchronous: an order is confirmed before payment is captured, with fulfillment gated on payment success
- Users are authenticated before reaching the checkout flow; auth is not in scope
- A user may only purchase one unit per sale event (per-user rate limiting enforced upstream)
High-Level Architecture
The system has seven major layers. The Load Balancer absorbs the burst and enforces per-IP rate limits. The Queue Gateway (virtual waiting room) is the admission control layer that converts a simultaneous spike into a manageable stream. The Checkout Service drives the checkout state machine. The Inventory Service wraps the Redis atomic counter and the reservation table. The Order Service creates durable order records with idempotency guarantees. The Payment Service handles async payment capture. The Cache Layer (Redis) is the primary inventory counter, backed by the database as the source of truth.
The key architectural decision is the placement of the Queue Gateway before the Checkout Service. Without it, 500K concurrent requests hit the Checkout Service simultaneously, causing thundering herd on Redis and the database. With it, the Queue Gateway assigns each incoming request a queue position and holds the HTTP connection open (or returns a polling token), then allows checkout workers to drain the queue at a controlled rate - matching the throughput of the inventory and order systems.
The virtual waiting room is the architectural decision that makes everything else tractable. By serializing the burst at the edge, the downstream inventory and order systems see steady-state load rather than a tsunami. The queue absorbs the spike so that the atomic inventory counter and idempotent order creation can work correctly without being overwhelmed.
The Inventory Counter Design
The naive approach to inventory management is a database row with a count column and a WHERE count > 0 check before decrement. Under 500K concurrent requests, this creates a massive lock contention problem: every transaction races to read-modify-write the same row, throughput collapses, and the database primary becomes the bottleneck.
The correct approach is a cache-backed inventory counter using Redis’s DECR command, which is atomic at the single-key level. Redis processes commands in a single thread, so DECR inventory:product:42 is guaranteed to be serialized - no two decrements happen simultaneously, and the counter can never go below zero if we add a Lua check.
# Atomic inventory decrement with floor check using Redis Lua script
import redis
r = redis.Redis(host='redis-primary', port=6379, decode_responses=True)
# Load the atomic decrement Lua script
ATOMIC_DECR = r.register_script("""
local key = KEYS[1]
local current = tonumber(redis.call('GET', key))
if current == nil then
return -2 -- key does not exist (sale not initialized)
end
if current <= 0 then
return -1 -- sold out
end
return redis.call('DECR', key)
""")
def try_decrement_inventory(product_id: str) -> int:
"""
Returns:
>= 0: new inventory level after decrement (success)
-1: sold out
-2: sale not initialized
Raises on Redis connection failure.
"""
key = f"inventory:product:{product_id}"
result = ATOMIC_DECR(keys=[key])
return int(result)
# Initialize inventory at sale start
def initialize_inventory(product_id: str, count: int, ttl_seconds: int = 3600):
key = f"inventory:product:{product_id}"
r.set(key, count, ex=ttl_seconds, nx=True) # nx=True: only set if not exists
The optimistic locking pattern applies when writing the final inventory state back to the database. Rather than using a database-level lock, we attach a version number to the inventory row and reject updates where the version has changed since we read it - detecting concurrent modifications without holding a lock.
# Optimistic locking for DB inventory sync
def sync_inventory_to_db(product_id: str, expected_version: int, new_count: int, db_conn):
result = db_conn.execute("""
UPDATE inventory
SET quantity = %s,
version = version + 1,
updated_at = NOW()
WHERE product_id = %s
AND version = %s
""", (new_count, product_id, expected_version))
if result.rowcount == 0:
raise OptimisticLockError(f"Inventory row for {product_id} was modified concurrently")
return True
The Redis counter and the database inventory count can diverge if Redis crashes between a DECR and the corresponding order commit. You need a periodic reconciliation job (covered in the Oversell Reconciliation section) that compares the committed order count against the database inventory row and resets the Redis counter to match. Never trust the Redis counter alone as the final source of truth - it is a fast-path gate, not the ledger.
Distributed Locking and Reservation
Decrementing the Redis counter claims a unit, but the user has not yet completed payment. Between the DECR and the order being marked “confirmed,” the inventory unit is in a limbo state - decremented from the counter but not yet durably committed. If the checkout service crashes at this point, the unit is effectively lost.
The solution is a soft inventory reservation with a TTL. When a user passes through the Queue Gateway, the system creates a reservation record: “user X has 90 seconds to complete checkout for product Y.” This reservation atomically decrements the Redis counter AND writes a reservation row to the database. If the user completes checkout, the reservation converts to an order. If the TTL expires, the reservation is released and the unit is returned to the counter.
The atomic reserve-and-decrement operation uses a distributed lock via a Redis Lua script to make the two operations (counter DECR + reservation write signal) appear as a single transaction:
-- Lua script: atomic reserve (DECR counter + write reservation key)
-- KEYS[1] = inventory counter key
-- KEYS[2] = reservation key for this user+product
-- ARGV[1] = reservation TTL in seconds
-- ARGV[2] = reservation value (user_id:product_id:timestamp)
local inv_key = KEYS[1]
local res_key = KEYS[2]
local ttl = tonumber(ARGV[1])
local res_val = ARGV[2]
-- Check if user already has an active reservation (idempotent)
if redis.call('EXISTS', res_key) == 1 then
return 0 -- already reserved, return success without double-decrementing
end
-- Check inventory
local current = tonumber(redis.call('GET', inv_key))
if current == nil or current <= 0 then
return -1 -- sold out
end
-- Atomic: decrement inventory and set reservation with TTL
redis.call('DECR', inv_key)
redis.call('SET', res_key, res_val, 'EX', ttl)
return 1 -- success
RESERVE_SCRIPT = r.register_script(open('reserve.lua').read())
def reserve_inventory(product_id: str, user_id: str, ttl: int = 90) -> bool:
inv_key = f"inventory:product:{product_id}"
res_key = f"reservation:{product_id}:{user_id}"
res_val = f"{user_id}:{product_id}:{int(time.time())}"
result = RESERVE_SCRIPT(keys=[inv_key, res_key], args=[ttl, res_val])
return int(result) >= 0
Amazon uses virtual queues (waiting rooms) for high-demand product launches. Ticketmaster’s “Verified Fan” presale system uses a queue with randomized ordering to prevent bot advantage, then drains through a fixed worker pool. Both systems use reservation TTLs: if you hold a concert ticket in your cart for more than 10 minutes without completing purchase, it is released back to the pool. The TTL is the safety valve that prevents inventory from being permanently locked by abandoned carts.
Queue-Based Checkout
The queue-based checkout pattern converts the spike of 500K simultaneous HTTP requests into a bounded stream of checkout operations. The Queue Gateway accepts every incoming request immediately (returning a queue position token), then a pool of checkout worker processes drains the queue at a controlled rate - typically matching the throughput ceiling of the inventory and order systems.
The queue is implemented with Kafka or SQS. Each checkout attempt becomes a message. Consumer workers pull from the queue, attempt the full checkout flow (reserve inventory, create order, trigger payment), and ack the message only on success. On failure, the message is retried with backoff up to a configurable maximum.
# Kafka consumer worker: processes checkout queue messages
from confluent_kafka import Consumer, KafkaError
import json
import logging
log = logging.getLogger(__name__)
KAFKA_CONFIG = {
'bootstrap.servers': 'kafka-1:9092,kafka-2:9092,kafka-3:9092',
'group.id': 'checkout-workers',
'auto.offset.reset': 'earliest',
'enable.auto.commit': False, # manual commit after processing
'max.poll.interval.ms': 30000,
'session.timeout.ms': 10000,
}
def run_checkout_worker(inventory_svc, order_svc, payment_svc):
consumer = Consumer(KAFKA_CONFIG)
consumer.subscribe(['checkout-queue'])
while True:
msg = consumer.poll(timeout=1.0)
if msg is None:
continue
if msg.error():
if msg.error().code() == KafkaError._PARTITION_EOF:
continue
log.error("Kafka error: %s", msg.error())
continue
payload = json.loads(msg.value().decode('utf-8'))
user_id = payload['user_id']
product_id = payload['product_id']
idempotency_key = payload['idempotency_key']
try:
# Step 1: Reserve inventory (atomic Lua script)
reserved = inventory_svc.reserve(product_id, user_id, ttl=90)
if not reserved:
notify_user(user_id, status='sold_out')
consumer.commit(msg)
continue
# Step 2: Create order (idempotent)
order = order_svc.create_order(
user_id=user_id,
product_id=product_id,
idempotency_key=idempotency_key,
)
# Step 3: Trigger async payment
payment_svc.enqueue_payment(order_id=order.id, user_id=user_id)
notify_user(user_id, status='confirmed', order_id=order.id)
consumer.commit(msg)
except Exception as e:
log.error("Checkout failed for user %s: %s", user_id, e)
# Do NOT commit - message will be redelivered and retried
# Reservation TTL will auto-release if order creation fails repeatedly
The fairness guarantee of queue-based checkout comes from Kafka partition ordering: messages within a partition are processed in the order they were produced. By hashing checkout requests to partitions by arrival timestamp (or using a sequential token assigned at the Queue Gateway), first-come-first-served ordering is structurally enforced at the queue layer, not via application logic. Users who arrived first are in the queue first and will be processed first.
Idempotent Order Creation
When a checkout worker creates an order, it may fail mid-way: the database write succeeds, but the network drops before the worker gets the acknowledgment. The worker retries, potentially creating a duplicate order. The user ends up paying twice for one item - a catastrophic UX and accounting failure.
Idempotent order creation prevents this by attaching a unique idempotency_key to every checkout attempt. The key is generated by the Queue Gateway when the user submits their checkout - it is stable across retries for the same checkout attempt. The Order Service enforces uniqueness on this key at the database layer, turning any retry into a no-op that returns the existing order.
-- Orders table with idempotency key enforcement
CREATE TABLE orders (
order_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
idempotency_key TEXT NOT NULL,
user_id TEXT NOT NULL,
product_id TEXT NOT NULL,
status TEXT NOT NULL DEFAULT 'pending'
CHECK (status IN ('pending', 'confirmed', 'cancelled', 'fulfilled')),
quantity INTEGER NOT NULL DEFAULT 1,
unit_price_paise INTEGER NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
-- Unique constraint enforces idempotency at the DB layer
CONSTRAINT uq_orders_idempotency UNIQUE (idempotency_key),
-- Prevent one user from having multiple pending orders per product
CONSTRAINT uq_user_product_active UNIQUE (user_id, product_id)
DEFERRABLE INITIALLY DEFERRED
);
CREATE INDEX idx_orders_user ON orders (user_id, created_at DESC);
CREATE INDEX idx_orders_product ON orders (product_id, status);
CREATE INDEX idx_orders_status ON orders (status, created_at) WHERE status = 'pending';
# Idempotent order creation using INSERT ... ON CONFLICT DO NOTHING
def create_order(db_conn, user_id: str, product_id: str,
idempotency_key: str, unit_price_paise: int) -> dict:
"""
Creates an order or returns the existing one if idempotency_key already used.
Never raises on duplicate - always returns the canonical order.
"""
result = db_conn.execute("""
INSERT INTO orders (idempotency_key, user_id, product_id, unit_price_paise)
VALUES (%s, %s, %s, %s)
ON CONFLICT (idempotency_key) DO NOTHING
RETURNING order_id, status, created_at
""", (idempotency_key, user_id, product_id, unit_price_paise))
if result.rowcount == 0:
# ON CONFLICT path: fetch the existing order
existing = db_conn.execute("""
SELECT order_id, status, created_at
FROM orders WHERE idempotency_key = %s
""", (idempotency_key,)).fetchone()
return {'order_id': existing[0], 'status': existing[1], 'duplicate': True}
row = result.fetchone()
return {'order_id': row[0], 'status': row[1], 'duplicate': False}
The idempotency key must be generated before the checkout attempt enters the queue, not inside the worker. If the worker generates the key, a retry (new worker invocation) generates a new key and the duplicate protection fails. Generate the key at the Queue Gateway when the user submits checkout, embed it in the queue message, and keep it stable for the lifetime of that checkout attempt.
Oversell Reconciliation
Even with atomic Redis counters and idempotent order creation, oversells can happen in edge cases: Redis crashes and is restored from a backup that predates several DECRs; a Lua script times out mid-execution and the Redis state is uncertain; a network partition causes the same reservation to be claimed on two Redis replicas before replication catches up.
Oversell reconciliation is the safety net that detects and corrects these anomalies after the fact. A periodic reconciliation job compares the number of confirmed orders against the inventory ceiling for each product and flags any product where orders exceed inventory.
-- Detect oversells: products where confirmed orders exceed inventory
SELECT
o.product_id,
i.quantity AS inventory_ceiling,
COUNT(o.order_id) AS confirmed_order_count,
COUNT(o.order_id) - i.quantity AS oversell_count
FROM orders o
JOIN inventory i ON i.product_id = o.product_id
WHERE o.status IN ('confirmed', 'fulfilled')
AND o.created_at >= i.sale_start_at
GROUP BY o.product_id, i.quantity
HAVING COUNT(o.order_id) > i.quantity
ORDER BY oversell_count DESC;
When an oversell is detected, the system must decide between two compensation strategies:
- Cancellation: Cancel the most recent excess orders (last-in, first-cancelled), notify users with a full refund. Simpler operationally but causes user trust damage.
- Fulfillment-first: Source additional inventory from a buffer pool or a secondary supplier to fulfill all confirmed orders. Requires pre-arrangement but maintains user trust.
The reconciliation job also resets the Redis counter to match the actual remaining inventory (inventory ceiling minus confirmed orders), healing any counter drift.
# Reconciliation job: detect oversells and reset Redis counter
def reconcile_inventory(product_id: str, db_conn, redis_client):
row = db_conn.execute("""
SELECT i.quantity,
COALESCE(COUNT(o.order_id), 0) AS confirmed_count
FROM inventory i
LEFT JOIN orders o
ON o.product_id = i.product_id
AND o.status IN ('confirmed', 'fulfilled')
AND o.created_at >= i.sale_start_at
WHERE i.product_id = %s
GROUP BY i.quantity
""", (product_id,)).fetchone()
ceiling, confirmed = row[0], row[1]
remaining = max(0, ceiling - confirmed)
redis_key = f"inventory:product:{product_id}"
# Reset Redis counter to authoritative DB value
redis_client.set(redis_key, remaining)
return {'oversell': confirmed > ceiling, 'excess': max(0, confirmed - ceiling)}
Shopify runs reconciliation jobs after every major flash sale to verify that Redis-backed inventory counters match the order ledger. When discrepancies are found, Shopify’s default strategy is to honor all confirmed orders and absorb the overstock cost, which is typically less than the reputational damage of cancelling orders. For very large oversells (more than 5% above ceiling), a manual review process is triggered before any cancellation decisions are made.
Data Model
The data model has three core tables: inventory (the ceiling and state for each sale), reservations (soft holds with TTL, mirroring the Redis reservation keys), and orders (the durable ledger of confirmed purchases).
-- Inventory table: one row per product per sale event
CREATE TABLE inventory (
product_id TEXT NOT NULL,
sale_id TEXT NOT NULL,
quantity INTEGER NOT NULL CHECK (quantity >= 0),
version BIGINT NOT NULL DEFAULT 1,
sale_start_at TIMESTAMPTZ NOT NULL,
sale_end_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
PRIMARY KEY (product_id, sale_id)
);
-- Sharding key: product_id routes all inventory reads/writes to one shard,
-- avoiding cross-shard joins during checkout.
CREATE INDEX idx_inventory_sale ON inventory (sale_id, sale_start_at);
-- Reservations table: soft holds during checkout (TTL enforced by app layer)
CREATE TABLE reservations (
reservation_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL,
product_id TEXT NOT NULL,
sale_id TEXT NOT NULL,
status TEXT NOT NULL DEFAULT 'active'
CHECK (status IN ('active', 'converted', 'expired', 'released')),
expires_at TIMESTAMPTZ NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
-- One active reservation per user per product per sale
CONSTRAINT uq_reservation_user_product UNIQUE (user_id, product_id, sale_id)
WHERE status = 'active'
);
CREATE INDEX idx_reservations_expiry ON reservations (expires_at)
WHERE status = 'active';
CREATE INDEX idx_reservations_user ON reservations (user_id, created_at DESC);
-- Orders table (shown in full in Idempotent Order Creation section above)
-- Sharding key: user_id for orders (user queries by their own orders)
-- product_id index for inventory reconciliation queries
The sharding strategy separates concerns: inventory shards by product_id to keep all inventory operations for one product on one shard, avoiding distributed transactions. orders shards by user_id to keep “my orders” queries fast, with a secondary index on product_id for the reconciliation job. The reservations table is small enough to fit on a single shard per sale event.
Key Algorithms and Protocols
Atomic DECR-Check Pattern (Redis Lua)
The core inventory operation must be atomic: check that inventory is greater than zero AND decrement, with no window for another process to slip in between. The Lua approach shown earlier achieves this. The Go equivalent is:
// Go: atomic inventory decrement via Redis Lua
package inventory
import (
"context"
"errors"
"github.com/redis/go-redis/v9"
)
var ErrSoldOut = errors.New("inventory: sold out")
var ErrNotInit = errors.New("inventory: sale not initialized")
var atomicDecr = redis.NewScript(`
local current = tonumber(redis.call('GET', KEYS[1]))
if current == nil then return -2 end
if current <= 0 then return -1 end
return redis.call('DECR', KEYS[1])
`)
func AtomicDecrement(ctx context.Context, rdb *redis.Client, productID string) (int64, error) {
key := "inventory:product:" + productID
result, err := atomicDecr.Run(ctx, rdb, []string{key}).Int64()
if err != nil {
return 0, err
}
switch result {
case -2:
return 0, ErrNotInit
case -1:
return 0, ErrSoldOut
default:
return result, nil
}
}
Redis Lua scripts execute atomically on the Redis server: no other command can interleave during script execution. This is stronger than a compare-and-swap because it fuses the read and write into one uninterruptible operation. The script never blocks other clients because Redis is single-threaded - it processes the Lua script to completion before handling the next command in the queue.
Two-Phase Reservation (Reserve Then Commit)
The reservation protocol has two phases. Phase 1 (reserve): atomically decrement the Redis counter and write a Redis reservation key with TTL. This is the fast path - it completes in under 5ms. Phase 2 (commit): after order creation succeeds in the database, mark the reservation as “converted” and write the final order record. If Phase 2 fails, the reservation TTL expires and Phase 1 is automatically reversed.
# Two-phase reservation with automatic rollback via TTL
import time
from dataclasses import dataclass
from enum import Enum
class ReservationStatus(Enum):
ACTIVE = "active"
CONVERTED = "converted"
EXPIRED = "expired"
@dataclass
class Reservation:
user_id: str
product_id: str
expires_at: float
status: ReservationStatus = ReservationStatus.ACTIVE
def two_phase_checkout(user_id, product_id, idempotency_key,
price_paise, inventory_svc, order_svc, db_conn):
# Phase 1: Reserve (fast, atomic, in-memory)
reserved = inventory_svc.reserve(product_id, user_id, ttl=90)
if not reserved:
return {'status': 'sold_out'}
try:
# Phase 2: Commit (durable, idempotent)
order = order_svc.create_order(
db_conn, user_id, product_id, idempotency_key, price_paise
)
inventory_svc.mark_reservation_converted(product_id, user_id)
return {'status': 'confirmed', 'order_id': order['order_id']}
except Exception as e:
# Phase 2 failed: reservation will auto-expire via TTL
# Explicitly release early to return inventory faster
inventory_svc.release_reservation(product_id, user_id)
raise
Optimistic Locking with Version Vectors
For multi-product flash sales where the inventory table is updated by a reconciliation job, optimistic locking with a version counter prevents stale reads from overwriting fresh data.
# Optimistic locking with retry on version conflict
import time
MAX_RETRIES = 3
RETRY_BASE_DELAY = 0.05 # 50ms
def update_inventory_with_retry(db_conn, product_id, delta, max_retries=MAX_RETRIES):
for attempt in range(max_retries):
row = db_conn.execute("""
SELECT quantity, version FROM inventory
WHERE product_id = %s FOR UPDATE SKIP LOCKED
""", (product_id,)).fetchone()
if not row:
raise ValueError(f"No inventory row for {product_id}")
current_qty, version = row
new_qty = current_qty + delta
if new_qty < 0:
raise ValueError("Insufficient inventory")
updated = db_conn.execute("""
UPDATE inventory
SET quantity = %s, version = version + 1, updated_at = NOW()
WHERE product_id = %s AND version = %s
""", (new_qty, product_id, version))
if updated.rowcount == 1:
return new_qty # success
# Optimistic lock conflict: back off and retry
time.sleep(RETRY_BASE_DELAY * (2 ** attempt))
raise RuntimeError(f"Failed to update inventory after {max_retries} retries")
The FOR UPDATE SKIP LOCKED clause is the key that allows multiple reconciliation workers to run in parallel without blocking each other. Each worker picks a product that is not currently being updated by another worker, processes it, and moves on. This is the same pattern PostgreSQL uses for job queues and is dramatically more efficient than FOR UPDATE (which blocks on locked rows) or no locking (which causes lost updates).
Scaling and Performance
Back-of-envelope calculations:
Given 500K concurrent users and 10K inventory units:
- Queue depth at peak: 500,000 messages arrive in the first second. Queue needs to handle 500K messages instantly - Kafka at 3 brokers handles millions of messages per second with no problem.
- Worker throughput: Each worker processes 1 checkout in ~50ms (5ms Redis + 10ms DB write + 35ms overhead). At 100 workers, throughput is 2,000 checkouts/second.
- Time to sell out: 10,000 units at 2,000/second = 5 seconds. The first 10K messages in the queue get inventory; the remaining 490K get “sold out” responses.
- Time to drain queue: 490K “sold out” responses at 2,000/second = 245 seconds. To drain faster, scale workers to 1,000 - then 490K/20,000 = 24.5 seconds to clear the queue.
- Redis DECR throughput: Redis processes ~100,000 commands/second on a single instance. At peak 50K checkouts/second, you need 2 Redis primaries or Redis Cluster with 2 shards.
- Redis Cluster sharding: Shard by product_id. For a single product flash sale, one Redis primary handles all DECRs. For multi-product sales, distribute products across shards.
- Database write throughput: Order inserts at 2,000/second require a write-optimized Postgres config (synchronous_commit=off for the order table, or use CockroachDB for horizontal write scaling).
- Memory for 500K queue entries: Each Kafka message is ~500 bytes (user_id, product_id, idempotency_key, timestamp). 500K messages = 250 MB. Trivial for Kafka.
Shopify’s infrastructure team documented handling flash sales with over 40,000 checkouts per minute for major brands like Supreme. Their approach uses a queue-based checkout system with Redis atomic counters and Kafka for durability. During the 2022 Adidas Yeezy restocks, Shopify processed tens of thousands of concurrent checkout attempts by using virtual waiting rooms that throttled the checkout rate to match backend capacity, returning queue position tokens to users rather than letting all requests hit the inventory system simultaneously.
Failure Modes and Recovery
| Failure | Detection | Impact | Recovery |
|---|---|---|---|
| Redis primary crash | Health check fails, Redis Sentinel promotes replica | Counter lost or stale; potential oversell window | Reconciliation job resets counter from DB; reservation TTLs prevent permanent lock |
| Queue overflow (Kafka partition full) | Consumer lag metric spikes | New checkout requests rejected at Queue Gateway | Scale Kafka brokers; Queue Gateway returns “try again” with backoff |
| DB primary failure | Connection errors on order writes | Order creation fails; workers retry with idempotency | Postgres failover to replica (60-120s); workers retry until DB available |
| Payment gateway timeout | Payment request times out after 30s | Order confirmed but payment not captured | Async retry job retries payment capture up to 3 times; cancel order after 3 failures |
| Worker pod crash mid-checkout | Kafka message not committed; message redelivered | Duplicate checkout attempt on redeliver | Idempotency key prevents duplicate order; Redis reservation check prevents double-DECR |
| Clock skew on reservation TTL | Reservation expires prematurely | User loses reservation before completing checkout | Use Redis TTL (server clock) not client clock for all TTL decisions |
The most dangerous failure mode is Redis recovering from a stale snapshot after a crash. If Redis restores from a backup that predates 1,000 DECRs, the counter is now 1,000 higher than reality. Every subsequent checkout will succeed against this inflated counter until the next reconciliation run. To minimize the window, run the reconciliation job on Redis startup as part of the recovery procedure, and configure Redis to use AOF (append-only file) persistence so the counter can be replayed from the log rather than restored from a point-in-time snapshot.
Comparison of Approaches
| Approach | Throughput | Oversell Risk | Complexity | Best Fit |
|---|---|---|---|---|
| Optimistic DB locking (SELECT FOR UPDATE) | Low - lock contention at 500K concurrent | Low if implemented correctly | Medium | Low-traffic sales where DB is not the bottleneck |
| Pessimistic DB locking (row-level lock) | Very low - serializes all checkouts at DB | Very low | Low | Testing and admin tools only; not production flash sales |
| Redis atomic counter (this design) | Very high - 100K ops/sec per Redis node | Near-zero with Lua script + reconciliation | Medium-High | Flash sales with burst traffic and tight inventory |
| Queue-based serialization (this design) | High and controllable - tuned by worker count | Very low - sequential processing | High | Any flash sale where fairness and correctness are required |
| Distributed lock (Redlock across 5 nodes) | Medium - lock acquisition adds 5-15ms | Very low - strict mutual exclusion | High | Multi-step operations that need strong consistency |
The Redis atomic counter and queue-based serialization are complementary, not competing. The Redis counter is the fast gate that answers “is there inventory left?” in under 1ms. The queue is the fairness and durability layer that ensures the answer is processed correctly. Together they form the core of a production flash sale system.
Key Takeaways
- Cache-backed inventory counter is the fast gate - Redis
DECRvia Lua script is atomic, sub-millisecond, and handles 100K ops/second per node. It is the only realistic mechanism for inventory checks under 500K concurrent load. - Virtual waiting room converts a spike into a stream - without a queue-based checkout, 500K simultaneous requests create thundering herd on every downstream system. The queue absorbs the burst so workers see steady-state load.
- Inventory reservation with TTL is the safety net - soft reserves prevent inventory from being permanently locked by abandoned carts. The TTL automatically releases unclaimed units back to the pool.
- Idempotency keys are non-negotiable - every checkout attempt must carry a stable idempotency key generated before the request enters the queue. Without it, Kafka redelivery creates duplicate orders and double charges.
- Optimistic locking protects the database layer - version-based conflict detection on the inventory row prevents reconciliation writes from overwriting each other when multiple jobs run in parallel.
- Distributed locks via Lua scripts are stronger than application-level locks - they execute atomically inside Redis with no window for concurrent modification, and they self-expire via TTL if the holding process crashes.
- Oversell reconciliation is the integrity backstop - even with atomic counters, edge cases (Redis restart, network partition) can cause counter drift. A periodic job comparing order count to inventory ceiling detects and heals all anomalies.
- Fairness requires queue position assignment at admission - to guarantee first-come-first-served, queue position must be assigned when the request enters the Queue Gateway, not when it is processed by a worker. Kafka partition ordering preserves this sequence through to processing.
Frequently Asked Questions
Q: Why not just use a database transaction with SELECT FOR UPDATE for inventory instead of Redis?
A: At 50,000 checkouts per second, a SELECT FOR UPDATE on the inventory row creates a global bottleneck: every checkout serializes through one database row lock. Postgres can handle roughly 5,000-10,000 row-level lock acquisitions per second under ideal conditions, so you’d need 5-10x your actual capacity just to handle the lock contention. Redis atomic operations sidestep this by using in-memory execution on a single thread - no lock manager, no transaction overhead, just sequential command processing.
Q: Why not use a distributed lock (Redlock) for the entire checkout flow?
A: Redlock acquires a lock across 5 Redis nodes, which takes 5-15ms in the best case. At 50K checkouts/second with a 15ms lock window, you’d need 750 concurrent lock holders - which defeats the serialization goal. Redlock is appropriate for protecting a critical section that runs in under 1ms. For anything longer, a queue-based approach where messages are processed sequentially provides the same mutual exclusion without the lock timeout overhead.
Q: How do you handle the case where a user’s payment fails after their order is confirmed?
A: Order confirmation and payment capture are decoupled. Confirmation means “you have a reservation and an order record.” Payment capture is async. If payment fails after 3 retries (configurable), the order transitions to “cancelled,” the reservation is released, and the inventory unit is returned to the Redis counter via an explicit INCR. This is a compensating transaction: it reverses the DECR that happened during reservation.
Q: Why use Kafka instead of SQS for the checkout queue?
A: Kafka preserves message ordering within a partition, which is critical for the fairness guarantee (first-come-first-served). SQS standard queues offer no ordering guarantee. SQS FIFO queues offer ordering but are limited to 3,000 messages/second per queue - far below the 500K messages in the first second of a flash sale. Kafka at 3 brokers handles millions of messages/second with ordering per partition.
Q: What happens if the Queue Gateway is down when the sale starts?
A: The Queue Gateway is the admission control bottleneck, so it must be highly available. Run it as a horizontally scaled stateless service behind a load balancer, with multiple replicas across availability zones. The queue position counter (used for fairness ordering) is stored in Redis with replication. If the Queue Gateway loses Redis connectivity, it falls back to a simple timestamp-based ordering from the request arrival time. If the Queue Gateway itself goes down, new checkout requests queue at the load balancer until the service recovers.
Q: How do you prevent bots from holding all reservations without completing checkout?
A: Three layers of defense. First, per-user rate limiting enforced at the load balancer (one checkout attempt per user per sale). Second, CAPTCHA or browser fingerprinting at the Queue Gateway to filter automated traffic before it enters the queue. Third, short reservation TTLs (90 seconds) so any reservation not converted to an order within 90 seconds is automatically released - bots that hold reservations without completing payment lose them quickly.
Interview Questions
Q: Walk me through how you prevent overselling when 500K users click Buy simultaneously.
Expected depth: Describe the two-layer protection: first, the Redis Lua script that atomically checks-and-decrements the counter (preventing concurrent DECRs from going below zero); second, the queue-based checkout that serializes processing so only one worker handles each inventory unit at a time. Mention that the Redis counter is the fast gate but not the final source of truth - the database inventory row with optimistic locking is the ledger. Discuss the reconciliation job as the backstop that detects counter drift caused by Redis restarts or network partitions.
Q: What happens if your Redis inventory counter crashes mid-sale and restarts with a stale count?
Expected depth: Explain that a stale Redis restart creates an inflated counter (the Redis backup predates some DECRs, so the counter shows more inventory than actually remains). This creates an oversell window until the next reconciliation run. To minimize this: configure Redis AOF persistence (append-only log) so the counter is replayed from a complete command log rather than a point-in-time snapshot; run reconciliation immediately on Redis startup as part of the recovery procedure; alert on any discrepancy between Redis count and DB confirmed order count that exceeds a threshold.
Q: How would you ensure a user who clicks Buy at 12:00:00.001 gets priority over one who clicks at 12:00:00.500?
Expected depth: The Queue Gateway must assign a sequence number or timestamp at the moment the HTTP request arrives - not when it enters the Kafka queue or when a worker picks it up. The sequence number becomes part of the queue message and determines the Kafka partition and offset. Kafka guarantees ordering within a partition, so messages with earlier sequence numbers are processed first. Workers must process messages in offset order (no parallel processing within a partition) to preserve the fairness guarantee. Discuss the tradeoff: strict ordering requires single-threaded partition consumption, which limits partition throughput. For very high fairness requirements, use more partitions with smaller user segments per partition.
Q: How would you design the system to handle a multi-product flash sale where 50 products are released simultaneously?
Expected depth: Shard the Redis inventory counters by product_id - each product gets its own Redis key, potentially on a different Redis node in a cluster. Shard the Kafka checkout queue by product_id as well - each product gets its own set of partitions, with dedicated worker groups consuming each product’s queue. This prevents a “hot product” from blocking checkout for other products. The Order Service can handle all products on the same database (with product_id indexed), but the inventory and checkout layers are fully parallel. Discuss the capacity implications: 50 products at 500K users each would require 50x the Redis and Kafka capacity, which in practice is mitigated by users distributing their interest across products.
Q: Your idempotency key deduplication uses a UNIQUE constraint. What happens under high write load when many workers hit the same constraint?
Expected depth: The ON CONFLICT DO NOTHING pattern is efficient because the conflict check is a B-tree index lookup on the unique constraint - O(log n) per insert. Under high insert rate, the B-tree gets write-contended at the leaf nodes. Mitigation: use a hash-indexed unique constraint rather than B-tree (PostgreSQL supports CREATE UNIQUE INDEX USING hash), which provides O(1) lookup. Additionally, partition the orders table by created_at date so the index is smaller and less contended. For extreme write rates (over 50K inserts/second), consider pre-sharding by hashing the idempotency_key and routing to different database shards, so the unique constraint is enforced per-shard with a fallback cross-shard check.
Premium Content
Unlock the full article along with everything else in the archive — all in one place.