The Cache That Made Things Worse

caching thundering-herd redis cache-stampede cache-warming

The Cache That Made Things Worse

97% cache hit rate. That last 3% takes down your database every 5 minutes.

⏱ 12 min read📐 Intermediate🔒 Cache Stampede

It’s Tuesday morning, 9:47 AM. The on-call alert fires before you’ve finished your first coffee. DB connection pool exhausted. The engineering Slack channel lights up with the familiar cascade: “anyone seeing errors?”, “prod is down”, “checking now”. You pull up Datadog, and the Redis dashboard is practically glowing with pride - 97% cache hit rate. The graph is beautiful. Almost suspiciously beautiful.

Then you scroll down. Every five minutes, like clockwork, there’s a vertical spike in your PostgreSQL connection graph. Not a gradual rise - a cliff face. Forty-seven milliseconds of silence, then 400 simultaneous connections hammering the database at the exact same instant. The connection pool maxes out at 100. The rest queue up, timeout after 30 seconds, and return 503s to users. By the time you’ve figured out what’s happening, the spike has passed. The Redis hit rate is back to 97%. Nothing looks wrong - until five minutes later when it happens again.

Here’s the thing about that 97% hit rate: it’s hiding the story. You added Redis to a product feed endpoint six weeks ago. Cache hit rate shot up, DB load dropped 40%, response times improved. The team shipped champagne.gif. The remaining 3% cache miss felt like noise. The math said 97 out of every 100 requests are served from Redis with sub-millisecond latency. What’s the worst a 3% miss can do?

The answer, as it turns out, involves a 5-minute TTL, 400 requests per second at peak traffic, and the geometric reality that when your cache key expires, every single request in-flight at that exact moment gets a miss simultaneously. Not one request falls through to the database. Not a handful. All 400 of them. The user feed cache key for your most popular landing page expires at 9:47:23 AM, and 400 goroutines simultaneously decide they need to be the one to repopulate it.

This is the cache stampede problem.

Why This Happens

The mechanics are straightforward, which makes them easy to underestimate. Redis TTL-based eviction is atomic and instant. When a key’s time-to-live hits zero, it disappears from the cache. There’s no grace period, no staggered expiry, no notification. The key was there on the last read; it’s gone on the next. Every reader that hits the cache in the window between that expiry and the first successful repopulation gets a miss.

At low request rates, this is a non-issue. If you get one request per second and your database query takes 50ms to execute, at most one request hits the database per cache miss event. But at 400 requests per second, the 50ms window between expiry and repopulation contains 20 requests. And unlike a database query that takes 50ms for one row, a query that gets hammered by 400 simultaneous connections doesn’t take 50ms - it takes however long it takes for the connection pool to process a 400-deep queue, which is somewhere between “a while” and “never, users see 503s”.

cache key expires (TTL=300s)
  → 400 concurrent requests hit the key simultaneously
    → all get cache MISS
      → all fire DB queries at the same instant
        → DB connection pool exhausts (max 100 connections)
          → requests queue, timeout
            → users see 503 errors

The failure pattern repeats every 300 seconds because the key gets repopulated with the same TTL each time. Your cache doesn’t remember that it just caused an outage. It cheerfully sets a new 5-minute expiry on the freshly populated key, and the countdown to the next stampede begins.

The Naive Solution (and where it breaks)

Most engineers reach for one of two levers when they first encounter this. The first is increasing the TTL. If the key expires every 5 minutes and that causes problems, make it expire every 30 minutes. The second is adding more cache nodes - maybe the miss is happening because some nodes are evicting keys under memory pressure.

Naive cache architecture showing simultaneous cache misses

Both approaches share the same flaw: they reduce the frequency of the problem without fixing the underlying mechanism. A 30-minute TTL means the stampede happens every 30 minutes instead of every 5. You’ve turned a frequent, small outage into an infrequent, larger one. And adding cache nodes doesn’t help at all if the issue is a single hot key - all requests for that key route to the same node, and the expiry still hits all concurrent readers at once.

The scale math makes this obvious once you write it out:

Small scale: 10 req/s → 1-2 DB queries on miss → manageable
Large scale: 400 req/s → 400 simultaneous DB queries on expiry → catastrophic failure

The naive solutions treat TTL as a knob to tune rather than a fundamental architectural problem. The stampede isn’t an edge case - it’s a guaranteed consequence of the TTL model under any meaningful load.

The Better Solution

The good news is that smart people have been solving this for a long time. The fix isn’t a single technique - it’s a layered set of mechanisms that you apply based on your traffic patterns and latency requirements. Each layer addresses a specific failure mode.

Mutex Lock on Miss

The simplest fix that actually works: when a cache miss occurs, only one goroutine should query the database. Everyone else should wait and retry the cache read.

Redis ships a primitive for exactly this: SET NX (set if not exists), which is atomic. First caller wins the lock; everyone else gets a “not acquired” response and knows to wait.

SET cache:lock:user_feed:123 1 NX EX 5
# if acquired: query DB, populate cache, release lock
# if not acquired: wait 50ms, retry cache read

The lock TTL (EX 5) is a safety valve - if the process holding the lock crashes before releasing it, the lock expires automatically within 5 seconds and another process can take over. Without it, a crashed process creates a permanent deadlock.

This approach converts a thundering herd of 400 simultaneous DB queries into one DB query followed by 399 cache reads. The latency for those 399 waiters is the DB query time plus a 50ms polling delay - acceptable for most applications. The database sees a spike of exactly one query per cache miss event regardless of concurrent load.

The failure mode worth knowing: if your DB query takes longer than your lock TTL, the lock expires before the cache is populated. A second process acquires the lock and fires another DB query while the first is still running. Keep the lock TTL at least 3x your P95 query time.

Probabilistic Early Expiry

Mutex locking fixes the stampede but doesn’t eliminate the cache miss itself. Probabilistic early expiry (also called the XFetch algorithm) takes a different approach: instead of waiting for the key to expire, recompute it slightly before expiry with a probability that increases as the TTL decreases.

def get_with_early_expiry(key, beta=1.0):
    value, ttl_remaining = cache.get_with_ttl(key)
    recompute_time = estimate_recompute_time()  # e.g. 50ms
    # XFetch: recompute if random score < probability
    if -recompute_time * beta * log(random()) > ttl_remaining:
        value = db.query(key)
        cache.set(key, value, ttl=300)
    return value

The beta parameter controls aggression. Higher beta means more frequent early recomputation, which reduces the chance of a miss at expiry but increases background DB load. A beta of 1.0 is usually the right starting point.

The elegant property of XFetch is that it’s probabilistic - different workers will trigger the recomputation at slightly different times, which naturally staggers the load. And because the old value is still valid when recomputation happens, readers always get fresh data with no wait.

Cache Warming

For keys where you know the access pattern in advance - like a product feed that refreshes on a schedule - cache warming sidesteps the expiry problem entirely by pre-populating the cache before the old key expires.

# Background worker pseudocode
def warm_cache():
    while True:
        for key in hot_keys:
            ttl = cache.ttl(key)
            if ttl < 60:  # less than 60s remaining
                value = db.query(key)
                cache.set(key, value, ttl=300)
        time.sleep(30)

The warmer runs every 30 seconds and repopulates any key with less than 60 seconds of TTL remaining. The old key remains valid throughout - users never see a miss. The database sees predictable, periodic load from the warmer rather than bursty load from stampedes.

Cache warming requires knowing which keys are hot. For product feeds, recommendation lists, and category pages, this is easy - they’re the same keys every time. For user-specific content, you need a mechanism to track “recently accessed with high frequency”, which adds complexity.

Staggered TTLs

The simplest technique, and often the most overlooked: add random jitter to every cache write so keys don’t expire synchronously.

base_ttl = 300
jitter = random.randint(0, 60)  # add 0-60s jitter
cache.set(key, value, ttl=base_ttl + jitter)

Without jitter, if your application starts up and warms the cache for 10,000 keys with a 300-second TTL, all 10,000 keys expire at the same time. With jitter, they expire evenly distributed across a 60-second window. The database still sees the same total query volume - it’s just spread out over time instead of concentrated in a single instant.

Staggered TTLs don’t fix the stampede for individual hot keys, but they dramatically reduce the blast radius when a large cache population expires together. Use this as the baseline - every cache write should include jitter.

Solution flow with mutex lock and probabilistic early expiry

The Full Architecture

Full cache stampede prevention architecture

Component Deep Dives

Redis Lock Manager

The mutex lock needs careful implementation to be safe under failure conditions. The standard pattern uses SET NX PX (millisecond precision) for the lock with a unique value per process so only the lock owner can release it.

# Acquire lock (unique value prevents accidental release by another process)
SET cache:lock:user_feed:123 {unique-uuid} NX PX 5000

# Release lock (Lua script ensures atomicity - check and delete in one operation)
if redis.call("get", KEYS[1]) == ARGV[1] then
  return redis.call("del", KEYS[1])
else
  return 0
end

The Lua script for release is not optional. Without it, there’s a race condition: process A checks that it owns the lock, the lock expires, process B acquires the lock, process A deletes it. The atomic check-and-delete prevents this.

XFetch Algorithm Implementation

The XFetch formula requires get_with_ttl - a Redis command that returns both the value and the remaining TTL in one round trip. In Redis 7.x, this is available via OBJECT FREQ and TTL in a pipeline. For older versions, pipeline GET and TTL together.

import math
import random

def xfetch(redis_client, key, beta=1.0, recompute_fn=None, new_ttl=300):
    pipe = redis_client.pipeline()
    pipe.get(key)
    pipe.ttl(key)
    value, ttl = pipe.execute()

    if value is None:
        # Full miss - must recompute
        value = recompute_fn(key)
        redis_client.setex(key, new_ttl, value)
        return value

    recompute_time_ms = 50  # estimated DB query time in ms
    recompute_time_s = recompute_time_ms / 1000.0

    if -recompute_time_s * beta * math.log(random.random()) > ttl:
        # Early recompute - value still valid, just refreshing
        value = recompute_fn(key)
        redis_client.setex(key, new_ttl, value)

    return value

Cache Warmer Background Worker

A production cache warmer needs to handle the case where the warmer itself is the bottleneck. If you have 50,000 hot keys and each DB query takes 50ms, a single-threaded warmer takes 2,500 seconds to complete one pass - far too slow for a 300-second TTL.

import asyncio
from collections import deque

async def warm_key(redis_client, db_client, key, min_ttl=60, new_ttl=300):
    ttl = await redis_client.ttl(key)
    if ttl < min_ttl:
        value = await db_client.query(key)
        await redis_client.setex(key, new_ttl, value)

async def cache_warmer(redis_client, db_client, hot_keys, concurrency=50):
    semaphore = asyncio.Semaphore(concurrency)
    async def bounded_warm(key):
        async with semaphore:
            await warm_key(redis_client, db_client, key)

    while True:
        tasks = [bounded_warm(key) for key in hot_keys]
        await asyncio.gather(*tasks)
        await asyncio.sleep(30)

The concurrency semaphore prevents the warmer from overwhelming the database when it first starts up with a cold cache. Cap it at roughly half your connection pool size.

TTL Jitter Strategy

Jitter works best when paired with a minimum floor to prevent keys from expiring too soon. A product feed that needs to be fresh within 5 minutes shouldn’t get a TTL of 10 seconds because random.randint(0, 300) returned 10.

def set_with_jitter(cache, key, value, base_ttl=300, jitter_pct=0.2):
    jitter_range = int(base_ttl * jitter_pct)
    ttl = base_ttl + random.randint(-jitter_range, jitter_range)
    ttl = max(ttl, base_ttl // 2)  # floor at 50% of base TTL
    cache.set(key, value, ttl=ttl)

Comparison Table

Approach	Write Complexity	Read Complexity	Latency Impact	Storage Cost	Failure Mode	Best Use Case
Simple TTL	Low	Low	None	Low	Full stampede on expiry at scale	Low traffic, non-critical caches
Increased TTL	Low	Low	None	Low	Infrequent but larger stampedes	Static content, infrequent updates
Mutex Lock on Miss	Medium	Medium	+50ms for waiters	Low (lock key only)	Lock contention; deadlock if TTL too short	High-traffic single hot keys
Probabilistic Early Expiry (XFetch)	Low	Medium	Near-zero (background recompute)	None	Occasional extra DB queries before expiry	Read-heavy, latency-sensitive endpoints
Cache Warming	High	Low	None (always warm)	Moderate (must track hot keys)	Warmer lag; cold start on restart	Known access patterns, scheduled data
Staggered TTLs	Low	Low	None	None	Still stampedes for individual hot keys	Any cache as baseline hygiene

Key Takeaways

Cache stampede is a guaranteed outcome, not an edge case - any key that expires under concurrent load will cause it.
Mutex lock on miss is the most reliable fix for individual hot keys, converting N simultaneous DB queries into one.
XFetch probabilistic early expiry eliminates the miss itself by recomputing before TTL hits, at the cost of occasional early DB queries.
Cache warming is the cleanest solution for known hot keys, but requires operational complexity to track and maintain the warmer.
Staggered TTLs should be the default for every cache write - they cost nothing and reduce blast radius significantly.
97% hit rate is not the metric you should optimize for; “P99 latency during cache expiry windows” is.
Lock TTL must be longer than your P95 DB query time, or you trade stampedes for cascading lock contention.
Combining techniques is the right answer: staggered TTLs as baseline, XFetch for hot read paths, mutex for anything that can’t tolerate early recompute.

The uncomfortable truth about caching is that the failure modes live in the 3% you’re not watching. A 97% hit rate on a high-traffic system means the cache is doing its job - right up until the moment it fails in a way that’s hard to reproduce, hard to debug, and happens on a 5-minute timer. Design for the expiry event, not the cache hit.

Frequently Asked Questions

Q: If mutex lock on miss works, why bother with XFetch? A: Mutex lock adds latency for every request that waits during the lock window. At very high throughput, that 50ms polling delay accumulates. XFetch eliminates the miss entirely - there’s no lock window because the cache is always fresh. Use mutex for simplicity; use XFetch when you need zero-miss latency guarantees.

Q: Does the cache warmer create a single point of failure? A: Yes, if you run a single warmer. In practice, run two warmers with independent schedules and let them race - the second write is idempotent. The warmer failing means you fall back to TTL-based eviction and potentially stampedes, but not data loss. Add an alert on warmer health, not just cache hit rate.

Q: What’s the right lock TTL for a Redis mutex? A: Start at 3x your P95 DB query time. If your P95 is 100ms, set the lock TTL to 300ms. The risk of too-short a TTL is multiple processes acquiring the lock sequentially, causing more DB queries than needed. The risk of too-long a TTL is delayed recovery if the lock holder crashes. A 300ms to 2 second range covers most OLTP workloads.

Q: Can I use this pattern with Redis Cluster? A: Yes, with one constraint: the lock key and the cache key must hash to the same slot, or use Redlock (the multi-node distributed lock algorithm). If they’re on different nodes, the lock may succeed while the cache write goes to a different shard. The simplest fix is using hash tags: cache:{user_feed:123}:data and cache:{user_feed:123}:lock force both keys to the same slot.

Q: How do I identify which keys are causing stampedes in production? A: Redis MONITOR command will show you command volume, but it’s expensive in prod. Better: add timing instrumentation around your cache-miss-to-DB-query path and emit a metric with the cache key name. A spike in that metric with a specific key name is your stampede source. Redis also has OBJECT FREQ (in LFU mode) to identify access frequency.

Q: Does this problem exist with Memcached? A: Yes, identically. Stampede is a property of the TTL-based eviction model, not Redis specifically. Memcached doesn’t have Lua scripting, so atomic lock release requires a CAS (check-and-set) operation instead, but the pattern is the same.

Interview Questions

Q: You’re debugging a production outage where the database is getting hammered every 5 minutes despite a Redis cache. Walk me through your diagnosis and fix.

Expected depth: Identify TTL-based cache expiry as root cause, explain the thundering herd mechanism, describe at least two mitigation strategies (mutex lock and XFetch), discuss tradeoffs between them including latency impact and operational complexity. Bonus points for mentioning staggered TTLs as baseline hygiene and cache warming for known hot keys.

Q: Implement a thread-safe cache-aside pattern in Go that prevents cache stampedes on a high-traffic endpoint.

Expected depth: Show sync.singleflight or a Redis SET NX mutex implementation, handle the case where the lock holder crashes before releasing (lock TTL), explain why naive if miss { query DB; set cache } is broken under concurrency, discuss goroutine wait strategies (polling vs. channel notification).

Q: How does the XFetch algorithm decide when to recompute a cache value early?

Expected depth: Explain the formula -delta * beta * log(rand()) > ttl_remaining, describe how beta controls aggression, explain why the logarithm creates a probability distribution that concentrates recomputes near expiry, discuss how to estimate delta (the recompute time) in practice.

Q: You have 50,000 cache keys that all expire at the same time because the cache was cold-started. How do you prevent a stampede?

Expected depth: Describe TTL jitter as the primary fix, discuss initial cache population strategy (warm incrementally with rate limiting rather than all at once), mention that a full cold-start is unavoidable but the recovery shape matters - sequential warming with backpressure is better than parallel warming that overwhelms the DB.

Q: How does Redis Cluster affect your stampede prevention strategy?

Expected depth: Discuss hash slots and the requirement that lock key and cache key must be in the same slot, explain hash tags as the solution, mention Redlock as the multi-node distributed lock option and its tradeoffs (network round trips, split-brain behavior), note that cluster sharding can actually help with stampedes by distributing hot keys across nodes.

Continue Learning

You fixed the cache stampede. But your API still takes 2 seconds. Here's exactly where to look next.

Read: The 200ms Promise: How to Rescue an API That's Bleeding Latency Premium Unlock all articles · ₹199