Build Facebook News Feed Ranking and Fanout

scalability distributed-systems performance

System Design Deep Dive

Facebook News Feed Ranking and Fanout

Serving a unique ranked feed to 3 billion users in 500ms while surviving write storms from celebrities with 200 million followers.

⏱ 14 min read📐 Advanced🏗️ Fanout

Imagine a newspaper that writes a completely different edition for each of its 3 billion readers, tailored to what each reader cares about, updated continuously throughout the day, and delivered within half a second of each reader opening their phone. That is Facebook’s News Feed - and the engineering challenge of making it work at this scale is one of the most interesting problems in distributed systems.

When someone with 200 million followers posts a photo, the naive implementation would be to write that post ID into 200 million users’ feed lists simultaneously. At 200 million writes, that is a write storm that would overwhelm any database. But if you do nothing at write time and pull from 200M followers at read time, a single feed request becomes a scatter-gather query to millions of rows. Neither extreme works.

The tension is between write amplification and read latency. Push-on-write (fan-out) makes reads fast but writes explosive. Pull-on-read makes writes cheap but reads slow. The real system uses a hybrid: regular users get fan-out on write, celebrities use pull-on-read, and a ranking layer stitches the two paths together at read time.

On top of the distribution problem, there is the ranking problem. A chronological feed was the original design, but engagement drops dramatically when stale or irrelevant posts appear first. The ranking model must score hundreds of candidate posts per user in milliseconds, using personalized affinity signals that vary for every user-author pair. We need to solve for: write fan-out without 200M database writes, read latency under 500ms with ranking applied, and per-user personalization that does not require real-time computation for all 3 billion users simultaneously.

Requirements and Constraints

Functional Requirements

Generate a personalized, ranked news feed per user on request
Accept post creation from any user and propagate to followers
Show posts from followed users, groups, and pages
Support likes, comments, and shares that update engagement signals
Paginate the feed using a cursor so users can scroll infinitely
Support feed refresh to show new posts at the top

Non-Functional Requirements

3 billion monthly active users, ~1 billion daily active users
Feed read latency: p99 under 500ms
Post fan-out: propagate a regular user’s post to all followers within 60 seconds
Celebrity post fan-out: not applicable - use pull model
Ranking: score and order up to 500 candidate posts per feed request
Feed cache: top 500 post IDs per user, TTL 7 days
Write throughput: ~500M posts/day = ~6,000 writes/second average

Constraints

Celebrity threshold: users with over 100,000 followers use pull model to avoid write amplification
Feed cache is eventually consistent - a post may take up to 60 seconds to appear
Feed ranking re-runs on every page load - cached post IDs, not cached rankings
Offline users’ caches may be stale or cold on first login after inactivity

High-Level Architecture

The system has two primary data paths: write path (user creates a post) and read path (user requests their feed). These paths share storage but are designed to scale independently.

The Post Write Service accepts new posts and writes to the Post Store (sharded MySQL). It publishes a post event to the fan-out Kafka topic. Fan-out Workers subscribe to this topic, check whether the author is a celebrity, and either push the post ID to each follower’s Redis feed cache or skip fan-out for celebrities.

At read time, the Feed Read Service fetches the user’s pre-computed feed list from Redis. For any celebrities the user follows, it fetches recent posts from those timelines directly (pull model). The merged candidate list goes to the Ranking Service, which scores each post using a lightweight ML model and returns the top 20 posts for this page. The Social Graph Service (backed by Facebook’s TAO - the graph database) provides follower/following relationships.

Key Insight

The hybrid fanout model is the architectural crux: push for regular users (expensive at write time, fast at read time) and pull for celebrities (cheap at write time, adds a scatter-gather at read time). The threshold of 100,000 followers is a tuning knob - lower it to reduce write amplification, raise it to reduce read-time scatter-gather latency.

Fan-out Workers

The fan-out worker is the engine that bridges a write event (one post) and a read benefit (fast feed loads for followers).

Think of the fan-out worker like a mail sorting room. When one person sends 500 letters to 500 recipients, a sorter reads the address list once and places a copy in each recipient’s mailbox. The recipients get instant access to their mail without needing to query the sender. The problem: if one sender has 200 million recipients, the sorting room is overwhelmed.

Fan-out workers are partitioned by author_id % num_workers. This means all posts from a given author always route to the same worker, which simplifies per-author rate limiting and ordering guarantees. Each worker processes events asynchronously from a Kafka partition.

# Fan-out worker: processes post events from Kafka
import redis
from dataclasses import dataclass
from typing import Optional

CELEBRITY_THRESHOLD = 100_000
FEED_MAX_SIZE = 500

@dataclass
class PostEvent:
    post_id: str
    author_id: int
    timestamp: float
    content_type: str  # "photo", "video", "text", "link"

class FanoutWorker:
    def __init__(self, redis_client: redis.Redis, graph_client, post_store):
        self.redis = redis_client
        self.graph = graph_client
        self.posts = post_store

    def process_post_event(self, event: PostEvent) -> None:
        follower_count = self.graph.get_follower_count(event.author_id)

        if follower_count > CELEBRITY_THRESHOLD:
            # Celebrity: write to timeline only, followers will pull on read
            self._write_to_celebrity_timeline(event)
            return

        # Regular user: push post_id to each follower's feed cache
        followers = self.graph.get_followers(event.author_id)
        self._push_to_follower_feeds(event, followers)

    def _push_to_follower_feeds(self, event: PostEvent, follower_ids: list[int]) -> None:
        # Use Redis pipeline to batch writes
        pipe = self.redis.pipeline(transaction=False)
        score = -event.timestamp  # negative for newest-first ordering

        for follower_id in follower_ids:
            feed_key = f"feed:{follower_id}"
            pipe.zadd(feed_key, {event.post_id: score})
            # Trim to max size to prevent unbounded growth
            pipe.zremrangebyrank(feed_key, FEED_MAX_SIZE, -1)
            # Refresh TTL on interaction
            pipe.expire(feed_key, 7 * 24 * 3600)  # 7 days

        pipe.execute()

    def _write_to_celebrity_timeline(self, event: PostEvent) -> None:
        timeline_key = f"timeline:{event.author_id}"
        pipe = self.redis.pipeline(transaction=False)
        pipe.zadd(timeline_key, {event.post_id: -event.timestamp})
        pipe.zremrangebyrank(timeline_key, 500, -1)  # keep 500 most recent
        pipe.expire(timeline_key, 30 * 24 * 3600)  # 30 days
        pipe.execute()

For a user with 500 followers, this means 500 Redis ZADD commands per post. With a pipelined batch, this executes in ~5ms. At 6,000 posts/second across the fleet, the fan-out tier generates ~3 million Redis writes/second - easily handled by a Redis cluster with 12 shards.

Watch Out

The celebrity threshold creates an edge case: users near the threshold flip between push and pull models when they gain or lose followers. A user crossing 100,000 followers mid-day causes inconsistency - some followers have the old push-based feed entries and some don’t. Handle this with a grace period: when an author crosses the threshold, run a one-time job to remove their post IDs from follower feed caches.

Precomputed Feed Cache

The feed cache is a Redis sorted set per user containing the IDs of posts that should appear in this user’s feed, ordered by post timestamp.

Think of it like a pre-sorted magazine pile. Instead of going to the library to search for relevant articles every time you want to read, someone has already picked and sorted your personalized pile. When you want to read, you just grab the top items from your pile.

# Feed read path: assembles candidate posts from cache + celebrity timelines
class FeedReadService:
    def __init__(self, redis_client, graph_client, post_store, ranking_service):
        self.redis = redis_client
        self.graph = graph_client
        self.posts = post_store
        self.ranker = ranking_service

    def get_feed(self, user_id: int, cursor: Optional[float] = None, page_size: int = 20) -> dict:
        # 1. Fetch pre-computed feed candidates from cache
        feed_key = f"feed:{user_id}"
        score_min = "-inf"
        score_max = f"({cursor}" if cursor else "+inf"  # exclusive cursor

        # Redis ZRANGEBYSCORE in reverse order (newest first)
        cached_ids = self.redis.zrangebyscore(
            feed_key,
            min=score_min,
            max=score_max,
            start=0,
            num=200,  # fetch 200, rank down to 20
            withscores=True
        )

        # 2. Fetch celebrity posts (pull model)
        celebrity_ids = self.graph.get_celebrities_followed_by(user_id)
        celebrity_posts = []
        for celeb_id in celebrity_ids[:20]:  # cap at 20 celebrities per feed request
            timeline_key = f"timeline:{celeb_id}"
            posts = self.redis.zrangebyscore(
                timeline_key,
                min=score_min,
                max=score_max,
                start=0,
                num=10,
                withscores=True
            )
            celebrity_posts.extend(posts)

        # 3. Merge and deduplicate candidate post IDs
        all_candidate_ids = list(set(
            [post_id for post_id, _ in cached_ids] +
            [post_id for post_id, _ in celebrity_posts]
        ))

        # 4. Fetch full post objects for candidates
        post_objects = self.posts.batch_get(all_candidate_ids)

        # 5. Apply ranking model to score candidates
        ranked_posts = self.ranker.rank(user_id, post_objects)

        # 6. Return top page_size with cursor for next page
        page = ranked_posts[:page_size]
        next_cursor = page[-1].timestamp if len(ranked_posts) > page_size else None

        return {"posts": page, "next_cursor": next_cursor}

Cold cache handling is critical. When a user who has been offline for more than 7 days (TTL on feed cache) returns, their feed key is missing. The read service detects this and falls back to a feed regeneration job: scan the user’s follow list, fetch recent posts from each author’s timeline, and hydrate the feed cache. This regeneration runs synchronously on first load (adding ~2 seconds) but is acceptable since it only happens for truly cold users.

Real World

Facebook’s actual system uses TAO (The Associations and Objects) as the graph data store, which is a distributed key-value + graph database optimized for social graph traversals. The feed cache at Facebook uses Memcached rather than Redis at the lowest layer, with application-level sorted list semantics maintained by the application server. Instagram uses a more direct Redis sorted set approach similar to what we described.

Feed Ranking Model

The ranking model’s job is to score each candidate post and order them so the most engaging content appears first for this specific user.

Ranking is like a personal assistant who knows you well: they do not show you every piece of mail that arrived, they sort it so the things you actually want to read are on top. The assistant’s intuition comes from observing your past behavior - what you opened, what you replied to, what you skipped.

The model takes as input features of the post, features of the author-user relationship (affinity), and features of the user’s recent context, and outputs a scalar score.

# Feed ranking: scores candidate posts for a user
import numpy as np
from dataclasses import dataclass
from typing import NamedTuple

@dataclass
class PostFeatures:
    post_id: str
    content_type: str           # "photo", "video", "text", "link"
    age_seconds: float          # seconds since post was created
    engagement_count: int       # total likes + comments + shares
    engagement_velocity: float  # engagement per hour in last 4h
    author_id: int
    is_celebrity: bool

@dataclass
class AffinityFeatures:
    # How much has user interacted with this author historically
    like_rate_30d: float        # P(user likes author's posts)
    comment_rate_30d: float     # P(user comments)
    view_rate_30d: float        # P(user actually reads)
    mutual_friends_count: int
    followed_days_ago: int      # how long user has followed this author

@dataclass
class UserContextFeatures:
    preferred_content_types: dict  # {"video": 0.6, "photo": 0.3, "text": 0.1}
    time_of_day: float             # 0.0 to 24.0
    days_since_last_active: int

class FeedRankingModel:
    # Simplified linear model - production uses gradient boosted trees + DNN
    WEIGHTS = {
        "affinity": 2.5,
        "engagement_velocity": 1.8,
        "content_type_pref": 1.5,
        "recency": 1.2,
        "engagement_count_log": 0.8,
        "mutual_friend_boost": 0.6,
    }

    def score(self, post: PostFeatures, affinity: AffinityFeatures, context: UserContextFeatures) -> float:
        # Affinity score: combined interaction probability
        affinity_score = (
            affinity.like_rate_30d * 0.4 +
            affinity.comment_rate_30d * 0.4 +
            affinity.view_rate_30d * 0.2
        )

        # Recency decay: half-life of 6 hours for most content
        recency_score = np.exp(-post.age_seconds / (6 * 3600))

        # Content type preference match
        content_pref = context.preferred_content_types.get(post.content_type, 0.1)

        # Engagement signals (log-normalized)
        engagement_score = np.log1p(post.engagement_count) / 10.0
        velocity_score = min(post.engagement_velocity / 1000.0, 1.0)

        # Mutual friends interacting with this post (social proof)
        mutual_boost = min(affinity.mutual_friends_count / 10.0, 1.0)

        score = (
            self.WEIGHTS["affinity"] * affinity_score +
            self.WEIGHTS["engagement_velocity"] * velocity_score +
            self.WEIGHTS["content_type_pref"] * content_pref +
            self.WEIGHTS["recency"] * recency_score +
            self.WEIGHTS["engagement_count_log"] * engagement_score +
            self.WEIGHTS["mutual_friend_boost"] * mutual_boost
        )
        return float(score)

    def rank(self, user_id: int, posts: list, affinity_store, user_context) -> list:
        scored = []
        for post in posts:
            affinity = affinity_store.get(user_id, post.author_id)
            score = self.score(post, affinity, user_context)
            scored.append((score, post))
        scored.sort(reverse=True, key=lambda x: x[0])
        return [post for _, post in scored]

The ranking model runs on CPU (not GPU) because it processes at most 200-500 candidate posts per request, not millions. At 200 candidates with ~50 features each, the scoring computation takes under 10ms per request. Model weights are updated daily via offline training on engagement data.

Key Insight

The ranking model does not receive raw posts at query time - it receives pre-computed feature vectors from a feature store. Affinity features (how much user A interacts with author B) are pre-computed and cached, because computing them in real time from raw engagement events would require a full table scan per user-author pair at request time.

Data Model

-- Posts table - sharded by post_id (hash sharding)
CREATE TABLE posts (
    post_id         BIGINT NOT NULL,        -- Snowflake ID, encodes timestamp
    author_id       BIGINT NOT NULL,
    content_type    ENUM('text','photo','video','link') NOT NULL,
    content_ref     TEXT NOT NULL,          -- S3 key or text content
    created_at      TIMESTAMPTZ NOT NULL,
    privacy         ENUM('public','friends','friends_of_friends') NOT NULL DEFAULT 'friends',
    like_count      INT NOT NULL DEFAULT 0,
    comment_count   INT NOT NULL DEFAULT 0,
    share_count     INT NOT NULL DEFAULT 0,
    is_deleted      BOOLEAN NOT NULL DEFAULT FALSE,
    PRIMARY KEY (post_id)
) ENGINE=InnoDB;

CREATE INDEX idx_posts_author_created ON posts(author_id, created_at DESC);

-- Affinity table - precomputed per user-author pair
-- Updated asynchronously by engagement processing pipeline
CREATE TABLE user_author_affinity (
    user_id         BIGINT NOT NULL,
    author_id       BIGINT NOT NULL,
    like_rate_30d   FLOAT NOT NULL DEFAULT 0.0,
    comment_rate_30d FLOAT NOT NULL DEFAULT 0.0,
    view_rate_30d   FLOAT NOT NULL DEFAULT 0.0,
    last_interaction TIMESTAMPTZ,
    computed_at     TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (user_id, author_id)
) ENGINE=InnoDB;

CREATE INDEX idx_affinity_user ON user_author_affinity(user_id, last_interaction DESC);

-- Celebrity threshold registry (cached in application layer)
CREATE TABLE author_metadata (
    author_id       BIGINT PRIMARY KEY,
    follower_count  BIGINT NOT NULL DEFAULT 0,
    is_celebrity    BOOLEAN GENERATED ALWAYS AS (follower_count > 100000) STORED,
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Feed generation audit log (for debugging cold cache regeneration)
CREATE TABLE feed_generation_log (
    user_id         BIGINT NOT NULL,
    generated_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    trigger         TEXT NOT NULL,      -- 'cold_cache', 'forced_refresh'
    candidates_fetched INT NOT NULL,
    duration_ms     INT NOT NULL,
    PRIMARY KEY (user_id, generated_at)
) PARTITION BY RANGE (generated_at);

The posts table is sharded by post_id using hash-based consistent sharding. post_id is a Snowflake-style ID that encodes the creation timestamp, so range queries by author and time hit a predictable shard.

The user_author_affinity table is the most read-heavy table in the system - every feed read for every user requires affinity lookups. This table is replicated to read replicas and also cached in the feature store (a Redis cluster keyed by user_id:author_id) with a 15-minute TTL.

Key Algorithms and Protocols

Hybrid Fanout Decision

The decision of whether to push or pull for a given author is a runtime check based on cached follower count:

def should_push_fanout(author_id: int, follower_count_cache: dict) -> bool:
    count = follower_count_cache.get(author_id)
    if count is None:
        # Cache miss: fetch from DB and update cache
        count = db.query("SELECT follower_count FROM author_metadata WHERE author_id = %s", author_id)
        follower_count_cache[author_id] = count  # TTL 5 minutes

    return count <= CELEBRITY_THRESHOLD

# At post creation time:
event = PostEvent(post_id=new_id, author_id=user_id, timestamp=now())
if should_push_fanout(event.author_id, follower_count_cache):
    fanout_queue.publish(event)   # async fan-out workers pick this up
else:
    celebrity_timeline.write(event)  # direct write, no fan-out

Time complexity: Fan-out for a user with F followers = O(F) Redis writes. At 500 followers and 6,000 posts/second, this is 3 million writes/second. Each write takes ~50 bytes in Redis. Storage for the fan-out tier: 3B users * 500 posts * (8 bytes for post_id + 8 bytes for score) = ~24TB of Redis data.

Feed Pagination with Stable Cursors

Cursor-based pagination must be stable - if new posts arrive while the user is scrolling, they should not disrupt the current scroll position:

@dataclass
class FeedCursor:
    # Encode the timestamp of the last-seen post as the cursor
    # This is stable: new posts have newer timestamps and appear before this cursor
    last_seen_timestamp: float  # unix timestamp as float

    def encode(self) -> str:
        import base64, json
        return base64.b64encode(json.dumps({"ts": self.last_seen_timestamp}).encode()).decode()

    @classmethod
    def decode(cls, cursor_str: str) -> "FeedCursor":
        import base64, json
        data = json.loads(base64.b64decode(cursor_str))
        return cls(last_seen_timestamp=data["ts"])

# In the feed read service:
def get_next_page(user_id: int, cursor_str: Optional[str]) -> dict:
    if cursor_str:
        cursor = FeedCursor.decode(cursor_str)
        # Fetch posts with timestamp strictly before the cursor
        max_score = -cursor.last_seen_timestamp  # negated because we store as negative
    else:
        max_score = float("inf")  # first page: no upper bound

    post_ids = redis.zrangebyscore(f"feed:{user_id}", "-inf", max_score, start=0, num=200)
    # ... rank and return top 20

Key Insight

Using a timestamp-based cursor (rather than an offset-based one like page=2) makes pagination stable under concurrent writes. A timestamp cursor says “give me posts older than X” - new posts arriving at the top do not push existing posts to different offset positions, so the next page always shows the correct continuation point.

Write Amplification Math

Understanding the write amplification factor helps size the fan-out tier:

Write Amplification Analysis:
  - Average follower count: 350 (median ~150, mean skewed by celebrities)
  - Posts per second: ~6,000
  - Fan-out writes per second = 6,000 * 350 = 2.1M Redis writes/second (mean)
  - Peak (assume 3x): 6.3M Redis writes/second
  - Each Redis write: ~70 bytes (key + member + score)
  - Network bandwidth for fan-out: 6.3M * 70 bytes = 441 MB/second

  For a celebrity with 200M followers:
  - Old naive cost: 200M * 70 bytes = 14GB per single post event
  - Avoided by pull model: 0 fan-out writes
  - Cost paid at read time: ~20 timeline fetches per user who follows this celebrity
    * Only paid when the user actually opens their feed
    * 1 billion daily active users / 1B requests/day = 1 request/user/day on average
    * 200M followers * 20 keys fetched = 4B key lookups/day = 46K reads/second
    * Much cheaper than 14GB per post for every new celebrity post

Scaling and Performance

Capacity Estimation:
  - Monthly active users: 3B; Daily active: 1B
  - Feed reads: 5 per DAU/day = 5B feed reads/day = ~58K reads/second
  - Post writes: 500M posts/day = ~6K writes/second
  - Fan-out: 6K posts/sec * 350 avg followers = 2.1M Redis writes/second
  - Feed cache size: 3B users * 500 posts * 16 bytes = 24TB Redis
    (only ~500M active users cached; others regenerated on demand = 4TB active)
  - Post store: 500M posts/day * 365 days * 1KB avg = 182TB/year
  - Ranking service: 58K reads * 200 candidates * 50 features = 580B features/second
    Each computation ~0.001ms => 580B * 0.001ms = 580 CPU-seconds/second = ~600 cores

The fan-out tier scales horizontally by adding Kafka partitions and workers. The Redis feed cache cluster scales by adding shards (consistent hashing). The ranking service is stateless and scales with a simple horizontal pod autoscaler.

The dominant bottleneck at scale is the fan-out Redis write rate. If all 1 billion daily active users posted on the same day, and each had 350 average followers, that would be 350 billion Redis writes in a day - 4 million/second. The key mitigations: celebrity threshold keeps the highest-amplification users out of fan-out, and asynchronous queue-based fan-out provides backpressure.

Real World

Facebook’s original 2009 News Feed used a simple push fan-out for all users. The celebrity problem became apparent as influencers with millions of followers grew on the platform. The hybrid model (push for regular, pull for celebrity) was an engineering solution to a business problem - enabling celebrities to post on the platform without causing infrastructure incidents. Twitter (now X) uses a similar hybrid model described in their 2013 engineering blog.

Failure Modes and Recovery

Failure	Detection	Impact	Recovery
Fan-out worker crash	Kafka consumer lag metric spike	Fan-out backlog grows; followers’ feeds not updated	Kubernetes restarts worker; Kafka retains events for retry
Redis shard failure	Cache miss rate spike on affected key range	Feed reads for those users fall back to cold regeneration	Promote Redis replica to primary; regenerate feeds for affected users
Ranking service overload	p99 latency spike; dropped requests	Feed served unranked (fallback to timestamp order)	Auto-scale ranking pods; load shed low-priority requests
Post store shard failure	Write errors on post creation	Posts fail to save; fan-out not triggered	Route writes to replica; retry failed posts from client
Celebrity threshold misconfigured	Fan-out queue depth growing unboundedly	Eventual OOM on fan-out workers	Kill switch: pause fan-out for accounts above threshold X; deploy fix
Cold cache storm	All users returning after site-down	Feed regeneration jobs saturate post store reads	Rate-limit regeneration per user; serve empty feed with refresh CTA

Watch Out

The cold cache storm is the most operationally dangerous scenario. If the site goes down and 500M users log back in within 5 minutes, and every user’s feed cache has expired (or was flushed), the regeneration load on the post store is catastrophic. Always implement a rate-limited regeneration queue with jitter so the storm is spread over minutes rather than seconds.

Comparison of Approaches

Approach	Write Cost	Read Cost	Consistency	Celebrity Handling	Best Fit
Push (fan-out on write)	O(followers) per post	O(1) from cache	Eventual (minutes)	Catastrophic write storm	Low-follower users
Pull (fan-out on read)	O(1) per post	O(followed) per request	Near-real-time	No write storm	Celebrity accounts
Hybrid push+pull	O(followers) for regular, O(1) for celebrity	O(1) cache + O(celebrities) pull	Eventual + near-RT	Handled by pull path	Production at scale
Lambda hybrid (computed feed)	O(1) write	O(followed * recency window)	Strong	Complex to optimize	Full control over ranking

The hybrid push+pull model is the production choice at Facebook scale because it makes the common case (regular users with a few hundred followers) fast and predictable, while gracefully handling the celebrity outlier case. The cost is complexity: two code paths, two storage patterns, and merge logic at read time.

Key Takeaways

Hybrid fanout - push-on-write for regular users and pull-on-read for celebrities is the only viable architecture when follower counts span six orders of magnitude (10 to 200M).
Feed cache stores only post IDs in a sorted set, not full post objects - this keeps the memory footprint manageable and allows ranking to re-order posts at read time using fresh signals.
Ranking runs at read time using pre-computed affinity features - running ranking at write time would require storing a separate feed cache for every possible ranking function.
Write amplification math is your sizing guide - mean followers * posts/second gives you the Redis write throughput you need to provision.
Celebrity threshold is an engineering constant that hides a business tradeoff - lowering it reduces write amplification but increases read latency for users who follow many celebrities.
Cursor-based pagination using timestamps is stable under concurrent writes - offset-based pagination breaks when new posts arrive.
Cold cache regeneration must be rate-limited - a returning user surge can overwhelm the post store if every user triggers full feed regeneration simultaneously.
Affinity features are the most computationally expensive part of ranking and must be pre-computed and cached; real-time affinity computation at query time is not viable at 58K requests/second.

The counter-intuitive lesson is that Facebook’s News Feed is primarily a data locality problem, not a computation problem. The ranking math is simple; the hard work is getting the right post IDs and their features to the right place (the reader’s feed cache) before the reader asks for them.

Frequently Asked Questions

Q: Why store post IDs in the feed cache rather than full post objects? A: Post objects are mutable - a post can be edited, liked 1000 more times, or deleted after it appears in someone’s feed. If you cached full post objects, you would need to invalidate and update every follower’s cache on every engagement event. By storing only the post ID and fetching fresh post data at read time, the feed cache becomes write-once (fan-out appends the ID, never updates it), and the post store is the single source of truth for all mutable fields.

Q: How does the ranking handle the case where a user follows 5,000 accounts? A: The feed cache is capped at 500 post IDs. For users who follow 5,000 accounts, the fan-out writes compete for those 500 slots. The ranking model’s output determines which posts survive in the cache - low-affinity posts get trimmed during the ZREMRANGEBYRANK step. At read time, the system still fetches 200 candidates from the cache, not all 500. The 500-cap is a deliberate memory bound, not a quality bound.

Q: What happens to the fan-out for a user who is offline for a month? A: Fan-out still writes to the offline user’s feed cache. The cache TTL is 7 days, so after a week, the cache expires and the 30 days of accumulated writes are lost. When the user returns, the system detects a cache miss and regenerates the feed by scanning recent posts from all followed authors. This cold start path is slower (2-3 seconds) but only triggers on true cold cache misses.

Q: How do you prevent the ranking model from creating a filter bubble? A: The feed ranking model at Facebook includes a diversity signal that penalizes showing too many posts from the same author consecutively, and a serendipity term that occasionally promotes content from authors the user rarely interacts with. The diversity controls are tunable business levers separate from the pure engagement-maximizing objective. This is a product decision, not a systems decision.

Q: Why not use Redis Streams instead of sorted sets for the feed cache? A: Redis Streams are optimized for append-only message queues with consumer groups - they lack the O(1) rank trimming and cursor-based range query that sorted sets provide natively. ZRANGEBYSCORE with a timestamp cursor and ZREMRANGEBYRANK for capping at 500 are exactly the operations we need, and sorted sets execute both in O(log N). Streams would require manual offset tracking and can’t trim to a max size without external logic.

Q: How does the system handle posts that go viral after they are already in the feed cache? A: The feed cache stores post IDs, not scores. The ranking model re-scores posts at read time using current engagement metrics. So a post that was ranked low when it was first inserted (because it had 5 likes) can appear higher in the ranking after it goes viral (10,000 likes). The feed cache is just a candidate pool - the ranking at read time reflects the current state of engagement signals.

Interview Questions

Q: Design Facebook’s News Feed for 3 billion users. Expected depth: Cover the write vs read amplification tradeoff, the push-on-write fan-out architecture, the celebrity problem and hybrid pull model, feed cache structure (Redis sorted sets with post IDs), and ranking model design. Discuss capacity estimation for Redis storage. Name the cursor-based pagination approach and explain why it is stable.

Q: A celebrity with 200 million followers posts a video. Walk me through exactly what happens. Expected depth: Post write service saves to post store, publishes event. Fan-out worker checks celebrity threshold (>100K followers). Skips fan-out, writes to celebrity timeline sorted set only. At read time: any user who follows this celebrity fetches from the celebrity’s timeline. Merge into candidate list with pre-computed feed. Ranking model scores the celebrity post using affinity signal (how often this user has engaged with this celebrity). Top 20 posts served.

Q: How would you implement the affinity score between a user and an author? Expected depth: Affinity is a decay-weighted probability of engagement. For each user-author pair, track likes, comments, views in a rolling 30-day window with per-event timestamps. Compute weighted interaction rate. Store in a feature store (Redis or similar) keyed by user_id:author_id. Background job re-computes affinity daily from raw engagement log. Cold start: new follows get a default affinity signal based on content-type preferences and mutual friends.

Q: The fan-out queue is backing up and falling behind. What do you do? Expected depth: First check if it is a celebrity causing the backlog (one author with huge follower list). If yes, enforce the celebrity threshold immediately. If it is general backlog: add fan-out worker pods, check if Redis cluster is the bottleneck (write latency), and consider enabling write batching (pipeline multiple followers per Redis round trip). As a temporary measure, increase the celebrity threshold to route more authors to pull model and reduce fan-out volume.

Q: How does feed pagination work when new posts arrive while a user is scrolling? Expected depth: Timestamp-based cursor: the cursor encodes the timestamp of the last-seen post. New posts have newer timestamps and appear before the cursor, so they show at the top but do not affect the pagination continuation. The next page request fetches posts with timestamp < cursor, which is stable regardless of new arrivals. Contrast with offset-based pagination (page=2): if a new post arrives between page 1 and page 2, the entire result set shifts by one, causing duplicates or missed posts at page boundaries.

Premium Content

Unlock the full article along with everything else in the archive — all in one place.

In-depth analysis Expert insights Full archive access

Unlock Full Article