Replication Strategies: Keeping Copies of Your Data in Sync


Your database primary goes down at 2am. You have a replica. The replica has all the data up to 3 seconds ago. Those 3 seconds of writes are gone. Your on-call engineer promotes the replica to primary. The application reconnects. Most users never notice. Three customers lost their last action.

Was that acceptable? It depends entirely on what those 3 seconds contained and what your replication strategy was. Replication is not just about having a backup - it is about choosing exactly what guarantees you make about data durability, consistency, and availability when things go wrong.

What replication is

Replication means keeping copies of the same data on multiple nodes. The goals:

  • Durability - If one node fails, data is not lost
  • Availability - If one node fails, the system keeps serving requests
  • Read scaling - Distribute read traffic across multiple nodes
  • Latency - Serve reads from a node geographically close to the user

Every replication strategy makes tradeoffs between these goals.

Single-leader replication

One node is the leader (primary, master). All writes go to the leader. The leader replicates changes to followers (replicas, secondaries). Reads can go to the leader or followers.

This is the most common model. PostgreSQL, MySQL, MongoDB (replica sets), and Redis all use it.

graph TB
subgraph single["Single-Leader Replication"]
  W["Write client"] -->|"all writes"| L["Leader
Primary"]
  L -->|"replication stream"| F1["Follower 1
Replica"]
  L -->|"replication stream"| F2["Follower 2
Replica"]
  R1["Read client"] -->|"reads"| L
  R2["Read client"] -->|"reads"| F1
  R3["Read client"] -->|"reads"| F2
end

style L fill:#EEEDFE,stroke:#534AB7,color:#3C3489
style F1 fill:#E1F5EE,stroke:#0F6E56,color:#085041
style F2 fill:#E1F5EE,stroke:#0F6E56,color:#085041

Synchronous vs asynchronous replication

Synchronous replication - The leader waits for at least one follower to confirm it has received the write before acknowledging the write to the client. Guarantees no data loss on leader failure (the follower has the data). Cost: higher write latency, and if the synchronous follower is slow or down, writes block.

Asynchronous replication - The leader acknowledges the write immediately and replicates in the background. Lower write latency. Risk: if the leader fails before replication completes, those writes are lost. This is the default for most databases.

Semi-synchronous - One follower is synchronous, the rest are asynchronous. Guarantees at least one copy of every write exists somewhere. PostgreSQL’s synchronous_standby_names implements this.

Replication lag

In async replication, followers are always slightly behind the leader. This lag is usually milliseconds but can grow to seconds or minutes under heavy load or network issues. A read from a follower might return stale data.

This causes the “read your own writes” problem: you write something, immediately read it back from a follower, and see the old value. Solutions: route reads to the leader for a short window after a write, or use a replication position token to ensure the follower has caught up before serving the read.

Multi-leader replication

Multiple nodes accept writes. Each leader replicates to the others. Used when you need writes in multiple geographic regions.

Use cases:

  • Multi-datacenter deployments where you want writes to be local (low latency)
  • Offline-capable clients (each device is a leader, syncs when online)
  • Collaborative editing (each user’s device accepts writes)

The fundamental problem: two leaders can accept conflicting writes simultaneously. User A updates their name to “Alice” on the EU leader. User B (or the same user on a different device) updates it to “Alicia” on the US leader. Both writes succeed. Now you have a conflict.

Conflict resolution strategies:

  • Last-write-wins (LWW) - The write with the highest timestamp wins. Simple but loses data.
  • Application-level merge - Surface the conflict to the application and let it decide. Used by CouchDB.
  • CRDTs - Data structures that merge automatically without conflicts. Used for counters, sets, and some document types.
graph LR
subgraph multi["Multi-Leader Replication"]
  EU["EU Leader"] -->|"async replication"| US["US Leader"]
  US -->|"async replication"| EU
  WEU["EU Write client"] -->|"writes"| EU
  WUS["US Write client"] -->|"writes"| US
  CONF["Conflict!
Same row updated
on both leaders"]
end

style EU fill:#EEEDFE,stroke:#534AB7,color:#3C3489
style US fill:#EEEDFE,stroke:#534AB7,color:#3C3489
style CONF fill:#FCEBEB,stroke:#A32D2D,color:#791F1F

Leaderless replication

No single node is the leader. Clients write to multiple nodes simultaneously. Reads also go to multiple nodes. Consistency is achieved through quorums.

With N replicas, write to W nodes, read from R nodes. If W + R > N, reads and writes overlap - you are guaranteed to read at least one node that has the latest write.

Common configuration: N=3, W=2, R=2. Write to 2 of 3 nodes, read from 2 of 3 nodes. Can tolerate one node failure for both reads and writes.

Cassandra, DynamoDB, and Riak use leaderless replication. The advantage: no single point of failure, no failover needed. The tradeoff: eventual consistency by default, conflict resolution required.

Read repair and anti-entropy

When a read hits multiple nodes and gets different values (one node is stale), the client can write the latest value back to the stale node. This is read repair. It keeps replicas in sync lazily.

Anti-entropy is a background process that compares replicas using Merkle trees and syncs differences. Cassandra’s nodetool repair runs this explicitly.

Where it breaks or gets interesting

Replication lag and monotonic reads

Without careful routing, a user can read from follower A (which is 1 second behind), then read from follower B (which is 5 seconds behind), and see data going backward in time. This violates monotonic reads.

Fix: route each user’s reads to the same replica consistently (sticky reads). Or use a replication position token.

The split-brain problem in single-leader

If the leader becomes unreachable (network partition), followers might elect a new leader. If the old leader is still running (just partitioned), you now have two leaders. Both accept writes. When the partition heals, you have conflicting data.

Prevention: require a quorum for leader election (Raft, Paxos). The old leader, unable to reach a quorum, stops accepting writes. Only one leader can exist at a time.

Replication and schema changes

Adding a column to a replicated table requires careful ordering. If you add the column to the leader first, the replication stream contains the new column but followers do not have it yet - replication breaks. The safe order: add the column to followers first (as nullable), then add it to the leader. This is why online schema migrations are complex.

Cascading replication

Instead of all followers replicating directly from the leader, some followers replicate from other followers. This reduces load on the leader but increases replication lag for downstream followers. Used when you have many replicas or when replicas are geographically distributed.

Real-world systems

PostgreSQL - Single-leader with streaming replication. Supports synchronous and asynchronous modes. Logical replication allows replicating specific tables or applying transformations. Patroni adds automatic failover using etcd or ZooKeeper for leader election.

MySQL - Single-leader with binary log replication. Group Replication adds multi-primary support with conflict detection. Used by most web companies at some point.

MongoDB - Replica sets with single-leader. Automatic leader election using Raft. Supports read preferences (primary, primaryPreferred, secondary, nearest).

Cassandra - Leaderless with tunable consistency. Replication factor per keyspace. Supports multi-datacenter replication with LOCAL_QUORUM for datacenter-local consistency.

CockroachDB - Multi-leader at the range level (each range has a Raft leader). Globally consistent using hybrid logical clocks. Automatic rebalancing.

Redis - Single-leader async replication. Redis Sentinel adds automatic failover. Redis Cluster adds sharding with per-shard replication.

How to apply it in practice

Choosing a replication strategy

Single-leader async - Default for most applications. Simple, low write latency, eventual consistency for reads from replicas. Acceptable for most use cases.

Single-leader sync (semi-sync) - When you cannot afford to lose any writes. Payment systems, financial data. Accept higher write latency.

Multi-leader - When you need writes in multiple regions with low latency. Accept conflict resolution complexity.

Leaderless - When you need high write availability and can tolerate eventual consistency. Cassandra for high-throughput writes.

Replication factor selection

  • Development: 1 (no replication, simplicity)
  • Production minimum: 3 (tolerate one failure)
  • Critical data: 5 across 3 AZs (tolerate AZ failure)

Monitoring replication health

Key metrics to watch:

  • Replication lag - How far behind are replicas? Alert if lag exceeds your RPO (recovery point objective).
  • Replication slot bloat (PostgreSQL) - Unused replication slots prevent WAL cleanup. Can fill your disk.
  • Replica count - Are all expected replicas connected?

FAQ

Q: What is the difference between replication and backup?

Replication is live copying of data to another node for availability and read scaling. Changes replicate in near-real-time. If you accidentally delete a table, the deletion replicates to all replicas immediately. Backups are point-in-time snapshots stored separately. They protect against accidental deletion, corruption, and ransomware. You need both: replication for availability, backups for recovery from logical errors.

Q: How does PostgreSQL logical replication differ from physical replication?

Physical (streaming) replication copies the raw WAL bytes. The replica is a byte-for-byte copy of the primary. You cannot replicate to a different PostgreSQL version or apply transformations. Logical replication decodes the WAL into logical operations (INSERT, UPDATE, DELETE) and replicates those. You can replicate specific tables, replicate to a different PostgreSQL version, or replicate to a non-PostgreSQL system. Logical replication is used for zero-downtime major version upgrades and for feeding data to analytics systems.

Q: What happens to replication during a network partition?

In single-leader async replication: the leader continues accepting writes. Followers stop receiving updates. When the partition heals, followers catch up from where they left off. No data is lost (the leader has everything). In single-leader sync replication: if the synchronous follower is on the other side of the partition, writes block until the partition heals or you demote the follower to async. In leaderless replication: writes succeed if they reach W nodes. If some nodes are partitioned, writes succeed on the reachable nodes and sync to the partitioned nodes when they reconnect (hinted handoff in Cassandra).

Interview questions

Q1: Your PostgreSQL primary has 3 async replicas. The primary crashes. How do you decide which replica to promote and what data might you lose?

Strong answer: Check the replication lag of each replica - the one with the least lag (closest to the primary’s last LSN) has the most data. Promote that one. The data lost is the writes that were on the primary but had not yet replicated to any replica - this is your RPO (recovery point objective). With async replication, this could be seconds of data. To minimize this: use semi-synchronous replication so at least one replica always has the latest write. Use Patroni or similar for automated failover with proper fencing (STONITH - shoot the other node in the head) to prevent the old primary from coming back and causing split-brain. After promotion, the other replicas need to be re-pointed to the new primary and may need to replay some WAL.

Q2: You are building a global social network. Users in Asia should see low write latency. How do you design replication?

Strong answer: Use multi-leader replication with one leader per region (US, EU, Asia). Writes go to the local leader for low latency. Leaders replicate to each other asynchronously. The challenge is conflict resolution: if two users update the same post simultaneously from different regions, you have a conflict. For most social data (posts, comments, likes), last-write-wins is acceptable - losing a like is not catastrophic. For critical data (account settings, payment info), route all writes to a single authoritative region and accept the latency. Use CRDTs for counters (like counts, view counts) so they merge automatically. This is roughly how Facebook and Twitter handle global replication.

Q3: Explain how quorum reads and writes work in a leaderless system and what consistency guarantees they provide.

Strong answer: With N replicas, a write quorum W means the write must succeed on W nodes before returning success. A read quorum R means the read must query R nodes and return the most recent value. If W + R is greater than N, the write and read sets must overlap - at least one node in the read set has the latest write. With N=3, W=2, R=2: write to 2 nodes, read from 2 nodes. The overlap is at least 1 node. This gives you strong consistency for that operation. With W=1, R=3: fast writes, slow reads. With W=3, R=1: slow writes, fast reads. If W + R is less than or equal to N (e.g., W=1, R=1), you might read from a node that does not have the latest write - eventual consistency. The tradeoff is always between consistency (higher W+R) and availability/performance (lower W+R).