Read Replicas: Scaling Database Reads Without Sharding

Your database primary is at 80% CPU. You profile the queries. 90% of the load is reads: product pages, user profiles, search results. Only 10% is writes. You are about to shard your database - a complex, months-long project - when a colleague suggests adding a read replica. Two hours later, your primary is at 20% CPU.

Read replicas are one of the highest-leverage scaling techniques for read-heavy applications. They are simpler than sharding, cheaper than vertical scaling, and solve the most common database bottleneck.

What read replicas are

A read replica is a copy of your primary database that receives a continuous stream of changes from the primary and stays in sync. Reads can be directed to replicas. Writes always go to the primary.

The primary sends its write-ahead log (WAL) or binary log to replicas. Replicas apply the changes in order, staying a few milliseconds to seconds behind the primary.

graph TB
subgraph writes["Write Path"]
  APP_W["Application
Write requests"] -->|"INSERT UPDATE DELETE"| PRIMARY["Primary
Database"]
end

subgraph reads["Read Path"]
  APP_R["Application
Read requests"] -->|"SELECT"| LB["Read Load
Balancer"]
  LB --> R1["Replica 1"]
  LB --> R2["Replica 2"]
  LB --> R3["Replica 3"]
end

PRIMARY -->|"replication stream
async"| R1
PRIMARY -->|"replication stream
async"| R2
PRIMARY -->|"replication stream
async"| R3

style PRIMARY fill:#EEEDFE,stroke:#534AB7,color:#3C3489
style R1 fill:#E1F5EE,stroke:#0F6E56,color:#085041
style R2 fill:#E1F5EE,stroke:#0F6E56,color:#085041
style R3 fill:#E1F5EE,stroke:#0F6E56,color:#085041
style LB fill:#F1EFE8,stroke:#888780,color:#444441

What read replicas solve

Read throughput - Each replica can handle the same read load as the primary. With 3 replicas, you have 4x the read capacity (primary + 3 replicas).

Read latency - Geographic replicas serve reads from a location close to the user. A replica in Singapore serves Asian users with 5ms latency instead of 200ms to US-East.

Primary isolation - Heavy analytical queries (reports, aggregations) can run on a replica without impacting the primary’s performance for transactional queries.

High availability - If the primary fails, a replica can be promoted to primary. This is the basis of most database failover strategies.

What read replicas do not solve

Write throughput - All writes still go to the primary. If writes are the bottleneck, read replicas do not help. You need sharding or a different database.

Write latency - Writes still go to one node. Read replicas do not reduce write latency.

Storage - Each replica stores a full copy of the data. 3 replicas = 4x storage cost.

Replication lag: the fundamental tradeoff

Async replication means replicas are always slightly behind the primary. A write to the primary takes some time to propagate to replicas. This lag is usually milliseconds but can grow to seconds or minutes under heavy load.

The consequence: a read from a replica might return stale data. A user writes a post and immediately reads it back - if the read goes to a replica that has not yet received the write, the post appears to be missing.

Handling replication lag in the application

Read-your-writes consistency - After a write, route subsequent reads for that user to the primary for a short window (1-5 seconds). After that, reads can go to replicas. This ensures users always see their own writes.

Replication position tokens - After a write, return the replication position (LSN in PostgreSQL, GTID in MySQL). On subsequent reads, pass this token. The replica checks if it has caught up to that position before serving the read. If not, it either waits or routes to the primary.

Sticky reads - Route all reads from a given user to the same replica. If that replica is up to date, the user sees consistent data. If the replica is behind, the user might see stale data, but at least they will not see data going backward in time (monotonic reads).

Accept eventual consistency - For many use cases, brief staleness is acceptable. A product page showing a price that is 2 seconds old is fine. A bank balance showing a transaction that is 2 seconds old is not.

graph LR
subgraph lag["Replication Lag Scenarios"]
  W["User writes post"] -->|"write to primary"| P["Primary
post exists"]
  P -->|"async replication
100ms lag"| R["Replica
post not yet here"]
  W -->|"immediately reads"| R
  R -->|"post not found!"| W
end

subgraph fix["Fix: Read-Your-Writes"]
  W2["User writes post"] -->|"write to primary"| P2["Primary"]
  W2 -->|"reads for 5s
after write"| P2
  W2 -->|"reads after 5s"| R2["Replica
now in sync"]
end

style R fill:#FCEBEB,stroke:#A32D2D,color:#791F1F
style P2 fill:#EEEDFE,stroke:#534AB7,color:#3C3489
style R2 fill:#E1F5EE,stroke:#0F6E56,color:#085041

Where it breaks or gets interesting

Replica lag under heavy write load

During a write spike (bulk import, batch job), the primary generates WAL faster than replicas can apply it. Replica lag grows. Reads from replicas become increasingly stale. If lag grows too large, you might need to route all reads to the primary temporarily, defeating the purpose of replicas.

Monitor replica lag continuously. Alert if it exceeds your acceptable staleness threshold. Consider pausing heavy batch jobs during peak traffic.

Cascading replica failures

If the primary fails and you promote a replica, the other replicas need to be re-pointed to the new primary. This requires them to find the point in the replication stream where the new primary diverged from the old primary and catch up from there. If the new primary was slightly ahead of the other replicas, they need to apply the missing changes. Tools like Patroni (PostgreSQL) and Orchestrator (MySQL) automate this.

Read replica for analytics: the right use case

Running analytical queries (aggregations, full table scans) on the primary blocks OLTP queries. A dedicated analytics replica handles these queries without impacting production. The replica can have different indexes optimized for analytical queries. This is a common pattern: one replica for the application, one replica for analytics/reporting.

Connection pooling with replicas

Each replica needs its own connection pool. Your application needs to know which pool to use for reads vs writes. Libraries like PgBouncer (PostgreSQL) and ProxySQL (MySQL) can handle this routing transparently.

Real-world systems

AWS RDS - Supports up to 15 read replicas per primary. Replicas can be in different regions (cross-region replicas). Aurora supports up to 15 replicas with sub-10ms replication lag.

Google Cloud SQL - Read replicas with automatic failover. Cross-region replicas for disaster recovery.

PostgreSQL with Patroni - Patroni manages primary election and replica configuration. Replicas automatically re-point to the new primary after failover.

MySQL with ProxySQL - ProxySQL routes read queries to replicas and write queries to the primary based on query type. Transparent to the application.

MongoDB - Replica sets with one primary and multiple secondaries. Read preference can be set to secondary to route reads to replicas. Supports nearest to route to the lowest-latency replica.

Instagram - Uses PostgreSQL with multiple read replicas. Reads are distributed across replicas. The application uses a custom connection router that tracks replication lag and routes reads to replicas that are sufficiently caught up.

How to apply it in practice

When to add read replicas

Add read replicas when:

Primary CPU is consistently above 60-70% and reads dominate the workload
You need geographic distribution for read latency
You need to isolate analytical queries from OLTP queries
You need a hot standby for failover

Do not add read replicas when:

Writes are the bottleneck (replicas do not help)
Your application cannot handle eventual consistency
The operational complexity is not worth the benefit at your scale

Routing reads to replicas

Options for routing:

Application-level routing - The application explicitly chooses primary or replica based on the operation type. Most control, most code complexity.
Connection proxy - ProxySQL, PgBouncer, or RDS Proxy routes queries based on read/write detection. Transparent to the application.
ORM-level routing - Some ORMs (Django, Rails) support read/write splitting natively.

Monitoring replica health

Key metrics:

Replication lag - How far behind is each replica? Alert if lag exceeds your SLA.
Replica count - Are all expected replicas connected and replicating?
Read distribution - Are reads evenly distributed across replicas?
Replica CPU/memory - Is any replica overloaded?

FAQ

Q: How many read replicas should you have?

Start with 1-2 replicas. Add more as read load grows. Each replica adds operational complexity (monitoring, failover configuration, connection pool management). More than 5-6 replicas is unusual for most applications. If you need more, consider whether caching (Redis) or sharding would be more appropriate.

Q: Can you write to a read replica?

No. Read replicas are read-only. Attempting to write to a replica returns an error. This is enforced at the database level. The only exception is if you promote the replica to primary, at which point it accepts writes.

Q: What is the difference between a read replica and a standby replica?

A read replica serves read traffic. A standby replica (hot standby) is kept in sync for failover purposes but may or may not serve read traffic. In PostgreSQL, hot standbys can serve reads. In MySQL, replicas can serve reads. The distinction is more about intent: a read replica is primarily for scaling reads, a standby is primarily for high availability. In practice, most standbys also serve reads to get double value from the infrastructure.

Interview questions

Q1: Your application has 10x more reads than writes. The primary database is at 90% CPU. How do you scale?

Strong answer: Add read replicas. With 10x more reads than writes, the primary is overwhelmed by reads. Adding 2-3 read replicas and routing reads to them should bring the primary CPU down to 20-30%. The implementation: add replicas (AWS RDS makes this a few clicks), add a connection proxy (ProxySQL or RDS Proxy) to route reads to replicas and writes to the primary, update the application connection string to point to the proxy. Monitor replication lag and add read-your-writes logic for operations where staleness is not acceptable (user profile updates, order confirmations). If the primary is still overloaded after adding replicas, check if there are write-heavy operations that can be optimized or moved to async processing.

Q2: A user updates their profile and immediately sees the old data. What is happening and how do you fix it?

Strong answer: The write went to the primary. The read went to a replica that has not yet received the replication update. This is the read-your-writes consistency problem. Fix: after a write, route subsequent reads for that user to the primary for a short window (5-10 seconds). Implementation options: store a “last write timestamp” in the user’s session. On reads, if the timestamp is recent, route to the primary. After the window expires, route to replicas. Alternatively, use replication position tokens: after a write, return the WAL LSN. On subsequent reads, pass the LSN. The replica checks if it has applied up to that LSN before serving the read. If not, it routes to the primary. This is more precise but requires database-level support.

Q3: You have 3 read replicas. One replica is consistently 30 seconds behind the primary. What do you do?

Strong answer: First, investigate the cause. Check the replica’s CPU, memory, and disk I/O - it might be overloaded. Check if there are long-running queries on the replica blocking replication. Check network bandwidth between primary and replica. Check if the primary is generating WAL faster than the replica can apply it (write spike). Short-term: remove the lagging replica from the read pool so users do not get stale data. Route its traffic to the other replicas. Long-term: fix the root cause. If the replica is overloaded, upgrade its instance size or reduce the queries it handles. If it is a network issue, investigate the network path. If the primary is generating too much WAL, optimize write patterns or add more replicas to distribute the replication load. Monitor lag continuously and alert before it becomes a problem.