Sharding and Partitioning: Splitting Data Across Multiple Nodes
Your PostgreSQL database has grown to 2TB. Queries that used to take 5ms now take 500ms. You have added every index you can think of. You have tuned the query planner. You have upgraded to the largest available instance. The table has 800 million rows and it is still growing at 10 million rows per day. You are out of vertical scaling options.
This is when you shard. And this is when your simple, elegant database architecture becomes a distributed systems problem.
What sharding actually is
Sharding (also called horizontal partitioning) splits a large dataset across multiple database instances, each holding a subset of the data. Each shard is a fully independent database - it has its own storage, its own connections, its own indexes.
Partitioning is the broader term. It includes:
- Horizontal partitioning (sharding) - Split rows across multiple databases. Each database has the same schema but different rows.
- Vertical partitioning - Split columns across multiple databases. User profile data in one database, user activity data in another.
- Functional partitioning - Split by feature. Orders database, users database, inventory database.
graph TB subgraph before["Before Sharding - Single Database"] DB["Users Table 800M rows All users"] end subgraph after["After Sharding - 4 Shards"] S1["Shard 1 Users A-G 200M rows"] S2["Shard 2 Users H-N 200M rows"] S3["Shard 3 Users O-T 200M rows"] S4["Shard 4 Users U-Z 200M rows"] end subgraph router["Shard Router"] R["Application or Proxy Layer Routes by shard key"] end R --> S1 R --> S2 R --> S3 R --> S4 style DB fill:#FCEBEB,stroke:#A32D2D,color:#791F1F style S1 fill:#EEEDFE,stroke:#534AB7,color:#3C3489 style S2 fill:#EEEDFE,stroke:#534AB7,color:#3C3489 style S3 fill:#EEEDFE,stroke:#534AB7,color:#3C3489 style S4 fill:#EEEDFE,stroke:#534AB7,color:#3C3489 style R fill:#E1F5EE,stroke:#0F6E56,color:#085041
Sharding strategies
Range-based sharding
Divide the key space into ranges. Users with IDs 1-1,000,000 go to shard 1, IDs 1,000,001-2,000,000 go to shard 2, and so on.
Pros: Simple to understand. Range queries are efficient - all data for a range is on one shard.
Cons: Hotspots. If new users always get the highest IDs, all writes go to the last shard. Time-series data is the classic example: all new events go to the “current” shard while old shards sit idle.
Hash-based sharding
Hash the shard key and use modulo to determine the shard: shard = hash(user_id) % num_shards.
Pros: Even distribution. No hotspots from sequential keys.
Cons: Range queries require hitting all shards. Adding shards requires remapping most keys (use consistent hashing to mitigate this).
Directory-based sharding
Maintain a lookup table that maps each key (or key range) to a shard. The application queries the directory to find the right shard.
Pros: Maximum flexibility. You can move individual keys between shards. Supports non-uniform distribution.
Cons: The directory is a single point of failure and a bottleneck. Must be cached aggressively.
Geographic sharding
Route data based on the user’s location. European users go to EU shards, US users go to US shards.
Pros: Data residency compliance (GDPR). Lower latency for users.
Cons: Cross-region queries are expensive. Users who travel create complications.
graph LR subgraph strategies["Sharding Strategy Comparison"] RNG["Range Simple, hotspot risk"] HSH["Hash Even, no range queries"] DIR["Directory Flexible, extra lookup"] GEO["Geographic Compliance, latency"] end subgraph best["Best For"] B1["Time-independent data with range queries"] B2["Write-heavy workloads needing even distribution"] B3["Non-uniform data or frequent resharding"] B4["Multi-region apps with data residency needs"] end RNG --- B1 HSH --- B2 DIR --- B3 GEO --- B4 style RNG fill:#FAEEDA,stroke:#854F0B,color:#633806 style HSH fill:#E1F5EE,stroke:#0F6E56,color:#085041 style DIR fill:#EEEDFE,stroke:#534AB7,color:#3C3489 style GEO fill:#F1EFE8,stroke:#888780,color:#444441
Choosing a shard key
The shard key is the most important decision in your sharding design. A bad shard key causes hotspots, makes queries inefficient, or makes resharding painful.
Good shard key properties:
- High cardinality (many distinct values)
- Even distribution of access patterns
- Appears in most queries (so you can route to the right shard)
- Does not change over time (changing a shard key requires moving the row)
Common shard keys:
user_id- Good for user-centric applications. All data for a user is on one shard.tenant_id- Good for multi-tenant SaaS. All data for a tenant is on one shard.created_at(time) - Risky. All new writes go to the current time shard.order_id- Good for order data if orders are accessed by ID, not by user.
Where it breaks or gets interesting
Cross-shard queries
The biggest operational pain of sharding. If you need to query data across multiple shards (find all users who signed up this week, count total orders), you must query all shards and aggregate the results in the application. This is slow, complex, and hard to paginate correctly.
Design your shard key so that the most common queries only need one shard. Accept that some queries (analytics, reporting) will be slow or need to run against a separate data warehouse.
Cross-shard transactions
ACID transactions across multiple shards require distributed transactions (two-phase commit). Two-phase commit is slow, fragile, and a single point of failure. Most sharded systems avoid cross-shard transactions entirely by designing the data model so that related data lives on the same shard.
If you need cross-shard transactions, consider: eventual consistency with compensating transactions (Saga pattern), or a database that handles distributed transactions natively (CockroachDB, Google Spanner).
Resharding
Your initial shard count was 4. Now you need 8. With hash-based sharding, you need to move half your data. This is a major operational event. Options:
- Double the shards - Split each shard into two. Each shard’s data is divided in half. Requires careful coordination.
- Consistent hashing - Minimizes data movement when adding shards.
- Logical shards - Create many more logical shards than physical shards (e.g., 1024 logical shards on 4 physical nodes). When you add physical nodes, move logical shards rather than individual rows. This is how Vitess and some Redis Cluster implementations work.
The hotspot problem in practice
Even with hash-based sharding, hotspots happen. A celebrity user with millions of followers generates far more reads than an average user. Their shard becomes a hotspot. Solutions: cache the celebrity’s data aggressively, replicate hot rows to multiple shards, or use application-level sharding for specific hot keys.
Real-world systems
MySQL with Vitess - Vitess adds a sharding layer on top of MySQL. It handles query routing, resharding, and connection pooling. Used by YouTube, Slack, and GitHub.
PostgreSQL with Citus - Citus extends PostgreSQL with sharding. Distributes tables across worker nodes. Supports distributed queries and joins.
MongoDB - Built-in sharding. Choose a shard key per collection. The mongos router handles query routing. Supports range and hash sharding.
Cassandra - Sharding is built into the core via consistent hashing. Every node owns a portion of the token ring. No separate routing layer needed.
DynamoDB - Fully managed sharding. You specify a partition key (shard key). DynamoDB handles splitting and rebalancing automatically. You never see the shards.
Instagram - Sharded PostgreSQL. User data sharded by user ID. 12,000 logical shards mapped to a smaller number of physical PostgreSQL instances. Logical shards can be moved between physical instances for rebalancing.
How to apply it in practice
Do not shard prematurely
Sharding adds enormous complexity. Before sharding:
- Optimize queries and indexes
- Add read replicas for read scaling
- Vertically scale the primary
- Consider a different database (columnar for analytics, document for flexible schema)
Shard only when you have exhausted these options and have a clear bottleneck that sharding solves.
The logical shard pattern
Create 256 or 1024 logical shards at the start, even if you only have 4 physical nodes. Map logical shards to physical nodes in a config table. When you need to scale, move logical shards to new physical nodes without changing application code. This is much easier than resharding.
Keep related data together
Design your data model so that data accessed together lives on the same shard. In a multi-tenant SaaS app, shard by tenant ID so all of a tenant’s data is on one shard. This eliminates cross-shard queries for the common case.
FAQ
Q: What is the difference between sharding and replication?
Replication copies the same data to multiple nodes for redundancy and read scaling. Every replica has all the data. Sharding splits different data across multiple nodes for write scaling and storage scaling. Each shard has a subset of the data. Most production systems use both: each shard has multiple replicas.
Q: Should I shard at the application layer or use a database that handles it?
Application-layer sharding gives you full control but requires significant engineering effort. Database-native sharding (Cassandra, DynamoDB, MongoDB) handles the complexity for you but constrains your query patterns. Middleware sharding (Vitess, Citus) is a middle ground. For most teams, a database with built-in sharding is the right choice. Application-layer sharding is justified when you need fine-grained control over data placement or when you are already heavily invested in a database that does not support native sharding.
Q: How do you handle auto-increment IDs with sharding?
Auto-increment IDs from a single database do not work across shards - each shard would generate conflicting IDs. Solutions: use UUIDs (globally unique but large and random, which hurts index locality), use a centralized ID generator (Twitter’s Snowflake, Instagram’s ID scheme), or use a composite key (shard ID + local sequence). Instagram’s approach: 64-bit IDs with timestamp + shard ID + sequence number. This gives globally unique, roughly time-ordered IDs without a central coordinator.
Interview questions
Q1: You are sharding a social network’s posts table by user ID. A user with 50 million followers posts something. What happens and how do you handle it?
Strong answer: The post is written to the shard for that user’s ID - that part is fine. The problem is fan-out: delivering the post to 50 million followers’ feeds. If you do this synchronously, the write takes forever. If you do it asynchronously via a queue, the queue gets 50 million messages for one post. The standard solution is a hybrid fan-out strategy: for regular users, push the post to followers’ feed caches at write time (write fan-out). For celebrity users (above a follower threshold), do not pre-compute the feed. Instead, at read time, fetch the celebrity’s recent posts and merge them with the pre-computed feed from regular users (read fan-out). Twitter uses this approach. The threshold is typically around 1 million followers.
Q2: Your sharded database has 8 shards. You need to add 4 more shards to handle growth. Walk through the resharding process.
Strong answer: The safest approach is to double to 16 shards rather than going to 12, because doubling means each existing shard splits exactly in half with no data movement between non-adjacent shards. Process: first, set up 8 new empty shards. Then, for each existing shard, split it: copy half the data to the new shard while the old shard continues serving traffic. Use a dual-write period where writes go to both old and new shards. Once the new shard is caught up, switch reads to the new shard. Then stop writes to the old shard. This is a rolling migration that avoids downtime. The whole process takes days to weeks depending on data volume. Tools like Vitess automate much of this.
Q3: How would you design the sharding strategy for a multi-tenant SaaS application with 10,000 tenants of wildly different sizes?
Strong answer: Shard by tenant ID, but use a directory-based approach rather than hash-based. This lets you place large tenants on dedicated shards and pack many small tenants onto shared shards. Maintain a tenant-to-shard mapping table (cached in Redis). Small tenants (say, under 10,000 rows) share shards - maybe 100 small tenants per shard. Medium tenants get their own shard. Large enterprise tenants get dedicated shards or even dedicated database instances. This is the approach used by Salesforce and many enterprise SaaS companies. The directory gives you flexibility to move tenants between shards for rebalancing without changing application code.