Database Sharding and Partitioning
Partitioning divides a large table into smaller, more manageable pieces within a single DBMS instance. Sharding extends this concept across multiple database servers, where each shard is an independent database holding a subset of the data. Both techniques improve scalability, query performance, and manageability for large datasets.
Partitioning (Single Instance)
Horizontal Partitioning
Splits rows across partitions based on a partition key.
-- PostgreSQL declarative partitioning
CREATE TABLE orders (
id SERIAL,
order_date DATE,
customer_id INT,
amount DECIMAL
) PARTITION BY RANGE (order_date);
CREATE TABLE orders_2025 PARTITION OF orders
FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');
CREATE TABLE orders_2026 PARTITION OF orders
FOR VALUES FROM ('2026-01-01') TO ('2027-01-01');Partition strategies:
| Strategy | How it works | Best for |
|---|---|---|
| Range | Key falls within a range (dates, IDs) | Time-series, archival |
| Hash | hash(key) mod N | Even distribution |
| List | Key matches explicit values | Categorical data (region, status) |
| Composite | Combination of the above | Complex access patterns |
Vertical Partitioning
Splits columns across tables. Frequently-accessed columns in one partition, rarely-accessed or large columns (BLOBs) in another. Both share the same Primary Key.
-- Hot data (frequently queried)
CREATE TABLE user_profile (id, name, email, status);
-- Cold data (rarely queried)
CREATE TABLE user_details (id, bio, avatar_blob, preferences_json);Benefits of Partitioning
- Partition pruning — The optimizer scans only relevant partitions
- Parallel scans — Different partitions can be scanned concurrently
- Maintenance — Drop or archive entire partitions instead of DELETE operations
- Index size — Smaller per-partition indexes fit in memory
Sharding (Multi-Instance)
Sharding distributes partitions across separate database servers. Each shard is a standalone DBMS handling its own subset of data.
Shard Key Selection
The shard key determines which shard holds each row. Critical design decision:
- High cardinality — Shard key should have many distinct values for even distribution
- Query alignment — Most queries should target a single shard (avoid cross-shard joins)
- Write distribution — Avoid hotspots where one shard receives disproportionate writes
Sharding Strategies
- Range-based — Shard by key ranges (user IDs 1-1M on shard 1, 1M-2M on shard 2). Risk: uneven distribution if ranges aren’t balanced.
- Hash-based — hash(shard_key) mod N. Even distribution but range queries hit all shards.
- Directory-based — A lookup table maps keys to shards. Most flexible but the directory is a single point of failure.
- Geographic — Shard by user region for data locality compliance (GDPR) and lower latency.
Challenges
- Cross-shard queries — Joins across shards require scatter-gather, dramatically slower
- Cross-shard transactions — Require Two-Phase Commit (2PC) or application-level coordination
- Rebalancing — Adding/removing shards requires data migration (consistent hashing reduces this)
- Schema changes — Must be coordinated across all shards
- No global indexes — Each shard maintains its own indexes; global uniqueness requires coordination
Consistent Hashing
Maps both shard nodes and keys onto a hash ring. When a node is added or removed, only 1/N of keys need to be remapped (vs. nearly all keys with simple modular hashing). Used by Cassandra, DynamoDB, and Redis Cluster.
When to Use What
| Scenario | Approach |
|---|---|
| Table too large for efficient queries | Partitioning |
| Single server at capacity | Sharding |
| Time-series data with retention policies | Range partitioning |
| Multi-region deployment | Geographic sharding |
| Need cross-row transactions | Partitioning (avoid sharding) |
Sources
- Lecture-50-60.732 Partitioning and Sharding in Relational Databases
- Overview Database Architectures CS5200 38744 Database Management Sys SEC 04 Spring 2026 VTL-1-OL
Related
- Sharding — Deep-dive on horizontal, vertical, and geographic sharding with strategies and diagrams
- Fragmentation — Theoretical foundation: horizontal, vertical, hybrid fragmentation with relational algebra