Database Sharding and Partitioning

Partitioning divides a large table into smaller, more manageable pieces within a single DBMS instance. Sharding extends this concept across multiple database servers, where each shard is an independent database holding a subset of the data. Both techniques improve scalability, query performance, and manageability for large datasets.

Partitioning (Single Instance)

Horizontal Partitioning

Splits rows across partitions based on a partition key.

-- PostgreSQL declarative partitioning
CREATE TABLE orders (
    id SERIAL,
    order_date DATE,
    customer_id INT,
    amount DECIMAL
) PARTITION BY RANGE (order_date);
 
CREATE TABLE orders_2025 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');
 
CREATE TABLE orders_2026 PARTITION OF orders
    FOR VALUES FROM ('2026-01-01') TO ('2027-01-01');

Partition strategies:

Strategy	How it works	Best for
Range	Key falls within a range (dates, IDs)	Time-series, archival
Hash	hash(key) mod N	Even distribution
List	Key matches explicit values	Categorical data (region, status)
Composite	Combination of the above	Complex access patterns

Vertical Partitioning

Splits columns across tables. Frequently-accessed columns in one partition, rarely-accessed or large columns (BLOBs) in another. Both share the same Primary Key.

-- Hot data (frequently queried)
CREATE TABLE user_profile (id, name, email, status);
 
-- Cold data (rarely queried)
CREATE TABLE user_details (id, bio, avatar_blob, preferences_json);

Benefits of Partitioning

Partition pruning — The optimizer scans only relevant partitions
Parallel scans — Different partitions can be scanned concurrently
Maintenance — Drop or archive entire partitions instead of DELETE operations
Index size — Smaller per-partition indexes fit in memory

Sharding (Multi-Instance)

Sharding distributes partitions across separate database servers. Each shard is a standalone DBMS handling its own subset of data.

Shard Key Selection

The shard key determines which shard holds each row. Critical design decision:

High cardinality — Shard key should have many distinct values for even distribution
Query alignment — Most queries should target a single shard (avoid cross-shard joins)
Write distribution — Avoid hotspots where one shard receives disproportionate writes

Sharding Strategies

Range-based — Shard by key ranges (user IDs 1-1M on shard 1, 1M-2M on shard 2). Risk: uneven distribution if ranges aren’t balanced.
Hash-based — hash(shard_key) mod N. Even distribution but range queries hit all shards.
Directory-based — A lookup table maps keys to shards. Most flexible but the directory is a single point of failure.
Geographic — Shard by user region for data locality compliance (GDPR) and lower latency.

Challenges

Cross-shard queries — Joins across shards require scatter-gather, dramatically slower
Cross-shard transactions — Require Two-Phase Commit (2PC) or application-level coordination
Rebalancing — Adding/removing shards requires data migration (consistent hashing reduces this)
Schema changes — Must be coordinated across all shards
No global indexes — Each shard maintains its own indexes; global uniqueness requires coordination

Consistent Hashing

Maps both shard nodes and keys onto a hash ring. When a node is added or removed, only 1/N of keys need to be remapped (vs. nearly all keys with simple modular hashing). Used by Cassandra, DynamoDB, and Redis Cluster.

When to Use What

Scenario	Approach
Table too large for efficient queries	Partitioning
Single server at capacity	Sharding
Time-series data with retention policies	Range partitioning
Multi-region deployment	Geographic sharding
Need cross-row transactions	Partitioning (avoid sharding)

Sources

Sharding — Deep-dive on horizontal, vertical, and geographic sharding with strategies and diagrams
Fragmentation — Theoretical foundation: horizontal, vertical, hybrid fragmentation with relational algebra

Quartz 5

Explorer

Database Sharding and Partitioning

Database Sharding and Partitioning

Partitioning (Single Instance)

Horizontal Partitioning

Vertical Partitioning

Benefits of Partitioning

Sharding (Multi-Instance)

Shard Key Selection

Sharding Strategies

Challenges

Consistent Hashing

When to Use What

Sources

Graph View

Table of Contents

Backlinks

Quartz 5

Explorer

Database Sharding and Partitioning

Database Sharding and Partitioning

Partitioning (Single Instance)

Horizontal Partitioning

Vertical Partitioning

Benefits of Partitioning

Sharding (Multi-Instance)

Shard Key Selection

Sharding Strategies

Challenges

Consistent Hashing

When to Use What

Sources

Related

Graph View

Table of Contents

Backlinks