Sharding

Sharding is a horizontal scaling strategy that distributes data across multiple independent database servers (shards). Each shard holds a subset of the total data and operates as an autonomous database. Unlike partitioning (which splits data within a single instance), sharding splits data across machines, enabling virtually unlimited growth in storage and throughput.

Why Shard

  • A single server has hit its CPU, memory, or disk ceiling (vertical scaling limit reached)
  • Write throughput exceeds what one machine can handle
  • Data must be geographically distributed for latency or compliance (GDPR, data residency)
  • Read/write ratio demands more parallelism than replicas alone can provide

Horizontal Sharding

The most common form. Rows are distributed across shards based on a shard key. Each shard has the same schema but holds different rows.

graph TD
    App[Application] --> Router[Shard Router / Proxy]
    Router --> S1[(Shard 1<br/>users A-F)]
    Router --> S2[(Shard 2<br/>users G-M)]
    Router --> S3[(Shard 3<br/>users N-S)]
    Router --> S4[(Shard 4<br/>users T-Z)]

    style S1 fill:#e1f5fe
    style S2 fill:#e8f5e9
    style S3 fill:#fff3e0
    style S4 fill:#fce4ec

Shard Key Selection

The shard key is the single most critical design decision. It determines which shard holds each row.

RequirementWhy It Matters
High cardinalityMany distinct values → even distribution across shards
Query alignmentMost queries should target a single shard (avoid scatter-gather)
Write distributionAvoid hotspots where one shard gets disproportionate writes
StabilityShard key should rarely change (changing it means moving the row)

Good shard keys: tenant_id (multi-tenant SaaS), user_id (social apps), region (geo-distributed)

Bad shard keys: created_at (all new writes hit the latest shard), status (low cardinality), auto-increment id (sequential writes create hotspots)

Horizontal Sharding Strategies

Range-Based Sharding

Rows are assigned to shards by key ranges.

graph LR
    subgraph Shard_1["Shard 1"]
        R1["user_id 1 – 1,000,000"]
    end
    subgraph Shard_2["Shard 2"]
        R2["user_id 1,000,001 – 2,000,000"]
    end
    subgraph Shard_3["Shard 3"]
        R3["user_id 2,000,001 – 3,000,000"]
    end
ProsCons
Range queries stay on one shardUneven distribution if data isn’t uniform
Simple to understand and implementNew data clusters on the “latest” shard (hotspot)
Easy rebalancing by splitting rangesRequires manual range management

Hash-Based Sharding

shard_id = hash(shard_key) mod N

graph TD
    K1["user_id = 42"] -->|"hash(42) mod 4 = 2"| S3["Shard 2"]
    K2["user_id = 99"] -->|"hash(99) mod 4 = 3"| S4["Shard 3"]
    K3["user_id = 7"] -->|"hash(7) mod 4 = 3"| S4
    K4["user_id = 200"] -->|"hash(200) mod 4 = 0"| S1["Shard 0"]
ProsCons
Even distribution regardless of key patternsRange queries must hit all shards
No hotspots from sequential keysAdding/removing shards remaps almost all keys (unless consistent hashing)
Simple routing logicHarder to reason about data locality

Directory-Based Sharding

A centralized Lookup Table maps each key (or key range) to a specific shard.

graph TD
    App[Application] --> Dir[(Shard Directory<br/>key → shard mapping)]
    Dir --> S1[(Shard 1)]
    Dir --> S2[(Shard 2)]
    Dir --> S3[(Shard 3)]
ProsCons
Maximum flexibility — any key can go anywhereDirectory is a single point of failure
Easy rebalancing (update the directory)Extra lookup on every query (latency)
Supports complex placement policiesDirectory itself must be highly available

Geographic Sharding

Shard by user region for data locality and compliance.

graph TD
    subgraph US["US Region"]
        US_DB[(US Shard<br/>us-east-1)]
    end
    subgraph EU["EU Region"]
        EU_DB[(EU Shard<br/>eu-west-1)]
    end
    subgraph APAC["APAC Region"]
        APAC_DB[(APAC Shard<br/>ap-southeast-1)]
    end
    US_User[US User] --> US_DB
    EU_User[EU User] --> EU_DB
    APAC_User[APAC User] --> APAC_DB

Ideal for GDPR compliance (EU data stays in EU) and reducing latency to end users.

Vertical Sharding

Instead of splitting rows, vertical sharding splits tables (or groups of tables) across different servers. Each shard has a different schema serving different functional areas.

graph TD
    App[Application] --> Auth[(Auth Shard<br/>users, sessions, roles)]
    App --> Orders[(Orders Shard<br/>orders, order_items, payments)]
    App --> Catalog[(Catalog Shard<br/>products, categories, reviews)]
    App --> Analytics[(Analytics Shard<br/>events, metrics, logs)]

    style Auth fill:#e1f5fe
    style Orders fill:#e8f5e9
    style Catalog fill:#fff3e0
    style Analytics fill:#fce4ec

When Vertical Sharding Makes Sense

  • Functional decomposition — Each microservice owns its own data store
  • Different access patterns — Auth needs fast reads, Analytics needs bulk writes
  • Different DBMS per shard — Auth on PostgreSQL, Catalog on MongoDB, Analytics on ClickHouse
  • Team ownership — Each team manages its own shard independently

Vertical + Horizontal Combined

Real-world systems often combine both. First split by domain (vertical), then shard large tables within each domain (horizontal):

graph TD
    App[Application Layer] --> Auth[Auth Service]
    App --> Orders[Orders Service]

    Auth --> Auth_DB[(Auth DB<br/>single instance)]

    Orders --> Router[Shard Router]
    Router --> OS1[(Orders Shard 1<br/>tenant A-M)]
    Router --> OS2[(Orders Shard 2<br/>tenant N-Z)]

Consistent Hashing

Standard modular hashing (hash(key) mod N) breaks when N changes — nearly every key gets remapped. Consistent hashing solves this by mapping both nodes and keys onto a hash ring.

graph TD
    subgraph Hash_Ring["Hash Ring (0 — 2^32)"]
        direction TB
        N1["Node A<br/>position 1000"]
        N2["Node B<br/>position 4000"]
        N3["Node C<br/>position 7000"]
    end

    K1["Key X<br/>hash = 2500"] -.->|"next node clockwise"| N2
    K2["Key Y<br/>hash = 5500"] -.->|"next node clockwise"| N3
    K3["Key Z<br/>hash = 8500"] -.->|"wraps around"| N1

When a node is added, only keys between the new node and its predecessor are remapped (~1/N of total keys). When a node is removed, its keys shift to the next node clockwise. Used by Cassandra, DynamoDB, Redis Cluster, and Amazon S3.

Virtual nodes further improve balance — each physical node owns multiple positions on the ring, smoothing out distribution.

Implementation Phases

A structured approach to deploying sharding:

graph LR
    A[1. Data Modeling] --> B[2. Shard Creation]
    B --> C[3. Data Migration]
    C --> D[4. Query Routing]
    D --> E[5. Shard Management]
    E --> F[6. Monitoring]
PhaseActivities
1. Data ModelingChoose shard key based on access patterns, cardinality, write distribution
2. Shard CreationProvision physical/virtual/container databases per shard
3. Data MigrationScripts and tools to redistribute existing data across shards
4. Query RoutingDeploy middleware/proxy layer that routes queries to the correct shard
5. Shard ManagementAdd/remove shards, rebalance data, handle shard failures
6. MonitoringTrack uneven shard sizes, high latencies, hardware failures; alert and auto-rebalance

Challenges

Cross-Shard Queries

Joins across shards require a scatter-gather pattern — query all shards, merge results at the application or proxy layer. Dramatically slower than single-shard queries.

Mitigation: Denormalize data so common queries stay within one shard. Co-locate related data on the same shard (e.g., user + user’s orders on the same shard).

Cross-Shard Transactions

ACID transactions spanning multiple shards require Two-Phase Commit (2PC) or application-level coordination (Saga pattern). Both add latency and complexity.

Rebalancing

When a shard becomes a hotspot or storage fills up, data must be migrated to new shards. Strategies:

  • Consistent hashing — Minimizes data movement
  • Dynamic splitting — Split a shard into two when it exceeds a threshold (like Cassandra’s virtual nodes)
  • Online migration — Dual-write to old and new shard, then cut over

Schema Changes

DDL must be coordinated across all shards — ALTER TABLE on 50 shards simultaneously. Tools like gh-ost (GitHub) and pt-online-schema-change (Percona) help.

Global Secondary Indexes

Each shard has its own local indexes. A query filtering on a non-shard-key column must hit every shard. Solutions:

  • Global index table — Separate table mapping secondary key → shard + primary key
  • Broadcast queries — Accept the fan-out cost for rare queries
  • Denormalize — Copy the indexed attribute into a secondary Lookup Table sharded differently

Sharding in Practice

SystemSharding Model
CassandraAutomatic consistent hashing with virtual nodes
MongoDBBuilt-in range or hash sharding with config servers
PostgreSQLManual (Citus extension for automatic), or application-level
MySQLVitess (YouTube’s sharding proxy), ProxySQL, or application-level
CockroachDBAutomatic range-based splitting with Raft consensus
Amazon DynamoDBAutomatic hash-based partitioning, fully managed
Redis ClusterHash slots (16384 slots distributed across nodes)

Decision Framework

flowchart TD
    A[Is one server at capacity?] -->|No| B[Use partitioning or read replicas]
    A -->|Yes| C{What's the bottleneck?}
    C -->|Write throughput| D[Horizontal sharding by write-heavy key]
    C -->|Storage| E[Horizontal sharding by data volume]
    C -->|Different workloads| F[Vertical sharding by domain]
    C -->|Latency / compliance| G[Geographic sharding]
    D --> H{Need cross-shard joins?}
    E --> H
    H -->|Rarely| I[Hash-based sharding]
    H -->|Often| J[Range-based or directory sharding<br/>co-locate related data]

Sources