Distributed Databases

A Distributed Database Management System (DDBMS) manages a collection of logically interrelated databases spread across multiple sites connected by a network. Each site runs its own local DBMS instance, but the system presents a unified logical database to users. Distributed databases address the need for scalability, fault tolerance, and data locality.

Parallel DBMS

Before distribution across sites, parallelism within a single cluster improves performance through concurrent operations:

TypeDescriptionExample
Data ParallelismData sharded across nodes; each processes its subset independentlyLarge dataset split into blocks, each processed on a different node
Query ParallelismDifferent parts of a single query execute simultaneously on different nodesOne node performs a join, another handles aggregation
Transaction ParallelismMultiple transactions execute in parallel on different nodesFinancial system processing thousands of concurrent transfers

Architectures

Shared-Nothing

Each node has its own CPU, memory, and storage. Nodes communicate only via network. Most scalable architecture.

  • Examples: CockroachDB, YugabyteDB, Cassandra, Amazon Aurora
  • Scaling: Add more nodes; data redistributed automatically

Shared-Disk

All nodes share access to the same storage (typically SAN/NAS). Each node has its own CPU and memory.

  • Examples: Oracle RAC, Amazon Aurora (storage layer)
  • Trade-off: Simpler data management but storage becomes a bottleneck

Shared-Memory

All processors share the same memory and storage. Limited to a single machine.

  • Examples: Multi-core database servers, SAP HANA
  • Limitation: Cannot scale beyond a single machine

Data Distribution Strategies

Fragmentation

Breaking a relation R into pieces distributed across sites. See Fragmentation for the full deep-dive with examples, mermaid diagrams, and design process. Expressed formally in Relational Algebra:

Horizontal fragmentation — Rows split by a predicate:

Vertical fragmentation — Columns split across sites (each fragment must include the Primary Key for reconstruction):

Hybrid (mixed) fragmentation — Horizontal fragments further divided vertically, or vice versa. Reconstruction requires both union and join operations.

Replication

Maintaining copies of data across multiple sites:

  • Full replication — Entire database copied to every site. Maximum availability, highest write overhead.
  • Partial replication — Selected tables/fragments replicated to specific sites.
  • No replication — Each fragment on exactly one site. Simplest but single point of failure.

Sharding

A specific form of horizontal fragmentation where each shard is an independent database:

  • Range sharding — Partition by key range (A-M on shard 1, N-Z on shard 2)
  • Hash sharding — Partition by hash(key) mod N
  • List sharding — Partition by explicit value lists

Transparency

A well-designed DDBMS hides distribution complexity:

  • Location transparency — Users don’t know where data is physically stored
  • Replication transparency — Users don’t know data is replicated
  • Fragmentation transparency — Users don’t know data is fragmented
  • Transaction transparencyACID guarantees across distributed transactions
  • Failure transparency — System handles node failures gracefully

Distributed Transactions

Two-Phase Commit (2PC)

Ensures atomicity across multiple sites:

  1. Prepare phase — Coordinator asks all participants to vote (commit or abort)
  2. Commit phase — If all vote commit, coordinator tells all to commit; otherwise all abort

Drawback: Blocking — if the coordinator crashes after prepare but before commit, participants are stuck.

Three-Phase Commit (3PC)

Adds a pre-commit phase to reduce blocking. Rarely used in practice due to complexity.

Consistency Models

ModelDescription
Strong consistencyAll nodes see the same data at the same time (ACID)
Eventual consistencyAll nodes converge to the same state given no new updates (BASE)
Causal consistencyOperations causally related are seen in order
Read-your-writesA process always sees its own writes

See CAP Theorem for the fundamental trade-offs.

Distributed Concurrency Control

Standard concurrency control becomes more complex in distributed environments:

Distributed Locking

ApproachHow It WorksTrade-off
Centralized lockingSingle site manages all locksSimple but single point of failure / bottleneck
Distributed lockingEach site manages its own locks, coordinates globallyNo bottleneck but complex coordination
Primary copy lockingOne node holds the primary copy and manages locks for that data, even if replicas exist elsewhereSimpler than full distribution but higher latency for non-primary sites

Distributed Deadlock

Deadlock detection in distributed systems is harder because the wait-for graph spans multiple nodes:

  • Centralized detection — One designated site collects wait-for data from all nodes. Simple but single point of failure.
  • Distributed detection — Each node detects local deadlocks and uses edge-chasing (probe messages sent across transaction cycles) to detect global deadlocks.
  • Hierarchical detection — Nodes organized in a hierarchy; each level checks for deadlocks in its group and escalates.
  • Path-pushing — Each node pushes its local wait-for graph to neighbors for incremental cycle detection.

Distributed MVCC

Multi-version concurrency control in distributed settings keeps multiple versions across nodes, allowing readers to access the most recently committed version while writers create new versions. Requires efficient version management and garbage collection across sites.

Distributed Query Processing

Queries spanning multiple sites require optimization for network cost:

  • Query decomposition — Break a high-level query into sub-operations executable on local nodes
  • Data localization — Reduce fragments to only those relevant to the query
  • Join ordering — Minimize data transfer by joining smaller intermediate results first
  • Semi-join reduction — Send only the join column values to the remote site, filter there, then transfer matching rows
  • Cost-based optimization — Estimate communication costs alongside I/O and CPU costs when selecting execution plans

Fragmentation vs Partitioning vs Sharding

These terms are related but distinct:

TermScopeContext
FragmentationBreaking a relation into pieces for distributionDDBMS design theory
PartitioningDividing a table into independent partsCan be within a single DBMS or across multiple
ShardingHorizontal partitioning across separate serversSpecifically for scaling out

Sharding is a form of horizontal fragmentation applied to scale-out architectures. Fragmentation is the broader theoretical concept used in DDBMS design. Partitioning encompasses both and can apply to centralized and distributed environments.

Homogeneous vs Heterogeneous

  • Homogeneous — All sites run the same DBMS software. Simpler coordination.
  • Heterogeneous — Sites run different DBMS (e.g., PostgreSQL + MySQL + MongoDB). Requires middleware for translation and coordination.

Sources