Distributed Databases

A Distributed Database Management System (DDBMS) manages a collection of logically interrelated databases spread across multiple sites connected by a network. Each site runs its own local DBMS instance, but the system presents a unified logical database to users. Distributed databases address the need for scalability, fault tolerance, and data locality.

Parallel DBMS

Before distribution across sites, parallelism within a single cluster improves performance through concurrent operations:

Type	Description	Example
Data Parallelism	Data sharded across nodes; each processes its subset independently	Large dataset split into blocks, each processed on a different node
Query Parallelism	Different parts of a single query execute simultaneously on different nodes	One node performs a join, another handles aggregation
Transaction Parallelism	Multiple transactions execute in parallel on different nodes	Financial system processing thousands of concurrent transfers

Architectures

Shared-Nothing

Each node has its own CPU, memory, and storage. Nodes communicate only via network. Most scalable architecture.

Examples: CockroachDB, YugabyteDB, Cassandra, Amazon Aurora
Scaling: Add more nodes; data redistributed automatically

Shared-Disk

All nodes share access to the same storage (typically SAN/NAS). Each node has its own CPU and memory.

Examples: Oracle RAC, Amazon Aurora (storage layer)
Trade-off: Simpler data management but storage becomes a bottleneck

Shared-Memory

All processors share the same memory and storage. Limited to a single machine.

Examples: Multi-core database servers, SAP HANA
Limitation: Cannot scale beyond a single machine

Data Distribution Strategies

Fragmentation

Breaking a relation R into pieces distributed across sites. See Fragmentation for the full deep-dive with examples, mermaid diagrams, and design process. Expressed formally in Relational Algebra:

Horizontal fragmentation — Rows split by a predicate:

$R_{i} = σ_{condition_{i}} (R) Reconstruct: R = ⋃_{i = 1}^{n} R_{i}$

Vertical fragmentation — Columns split across sites (each fragment must include the Primary Key for reconstruction):

$R_{i} = π_{attributes_{i}} (R) Reconstruct: R = R_{1} ⋈ R_{2} ⋈ \dots ⋈ R_{n}$

Hybrid (mixed) fragmentation — Horizontal fragments further divided vertically, or vice versa. Reconstruction requires both union and join operations.

Replication

Maintaining copies of data across multiple sites:

Full replication — Entire database copied to every site. Maximum availability, highest write overhead.
Partial replication — Selected tables/fragments replicated to specific sites.
No replication — Each fragment on exactly one site. Simplest but single point of failure.

Sharding

A specific form of horizontal fragmentation where each shard is an independent database:

Range sharding — Partition by key range (A-M on shard 1, N-Z on shard 2)
Hash sharding — Partition by hash(key) mod N
List sharding — Partition by explicit value lists

Transparency

A well-designed DDBMS hides distribution complexity:

Location transparency — Users don’t know where data is physically stored
Replication transparency — Users don’t know data is replicated
Fragmentation transparency — Users don’t know data is fragmented
Transaction transparency — ACID guarantees across distributed transactions
Failure transparency — System handles node failures gracefully

Distributed Transactions

Two-Phase Commit (2PC)

Ensures atomicity across multiple sites:

Prepare phase — Coordinator asks all participants to vote (commit or abort)
Commit phase — If all vote commit, coordinator tells all to commit; otherwise all abort

Drawback: Blocking — if the coordinator crashes after prepare but before commit, participants are stuck.

Three-Phase Commit (3PC)

Adds a pre-commit phase to reduce blocking. Rarely used in practice due to complexity.

Consistency Models

Model	Description
Strong consistency	All nodes see the same data at the same time (ACID)
Eventual consistency	All nodes converge to the same state given no new updates (BASE)
Causal consistency	Operations causally related are seen in order
Read-your-writes	A process always sees its own writes

See CAP Theorem for the fundamental trade-offs.

Distributed Concurrency Control

Standard concurrency control becomes more complex in distributed environments:

Distributed Locking

Approach	How It Works	Trade-off
Centralized locking	Single site manages all locks	Simple but single point of failure / bottleneck
Distributed locking	Each site manages its own locks, coordinates globally	No bottleneck but complex coordination
Primary copy locking	One node holds the primary copy and manages locks for that data, even if replicas exist elsewhere	Simpler than full distribution but higher latency for non-primary sites

Distributed Deadlock

Deadlock detection in distributed systems is harder because the wait-for graph spans multiple nodes:

Centralized detection — One designated site collects wait-for data from all nodes. Simple but single point of failure.
Distributed detection — Each node detects local deadlocks and uses edge-chasing (probe messages sent across transaction cycles) to detect global deadlocks.
Hierarchical detection — Nodes organized in a hierarchy; each level checks for deadlocks in its group and escalates.
Path-pushing — Each node pushes its local wait-for graph to neighbors for incremental cycle detection.

Distributed MVCC

Multi-version concurrency control in distributed settings keeps multiple versions across nodes, allowing readers to access the most recently committed version while writers create new versions. Requires efficient version management and garbage collection across sites.

Distributed Query Processing

Queries spanning multiple sites require optimization for network cost:

Query decomposition — Break a high-level query into sub-operations executable on local nodes
Data localization — Reduce fragments to only those relevant to the query
Join ordering — Minimize data transfer by joining smaller intermediate results first
Semi-join reduction — Send only the join column values to the remote site, filter there, then transfer matching rows
Cost-based optimization — Estimate communication costs alongside I/O and CPU costs when selecting execution plans

Fragmentation vs Partitioning vs Sharding

These terms are related but distinct:

Term	Scope	Context
Fragmentation	Breaking a relation into pieces for distribution	DDBMS design theory
Partitioning	Dividing a table into independent parts	Can be within a single DBMS or across multiple
Sharding	Horizontal partitioning across separate servers	Specifically for scaling out

Sharding is a form of horizontal fragmentation applied to scale-out architectures. Fragmentation is the broader theoretical concept used in DDBMS design. Partitioning encompasses both and can apply to centralized and distributed environments.

Homogeneous vs Heterogeneous

Homogeneous — All sites run the same DBMS software. Simpler coordination.
Heterogeneous — Sites run different DBMS (e.g., PostgreSQL + MySQL + MongoDB). Requires middleware for translation and coordination.

Quartz 5

Explorer

Distributed Databases

Distributed Databases

Parallel DBMS

Architectures

Shared-Nothing

Shared-Disk

Shared-Memory

Data Distribution Strategies

Fragmentation

Replication

Sharding

Transparency

Distributed Transactions

Two-Phase Commit (2PC)

Three-Phase Commit (3PC)

Consistency Models

Distributed Concurrency Control

Distributed Locking

Distributed Deadlock

Distributed MVCC

Distributed Query Processing

Fragmentation vs Partitioning vs Sharding

Homogeneous vs Heterogeneous

Sources

Graph View

Table of Contents

Backlinks