Distributed Databases
A Distributed Database Management System (DDBMS) manages a collection of logically interrelated databases spread across multiple sites connected by a network. Each site runs its own local DBMS instance, but the system presents a unified logical database to users. Distributed databases address the need for scalability, fault tolerance, and data locality.
Parallel DBMS
Before distribution across sites, parallelism within a single cluster improves performance through concurrent operations:
| Type | Description | Example |
|---|---|---|
| Data Parallelism | Data sharded across nodes; each processes its subset independently | Large dataset split into blocks, each processed on a different node |
| Query Parallelism | Different parts of a single query execute simultaneously on different nodes | One node performs a join, another handles aggregation |
| Transaction Parallelism | Multiple transactions execute in parallel on different nodes | Financial system processing thousands of concurrent transfers |
Architectures
Shared-Nothing
Each node has its own CPU, memory, and storage. Nodes communicate only via network. Most scalable architecture.
- Examples: CockroachDB, YugabyteDB, Cassandra, Amazon Aurora
- Scaling: Add more nodes; data redistributed automatically
Shared-Disk
All nodes share access to the same storage (typically SAN/NAS). Each node has its own CPU and memory.
- Examples: Oracle RAC, Amazon Aurora (storage layer)
- Trade-off: Simpler data management but storage becomes a bottleneck
Shared-Memory
All processors share the same memory and storage. Limited to a single machine.
- Examples: Multi-core database servers, SAP HANA
- Limitation: Cannot scale beyond a single machine
Data Distribution Strategies
Fragmentation
Breaking a relation R into pieces distributed across sites. See Fragmentation for the full deep-dive with examples, mermaid diagrams, and design process. Expressed formally in Relational Algebra:
Horizontal fragmentation — Rows split by a predicate:
Vertical fragmentation — Columns split across sites (each fragment must include the Primary Key for reconstruction):
Hybrid (mixed) fragmentation — Horizontal fragments further divided vertically, or vice versa. Reconstruction requires both union and join operations.
Replication
Maintaining copies of data across multiple sites:
- Full replication — Entire database copied to every site. Maximum availability, highest write overhead.
- Partial replication — Selected tables/fragments replicated to specific sites.
- No replication — Each fragment on exactly one site. Simplest but single point of failure.
Sharding
A specific form of horizontal fragmentation where each shard is an independent database:
- Range sharding — Partition by key range (A-M on shard 1, N-Z on shard 2)
- Hash sharding — Partition by hash(key) mod N
- List sharding — Partition by explicit value lists
Transparency
A well-designed DDBMS hides distribution complexity:
- Location transparency — Users don’t know where data is physically stored
- Replication transparency — Users don’t know data is replicated
- Fragmentation transparency — Users don’t know data is fragmented
- Transaction transparency — ACID guarantees across distributed transactions
- Failure transparency — System handles node failures gracefully
Distributed Transactions
Two-Phase Commit (2PC)
Ensures atomicity across multiple sites:
- Prepare phase — Coordinator asks all participants to vote (commit or abort)
- Commit phase — If all vote commit, coordinator tells all to commit; otherwise all abort
Drawback: Blocking — if the coordinator crashes after prepare but before commit, participants are stuck.
Three-Phase Commit (3PC)
Adds a pre-commit phase to reduce blocking. Rarely used in practice due to complexity.
Consistency Models
| Model | Description |
|---|---|
| Strong consistency | All nodes see the same data at the same time (ACID) |
| Eventual consistency | All nodes converge to the same state given no new updates (BASE) |
| Causal consistency | Operations causally related are seen in order |
| Read-your-writes | A process always sees its own writes |
See CAP Theorem for the fundamental trade-offs.
Distributed Concurrency Control
Standard concurrency control becomes more complex in distributed environments:
Distributed Locking
| Approach | How It Works | Trade-off |
|---|---|---|
| Centralized locking | Single site manages all locks | Simple but single point of failure / bottleneck |
| Distributed locking | Each site manages its own locks, coordinates globally | No bottleneck but complex coordination |
| Primary copy locking | One node holds the primary copy and manages locks for that data, even if replicas exist elsewhere | Simpler than full distribution but higher latency for non-primary sites |
Distributed Deadlock
Deadlock detection in distributed systems is harder because the wait-for graph spans multiple nodes:
- Centralized detection — One designated site collects wait-for data from all nodes. Simple but single point of failure.
- Distributed detection — Each node detects local deadlocks and uses edge-chasing (probe messages sent across transaction cycles) to detect global deadlocks.
- Hierarchical detection — Nodes organized in a hierarchy; each level checks for deadlocks in its group and escalates.
- Path-pushing — Each node pushes its local wait-for graph to neighbors for incremental cycle detection.
Distributed MVCC
Multi-version concurrency control in distributed settings keeps multiple versions across nodes, allowing readers to access the most recently committed version while writers create new versions. Requires efficient version management and garbage collection across sites.
Distributed Query Processing
Queries spanning multiple sites require optimization for network cost:
- Query decomposition — Break a high-level query into sub-operations executable on local nodes
- Data localization — Reduce fragments to only those relevant to the query
- Join ordering — Minimize data transfer by joining smaller intermediate results first
- Semi-join reduction — Send only the join column values to the remote site, filter there, then transfer matching rows
- Cost-based optimization — Estimate communication costs alongside I/O and CPU costs when selecting execution plans
Fragmentation vs Partitioning vs Sharding
These terms are related but distinct:
| Term | Scope | Context |
|---|---|---|
| Fragmentation | Breaking a relation into pieces for distribution | DDBMS design theory |
| Partitioning | Dividing a table into independent parts | Can be within a single DBMS or across multiple |
| Sharding | Horizontal partitioning across separate servers | Specifically for scaling out |
Sharding is a form of horizontal fragmentation applied to scale-out architectures. Fragmentation is the broader theoretical concept used in DDBMS design. Partitioning encompasses both and can apply to centralized and distributed environments.
Homogeneous vs Heterogeneous
- Homogeneous — All sites run the same DBMS software. Simpler coordination.
- Heterogeneous — Sites run different DBMS (e.g., PostgreSQL + MySQL + MongoDB). Requires middleware for translation and coordination.