Fragmentation
Fragmentation is the process of breaking a relation into smaller pieces called fragments, which can be stored and managed across different sites in a distributed database. Each fragment contains a subset of the original relation’s data. The goal is to place data close to where it’s most frequently accessed, reducing network traffic and query latency while enabling parallel processing.
Types of Fragmentation
Horizontal Fragmentation
Divides a relation into subsets of rows based on selection predicates. Each fragment holds tuples that satisfy a specific condition. The fragments are disjoint (no row appears in more than one fragment) and complete (every row appears in exactly one fragment).
Formal definition using Relational Algebra:
Reconstruction (lossless — original relation recoverable via union):
Example — A multinational company fragments its employees table by country:
-- Fragment 1: US employees (stored on US server)
CREATE TABLE employees_us AS
SELECT * FROM employees WHERE country = 'US';
-- Fragment 2: EU employees (stored on EU server)
CREATE TABLE employees_eu AS
SELECT * FROM employees WHERE country IN ('DE', 'FR', 'NL', 'ES');
-- Fragment 3: APAC employees (stored on APAC server)
CREATE TABLE employees_apac AS
SELECT * FROM employees WHERE country IN ('JP', 'IN', 'AU', 'SG');graph TD R["employees (full relation)"] --> F1["employees_us<br/>σ country='US'"] R --> F2["employees_eu<br/>σ country∈EU"] R --> F3["employees_apac<br/>σ country∈APAC"] F1 --- S1[(US Server)] F2 --- S2[(EU Server)] F3 --- S3[(APAC Server)]
Best for: Geographically distributed operations, region-specific access patterns, compliance (GDPR — EU data stays in EU).
Vertical Fragmentation
Divides a relation into subsets of columns. Each fragment contains a projection of the original relation’s attributes. Every fragment must include the Primary Key to enable reconstruction via join.
Formal definition:
Reconstruction (lossless — original relation recoverable via natural join):
Example — An HR system separates frequently-accessed profile data from sensitive payroll data:
-- Fragment 1: Profile data (high-frequency reads)
CREATE TABLE emp_profile AS
SELECT emp_id, name, email, department, hire_date FROM employees;
-- Fragment 2: Payroll data (restricted access, different server)
CREATE TABLE emp_payroll AS
SELECT emp_id, salary, tax_id, bank_account, bonus FROM employees;graph LR R["employees<br/>(emp_id, name, email, dept, hire_date, salary, tax_id, bank_account, bonus)"] R --> F1["emp_profile<br/>π (emp_id, name, email, dept, hire_date)"] R --> F2["emp_payroll<br/>π (emp_id, salary, tax_id, bank_account, bonus)"] F1 --- S1[(App Server)] F2 --- S2[(Secure Payroll Server)]
Best for: Separating hot (frequently queried) from cold (rarely accessed) data, isolating sensitive columns for security, reducing I/O by scanning only needed columns.
Hybrid (Mixed) Fragmentation
Combines horizontal and vertical fragmentation. A relation can be first horizontally fragmented and then each horizontal fragment vertically fragmented, or vice versa. Reconstruction requires both union and join operations.
Example — Patient records fragmented by state (horizontal), then each state fragment split into demographics vs medical history (vertical):
graph TD R[patients] -->|Horizontal| H1["patients_CA<br/>σ state='CA'"] R -->|Horizontal| H2["patients_NY<br/>σ state='NY'"] H1 -->|Vertical| V1A["patients_CA_demo<br/>π (id, name, dob, address)"] H1 -->|Vertical| V1B["patients_CA_medical<br/>π (id, diagnosis, treatment, lab_results)"] H2 -->|Vertical| V2A["patients_NY_demo<br/>π (id, name, dob, address)"] H2 -->|Vertical| V2B["patients_NY_medical<br/>π (id, diagnosis, treatment, lab_results)"]
Correctness Rules
For any fragmentation to be valid, three rules must hold:
| Rule | Requirement |
|---|---|
| Completeness | Every data item in R must appear in at least one fragment R_i |
| Reconstruction | The original relation R must be fully recoverable from the fragments (lossless) |
| Disjointness | For horizontal fragmentation, fragments should not overlap (a tuple belongs to exactly one fragment). For vertical fragmentation, overlap is allowed only on the Primary Key. |
Fragmentation vs Partitioning vs Sharding
| Term | Scope | Context | Managed By |
|---|---|---|---|
| Fragmentation | Breaking a relation into logical pieces | DDBMS design theory | The distributed DBMS |
| Partitioning | Dividing a table into independent parts | Single DBMS or distributed | The local DBMS (e.g., PARTITION BY in PostgreSQL) |
| Sharding | Horizontal partitioning across separate servers | Scale-out architectures | Application/middleware or distributed DB |
Sharding is a specific form of horizontal fragmentation applied to scale-out architectures. Fragmentation is the broader theoretical concept. Partitioning encompasses both and can apply to centralized and distributed environments.
Advantages
- Localized data management — Data managed close to where it’s accessed, reducing response times and bandwidth
- Increased parallelism — Queries run in parallel on different fragments across different nodes
- Improved security and privacy — Sensitive data isolated in secure fragments (vertical fragmentation of PII)
- Reduced I/O — Queries scan only relevant fragments instead of the full table
Challenges
- Complexity in management — Maintaining integrity and consistency across fragments, especially with hybrid fragmentation
- Query processing overhead — Queries requiring data from multiple fragments incur additional join/union costs and network transfer
- Fragment allocation — Deciding which fragment goes to which site requires analysis of access patterns, update frequencies, and network topology
- Replication interaction — Fragmentation combined with replication (see Distributed Databases) multiplies the consistency coordination needed
Design Process
flowchart TD A[Analyze access patterns] --> B{Which attributes/rows<br/>are accessed together?} B -->|"Rows by region/time/category"| C[Horizontal Fragmentation] B -->|"Columns by function/sensitivity"| D[Vertical Fragmentation] B -->|"Both patterns"| E[Hybrid Fragmentation] C --> F[Define predicates for each fragment] D --> G[Group attributes into fragments<br/>include PK in each] E --> H[Apply horizontal then vertical<br/>or vice versa] F --> I[Allocate fragments to sites] G --> I H --> I I --> J[Verify: Completeness + Reconstruction + Disjointness]