Fragmentation

Fragmentation is the process of breaking a relation into smaller pieces called fragments, which can be stored and managed across different sites in a distributed database. Each fragment contains a subset of the original relation’s data. The goal is to place data close to where it’s most frequently accessed, reducing network traffic and query latency while enabling parallel processing.

Types of Fragmentation

Horizontal Fragmentation

Divides a relation into subsets of rows based on selection predicates. Each fragment holds tuples that satisfy a specific condition. The fragments are disjoint (no row appears in more than one fragment) and complete (every row appears in exactly one fragment).

Formal definition using Relational Algebra:

$R_{i} = σ_{condition_{i}} (R)$

Reconstruction (lossless — original relation recoverable via union):

$R = R_{1} \cup R_{2} \cup \dots \cup R_{n}$

Example — A multinational company fragments its employees table by country:

-- Fragment 1: US employees (stored on US server)
CREATE TABLE employees_us AS
SELECT * FROM employees WHERE country = 'US';
 
-- Fragment 2: EU employees (stored on EU server)
CREATE TABLE employees_eu AS
SELECT * FROM employees WHERE country IN ('DE', 'FR', 'NL', 'ES');
 
-- Fragment 3: APAC employees (stored on APAC server)
CREATE TABLE employees_apac AS
SELECT * FROM employees WHERE country IN ('JP', 'IN', 'AU', 'SG');

graph TD
    R["employees (full relation)"] --> F1["employees_us<br/>σ country='US'"]
    R --> F2["employees_eu<br/>σ country∈EU"]
    R --> F3["employees_apac<br/>σ country∈APAC"]

    F1 --- S1[(US Server)]
    F2 --- S2[(EU Server)]
    F3 --- S3[(APAC Server)]

Best for: Geographically distributed operations, region-specific access patterns, compliance (GDPR — EU data stays in EU).

Vertical Fragmentation

Divides a relation into subsets of columns. Each fragment contains a projection of the original relation’s attributes. Every fragment must include the Primary Key to enable reconstruction via join.

Formal definition:

$R_{i} = π_{attributes_{i}} (R)$

Reconstruction (lossless — original relation recoverable via natural join):

$R = R_{1} ⋈ R_{2} ⋈ \dots ⋈ R_{n}$

Example — An HR system separates frequently-accessed profile data from sensitive payroll data:

-- Fragment 1: Profile data (high-frequency reads)
CREATE TABLE emp_profile AS
SELECT emp_id, name, email, department, hire_date FROM employees;
 
-- Fragment 2: Payroll data (restricted access, different server)
CREATE TABLE emp_payroll AS
SELECT emp_id, salary, tax_id, bank_account, bonus FROM employees;

graph LR
    R["employees<br/>(emp_id, name, email, dept, hire_date, salary, tax_id, bank_account, bonus)"]
    R --> F1["emp_profile<br/>π (emp_id, name, email, dept, hire_date)"]
    R --> F2["emp_payroll<br/>π (emp_id, salary, tax_id, bank_account, bonus)"]

    F1 --- S1[(App Server)]
    F2 --- S2[(Secure Payroll Server)]

Best for: Separating hot (frequently queried) from cold (rarely accessed) data, isolating sensitive columns for security, reducing I/O by scanning only needed columns.

Hybrid (Mixed) Fragmentation

Combines horizontal and vertical fragmentation. A relation can be first horizontally fragmented and then each horizontal fragment vertically fragmented, or vice versa. Reconstruction requires both union and join operations.

Example — Patient records fragmented by state (horizontal), then each state fragment split into demographics vs medical history (vertical):

graph TD
    R[patients] -->|Horizontal| H1["patients_CA<br/>σ state='CA'"]
    R -->|Horizontal| H2["patients_NY<br/>σ state='NY'"]
    H1 -->|Vertical| V1A["patients_CA_demo<br/>π (id, name, dob, address)"]
    H1 -->|Vertical| V1B["patients_CA_medical<br/>π (id, diagnosis, treatment, lab_results)"]
    H2 -->|Vertical| V2A["patients_NY_demo<br/>π (id, name, dob, address)"]
    H2 -->|Vertical| V2B["patients_NY_medical<br/>π (id, diagnosis, treatment, lab_results)"]

Correctness Rules

For any fragmentation to be valid, three rules must hold:

Rule	Requirement
Completeness	Every data item in R must appear in at least one fragment R_i
Reconstruction	The original relation R must be fully recoverable from the fragments (lossless)
Disjointness	For horizontal fragmentation, fragments should not overlap (a tuple belongs to exactly one fragment). For vertical fragmentation, overlap is allowed only on the Primary Key.

Fragmentation vs Partitioning vs Sharding

Term	Scope	Context	Managed By
Fragmentation	Breaking a relation into logical pieces	DDBMS design theory	The distributed DBMS
Partitioning	Dividing a table into independent parts	Single DBMS or distributed	The local DBMS (e.g., `PARTITION BY` in PostgreSQL)
Sharding	Horizontal partitioning across separate servers	Scale-out architectures	Application/middleware or distributed DB

Sharding is a specific form of horizontal fragmentation applied to scale-out architectures. Fragmentation is the broader theoretical concept. Partitioning encompasses both and can apply to centralized and distributed environments.

Advantages

Localized data management — Data managed close to where it’s accessed, reducing response times and bandwidth
Increased parallelism — Queries run in parallel on different fragments across different nodes
Improved security and privacy — Sensitive data isolated in secure fragments (vertical fragmentation of PII)
Reduced I/O — Queries scan only relevant fragments instead of the full table

Challenges

Complexity in management — Maintaining integrity and consistency across fragments, especially with hybrid fragmentation
Query processing overhead — Queries requiring data from multiple fragments incur additional join/union costs and network transfer
Fragment allocation — Deciding which fragment goes to which site requires analysis of access patterns, update frequencies, and network topology
Replication interaction — Fragmentation combined with replication (see Distributed Databases) multiplies the consistency coordination needed

Design Process

flowchart TD
    A[Analyze access patterns] --> B{Which attributes/rows<br/>are accessed together?}
    B -->|"Rows by region/time/category"| C[Horizontal Fragmentation]
    B -->|"Columns by function/sensitivity"| D[Vertical Fragmentation]
    B -->|"Both patterns"| E[Hybrid Fragmentation]
    C --> F[Define predicates for each fragment]
    D --> G[Group attributes into fragments<br/>include PK in each]
    E --> H[Apply horizontal then vertical<br/>or vice versa]
    F --> I[Allocate fragments to sites]
    G --> I
    H --> I
    I --> J[Verify: Completeness + Reconstruction + Disjointness]

Quartz 5

Explorer

Fragmentation

Fragmentation

Types of Fragmentation

Horizontal Fragmentation

Vertical Fragmentation

Hybrid (Mixed) Fragmentation

Correctness Rules

Fragmentation vs Partitioning vs Sharding

Advantages

Challenges

Design Process

Graph View

Table of Contents

Backlinks