Relational Algebra

Relational Algebra is a procedural query language that forms the theoretical foundation for SQL and relational DBMS query processing. It operates on relations (tables) and produces new relations as output. Every SQL query has an equivalent relational algebra expression, and query optimizers internally translate SQL into relational algebra trees for optimization.

Primitive Operators

Five fundamental operators from which all other relational algebra operations can be derived:

Selection (σ)

Filters tuples (rows) that satisfy a predicate.

SQL equivalent: WHERE clause

σ_{salary > 50000}(Employee)
-- equivalent to: SELECT * FROM Employee WHERE salary > 50000

Projection (π)

Selects specific attributes (columns), eliminating duplicates.

SQL equivalent: SELECT DISTINCT column_list

π_{name, department}(Employee)

Union (∪)

Combines tuples from two union-compatible relations.

Requires union compatibility — both relations must have the same number of attributes with compatible domains.

Set Difference (−)

Returns tuples in R that are not in S.

SQL equivalent: EXCEPT

Cartesian Product (×)

Combines every tuple of R with every tuple of S.

Produces |R| × |S| tuples. Rarely used alone — typically combined with selection to form a join.

Derived Operators

Join (⋈)

The most used derived operator. Combines Cartesian Product with Selection.

Natural Join: Joins on all common attribute names.

Theta Join: Joins on an arbitrary condition.

Equi-Join: Theta join where the condition uses only equality.

See SQL Joins for the SQL implementation of these operations.

Intersection (∩)

Returns tuples common to both relations.

Division (÷)

Returns tuples in R that are associated with every tuple in S. Useful for “for all” type queries.

Rename (ρ)

Renames a relation or its attributes.

Aggregation (𝔉)

Applies aggregate functions (COUNT, SUM, AVG, MIN, MAX) with optional grouping.

Where G is the grouping attributes and F is the aggregate function. Maps to GROUP BY in SQL.

Properties

  • Closure — Every operation takes relations as input and produces a relation as output, enabling composition
  • Equivalence Rules — Multiple algebra expressions can produce the same result, enabling query optimization
    • σ conditions can be pushed down (pushed closer to leaf nodes)
    • π can be pushed down if attributes are preserved
    • Joins are commutative and associative

SQL Tables Are Bags, Not Sets

An important caveat: relational algebra operates on sets (no duplicate tuples). SQL tables are bags (multisets — duplicates allowed). This means several relational algebra theorems don’t perfectly hold in SQL:

  • Union in RA eliminates duplicates; UNION ALL in SQL preserves them
  • SELECT without DISTINCT can return duplicate rows
  • This is why DISTINCT exists — it forces set semantics

The query optimizer must account for these differences when translating SQL to internal algebra trees.

Relation to Tuple Relational Calculus

Tuple Relational Calculus (TRC) is a non-procedural alternative — it describes what to retrieve without specifying how. Relational algebra and TRC are equivalent in expressive power (Codd’s theorem). SQL draws from both: its declarative syntax resembles TRC while its execution follows relational algebra.

Sources