Database Concurrency
Concurrency Control in Database Management Systems (DBMS) is very important to facilitate multi user interactions and enable multiple processes access and modify the database simultaneously without causing data inconsistencies, integrity violations and other issues.
Concurrency Control in Databases
- Simultaneous Access - In a multi user database, multiple users or processes can attempt to access and modify the database simultaneously, without proper control this can result in data corruption and inconsistency issues.
- Concurrency Problems: Several issues can arise when multiple transactions (sequences of database operations) run concurrently
- Lost Updates: When two transactions try to update the data, one transaction can overwrite the other, resulting in data loss.
- Dirty Reads: One transaction can read the uncommitted changes by a different transaction, and can result in inconsistent data if the transaction was rolled back.
- Uncommitted Data: A transaction can read the data that is being modified by a different transaction but has not committed yet, resulting in incorrect data retrieval
- Inconsistent Retrieval: Inconsistent data can be read if one transaction reads some data before another transaction updates it, leading to a view of the database that doesn’t reflect any single consistent state.
- Concurrency Control Techniques
- Locking Transactions lock the data they want to access or modify. Locks prevent other transactions from accessing the same data until the lock is released. This ensures that only one transaction can modify a piece of data at a time.
- Isolation Levels DBMS offers different isolation levels (e.g., Read Uncommitted, Read Committed, Repeatable Read, Serializable) that define the degree of isolation between concurrent transactions. Each level balances data consistency with performance and concurrency.
- Two-Phase Locking (2PL)
- Timestamp Ordering : Each Transactions is assigned a unique timestamp to preserve the the order of transactions, the older transactions get the higher priority to avoid conflicts with the newer transactions
- Conflict Resolution When conflicts between transactions occur, the Database Management Systems (DBMS) employs conflict resolution techniques such as waiting, aborting or rolling back a transaction to ensure data integrity.
- Transaction Isolation The choice of isolation level determines how transactions interact with each other, the higher levels of isolation is provide stronger guarantees but can affect system performance due to increase in locking and blocking.
- Commit and Rollback Transactions in the Database Management Systems (DBMS) can be committed (making their changes permanent) or rolled back (undoing the changes). Concurrency control ensures that transactions can be safely committed without conflicting with other transactions.
Conflict Types
Every concurrency anomaly stems from one of three fundamental conflict patterns between operations from different transactions on the same data item:
| Conflict | Pattern | Anomaly Caused |
|---|---|---|
| Write-Write (WW) | T₁ writes X, T₂ writes X | Lost Update — one write overwrites the other |
| Write-Read (WR) | T₁ writes X, T₂ reads X | Dirty Read — T₂ reads uncommitted data from T₁ |
| Read-Write (RW) | T₁ reads X, T₂ writes X | Non-Repeatable Read — T₁ re-reads and gets a different value |
Phantom reads are a special case: T₁ executes a range query, T₂ inserts a new row matching the range, T₁ re-executes and finds new rows. Prevented only at the Serializable isolation level.
Understanding these conflicts is the foundation for Serializability theory — see the precedence graph test for determining if an interleaved schedule is safe.
Concurrency Control Deep Dives
For detailed treatment of each technique, see:
- Two-Phase Locking (2PL) — The classic lock-based protocol with growing/shrinking phases
- MVCC — Version-based concurrency used by PostgreSQL, MySQL InnoDB, and SQLite WAL mode
- SQL Isolation Levels — The four standard levels (Read Uncommitted through Serializable) with their anomaly trade-offs
Deadlocks
A Deadlock occurs when two or more transactions each hold a lock that the other needs. The DBMS detects cycles in the wait-for graph and aborts one transaction (the victim) to break the deadlock. Prevention strategies include:
- Wait-Die — Older transactions wait; younger ones abort and retry
- Wound-Wait — Older transactions force younger ones to abort; younger ones wait
- Timeouts — Abort transactions that wait too long
In short, concurrency control in DBMS is essential for managing simultaneous access to a database to maintain data consistency and integrity. It involves techniques like locking, isolation levels, and conflict resolution to prevent issues like lost updates and dirty reads when multiple transactions operate concurrently. The choice of concurrency control method should consider the specific requirements of the application and balance data consistency with system performance.