NoSQL Databases
NoSQL (“Not Only SQL”) databases are non-relational data stores designed for specific data models with flexible schemas. They emerged to address limitations of relational databases in handling massive scale, unstructured data, and rapid development cycles. NoSQL systems typically prioritize horizontal scalability and availability over strict ACID compliance, often following a BASE consistency model.
Paradigms
Document Databases
Store data as semi-structured documents (JSON, BSON, XML). Each document is self-contained, and documents in the same collection do not have to share a schema.
- Examples: MongoDB, CouchDB, Amazon DocumentDB
- Best for: Content management, user profiles, catalogs, event logging
- Query: Rich queries on document fields, indexing on nested attributes
- Trade-off: Denormalized data leads to duplication but avoids expensive joins
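To make the "rich queries on nested attributes" point concrete, here is a minimal in-memory sketch in plain Python. It is not how MongoDB or CouchDB are implemented; the `products` collection and the helpers `get_path` and `find` are hypothetical names invented for this illustration. It only shows that documents with different shapes can live side by side and still be filtered on a nested field.

```python
# Hypothetical in-memory "collection": documents do not share a schema.
products = [
    {"_id": 1, "name": "Laptop", "specs": {"ram_gb": 16, "cpu": "x86"}, "tags": ["work"]},
    {"_id": 2, "name": "Phone", "specs": {"ram_gb": 8}, "carrier": "unlocked"},
    {"_id": 3, "name": "Desk", "dimensions": {"width_cm": 120}},
]


def get_path(doc, path):
    """Follow a dotted path such as 'specs.ram_gb'; return None if any step is missing."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def find(collection, path, minimum):
    """Return documents whose numeric field at `path` is at least `minimum`."""
    results = []
    for doc in collection:
        value = get_path(doc, path)
        if isinstance(value, (int, float)) and value >= minimum:
            results.append(doc)
    return results


matches = find(products, "specs.ram_gb", 8)
print([doc["name"] for doc in matches])
```

The desk document has no `specs` field at all, so it is skipped rather than causing an error. A real document database would answer the same query from an index on the nested field instead of scanning every document.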
Key-Value Stores
The simplest NoSQL model: a hash map from keys to opaque values. Reads and writes by known key are typically the fastest of any model. The programming interface is a small set of functions (such as get, set, and update) rather than a query language.
- Examples: Redis, Amazon DynamoDB, Riak, Memcached
- Best for: Session storage, caching, queueing, real-time leaderboards, configuration settings
- Trade-off: In the pure model, no query capability beyond key lookup (the value is opaque to the store)
- Often deployed as a cache in front of a primary database, although stores such as DynamoDB and Riak also serve as primary operational stores
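# Redis example (assumes a Redis server listening on localhost:6379)
import redis
r = redis.Redis(host='localhost', port=6379, decode_responses=True)
# The value is an opaque string to Redis; here it happens to contain JSON
r.set('user:1000', '{"name": "John", "age": 30}')
value = r.get('user:1000')  # returns the stored string, or None if the key is missing

Wide-Column (Column-Family) Stores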
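Data is stored in column families, which are containers for rows that share a related set of columns. A row key can hold data in multiple column families, and in stores such as HBase and Bigtable new columns can be added to a row at any time without a schema change (Cassandra's CQL tables declare their columns, so adding one requires ALTER TABLE).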
- Examples: Cassandra, HBase, Google Bigtable
- Best for: Time-series data, IoT, analytics at scale, write-heavy workloads
- Trade-off: Complex data modeling, limited join support
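- CQL (Cassandra Query Language): SQL-like syntax tailored to Cassandra’s distributed architecture. Consistency is tunable per request (ONE, QUORUM, ALL) and is set through the client driver or the cqlsh CONSISTENCY command rather than in the statement itself; see CAP Theorem.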
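The consistency levels named above translate into a number of replicas that must acknowledge a request. The sketch below works through that arithmetic in Python, assuming a single datacenter with replication factor N, where QUORUM means floor(N/2) + 1 replicas. The function names `replicas_required` and `quorums_overlap` are hypothetical helpers for this illustration, not part of any Cassandra driver.

```python
def replicas_required(level, replication_factor):
    """Number of replicas that must respond for a given consistency level."""
    level = level.upper()
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def quorums_overlap(read_level, write_level, replication_factor):
    """True when every read set shares at least one replica with every write set.

    This is the R + W > N condition: if it holds, a read is guaranteed to
    contact at least one replica that acknowledged the most recent write.
    """
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


for read_level, write_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    overlap = quorums_overlap(read_level, write_level, replication_factor=3)
    print(f"read={read_level:<6} write={write_level:<6} overlap={overlap}")
```

With a replication factor of 3, reading and writing at ONE does not satisfy the overlap condition, while QUORUM/QUORUM and ONE/ALL do. The price of the stronger combinations is higher latency and lower availability when replicas are down.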
-- Cassandra CQL
-- SimpleStrategy is suited to single-datacenter or test clusters
CREATE KEYSPACE example WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
CREATE TABLE example.users (
    user_id uuid PRIMARY KEY,  -- also serves as the partition key
    name text,
    email text
);
INSERT INTO example.users (user_id, name, email) VALUES (uuid(), 'Jane', 'jane@co.com');

Graph Databases
Model data as nodes (entities) and edges (relationships), with properties (key-value pairs) on both. The design is relationship-first: relationships are first-class citizens, as important as the data they connect.
- Examples: Neo4j, Amazon Neptune, ArangoDB
- Best for: Social networks, fraud detection, recommendation engines, knowledge graphs, supply chain management
- Trade-off: Not optimized for bulk analytics or aggregate queries
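The kind of traversal these use cases rely on can be sketched with a breadth-first search over an adjacency list. This is a plain-Python illustration, not how Neo4j or Neptune store or query data; the `follows` graph and the `within_hops` function are made up for the example.

```python
from collections import deque

# Hypothetical social graph: each key follows the users in its list.
follows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
    "erin": [],
    "frank": [],
}


def within_hops(graph, start, max_hops):
    """Return {node: hop distance} for nodes reachable from `start` in at most `max_hops` edges."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[start]  # the start node is not part of the result
    return distances


print(within_hops(follows, "alice", 2))
```

Each hop only follows the edges of nodes already reached, so the cost depends on the size of the neighborhood explored rather than on the total number of records. The relational equivalent needs one self-join (or one recursive step) per hop.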
NewSQL Databases
Modern relational databases providing NoSQL-like horizontal scalability while maintaining ACID guarantees and the SQL interface.
- Examples: Google Spanner, CockroachDB
- Best for: Financial services (payments, ledgers), e-commerce, real-time analytics, IoT
- Key feature: Horizontal scaling across distributed nodes without sacrificing transactional consistency
Object-Oriented Databases
Store data as objects with state (fields) and behavior (methods), supporting class hierarchies, inheritance, and encapsulation. Each object has a unique OID (Object Identifier).
- Examples: db4o, ObjectDB
- Best for: CAD, telecommunications, scientific simulations, multimedia databases
Time-Series Databases
Optimized for timestamped data with high write throughput and time-range queries.
- Examples: InfluxDB, TimescaleDB, Prometheus
- Best for: Monitoring, metrics, IoT sensor data, financial ticks
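The two access patterns named above, append-heavy writes and time-range reads, can be illustrated with a small in-memory sketch. It assumes points arrive in timestamp order and uses binary search for range lookups. The `Series` class and its methods are hypothetical and far simpler than the storage engines in InfluxDB, TimescaleDB, or Prometheus.

```python
from bisect import bisect_left


class Series:
    """Append-only series of (timestamp, value) points kept in time order."""

    def __init__(self):
        self.timestamps = []
        self.values = []

    def append(self, ts, value):
        # Assumption for this sketch: writes arrive in non-decreasing time order.
        if self.timestamps and ts < self.timestamps[-1]:
            raise ValueError("out-of-order write")
        self.timestamps.append(ts)
        self.values.append(value)

    def query_range(self, start, end):
        """Return points with start <= timestamp < end, located by binary search."""
        lo = bisect_left(self.timestamps, start)
        hi = bisect_left(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.values[lo:hi]))

    def downsample(self, start, end, bucket):
        """Average the values in each `bucket`-wide window of [start, end)."""
        buckets = {}
        for ts, value in self.query_range(start, end):
            key = start + ((ts - start) // bucket) * bucket
            buckets.setdefault(key, []).append(value)
        return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


cpu = Series()
for ts, value in [(0, 10.0), (30, 20.0), (60, 30.0), (90, 50.0), (120, 70.0)]:
    cpu.append(ts, value)

print(cpu.query_range(30, 120))
print(cpu.downsample(0, 120, bucket=60))
```

Because the data is already sorted by time, a range query is two binary searches plus a slice, and downsampling is a single pass over that slice. Real time-series databases add compression, retention policies, and on-disk partitioning by time on top of the same idea.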
When Relational Is Not Ideal
- Unstructured or polymorphic data: schemas vary across records (emails, multimedia, social media)
- Massive horizontal scale: billions of rows across hundreds of nodes; traditional relational databases scale up more easily than out, and sharding them adds operational complexity
- Rapidly evolving data models: fields can change without ALTER TABLE migrations
- Complex relationship traversal: graph queries can be far faster than recursive joins for deep, multi-hop traversals
- Geographically distributed systems: low-latency reads are needed across regions; single-primary relational deployments typically rely on read replicas that lag behind the primary
- High-performance real-time processing: gaming, high-frequency trading, and real-time analytics need in-memory speed
- Highly concurrent low-latency access: locking and transaction mechanisms in relational DBs can become a bottleneck under massive concurrency
- Cost constraints: relational DBs need significant hardware and expertise at scale; simpler NoSQL options can be lighter-weight
Decision Framework
| Factor | Relational | NoSQL |
|---|---|---|
| Data structure | Well-defined, tabular | Flexible, varied |
| ACID compliance | Native | Varies (often BASE) |
| Scalability | Primarily vertical | Horizontal by design |
| Query complexity | Complex joins, aggregations | Simple lookups, denormalized reads |
| Schema evolution | ALTER TABLE migrations | Schema-on-read |
| Transactions | Multi-table transactions | Often single-document atomic |
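The "schema-on-read" entry in the table means the store accepts records of different shapes and the application reconciles them when it reads. A minimal sketch of that idea follows, assuming two hypothetical generations of a user record. The field names and the `read_user` helper are invented for this illustration.

```python
import json

# Hypothetical stored records: an older shape with "name" and a newer
# shape with "first_name"/"last_name" plus an optional "email".
raw_records = [
    '{"id": 1, "name": "Ada Lovelace"}',
    '{"id": 2, "first_name": "Alan", "last_name": "Turing", "email": "alan@example.com"}',
]


def read_user(raw):
    """Parse a stored record and normalize it to one shape at read time."""
    doc = json.loads(raw)
    if "name" in doc:
        name = doc["name"]
    else:
        name = f'{doc.get("first_name", "")} {doc.get("last_name", "")}'.strip()
    return {"id": doc["id"], "name": name, "email": doc.get("email")}


users = [read_user(raw) for raw in raw_records]
for user in users:
    print(user)
```

No migration was run on the older record; the cost moves from write time (ALTER TABLE and backfill) to read time, where every consumer has to handle each shape that may still exist in storage.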
Hybrid Approaches
Modern systems blur the line:
- PostgreSQL supports JSONB for document-style storage alongside relational tables
- MongoDB added multi-document ACID transactions in version 4.0
- CockroachDB and YugabyteDB are distributed SQL databases offering horizontal scale with ACID guarantees