What is SQL
SQL (Structured Query Language) is a standardized programming language for managing relational databases.
Features
- Predefined schema (tables, indices, triggers)
- Tables with rows and columns
- ACID compliance
- Complex queries and relationships
Fundamental Data Structure
- Tables — organized in rows and columns
- Primary Keys — unique identifiers for each row
- Foreign Keys — link tables together
- Schemas — strict structure defined upfront
- Normalization — data divided to reduce redundancy
CUSTOMERS Table: ┌────────────┬──────────────┬─────────────────────┐ │ customer_id│ name │ email │ ├────────────┼──────────────┼─────────────────────┤ │ 1 │ John Smith │ [john@email.com](mailto:john@email.com) │ │ 2 │ Jane Doe │ [jane@email.com](mailto:jane@email.com) │ └────────────┴──────────────┴─────────────────────┘ ORDERS Table: ┌──────────┬────────────┬───────────┬─────────┐ │ order_id │ customer_id│ product │ amount │ ├──────────┼────────────┼───────────┼─────────┤ │ 101 │ 1 │ Laptop │ 999.99 │ │ 102 │ 1 │ Mouse │ 29.99 │ │ 103 │ 2 │ Keyboard │ 79.99 │ └──────────┴────────────┴───────────┴─────────┘
Popular Relational Databases
- MySQL
- PostgreSQL
- Oracle Database
B-Tree (Primary Structure)
B-Tree is a data structure used for database indexing.
- Balanced height — all leaf nodes at same level, O(log n) search time
- Sorted order — enables efficient range queries
- Sequential reads — leaf nodes often linked
- Multiple keys per node — fewer reads, faster operations
Read More
- PostgreSQL — 65.1. B-Tree Indexes
- B-Tree — Understanding B-Tree Indexes in PostgreSQL: A Comprehensive Guide— Part 1
Other SQL Index Structures
- Hash
- GiST/GIN
NoSQL
NoSQL (Not Only SQL) refers to database management systems that differ from traditional relational databases.
Features
- Dynamic structure
- Various data models (documents, key-value, graph, column-family)
- High volume, speed, and flexibility
Fundamental Data Structure
Document Databases
- JSON-like documents with nested structures (BSON) — Explaining BSON with Examples
- Faster single document reads — no joins, data co-located
- Flexible schema
Trade-offs
- Slower complex queries
- Data duplication — same data may exist in multiple documents
Document Storage: ┌─────────────────────────────────────────┐ │ B-Tree Index on _id │ │ │ │ [ObjectId_500] │ │ / \ │ │ [ObjectId_300] [ObjectId_700] │ │ ↓ ↓ │ │ [Document 1] [Document 2] │ └─────────────────────────────────────────┘ Document 1 (BSON format): ┌──────────────────────────────────────┐ │ { │ │ "_id": ObjectId("..."), │ │ "name": "John", │ │ "address": { │ │ "city": "NYC", │ │ "zip": "10001" │ │ }, │ │ "orders": [...] │ │ } │ └──────────────────────────────────────┘
{ "_id": ObjectId("507f1f77bcf86cd799439011"), "customer_id": 1, "name": "John Smith", "email": "[john@email.com](mailto:john@email.com)", "address": { "street": "123 Main St", "city": "New York", "zip": "10001" }, "orders": [ { "order_id": 101, "product": "Laptop", "amount": 999.99, "date": "2024-01-15" }, { "order_id": 102, "product": "Mouse", "amount": 29.99, "date": "2024-01-20" } ] }
Key-Value Stores
Redis (Most Common)
In-memory data storage designed for large-scale, high-performance workloads. Ideal for caching and quick memory access. Built for AI agent memory systems with the fastest vector database implementation.
Structure
Structure in memory: ┌─────────────────┬─────────────────────────────┐ │ Key │ Value │ ├─────────────────┼─────────────────────────────┤ │ user:1000 │ {"name":"John","age":30} │ │ session:abc123 │ "authenticated" │ │ counter:visits │ 15847 │ │ cache:page:home │ "<html>...</html>" │ └─────────────────┴─────────────────────────────┘
Skip List Structure (Redis Sorted Sets): Level 3: [Header] ────────────────────────→ [90] Level 2: [Header] ─────────→ [30] ─────────→ [90] Level 1: [Header] ──→ [10]──→ [30]──→ [50]──→ [90] ↓ ↓ ↓ ↓ score score score score 5 15 25 35
Primary structures:
- Hash table — instant lookup for simple operations (get/set)
- Skip list (for sorted sets) — like a balanced tree but simpler; enables range queries and concurrent access — Skip List | Brilliant Math & Science Wiki
- Linked list — for queues
Column Stores
Apache Cassandra
Apache Cassandra | Apache Cassandra Documentation
Apache Cassandra is an open-source NoSQL distributed database. Data is partitioned across nodes (peer-to-peer) to prevent single points of failure. Unlike relational databases where each primary key is tied to a strict schema, Cassandra uses partition keys and replication factors to determine how datasets are distributed across nodes.
Key characteristics:
- High volume
- High write throughput
- Horizontal scalability
- Fault tolerance
Storage Structure
Storage Engine | Apache Cassandra Documentation
LSM Tree Architecture: Write Path: ┌──────────────────────────────────────────┐ │ 1. MemTable (In-Memory Sorted Structure) │ │ - B-Tree or Skip List │ │ - Fast writes: O(log n) │ └───────────────┬──────────────────────────┘ │ (When full, flush to disk) ↓ ┌──────────────────────────────────────────┐ │ 2. SSTable (Sorted String Table on Disk) │ │ │ │ SSTable 1: [10][20][30][40] │ │ SSTable 2: [15][25][35][45] │ │ SSTable 3: [5][12][28][50] │ └───────────────┬──────────────────────────┘ │ (Periodic compaction) ↓ ┌──────────────────────────────────────────┐ │ 3. Compacted SSTable │ │ [5][10][12][15][20][25][28][30]... │ └──────────────────────────────────────────┘
- Append to memory, then compact and batch to disk
- Sequential writes — faster than random writes
Bloom Filters | Apache Cassandra Documentation
Graph Databases
Neo4j
Neo4j uses a native graph structure: nodes point to properties and relationships, which also point to properties. Schema is optional. Features labeled nodes and relationships. Optimal for social networks, recommendation engines, and relationship-heavy queries.
Nodes and Relationships: (John:Person {name: "John Smith", age: 30}) | |--[FRIENDS_WITH {since: 2020}]-->(Jane:Person {name: "Jane Doe"}) | |--[PURCHASED]-->(Laptop:Product {price: 999.99}) | |--[LIVES_IN]-->(NYC:City {name: "New York"}) (Jane)--[WORKS_AT]-->(Company:Organization {name: "Tech Corp"})
Native Graph Storage: Node Record: ┌─────────────────────────────────────┐ │ Node ID: 123 │ │ Label: Person │ │ Property Pointer → [name: "John"] │ │ First Relationship Pointer → 456 │ └─────────────────────────────────────┘ Relationship Record: ┌─────────────────────────────────────────┐ │ Relationship ID: 456 │ │ Type: FRIENDS_WITH │ │ Start Node → 123 │ │ End Node → 789 │ │ Next Relationship (from 123) → 457 │ │ Property Pointer → [since: 2020] │ └─────────────────────────────────────────┘
Performance Comparison
Traditional (SQL with B-Tree): To find friends of friends:
- Find user in User table: O(log n)
- Find friendships in Friends table: O(log n)
- Find friends' friends: O(k log n)
Total: O(k log n)
Neo4j Native Graph:
- Get node record: O(1) with index
- Follow relationship pointer: O(1) pointer dereference
- Get next node: O(1) pointer dereference
- Repeat for each relationship: O(k)
Total: O(k) — depends only on data traversed
Search/Document Stores
Elasticsearch
Search or document stores are optimized for full-text search. Indices point to documents. Also excellent for log analysis and real-time analytics.
Traditional Index (B-Tree): Document ID → Document Content Inverted Index: Term → List of Document IDs Example Documents: Doc 1: "The quick brown fox" Doc 2: "The lazy brown dog" Doc 3: "Quick brown foxes" Inverted Index: ┌──────────────┬────────────────────────────────┐ │ Term │ Posting List (Doc IDs) │ ├──────────────┼────────────────────────────────┤ │ brown │ [1, 2, 3] │ │ dog │ [2] │ │ fox │ [1] │ │ foxes │ [3] │ │ lazy │ [2] │ │ quick │ [1, 3] (normalized to lowercase)│ │ the │ [1, 2] │ └──────────────┴────────────────────────────────┘
Quick Comparison
When to Use Each Database
| Database | Core Strength | Weakness | Choose When You Need |
|---|---|---|---|
| PostgreSQL | ACID guarantees, complex queries, joins | Vertical scaling limits | Consistency over availability, referential integrity |
| MongoDB | Schema flexibility, fast single-doc reads | Join operations expensive | Rapid prototyping, evolving schemas, nested data |
| Redis | Sub-millisecond latency, in-memory speed | Limited persistence, memory cost | Caching, session store, real-time leaderboards |
| Cassandra | Write-optimized, linear scalability | Eventually consistent, no joins | Time-series data, high write throughput, always-on |
| Neo4j | Relationship traversal, graph algorithms | Not optimized for aggregations | Social networks, fraud detection, recommendations |
| Elasticsearch | Full-text search, near real-time indexing | Not a primary data store | Log analysis, product search, autocomplete |
Performance Rankings by Operation
Write Operations
Fastest → Slowest
1. Redis In-memory hash, O(1) writes
2. Cassandra Append-only LSM, sequential writes
3. MongoDB Document updates, B-Tree overhead
4. Elasticsearch Bulk indexing optimized
5. Neo4j Relationship updates require pointer management
6. PostgreSQL ACID overhead, WAL, index maintenance
Point Reads (single record lookup)
Fastest → Slowest
1. Redis O(1) hash lookup, in-memory
2. PostgreSQL O(log n) B-Tree index, well-optimized
3. MongoDB O(log n) B-Tree, single document fetch
4. Neo4j O(1) with index, pointer following
5. Cassandra Read amplification from multiple SSTables
6. Elasticsearch Optimized for search, not point lookups
Range Queries
Fastest → Slowest
1. PostgreSQL B-Tree optimized for ranges
2. MongoDB B-Tree with cursor support
3. Cassandra Efficient within partition
4. Elasticsearch Good with filters
5. Redis Skip list for sorted sets only
6. Neo4j Not designed for range queries
Text Search
Fastest → Slowest
1. Elasticsearch Native inverted index, scoring, analyzers
2. PostgreSQL GIN/GiST indexes, full-text support
3. MongoDB Text indexes available but slower
4. Others Require external search layer
Relationship Traversal
Fastest → Slowest
1. Neo4j Native graph, O(1) pointer hops
2. PostgreSQL Requires joins, O(k log n)
3. MongoDB Lookup/unwind, multiple queries
4. Others Not designed for this workload
Horizontal Scalability
Easiest → Hardest
1. Cassandra Peer-to-peer, no single point of failure
2. MongoDB Sharding + replica sets built-in
3. Elasticsearch Distributed by design
4. Neo4j Causal clustering available
5. Redis Requires Redis Cluster or Sentinel
6. PostgreSQL Requires external tools (Citus, partitioning)