What is SQL

SQL (Structured Query Language) is a standardized programming language for managing relational databases.

Features

Predefined schema (tables, indices, triggers)
Tables with rows and columns
ACID compliance
Complex queries and relationships

Fundamental Data Structure

Tables — organized in rows and columns
Primary Keys — unique identifiers for each row
Foreign Keys — link tables together
Schemas — strict structure defined upfront
Normalization — data divided to reduce redundancy

CUSTOMERS Table:
┌────────────┬──────────────┬─────────────────────┐
│ customer_id│ name         │ email               │
├────────────┼──────────────┼─────────────────────┤
│ 1          │ John Smith   │ [john@email.com](mailto:john@email.com)      │
│ 2          │ Jane Doe     │ [jane@email.com](mailto:jane@email.com)      │
└────────────┴──────────────┴─────────────────────┘

ORDERS Table:
┌──────────┬────────────┬───────────┬─────────┐
│ order_id │ customer_id│ product   │ amount  │
├──────────┼────────────┼───────────┼─────────┤
│ 101      │ 1          │ Laptop    │ 999.99  │
│ 102      │ 1          │ Mouse     │ 29.99   │
│ 103      │ 2          │ Keyboard  │ 79.99   │
└──────────┴────────────┴───────────┴─────────┘

Popular Relational Databases

MySQL
PostgreSQL
Oracle Database

B-Tree (Primary Structure)

B-Tree is a data structure used for database indexing.

Balanced height — all leaf nodes at same level, O(log n) search time
Sorted order — enables efficient range queries
Sequential reads — leaf nodes often linked
Multiple keys per node — fewer reads, faster operations

Read More

PostgreSQL — 65.1. B-Tree Indexes
B-Tree — Understanding B-Tree Indexes in PostgreSQL: A Comprehensive Guide— Part 1

Other SQL Index Structures

Hash
GiST/GIN

NoSQL

NoSQL (Not Only SQL) refers to database management systems that differ from traditional relational databases.

Features

Dynamic structure
Various data models (documents, key-value, graph, column-family)
High volume, speed, and flexibility

Fundamental Data Structure

Document Databases

JSON-like documents with nested structures (BSON) — Explaining BSON with Examples
Faster single document reads — no joins, data co-located
Flexible schema

Trade-offs

Slower complex queries
Data duplication — same data may exist in multiple documents

Document Storage:
┌─────────────────────────────────────────┐
│ B-Tree Index on _id                     │
│                                         │
│     [ObjectId_500]                      │
│        /        \                       │
│   [ObjectId_300] [ObjectId_700]         │
│      ↓              ↓                   │
│  [Document 1]   [Document 2]            │
└─────────────────────────────────────────┘

Document 1 (BSON format):
┌──────────────────────────────────────┐
│ {                                    │
│   "_id": ObjectId("..."),            │
│   "name": "John",                    │
│   "address": {                       │
│     "city": "NYC",                   │
│     "zip": "10001"                   │
│   },                                 │
│   "orders": [...]                    │
│ }                                    │
└──────────────────────────────────────┘

{
  "_id": ObjectId("507f1f77bcf86cd799439011"),
  "customer_id": 1,
  "name": "John Smith",
  "email": "[john@email.com](mailto:john@email.com)",
  "address": {
    "street": "123 Main St",
    "city": "New York",
    "zip": "10001"
  },
  "orders": [
    {
      "order_id": 101,
      "product": "Laptop",
      "amount": 999.99,
      "date": "2024-01-15"
    },
    {
      "order_id": 102,
      "product": "Mouse",
      "amount": 29.99,
      "date": "2024-01-20"
    }
  ]
}

Key-Value Stores

Redis (Most Common)

In-memory data storage designed for large-scale, high-performance workloads. Ideal for caching and quick memory access. Built for AI agent memory systems with the fastest vector database implementation.

Structure

Structure in memory:
┌─────────────────┬─────────────────────────────┐
│ Key             │ Value                       │
├─────────────────┼─────────────────────────────┤
│ user:1000       │ {"name":"John","age":30}    │
│ session:abc123  │ "authenticated"             │
│ counter:visits  │ 15847                       │
│ cache:page:home │ "<html>...</html>"          │
└─────────────────┴─────────────────────────────┘

Skip List Structure (Redis Sorted Sets):

Level 3:  [Header] ────────────────────────→ [90]
                                                  
Level 2:  [Header] ─────────→ [30] ─────────→ [90]
                                                  
Level 1:  [Header] ──→ [10]──→ [30]──→ [50]──→ [90]
                        ↓       ↓      ↓      ↓
                      score   score  score  score
                        5       15     25     35

Primary structures:

Hash table — instant lookup for simple operations (get/set)
Skip list (for sorted sets) — like a balanced tree but simpler; enables range queries and concurrent access — Skip List | Brilliant Math & Science Wiki
Linked list — for queues

Column Stores

Apache Cassandra

Apache Cassandra | Apache Cassandra Documentation

Apache Cassandra is an open-source NoSQL distributed database. Data is partitioned across nodes (peer-to-peer) to prevent single points of failure. Unlike relational databases where each primary key is tied to a strict schema, Cassandra uses partition keys and replication factors to determine how datasets are distributed across nodes.

Key characteristics:

High volume
High write throughput
Horizontal scalability
Fault tolerance

Storage Structure

Storage Engine | Apache Cassandra Documentation

LSM Tree Architecture:

Write Path:
┌──────────────────────────────────────────┐
│ 1. MemTable (In-Memory Sorted Structure) │
│    - B-Tree or Skip List                 │
│    - Fast writes: O(log n)               │
└───────────────┬──────────────────────────┘
                │ (When full, flush to disk)
                ↓
┌──────────────────────────────────────────┐
│ 2. SSTable (Sorted String Table on Disk) │
│                                          │
│    SSTable 1: [10][20][30][40]           │
│    SSTable 2: [15][25][35][45]           │
│    SSTable 3: [5][12][28][50]            │
└───────────────┬──────────────────────────┘
                │ (Periodic compaction)
                ↓
┌──────────────────────────────────────────┐
│ 3. Compacted SSTable                     │
│    [5][10][12][15][20][25][28][30]...    │
└──────────────────────────────────────────┘

Append to memory, then compact and batch to disk
Sequential writes — faster than random writes

Bloom Filters | Apache Cassandra Documentation

Graph Databases

Neo4j

neo4j.com

Neo4j uses a native graph structure: nodes point to properties and relationships, which also point to properties. Schema is optional. Features labeled nodes and relationships. Optimal for social networks, recommendation engines, and relationship-heavy queries.

Nodes and Relationships:

(John:Person {name: "John Smith", age: 30})
    |
    |--[FRIENDS_WITH {since: 2020}]-->(Jane:Person {name: "Jane Doe"})
    |
    |--[PURCHASED]-->(Laptop:Product {price: 999.99})
    |
    |--[LIVES_IN]-->(NYC:City {name: "New York"})

(Jane)--[WORKS_AT]-->(Company:Organization {name: "Tech Corp"})

Native Graph Storage:

Node Record:
┌─────────────────────────────────────┐
│ Node ID: 123                        │
│ Label: Person                       │
│ Property Pointer → [name: "John"]   │
│ First Relationship Pointer → 456    │
└─────────────────────────────────────┘

Relationship Record:
┌─────────────────────────────────────────┐
│ Relationship ID: 456                    │
│ Type: FRIENDS_WITH                      │
│ Start Node → 123                        │
│ End Node → 789                          │
│ Next Relationship (from 123) → 457      │
│ Property Pointer → [since: 2020]        │
└─────────────────────────────────────────┘

Performance Comparison

Traditional (SQL with B-Tree): To find friends of friends:

Find user in User table: O(log n)
Find friendships in Friends table: O(log n)
Find friends' friends: O(k log n)

Total: O(k log n)

Neo4j Native Graph:

Get node record: O(1) with index
Follow relationship pointer: O(1) pointer dereference
Get next node: O(1) pointer dereference
Repeat for each relationship: O(k)

Total: O(k) — depends only on data traversed

Search/Document Stores

Elasticsearch

Search or document stores are optimized for full-text search. Indices point to documents. Also excellent for log analysis and real-time analytics.

Traditional Index (B-Tree):
Document ID → Document Content

Inverted Index:
Term → List of Document IDs

Example Documents:
Doc 1: "The quick brown fox"
Doc 2: "The lazy brown dog"
Doc 3: "Quick brown foxes"

Inverted Index:
┌──────────────┬────────────────────────────────┐
│ Term         │ Posting List (Doc IDs)         │
├──────────────┼────────────────────────────────┤
│ brown        │ [1, 2, 3]                      │
│ dog          │ [2]                            │
│ fox          │ [1]                            │
│ foxes        │ [3]                            │
│ lazy         │ [2]                            │
│ quick        │ [1, 3] (normalized to lowercase)│
│ the          │ [1, 2]                         │
└──────────────┴────────────────────────────────┘

Quick Comparison

When to Use Each Database

Database	Core Strength	Weakness	Choose When You Need
PostgreSQL	ACID guarantees, complex queries, joins	Vertical scaling limits	Consistency over availability, referential integrity
MongoDB	Schema flexibility, fast single-doc reads	Join operations expensive	Rapid prototyping, evolving schemas, nested data
Redis	Sub-millisecond latency, in-memory speed	Limited persistence, memory cost	Caching, session store, real-time leaderboards
Cassandra	Write-optimized, linear scalability	Eventually consistent, no joins	Time-series data, high write throughput, always-on
Neo4j	Relationship traversal, graph algorithms	Not optimized for aggregations	Social networks, fraud detection, recommendations
Elasticsearch	Full-text search, near real-time indexing	Not a primary data store	Log analysis, product search, autocomplete

Performance Rankings by Operation

Write Operations

Fastest → Slowest

1. Redis          In-memory hash, O(1) writes
2. Cassandra      Append-only LSM, sequential writes
3. MongoDB        Document updates, B-Tree overhead
4. Elasticsearch  Bulk indexing optimized
5. Neo4j          Relationship updates require pointer management
6. PostgreSQL     ACID overhead, WAL, index maintenance

Point Reads (single record lookup)

Fastest → Slowest

1. Redis          O(1) hash lookup, in-memory
2. PostgreSQL     O(log n) B-Tree index, well-optimized
3. MongoDB        O(log n) B-Tree, single document fetch
4. Neo4j          O(1) with index, pointer following
5. Cassandra      Read amplification from multiple SSTables
6. Elasticsearch  Optimized for search, not point lookups

Range Queries

Fastest → Slowest

1. PostgreSQL     B-Tree optimized for ranges
2. MongoDB        B-Tree with cursor support
3. Cassandra      Efficient within partition
4. Elasticsearch  Good with filters
5. Redis          Skip list for sorted sets only
6. Neo4j          Not designed for range queries

Text Search

Fastest → Slowest

1. Elasticsearch  Native inverted index, scoring, analyzers
2. PostgreSQL     GIN/GiST indexes, full-text support
3. MongoDB        Text indexes available but slower
4. Others         Require external search layer

Relationship Traversal

Fastest → Slowest

1. Neo4j          Native graph, O(1) pointer hops
2. PostgreSQL     Requires joins, O(k log n)
3. MongoDB        Lookup/unwind, multiple queries
4. Others         Not designed for this workload

Horizontal Scalability

Easiest → Hardest

1. Cassandra      Peer-to-peer, no single point of failure
2. MongoDB        Sharding + replica sets built-in
3. Elasticsearch  Distributed by design
4. Neo4j          Causal clustering available
5. Redis          Requires Redis Cluster or Sentinel
6. PostgreSQL     Requires external tools (Citus, partitioning)