SQL vs NoSQL: A Comprehensive Guide

SQL vs NoSQL: A Comprehensive Guide

8 min read

What is SQL

SQL (Structured Query Language) is a standardized programming language for managing relational databases.

Features

  • Predefined schema (tables, indices, triggers)
  • Tables with rows and columns
  • ACID compliance
  • Complex queries and relationships

Fundamental Data Structure

  • Tables — organized in rows and columns
  • Primary Keys — unique identifiers for each row
  • Foreign Keys — link tables together
  • Schemas — strict structure defined upfront
  • Normalization — data divided to reduce redundancy
CUSTOMERS Table: ┌────────────┬──────────────┬─────────────────────┐ │ customer_id│ name │ email │ ├────────────┼──────────────┼─────────────────────┤ 1 │ John Smith │ [john@email.com](mailto:john@email.com)2 │ Jane Doe │ [jane@email.com](mailto:jane@email.com)└────────────┴──────────────┴─────────────────────┘ ORDERS Table: ┌──────────┬────────────┬───────────┬─────────┐ │ order_id │ customer_id│ product │ amount │ ├──────────┼────────────┼───────────┼─────────┤ 1011 │ Laptop │ 999.991021 │ Mouse │ 29.991032 │ Keyboard │ 79.99└──────────┴────────────┴───────────┴─────────┘

Popular Relational Databases

  • MySQL
  • PostgreSQL
  • Oracle Database

B-Tree (Primary Structure)

B-Tree is a data structure used for database indexing.

  • Balanced height — all leaf nodes at same level, O(log n) search time
  • Sorted order — enables efficient range queries
  • Sequential reads — leaf nodes often linked
  • Multiple keys per node — fewer reads, faster operations

Read More

Other SQL Index Structures

  • Hash
  • GiST/GIN

NoSQL

NoSQL (Not Only SQL) refers to database management systems that differ from traditional relational databases.

Features

  • Dynamic structure
  • Various data models (documents, key-value, graph, column-family)
  • High volume, speed, and flexibility

Fundamental Data Structure

Document Databases

  • JSON-like documents with nested structures (BSON) — Explaining BSON with Examples
  • Faster single document reads — no joins, data co-located
  • Flexible schema

Trade-offs

  • Slower complex queries
  • Data duplication — same data may exist in multiple documents
Document Storage: ┌─────────────────────────────────────────┐ │ B-Tree Index on _id │ │ │ [ObjectId_500]│ / \[ObjectId_300] [ObjectId_700]│ ↓ ↓ │ [Document 1] [Document 2]└─────────────────────────────────────────┘ Document 1 (BSON format): ┌──────────────────────────────────────┐ {"_id": ObjectId("..."), │ "name": "John", │ "address": {"city": "NYC", │ "zip": "10001"}, │ "orders": [...]}└──────────────────────────────────────┘
{ "_id": ObjectId("507f1f77bcf86cd799439011"), "customer_id": 1, "name": "John Smith", "email": "[john@email.com](mailto:john@email.com)", "address": { "street": "123 Main St", "city": "New York", "zip": "10001" }, "orders": [ { "order_id": 101, "product": "Laptop", "amount": 999.99, "date": "2024-01-15" }, { "order_id": 102, "product": "Mouse", "amount": 29.99, "date": "2024-01-20" } ] }

Key-Value Stores

Redis (Most Common)

In-memory data storage designed for large-scale, high-performance workloads. Ideal for caching and quick memory access. Built for AI agent memory systems with the fastest vector database implementation.

Structure

Structure in memory: ┌─────────────────┬─────────────────────────────┐ │ Key │ Value │ ├─────────────────┼─────────────────────────────┤ │ user:1000 │ {"name":"John","age":30}│ session:abc123 │ "authenticated"│ counter:visits │ 15847│ cache:page:home │ "<html>...</html>"└─────────────────┴─────────────────────────────┘
Skip List Structure (Redis Sorted Sets): Level 3: [Header] ────────────────────────→ [90] Level 2: [Header] ─────────→ [30] ─────────→ [90] Level 1: [Header] ──→ [10]──→ [30]──→ [50]──→ [90] ↓ ↓ ↓ ↓ score score score score 5 15 25 35

Primary structures:

  • Hash table — instant lookup for simple operations (get/set)
  • Skip list (for sorted sets) — like a balanced tree but simpler; enables range queries and concurrent access — Skip List | Brilliant Math & Science Wiki
  • Linked list — for queues

Column Stores

Apache Cassandra

Apache Cassandra | Apache Cassandra Documentation

Apache Cassandra is an open-source NoSQL distributed database. Data is partitioned across nodes (peer-to-peer) to prevent single points of failure. Unlike relational databases where each primary key is tied to a strict schema, Cassandra uses partition keys and replication factors to determine how datasets are distributed across nodes.

Key characteristics:

  • High volume
  • High write throughput
  • Horizontal scalability
  • Fault tolerance

Storage Structure

Storage Engine | Apache Cassandra Documentation

LSM Tree Architecture: Write Path: ┌──────────────────────────────────────────┐ 1. MemTable (In-Memory Sorted Structure)│ - B-Tree or Skip List │ │ - Fast writes: O(log n)└───────────────┬──────────────────────────┘ (When full, flush to disk) ┌──────────────────────────────────────────┐ 2. SSTable (Sorted String Table on Disk)│ │ │ SSTable 1: [10][20][30][40]│ SSTable 2: [15][25][35][45]│ SSTable 3: [5][12][28][50]└───────────────┬──────────────────────────┘ (Periodic compaction) ┌──────────────────────────────────────────┐ 3. Compacted SSTable │ [5][10][12][15][20][25][28][30]... │ └──────────────────────────────────────────┘
  • Append to memory, then compact and batch to disk
  • Sequential writes — faster than random writes

Bloom Filters | Apache Cassandra Documentation

Graph Databases

Neo4j

neo4j.com

Neo4j uses a native graph structure: nodes point to properties and relationships, which also point to properties. Schema is optional. Features labeled nodes and relationships. Optimal for social networks, recommendation engines, and relationship-heavy queries.

Nodes and Relationships: (John:Person {name: "John Smith", age: 30}) | |--[FRIENDS_WITH {since: 2020}]-->(Jane:Person {name: "Jane Doe"}) | |--[PURCHASED]-->(Laptop:Product {price: 999.99}) | |--[LIVES_IN]-->(NYC:City {name: "New York"}) (Jane)--[WORKS_AT]-->(Company:Organization {name: "Tech Corp"})
Native Graph Storage: Node Record: ┌─────────────────────────────────────┐ │ Node ID: 123│ Label: Person │ │ Property Pointer → [name: "John"]│ First Relationship Pointer → 456└─────────────────────────────────────┘ Relationship Record: ┌─────────────────────────────────────────┐ │ Relationship ID: 456│ Type: FRIENDS_WITH │ │ Start Node → 123│ End Node → 789│ Next Relationship (from 123)457│ Property Pointer → [since: 2020]└─────────────────────────────────────────┘

Performance Comparison

Traditional (SQL with B-Tree): To find friends of friends:

  1. Find user in User table: O(log n)
  2. Find friendships in Friends table: O(log n)
  3. Find friends' friends: O(k log n)

Total: O(k log n)

Neo4j Native Graph:

  1. Get node record: O(1) with index
  2. Follow relationship pointer: O(1) pointer dereference
  3. Get next node: O(1) pointer dereference
  4. Repeat for each relationship: O(k)

Total: O(k) — depends only on data traversed

Search/Document Stores

Elasticsearch

Search or document stores are optimized for full-text search. Indices point to documents. Also excellent for log analysis and real-time analytics.

Traditional Index (B-Tree): Document ID → Document Content Inverted Index: Term → List of Document IDs Example Documents: Doc 1: "The quick brown fox" Doc 2: "The lazy brown dog" Doc 3: "Quick brown foxes" Inverted Index: ┌──────────────┬────────────────────────────────┐ │ Term │ Posting List (Doc IDs)├──────────────┼────────────────────────────────┤ │ brown │ [1, 2, 3]│ dog │ [2]│ fox │ [1]│ foxes │ [3]│ lazy │ [2]│ quick │ [1, 3] (normalized to lowercase)│ the │ [1, 2]└──────────────┴────────────────────────────────┘

Quick Comparison

When to Use Each Database

DatabaseCore StrengthWeaknessChoose When You Need
PostgreSQLACID guarantees, complex queries, joinsVertical scaling limitsConsistency over availability, referential integrity
MongoDBSchema flexibility, fast single-doc readsJoin operations expensiveRapid prototyping, evolving schemas, nested data
RedisSub-millisecond latency, in-memory speedLimited persistence, memory costCaching, session store, real-time leaderboards
CassandraWrite-optimized, linear scalabilityEventually consistent, no joinsTime-series data, high write throughput, always-on
Neo4jRelationship traversal, graph algorithmsNot optimized for aggregationsSocial networks, fraud detection, recommendations
ElasticsearchFull-text search, near real-time indexingNot a primary data storeLog analysis, product search, autocomplete

Performance Rankings by Operation

Write Operations

Fastest → Slowest

1. Redis          In-memory hash, O(1) writes
2. Cassandra      Append-only LSM, sequential writes
3. MongoDB        Document updates, B-Tree overhead
4. Elasticsearch  Bulk indexing optimized
5. Neo4j          Relationship updates require pointer management
6. PostgreSQL     ACID overhead, WAL, index maintenance

Point Reads (single record lookup)

Fastest → Slowest

1. Redis          O(1) hash lookup, in-memory
2. PostgreSQL     O(log n) B-Tree index, well-optimized
3. MongoDB        O(log n) B-Tree, single document fetch
4. Neo4j          O(1) with index, pointer following
5. Cassandra      Read amplification from multiple SSTables
6. Elasticsearch  Optimized for search, not point lookups

Range Queries

Fastest → Slowest

1. PostgreSQL     B-Tree optimized for ranges
2. MongoDB        B-Tree with cursor support
3. Cassandra      Efficient within partition
4. Elasticsearch  Good with filters
5. Redis          Skip list for sorted sets only
6. Neo4j          Not designed for range queries
Fastest → Slowest

1. Elasticsearch  Native inverted index, scoring, analyzers
2. PostgreSQL     GIN/GiST indexes, full-text support
3. MongoDB        Text indexes available but slower
4. Others         Require external search layer

Relationship Traversal

Fastest → Slowest

1. Neo4j          Native graph, O(1) pointer hops
2. PostgreSQL     Requires joins, O(k log n)
3. MongoDB        Lookup/unwind, multiple queries
4. Others         Not designed for this workload

Horizontal Scalability

Easiest → Hardest

1. Cassandra      Peer-to-peer, no single point of failure
2. MongoDB        Sharding + replica sets built-in
3. Elasticsearch  Distributed by design
4. Neo4j          Causal clustering available
5. Redis          Requires Redis Cluster or Sentinel
6. PostgreSQL     Requires external tools (Citus, partitioning)