SQL vs NoSQL: Choosing the Right Database for the Job


A startup builds their entire product on MongoDB because “it is more flexible.” Two years later, they are writing 200-line aggregation pipelines to answer questions that would be a 3-line SQL query. Another team builds on PostgreSQL. Their user activity table hits 500 million rows. Every query takes 30 seconds. They wish they had used Cassandra.

Both teams made the wrong choice for their workload. The SQL vs NoSQL decision is not about which is better - it is about which data model and query pattern fits your problem.

What SQL databases actually are

SQL (relational) databases store data in tables with rows and columns. Relationships between tables are expressed through foreign keys and joins. The schema is defined upfront. The query language (SQL) is declarative - you describe what you want, not how to get it.

Core properties:

  • ACID transactions - Atomicity, consistency, isolation, durability
  • Joins - Combine data from multiple tables in a single query
  • Flexible queries - Ad-hoc queries without pre-defining access patterns
  • Strong consistency - Reads always see the latest committed data
  • Schema enforcement - The database rejects data that does not match the schema

Examples: PostgreSQL, MySQL, SQLite, SQL Server, Oracle, CockroachDB

What NoSQL databases actually are

NoSQL (“not only SQL”) is an umbrella term for databases that do not use the relational model. They trade some SQL properties for other benefits: scale, flexibility, specialized data models, or performance for specific access patterns.

NoSQL categories:

Document stores - Store JSON-like documents. Flexible schema. Good for hierarchical data. Examples: MongoDB, CouchDB, Firestore.

Key-value stores - Simple get/set by key. Extremely fast. No query language. Examples: Redis, DynamoDB (in its simplest form), Memcached.

Wide-column stores - Rows with dynamic columns. Optimized for time-series and write-heavy workloads. Examples: Cassandra, HBase, Bigtable.

Graph databases - Nodes and edges. Optimized for relationship traversal. Examples: Neo4j, Amazon Neptune.

Search engines - Inverted indexes for full-text search. Examples: Elasticsearch, Solr.

graph TB
subgraph sql["SQL - Relational Model"]
  T1["Users table
id, name, email"]
  T2["Orders table
id, user_id, total"]
  T3["Items table
id, order_id, product"]
  T1 -->|"JOIN"| T2
  T2 -->|"JOIN"| T3
end

subgraph doc["Document Store"]
  D1["User document
id, name, email
orders: embedded array
items: nested"]
end

subgraph kv["Key-Value Store"]
  K1["user:123 -> JSON blob"]
  K2["session:abc -> token data"]
  K3["cache:product:456 -> product data"]
end

style T1 fill:#EEEDFE,stroke:#534AB7,color:#3C3489
style T2 fill:#EEEDFE,stroke:#534AB7,color:#3C3489
style T3 fill:#EEEDFE,stroke:#534AB7,color:#3C3489
style D1 fill:#E1F5EE,stroke:#0F6E56,color:#085041
style K1 fill:#FAEEDA,stroke:#854F0B,color:#633806
style K2 fill:#FAEEDA,stroke:#854F0B,color:#633806
style K3 fill:#FAEEDA,stroke:#854F0B,color:#633806

When to use SQL

You have relational data with complex queries. If your data has many relationships and you need to query across them in ad-hoc ways, SQL is hard to beat. A reporting query that joins 5 tables and aggregates results is trivial in SQL and painful in most NoSQL databases.

You need ACID transactions. Financial systems, inventory management, order processing - anywhere that partial writes are catastrophic. SQL databases with Serializable isolation prevent the anomalies that can corrupt your data.

Your schema is stable. If you know your data structure upfront and it does not change frequently, SQL’s schema enforcement is a feature, not a limitation. It catches bugs at the database level.

Your data volume is manageable. A single PostgreSQL instance handles hundreds of millions of rows with proper indexing. Most applications never outgrow a well-tuned SQL database.

When to use NoSQL

You have a specific, well-defined access pattern. Cassandra is designed for “get all events for user X in time range Y.” If that is your query, Cassandra is faster and more scalable than PostgreSQL. But if you later need “get all events of type Z across all users,” you are in trouble.

You need horizontal write scaling. SQL databases scale reads easily (read replicas) but writes are limited to a single primary. Cassandra, DynamoDB, and MongoDB (with sharding) scale writes horizontally.

Your schema changes frequently. Document databases let you add fields to documents without a schema migration. Useful during rapid development when the data model is still evolving.

You have a specialized data model. Graph data (social networks, recommendation engines) is awkward in SQL. Time-series data (metrics, events) has specialized databases (InfluxDB, TimescaleDB) that are far more efficient than generic SQL.

graph LR
subgraph use_sql["Use SQL When"]
  S1["Complex joins needed"]
  S2["ACID transactions required"]
  S3["Ad-hoc reporting queries"]
  S4["Stable schema"]
  S5["Moderate data volume"]
end

subgraph use_nosql["Use NoSQL When"]
  N1["Known, simple access patterns"]
  N2["Horizontal write scaling needed"]
  N3["Flexible or evolving schema"]
  N4["Specialized data model"]
  N5["Extreme read or write throughput"]
end

style use_sql fill:#EEEDFE,stroke:#534AB7,color:#3C3489
style use_nosql fill:#E1F5EE,stroke:#0F6E56,color:#085041

Where it breaks or gets interesting

NoSQL does not mean “no schema”

Document databases like MongoDB are schema-flexible, not schema-free. Your application code enforces the schema. If you store documents with inconsistent structures, your application code becomes a mess of null checks and type coercions. Many teams add application-level schema validation (Mongoose for MongoDB) and end up with the complexity of schema management without the database-level enforcement.

SQL can scale further than most people think

PostgreSQL with proper indexing handles billions of rows. With read replicas, it handles enormous read throughput. With Citus (sharding extension), it scales writes horizontally. TimescaleDB extends PostgreSQL for time-series. Many companies that “outgrew” SQL actually outgrew their poorly optimized SQL queries.

NoSQL joins are just application-level joins

MongoDB does not support joins (it has $lookup but it is slow). If you need data from two collections, you make two queries and join in application code. This is fine if you designed your data model to avoid joins. It is painful if you did not. The NoSQL data modeling mantra: “design for your queries, not your entities.”

NewSQL: the best of both worlds

CockroachDB, Google Spanner, and TiDB are “NewSQL” databases: they provide SQL semantics and ACID transactions with horizontal scalability. They achieve this through distributed consensus (Raft/Paxos). The tradeoff: higher latency than a single-node SQL database, and more operational complexity. But for teams that need both SQL and horizontal scale, they are a compelling option.

Real-world systems and their choices

Airbnb - PostgreSQL for core transactional data (bookings, payments). Elasticsearch for search. Druid for analytics. Different databases for different workloads.

Instagram - PostgreSQL sharded with Citus for user data. Cassandra for activity feeds and direct messages. Redis for caching.

Netflix - Cassandra for viewing history and user preferences (high write throughput, simple access patterns). MySQL for billing (ACID required). Elasticsearch for search.

Uber - MySQL for trip data (ACID for payments). Cassandra for geospatial data and driver location history. Schemaless (built on MySQL) for flexible document storage.

Discord - Started with MongoDB, migrated to Cassandra for messages (better write throughput, time-series access pattern). Postgres for user and server data.

How to apply it in practice

The decision framework

  1. What are your top 5 queries? Write them out. If they require joins across multiple entities, SQL is likely better. If they are all “get by key” or “get by key in time range,” NoSQL might be better.

  2. What is your write volume? Under 10,000 writes/second: SQL handles it. Over 100,000 writes/second across multiple regions: you need horizontal write scaling (Cassandra, DynamoDB).

  3. Do you need transactions across multiple entities? If yes, SQL or a NewSQL database. If no, NoSQL is viable.

  4. How stable is your schema? Rapidly evolving: document database. Stable: SQL.

  5. What does your team know? Operational expertise matters. A team that knows PostgreSQL deeply will outperform a team struggling with Cassandra’s data modeling.

Use both

Most production systems use multiple databases. PostgreSQL for transactional data, Redis for caching, Elasticsearch for search, Cassandra for time-series events. The key is choosing the right tool for each workload, not picking one database for everything.

FAQ

Q: Is MongoDB a good default choice for a new project?

It depends. MongoDB’s flexible schema is genuinely useful during early development when your data model is evolving. But many teams reach for MongoDB because it “feels easier” and then struggle with data modeling, lack of joins, and eventual consistency issues. PostgreSQL with JSONB columns gives you SQL’s power plus document flexibility for fields that genuinely need it. For most new projects, PostgreSQL is the safer default. Use MongoDB when you have a specific reason: very flexible schema requirements, or a team with strong MongoDB expertise.

Q: Can you use SQL and NoSQL together in the same application?

Yes, and you should when it makes sense. The polyglot persistence pattern means using the best database for each use case. User accounts and orders in PostgreSQL (ACID, joins). Session data in Redis (fast key-value). Search in Elasticsearch (full-text). Activity feed in Cassandra (time-series, high write throughput). The operational complexity of managing multiple databases is real, but it is often worth it for the performance and scalability benefits.

Q: What happened to all the companies that “migrated from SQL to NoSQL” in the 2010s?

Many of them migrated back, or added SQL alongside NoSQL. The NoSQL hype of 2010-2015 led many teams to adopt Cassandra or MongoDB for workloads that were perfectly suited to SQL. The operational complexity, lack of joins, and eventual consistency issues were painful. The lesson: NoSQL is excellent for specific workloads (high-throughput writes, time-series, flexible schema) but is not a general-purpose replacement for SQL.

Interview questions

Q1: You are building a ride-sharing app. What databases would you use and why?

Strong answer: Multiple databases for different workloads. PostgreSQL for core transactional data: user accounts, driver profiles, trip records, payments. These need ACID transactions (a trip must atomically update driver status, create a trip record, and initiate payment). Redis for real-time data: driver locations (updated every few seconds, read frequently, no need for persistence), active trip state, session tokens. Cassandra or DynamoDB for historical data: trip history, location history, event logs. These are write-heavy, time-series access patterns where Cassandra excels. Elasticsearch for search: searching for drivers by location, searching trip history. Each database is chosen for its specific access pattern.

Q2: A team is using MongoDB for their e-commerce platform. They are struggling with slow queries that need to join orders, users, and products. What would you recommend?

Strong answer: First, understand the access patterns. If the slow queries are for real-time user-facing features (show a user their order history with product details), the fix is data denormalization: embed product details in the order document at write time. This is the MongoDB way - design for your queries. If the slow queries are for analytics and reporting (total revenue by product category, cohort analysis), MongoDB is the wrong tool. Move analytics to a data warehouse (BigQuery, Redshift, Snowflake) that is designed for complex aggregations. If the team needs frequent ad-hoc joins for operational queries, consider migrating to PostgreSQL. The migration is painful but the long-term developer productivity gain is often worth it.

Q3: When would you choose Cassandra over PostgreSQL for a new service?

Strong answer: Cassandra is the right choice when you have: very high write throughput (millions of writes per second) that a single PostgreSQL primary cannot handle, multi-region active-active writes where you need low-latency writes in multiple geographic regions simultaneously, time-series access patterns where you always query by a partition key and time range, and you can accept eventual consistency (no ACID transactions across rows). Concrete examples: IoT sensor data, user activity events, application logs, social media feeds. PostgreSQL is better for: anything requiring joins, ACID transactions across multiple rows, ad-hoc queries, or when your team does not have Cassandra operational expertise. Cassandra’s data modeling is significantly more constrained than SQL - you must design your tables around your queries, and changing access patterns later requires creating new tables.