ACID vs BASE: Two Philosophies for Data Correctness


It is the end of the month. Your billing service is running the monthly charge job. Halfway through, the database server crashes. When it comes back up, some customers have been charged. Others haven’t. A few might have been charged twice. Your support queue fills up overnight.

This is the problem ACID was designed to prevent. And it is also the reason some systems deliberately choose not to use it.

What ACID actually is

ACID is a set of four properties that guarantee database transactions behave correctly even in the face of errors, crashes, and concurrent access.

Atomicity - A transaction is all or nothing. If you transfer 100 dollars from account A to account B, either both the debit and credit happen, or neither does. There is no state where A is debited but B is not credited.

Consistency - A transaction brings the database from one valid state to another. All defined rules, constraints, and invariants hold before and after. You cannot end up with a negative balance if your schema forbids it.

Isolation - Concurrent transactions execute as if they were serial. One transaction cannot see the intermediate state of another. The exact behavior depends on the isolation level, but the intent is that transactions don’t step on each other.

Durability - Once a transaction commits, it stays committed. A crash immediately after a commit does not lose the data. This is typically achieved through write-ahead logging (WAL).
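Atomicity and consistency can be seen together in a small sketch. It uses Python's built-in sqlite3 module with an in-memory database; the `accounts` table, the `transfer` helper, and the starting balances are assumptions made up for illustration. The second transfer credits B and then fails the CHECK constraint on A, so the whole transaction, including the credit, is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # the schema forbids negative balances
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 150), ("B", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; return False if the transaction was rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts"))


first = transfer(conn, "A", "B", 100)   # succeeds: A has enough funds
second = transfer(conn, "A", "B", 100)  # debit would make A negative, so nothing is applied
final = balances(conn)
```

After both calls, B holds only the first 100 dollars: the credit from the failed transfer never becomes visible.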

What BASE actually is

BASE is not a formal set of guarantees in the way ACID is. It is a description of how many distributed systems actually behave. The term comes from Eric Brewer and his colleagues, who chose it as a deliberate contrast to ACID.

Basically Available - The system guarantees availability in the CAP sense. Responses are always returned, even if some nodes are down or partitioned.

Soft state - The state of the system may change over time, even without new input. Replicas converge toward consistency, but at any given moment they might disagree.

Eventually consistent - Given enough time without new writes, all replicas will converge to the same value. There is no guarantee of when, just that it will happen.

graph TB
subgraph acid["ACID - Strong Guarantees"]
  AT["Atomicity<br/>All or nothing"]
  CO["Consistency<br/>Valid state always"]
  IS["Isolation<br/>No dirty reads"]
  DU["Durability<br/>Survives crashes"]
end

subgraph base["BASE - Relaxed Guarantees"]
  BA["Basically Available<br/>Always responds"]
  SS["Soft State<br/>May change without input"]
  EC["Eventually Consistent<br/>Converges over time"]
end

subgraph tradeoff["The Core Tradeoff"]
  PERF["Higher throughput<br/>Lower latency<br/>Better availability"]
  CORR["Stronger correctness<br/>No anomalies<br/>Predictable behavior"]
end

acid --> CORR
base --> PERF

style AT fill:#EEEDFE,stroke:#534AB7,color:#3C3489
style CO fill:#EEEDFE,stroke:#534AB7,color:#3C3489
style IS fill:#EEEDFE,stroke:#534AB7,color:#3C3489
style DU fill:#EEEDFE,stroke:#534AB7,color:#3C3489
style BA fill:#E1F5EE,stroke:#0F6E56,color:#085041
style SS fill:#E1F5EE,stroke:#0F6E56,color:#085041
style EC fill:#E1F5EE,stroke:#0F6E56,color:#085041
style CORR fill:#EEEDFE,stroke:#534AB7,color:#3C3489
style PERF fill:#E1F5EE,stroke:#0F6E56,color:#085041

How ACID works - the mechanism

Atomicity via write-ahead logging

Before any change is written to the actual data pages, it is first written to an append-only log. If the system crashes mid-transaction, recovery uses the log to replay committed changes that had not yet reached the data pages and to roll back uncommitted ones. PostgreSQL, MySQL InnoDB, and Oracle all use variants of this idea (InnoDB and Oracle call theirs the redo log). After a crash, the log is the source of truth, not the data pages.
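The recovery idea can be sketched with a toy, redo-only log. This is a simplification under stated assumptions: the record shapes (`"write"`, `"commit"`) and the `recover` function are invented for illustration, and real engines add log sequence numbers, checkpoints, and undo information.

```python
def recover(log):
    """Rebuild data pages from the log, replaying only committed transactions."""
    committed = {record[1] for record in log if record[0] == "commit"}
    pages = {}
    for record in log:
        if record[0] == "write" and record[1] in committed:
            _, _, key, value = record
            pages[key] = value  # redo the write in log order
    return pages


# Hypothetical log contents at the moment of a crash.
log = [
    ("write", 1, "A", 400),
    ("write", 1, "B", 100),
    ("commit", 1),
    ("write", 2, "A", 300),  # transaction 2 never logged a commit record
]

pages = recover(log)
```

Transaction 2 wrote to the log but never committed, so its change to A is ignored on recovery.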

Isolation levels - the spectrum

Isolation is not binary. SQL defines four levels, each trading correctness for performance:

  • Read Uncommitted - You can read data that another transaction has modified but not yet committed. Allows dirty reads. Almost never used.
  • Read Committed - You only see committed data. Prevents dirty reads but allows non-repeatable reads (the same row can return different values within one transaction if another transaction commits between your reads).
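  • Repeatable Read - The same row returns the same value within a transaction. Prevents non-repeatable reads, but the SQL standard still allows phantom reads (new rows matching your query can appear). Some implementations are stricter: PostgreSQL’s Repeatable Read does not allow phantoms.
  • Serializable - Full isolation. Transactions behave as if they ran one at a time. Most expensive. PostgreSQL implements this with Serializable Snapshot Isolation (SSI), which uses non-blocking predicate locks to detect dangerous read-write dependencies and aborts one of the transactions involved.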

Many databases, including PostgreSQL, Oracle, and SQL Server, default to Read Committed. MySQL InnoDB defaults to Repeatable Read.
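The difference between Read Committed and Repeatable Read is easiest to see in a toy multi-version store. This is a sketch of the snapshot-based approach only (lock-based engines work differently), and `VersionedStore` and `Transaction` are hypothetical names, not any database's API.

```python
class VersionedStore:
    """Toy multi-version store: each key keeps a list of (commit_ts, value)."""

    def __init__(self):
        self.versions = {}
        self.clock = 0  # timestamp of the latest commit

    def commit_write(self, key, value):
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def read_as_of(self, key, ts):
        visible = [value for commit_ts, value in self.versions.get(key, []) if commit_ts <= ts]
        return visible[-1] if visible else None


class Transaction:
    def __init__(self, store, level):
        self.store = store
        self.level = level
        self.start_ts = store.clock  # snapshot point, fixed at transaction start

    def read(self, key):
        if self.level == "read committed":
            ts = self.store.clock  # every read sees the newest committed version
        else:  # "repeatable read"
            ts = self.start_ts     # every read sees the snapshot from the start
        return self.store.read_as_of(key, ts)


store = VersionedStore()
store.commit_write("balance", 100)

rc = Transaction(store, "read committed")
rr = Transaction(store, "repeatable read")
first_reads = (rc.read("balance"), rr.read("balance"))

store.commit_write("balance", 20)  # another transaction commits in between
second_reads = (rc.read("balance"), rr.read("balance"))
```

The Read Committed transaction sees the value change between its two reads (a non-repeatable read); the Repeatable Read transaction keeps seeing 100.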

Durability via fsync

When a transaction commits, the WAL record must be flushed to disk before the commit acknowledgment is returned to the client. This is the fsync call. It is also why databases on fast NVMe SSDs commit faster than on spinning disks - the bottleneck is often the fsync latency.
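As a rough illustration of the flush-before-acknowledge rule, the sketch below appends a record to a log file and returns only after `os.fsync` completes. The `durable_append` helper, the file name, and the record format are assumptions for the example, not how any particular database lays out its log.

```python
import os
import tempfile


def durable_append(path, record: bytes):
    """Append a record and return only after the OS reports it flushed to storage."""
    with open(path, "ab") as f:
        f.write(record + b"\n")
        f.flush()             # Python's buffer -> OS page cache
        os.fsync(f.fileno())  # OS page cache -> storage device (the slow part)


with tempfile.TemporaryDirectory() as directory:
    wal_path = os.path.join(directory, "wal.log")
    durable_append(wal_path, b"commit tx=1")
    durable_append(wal_path, b"commit tx=2")
    with open(wal_path, "rb") as f:
        lines = f.read().splitlines()
```

Only once `durable_append` returns would a database be allowed to tell the client that the commit succeeded.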

How BASE works - the mechanism

Eventual consistency via anti-entropy

In a BASE system like Cassandra, writes go to a coordinator node which forwards them to replicas. If a replica is down, the coordinator stores a hint and delivers it when the replica recovers (hinted handoff). Periodically, nodes run anti-entropy repair, comparing their data using Merkle trees and syncing differences.
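The repair step can be approximated with a much simpler structure than a real Merkle tree. The sketch below hashes fixed buckets of keys (a one-level stand-in for the tree) and syncs only the buckets whose hashes differ. The function names, the `(timestamp, value)` record shape, and the newest-timestamp-wins rule are assumptions for illustration, not Cassandra's actual implementation.

```python
import hashlib


def bucket_of(key, buckets):
    """Assign a key to a bucket deterministically."""
    return hashlib.sha256(key.encode()).digest()[0] % buckets


def bucket_hashes(data, buckets=4):
    """Hash the sorted contents of each bucket so replicas can compare cheaply."""
    groups = [[] for _ in range(buckets)]
    for key in sorted(data):
        groups[bucket_of(key, buckets)].append((key, data[key]))
    return [hashlib.sha256(repr(group).encode()).hexdigest() for group in groups]


def repair(a, b, buckets=4):
    """Sync only buckets whose hashes differ; return the indices that were repaired."""
    hashes_a, hashes_b = bucket_hashes(a, buckets), bucket_hashes(b, buckets)
    repaired = []
    for i in range(buckets):
        if hashes_a[i] == hashes_b[i]:
            continue  # identical bucket, nothing to transfer
        for key in (set(a) | set(b)):
            if bucket_of(key, buckets) == i:
                # Values are (timestamp, value); the newest timestamp wins.
                newest = max(replica[key] for replica in (a, b) if key in replica)
                a[key] = b[key] = newest
        repaired.append(i)
    return repaired


replica_a = {"user:1": (1, "alice"), "user:2": (5, "bob-v2")}
replica_b = {"user:1": (1, "alice"), "user:2": (3, "bob"), "user:3": (4, "carol")}
repaired = repair(replica_a, replica_b)
```

The point of comparing hashes first is that replicas which mostly agree exchange very little data.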

Conflict resolution

When two replicas accept conflicting writes during a partition, they need a strategy to converge:

  • Last-write-wins (LWW) - The write with the highest timestamp wins. Simple, but one of two concurrent writes is always silently discarded, and clock skew can make the older write win.
  • Vector clocks - Track causality. If write B happened after write A, B wins. If they are concurrent, surface the conflict to the application.
  • CRDTs - Data structures designed to merge automatically. A G-Counter (grow-only counter) keeps one count per node; replicas merge by taking the max of each node’s count, and the counter’s value is the sum of those counts.
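Two of these strategies are small enough to sketch directly: comparing vector clocks to decide whether one write happened before another, and merging a G-Counter. The `compare` function and `GCounter` class are illustrative names, and node identifiers like `"n1"` are made up.

```python
def compare(vc_a, vc_b):
    """Return 'before', 'after', 'equal', or 'concurrent' for two vector clocks."""
    nodes = set(vc_a) | set(vc_b)
    a_le_b = all(vc_a.get(n, 0) <= vc_b.get(n, 0) for n in nodes)
    b_le_a = all(vc_b.get(n, 0) <= vc_a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither dominates: the application must resolve it


class GCounter:
    """Grow-only counter: one slot per node, merged with an element-wise max."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}

    def increment(self, amount=1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def merge(self, other):
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self):
        return sum(self.counts.values())


a, b = GCounter("n1"), GCounter("n2")
a.increment(3)  # accepted on node n1 during a partition
b.increment(2)  # accepted on node n2 during the same partition
a.merge(b)
b.merge(a)
a.merge(b)      # merging again changes nothing (idempotent)
```

Because the merge is a per-node max, it does not matter in which order or how many times replicas exchange state.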

graph TB
subgraph acid_rep["ACID - Synchronous Replication"]
  AC["Client"] -->|"Write balance=500"| AN1["Node 1 Primary"]
  AN1 -->|"Replicate sync"| AN2["Node 2 Replica"]
  AN1 -->|"Replicate sync"| AN3["Node 3 Replica"]
  AN2 -->|"ACK"| AN1
  AN3 -->|"ACK"| AN1
  AN1 -->|"Commit confirmed"| AC
end

subgraph base_rep["BASE - Asynchronous Replication"]
  BC["Client"] -->|"Write balance=500"| BN1["Node 1 Primary"]
  BN1 -->|"ACK immediately"| BC
  BN1 -.->|"Replicate async later"| BN2["Node 2 Replica"]
  BN1 -.->|"Replicate async later"| BN3["Node 3 Replica"]
end

style AN1 fill:#EEEDFE,stroke:#534AB7,color:#3C3489
style AN2 fill:#EEEDFE,stroke:#534AB7,color:#3C3489
style AN3 fill:#EEEDFE,stroke:#534AB7,color:#3C3489
style BN1 fill:#E1F5EE,stroke:#0F6E56,color:#085041
style BN2 fill:#FAEEDA,stroke:#854F0B,color:#633806
style BN3 fill:#FAEEDA,stroke:#854F0B,color:#633806

Where it breaks or gets interesting

ACID does not mean “no anomalies”

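Even with ACID, the default isolation level (Read Committed) allows anomalies. Two transactions can both read a balance of 100 dollars, both decide that spending 80 dollars is fine, and both commit a debit of 80 - leaving the account at -60 dollars. This is a check-then-act race on a single row, a close relative of the lost update anomaly. Under Read Committed you have to prevent it yourself, with SELECT ... FOR UPDATE or an atomic conditional UPDATE. Raising the isolation level also works (PostgreSQL’s Repeatable Read aborts the second writer, and Serializable additionally covers the related write skew anomaly), but most applications stay on the default because of the performance cost.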
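For a single-row race like this one, a common fix that does not depend on the isolation level is to fold the check into the write itself. The sketch below uses Python's sqlite3 module; the `accounts` table and the `spend` helper are assumptions for illustration. The calls run one after another here, so the example shows the guard rather than real concurrency.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO accounts VALUES ('A', 100)")
conn.commit()


def spend(conn, account_id, amount):
    """Debit only if funds suffice; the check and the write are one statement."""
    with conn:
        cursor = conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ? AND balance >= ?",
            (amount, account_id, amount),
        )
    return cursor.rowcount == 1  # 0 rows updated means the guard rejected the spend


first = spend(conn, "A", 80)   # balance 100 -> 20
second = spend(conn, "A", 80)  # guard fails: 20 < 80, nothing is written
balance = conn.execute("SELECT balance FROM accounts WHERE id = 'A'").fetchone()[0]
```

Because the database evaluates `balance >= ?` against the current row at write time, a decision can never be based on a stale read.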

“Eventually consistent” has no time bound

Eventually consistent means the system will converge, but it does not say when. Under normal conditions, Cassandra replication lag is milliseconds. Under heavy load or after a partition, it could be seconds or minutes. Applications that assume “eventually” means “very soon” get burned.

BASE systems can offer strong consistency for specific operations

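Cassandra with QUORUM reads and writes makes every read set overlap every write set (R + W > N), so a read sees the most recent acknowledged write for that key. That is strong consistency for that operation, though not full linearizability; for that, Cassandra offers lightweight transactions built on Paxos. DynamoDB has a “strongly consistent read” option. The BASE label describes the default behavior, not the ceiling of what is possible.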
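The reason quorum reads and writes see each other is simple arithmetic: with N replicas, a read that contacts R nodes and a write acknowledged by W nodes must share at least one node whenever R + W > N. The sketch below checks that rule by brute force; the function names are invented for the example.

```python
from itertools import combinations


def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1


def overlaps(n, r, w):
    """The rule of thumb: read and write sets must intersect when r + w > n."""
    return r + w > n


def always_intersect(n, r, w):
    """Brute-force check over every possible read set and write set."""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )


n = 3
q = quorum(n)                              # 2 of 3 replicas
quorum_is_safe = always_intersect(n, q, q)  # QUORUM reads + QUORUM writes
one_is_safe = always_intersect(n, 1, 1)     # ONE reads + ONE writes
```

With a replication factor of 3, QUORUM on both sides always overlaps, while reading and writing at ONE can hit disjoint replicas.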

ACID transactions across microservices don’t exist

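A transaction spanning two separate databases cannot rely on either database’s local ACID guarantees. The two-phase commit (2PC) protocol can make the commit atomic across both, but it is slow and fragile, and its coordinator is a single point of failure: if it crashes after participants have voted to commit, they stay blocked holding locks until it recovers. This is why the Saga pattern exists - it replaces a distributed ACID transaction with a sequence of local ACID transactions, each paired with a compensating action that undoes it if a later step fails.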
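A minimal sketch of the Saga idea, assuming each step is a local transaction paired with a compensating action. The `run_saga` helper, the step functions, and the in-memory `state` are hypothetical; a production saga also needs durable progress tracking and idempotent, retryable compensations.

```python
def run_saga(steps):
    """Run (action, compensation) pairs; on failure undo completed steps in reverse."""
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True


# Hypothetical services, represented here by a shared dictionary.
state = {"stock": 5, "charged": 0}


def reserve_stock():
    state["stock"] -= 1


def release_stock():
    state["stock"] += 1


def charge_card():
    raise RuntimeError("card declined")  # simulated failure in the payment service


def refund_card():
    state["charged"] -= 10


succeeded = run_saga([(reserve_stock, release_stock), (charge_card, refund_card)])
```

The payment step fails, so the saga releases the stock it had reserved. Unlike an ACID rollback, the intermediate state (stock at 4) was briefly visible to other readers.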

Real-world systems and their choices

ACID systems:

  • PostgreSQL - Full ACID. Accepts all four isolation levels (Read Uncommitted behaves as Read Committed) and provides true Serializable via SSI (Serializable Snapshot Isolation).
  • MySQL InnoDB - ACID compliant. The MyISAM engine was not, which is why InnoDB became the default.
  • SQLite - ACID compliant. Uses a single-writer model which makes isolation simple.
  • CockroachDB - Distributed ACID using Raft consensus. Serializable by default.
  • Google Spanner - Globally distributed ACID with external consistency using TrueTime.

BASE systems:

  • Cassandra - Tunable consistency, eventual by default. Designed for multi-datacenter writes.
  • DynamoDB - Eventually consistent reads by default, strongly consistent reads available at higher cost.
  • Couchbase - Multi-master replication between clusters (XDCR) with eventual consistency.
  • Riak - Designed around eventual consistency with vector clocks for conflict resolution.

Systems that blur the line:

  • MongoDB - Single-document operations are ACID. Multi-document transactions added in v4.0. Replica set reads can be eventually consistent depending on read preference.
  • Redis - Commands execute on a single thread, so each individual command is atomic. But replication is asynchronous, so a primary crash can lose recent writes.

How to apply it in practice

The choice is not “ACID or BASE” globally. It is per data type, per operation.

Use ACID when:

  • Money moves between accounts
  • Inventory counts must be exact
  • Order state transitions must be atomic
  • Regulatory compliance requires auditability

Use BASE when:

  • User activity feeds and timelines
  • Product catalog reads
  • Analytics counters and metrics
  • Social graph traversals
  • Any data where brief staleness is acceptable

graph LR
subgraph acid_use["Use ACID"]
  P["Payments"]
  I["Inventory"]
  O["Orders"]
  A["Audit logs"]
end

subgraph base_use["Use BASE"]
  F["Activity feeds"]
  C["Catalog reads"]
  M["Metrics"]
  S["Search indexes"]
end

subgraph consider["Consider carefully"]
  US["User sessions"]
  PR["User profiles"]
  NO["Notifications"]
end

style acid_use fill:#EEEDFE,stroke:#534AB7,color:#3C3489
style base_use fill:#E1F5EE,stroke:#0F6E56,color:#085041
style consider fill:#FAEEDA,stroke:#854F0B,color:#633806

The practical checklist

Before choosing, answer these:

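  1. Can two concurrent operations produce an incorrect result if they both read the same stale value? If yes, Read Committed alone is not enough: you need explicit row locks (SELECT ... FOR UPDATE), an atomic conditional update, or a stricter isolation level, possibly Serializable.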
  2. Does a partial failure leave data in an inconsistent state? If yes, you need atomicity.
  3. Can you tolerate reading data that is a few seconds old? If yes, BASE is viable.
  4. Do you need to write to multiple geographic regions simultaneously? ACID across regions is extremely expensive. BASE is the pragmatic choice.

FAQ

Q: Is NoSQL the same as BASE?

No. NoSQL just means “not only SQL” - it describes the query interface and data model, not the consistency guarantees. MongoDB is NoSQL but supports ACID transactions. Cassandra is NoSQL and BASE. You can have a SQL database (like CockroachDB) that is distributed and ACID. The terms describe different dimensions.

Q: Can you build a BASE system on top of an ACID database?

Yes, and many systems do. You might use PostgreSQL as your primary store (ACID) but maintain a Redis cache (BASE) in front of it. The cache is eventually consistent with the database. The trick is handling cache invalidation correctly so the staleness window is bounded and acceptable.
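A cache-aside layer with a TTL and write-time invalidation can be sketched as follows. Plain dictionaries stand in for both PostgreSQL and Redis, and the `CacheAside` class and injectable `clock` are assumptions made so the example can run without real services or real waiting.

```python
import time


class CacheAside:
    def __init__(self, db, ttl_seconds, clock=time.monotonic):
        self.db = db            # dict standing in for the primary (ACID) store
        self.ttl = ttl_seconds  # upper bound on staleness for a cached entry
        self.clock = clock
        self.cache = {}         # key -> (expires_at, value)

    def read(self, key):
        hit = self.cache.get(key)
        if hit and hit[0] > self.clock():
            return hit[1]       # possibly stale, but at most ttl seconds old
        value = self.db[key]    # miss or expired: go to the source of truth
        self.cache[key] = (self.clock() + self.ttl, value)
        return value

    def write(self, key, value):
        self.db[key] = value
        self.cache.pop(key, None)  # invalidate so the next read reloads


now = [0.0]  # fake clock so the example does not need to sleep
db = {"balance": 100}
cache = CacheAside(db, ttl_seconds=5, clock=lambda: now[0])

fresh = cache.read("balance")         # loads 100 into the cache
db["balance"] = 20                    # a write that bypasses the cache layer
stale = cache.read("balance")         # still served from the cache
now[0] = 6.0                          # TTL has expired
after_ttl = cache.read("balance")     # reloaded from the database
cache.write("balance", 50)            # a write through the layer invalidates
after_write = cache.read("balance")
```

The TTL bounds staleness for writes the cache never hears about; explicit invalidation removes it for writes that go through the layer.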

Q: What happens when an eventually consistent system never converges?

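This is called a “permanent inconsistency” and it happens when conflict resolution fails or when anti-entropy repair is disabled. In practice, Cassandra’s repair process must be run regularly: every node should be repaired at least once within gc_grace_seconds (10 days by default, which is why weekly repair is typical). Otherwise replicas drift, and a replica that missed a delete can bring the data back once the tombstone has been purged elsewhere. Neglecting repair is one of the most common operational mistakes with Cassandra clusters.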

Interview questions

Q1: Your payment service uses PostgreSQL. You want to add a Redis cache for balance reads to reduce database load. What consistency problems does this introduce and how do you handle them?

Strong answer: The cache introduces eventual consistency between Redis and PostgreSQL. A write to PostgreSQL won’t immediately reflect in Redis. The main risks are: a user sees a stale balance after a recent transaction, and two concurrent requests both read a stale balance and both approve a spend that would overdraft the account. Mitigations: use cache-aside with a short TTL (5-10 seconds) for balance reads, but always do the actual debit/credit check against PostgreSQL with a SELECT FOR UPDATE to prevent the double-spend. The cache is only for display reads, never for authorization decisions.

Q2: Explain write skew and give a concrete example. Which isolation level prevents it?

Strong answer: Write skew happens when two transactions each read overlapping data, make decisions based on what they read, and write to non-overlapping data - but the combined effect violates a constraint. Classic example: an on-call scheduling system requires at least one doctor on call. Two doctors both check and see two doctors on call. Both decide to go off call. Both commit. Now zero doctors are on call. Neither transaction wrote to the same row, so the row locks taken by their writes never conflict. Among the isolation levels, only Serializable prevents this, because it detects the read-write dependency between the two transactions and aborts one. The alternative is to lock the rows you read explicitly (SELECT ... FOR UPDATE) so the second transaction has to wait.
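The doctors example can be simulated in a few lines. The sketch below runs both transactions against the same snapshot and optionally applies a simplified optimistic check: abort if anything the transaction read was written by a transaction that committed in the meantime. This check is an assumption chosen for brevity and is coarser than PostgreSQL's SSI; `run_concurrently` and `go_off_call` are hypothetical names.

```python
def run_concurrently(db, transactions, serializable):
    """Run transactions against one shared snapshot, committing in list order."""
    snapshot = dict(db)        # every transaction reads this same starting state
    committed_writes = set()
    outcomes = []
    for transaction in transactions:
        read_keys, writes = transaction(snapshot)
        if serializable and committed_writes & set(read_keys):
            outcomes.append("aborted")  # data we read changed under us
            continue
        db.update(writes)
        committed_writes |= set(writes)
        outcomes.append("committed")
    return outcomes


def go_off_call(doctor):
    def transaction(snapshot):
        on_call = [name for name, is_on in snapshot.items() if is_on]
        read_keys = list(snapshot)       # the check read every doctor's row
        if len(on_call) >= 2:            # invariant looks safe on the snapshot
            return read_keys, {doctor: False}
        return read_keys, {}
    return transaction


requests = [go_off_call("alice"), go_off_call("bob")]

snapshot_db = {"alice": True, "bob": True}
snapshot_outcomes = run_concurrently(snapshot_db, requests, serializable=False)

serializable_db = {"alice": True, "bob": True}
serializable_outcomes = run_concurrently(serializable_db, requests, serializable=True)
```

Without the check both transactions commit and nobody is on call; with it, the second transaction is aborted and can retry against the new state.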

Q3: A startup is building a social media app. They ask you whether to use PostgreSQL or Cassandra. What questions do you ask before recommending?

Strong answer: Ask about the access patterns first. What are the top 5 queries? If it is mostly “get user profile” and “get recent posts for a user” - those are key-value and time-series patterns that Cassandra handles well. If it is “find all users who follow both A and B” or “count posts by category” - those are relational queries that PostgreSQL handles better. Ask about write volume: if they expect millions of writes per second across multiple regions, Cassandra’s multi-master model is a better fit. Ask about team expertise: Cassandra’s operational complexity (repair, compaction, schema design) is significant. For a startup with a small team, PostgreSQL’s simplicity often wins even if Cassandra would theoretically scale better.