Build a Wallet Ledger System at Scale


databases distributed-systems scalability

System Design Deep Dive

Wallet Ledger System at Scale

Every cent must be accounted for. The brutal tension between financial accuracy and throughput - and how to win both.

14 min readAdvancedData Engineering

Think of a general ledger as an accountant’s notebook that is never allowed to have anything crossed out. Every financial event - a payment received, a fee charged, a refund issued - gets written on the next blank line. Pages are never torn out. Nothing is erased. If you made an error in entry 10,000, you do not white it out; you write entry 10,001 that corrects it. This is the core principle of double-entry bookkeeping, and it is not a legacy accounting concept - it is the correct mental model for designing a financial ledger system that will be auditable by regulators, trusted by users, and correct under network failures.

The scale numbers make this design genuinely hard. A mid-size fintech with 10 million active users might see 3,000 transactions per second during peak hours - a Black Friday promotion, a cash-back campaign going viral, a payment deadline at midnight. Each transaction requires writing two ledger entries atomically (a debit and a corresponding credit) and updating two account balances without ever allowing a negative balance or a partial write. At 3,000 TPS, that is 6,000 ledger inserts and 6,000 balance updates per second, all needing ACID guarantees. A single slow lock on a popular account (a merchant receiving thousands of micro-payments simultaneously) can cascade into a queue of stalled transactions.

The analogy I use is a busy bank branch with one teller window. Single-threaded serial processing gives perfect correctness - the teller processes each customer completely before the next begins, so no two customers ever touch the same account simultaneously. But correctness at the cost of throughput is not a business outcome. The real design challenge is equivalent to opening 16 teller windows while guaranteeing they cannot accidentally give the same dollar to two customers. Partitioning by account identifier is the key, but it breaks down the moment a transaction spans two accounts on different partitions.

The three core architectural decisions that cascade from this: first, whether to compute balances on the fly by summing ledger entries or to maintain a materialized balance with its own locking strategy. Second, how to partition the ledger to eliminate cross-shard transactions while supporting multi-party payments. Third, how to build an idempotency layer robust enough that a retried network request never posts a duplicate debit.

Requirements and Constraints

Functional Requirements

  • Record every credit and debit as an immutable ledger entry with a unique entry ID
  • Enforce double-entry bookkeeping: every transaction produces a debit entry and a corresponding credit entry, and their amounts must be equal
  • Provide real-time balance reads: GET /balance returns the current available balance within 50ms P99
  • Prevent negative balances: a debit must be rejected if the available balance is insufficient
  • Support idempotent transaction submission: retrying the same request with the same idempotency key must not create duplicate entries
  • Expose a full transaction history: paginated list of ledger entries per account with timestamps and reference IDs

Non-Functional Requirements

  • Write throughput: 5,000 transactions per second sustained, 15,000 TPS peak for 30-second bursts
  • Read throughput: 50,000 balance reads per second (balance checks are called before every payment)
  • Write latency: P99 under 150ms for a committed transaction (both ledger entries + balance update)
  • Read latency: P99 under 50ms for balance reads; P99 under 500ms for full transaction history
  • Durability: zero tolerance for data loss - RPO = 0, achieved via synchronous replication
  • Availability: 99.99% uptime (52 minutes downtime per year maximum) on the write path
  • Audit trail: every ledger entry immutable and cryptographically chained for regulatory compliance
  • Accounts: support up to 100 million active accounts across 16 ledger shards

Constraints and Scope

  • Currency handling is in integer minor units (paise, cents) to avoid floating-point rounding
  • Cross-currency conversion is out of scope - each wallet holds one currency denomination
  • Fraud detection is a downstream consumer of the event stream, not inline with transaction commits
  • The system does not handle card authorization holds - only settled transactions are recorded

High-Level Architecture

The system comprises five layers working together. The API Gateway handles rate limiting, authentication, and routes requests to stateless Transaction Service instances. The Transaction Service is the orchestrator - it validates input, checks idempotency, and drives the Double-Entry Engine that writes the paired ledger entries. The Ledger Storage Layer holds the immutable append-only ledger shards and the materialized balance table. The Balance Cache (Redis Cluster) absorbs the 50K RPS read load so balance checks never hit the primary database. The Event Stream (Kafka) decouples downstream consumers - audit trail, notifications, analytics, and reconciliation - from the synchronous write path.

Wallet ledger system high-level architecture showing API gateway, transaction service, double-entry engine, ledger shards, balance cache, and event stream

The critical design choice is that the synchronous write path only touches two things: the ledger table (INSERT) and the balances table (UPDATE with optimistic lock). Everything else - audit archiving, notification delivery, fraud scoring - is asynchronous and reads from the Kafka event stream. This keeps the P99 write latency under 150ms even as downstream consumers grow.

Key Insight

The read/write ratio in a wallet system is typically 10:1 to 20:1 - users check balances far more than they transact. Putting a Redis cache in front of balance reads is not a premature optimization; it is what keeps the primary database’s IOPS budget available for the writes that require ACID guarantees. Cache invalidation is synchronous within the same database transaction via a transactional outbox pattern.

The Ledger Storage Layer

The ledger is not a mutable table of account states. It is an immutable, append-only log of financial events, analogous to a git commit history. You never UPDATE or DELETE a ledger row. When a refund happens, you do not modify the original debit - you append a new credit entry referencing the original transaction. This design has two compelling properties: it is inherently audit-friendly (the full history is always present), and it eliminates the concurrency problems of updating a row in place.

The append-only model means that reads of historical data are always consistent regardless of concurrent writes - there is nothing to lock on rows that are never modified. The only mutable state in the system is the balances table (materialized running total per account) and the idempotency_keys table (deduplication cache).

Transaction data flow showing the lifecycle from client request through idempotency check, double-entry engine, ledger write, balance update, and downstream event publishing

The ledger table is partitioned by account_id hash. A transaction between account A and account B produces one ledger entry on account A’s shard (DEBIT) and one ledger entry on account B’s shard (CREDIT). These writes are coordinated by the Transaction Service using a two-phase approach: both entries are written within the same distributed transaction using the database’s cross-shard transaction support (CockroachDB’s serializable isolation, or Postgres with Citus extension), or via a saga pattern with compensating transactions for systems that cannot span shards atomically.

Common Mistake

Storing balances only in the ledger (no materialized balance table) forces a full table scan - SUM(amount) WHERE account_id = X - on every balance read. At 50,000 balance reads per second over accounts with years of transaction history, this will destroy database performance. Always maintain a materialized balance, updated atomically with the ledger insert.

Double-Entry Bookkeeping Engine

Double-entry bookkeeping, invented by Luca Pacioli in 1494 and still the foundation of every accounting system on earth, states that every financial event has two equal and opposite effects. When you pay 500 rupees for a pizza, your wallet account decreases by 500 (DEBIT your asset account) and the restaurant’s wallet account increases by 500 (CREDIT their asset account). The total of all debits must always equal the total of all credits - this invariant is the mathematical proof that no money has been created or destroyed.

In software, enforcing this means every call to the Transaction Service must result in exactly two ledger entries within a single ACID transaction: one DEBIT entry and one CREDIT entry with equal amounts. The transaction must atomically commit both or roll back both. There is no valid state where the debit commits but the credit does not.

Double-entry bookkeeping engine internals showing debit and credit entry creation, optimistic lock checks, invariant assertion, and commit or rollback paths

The engine also enforces the non-negative balance constraint before committing. The sequence within a single database transaction is: (1) read the debit account’s current balance with SELECT ... FOR UPDATE, (2) verify balance >= amount, (3) INSERT the debit ledger entry, (4) UPDATE the debit account balance, (5) INSERT the credit ledger entry, (6) UPDATE the credit account balance, (7) assert SUM(debits) = SUM(credits) for this transaction, (8) COMMIT. If any step fails or the assertion fails, the entire transaction rolls back with no visible effect.

Real World

Stripe’s financial infrastructure uses a ledger system conceptually identical to this design. Their internal “Stripe Ledger” records every charge, refund, payout, and fee as immutable ledger entries with double-entry pairs. PayPal’s internal systems similarly use an append-only ledger at their core - the balance you see in the UI is a materialized view over this ledger, not the authoritative source. The ledger entries are the source of truth; balances are derived caches.

Balance Computation

There are two valid strategies for computing an account’s current balance, and the choice between them determines your read latency, storage footprint, and operational complexity.

Strategy 1: Compute from ledger (no materialized balance). On every balance read, execute SELECT SUM(CASE WHEN type='CREDIT' THEN amount ELSE -amount END) FROM ledger_entries WHERE account_id = ?. This is trivially correct - the sum of all ledger entries is definitionally the balance. The problem is performance: an account with 5 years of transaction history might have 50,000 rows. Summing 50,000 rows for each of 50,000 concurrent balance reads is 2.5 billion row reads per second. This is unsustainable.

Strategy 2: Materialized balance with optimistic locking. Maintain a balances table with one row per account containing (account_id, balance, version). The version field is a monotonically increasing integer. Every ledger write atomically increments the version. Balance reads are a single indexed primary key lookup - sub-millisecond even without a cache layer. The tradeoff: this mutable row is a contention hotspot under high concurrency.

The solution to the contention problem is optimistic locking (also called compare-and-swap). Instead of acquiring a pessimistic lock at read time, the balance update uses a conditional update:

UPDATE balances
SET balance = balance - 500, version = version + 1
WHERE account_id = 'acc_123' AND version = 42;

If another transaction has already modified this row (version is now 43), the UPDATE affects 0 rows. The application detects this (rows_affected = 0) and retries the entire transaction with a fresh read of the current version. This eliminates lock contention at the cost of retry logic in the application layer - a worthwhile trade when contention is rare.

Key Insight

Optimistic locking works well when the probability of collision is low - which is true for most accounts. A typical account sees maybe 10-20 concurrent transactions at most. The exception is merchant accounts receiving thousands of micro-payments simultaneously. For those hot accounts, use a dedicated balance aggregation queue: buffer incoming credits in Redis with INCRBYFLOAT, then flush to the database in batches every 100ms. This converts N concurrent UPDATE conflicts into one batch write.

Data Model

The data model is deliberately minimal. Four tables handle the core functionality.

-- Account registry: one row per wallet account
CREATE TABLE accounts (
  account_id    UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  owner_id      UUID NOT NULL,
  currency      CHAR(3) NOT NULL DEFAULT 'INR',
  status        VARCHAR(20) NOT NULL DEFAULT 'active',
  created_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  CONSTRAINT status_values CHECK (status IN ('active','frozen','closed'))
);

-- Materialized balance: mutable, one row per account
-- version enables optimistic locking (compare-and-swap)
CREATE TABLE balances (
  account_id    UUID PRIMARY KEY REFERENCES accounts(account_id),
  balance       BIGINT NOT NULL DEFAULT 0,  -- in minor units (paise)
  version       BIGINT NOT NULL DEFAULT 0,
  updated_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  CONSTRAINT non_negative CHECK (balance >= 0)
);

-- Append-only ledger: immutable, one row per debit/credit entry
-- Partitioned by account_id for horizontal scaling
CREATE TABLE ledger_entries (
  entry_id      UUID NOT NULL DEFAULT gen_random_uuid(),
  txn_id        UUID NOT NULL,
  account_id    UUID NOT NULL,
  entry_type    VARCHAR(6) NOT NULL,         -- 'DEBIT' or 'CREDIT'
  amount        BIGINT NOT NULL CHECK (amount > 0),
  currency      CHAR(3) NOT NULL,
  balance_after BIGINT NOT NULL,             -- snapshot at write time
  reference_id  VARCHAR(255),               -- external payment reference
  metadata      JSONB,
  created_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  PRIMARY KEY (account_id, entry_id)         -- composite for partition locality
) PARTITION BY HASH (account_id);

-- Create 16 hash partitions
CREATE TABLE ledger_entries_p0  PARTITION OF ledger_entries FOR VALUES WITH (MODULUS 16, REMAINDER 0);
CREATE TABLE ledger_entries_p1  PARTITION OF ledger_entries FOR VALUES WITH (MODULUS 16, REMAINDER 1);
-- ... repeat for p2 through p15

CREATE INDEX idx_ledger_txn ON ledger_entries (txn_id);
CREATE INDEX idx_ledger_account_time ON ledger_entries (account_id, created_at DESC);

-- Idempotency keys: deduplication cache with TTL
CREATE TABLE idempotency_keys (
  idempotency_key  VARCHAR(255) NOT NULL,
  account_id       UUID NOT NULL,
  txn_id           UUID NOT NULL,
  response_body    JSONB,
  created_at       TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  expires_at       TIMESTAMPTZ NOT NULL DEFAULT NOW() + INTERVAL '30 days',
  PRIMARY KEY (idempotency_key, account_id)
);

CREATE INDEX idx_idempotency_expires ON idempotency_keys (expires_at);
Key Insight

The balance_after column in ledger_entries is redundant but invaluable. It is a snapshot of the balance immediately after this entry was written. This lets reconciliation jobs verify the running total by scanning entries chronologically without recomputing from scratch - if balance_after on entry N-1 plus the amount on entry N does not equal balance_after on entry N, something corrupted the ledger at that point. This is your canary in the coal mine.

Key Algorithms and Protocols

Transaction Processing with Idempotency

import uuid
from contextlib import contextmanager
from typing import Optional

class TransactionService:
    def __init__(self, db, redis_client):
        self.db = db
        self.redis = redis_client

    def post_transaction(
        self,
        from_account_id: str,
        to_account_id: str,
        amount: int,              # in minor units (paise)
        idempotency_key: str,
        reference_id: Optional[str] = None,
    ) -> dict:
        """
        Post a double-entry transaction. Idempotent: calling this
        multiple times with the same idempotency_key returns the same
        result without creating duplicate entries.
        """
        # Step 1: Idempotency check (fast path - Redis lookup first)
        cached = self.redis.get(f"idem:{idempotency_key}:{from_account_id}")
        if cached:
            return cached  # return previously committed result

        # Step 2: Execute within a serializable database transaction
        with self.db.transaction(isolation="SERIALIZABLE") as txn:
            # Check idempotency in DB (handles Redis cache miss after restart)
            existing = txn.query_one(
                "SELECT txn_id, response_body FROM idempotency_keys "
                "WHERE idempotency_key = %s AND account_id = %s",
                [idempotency_key, from_account_id]
            )
            if existing:
                return existing["response_body"]

            # Step 3: Read debit account balance with row lock
            debit_acct = txn.query_one(
                "SELECT balance, version FROM balances "
                "WHERE account_id = %s FOR UPDATE",
                [from_account_id]
            )
            if debit_acct is None:
                raise ValueError(f"Account {from_account_id} not found")
            if debit_acct["balance"] < amount:
                raise InsufficientFundsError(
                    f"Balance {debit_acct['balance']} < amount {amount}"
                )

            # Step 4: Read credit account balance with row lock
            credit_acct = txn.query_one(
                "SELECT balance, version FROM balances "
                "WHERE account_id = %s FOR UPDATE",
                [to_account_id]
            )
            if credit_acct is None:
                raise ValueError(f"Account {to_account_id} not found")

            txn_id = str(uuid.uuid4())
            new_debit_balance = debit_acct["balance"] - amount
            new_credit_balance = credit_acct["balance"] + amount

            # Step 5: Insert debit ledger entry (immutable)
            txn.execute(
                "INSERT INTO ledger_entries "
                "(entry_id, txn_id, account_id, entry_type, amount, "
                " currency, balance_after, reference_id) "
                "VALUES (%s, %s, %s, 'DEBIT', %s, 'INR', %s, %s)",
                [str(uuid.uuid4()), txn_id, from_account_id,
                 amount, new_debit_balance, reference_id]
            )

            # Step 6: Insert credit ledger entry (immutable)
            txn.execute(
                "INSERT INTO ledger_entries "
                "(entry_id, txn_id, account_id, entry_type, amount, "
                " currency, balance_after, reference_id) "
                "VALUES (%s, %s, %s, 'CREDIT', %s, 'INR', %s, %s)",
                [str(uuid.uuid4()), txn_id, to_account_id,
                 amount, new_credit_balance, reference_id]
            )

            # Step 7: Update debit balance with optimistic lock
            rows = txn.execute(
                "UPDATE balances SET balance = %s, version = version + 1, "
                "updated_at = NOW() "
                "WHERE account_id = %s AND version = %s",
                [new_debit_balance, from_account_id, debit_acct["version"]]
            )
            if rows.rowcount == 0:
                raise OptimisticLockError("Concurrent modification on debit account")

            # Step 8: Update credit balance with optimistic lock
            rows = txn.execute(
                "UPDATE balances SET balance = %s, version = version + 1, "
                "updated_at = NOW() "
                "WHERE account_id = %s AND version = %s",
                [new_credit_balance, to_account_id, credit_acct["version"]]
            )
            if rows.rowcount == 0:
                raise OptimisticLockError("Concurrent modification on credit account")

            # Step 9: Store idempotency key (expires in 30 days)
            result = {"txn_id": txn_id, "status": "settled"}
            txn.execute(
                "INSERT INTO idempotency_keys "
                "(idempotency_key, account_id, txn_id, response_body) "
                "VALUES (%s, %s, %s, %s)",
                [idempotency_key, from_account_id, txn_id, result]
            )

            # Step 10: Insert outbox event for async consumers
            txn.execute(
                "INSERT INTO outbox_events (event_type, payload) "
                "VALUES ('transaction.settled', %s)",
                [{"txn_id": txn_id, "from": from_account_id,
                  "to": to_account_id, "amount": amount}]
            )

        # Step 11: Cache idempotency result in Redis (best effort)
        self.redis.setex(
            f"idem:{idempotency_key}:{from_account_id}",
            86400 * 30,  # 30 days TTL
            result
        )
        return result

Optimistic Locking with Retry

import time
import random

def post_transaction_with_retry(
    service: TransactionService,
    from_account_id: str,
    to_account_id: str,
    amount: int,
    idempotency_key: str,
    max_retries: int = 5,
) -> dict:
    """
    Retry wrapper for optimistic lock conflicts.
    Uses exponential backoff with jitter to avoid thundering herd
    when many concurrent transactions retry simultaneously.
    """
    for attempt in range(max_retries):
        try:
            return service.post_transaction(
                from_account_id=from_account_id,
                to_account_id=to_account_id,
                amount=amount,
                idempotency_key=idempotency_key,
            )
        except OptimisticLockError:
            if attempt == max_retries - 1:
                raise  # exhausted retries
            # Exponential backoff: 10ms, 20ms, 40ms, 80ms
            base_delay = 0.01 * (2 ** attempt)
            jitter = random.uniform(0, base_delay * 0.5)
            time.sleep(base_delay + jitter)
        except InsufficientFundsError:
            raise  # do not retry - user error, not a system conflict

Balance Read with Cache

package ledger

import (
    "context"
    "fmt"
    "time"
)

// GetBalance returns the current balance for an account.
// Reads from Redis cache; falls back to primary DB on cache miss.
// Cache TTL is 1 second - accepts up to 1 second of staleness
// for non-payment-critical balance displays.
func (s *BalanceService) GetBalance(ctx context.Context, accountID string) (int64, error) {
    cacheKey := fmt.Sprintf("balance:%s", accountID)

    // Fast path: Redis cache hit (sub-millisecond)
    cached, err := s.redis.Get(ctx, cacheKey).Int64()
    if err == nil {
        return cached, nil
    }

    // Slow path: read from primary DB (indexed PK lookup, ~5ms)
    var balance int64
    err = s.db.QueryRowContext(
        ctx,
        "SELECT balance FROM balances WHERE account_id = $1",
        accountID,
    ).Scan(&balance)
    if err != nil {
        return 0, fmt.Errorf("balance read failed for %s: %w", accountID, err)
    }

    // Backfill cache with 1-second TTL
    // Short TTL ensures post-transaction reads see updated balance quickly
    s.redis.Set(ctx, cacheKey, balance, 1*time.Second)

    return balance, nil
}

// InvalidateBalanceCache is called synchronously after a transaction commits.
// The transactional outbox ensures the event fires only on successful commit.
func (s *BalanceService) InvalidateBalanceCache(ctx context.Context, accountIDs ...string) {
    keys := make([]string, len(accountIDs))
    for i, id := range accountIDs {
        keys[i] = fmt.Sprintf("balance:%s", id)
    }
    s.redis.Del(ctx, keys...)
}

Audit Trail Hash Chain

import hashlib
import json

def compute_entry_hash(entry: dict, previous_hash: str) -> str:
    """
    Each ledger entry includes a hash of the previous entry's hash,
    creating a cryptographic chain. Tampering with any historical entry
    invalidates all subsequent hashes - detectable by reconciliation.

    This is conceptually similar to a blockchain but without
    decentralization overhead - a single trusted chain per account.
    """
    chain_input = {
        "previous_hash": previous_hash,
        "entry_id": entry["entry_id"],
        "txn_id": entry["txn_id"],
        "account_id": entry["account_id"],
        "entry_type": entry["entry_type"],
        "amount": entry["amount"],
        "balance_after": entry["balance_after"],
        "created_at": entry["created_at"].isoformat(),
    }
    payload = json.dumps(chain_input, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def verify_audit_chain(entries: list) -> bool:
    """
    Walk the ledger entries for one account in chronological order,
    recomputing each hash and verifying it matches the stored hash.
    Returns False if any entry has been tampered with.
    """
    previous_hash = "genesis"
    for entry in entries:
        expected = compute_entry_hash(entry, previous_hash)
        if entry["entry_hash"] != expected:
            return False
        previous_hash = entry["entry_hash"]
    return True

Scaling and Performance

Scaling and partitioning strategy showing shard router, 16 ledger shards with read replicas, Redis balance cache cluster, and capacity estimation numbers

The partition strategy is account-level hash sharding: shard_id = hash(account_id) % N_SHARDS. This guarantees that all ledger entries for a given account land on the same shard, and that both the ledger INSERT and balance UPDATE for one side of a transaction are local to that shard. The only cross-shard coordination is for the two-sided double-entry write (debit shard and credit shard may differ), which uses distributed transactions or the saga pattern.

Capacity Estimation - 100M accounts, 5,000 TPS peak
----------------------------------------------------
Ledger writes:       5,000 TPS x 2 entries = 10,000 rows/sec
Row size:            ~256 bytes per ledger entry
Write bandwidth:     10,000 x 256B = 2.5 MB/sec = ~220 GB/day
Storage (1 year):    220 GB/day x 365 = ~80 TB (before replication)
With 3x replication: ~240 TB total storage across all shards

Balance reads:       50,000 RPS
Cache hit ratio:     ~95% (most reads hit Redis within 1-second TTL)
DB reads avoided:    47,500 RPS saved; only 2,500 RPS reach primary
Redis memory:        100M accounts x 100 bytes/entry = ~10 GB
Redis ops/sec:       50,000 GET + 10,000 SET = 60,000 ops/sec per cluster

Shard sizing (16 shards):
- Accounts per shard: 100M / 16 = 6.25M accounts
- Write TPS per shard: 10,000 rows/sec / 16 = 625 rows/sec
- Each shard: 1 primary (c6g.2xlarge, 32 GB RAM) + 2 read replicas
- Cross-shard transactions: <5% of volume (estimated)
Real World

Square (now Block) processes billions of dollars daily through a ledger architecture with these exact properties. Their Subzero ledger system uses append-only entries, materialized balances, and hash sharding by merchant account. Notably, they found that the 1-second balance cache TTL caused customer confusion when users transferred money and immediately checked their balance on a different device - they solved this with cache invalidation on the write path, not by lowering TTL across the board.

Failure Modes and Recovery

Failure ScenarioDetectionImpactRecovery
Primary DB shard crashHealth check fails within 10s, Prometheus alertWrites to that shard blocked; reads from replica continueAutomated failover to replica in 30-60 seconds; zero data loss if synchronous replication
Redis cache node failureRedis Sentinel health checkBalance reads fall back to DB; read latency increases from 1ms to 10msRedis Cluster auto-promotes replica; degraded mode for ~30 seconds
Duplicate transaction submittedIdempotency key lookup in DB returns existing txnNo duplicate entry; previous result returnedNo recovery needed - idempotency key prevents the duplicate
Optimistic lock contention spikeRow affected = 0 rate monitored in metricsIncreased transaction latency from retries; possible timeout at P99Retry with backoff; escalate hot accounts to dedicated queue
Outbox event delivery failureCDC lag monitoring; consumer lag alertDownstream systems (audit, notifications) lag; core ledger unaffectedOutbox guarantees at-least-once delivery; consumers must be idempotent
Cross-shard transaction partial failureSaga coordinator detects missing credit entryDebit posted without credit - money appears destroyedCompensating transaction: reverse the debit entry, alert on-call
Balance cache and DB divergeNightly reconciliation: SUM(ledger) vs balances tablePossible incorrect balance displayRecompute materialized balance from ledger (authoritative); invalidate cache
Common Mistake

Relying on the transactional outbox pattern without monitoring CDC lag. If the Change Data Capture relay falls behind, the Kafka event stream goes stale and downstream consumers - including the audit trail - stop receiving events. The ledger itself is fine, but regulators care about the audit log being near-real-time. Alert on CDC lag exceeding 30 seconds, not just on consumer lag.

Comparison of Approaches

Design DecisionOption AOption BOption CRecommendation
Balance storageCompute from ledger (no cache)Materialized balance (DB only)Materialized balance + RedisOption C: Redis absorbs read load; DB is authoritative
Concurrency controlPessimistic locks (SELECT FOR UPDATE)Optimistic locking (CAS on version)Serializable isolation (MVCC)Option B for most accounts; Option A for hot merchant accounts
Partition strategyNo partitioning (single DB)Partition by account_id hashPartition by tenant/currencyOption B: account hash gives even distribution; simple routing
Cross-shard transactionsDistributed 2PC (strong consistency)Saga pattern (eventual consistency)Single-shard only (denormalize)Option B for most cases; Option C for highest-throughput paths
Idempotency storageDB table onlyRedis onlyRedis + DB fallbackOption C: Redis for speed; DB for durability across restarts
Audit trailWORM storage (S3 + Object Lock)Cryptographic hash chainAppend-only ledger aloneOption B + A: hash chain proves integrity; S3 Object Lock prevents deletion

Key Takeaways

  • The append-only ledger is not a performance constraint - it is a correctness guarantee. Immutability means reads of historical data are contention-free and the audit trail cannot be tampered with without detection.
  • Double-entry bookkeeping’s invariant (sum of debits = sum of credits) is the mathematical check that no money is created or destroyed. Enforce it as a hard assertion inside the transaction, not as a soft background job.
  • Optimistic locking (compare-and-swap on a version column) outperforms pessimistic locking for typical accounts because collision probability is low. Fall back to pessimistic locking or batched writes only for hot merchant accounts with extremely high concurrency.
  • The idempotency key is a first-class system primitive, not an afterthought. Every external transaction submission must carry one. The deduplication window (30 days) must be longer than any reasonable client retry window.
  • Separate your read path from your write path. The 50,000 RPS balance read load is served by Redis; it should never touch the primary database. The 5,000 TPS write load requires ACID guarantees and should never share connection pools with reads.
  • Partition by account_id hash from day one. Adding shards to an unpartitioned ledger after it contains billions of rows requires a painful online migration. Start with 16 shards and add more as needed.
  • The transactional outbox pattern - writing events to an outbox table within the same ACID transaction as the ledger write - is the correct way to publish events without risk of the ledger committing but the Kafka publish failing.
  • Store balances in integer minor units (paise, cents, satoshis). Floating-point arithmetic on financial amounts is a bug waiting to happen. Every rounding error is a regulatory finding.

Frequently Asked Questions

How do you handle the case where a user’s balance check races with a concurrent debit? The balance check before a debit uses SELECT ... FOR UPDATE within the same database transaction that will perform the INSERT and UPDATE. This row-level lock prevents any other transaction from modifying that account’s balance between the check and the update. If the lock cannot be acquired within the statement timeout (default 5 seconds), the transaction fails with a timeout error and the client retries.

What happens if the application crashes after writing the debit entry but before writing the credit entry? The crash happens inside an open database transaction. When the connection closes, the database engine automatically rolls back the uncommitted transaction. Neither the debit entry nor the credit entry is visible to any other transaction. The BEGIN/COMMIT boundary is the atomicity guarantee. This is why both entries must be in the same transaction, not sequential separate transactions.

How do you reconcile the materialized balance against the ledger if they diverge? Run a nightly reconciliation query per account: SELECT SUM(CASE WHEN entry_type='CREDIT' THEN amount ELSE -amount END) AS computed FROM ledger_entries WHERE account_id = ? and compare against SELECT balance FROM balances WHERE account_id = ?. Discrepancies trigger an alert and a recomputation from the authoritative ledger. The balance_after column on each entry lets you binary-search the exact entry where divergence first appeared.

How do you prevent a scenario where money is debited from one account but the credit never arrives in the destination account? Use either distributed transactions (CockroachDB’s cross-range transactions) or the saga pattern. In the saga pattern, the Transaction Service records a PENDING state before writing either entry, writes the debit, then writes the credit. A background saga coordinator checks for transactions stuck in PENDING state after a timeout and runs compensating transactions (a reverse credit to cancel the orphaned debit). Monitoring alerts if any transaction stays in PENDING for more than 30 seconds.

Why not just use an event sourcing framework instead of building this from scratch? Event sourcing frameworks (Axon, EventStore) solve the same problem but add operational complexity. For a financial ledger, the data model is simple enough that building directly on PostgreSQL or CockroachDB with the append-only pattern gives you ACID guarantees, mature operational tooling, SQL queryability for reconciliation, and no dependency on an additional runtime. Event sourcing frameworks add value when the domain is more complex (multiple aggregate types, complex projections) - a pure ledger system does not need them.

Can this design handle cross-currency transactions? Not as described - each wallet account is a single currency. For cross-currency, add a currency conversion leg: the debit comes from account A in currency X, a FX conversion service converts at the mid-market rate, and the credit goes to account B in currency Y. This requires a third ledger entry for the conversion spread (the fee the system charges for FX), making it a three-entry transaction instead of two. The double-entry invariant still holds: debit in X + debit of fee in X = credit in Y at the converted rate.

Interview Questions

Design the balance update mechanism for a merchant account receiving 10,000 micro-payments per second. Expected depth: Optimistic locking breaks down above ~500 concurrent writers on a single row due to retry cascades. The answer is to buffer incoming credits in a Redis sorted set or counter, then flush to the DB in batches every 100-500ms. Each batch produces one ledger entry per account per flush cycle rather than 10,000. This trades sub-second balance accuracy for sustainable write throughput. Discuss the tradeoffs: how stale is acceptable, what happens if Redis loses the unflushed buffer (use a WAL-backed queue like Kafka for durability).

How would you implement a “hold” mechanism where funds are reserved but not yet debited? Expected depth: Add a HOLD ledger entry type. Available balance = total balance - sum(active holds). The hold entry is created atomically with a corresponding HOLD_CREDIT on a system escrow account. When the hold settles, it converts to a real DEBIT and the escrow HOLD_CREDIT reverses. If the hold expires, the HOLD entry is cancelled with a compensating HOLD_REVERSAL entry. This requires updating the balance check to consider both confirmed balance and held amounts - the query must sum both ledger entry types.

Walk me through how you would migrate from a single-database ledger (100GB) to a sharded ledger (16 shards) with zero downtime. Expected depth: The migration has five phases: (1) Add a shard_id column to ledger_entries with a computed value based on account_id. (2) Dual-write: all new inserts go to both the old table and the target shard, verified by a shadow comparison. (3) Backfill historical data shard by shard using an online migration tool that backfills in small batches to avoid locking production traffic. (4) Shift reads shard by shard using feature flags, comparing results against old table. (5) Once reads are fully migrated and verified, stop writing to the old table and decommission it. Each phase should be reversible.

How do you guarantee that a regulatory audit can prove no ledger entries were modified or deleted after creation? Expected depth: Three mechanisms in combination. First, hash chaining: each entry stores sha256(entry_data || previous_entry_hash) - tampering with any entry invalidates all subsequent hashes, detectable by a chain walk. Second, database-level immutability: revoke DELETE and UPDATE permissions on the ledger table for all application roles; only the INSERT role is granted. Third, periodic archival to S3 Object Lock (WORM storage) with a legal hold period that exceeds regulatory retention requirements. The reconciliation job re-verifies the hash chain against the S3 archive daily and alerts on any discrepancy.

What is the correct way to handle a refund in this ledger model? Expected depth: Never modify the original transaction. A refund is a new transaction that reverses the original: the DEBIT and CREDIT entries swap sides. The new transaction references the original txn_id in its reference_id field. The balance effect is immediate. The audit trail shows the original transaction followed by the refund transaction as separate, immutable records - this is what accountants and auditors expect. Partial refunds create an entry for the partial amount, not a modification of the original. The idempotency key for the refund must be distinct from the original transaction’s idempotency key.

Premium Content

Unlock the full article along with everything else in the archive — all in one place.

In-depth analysis Expert insights Full archive access
Unlock Full Article