The Backfill That Changed Business Logic


data-engineering databases reliability

System Design Scenario

The Backfill That Changed Business Logic

Your migration succeeded, but it rewrote financial history using today’s business rules

⏱ 12 min read📐 Intermediate🔒 Data Engineering

Friday, 11:47 PM. The migration script finishes successfully. Eight million customer records backfilled. Zero errors logged. All validation checks pass. You deploy to production and go home satisfied that you’ve just fixed years of inconsistent tax calculations.

Monday morning standup. Sarah from Finance raises her hand. “Our quarterly reports are wrong. Revenue numbers don’t match what we filed with the SEC last month. We’re showing $400K more in Q2 than we reported.” She pulls up the comparison. “These orders from March are now showing different tax amounts. Did engineering change something?”

You feel your stomach drop. The backfill script didn’t just fix the tax calculation bug - it recalculated taxes for every historical order using the new tax rules introduced six months ago. Orders from January that were taxed at 8.5% now show 9.25%. International orders that were tax-exempt under the old rules now have VAT applied. Three years of financial history has been silently rewritten to reflect business logic that didn’t exist when those transactions actually occurred.

Like a photocopier that changes every document it scans to use today’s letterhead, your migration has transformed past events to match present rules. The auditors are going to have questions. Finance can’t trust the numbers anymore. You’ve just discovered that in data systems, time travel isn’t just possible - it’s dangerous.

This is the data lineage corruption problem. Historical accuracy versus current correctness creates an impossible choice.

Why This Happens

Data migrations carry an invisible assumption: that fixing data means applying current business logic to past records. The logic seems sound - if the code has a bug, fixing the data means running the corrected code against all affected records. But this conflates two different types of problems.

The smart engineer assumes that data consistency means uniformity across time periods. If tax calculations are wrong, backfill everything with the correct calculation. If commission structures changed, recalculate all historical commissions. The database should reflect “what the numbers should have been” rather than “what they actually were.”

The failure chain is subtle:

Old tax calculation: 8.5% rate
  → Bug discovered: some states missing
    → New calculation: state-specific rates (8.5% to 9.25%)
      → Migration: recalculate ALL historical orders
        → Q1 orders: taxed at 8.5% → now show 9.25%
          → Financial reports: historical quarters change
            → Audit trail: broken, numbers don't match filed reports
              → Compliance: violation, potential legal issues
Key Insight

Business logic changes over time, but historical events occurred under the rules that existed then - backfills that apply current logic create alternate histories.

The Naive Solution (and where it breaks)

Most teams reach for what seems like the obvious fix: run the corrected business logic against all historical data. If the calculation was wrong, recalculate everything. The database should be consistent, and consistency means using the same logic everywhere.

Think of it like editing a novel where you discover the main character’s name was spelled wrong in chapter 3. The natural instinct is to find-and-replace every instance throughout the entire book. But unlike novels, business data exists in real time - chapter 3 happened in March with the original spelling, and changing it retroactively means March never actually happened the way your records now claim.

Naive backfill approach showing historical data being overwritten with new business logic

The naive approach works fine at small scale:

Small scale: 1,000 orders, 6-month history → backfill completes, no one notices discrepancies
Large scale: 8M orders, 3-year history → quarterly reports invalidated, audit failures
Watch Out

The most dangerous backfills are the ones that “succeed” - no errors, all validations pass, but you’ve just rewritten business history using rules that didn’t exist when those events actually occurred.

The Immutable Audit Log Approach

Here’s what actually fixes this: treat historical data as immutable evidence of what happened, not a draft to be revised. When business logic changes, you add new computed columns or derived tables rather than overwriting the original records.

An immutable audit log is like a court stenographer’s record - once an event is transcribed, it never changes, even if the law changes later. New interpretations are added as separate entries, preserving both what happened and how it was understood at different points in time.

-- Original orders table - never modified after creation
CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  customer_id INT,
  amount DECIMAL(10,2),
  tax_amount DECIMAL(10,2), -- calculated with rules at time of order
  created_at TIMESTAMP,
  business_logic_version INT -- tracks which tax rules were used
);

-- Tax recalculations table - tracks retroactive adjustments
CREATE TABLE tax_adjustments (
  id SERIAL PRIMARY KEY,
  order_id INT REFERENCES orders(id),
  original_tax DECIMAL(10,2),
  adjusted_tax DECIMAL(10,2),
  adjustment_reason TEXT,
  logic_version_used INT,
  calculated_at TIMESTAMP,
  applied_by TEXT
);

The key mechanism here is versioned business logic. Every order records which version of the tax calculation was active when it was created. When you discover bugs or need to apply new rules, you create adjustment records instead of modifying original data.

Immutable audit log approach showing original data preserved with separate adjustment records
Real World

Stripe handles this by never modifying completed transactions - all changes create new “adjustment” objects that reference the original charge, maintaining a complete audit trail of every financial change.

The Event Sourcing Approach

Event sourcing takes immutability further: instead of storing current state, you store the sequence of events that led to that state. When business logic changes, you can replay events using different projection logic without losing the original event history.

Think of event sourcing like a bank statement - the original transactions never change, but you can calculate different views of your balance using different rules (pending vs cleared, different date ranges, etc.) without modifying the underlying transaction history.

// Event store - immutable history
type OrderCreated struct {
    OrderID      string
    CustomerID   string
    Amount       decimal.Decimal
    Items        []OrderItem
    Timestamp    time.Time
    EventVersion int
}

// Tax calculation projections - can be rebuilt
type TaxProjection struct {
    OrderID        string
    OriginalTax    decimal.Decimal  // calculated at order time
    CurrentTax     decimal.Decimal  // recalculated with current rules
    ProjectionDate time.Time
    RulesVersion   int
}

// Rebuild tax projections without touching source events
func RebuildTaxProjections(events []OrderCreated, rulesVersion int) {
    for _, event := range events {
        // Apply current tax rules to historical order data
        currentTax := calculateTax(event, rulesVersion)
        
        // Store as new projection, preserve original
        projection := TaxProjection{
            OrderID:      event.OrderID,
            CurrentTax:   currentTax,
            RulesVersion: rulesVersion,
        }
        projectionStore.Save(projection)
    }
}

Event sourcing allows you to maintain multiple views of the same data - historical accuracy for compliance, current calculations for operational decisions, and reconciliation reports showing the differences.

Real World

Amazon’s financial systems use event sourcing to handle tax rule changes across different jurisdictions - original orders are never modified, but tax projections can be recalculated for different reporting requirements.

The Migration Safety Pattern

For teams that can’t implement full event sourcing, a migration safety pattern provides a middle ground. Before running any backfill, create a complete snapshot of the affected data. Then apply changes to copies, not originals.

-- Step 1: Create historical snapshot
CREATE TABLE orders_snapshot_20241219 AS 
SELECT * FROM orders 
WHERE created_at < '2024-12-19';

-- Step 2: Create corrected view (not replacement)
CREATE TABLE orders_corrected AS
SELECT 
  o.*,
  recalculate_tax(o.amount, o.state, o.created_at) as corrected_tax_amount,
  'backfill-20241219' as correction_source
FROM orders o;

-- Step 3: Reconciliation report
CREATE VIEW tax_reconciliation AS
SELECT 
  original.id,
  original.tax_amount as original_tax,
  corrected.corrected_tax_amount as new_tax,
  (corrected.corrected_tax_amount - original.tax_amount) as difference
FROM orders_snapshot_20241219 original
JOIN orders_corrected corrected ON original.id = corrected.id
WHERE original.tax_amount != corrected.corrected_tax_amount;

This approach maintains data lineage - you can trace every change back to its source and understand exactly what was modified and why.

Migration safety pattern showing original data preserved with corrected views and reconciliation reports
Key Insight

The safest backfills are the ones that add information rather than replacing it - preserve the original state and layer corrections on top.

The Full Architecture

Complete data lineage architecture showing event store, immutable records, versioned projections, and audit trails

Here’s how the pieces work together in a production system. When an order is created, it generates an immutable event that’s stored permanently. Current business logic processes that event to create operational projections - tax calculations, reporting summaries, analytics aggregations.

When business logic changes, new projections are created alongside the old ones. The original event never changes. Historical projections remain available for compliance and audit purposes. Reconciliation reports show exactly what changed and why.

The data flows like this: events arrive and are stored immutably in the event store. Multiple projection processors consume these events to create different views. Each processor can be updated independently without affecting others. Audit reports compare projections across different logic versions to track changes over time.

Key Insight

The most important design decision is treating the original event as the single source of truth - all other data is derived and therefore disposable and rebuildable.

Component Deep Dives

Event Store

The event store’s job is permanent, immutable storage of business events as they actually occurred. Every event includes metadata about the business logic version that was active when it was processed.

type EventStore struct {
    db *sql.DB
}

// Store event with business logic versioning
func (es *EventStore) AppendEvent(event BusinessEvent) error {
    eventData := EventWrapper{
        ID:               uuid.New(),
        EventType:        event.Type(),
        EventData:        event.Serialize(),
        BusinessVersion:  getCurrentBusinessLogicVersion(),
        CreatedAt:        time.Now().UTC(),
        CreatedBy:        getCurrentUser(),
    }
    
    // Events are append-only, never updated
    _, err := es.db.Exec(`
        INSERT INTO events (id, event_type, event_data, business_version, created_at, created_by)
        VALUES ($1, $2, $3, $4, $5, $6)
    `, eventData.ID, eventData.EventType, eventData.EventData, 
       eventData.BusinessVersion, eventData.CreatedAt, eventData.CreatedBy)
    
    return err
}

The versioning prevents accidental application of future business logic to historical events. Every event carries the context of when and why it was created.

Projection Builder

The projection builder’s job is creating queryable views from the event stream using specific business logic versions. Multiple projections can coexist, each reflecting different rule sets.

class ProjectionBuilder:
    def __init__(self, db_connection, business_rules_version):
        self.db = db_connection
        self.rules_version = business_rules_version
        
    def rebuild_tax_projections(self, from_date=None):
        """Rebuild tax calculations using specific rules version"""
        
        # Get events to process
        events = self.get_events_since(from_date)
        
        # Create projection table for this rules version
        projection_table = f"tax_projections_v{self.rules_version}"
        self.create_projection_table(projection_table)
        
        for event in events:
            if event.event_type == 'OrderCreated':
                # Apply business rules that were active at rules_version
                tax_amount = self.calculate_tax(event.data, self.rules_version)
                
                # Store projection with lineage metadata
                self.db.execute(f"""
                    INSERT INTO {projection_table} 
                    (order_id, tax_amount, calculated_at, rules_version, source_event_id)
                    VALUES (%s, %s, %s, %s, %s)
                """, [event.order_id, tax_amount, datetime.now(), 
                      self.rules_version, event.id])

This allows you to recalculate historical data using different business rules without losing the original calculations.

Audit Trail Generator

The audit trail generator’s job is creating reconciliation reports that show exactly what changed between different projections. This is crucial for compliance and debugging.

-- Audit trail view comparing projections across versions
CREATE VIEW audit_trail AS
SELECT 
  e.id as event_id,
  e.created_at as original_date,
  e.business_version as original_rules_version,
  old_proj.tax_amount as original_calculation,
  new_proj.tax_amount as recalculated_amount,
  (new_proj.tax_amount - old_proj.tax_amount) as difference,
  CASE 
    WHEN ABS(new_proj.tax_amount - old_proj.tax_amount) > 0.01 
    THEN 'MATERIAL_CHANGE'
    ELSE 'IMMATERIAL'
  END as impact_level
FROM events e
JOIN tax_projections_v1 old_proj ON e.id = old_proj.source_event_id  
JOIN tax_projections_v2 new_proj ON e.id = new_proj.source_event_id
WHERE old_proj.tax_amount != new_proj.tax_amount
ORDER BY ABS(new_proj.tax_amount - old_proj.tax_amount) DESC;

The audit trail shows not just what changed, but why it changed and when the original calculation was performed.

Data Lineage Tracker

The lineage tracker’s job is maintaining the complete chain of custody for every piece of data. It records every transformation, recalculation, and projection creation with full traceability.

# Data lineage configuration
lineage_tracking:
  enabled: true
  track_transformations: true
  include_business_logic_versions: true
  
  # Track these operations
  tracked_operations:
    - projection_creation
    - backfill_execution  
    - business_rule_changes
    - data_corrections
    
  # Lineage metadata stored with every operation
  metadata_fields:
    - source_system
    - transformation_logic_version
    - execution_timestamp
    - executing_user
    - business_justification

This ensures that every data point can be traced back to its ultimate source and every transformation can be explained and audited.

Comparison Table

ApproachWrite ComplexityRead ComplexityStorage CostFailure ModesBest Use Case
Direct modificationLow - single UPDATELow - direct queriesLow - no duplicationHistorical accuracy lost, audit failuresNever - always dangerous
Immutable audit logsMedium - original + adjustmentsMedium - JOINs requiredMedium - 2x storageAdjustment logic bugsFinancial systems, compliance
Event sourcingHigh - event modeling requiredHigh - projection queriesHigh - all events storedProjection rebuild complexityHigh-compliance, complex domains
Migration safetyMedium - snapshot + correctionsMedium - comparison viewsHigh - multiple copiesSnapshot management overheadOne-time migrations, legacy systems
Versioned projectionsHigh - version managementLow - standard queriesMedium - projection per versionVersion proliferationMulti-tenant, evolving business rules

For most teams dealing with financial or compliance-critical data, immutable audit logs provide the best balance of safety and complexity. Event sourcing is worth the investment for domains with complex business rules that change frequently.

Key Takeaways

Data immutability: Historical records should reflect what actually happened, not what you wish had happened using current business logic.

Business logic versioning: Every calculation should be tagged with the version of rules that produced it, enabling accurate reconstruction of historical contexts.

Audit trail preservation: Changes to derived data must maintain complete lineage back to original sources with justification for every transformation.

Event sourcing benefits: Storing events rather than state allows unlimited recalculation without losing historical accuracy.

Migration safety patterns: Always snapshot original data before backfills and create correction views rather than replacements.

Compliance requirements: Financial and regulatory systems often legally require preserving original calculations as they were computed at transaction time.

The best data systems design for the historian, not just the operator. Your future auditors will thank you for preserving the complete story of how your data evolved, not just its current state. Time travel in databases should add context, not rewrite history.

Frequently Asked Questions

Q: What if the original business logic was clearly wrong and we need to fix historical reports?

A: Create adjustment records that document the correction while preserving the original calculations. Use a pattern like original_amount and adjusted_amount columns with adjustment_reason metadata. This maintains audit trails while providing corrected figures for reporting.

Q: How do you handle event sourcing when events themselves need to be corrected?

A: Never modify events - instead, emit corrective events. If Order #123 had the wrong amount, emit an OrderAmountCorrected event rather than changing the original OrderCreated event. Projections can then apply corrections during rebuild.

Q: What about GDPR right-to-be-forgotten requirements with immutable logs?

A: Use cryptographic erasure - encrypt personal data with user-specific keys and delete the keys instead of the events. The events remain immutable but the personal data becomes permanently unrecoverable.

Q: How do you prevent projection drift between different business logic versions?

A: Implement automated reconciliation jobs that compare projections and alert on unexpected differences. Define tolerance thresholds (e.g., tax differences > $0.01) and flag material variances for manual review.

Q: What’s the performance impact of maintaining multiple projections?

A: Storage costs increase linearly with projection count, but query performance can actually improve since each projection is optimized for specific use cases. The bigger cost is operational complexity - more projections mean more monitoring and maintenance.

Interview Questions

Q: Design a system to handle tax calculation changes across different jurisdictions while maintaining audit compliance.

Expected depth: Discuss event sourcing vs audit logs, versioned business rules, projection strategies, compliance requirements for tax systems, and handling retroactive tax law changes.

Q: How would you implement a safe backfill for correcting 10M financial records without losing historical accuracy?

Expected depth: Cover snapshot strategies, incremental processing, rollback plans, audit trail preservation, reconciliation reporting, and business stakeholder communication.

Q: Your company is acquired and needs to merge two different accounting systems with incompatible business logic. How do you preserve data lineage?

Expected depth: Address schema mapping, business rule reconciliation, dual-projection strategies, migration staging, and regulatory compliance during system consolidation.

Q: Design a system where machine learning models update pricing rules monthly, but historical pricing decisions must remain auditable.

Expected depth: Discuss model versioning, prediction reproducibility, A/B testing preservation, feature drift tracking, and explainable AI requirements for audit purposes.

Q: How do you handle cascading backfills when correcting upstream data affects multiple downstream systems?

Expected depth: Cover dependency graphs, change propagation strategies, consistency boundaries, eventual consistency handling, and cross-system reconciliation protocols.

Continue Learning

Want to see how these patterns hold up when traffic spikes 50x at 3 AM? That's exactly what this Premium deep-dive covers.