The Backfill That Changed Business Logic
data-engineering databases reliability
System Design Scenario
The Backfill That Changed Business Logic
Your migration succeeded, but it rewrote financial history using today’s business rules
Friday, 11:47 PM. The migration script finishes successfully. Eight million customer records backfilled. Zero errors logged. All validation checks pass. You deploy to production and go home satisfied that you’ve just fixed years of inconsistent tax calculations.
Monday morning standup. Sarah from Finance raises her hand. “Our quarterly reports are wrong. Revenue numbers don’t match what we filed with the SEC last month. We’re showing $400K more in Q2 than we reported.” She pulls up the comparison. “These orders from March are now showing different tax amounts. Did engineering change something?”
You feel your stomach drop. The backfill script didn’t just fix the tax calculation bug - it recalculated taxes for every historical order using the new tax rules introduced six months ago. Orders from January that were taxed at 8.5% now show 9.25%. International orders that were tax-exempt under the old rules now have VAT applied. Three years of financial history has been silently rewritten to reflect business logic that didn’t exist when those transactions actually occurred.
Like a photocopier that changes every document it scans to use today’s letterhead, your migration has transformed past events to match present rules. The auditors are going to have questions. Finance can’t trust the numbers anymore. You’ve just discovered that in data systems, time travel isn’t just possible - it’s dangerous.
This is the data lineage corruption problem. Historical accuracy versus current correctness creates an impossible choice.
Why This Happens
Data migrations carry an invisible assumption: that fixing data means applying current business logic to past records. The logic seems sound - if the code has a bug, fixing the data means running the corrected code against all affected records. But this conflates two different types of problems.
The smart engineer assumes that data consistency means uniformity across time periods. If tax calculations are wrong, backfill everything with the correct calculation. If commission structures changed, recalculate all historical commissions. The database should reflect “what the numbers should have been” rather than “what they actually were.”
The failure chain is subtle:
Old tax calculation: 8.5% rate
→ Bug discovered: some states missing
→ New calculation: state-specific rates (8.5% to 9.25%)
→ Migration: recalculate ALL historical orders
→ Q1 orders: taxed at 8.5% → now show 9.25%
→ Financial reports: historical quarters change
→ Audit trail: broken, numbers don't match filed reports
→ Compliance: violation, potential legal issues
Business logic changes over time, but historical events occurred under the rules that existed then - backfills that apply current logic create alternate histories.
The Naive Solution (and where it breaks)
Most teams reach for what seems like the obvious fix: run the corrected business logic against all historical data. If the calculation was wrong, recalculate everything. The database should be consistent, and consistency means using the same logic everywhere.
Think of it like editing a novel where you discover the main character’s name was spelled wrong in chapter 3. The natural instinct is to find-and-replace every instance throughout the entire book. But unlike novels, business data exists in real time - chapter 3 happened in March with the original spelling, and changing it retroactively means March never actually happened the way your records now claim.
The naive approach works fine at small scale:
Small scale: 1,000 orders, 6-month history → backfill completes, no one notices discrepancies
Large scale: 8M orders, 3-year history → quarterly reports invalidated, audit failures
The most dangerous backfills are the ones that “succeed” - no errors, all validations pass, but you’ve just rewritten business history using rules that didn’t exist when those events actually occurred.
The Immutable Audit Log Approach
Here’s what actually fixes this: treat historical data as immutable evidence of what happened, not a draft to be revised. When business logic changes, you add new computed columns or derived tables rather than overwriting the original records.
An immutable audit log is like a court stenographer’s record - once an event is transcribed, it never changes, even if the law changes later. New interpretations are added as separate entries, preserving both what happened and how it was understood at different points in time.
-- Original orders table - never modified after creation
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INT,
amount DECIMAL(10,2),
tax_amount DECIMAL(10,2), -- calculated with rules at time of order
created_at TIMESTAMP,
business_logic_version INT -- tracks which tax rules were used
);
-- Tax recalculations table - tracks retroactive adjustments
CREATE TABLE tax_adjustments (
id SERIAL PRIMARY KEY,
order_id INT REFERENCES orders(id),
original_tax DECIMAL(10,2),
adjusted_tax DECIMAL(10,2),
adjustment_reason TEXT,
logic_version_used INT,
calculated_at TIMESTAMP,
applied_by TEXT
);
The key mechanism here is versioned business logic. Every order records which version of the tax calculation was active when it was created. When you discover bugs or need to apply new rules, you create adjustment records instead of modifying original data.
Stripe handles this by never modifying completed transactions - all changes create new “adjustment” objects that reference the original charge, maintaining a complete audit trail of every financial change.
The Event Sourcing Approach
Event sourcing takes immutability further: instead of storing current state, you store the sequence of events that led to that state. When business logic changes, you can replay events using different projection logic without losing the original event history.
Think of event sourcing like a bank statement - the original transactions never change, but you can calculate different views of your balance using different rules (pending vs cleared, different date ranges, etc.) without modifying the underlying transaction history.
// Event store - immutable history
type OrderCreated struct {
OrderID string
CustomerID string
Amount decimal.Decimal
Items []OrderItem
Timestamp time.Time
EventVersion int
}
// Tax calculation projections - can be rebuilt
type TaxProjection struct {
OrderID string
OriginalTax decimal.Decimal // calculated at order time
CurrentTax decimal.Decimal // recalculated with current rules
ProjectionDate time.Time
RulesVersion int
}
// Rebuild tax projections without touching source events
func RebuildTaxProjections(events []OrderCreated, rulesVersion int) {
for _, event := range events {
// Apply current tax rules to historical order data
currentTax := calculateTax(event, rulesVersion)
// Store as new projection, preserve original
projection := TaxProjection{
OrderID: event.OrderID,
CurrentTax: currentTax,
RulesVersion: rulesVersion,
}
projectionStore.Save(projection)
}
}
Event sourcing allows you to maintain multiple views of the same data - historical accuracy for compliance, current calculations for operational decisions, and reconciliation reports showing the differences.
Amazon’s financial systems use event sourcing to handle tax rule changes across different jurisdictions - original orders are never modified, but tax projections can be recalculated for different reporting requirements.
The Migration Safety Pattern
For teams that can’t implement full event sourcing, a migration safety pattern provides a middle ground. Before running any backfill, create a complete snapshot of the affected data. Then apply changes to copies, not originals.
-- Step 1: Create historical snapshot
CREATE TABLE orders_snapshot_20241219 AS
SELECT * FROM orders
WHERE created_at < '2024-12-19';
-- Step 2: Create corrected view (not replacement)
CREATE TABLE orders_corrected AS
SELECT
o.*,
recalculate_tax(o.amount, o.state, o.created_at) as corrected_tax_amount,
'backfill-20241219' as correction_source
FROM orders o;
-- Step 3: Reconciliation report
CREATE VIEW tax_reconciliation AS
SELECT
original.id,
original.tax_amount as original_tax,
corrected.corrected_tax_amount as new_tax,
(corrected.corrected_tax_amount - original.tax_amount) as difference
FROM orders_snapshot_20241219 original
JOIN orders_corrected corrected ON original.id = corrected.id
WHERE original.tax_amount != corrected.corrected_tax_amount;
This approach maintains data lineage - you can trace every change back to its source and understand exactly what was modified and why.
The safest backfills are the ones that add information rather than replacing it - preserve the original state and layer corrections on top.
The Full Architecture
Here’s how the pieces work together in a production system. When an order is created, it generates an immutable event that’s stored permanently. Current business logic processes that event to create operational projections - tax calculations, reporting summaries, analytics aggregations.
When business logic changes, new projections are created alongside the old ones. The original event never changes. Historical projections remain available for compliance and audit purposes. Reconciliation reports show exactly what changed and why.
The data flows like this: events arrive and are stored immutably in the event store. Multiple projection processors consume these events to create different views. Each processor can be updated independently without affecting others. Audit reports compare projections across different logic versions to track changes over time.
The most important design decision is treating the original event as the single source of truth - all other data is derived and therefore disposable and rebuildable.
Component Deep Dives
Event Store
The event store’s job is permanent, immutable storage of business events as they actually occurred. Every event includes metadata about the business logic version that was active when it was processed.
type EventStore struct {
db *sql.DB
}
// Store event with business logic versioning
func (es *EventStore) AppendEvent(event BusinessEvent) error {
eventData := EventWrapper{
ID: uuid.New(),
EventType: event.Type(),
EventData: event.Serialize(),
BusinessVersion: getCurrentBusinessLogicVersion(),
CreatedAt: time.Now().UTC(),
CreatedBy: getCurrentUser(),
}
// Events are append-only, never updated
_, err := es.db.Exec(`
INSERT INTO events (id, event_type, event_data, business_version, created_at, created_by)
VALUES ($1, $2, $3, $4, $5, $6)
`, eventData.ID, eventData.EventType, eventData.EventData,
eventData.BusinessVersion, eventData.CreatedAt, eventData.CreatedBy)
return err
}
The versioning prevents accidental application of future business logic to historical events. Every event carries the context of when and why it was created.
Projection Builder
The projection builder’s job is creating queryable views from the event stream using specific business logic versions. Multiple projections can coexist, each reflecting different rule sets.
class ProjectionBuilder:
def __init__(self, db_connection, business_rules_version):
self.db = db_connection
self.rules_version = business_rules_version
def rebuild_tax_projections(self, from_date=None):
"""Rebuild tax calculations using specific rules version"""
# Get events to process
events = self.get_events_since(from_date)
# Create projection table for this rules version
projection_table = f"tax_projections_v{self.rules_version}"
self.create_projection_table(projection_table)
for event in events:
if event.event_type == 'OrderCreated':
# Apply business rules that were active at rules_version
tax_amount = self.calculate_tax(event.data, self.rules_version)
# Store projection with lineage metadata
self.db.execute(f"""
INSERT INTO {projection_table}
(order_id, tax_amount, calculated_at, rules_version, source_event_id)
VALUES (%s, %s, %s, %s, %s)
""", [event.order_id, tax_amount, datetime.now(),
self.rules_version, event.id])
This allows you to recalculate historical data using different business rules without losing the original calculations.
Audit Trail Generator
The audit trail generator’s job is creating reconciliation reports that show exactly what changed between different projections. This is crucial for compliance and debugging.
-- Audit trail view comparing projections across versions
CREATE VIEW audit_trail AS
SELECT
e.id as event_id,
e.created_at as original_date,
e.business_version as original_rules_version,
old_proj.tax_amount as original_calculation,
new_proj.tax_amount as recalculated_amount,
(new_proj.tax_amount - old_proj.tax_amount) as difference,
CASE
WHEN ABS(new_proj.tax_amount - old_proj.tax_amount) > 0.01
THEN 'MATERIAL_CHANGE'
ELSE 'IMMATERIAL'
END as impact_level
FROM events e
JOIN tax_projections_v1 old_proj ON e.id = old_proj.source_event_id
JOIN tax_projections_v2 new_proj ON e.id = new_proj.source_event_id
WHERE old_proj.tax_amount != new_proj.tax_amount
ORDER BY ABS(new_proj.tax_amount - old_proj.tax_amount) DESC;
The audit trail shows not just what changed, but why it changed and when the original calculation was performed.
Data Lineage Tracker
The lineage tracker’s job is maintaining the complete chain of custody for every piece of data. It records every transformation, recalculation, and projection creation with full traceability.
# Data lineage configuration
lineage_tracking:
enabled: true
track_transformations: true
include_business_logic_versions: true
# Track these operations
tracked_operations:
- projection_creation
- backfill_execution
- business_rule_changes
- data_corrections
# Lineage metadata stored with every operation
metadata_fields:
- source_system
- transformation_logic_version
- execution_timestamp
- executing_user
- business_justification
This ensures that every data point can be traced back to its ultimate source and every transformation can be explained and audited.
Comparison Table
| Approach | Write Complexity | Read Complexity | Storage Cost | Failure Modes | Best Use Case |
|---|---|---|---|---|---|
| Direct modification | Low - single UPDATE | Low - direct queries | Low - no duplication | Historical accuracy lost, audit failures | Never - always dangerous |
| Immutable audit logs | Medium - original + adjustments | Medium - JOINs required | Medium - 2x storage | Adjustment logic bugs | Financial systems, compliance |
| Event sourcing | High - event modeling required | High - projection queries | High - all events stored | Projection rebuild complexity | High-compliance, complex domains |
| Migration safety | Medium - snapshot + corrections | Medium - comparison views | High - multiple copies | Snapshot management overhead | One-time migrations, legacy systems |
| Versioned projections | High - version management | Low - standard queries | Medium - projection per version | Version proliferation | Multi-tenant, evolving business rules |
For most teams dealing with financial or compliance-critical data, immutable audit logs provide the best balance of safety and complexity. Event sourcing is worth the investment for domains with complex business rules that change frequently.
Key Takeaways
• Data immutability: Historical records should reflect what actually happened, not what you wish had happened using current business logic.
• Business logic versioning: Every calculation should be tagged with the version of rules that produced it, enabling accurate reconstruction of historical contexts.
• Audit trail preservation: Changes to derived data must maintain complete lineage back to original sources with justification for every transformation.
• Event sourcing benefits: Storing events rather than state allows unlimited recalculation without losing historical accuracy.
• Migration safety patterns: Always snapshot original data before backfills and create correction views rather than replacements.
• Compliance requirements: Financial and regulatory systems often legally require preserving original calculations as they were computed at transaction time.
The best data systems design for the historian, not just the operator. Your future auditors will thank you for preserving the complete story of how your data evolved, not just its current state. Time travel in databases should add context, not rewrite history.
Frequently Asked Questions
Q: What if the original business logic was clearly wrong and we need to fix historical reports?
A: Create adjustment records that document the correction while preserving the original calculations. Use a pattern like original_amount and adjusted_amount columns with adjustment_reason metadata. This maintains audit trails while providing corrected figures for reporting.
Q: How do you handle event sourcing when events themselves need to be corrected?
A: Never modify events - instead, emit corrective events. If Order #123 had the wrong amount, emit an OrderAmountCorrected event rather than changing the original OrderCreated event. Projections can then apply corrections during rebuild.
Q: What about GDPR right-to-be-forgotten requirements with immutable logs?
A: Use cryptographic erasure - encrypt personal data with user-specific keys and delete the keys instead of the events. The events remain immutable but the personal data becomes permanently unrecoverable.
Q: How do you prevent projection drift between different business logic versions?
A: Implement automated reconciliation jobs that compare projections and alert on unexpected differences. Define tolerance thresholds (e.g., tax differences > $0.01) and flag material variances for manual review.
Q: What’s the performance impact of maintaining multiple projections?
A: Storage costs increase linearly with projection count, but query performance can actually improve since each projection is optimized for specific use cases. The bigger cost is operational complexity - more projections mean more monitoring and maintenance.
Interview Questions
Q: Design a system to handle tax calculation changes across different jurisdictions while maintaining audit compliance.
Expected depth: Discuss event sourcing vs audit logs, versioned business rules, projection strategies, compliance requirements for tax systems, and handling retroactive tax law changes.
Q: How would you implement a safe backfill for correcting 10M financial records without losing historical accuracy?
Expected depth: Cover snapshot strategies, incremental processing, rollback plans, audit trail preservation, reconciliation reporting, and business stakeholder communication.
Q: Your company is acquired and needs to merge two different accounting systems with incompatible business logic. How do you preserve data lineage?
Expected depth: Address schema mapping, business rule reconciliation, dual-projection strategies, migration staging, and regulatory compliance during system consolidation.
Q: Design a system where machine learning models update pricing rules monthly, but historical pricing decisions must remain auditable.
Expected depth: Discuss model versioning, prediction reproducibility, A/B testing preservation, feature drift tracking, and explainable AI requirements for audit purposes.
Q: How do you handle cascading backfills when correcting upstream data affects multiple downstream systems?
Expected depth: Cover dependency graphs, change propagation strategies, consistency boundaries, eventual consistency handling, and cross-system reconciliation protocols.
Want to see how these patterns hold up when traffic spikes 50x at 3 AM? That's exactly what this Premium deep-dive covers.