The Frontend Made 42 API Calls Per Page

When REST meets reality - sequential requests create waterfall delays that drown performance

The user clicks “Dashboard” at 2:14 PM. Eight seconds later, the page finally loads. Eight seconds. In 2024. The backend team insists their APIs are lightning fast - each endpoint responds in under 50 milliseconds. They’re not wrong. But the frontend is making 42 separate requests, one after another, like a person asking questions in a meeting but waiting for each answer before asking the next one.

It started innocently enough. The user object needed profile data, so one API call. The profile needed subscription status, so another call. The subscription needed payment history, so a third call. Each feature team built their own endpoints, and the frontend dutifully called them all. What began as clean separation of concerns became a waterfall of dependencies that turns a snappy backend into a sluggish user experience.

Think of it like ordering coffee where you have to visit different counters: first the order counter, wait for receipt, then payment counter, wait for payment confirmation, then brewing counter, wait for brewing to start, then pickup counter. Each step is fast, but the serial process is glacial. This is the API waterfall problem.

Why This Happens

The instinct is to think in terms of resources and endpoints - users have profiles, profiles have subscriptions, subscriptions have payments. This creates a natural dependency chain where each API response contains only the ID needed to make the next request.

REST APIs encourage this pattern through resource-oriented design. A user endpoint returns user_id. A profile endpoint requires user_id and returns subscription_id. A subscription endpoint requires subscription_id and returns payment_history_ids. Each endpoint is clean, focused, and fast - but chaining them together creates sequential dependencies.

Page load starts
  -> GET /users/me (50ms)
    -> GET /profiles/{userId} (45ms)  
      -> GET /subscriptions/{subscriptionId} (60ms)
        -> GET /payments/{subscriptionId} (55ms)
          -> GET /notifications/{userId} (40ms)
            -> Total time: 250ms + network overhead = 800ms+

The problem compounds with network latency. Each API call pays for server processing plus at least one network round trip, and any call that opens a new connection also pays for DNS lookup, the TCP handshake, and TLS negotiation. At 100ms round-trip latency, 42 sequential calls spend 4.2 seconds (42 x 100ms) on the network alone, before any server processing.
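The arithmetic can be checked with a small model. This is a rough sketch rather than a measurement: sequential_ms, rtt_ms, and server_ms are names chosen for illustration, and the model ignores connection setup and payload size.

```python
def sequential_ms(calls: int, rtt_ms: int, server_ms: int = 0) -> int:
    """Total time when each call starts only after the previous one returns."""
    return calls * (rtt_ms + server_ms)


# 42 chained calls at 100ms round-trip latency, network time only
network_only = sequential_ms(calls=42, rtt_ms=100)

# Same chain with an assumed 50ms of server processing per call
with_processing = sequential_ms(calls=42, rtt_ms=100, server_ms=50)

print(network_only)     # 4200
print(with_processing)  # 6300
```

Even with a backend that answers every call in 50ms, the round trips account for two thirds of the total in this model.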

Key Insight

API response time and page load time are different metrics - sequential calls stack end to end, so every link in the chain adds a full round trip to the page load.

The Naive Solution (and where it breaks)

Most teams reach for parallel requests or client-side caching. The thinking is that making calls simultaneously instead of sequentially will reduce the total time.

Parallel requests help when dependencies allow it, but many calls have genuine dependencies. You can’t fetch subscription details until you have the user profile. You can’t load payment history until you have the subscription ID. The dependency chain forces serialization regardless of your parallel execution strategy.

[Figure: Naive approach showing parallel requests still blocked by dependencies]

Client-side caching seems smarter - cache API responses and reuse them across page loads. But caching doesn’t help the initial page load, which is where users form their performance perception. Additionally, cache invalidation becomes complex when you have 42 different endpoints with different update frequencies.

Watch Out

Parallel requests only help when calls are truly independent - dependency chains still serialize.

Small scale: 5 API calls -> parallelization works
Large scale: 42 API calls with dependencies -> still serial
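One way to see why, sketched under the assumption that each call's dependencies are known up front: group the calls into waves, where a wave holds every call whose dependencies have already returned. The number of waves, not the number of calls, sets the minimum number of round trips. The function name request_waves and the dependency map below are illustrative, loosely based on the chain described earlier.

```python
def request_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group calls into waves; every call in a wave can run in parallel."""
    remaining = {name: set(needs) for name, needs in deps.items()}
    done: set[str] = set()
    waves: list[list[str]] = []
    while remaining:
        # A call is ready once everything it depends on has completed.
        ready = sorted(name for name, needs in remaining.items() if needs <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        waves.append(ready)
        done.update(ready)
        for name in ready:
            del remaining[name]
    return waves


# Hypothetical dependency map: each call lists the calls it must wait for.
dashboard_calls = {
    "user": set(),
    "profile": {"user"},
    "notifications": {"user"},
    "subscription": {"profile"},
    "payments": {"subscription"},
}

waves = request_waves(dashboard_calls)
print(waves)
# [['user'], ['notifications', 'profile'], ['subscription'], ['payments']]
print(len(waves))  # 4
```

Five calls still need four round trips here, because only notifications and profile can share a wave. Parallelism shortens the page load only as far as the longest dependency chain allows.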

The Better Solution - Backend for Frontend (BFF)

Here’s what actually fixes this: create a Backend for Frontend service that aggregates all the data needed for a specific page into a single API call. The BFF is like having a personal assistant who gathers all the information you need for a meeting before you arrive, rather than you collecting it piece by piece.

The BFF sits between your frontend and your microservices. It makes all the individual service calls in parallel on the backend, aggregates the responses, and returns a single comprehensive response to the frontend.

// BFF endpoint for dashboard page
app.get('/api/bff/dashboard', async (req, res) => {
  const userId = req.user.id;
  
  try {
    // Make all backend calls in parallel; this assumes every service can be
    // queried by userId, so no call has to wait for another call's result
    const [user, profile, subscription, payments, notifications, preferences] = await Promise.all([
      userService.getUser(userId),
      profileService.getProfile(userId), 
      subscriptionService.getSubscription(userId),
      paymentService.getPaymentHistory(userId),
      notificationService.getNotifications(userId),
      preferenceService.getPreferences(userId)
    ]);
    
    // Aggregate into single response optimized for frontend
    res.json({
      dashboard: {
        user: {
          name: user.name,
          avatar: user.avatar,
          memberSince: user.createdAt
        },
        subscription: {
          plan: subscription.plan,
          status: subscription.status,
          renewsAt: subscription.renewsAt,
          features: subscription.features
        },
        billing: {
          nextPayment: payments.upcoming,
          paymentMethod: payments.defaultMethod,
          invoiceHistory: payments.recent
        },
        activity: {
          unreadCount: notifications.unreadCount,
          recentNotifications: notifications.recent.slice(0, 5)
        }
      }
    });
  } catch (error) {
    // Promise.all rejects as soon as any call fails; without this catch the
    // request would hang instead of returning an error to the client
    res.status(502).json({ error: 'Failed to load dashboard data' });
  }
});

Real World

Netflix uses BFF extensively - their mobile apps make 1-3 API calls per screen instead of dozens of individual service calls.

The Better Solution - GraphQL

For dynamic data requirements, GraphQL provides query-level control over data fetching. Instead of multiple REST endpoints, the client specifies exactly what data it needs in a single query.

# Single GraphQL query replacing 42 REST calls
query DashboardData {
  user {
    name
    avatar
    profile {
      subscription {
        plan
        status
        payments(limit: 5) {
          amount
          date
          method
        }
      }
    }
    notifications(unreadOnly: true) {
      count
      recent(limit: 3) {
        message
        timestamp
      }
    }
  }
}

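GraphQL resolvers handle the backend coordination: sibling fields resolve in parallel against the underlying services, while nested fields resolve once their parent is available. Those dependent hops now happen inside the backend network rather than over the client's connection, so the frontend gets exactly the data it requested in one round trip.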

[Figure: GraphQL approach showing single query replacing multiple REST calls]

GraphQL servers can also batch and cache lookups at the field level. This is not automatic: with a batching layer such as DataLoader in the resolvers, fields that request the same user data share one backend call instead of issuing duplicates.

The Better Solution - Request Batching

For existing REST APIs, implement request batching to send multiple API calls in a single HTTP request. The backend processes all requests in parallel and returns all responses together.

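// Batch API endpoint
app.post('/api/batch', async (req, res) => {
  const { requests } = req.body;
  if (!Array.isArray(requests)) {
    return res.status(400).json({ error: 'requests must be an array' });
  }
  
  // Process all requests in parallel; each one catches its own error,
  // so a single failure never rejects the whole batch
  const responses = await Promise.all(
    requests.map(async (request) => {
      try {
        // processRequest is assumed to dispatch to the matching REST handler
        const result = await processRequest(request);
        return { id: request.id, data: result, error: null };
      } catch (error) {
        return { id: request.id, data: null, error: error.message };
      }
    })
  );
  
  res.json({ responses });
});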

// Frontend usage (the resource IDs must already be known to batch these calls)
const batchResponse = await fetch('/api/batch', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    requests: [
      { id: 'user', method: 'GET', url: '/users/me' },
      { id: 'profile', method: 'GET', url: '/profiles/123' },
      { id: 'subscription', method: 'GET', url: '/subscriptions/456' },
      { id: 'payments', method: 'GET', url: '/payments/789' }
    ]
  })
});
// Each entry carries its own data or error, keyed by the request id
const { responses } = await batchResponse.json();

Key Insight

Request batching preserves REST API design while eliminating network round-trip multiplication.
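Because a batch reply mixes successes and failures, the caller has to sort them out per request. A minimal sketch, assuming the { id, data, error } reply shape used by the batch endpoint above; split_batch_responses and the sample payload are hypothetical.

```python
def split_batch_responses(responses):
    """Index a batch reply by request id, separating successes from failures."""
    data, errors = {}, {}
    for item in responses:
        if item.get("error") is not None:
            errors[item["id"]] = item["error"]
        else:
            data[item["id"]] = item.get("data")
    return data, errors


# Hypothetical reply: one sub-request succeeded, one failed.
sample_reply = [
    {"id": "user", "data": {"name": "Ada"}, "error": None},
    {"id": "payments", "data": None, "error": "payment service timeout"},
]

data, errors = split_batch_responses(sample_reply)
print(data)    # {'user': {'name': 'Ada'}}
print(errors)  # {'payments': 'payment service timeout'}
```

Keeping failures per request lets a screen render the sections that loaded and retry only the ones that did not.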

The Full Architecture

[Figure: Complete BFF architecture with GraphQL, batching, and caching layers]

The complete system combines multiple strategies for different use cases. The BFF layer handles page-specific data aggregation for common screens. GraphQL provides flexible querying for dynamic requirements. Request batching optimizes existing REST endpoints without migration overhead. Edge caching reduces latency for frequently accessed aggregated data.

The frontend chooses an approach per screen. Static page loads use BFF endpoints. Dynamic dashboards use GraphQL. Legacy screens use request batching. The result is one network call per screen instead of dozens, with load time that tracks the slowest backend dependency chain rather than the number of backend services involved, which typically keeps it under a second.

This architecture recognizes that frontend performance is about minimizing network calls, not maximizing backend service granularity. Microservices can remain focused and independent while the BFF layer provides frontend-optimized data aggregation.

Key Insight

The BFF pattern decouples frontend performance from backend service architecture by adding an aggregation layer.

Component Deep Dives

BFF Service

The BFF service’s job is to efficiently aggregate data from multiple backend services while handling failures gracefully. It needs to be faster than making individual calls and more reliable than the sum of its dependencies.

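// BFF with circuit breaker and parallel processing
// Requires imports: context, sync. ServiceClient, CircuitBreaker, and
// DashboardData are assumed to be defined elsewhere.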
type BFFService struct {
    userService    *ServiceClient
    profileService *ServiceClient
    circuitBreaker *CircuitBreaker
}

func (b *BFFService) GetDashboardData(ctx context.Context, userID string) (*DashboardData, error) {
    // Use circuit breaker for each service call
    type serviceCall struct {
        name string
        fn   func() (interface{}, error)
    }
    
    calls := []serviceCall{
        {"user", func() (interface{}, error) { 
            return b.circuitBreaker.Call("user", func() (interface{}, error) {
                return b.userService.GetUser(ctx, userID)
            })
        }},
        {"profile", func() (interface{}, error) {
            return b.circuitBreaker.Call("profile", func() (interface{}, error) {
                return b.profileService.GetProfile(ctx, userID) 
            })
        }},
    }
    
    // Execute all calls concurrently; deadlines come from the caller's ctx
    results := make(map[string]interface{})
    errs := make(map[string]error) // named errs to avoid shadowing the errors package
    
    var (
        mu sync.Mutex // guards results and errs: unsynchronized map writes crash the program
        wg sync.WaitGroup
    )
    for _, call := range calls {
        wg.Add(1)
        go func(c serviceCall) {
            defer wg.Done()
            result, err := c.fn()
            mu.Lock()
            defer mu.Unlock()
            if err != nil {
                errs[c.name] = err
            } else {
                results[c.name] = result
            }
        }(call)
    }
    
    wg.Wait()
    
    // Build response with available data, graceful degradation for failures
    return b.buildDashboardResponse(results, errs), nil
}

The BFF handles partial failures gracefully - if the notification service is down, the dashboard still loads with placeholder notification data rather than failing entirely.
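The same degradation idea can be sketched with asyncio. This is an illustration under stated assumptions, not the service above: fetch_with_fallback, build_dashboard, the 0.5 second timeout, and the stand-in service functions are all hypothetical.

```python
import asyncio


async def fetch_with_fallback(name, call, fallback, timeout_s=0.5):
    """Await one service call; return (name, value, error) and never raise."""
    try:
        value = await asyncio.wait_for(call(), timeout=timeout_s)
        return name, value, None
    except Exception as exc:  # includes timeouts raised by wait_for
        return name, fallback, repr(exc)


async def build_dashboard(calls, fallbacks):
    """Run all calls concurrently and substitute fallbacks for failures."""
    outcomes = await asyncio.gather(
        *(
            fetch_with_fallback(name, call, fallbacks.get(name))
            for name, call in calls.items()
        )
    )
    data = {name: value for name, value, _ in outcomes}
    degraded = sorted(name for name, _, error in outcomes if error is not None)
    return {"data": data, "degraded": degraded}


# Hypothetical stand-ins for real service clients.
async def get_user():
    return {"name": "Ada"}


async def get_notifications():
    raise ConnectionError("notification service unavailable")


result = asyncio.run(
    build_dashboard(
        calls={"user": get_user, "notifications": get_notifications},
        fallbacks={"notifications": {"unreadCount": 0, "recent": []}},
    )
)
print(result["degraded"])  # ['notifications']
```

Returning the list of degraded sections alongside the data lets the frontend show a placeholder or a retry control for exactly those sections.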

GraphQL Resolver Layer

The GraphQL resolver layer’s job is to efficiently resolve nested queries while avoiding the N+1 problem. It uses dataloader patterns to batch related queries automatically.

// GraphQL resolver with dataloader for N+1 prevention
const DataLoader = require('dataloader');

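class GraphQLResolvers {
  // Create one instance per incoming request: DataLoader caches by key,
  // so loaders shared across requests can serve stale or another user's data
  constructor() {
    // Create batched loaders for common access patterns
    this.userLoader = new DataLoader(async (userIds) => {
      const users = await userService.getUsersByIds(userIds);
      // Results must line up with the requested keys, in the same order
      const byId = new Map(users.map(user => [user.id, user]));
      return userIds.map(id => byId.get(id) ?? null);
    });
    
    this.subscriptionLoader = new DataLoader(async (userIds) => {
      const subscriptions = await subscriptionService.getSubscriptionsByUserIds(userIds);
      const byUserId = new Map(subscriptions.map(sub => [sub.userId, sub]));
      return userIds.map(id => byUserId.get(id) ?? null);
    });
  }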
  
  resolvers = {
    User: {
      subscription: (user) => {
        // Uses dataloader to batch subscription queries
        return this.subscriptionLoader.load(user.id);
      },
    },
    
    Subscription: {
      payments: async (subscription, { limit = 10 }) => {
        return paymentService.getPayments(subscription.id, limit);
      }
    },
    
    Query: {
      user: (_, { id }) => {
        return this.userLoader.load(id);
      }
    }
  };
}

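DataLoader collects the load() calls made while one level of the query resolves and passes their keys to a single batch function, which can fetch them in one database query. This prevents the N+1 query problem that often makes naive GraphQL resolvers slower than REST.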
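To show the mechanism rather than the library, here is a minimal Python sketch of the same batching idea. It is not the DataLoader API: BatchLoader and fetch_subscriptions are hypothetical, and the sketch assumes the batch function returns one value per key in the same order.

```python
import asyncio


class BatchLoader:
    """Coalesce load(key) calls made in the same event-loop tick into one batch call."""

    def __init__(self, batch_fn):
        self._batch_fn = batch_fn  # async: list of keys -> list of values, same order
        self._pending = {}         # key -> Future waiting for its value
        self._task = None          # keeps a reference to the running dispatch

    def load(self, key):
        loop = asyncio.get_running_loop()
        if not self._pending:
            # First load of this tick: schedule one dispatch for everything queued.
            loop.call_soon(self._start_dispatch)
        if key not in self._pending:
            self._pending[key] = loop.create_future()
        return self._pending[key]

    def _start_dispatch(self):
        self._task = asyncio.ensure_future(self._dispatch())

    async def _dispatch(self):
        pending, self._pending = self._pending, {}
        keys = list(pending)
        try:
            values = await self._batch_fn(keys)
            for key, value in zip(keys, values):
                pending[key].set_result(value)
        except Exception as exc:
            for future in pending.values():
                if not future.done():
                    future.set_exception(exc)


batch_calls = []


async def fetch_subscriptions(user_ids):
    # Hypothetical bulk lookup; records each batch so the effect is visible.
    batch_calls.append(list(user_ids))
    return [{"userId": uid, "plan": "premium"} for uid in user_ids]


async def main():
    loader = BatchLoader(fetch_subscriptions)
    return await asyncio.gather(loader.load(1), loader.load(2), loader.load(1))


subscriptions = asyncio.run(main())
print(batch_calls)  # [[1, 2]]
```

Three load calls produce one backend call for two distinct keys. Unlike DataLoader, this sketch keeps no cache between batches, so a key requested in a later tick is fetched again.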

Request Batch Processor

The batch processor’s job is to handle multiple REST requests efficiently while maintaining individual request semantics like authentication, validation, and error handling.

# Efficient request batch processing (Python 3.11+ for TaskGroup and asyncio.timeout)
import asyncio
from collections import defaultdict

class BatchProcessor:
    # extract_service_from_url and process_service_batch are assumed to be
    # implemented elsewhere on this class
    def __init__(self, max_batch_size=50, timeout_seconds=10):
        self.max_batch_size = max_batch_size
        self.timeout = timeout_seconds
        
    async def process_batch(self, requests, user_context):
        if len(requests) > self.max_batch_size:
            raise ValueError(f"Batch size {len(requests)} exceeds limit {self.max_batch_size}")
        
        # Group requests by service to optimize backend calls
        requests_by_service = defaultdict(list)
        for request in requests:
            service = self.extract_service_from_url(request['url'])
            requests_by_service[service].append(request)
        
        # Process each service group concurrently, bounded by the batch timeout.
        # process_service_batch should catch per-request errors itself: an
        # exception escaping a task cancels the rest of the group.
        tasks = {}
        async with asyncio.timeout(self.timeout):
            async with asyncio.TaskGroup() as group:
                for service, service_requests in requests_by_service.items():
                    tasks[service] = group.create_task(
                        self.process_service_batch(service, service_requests, user_context)
                    )
        
        # Combine results maintaining original request order
        results = []
        for request in requests:
            service = self.extract_service_from_url(request['url'])
            service_results = tasks[service].result()  # every task has finished here
            results.append(service_results.get(request['id']))
            
        return results

The batch processor groups requests by backend service to minimize the number of individual service calls while maintaining the flexibility of REST API design.

Caching Layer

The caching layer’s job is to reduce redundant backend calls across requests and users while maintaining data consistency. It implements multiple cache levels with different TTLs.

# Redis caching strategy for BFF responses
# User-specific cache (5 minutes)
SETEX user:123:dashboard 300 "{dashboard data}"

# Shared resource cache (1 hour) 
SETEX subscription:plan:premium 3600 "{plan details}"

# Static reference data (24 hours)
SETEX features:all 86400 "{feature list}"

The caching layer uses different TTLs based on data volatility - user-specific data expires after five minutes, shared plan data after an hour, and static reference data after a day.
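The expiry behavior behind those TTLs can be sketched in process memory. This is an illustration only, standing in for Redis: TTLCache, get_or_compute, and the injected clock are hypothetical names, and the fake clock exists so the expiry boundary is visible without waiting.

```python
import time


class TTLCache:
    """In-memory get-or-compute cache with a per-entry time to live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, ttl_seconds, compute):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the backend call
        value = compute()
        self._entries[key] = (now + ttl_seconds, value)
        return value


fake_now = [0.0]
cache = TTLCache(clock=lambda: fake_now[0])
builds = []


def build_dashboard():
    # Stand-in for the aggregated backend calls; records when it ran.
    builds.append(fake_now[0])
    return {"built_at": fake_now[0]}


cache.get_or_compute("user:123:dashboard", 300, build_dashboard)  # miss: computes
fake_now[0] = 299.0
cache.get_or_compute("user:123:dashboard", 300, build_dashboard)  # hit: reuses
fake_now[0] = 300.0
cache.get_or_compute("user:123:dashboard", 300, build_dashboard)  # expired: recomputes
print(len(builds))  # 2
```

Three reads over five minutes trigger two backend aggregations. A shorter TTL trades more backend load for fresher data, which is why the tiers above differ by data type.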

Comparison Table

Approach	Client Requests	Network Latency (illustrative)	Backend Load	Implementation Effort	Flexibility	Best Use Case
Sequential REST	42 requests	4.2s+	High (many calls)	Low	High	Never recommended
Parallel REST	42 requests	800ms+	High (many calls)	Medium	High	Independent requests only
BFF Aggregation	1 request	200ms	Medium (parallel)	Medium	Medium	Page-specific optimization
GraphQL	1 request	150ms	Medium (batched)	High	Very High	Dynamic data requirements
Request Batching	1 request	250ms	Medium (grouped)	Low	High	Legacy REST optimization

GraphQL wins for flexibility and performance, but BFF aggregation provides the best balance of performance improvement and implementation simplicity for most teams.

Key Takeaways

Network round trips are the primary performance bottleneck, not individual API response times
BFF pattern decouples frontend performance from microservice granularity without changing backend architecture
GraphQL enables flexible data fetching, and DataLoader-style batching keeps nested resolvers from triggering N+1 queries
Request batching preserves REST API semantics while eliminating waterfall delays
Parallel processing only helps when API calls are truly independent - dependencies force serialization
Circuit breakers in aggregation layers prevent cascading failures from degrading the entire user experience
Caching strategies must account for different data volatility patterns across aggregated responses
Error handling in aggregation layers should provide graceful degradation rather than all-or-nothing failures

The counterintuitive lesson: more API calls don’t necessarily mean worse performance if they’re made intelligently. The problem isn’t having 42 API endpoints - it’s making 42 sequential network calls to populate a single page. Good architecture separates the logical data model from the network call strategy.

Frequently Asked Questions

Q: Doesn’t the BFF create a new single point of failure?
A: BFF services should be stateless and horizontally scalable. Deploy multiple BFF instances and use circuit breakers to degrade gracefully when backend services fail. The risk is lower than 42 individual API calls, each of which could fail.

Q: How do we handle real-time updates when data is aggregated in the BFF?
A: Use WebSocket connections or Server-Sent Events from the BFF to push updates. The BFF can subscribe to backend service events and push relevant changes to connected frontends, maintaining real-time behavior with aggregated data.

Q: What if different pages need different combinations of the same data?
A: Create page-specific BFF endpoints rather than trying to build a generic aggregation API. Each endpoint should be optimized for its specific use case. Some duplication across endpoints is acceptable for performance.

Q: How do we handle authentication and authorization in BFF services?
A: The BFF authenticates the user and passes user context to backend services. Use service-to-service authentication (JWT, mutual TLS) for BFF-to-service calls. The BFF acts as a trusted intermediary that can make calls on behalf of the authenticated user.

Q: Can we use HTTP/2 multiplexing instead of reducing API calls?
A: HTTP/2 multiplexing helps with connection overhead but doesn’t eliminate the dependency chain problem. You still wait for response A before you can make request B. Multiplexing optimizes independent parallel requests, not sequential dependent requests.

Q: How do we maintain data consistency when aggregating from multiple services?
A: Accept eventual consistency for most use cases - aggregated data reflects a point-in-time snapshot. For strong consistency requirements, use distributed transactions or event sourcing, but these add significant complexity.

Interview Questions

Q: Design a system to reduce page load time from 8 seconds to under 1 second when the page requires data from 42 different microservices.
Expected depth: Discuss BFF patterns, GraphQL, request batching, dependency analysis, parallel processing strategies, and caching approaches. Cover the tradeoffs between aggregation complexity and performance gains.

Q: How would you handle partial failures when aggregating data from 20 backend services in a single BFF call?
Expected depth: Explain circuit breakers, graceful degradation, timeout strategies, fallback data, and user experience considerations when some services are unavailable.

Q: Your GraphQL resolver is making 100+ database queries for a single page load. How do you optimize this?
Expected depth: Discuss the N+1 problem, DataLoader pattern, query batching, database query optimization, and resolver-level caching strategies.

Q: Design an API aggregation strategy that works for both web and mobile clients with different data requirements.
Expected depth: Cover client-specific BFF endpoints, GraphQL schema design, request customization, response payload optimization, and maintaining a single backend service layer.

Q: How do you implement request batching for an existing REST API without breaking existing clients?
Expected depth: Explain versioning strategies, backward compatibility, batch endpoint design, error handling for mixed success/failure scenarios, and migration approaches for existing integrations.
