API Gateway Patterns: The Front Door to Your Microservices


You have 20 microservices. Each one handles authentication differently. Each one has its own rate limiting logic. Each one logs requests in a different format. Mobile clients make 15 API calls to render one screen. You are duplicating cross-cutting concerns across every service, and your clients are chatty.

An API gateway solves both problems: it centralizes cross-cutting concerns and provides a single entry point that aggregates and transforms requests for clients.

What an API gateway actually does

An API gateway sits between clients and your backend services. Every request from external clients goes through the gateway. The gateway handles:

Routing - Forward requests to the right backend service based on URL path, headers, or other criteria.

Authentication and authorization - Validate tokens, check permissions. Backend services trust requests that have passed through the gateway.

Rate limiting - Enforce per-client, per-endpoint, or global rate limits.

Request/response transformation - Modify headers, translate protocols (REST to gRPC), aggregate multiple service responses.

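TLS (SSL) termination - Decrypt HTTPS at the gateway. Backend services can then communicate over plain HTTP on the internal network, or over mTLS where internal traffic must also be encrypted and authenticated.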

Observability - Centralized logging, metrics, and distributed tracing for all API traffic.

Caching - Cache responses for frequently requested, rarely changing data.

graph TB
subgraph clients["Clients"]
  MOB["Mobile app"]
  WEB["Web browser"]
  EXT["Third-party API"]
end

subgraph gateway["API Gateway"]
  AUTH["Auth validation"]
  RL["Rate limiting"]
  ROUTE["Request routing"]
  TRANS["Protocol translation"]
  LOG["Logging and tracing"]
end

subgraph services["Backend Services"]
  US["User service"]
  OS["Order service"]
  PS["Product service"]
  NS["Notification service"]
end

MOB --> gateway
WEB --> gateway
EXT --> gateway
gateway --> US
gateway --> OS
gateway --> PS
gateway --> NS

style gateway fill:#EEEDFE,stroke:#534AB7,color:#3C3489
style US fill:#E1F5EE,stroke:#0F6E56,color:#085041
style OS fill:#E1F5EE,stroke:#0F6E56,color:#085041
style PS fill:#E1F5EE,stroke:#0F6E56,color:#085041
style NS fill:#E1F5EE,stroke:#0F6E56,color:#085041

Gateway patterns

Simple reverse proxy

The most basic pattern: the gateway forwards requests to backend services based on URL path.

/api/users/* -> user-service:8080
/api/orders/* -> order-service:8081
/api/products/* -> product-service:8082

This is what nginx and Traefik do out of the box. It is simple and fast, but on its own it provides no application-level features such as aggregation or per-client response shaping.
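As a rough illustration of path-based routing, the mapping above can be sketched as a prefix lookup. The `ROUTES` table, the `http://` scheme, and the `resolve_upstream` helper are assumptions made for this example; a real proxy such as nginx or Traefik expresses the same idea in its own configuration format.

```python
from typing import Optional

# Hypothetical route table mirroring the mapping above.
# The http:// scheme is an assumption for illustration.
ROUTES = {
    "/api/users/": "http://user-service:8080",
    "/api/orders/": "http://order-service:8081",
    "/api/products/": "http://product-service:8082",
}


def resolve_upstream(path: str) -> Optional[str]:
    """Return the upstream base URL for a request path, or None if no route matches."""
    # Check longer prefixes first so a more specific route wins if prefixes overlap.
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if path.startswith(prefix):
            return ROUTES[prefix]
    return None
```

A request for `/api/orders/42` resolves to the order service, while an unknown path such as `/health` resolves to `None`, which the gateway would typically turn into a 404.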

Backend for Frontend (BFF)

A BFF is a gateway tailored to a specific client type. Instead of one generic gateway, you have:

  • A BFF for the mobile app (returns compact responses, aggregates data)
  • A BFF for the web app (returns richer responses, different data shape)
  • A BFF for third-party partners (different auth, different rate limits)

Each BFF is owned by the team that owns the client. They can evolve independently.

When to use BFF: when different clients have significantly different data needs, when mobile clients need to minimize round trips, or when you want client teams to own their API layer.
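To make "different data shape" concrete, here is a minimal sketch of two BFFs shaping the same backend record differently. The `PRODUCT` record, its field names, and both view functions are hypothetical and exist only for this example.

```python
# Hypothetical product record, as a backend service might return it.
PRODUCT = {
    "id": 42,
    "name": "Desk lamp",
    "price_cents": 2999,
    "description": "A long marketing description of the lamp.",
    "images": ["lamp-1.jpg", "lamp-2.jpg", "lamp-3.jpg"],
}


def mobile_view(product: dict) -> dict:
    """Compact shape: only the fields a mobile list screen needs."""
    images = product.get("images") or []
    return {
        "id": product["id"],
        "name": product["name"],
        "price_cents": product["price_cents"],
        "thumbnail": images[0] if images else None,
    }


def web_view(product: dict) -> dict:
    """Richer shape: the full record for a web product page."""
    return dict(product)
```

Each function would live in its own BFF, so the mobile team can drop or rename fields without affecting the web response.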

Aggregation gateway

The gateway calls multiple backend services and combines the results into a single response. The client makes one request; the gateway makes several.

GET /home-screen
-> GET /users/123 (user service)
-> GET /orders?user=123&limit=5 (order service)
-> GET /recommendations?user=123 (recommendation service)
-> Combine and return

This reduces client round trips and moves the aggregation logic to the server side.

Tradeoff: The gateway becomes more complex. If one backend service is slow, the entire response is slow. Use parallel requests and timeouts to mitigate this.
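The mitigation above (parallel requests plus timeouts) can be sketched with `asyncio`. The three `fetch_*` functions are stand-ins that simulate backend calls with `asyncio.sleep`; their names, return values, and the timeout are assumptions for illustration, not a real client API.

```python
import asyncio


# Stand-ins for real backend calls (hypothetical; they only simulate latency).
async def fetch_user(user_id: int) -> dict:
    await asyncio.sleep(0.01)
    return {"id": user_id, "name": "example"}


async def fetch_orders(user_id: int) -> list:
    await asyncio.sleep(0.01)
    return [{"order_id": 1}]


async def fetch_recommendations(user_id: int) -> list:
    await asyncio.sleep(0.5)  # simulated slow service
    return ["item"]


async def with_timeout(awaitable, timeout_s: float, fallback):
    """Await a backend call, returning `fallback` if it exceeds the timeout."""
    try:
        return await asyncio.wait_for(awaitable, timeout=timeout_s)
    except asyncio.TimeoutError:
        return fallback


async def home_screen(user_id: int, timeout_s: float = 0.05) -> dict:
    # All three calls run concurrently, so total latency is bounded by
    # the slowest call or the timeout, whichever comes first.
    user, orders, recs = await asyncio.gather(
        with_timeout(fetch_user(user_id), timeout_s, None),
        with_timeout(fetch_orders(user_id), timeout_s, []),
        with_timeout(fetch_recommendations(user_id), timeout_s, []),
    )
    return {"user": user, "recent_orders": orders, "recommendations": recs}
```

Here the slow recommendation call times out and is replaced by an empty list, so the client still receives the user and order data instead of waiting on the slowest dependency.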

Protocol translation

The gateway accepts REST from external clients and translates to gRPC for internal services. It can likewise accept HTTP/1.1 and speak HTTP/2 to backends, or accept JSON and translate to Protobuf.

This lets you use the best protocol for each context: REST for external compatibility, gRPC for internal performance.

graph LR
subgraph bff["BFF Pattern"]
  MOB2["Mobile app"] -->|"REST/JSON<br/>compact"| MBFF["Mobile BFF<br/>Aggregates<br/>Compresses"]
  WEB2["Web app"] -->|"REST/JSON<br/>rich"| WBFF["Web BFF<br/>Full data"]
  MBFF -->|"gRPC"| SVC["Backend<br/>Services"]
  WBFF -->|"gRPC"| SVC
end

style MBFF fill:#EEEDFE,stroke:#534AB7,color:#3C3489
style WBFF fill:#EEEDFE,stroke:#534AB7,color:#3C3489
style SVC fill:#E1F5EE,stroke:#0F6E56,color:#085041

Where it breaks or gets interesting

The gateway as a single point of failure

The gateway handles all traffic. If it goes down, everything goes down. Mitigate with:

  • Multiple gateway instances behind a load balancer
  • Circuit breakers to prevent cascading failures
  • Health checks and automatic failover
  • Graceful degradation (return cached responses when backends are down)

The fat gateway anti-pattern

Gateways should handle cross-cutting concerns, not business logic. If your gateway is making business decisions (calculating prices, applying discounts, validating business rules), you have a fat gateway. Business logic belongs in services.

Signs of a fat gateway:

  • The gateway has its own database
  • The gateway contains domain-specific code
  • Changes to business rules require gateway deployments

Latency addition

Every request through the gateway adds latency: the gateway must receive the request, process it (auth, rate limiting, routing), and forward it. For a well-implemented gateway this overhead is typically on the order of 1-5ms, though it grows with the amount of work done per request. For latency-sensitive applications, this matters.

Optimize: keep gateway logic simple, use efficient auth (JWT validation is fast, database lookups are slow), use connection pooling to backends.

Service mesh vs API gateway

A service mesh (Istio, Linkerd) handles service-to-service communication inside the cluster. An API gateway handles external-to-internal communication. They are complementary, not alternatives.

  • API gateway: North-south traffic (external clients to services)
  • Service mesh: East-west traffic (service to service)

Both can handle auth, rate limiting, and observability, but at different layers.

Real-world systems

AWS API Gateway - Managed gateway service. Integrates with Lambda, ECS, and other AWS services. Supports REST, HTTP, and WebSocket APIs. Built-in auth (Cognito, Lambda authorizers), rate limiting, and caching.

Kong - Open-source API gateway built on nginx (via OpenResty). Plugin architecture for auth, rate limiting, logging, and more. Supports REST and gRPC.

Envoy - High-performance proxy used as the data plane in service meshes and as a standalone gateway. Supports advanced features: circuit breaking, retries, distributed tracing.

nginx - Widely used as a simple reverse proxy and gateway. Fast, configurable, but requires more manual configuration for application-level features.

Traefik - Cloud-native reverse proxy with automatic service discovery. Integrates with Docker, Kubernetes, and Consul. Good for dynamic environments.

GraphQL as a gateway - Some teams use a GraphQL server as their API gateway. The GraphQL server aggregates data from multiple REST or gRPC services. Clients get a unified, flexible API.

How to apply it in practice

What belongs in the gateway

Yes:

  • Authentication (validate JWT, check API keys)
  • Authorization (check if the token has permission for this endpoint)
  • Rate limiting
  • TLS (SSL) termination
  • Request logging and distributed tracing
  • Response caching for public endpoints
  • CORS headers
  • Request/response transformation (add/remove headers)
  • Protocol translation

No:

  • Business logic
  • Database queries (except for auth lookups)
  • Complex data aggregation (use a BFF instead)
  • Service-specific validation

Auth at the gateway

The gateway validates the token (JWT signature check, expiry check). It extracts the user ID and passes it to backend services as a trusted header (X-User-ID: 123). For the header to be trustworthy, the gateway must strip or overwrite any X-User-ID value sent by the client. Backend services then trust this header without re-validating the token.

This centralizes auth logic and reduces latency (no database lookup per service). The tradeoff: if the gateway is compromised, all services are compromised. Use mTLS between the gateway and backend services so that services accept requests only from the gateway and a client cannot reach them directly with a forged header.
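A minimal sketch of the header handling, assuming the token has already been validated and the user ID extracted. The `build_upstream_headers` helper is hypothetical; the point it illustrates is that any client-supplied `X-User-ID` is discarded before the gateway sets its own value.

```python
def build_upstream_headers(client_headers: dict, verified_user_id: str) -> dict:
    """Build the headers forwarded to a backend service.

    Any identity header supplied by the client is dropped (header names are
    case-insensitive), then the gateway sets the value it verified itself.
    """
    headers = {
        name: value
        for name, value in client_headers.items()
        if name.lower() != "x-user-id"
    }
    headers["X-User-ID"] = verified_user_id
    return headers
```

Without the filtering step, a client could send `x-user-id: 999` and have it forwarded alongside, or instead of, the verified value.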

Timeouts and circuit breakers

Configure timeouts for every backend service. If the order service takes more than 500ms, return a timeout error rather than waiting indefinitely. Use circuit breakers to stop sending requests to a failing service and return a fallback response.
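The circuit breaker idea can be sketched as a small state machine: closed (requests pass), open (fail fast with the fallback), and half-open (one trial request after a cool-down). The class name, thresholds, and injectable `clock` below are assumptions chosen for this example, and the sketch is single-threaded; production gateways usually rely on a built-in or library implementation.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock  # injectable so tests can control time
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, fallback):
        """Run func() unless the circuit is open; return fallback on failure."""
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout_s:
                return fallback  # open: fail fast without calling the backend
            # Cool-down elapsed: half-open, let this one request through.
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            return fallback
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

In a gateway, `func` would be the backend call wrapped in its timeout, and `fallback` could be a cached response or a structured error.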

FAQ

Q: Should every microservice have its own API gateway?

No. One gateway (or one per client type with BFF) is the standard. Multiple gateways add complexity without benefit. The exception: if you have completely separate products with different auth systems and different client bases, separate gateways make sense.

Q: What is the difference between an API gateway and a load balancer?

A load balancer distributes traffic across multiple instances of the same service. An API gateway routes traffic to different services based on the request content. A load balancer operates at L4 (TCP) or L7 (HTTP). An API gateway always operates at L7 and adds application-level features (auth, rate limiting, transformation). In practice, you often have both: a load balancer in front of the gateway (for HA), and the gateway routing to services.

Q: How do you handle authentication in a microservices architecture without an API gateway?

Without a gateway, each service must validate tokens independently. This means duplicating auth logic across services, or calling a central auth service on every request (adding latency). An API gateway centralizes this. If you cannot use a gateway, use a shared auth library that all services import, or use a service mesh that handles auth at the infrastructure level (mTLS + SPIFFE/SPIRE for service identity).

Interview questions

Q1: Your mobile app makes 12 API calls to render the home screen. How would you use an API gateway to fix this?

Strong answer: Implement a BFF (Backend for Frontend) for the mobile app. The BFF exposes a single endpoint: GET /mobile/home-screen. The BFF calls the 12 backend services in parallel (or in the optimal order if some depend on others), aggregates the results, and returns a single response tailored to what the mobile app needs. The mobile app makes one request instead of 12. Benefits: fewer round trips (critical on mobile networks), the BFF can optimize the response format for mobile (smaller payloads, different field names), and the mobile team owns the BFF so they can iterate without coordinating with 12 backend teams. The BFF calls backend services via gRPC for low latency.

Q2: How do you implement authentication in an API gateway without adding a database lookup to every request?

Strong answer: Use JWT (JSON Web Tokens). The auth service issues a JWT signed with a private key. The gateway validates the JWT signature using the public key - this is a local cryptographic operation, no database lookup needed. The JWT contains the user ID and permissions as claims. The gateway extracts these and passes them to backend services as trusted headers. The tradeoff: a JWT on its own cannot be revoked before expiry. Mitigate with short expiry times (15 minutes) and refresh tokens. For immediate revocation (account suspension, logout), maintain a small revocation list in Redis. The gateway checks the revocation list only for tokens that are otherwise valid, and that check is a fast in-memory key lookup rather than a database query. This keeps the common case (valid token, not revoked) fast while supporting revocation.
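The validation order described in this answer (signature, then expiry, then revocation list) can be sketched using only the standard library. Two substitutions are made to keep the example self-contained, and both are assumptions: it uses HS256 (a shared secret) instead of the private/public key signing described above, and a Python set stands in for the Redis revocation list, keyed by the token's `jti` claim. In practice a maintained JWT library with an asymmetric algorithm would be the safer choice.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Issue a token (what the auth service would do in this sketch)."""
    header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest()
    return f"{header}.{body}.{b64url_encode(sig)}"


def validate(token: str, secret: bytes, revoked_ids: set, now=None):
    """Return the claims if the token is valid and not revoked, else None."""
    try:
        header_b64, body_b64, sig_b64 = token.split(".")
        header = json.loads(b64url_decode(header_b64))
        if not isinstance(header, dict) or header.get("alg") != "HS256":
            return None
        expected = hmac.new(
            secret, f"{header_b64}.{body_b64}".encode(), hashlib.sha256
        ).digest()
        # Constant-time comparison of the signature.
        if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
            return None
        claims = json.loads(b64url_decode(body_b64))
    except ValueError:  # malformed token, bad base64, or bad JSON
        return None
    if not isinstance(claims, dict):
        return None
    now = time.time() if now is None else now
    if claims.get("exp", 0) <= now:
        return None
    # Revocation is checked last, only for tokens that are otherwise valid.
    if claims.get("jti") in revoked_ids:
        return None
    return claims
```

Because the revocation check runs last, forged or expired tokens are rejected without touching the revocation store at all.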

Q3: Design an API gateway for a multi-tenant SaaS application where different tenants have different rate limits and feature flags.

Strong answer: The gateway needs to identify the tenant from the request (API key, subdomain, or JWT claim). Look up the tenant’s configuration (rate limits, feature flags) from a fast store such as Redis. Apply the tenant-specific rate limit using a token bucket per tenant. Check feature flags to determine which backend services to route to (some tenants might be on a beta feature). Pass the tenant ID to backend services as a trusted header. To avoid a Redis lookup on every request, cache the tenant configuration in the gateway process with a short TTL (for example, 60 seconds), and use a pub/sub mechanism to invalidate the cache when tenant configuration changes. This gives you per-tenant customization without adding significant latency to each request.
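A minimal sketch of the per-tenant token bucket with an in-process configuration cache. The `load_config` callable stands in for the Redis lookup, the config keys `rate_per_s` and `burst` are invented for this example, and `invalidate` is what a pub/sub subscriber would call when a tenant's configuration changes. The sketch is single-process and not thread-safe.

```python
import time


class TokenBucket:
    def __init__(self, rate_per_s: float, capacity: float, clock=time.monotonic):
        self.rate = rate_per_s
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        # Refill according to elapsed time, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class TenantLimiter:
    def __init__(self, load_config, ttl_s: float = 60.0, clock=time.monotonic):
        self.load_config = load_config  # hypothetical: tenant_id -> config dict
        self.ttl_s = ttl_s
        self.clock = clock
        self._configs = {}  # tenant_id -> (fetched_at, config)
        self._buckets = {}  # tenant_id -> (config, TokenBucket)

    def _config(self, tenant_id: str) -> dict:
        cached = self._configs.get(tenant_id)
        if cached is None or self.clock() - cached[0] >= self.ttl_s:
            cached = (self.clock(), self.load_config(tenant_id))
            self._configs[tenant_id] = cached
        return cached[1]

    def invalidate(self, tenant_id: str) -> None:
        """Drop the cached config, e.g. on a pub/sub change notification."""
        self._configs.pop(tenant_id, None)

    def allow(self, tenant_id: str) -> bool:
        config = self._config(tenant_id)
        entry = self._buckets.get(tenant_id)
        # Rebuild the bucket only when the tenant's limits actually changed.
        if entry is None or entry[0] != config:
            bucket = TokenBucket(config["rate_per_s"], config["burst"], self.clock)
            entry = (config, bucket)
            self._buckets[tenant_id] = entry
        return entry[1].allow()
```

With several gateway instances, each process keeps its own buckets, so the effective limit is per instance; a shared counter in the fast store would be needed for a strict global limit.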