Stateless vs Stateful: The Architecture Decision That Shapes Everything

You deploy your API to 10 servers. A user logs in on server 3. Their session is stored in server 3’s memory. Their next request hits server 7. Server 7 has no session. The user is logged out. You add sticky sessions to the load balancer. Now server 3 handles all traffic from that user. Server 3 gets overloaded. You cannot scale it independently. You cannot restart it without logging out users.

You have a stateful service. And you are discovering why stateless is the default recommendation for application servers.

What state actually means

State is any data that persists between requests and affects how future requests are handled. If you can restart a server and it behaves identically for all future requests, it is stateless. If restarting it changes behavior (because it loses session data, cached computations, or in-progress work), it is stateful.

Stateless service - Each request contains all the information needed to process it. The server does not remember anything between requests. Any instance can handle any request.

Stateful service - The server maintains data between requests. A specific instance must handle requests from a specific client, or state must be synchronized across instances.
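A minimal sketch of the difference, assuming a plain dict as a stand-in for the external store and a round-robin iterator as the load balancer. The class names and the token value are hypothetical, chosen only for illustration.

```python
import itertools


class StatefulServer:
    """Keeps sessions in its own memory, so only this instance knows them."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # in-process state, lost on restart

    def login(self, token, user):
        self.sessions[token] = user

    def whoami(self, token):
        return self.sessions.get(token)  # None means "logged out"


class StatelessServer:
    """Holds no session data; every lookup goes to a shared external store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store  # stand-in for Redis or a database

    def login(self, token, user):
        self.store[token] = user

    def whoami(self, token):
        return self.store.get(token)


def round_robin(servers):
    """Yield servers in rotation, like a load balancer without affinity."""
    return itertools.cycle(servers)


# Stateful: the login lands on one server, the next request on another.
stateful = round_robin([StatefulServer(f"s{i}") for i in range(3)])
next(stateful).login("tok-1", "alice")
stateful_result = next(stateful).whoami("tok-1")

# Stateless: any server can answer because the state is external.
shared_store = {}
stateless = round_robin([StatelessServer(f"s{i}", shared_store) for i in range(3)])
next(stateless).login("tok-1", "alice")
stateless_result = next(stateless).whoami("tok-1")

print(stateful_result, stateless_result)
```

The second request in the stateful case reaches a server that never saw the login, which is the logged-out user from the opening scenario.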

graph TB
subgraph stateless["Stateless - Any Server Can Handle Any Request"]
  LB1["Load Balancer"]
  S1["Server 1"]
  S2["Server 2"]
  S3["Server 3"]
  EXT["External State<br/>Database, Redis, S3"]
  LB1 --> S1
  LB1 --> S2
  LB1 --> S3
  S1 --> EXT
  S2 --> EXT
  S3 --> EXT
end

subgraph stateful["Stateful - Client Bound to Specific Server"]
  LB2["Load Balancer<br/>Sticky Sessions"]
  SS1["Server 1<br/>User A session"]
  SS2["Server 2<br/>User B session"]
  SS3["Server 3<br/>User C session"]
  LB2 -->|"User A always"| SS1
  LB2 -->|"User B always"| SS2
  LB2 -->|"User C always"| SS3
end

style EXT fill:#E1F5EE,stroke:#0F6E56,color:#085041
style LB1 fill:#EEEDFE,stroke:#534AB7,color:#3C3489
style LB2 fill:#FAEEDA,stroke:#854F0B,color:#633806

Why stateless is the default for application servers

Horizontal scaling - Add more instances behind a load balancer. Any instance handles any request. No sticky sessions needed.

Zero-downtime deploys - Rolling deploys work: take one instance out of rotation, deploy, bring it back. No user sessions are lost because sessions are not stored in the instance.

Fault tolerance - If an instance crashes, the load balancer routes to healthy instances. No user is affected because their state is in an external store.

Simplicity - No synchronization between instances. No distributed cache invalidation. No split-brain scenarios.

The key insight: stateless does not mean no state. It means state is stored externally - in a database, Redis, or another dedicated store. The application server is stateless; the state store is stateful.

Where state lives in a stateless architecture

Sessions - Store in Redis or a database. The client sends a session token with every request (cookie or Authorization header). The server looks up the session on each request.

User-specific cache - Store in Redis with a user-scoped key. Any server can read it.

File uploads - Write to S3 or shared object storage. Return a URL. Any server can serve the URL.

Background job state - Store in a database or job queue (Redis, SQS). Any worker can pick up any job.

WebSocket connections - These are inherently stateful (a connection is tied to a specific server). Handle with a pub/sub layer: when a message needs to go to a user, publish it to Redis pub/sub. All servers subscribe. The server holding that user’s connection delivers it.
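The WebSocket pattern can be sketched with an in-memory broker standing in for Redis pub/sub. The `Broker` and `ChatServer` classes and the `"messages"` channel name are assumptions made for this example, and a plain list stands in for each socket.

```python
from collections import defaultdict


class Broker:
    """In-memory stand-in for a pub/sub service such as Redis pub/sub."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        for callback in self.subscribers[channel]:
            callback(message)


class ChatServer:
    """Holds live connections locally and receives every published message."""

    def __init__(self, name, broker):
        self.name = name
        self.broker = broker
        self.connections = {}  # user_id -> list standing in for a socket
        broker.subscribe("messages", self._on_message)

    def connect(self, user_id):
        self.connections[user_id] = []
        return self.connections[user_id]

    def send_to_user(self, user_id, text):
        # The sender does not need to know which server holds the connection.
        self.broker.publish("messages", {"to": user_id, "text": text})

    def _on_message(self, message):
        inbox = self.connections.get(message["to"])
        if inbox is not None:  # only the server holding the connection delivers
            inbox.append(message["text"])


broker = Broker()
server_a = ChatServer("a", broker)
server_b = ChatServer("b", broker)
bob_socket = server_b.connect("bob")
server_a.send_to_user("bob", "hello")
print(bob_socket)
```

Every server sees every message here, which is simple but wasteful at scale. A per-user or per-channel topic would narrow the fan-out.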

When stateful is the right choice

Some services are inherently stateful and fighting it adds complexity without benefit.

Databases - The entire point is to persist state. PostgreSQL, MySQL, Cassandra are all stateful. You scale them with replication and sharding, not by making them stateless.

Caches - Redis, Memcached. Stateful by design. You scale with clustering and replication.

Stream processors - Kafka consumers maintain offset state. Flink and Spark Streaming maintain aggregation state. Making these stateless would require re-reading all historical data on every restart.

Game servers - Real-time game state (player positions, game world) must be in memory for performance. Serializing to a database on every frame is not feasible.

Long-running computations - A video encoding job that takes 2 hours maintains state about its progress. Checkpointing to a database periodically is the right approach, not making it stateless.
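Periodic checkpointing for a long-running job might look like the sketch below. It assumes an in-memory `CheckpointStore` as a stand-in for a durable database, and a hypothetical `encode` function whose "work" is just recording chunk numbers.

```python
class CheckpointStore:
    """Stand-in for a durable store, for example one database row per job."""

    def __init__(self):
        self._data = {}

    def save(self, job_id, state):
        self._data[job_id] = dict(state)

    def load(self, job_id):
        return dict(self._data.get(job_id, {"next_chunk": 0}))


def encode(job_id, total_chunks, store, checkpoint_every=10, crash_at=None):
    """Process chunks in order, resuming from the last saved checkpoint."""
    state = store.load(job_id)
    processed = []
    for chunk in range(state["next_chunk"], total_chunks):
        if chunk == crash_at:
            raise RuntimeError("simulated crash")
        processed.append(chunk)  # the real encoding work would happen here
        if (chunk + 1) % checkpoint_every == 0:
            store.save(job_id, {"next_chunk": chunk + 1})
    store.save(job_id, {"next_chunk": total_chunks})
    return processed


store = CheckpointStore()
try:
    encode("video-1", 100, store, crash_at=37)
except RuntimeError:
    pass  # the worker died; a new worker picks the job up

resumed_from = store.load("video-1")["next_chunk"]
second_run = encode("video-1", 100, store)
print(resumed_from, len(second_run))
```

Chunks processed after the last checkpoint but before the crash are redone on resume, so the per-chunk work needs to be safe to repeat.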

graph LR
subgraph stateless_ok["Make Stateless"]
  API["REST API servers"]
  WEB["Web servers"]
  WORK["Short-lived workers"]
  AUTH["Auth services"]
end

subgraph stateful_ok["Keep Stateful"]
  DB["Databases"]
  CACHE["Caches"]
  STREAM["Stream processors"]
  GAME["Game servers"]
end

subgraph hybrid["Hybrid - Stateless with External State"]
  WS["WebSocket servers<br/>Connection state local<br/>Message state in Redis"]
  SESS["Session servers<br/>No local state<br/>Session in Redis"]
end

style stateless_ok fill:#E1F5EE,stroke:#0F6E56,color:#085041
style stateful_ok fill:#EEEDFE,stroke:#534AB7,color:#3C3489
style hybrid fill:#FAEEDA,stroke:#854F0B,color:#633806

Where it breaks or gets interesting

The external state store becomes the bottleneck

You made your application servers stateless by moving sessions to Redis. Now all 50 application servers hit Redis on every request. Redis becomes the bottleneck and a single point of failure. You have moved the state problem, not eliminated it.

Fix: use Redis Cluster to scale horizontally, and Redis Sentinel (or Redis Cluster with replicas) for high availability. Or use JWT tokens that are self-contained (no Redis lookup needed for validation).

JWT as stateless sessions

JSON Web Tokens (JWTs) are a way to make authentication stateless. The token contains the user’s identity and claims, signed by the server. Any server that holds the verification key (the shared secret, or the public key for asymmetric signatures) can verify the signature without a database lookup.

The tradeoff: you cannot invalidate a JWT before it expires. If a user logs out or their account is suspended, the JWT is still valid until expiry. Solutions: short expiry times (15 minutes) with refresh tokens, or maintain a small revocation list in Redis (which reintroduces some state).

Stateful protocols

Some protocols are inherently stateful. FTP has a control connection and a data connection. SMTP has a conversation with multiple commands. These require the same server to handle the entire conversation. Modern APIs avoid this by using stateless HTTP where each request is independent.

The session affinity trap

Sticky sessions (session affinity) make a stateful service appear to work with multiple instances. But they are a band-aid. The load balancer must track which client goes to which server. If that server goes down, the session is lost. You cannot rebalance load without disrupting sessions. Sticky sessions are a workaround, not a solution. The real fix is to externalize the state.

Actor model for stateful services

The actor model (Erlang, Akka, Orleans) provides a structured way to build stateful distributed services. Each actor is a lightweight stateful entity. The framework handles routing messages to the right actor, distributing actors across nodes, and recovering actors after failures. This is how game servers, real-time collaboration tools, and IoT platforms handle per-entity state at scale.
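The core idea fits in a few lines. This is a single-process sketch with hypothetical `RoomActor` and `ActorSystem` classes. It shows per-entity state and mailbox-based routing only, and leaves out the distribution and failure recovery that a real actor framework provides.

```python
from collections import deque


class RoomActor:
    """One actor per game room; only its own handler touches its state."""

    def __init__(self, room_id):
        self.room_id = room_id
        self.positions = {}  # player -> (x, y)
        self.mailbox = deque()

    def receive(self, message):
        if message["type"] == "move":
            self.positions[message["player"]] = message["pos"]
        elif message["type"] == "leave":
            self.positions.pop(message["player"], None)


class ActorSystem:
    """Routes messages to actors by ID, creating actors on first use."""

    def __init__(self, actor_cls):
        self.actor_cls = actor_cls
        self.actors = {}

    def send(self, actor_id, message):
        actor = self.actors.get(actor_id)
        if actor is None:
            actor = self.actors[actor_id] = self.actor_cls(actor_id)
        actor.mailbox.append(message)  # delivery is asynchronous

    def run_until_idle(self):
        """Drain every mailbox, one message at a time per actor."""
        busy = True
        while busy:
            busy = False
            for actor in list(self.actors.values()):
                while actor.mailbox:
                    actor.receive(actor.mailbox.popleft())
                    busy = True


system = ActorSystem(RoomActor)
system.send("room-1", {"type": "move", "player": "p1", "pos": (0, 0)})
system.send("room-2", {"type": "move", "player": "p9", "pos": (5, 5)})
system.send("room-1", {"type": "move", "player": "p1", "pos": (1, 0)})
system.run_until_idle()
print(system.actors["room-1"].positions)
```

Because each actor processes its mailbox sequentially, its state needs no locks, which is much of the model's appeal for per-entity state.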

Real-world systems

Kubernetes pods - Designed to be ephemeral. Pods can be killed and rescheduled on any node, so stateless workloads fit them naturally. StatefulSets are a separate abstraction for stateful workloads (databases) that need stable network identities and persistent storage.

AWS Lambda - Explicitly stateless. Each invocation may run on a different container. State must be in S3, DynamoDB, or another external service. The execution environment may be reused (warm start) but you cannot rely on it.

Netflix - Application servers are stateless. User preferences, watch history, and recommendations are stored in Cassandra and EVCache (a distributed Memcached layer). Any server can handle any request.

Twitch - Chat is stateful (long-lived client connections). Each chat server holds connections for a subset of channels. An internal pub/sub layer routes messages between servers. When a server restarts, clients reconnect and the load balancer distributes them to healthy servers.

How to apply it in practice

The stateless checklist

Before deploying multiple instances of a service, verify:

No in-process session storage (use Redis or JWT)
No local file writes that other instances need (use S3 or shared NFS)
No in-process caches that must be consistent (use Redis or accept per-instance staleness)
Background jobs are idempotent (multiple instances might process the same job)
Scheduled tasks use distributed locking (only one instance runs the cron job)
WebSocket connections use a pub/sub layer for cross-instance messaging
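Two of these checks, idempotent jobs and distributed locking, can be sketched as follows. `SharedStore` is an in-memory stand-in for an external store with an atomic set-if-absent operation (Redis `SET` with `NX` is one example). The job shape and key names are assumptions.

```python
class SharedStore:
    """In-memory stand-in for an external store shared by all instances."""

    def __init__(self):
        self.data = {}

    def set_if_absent(self, key, value):
        """Atomic in a real store; trivially so in this single-threaded demo."""
        if key in self.data:
            return False
        self.data[key] = value
        return True


def apply_payment(store, job):
    """Idempotent: the write is keyed by job ID, so a retry overwrites itself."""
    store.data[f"payment:{job['id']}"] = job["amount"]


def run_nightly_report(store, instance, run_date, reports):
    """Only the instance that wins the lock for this date runs the task."""
    if not store.set_if_absent(f"lock:report:{run_date}", instance):
        return False
    reports.append((run_date, instance))  # the real task would run here
    return True


store = SharedStore()
job = {"id": "j1", "amount": 25}
apply_payment(store, job)
apply_payment(store, job)  # duplicate delivery leaves the same end state

reports = []
winners = [
    run_nightly_report(store, name, "2024-01-01", reports)
    for name in ("instance-a", "instance-b", "instance-c")
]
print(winners, reports)
```

The lock key includes the run date, so it never needs releasing in this sketch. A reusable lock would need an expiry so a crashed holder does not block everyone else indefinitely.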

Choosing where to store state

Short-lived session data (minutes to hours): Redis with TTL
User data that must survive restarts: database
Large binary data (files, images): object storage (S3)
Computed results that are expensive to recompute: Redis cache with TTL
Audit logs and events: append-only database or event stream

FAQ

Q: Is a stateless service always better than a stateful one?

No. Stateless is better for application servers because it enables horizontal scaling and simplifies operations. But for data stores, caches, and stream processors, stateful is the right model. The goal is not to eliminate state but to put state in the right place - in dedicated, purpose-built stateful systems rather than in application servers.

Q: How do you handle WebSocket connections in a stateless architecture?

WebSocket connections are inherently stateful - a connection is tied to a specific server. The standard approach: use a pub/sub layer (Redis pub/sub, Kafka) for message routing. When server A needs to send a message to a user connected to server B, it publishes to a channel. Server B subscribes to that channel and delivers the message. The application servers are still stateless in the sense that any server can handle any new connection, and the message routing is handled by the pub/sub layer rather than direct server-to-server communication.

Q: What is the performance cost of stateless architecture?

The main cost is the extra network round trip to the external state store on each request. As a rough guide, a Redis lookup on the same network adds around 0.5-2ms and a database session lookup around 5-20ms, though the actual numbers depend on your network and load. For most applications, this is acceptable. For extremely latency-sensitive applications, you can use in-process caching with a short TTL to reduce external lookups, accepting that the cache might be slightly stale. JWT tokens eliminate the session lookup entirely for authentication, at the cost of not being able to invalidate tokens immediately.
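The in-process cache with a short TTL could look like this sketch. `ReadThroughCache` and `fetch_session` are hypothetical names, and the clock is injectable so the expiry behavior can be shown without sleeping.

```python
import time


class ReadThroughCache:
    """Per-instance cache in front of an external store, with a short TTL."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self.fetch = fetch        # function that does the external lookup
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}         # key -> (expires_at, value)

    def get(self, key):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]       # fresh enough: no network round trip
        value = self.fetch(key)   # miss or expired: go to the external store
        self.entries[key] = (now + self.ttl, value)
        return value


external_calls = []


def fetch_session(token):
    """Stand-in for a Redis or database session lookup."""
    external_calls.append(token)
    return {"user": "alice"}


fake_now = [0.0]
cache = ReadThroughCache(fetch_session, ttl_seconds=2.0, clock=lambda: fake_now[0])

cache.get("tok-1")  # external lookup
cache.get("tok-1")  # served from the local cache
fake_now[0] = 3.0   # the TTL has passed
session = cache.get("tok-1")  # external lookup again
print(len(external_calls))
```

The sketch has no size bound or eviction, so a real version would cap the number of entries. A session deleted in the external store also stays visible on this instance for up to the TTL, which is the staleness you accept.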

Interview questions

Q1: You are migrating a monolith to microservices. The monolith stores user sessions in memory. How do you handle this during the migration?

Strong answer: The session storage is the first thing to externalize, before splitting the monolith. Add a Redis session store to the monolith. Change the session middleware to read and write from Redis instead of in-process memory. Deploy multiple instances of the monolith to verify sessions work correctly across instances. Now the monolith is stateless and you can split it into microservices without worrying about session state. Each microservice can validate sessions by reading from the same Redis store, or you can switch to JWT tokens so services do not need to call Redis at all.

Q2: Design a real-time multiplayer game server that needs to handle 100,000 concurrent players.

Strong answer: Game state (player positions, game world) must be in memory for performance - this is inherently stateful. Use the actor model: each game room is an actor with its own state. Distribute actors across a cluster of game servers. Use consistent hashing to route players to the right server for their game room. For persistence, checkpoint game state to a database every few seconds. For player connections, use WebSockets. When a server fails, the actors on that server are restarted on other servers from the last checkpoint. Players reconnect and the game resumes from the checkpoint. The stateful part (game room state) is managed by the actor framework. The connection handling is stateless in the sense that any server can accept a new connection and route it to the right actor.
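The consistent hashing step can be sketched as a hash ring with virtual nodes. The `HashRing` class, the server names, and the choice of 100 virtual nodes per server are assumptions for illustration, not a prescribed design.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() varies between runs)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Maps keys to servers so that membership changes move few keys."""

    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted list of (hash, server)
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.vnodes):
            bisect.insort(self._points, (_hash(f"{server}#{i}"), server))

    def remove(self, server):
        self._points = [point for point in self._points if point[1] != server]

    def lookup(self, key):
        if not self._points:
            raise LookupError("ring is empty")
        # First point clockwise from the key's hash, wrapping at the end.
        index = bisect.bisect_left(self._points, (_hash(key), ""))
        return self._points[index % len(self._points)][1]


ring = HashRing(["game-1", "game-2", "game-3"])
rooms = [f"room-{i}" for i in range(1000)]
before = {room: ring.lookup(room) for room in rooms}

ring.remove("game-2")  # simulate a server failure
after = {room: ring.lookup(room) for room in rooms}
moved = [room for room in rooms if before[room] != after[room]]
print(len(moved))
```

Only rooms that lived on the failed server are reassigned. Rooms on the other servers stay where they are, so their in-memory state is untouched.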

Q3: Your API service is stateless but you are seeing high latency spikes. Investigation shows the Redis session store is the bottleneck. How do you fix it?

Strong answer: Several options depending on the root cause. If Redis is CPU-bound: switch to Redis Cluster to distribute load across multiple Redis nodes. If Redis is network-bound: add a local in-process cache (LRU cache) in front of Redis with a short TTL (1-5 seconds). Most session reads are for the same user making multiple requests in quick succession - the local cache absorbs these. If the session data is large: store only the session ID in Redis and fetch user data from the database lazily. If you want to eliminate Redis entirely for auth: switch to JWT tokens. The API server validates the JWT signature locally without any external call. Revocation is handled by short expiry times (15 minutes) and a refresh token flow. This trades the ability to instantly revoke sessions for auth that needs no network round trip.