Pub/Sub Pattern: Broadcasting Events to Multiple Subscribers
Your user service sends a welcome email when a user registers. Then you add analytics tracking. Then loyalty points. Then a Slack notification for the sales team. Now your user service has four direct dependencies. Every new feature that reacts to user registration requires a change to the user service.
Pub/sub inverts this. The user service publishes a “user registered” event. The email service, analytics service, loyalty service, and Slack integration each subscribe to that event independently. The user service does not know or care who is listening. New subscribers can be added without touching the user service.
What pub/sub actually is
Pub/sub (publish-subscribe) is a messaging pattern where:
- Publishers send messages to a topic (channel) without knowing who will receive them
- Subscribers receive messages from topics they are interested in, without knowing who sent them
- A message broker sits between publishers and subscribers, routing messages
The key property: publishers and subscribers are decoupled. They do not know about each other. They only know about the topic.
This is different from a point-to-point queue where one producer sends to one consumer. In pub/sub, one publisher can have many subscribers, and each subscriber gets a copy of every message.
graph LR subgraph pubsub["Pub/Sub Pattern"] PUB["User service (Publisher)"] -->|"user.registered event"| TOPIC["Topic: user.registered"] TOPIC -->|"copy"| S1["Email service (Subscriber)"] TOPIC -->|"copy"| S2["Analytics service (Subscriber)"] TOPIC -->|"copy"| S3["Loyalty service (Subscriber)"] TOPIC -->|"copy"| S4["Slack integration (Subscriber)"] end style PUB fill:#EEEDFE,stroke:#534AB7,color:#3C3489 style TOPIC fill:#F1EFE8,stroke:#888780,color:#444441 style S1 fill:#E1F5EE,stroke:#0F6E56,color:#085041 style S2 fill:#E1F5EE,stroke:#0F6E56,color:#085041 style S3 fill:#E1F5EE,stroke:#0F6E56,color:#085041 style S4 fill:#E1F5EE,stroke:#0F6E56,color:#085041
Pub/sub vs message queue
Message queue (point-to-point):
- One producer, one consumer group
- Each message is processed by exactly one consumer
- Used for task distribution (work queue)
- Message is deleted after consumption
Pub/sub:
- One publisher, multiple subscriber groups
- Each subscriber group gets a copy of every message
- Used for event broadcasting
- Message is retained until all subscribers have consumed it (or until retention expires)
Most message systems support both. Kafka supports both through consumer groups (queue behavior) and multiple consumer groups (pub/sub behavior). AWS SNS is pub/sub. AWS SQS is a queue. They are often used together: SNS fans out to multiple SQS queues.
Pub/sub implementations
Topic-based pub/sub
Subscribers subscribe to named topics. All messages published to a topic are delivered to all subscribers of that topic. Simple and widely used.
Examples: Kafka topics, Google Pub/Sub topics, AWS SNS topics, Redis pub/sub channels.
Content-based pub/sub
Subscribers specify a filter expression. Messages are delivered only if they match the filter. More flexible but more complex.
Example: AWS SNS with message filtering. A subscriber can filter: {"event_type": ["order.shipped", "order.delivered"]}. Only matching messages are delivered.
Hierarchical topics
Topics are organized in a hierarchy. Subscribers can subscribe to a specific topic or a wildcard. orders.* subscribes to all order events. orders.shipped subscribes only to shipped events.
Examples: MQTT (IoT messaging), RabbitMQ topic exchanges.
graph TB subgraph sns_sqs["AWS SNS + SQS Fan-out Pattern"] PUB2["Order service"] -->|"publish"| SNS["SNS Topic order.created"] SNS -->|"fan-out"| SQS1["SQS Queue Inventory service"] SNS -->|"fan-out"| SQS2["SQS Queue Email service"] SNS -->|"fan-out"| SQS3["SQS Queue Analytics service"] SQS1 --> INV["Inventory workers"] SQS2 --> EMAIL["Email workers"] SQS3 --> ANAL["Analytics workers"] end style SNS fill:#EEEDFE,stroke:#534AB7,color:#3C3489 style SQS1 fill:#E1F5EE,stroke:#0F6E56,color:#085041 style SQS2 fill:#E1F5EE,stroke:#0F6E56,color:#085041 style SQS3 fill:#E1F5EE,stroke:#0F6E56,color:#085041
Where it breaks or gets interesting
Event ordering across subscribers
Each subscriber processes events independently. If subscriber A processes event 1 before event 2, subscriber B might process event 2 before event 1. If subscribers need to see events in order, use a message system that guarantees ordering (Kafka with partition keys) and ensure each subscriber processes events from the same partition sequentially.
Subscriber failures
If a subscriber is down, what happens to messages? Options:
- Durable subscriptions - Messages are stored until the subscriber comes back online (Kafka, Google Pub/Sub, SQS)
- Non-durable subscriptions - Messages are lost if the subscriber is offline (Redis pub/sub)
For production systems, use durable subscriptions. Redis pub/sub is non-durable - if a subscriber disconnects, it misses messages. Use Redis Streams instead for durable pub/sub.
Fan-out at scale
Publishing one event to 1,000 subscribers requires delivering 1,000 copies. This is fan-out. At high volume, fan-out can overwhelm the message broker or the subscribers.
Solutions: use a message broker designed for fan-out (Kafka, SNS), batch deliveries, or use a tiered approach (SNS fans out to SQS queues, each queue has multiple workers).
The “who owns the schema” problem
In pub/sub, the publisher defines the event schema. Subscribers depend on it. If the publisher changes the schema (renames a field, removes a field), subscribers break.
Solutions: use a schema registry (Confluent Schema Registry, AWS Glue) to enforce compatibility. Use backward-compatible schema evolution (add fields, never remove). Version your events (event_version: 2).
Debugging distributed pub/sub
When something goes wrong, tracing the flow of an event through multiple subscribers is hard. Use distributed tracing (OpenTelemetry) to propagate trace IDs through events. Include the trace ID in every event payload. Subscribers continue the trace when processing.
Real-world systems
Google Pub/Sub - Managed pub/sub service. At-least-once delivery. Push and pull delivery modes. Dead letter topics. Global availability.
AWS SNS + SQS - SNS for fan-out, SQS for durable queues. The standard AWS pattern for pub/sub with reliable delivery.
Kafka - Multiple consumer groups achieve pub/sub semantics. Each consumer group is an independent subscriber. Events are retained for replay.
Redis pub/sub - Simple, fast, non-durable. Good for real-time notifications where message loss is acceptable (live dashboard updates, chat presence).
Redis Streams - Durable pub/sub with consumer groups. Better than Redis pub/sub for production use cases.
MQTT - Lightweight pub/sub protocol for IoT. Hierarchical topics. QoS levels (at-most-once, at-least-once, exactly-once). Used by millions of IoT devices.
How to apply it in practice
When to use pub/sub
Use pub/sub when:
- Multiple services need to react to the same event
- You want to decouple the publisher from subscribers
- New subscribers should be addable without changing the publisher
- Events should be broadcast to all interested parties
Use a point-to-point queue when:
- Only one service processes each message
- You need load balancing across multiple workers
- You want exactly-one processing semantics
Event design for pub/sub
Good events are:
- Named clearly -
user.registered,order.shipped, notevent1,update - Self-contained - Include all data subscribers need. Do not require subscribers to call back to the publisher for more data.
- Versioned - Include a version field for schema evolution
- Immutable - Events represent things that happened. Do not modify published events.
The SNS + SQS pattern on AWS
For reliable pub/sub on AWS:
- Create an SNS topic for each event type
- Create an SQS queue for each subscriber service
- Subscribe each SQS queue to the SNS topic
- Each subscriber service polls its SQS queue
- Configure dead letter queues on each SQS queue
This gives you: fan-out (SNS), durable delivery (SQS), independent scaling per subscriber, and dead letter handling.
FAQ
Q: What is the difference between pub/sub and event-driven architecture?
Event-driven architecture (EDA) is a broader architectural style where services communicate through events. Pub/sub is one implementation pattern within EDA. EDA also includes event sourcing (storing state as events), CQRS (separate read and write models), and event streaming (Kafka). Pub/sub is the messaging pattern; EDA is the architectural approach.
Q: How do you handle a subscriber that is much slower than others?
Each subscriber has its own queue (in the SNS + SQS pattern) or its own consumer group (in Kafka). A slow subscriber falls behind without affecting other subscribers. The slow subscriber’s queue grows. Scale the slow subscriber horizontally (add more workers) to increase its throughput. Monitor consumer lag per subscriber and alert when it exceeds a threshold.
Q: Should you use pub/sub for all inter-service communication?
No. Pub/sub is for asynchronous, one-to-many communication. For synchronous request-response (the caller needs the result immediately), use REST or gRPC. For one-to-one async communication (task queue), use a point-to-point queue. Pub/sub is the right choice when multiple services need to react to the same event and the publisher does not need to wait for the result.
Interview questions
Q1: You are building a notification system. When a user’s order ships, they should receive an email, a push notification, and an SMS. How do you design this with pub/sub?
Strong answer: The shipping service publishes an order.shipped event to a topic. Three subscribers consume this event: the email service, the push notification service, and the SMS service. Each subscriber processes the event independently. If the SMS service is down, the email and push notification still go out. When the SMS service recovers, it processes the queued events. Use durable subscriptions (SQS queues in AWS, or Kafka consumer groups) so events are not lost if a subscriber is temporarily down. Each subscriber should be idempotent - if the same event is delivered twice (at-least-once delivery), the user should not receive duplicate notifications. Use the event ID as an idempotency key.
Q2: Your pub/sub system has 50 subscribers for a high-volume topic (1 million events per second). How do you handle the fan-out?
Strong answer: At 1 million events per second with 50 subscribers, you need to deliver 50 million messages per second total. This requires a message broker designed for high fan-out. Use Kafka: the topic has many partitions, each subscriber is a consumer group. Kafka delivers each event to each consumer group independently. The broker does not need to copy the event 50 times - each consumer group reads from the same partition log at its own offset. This is much more efficient than SNS fan-out (which creates 50 copies). For the consumers: each consumer group has enough instances to keep up with the partition count. Monitor consumer lag per group. If a consumer group falls behind, scale it independently without affecting other groups.
Q3: How do you handle schema evolution in a pub/sub system where you cannot update all subscribers simultaneously?
Strong answer: Use backward-compatible schema evolution. When adding a field, make it optional with a default value. Old subscribers ignore the new field. New subscribers use it. Never remove or rename fields without a migration period. Use a schema registry (Confluent Schema Registry) to enforce compatibility rules: backward compatible (new schema can read old messages), forward compatible (old schema can read new messages). Version your events: include a schema_version field. Subscribers check the version and handle accordingly. For breaking changes: publish to a new topic version (orders.v2.shipped) while keeping the old topic active. Migrate subscribers one by one. Deprecate the old topic after all subscribers have migrated. This is the same expand-contract pattern used for API versioning.