Pub/Sub Pattern: Broadcasting Events to Multiple Subscribers


Your user service sends a welcome email when a user registers. Then you add analytics tracking. Then loyalty points. Then a Slack notification for the sales team. Now your user service has four direct dependencies. Every new feature that reacts to user registration requires a change to the user service.

Pub/sub inverts this. The user service publishes a “user registered” event. The email service, analytics service, loyalty service, and Slack integration each subscribe to that event independently. The user service does not know or care who is listening. New subscribers can be added without touching the user service.

What pub/sub actually is

Pub/sub (publish-subscribe) is a messaging pattern where:

  • Publishers send messages to a topic (channel) without knowing who will receive them
  • Subscribers receive messages from topics they are interested in, without knowing who sent them
  • A message broker sits between publishers and subscribers, routing messages

The key property: publishers and subscribers are decoupled. They do not know about each other. They only know about the topic.

This is different from a point-to-point queue where one producer sends to one consumer. In pub/sub, one publisher can have many subscribers, and each subscriber gets a copy of every message.

graph LR
subgraph pubsub["Pub/Sub Pattern"]
  PUB["User service
(Publisher)"] -->|"user.registered event"| TOPIC["Topic:
user.registered"]
  TOPIC -->|"copy"| S1["Email service
(Subscriber)"]
  TOPIC -->|"copy"| S2["Analytics service
(Subscriber)"]
  TOPIC -->|"copy"| S3["Loyalty service
(Subscriber)"]
  TOPIC -->|"copy"| S4["Slack integration
(Subscriber)"]
end

style PUB fill:#EEEDFE,stroke:#534AB7,color:#3C3489
style TOPIC fill:#F1EFE8,stroke:#888780,color:#444441
style S1 fill:#E1F5EE,stroke:#0F6E56,color:#085041
style S2 fill:#E1F5EE,stroke:#0F6E56,color:#085041
style S3 fill:#E1F5EE,stroke:#0F6E56,color:#085041
style S4 fill:#E1F5EE,stroke:#0F6E56,color:#085041

Pub/sub vs message queue

Message queue (point-to-point):

  • One producer, one consumer group
  • Each message is processed by exactly one consumer
  • Used for task distribution (work queue)
  • Message is deleted after consumption

Pub/sub:

  • One publisher, multiple subscriber groups
  • Each subscriber group gets a copy of every message
  • Used for event broadcasting
  • Message is retained until all subscribers have consumed it (or until retention expires)

Most message systems support both. Kafka supports both through consumer groups (queue behavior) and multiple consumer groups (pub/sub behavior). AWS SNS is pub/sub. AWS SQS is a queue. They are often used together: SNS fans out to multiple SQS queues.

Pub/sub implementations

Topic-based pub/sub

Subscribers subscribe to named topics. All messages published to a topic are delivered to all subscribers of that topic. Simple and widely used.

Examples: Kafka topics, Google Pub/Sub topics, AWS SNS topics, Redis pub/sub channels.

Content-based pub/sub

Subscribers specify a filter expression. Messages are delivered only if they match the filter. More flexible but more complex.

Example: AWS SNS with message filtering. A subscriber can filter: {"event_type": ["order.shipped", "order.delivered"]}. Only matching messages are delivered.

Hierarchical topics

Topics are organized in a hierarchy. Subscribers can subscribe to a specific topic or a wildcard. orders.* subscribes to all order events. orders.shipped subscribes only to shipped events.

Examples: MQTT (IoT messaging), RabbitMQ topic exchanges.

graph TB
subgraph sns_sqs["AWS SNS + SQS Fan-out Pattern"]
  PUB2["Order service"] -->|"publish"| SNS["SNS Topic
order.created"]
  SNS -->|"fan-out"| SQS1["SQS Queue
Inventory service"]
  SNS -->|"fan-out"| SQS2["SQS Queue
Email service"]
  SNS -->|"fan-out"| SQS3["SQS Queue
Analytics service"]
  SQS1 --> INV["Inventory
workers"]
  SQS2 --> EMAIL["Email
workers"]
  SQS3 --> ANAL["Analytics
workers"]
end

style SNS fill:#EEEDFE,stroke:#534AB7,color:#3C3489
style SQS1 fill:#E1F5EE,stroke:#0F6E56,color:#085041
style SQS2 fill:#E1F5EE,stroke:#0F6E56,color:#085041
style SQS3 fill:#E1F5EE,stroke:#0F6E56,color:#085041

Where it breaks or gets interesting

Event ordering across subscribers

Each subscriber processes events independently. If subscriber A processes event 1 before event 2, subscriber B might process event 2 before event 1. If subscribers need to see events in order, use a message system that guarantees ordering (Kafka with partition keys) and ensure each subscriber processes events from the same partition sequentially.

Subscriber failures

If a subscriber is down, what happens to messages? Options:

  • Durable subscriptions - Messages are stored until the subscriber comes back online (Kafka, Google Pub/Sub, SQS)
  • Non-durable subscriptions - Messages are lost if the subscriber is offline (Redis pub/sub)

For production systems, use durable subscriptions. Redis pub/sub is non-durable - if a subscriber disconnects, it misses messages. Use Redis Streams instead for durable pub/sub.

Fan-out at scale

Publishing one event to 1,000 subscribers requires delivering 1,000 copies. This is fan-out. At high volume, fan-out can overwhelm the message broker or the subscribers.

Solutions: use a message broker designed for fan-out (Kafka, SNS), batch deliveries, or use a tiered approach (SNS fans out to SQS queues, each queue has multiple workers).

The “who owns the schema” problem

In pub/sub, the publisher defines the event schema. Subscribers depend on it. If the publisher changes the schema (renames a field, removes a field), subscribers break.

Solutions: use a schema registry (Confluent Schema Registry, AWS Glue) to enforce compatibility. Use backward-compatible schema evolution (add fields, never remove). Version your events (event_version: 2).

Debugging distributed pub/sub

When something goes wrong, tracing the flow of an event through multiple subscribers is hard. Use distributed tracing (OpenTelemetry) to propagate trace IDs through events. Include the trace ID in every event payload. Subscribers continue the trace when processing.

Real-world systems

Google Pub/Sub - Managed pub/sub service. At-least-once delivery. Push and pull delivery modes. Dead letter topics. Global availability.

AWS SNS + SQS - SNS for fan-out, SQS for durable queues. The standard AWS pattern for pub/sub with reliable delivery.

Kafka - Multiple consumer groups achieve pub/sub semantics. Each consumer group is an independent subscriber. Events are retained for replay.

Redis pub/sub - Simple, fast, non-durable. Good for real-time notifications where message loss is acceptable (live dashboard updates, chat presence).

Redis Streams - Durable pub/sub with consumer groups. Better than Redis pub/sub for production use cases.

MQTT - Lightweight pub/sub protocol for IoT. Hierarchical topics. QoS levels (at-most-once, at-least-once, exactly-once). Used by millions of IoT devices.

How to apply it in practice

When to use pub/sub

Use pub/sub when:

  • Multiple services need to react to the same event
  • You want to decouple the publisher from subscribers
  • New subscribers should be addable without changing the publisher
  • Events should be broadcast to all interested parties

Use a point-to-point queue when:

  • Only one service processes each message
  • You need load balancing across multiple workers
  • You want exactly-one processing semantics

Event design for pub/sub

Good events are:

  • Named clearly - user.registered, order.shipped, not event1, update
  • Self-contained - Include all data subscribers need. Do not require subscribers to call back to the publisher for more data.
  • Versioned - Include a version field for schema evolution
  • Immutable - Events represent things that happened. Do not modify published events.

The SNS + SQS pattern on AWS

For reliable pub/sub on AWS:

  1. Create an SNS topic for each event type
  2. Create an SQS queue for each subscriber service
  3. Subscribe each SQS queue to the SNS topic
  4. Each subscriber service polls its SQS queue
  5. Configure dead letter queues on each SQS queue

This gives you: fan-out (SNS), durable delivery (SQS), independent scaling per subscriber, and dead letter handling.

FAQ

Q: What is the difference between pub/sub and event-driven architecture?

Event-driven architecture (EDA) is a broader architectural style where services communicate through events. Pub/sub is one implementation pattern within EDA. EDA also includes event sourcing (storing state as events), CQRS (separate read and write models), and event streaming (Kafka). Pub/sub is the messaging pattern; EDA is the architectural approach.

Q: How do you handle a subscriber that is much slower than others?

Each subscriber has its own queue (in the SNS + SQS pattern) or its own consumer group (in Kafka). A slow subscriber falls behind without affecting other subscribers. The slow subscriber’s queue grows. Scale the slow subscriber horizontally (add more workers) to increase its throughput. Monitor consumer lag per subscriber and alert when it exceeds a threshold.

Q: Should you use pub/sub for all inter-service communication?

No. Pub/sub is for asynchronous, one-to-many communication. For synchronous request-response (the caller needs the result immediately), use REST or gRPC. For one-to-one async communication (task queue), use a point-to-point queue. Pub/sub is the right choice when multiple services need to react to the same event and the publisher does not need to wait for the result.

Interview questions

Q1: You are building a notification system. When a user’s order ships, they should receive an email, a push notification, and an SMS. How do you design this with pub/sub?

Strong answer: The shipping service publishes an order.shipped event to a topic. Three subscribers consume this event: the email service, the push notification service, and the SMS service. Each subscriber processes the event independently. If the SMS service is down, the email and push notification still go out. When the SMS service recovers, it processes the queued events. Use durable subscriptions (SQS queues in AWS, or Kafka consumer groups) so events are not lost if a subscriber is temporarily down. Each subscriber should be idempotent - if the same event is delivered twice (at-least-once delivery), the user should not receive duplicate notifications. Use the event ID as an idempotency key.

Q2: Your pub/sub system has 50 subscribers for a high-volume topic (1 million events per second). How do you handle the fan-out?

Strong answer: At 1 million events per second with 50 subscribers, you need to deliver 50 million messages per second total. This requires a message broker designed for high fan-out. Use Kafka: the topic has many partitions, each subscriber is a consumer group. Kafka delivers each event to each consumer group independently. The broker does not need to copy the event 50 times - each consumer group reads from the same partition log at its own offset. This is much more efficient than SNS fan-out (which creates 50 copies). For the consumers: each consumer group has enough instances to keep up with the partition count. Monitor consumer lag per group. If a consumer group falls behind, scale it independently without affecting other groups.

Q3: How do you handle schema evolution in a pub/sub system where you cannot update all subscribers simultaneously?

Strong answer: Use backward-compatible schema evolution. When adding a field, make it optional with a default value. Old subscribers ignore the new field. New subscribers use it. Never remove or rename fields without a migration period. Use a schema registry (Confluent Schema Registry) to enforce compatibility rules: backward compatible (new schema can read old messages), forward compatible (old schema can read new messages). Version your events: include a schema_version field. Subscribers check the version and handle accordingly. For breaking changes: publish to a new topic version (orders.v2.shipped) while keeping the old topic active. Migrate subscribers one by one. Deprecate the old topic after all subscribers have migrated. This is the same expand-contract pattern used for API versioning.