Messaging in Distributed Systems (Pub-Sub, Message Queues, etc)

| 7 min read

Introduction

Messaging is a powerful design pattern in distributed systems, allowing us to decouple system components and enforce greater resiliency and reliability in communication between system components. Typically, one of many dedicated messaging technologies (or message-oriented middleware) is used, with common examples being Kafka, RabbitMQ, and managed services like AWS SQS or GCP's Pub-Sub. But messaging is not suitable for all systems and use cases. While it can greatly simplify certain types of distributed systems and solve many problems, using them in the wrong place can over-complicate the system, make it more expensive, and introduce performance bottlenecks. Before using a messaging technology, it is key to understand the tradeoffs it offers.

In this article, we will discuss messaging at a high level, gaining an understanding of what it really is, its capabilities and limitations. While messaging is a broad term, encompassing more specific patterns such as message queuing, publish-subscribe messaging, and data streaming, they share many aspects and key concepts, and we will focus on messaging as a whole. I will discuss those specific patterns and specific technologies in-depth in future articles.

What is a Messaging Service

In distributed systems, a messaging service is a middleware component that handles receiving and sending messages between other components of the system. It is essentially a communication middle man, abstracting away communication between components and simplifying it. It takes messages from one or more system components, and relays it to one or more other components.

| Message Producer (A) | ----> | Message Broker | ----> | Message Consumer (B) |

Why Use Messaging?

On a first look, it might seem like messaging is an unnecessary addition to the system, and we are just placing an unnecessary middle-man when we could establish communication directly. Why not just have the message producer send the message directly to the consumer? This question is valid, and for many cases, that's what you should do if the communication between two components is simple. But a message service will typically do a lot more than merely relay messages, such as persisting messages, retrying failures, solving complicated routing patterns, etc. Let's take a deeper look at what it is capable of.

Example

Before we dive deeper, let us consider an example where using a message broker is useful. Think of this example as you read the rest of the article, as it may help deepen your understanding.

Suppose we have a video sharing platform (similar to TikTok). After the video is uploaded, we want to run the video through our community guideline scanner. We decided to do this asynchronously. So after the video is uploaded, we post a message to our message broker, and eventually, this message gets fed into our community guideline scanner.

This is not the only way to architect this, but we take this approach as an example. We also assume that the community guideline scanner can be a long running or CPU-intensive process.

Video Uploader ----> Message Queue ----> Community Guidelines Scanner

Now let us dive deeper into what a message broker can possibly do for us.

Decoupling System Components

By using a messaging middleware, services A and B are now decoupled, and the contract between them is simplified. Services A and B can each focus on their own tasks, and service A only has to concern itself with sending a message to the message middleware, while service B only has to concern itself with reading or receiving a message. Beyond that, the services are now decoupled and are not concerned with the details or implementation of the other. They only have to agree on the message format or protocol. As per our example, the video uploader is no longer concerned with the Community Guidelines Scanner. It only needs to focus on video uploading and sending the message to the messaging middleware.

"What do you mean they don't have to concern themselves with other details or implementation? I don't use messaging middleware and I have never had to deal with that!". Bear with me, I discuss that in the remaining points.

Asynchronous Workflows

If service A wants to call service B to continue execution, but it can be asynchronous, we acquire many benefits using a message middleware. Service A can just send a message to the message middleware, and now its free to process other requests, while the middleware handles telling B to process the remainder of the workflow asynchronously, or merely stores the message until B asks for it. Service A no longer has to concern itself with whether B has received the message or not, or when it is done processing it. This is delegated to the message middleware.

Following our example, as we said earlier, the video uploader does not have to deal with the community guidelines scanner or even delivering the message to it successfully. We do not have to wait for the long running process. We are immediately free to handle more video uploads again.

Handling Retries

When one service A has to communicate or call another service B, any failures in service B may lead to loss of data or workflow breakages. Without a message middleware, service A would have to be concerned with making sure the delivery of the message to service B was successful, and that service B was executed successfully. It is often preferable to free service A to process other requests, and offload this task to a message middleware to handle it. A message middleware is able to retry delivering a message until it is processed successfully. It is also capable of ensuring that retrying failures does not impact other traffic negatively.

Following the example, we would not want a failure in the guidelines scanner to not be retried. But we also do not want the video uploader to handle that. A message broker will take over handling retries.

Prioritization and Ordering

Some message middlewares allow for prioritizing the consumption of messages based on a specific ordering algorithm. Sometimes, an ordering scheme may be as simple as processing messages in the order they were received, "First In First Out", or something more complex, like sorting by priority assigned by the message producer.

In our example, consider if we implemented prioritization. Maybe we want videos coming from users with a record of guideline violations to have higher priority in being scanned. Our message broker can ensure that higher priority videos will get scanned first.

Routing

A service might be seeking to communicate with many, sometimes an undefined or changing number of receivers. This can complicate the service, and it can be simpler to instead send a message to a message broker and delegate the task of communication to it. The message broker can then handle delivering the message to the right receivers. Receivers can even pull the messages, which prevents from having to keep track of receivers. This means that messaging can greatly simplify communication routing. An almost-identical use case is a receiver wanting to receive messages from an undefined or changing number of senders.

Moreover, when operating at scale, there may be a changing number of instances of a given service as it scales up or down. While there are other solutions to this, a message broker can simplify this task as we just explained.

Deduplication

Message middlewares can often handle deduplicating messages. If the same message was sent multiple times, but it is necessary to process a message only once, deduplication can be enacted. Moreover, a message middleware can ensure that a received message is relayed to only one consumer, and only once (or only until successfully processed once). This way, we prevent duplicate processing of a message.

Following our example, we would not want to waste compute resources scanning the same video twice, since scanning may be expensive.

Handling Traffic Spikes

One use case where message middlewares can be quite helpful is when handling traffic spikes, especially if they're unpredictable and we are working with long-running or intensive processes or asynchronous workflows. By using a message middleware, we can queue and persist the messages until they are processed, without worrying about not having enough instances up to handle our traffic.

Scaling Independently

By decoupling system components, not only have we separated concerns, but we can allow the services to scale independently from each other. One service might be long running and intensive, and having larger scaling requirements.

Persisting Messages for Offline Consumers

Imagine if, in our example, the guidelines scanner went down. We would still want to scan all these videos when it comes back up. A message broker can persist these messages long term, until they are consumed. This is similar to how if you were offline for a while, you would still receive messages on your phone that were sent while you were offline, once you come back online.

Limitations of Messaging Middleware: When NOT To Use Them?

While messaging can be a worthy addition to your system, it is not without its setbacks and limitations. In general, it is rare for an addition to a distributed system to not come with its hurdles. Let us discuss them:

Limitations of Most Additions to a Distributed System

The following applies to many distributed system components. I will not discuss them in detail, as they deserve their own discussion:

  • Over-complicating the system
  • Increasing cost
  • Incurring latency costs: A message has to travel through greater distances and more components
  • Performance costs
  • Introducing extra points of failure that have to be maintained
  • Introducing more paths for the application flow to be kept track of
  • More components for which we need to configure, and unless is a managed service, we have to manage and maintain its infrastructure

Synchronous Workflows

A message middleware is unsuitable for synchronous workflows. It is often the case that a service A invokes a service B synchronously. This could be because service A uses the response from B to continue its processing, or it could be because service A handles the response to a client as part of a request-response cycle.

Message middlewares are great for asynchronous workflows, and often times, it might be beneficial to convert your synchronous workflow into asynchronous,either partially or completely. But this is not always possible, and one has to evaluate the benefits of such conversion

Latency

Although latency increases are common to many distributed system components, many (but not all) message middlewares may incur greater latency costs than others.

In general, a message middleware will often persist the message rather than merely route it. A message received by the middleware may take time to become available for consumption or processing. The middleware has to first persist the message, and a distributed message middleware may have to replicate the persisted message to multiple clusters.

This is why most message middlewares are used in asynchronous workflows.

However, it is important to point out that this does not apply to all message middlewares. Some are designed to handle stream processing at incredibly low latency, but still garner the benefits of the messaging pattern. I will discuss this in a future article about Apache Kafka.

Conclusion

Let us revise our definition of messaging in distributed systems. Recall that a message middleware is a service that acts as a communication middle-man between two or more system components. Other system components can send messages to the messaging service, or receive (or pull) messages from it.

But this was not sufficient to truly grasp the power of messaging. A messaging service abstracts away the complications of communication in distributed systems, and allows system components to simplify communication by only sending or receiving messages. The message middleware can handle retrying failures, persisting messages, and simplify routing communication to multiple receivers or instances of a receiver. It can allow the system to scale better, be more reliable, and prevent error states.

Before using a message broker, remember that they are not always trivial to setup and maintain, and they may be adding performance bottlenecks to your system. Make sure to consider the tradeoffs before putting in the effort!