Job queues are an essential component of non-trivial distributed applications. They are used for a number of reasons: performing long-running or expensive tasks asynchronously and notifying several systems when something happens in another system are a couple of examples.

Job queues are also a great tool to improve system reliability. In this article I will explore some approaches to using job queues which I have found to benefit systems in terms of reliability.

As with everything, there are trade-offs to using job queues. I will cover them in a follow-up blog post.

Improving System Reliability with Job Queues

Frequently, an application architecture involving a job queue looks like this:

This simple design already provides many possibilities to improve system reliability.

Initial Acceptance Layer

Many applications process requests that trigger an action but do not yield an immediate response, for example: send an email, process a payment, order food. These systems can place a job queue between their API endpoint and the backend that does all the processing. In that case, the input endpoint depends solely on the job queue: it can accept a request as long as the queue is reachable, even if the backend is temporarily down. Since job queues lend themselves quite well to highly available setups, the aforementioned endpoint can become very reliable.
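
As a rough illustration, the acceptance layer could look something like the sketch below. It is only a sketch under assumptions: the standard library `queue.Queue` stands in for a real, highly available job queue, and the names `accept_request` and `process_next_job` are hypothetical.

```python
import queue
import uuid

# Assumption: an in-process queue stands in for a real, highly available job queue.
job_queue = queue.Queue()


def accept_request(payload: dict) -> dict:
    """Hypothetical API handler: enqueue the job and acknowledge immediately."""
    job_id = str(uuid.uuid4())
    job_queue.put({"id": job_id, "payload": payload})
    # 202 Accepted: the request was received but has not been processed yet.
    return {"status": 202, "job_id": job_id}


def process_next_job() -> dict:
    """Hypothetical backend step: runs later, independently of the endpoint."""
    job = job_queue.get()
    # ... send the email, process the payment, place the order ...
    job_queue.task_done()
    return job
```

Note that `accept_request` never talks to the backend. Its only dependency is the queue, which is what makes the endpoint's availability independent of the processing layer.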

Rate Limiting

Job queues are often not meant to act as rate limiters, but they end up fulfilling that role. The processing layer pulls jobs from the queue at the rate it can process them, so the queue acts like the bucket in the leaky bucket rate limiter implementation.

This can keep the whole system from becoming overwhelmed. Furthermore, when needed, some traffic could be prioritised to enable quick processing of critical jobs.
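
The idea can be sketched as follows, assuming an in-process `queue.PriorityQueue` in place of a real job queue. The names `submit` and `drain`, the two priority levels, and the bound of 100 jobs are all illustrative assumptions rather than part of any particular queue product.

```python
import itertools
import queue

# Lower number = higher priority (the ordering PriorityQueue uses).
CRITICAL, NORMAL = 0, 1

# Assumption: the "bucket" is bounded; a full queue rejects new work
# instead of letting it pile up without limit.
jobs = queue.PriorityQueue(maxsize=100)
_sequence = itertools.count()  # tie-breaker keeps FIFO order within a priority


def submit(name: str, priority: int = NORMAL) -> bool:
    """Enqueue a job; return False if the bucket is full."""
    try:
        jobs.put_nowait((priority, next(_sequence), name))
        return True
    except queue.Full:
        return False


def drain(batch_size: int) -> list:
    """Pull at most batch_size jobs, i.e. only what the worker can handle now."""
    taken = []
    for _ in range(batch_size):
        try:
            _priority, _seq, name = jobs.get_nowait()
        except queue.Empty:
            break
        taken.append(name)
    return taken
```

If two normal jobs are submitted before a critical one, the first call to `drain` still returns the critical job first. The producers can burst as much as they like; the processing layer only ever sees `batch_size` jobs at a time.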

Ability to Retry

Another advantage of using job queues is the ability to implement retry mechanisms for failed jobs. Because a failed job can be put back on the queue rather than lost, the processing system can experience critical failures (as long as recovery is quick enough) and pick up right where it left off.
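
A minimal retry loop might look like the sketch below. It assumes an in-process `queue.Queue`, a hypothetical limit of three attempts, and a simple list as a "dead letter" store for jobs that keep failing. A production setup would most likely add a delay or backoff between attempts, which is omitted here.

```python
import queue

MAX_ATTEMPTS = 3  # assumed limit, chosen for illustration

retry_queue = queue.Queue()
dead_letter = []  # jobs that exhausted their retries, kept for inspection


def run_worker(handler) -> None:
    """Process jobs until the queue is empty, re-enqueueing failed ones."""
    while True:
        try:
            job = retry_queue.get_nowait()
        except queue.Empty:
            return
        try:
            handler(job["payload"])
        except Exception:
            job["attempts"] += 1
            if job["attempts"] < MAX_ATTEMPTS:
                retry_queue.put(job)  # put it back so it is tried again
            else:
                dead_letter.append(job)  # give up, but do not lose the job
```

A job is enqueued as `{"payload": ..., "attempts": 0}`. A handler that fails twice and then succeeds is called three times in total, and nothing ends up in `dead_letter`.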

Summary

Utilizing job queues can lead to significantly more reliable applications. As with everything, there are trade-offs to be considered, but the positives tend to outweigh the negatives. I will describe some of these trade-offs in a follow-up article.