A Guide to Microservices Coordination
Deciding Between Orchestration and Choreography for Microservices
Introduction
When coordinating the work of multiple microservices, we have two main approaches to choose from: orchestration and choreography. In this article, I would like to introduce both of them, describe their main characteristics, and explain in which situations each one is the better choice.
What is coordination?
In microservices, we often encounter situations where multiple independently deployed services work together to complete one bigger business request. Consider an e-commerce application. A business request for such an application could be creating an order for the customer. To ensure that an order can be created, we must know that the product exists and is physically available in the warehouse.
In a monolithic application, we could wrap this in a single database transaction to ensure the atomicity of the request: checking product existence, decreasing warehouse availability and creating the order in one go.
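As a rough illustration of the monolithic case, here is a minimal sketch using Python's built-in sqlite3 module. The table layout and the names make_db and create_order are assumptions made for this example only. The point is that a failure while inserting the order also rolls back the stock update.

```python
import sqlite3


def make_db():
    """Create an in-memory database with one product (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, stock INTEGER NOT NULL)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, product_id INTEGER NOT NULL, "
        "quantity INTEGER NOT NULL, customer TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO products (id, stock) VALUES (1, 5)")
    conn.commit()
    return conn


def create_order(conn, product_id, quantity, customer):
    """Check the product, decrease stock and create the order in one transaction."""
    # Used as a context manager, the connection commits if the block succeeds
    # and rolls back if an exception is raised inside it.
    with conn:
        row = conn.execute(
            "SELECT stock FROM products WHERE id = ?", (product_id,)
        ).fetchone()
        if row is None:
            raise LookupError("product not found")
        if row[0] < quantity:
            raise ValueError("not enough stock")
        conn.execute(
            "UPDATE products SET stock = stock - ? WHERE id = ?", (quantity, product_id)
        )
        # If this insert fails (for example, customer is None and violates
        # NOT NULL), the stock update above is rolled back as well.
        cursor = conn.execute(
            "INSERT INTO orders (product_id, quantity, customer) VALUES (?, ?, ?)",
            (product_id, quantity, customer),
        )
        return cursor.lastrowid
```

Calling create_order(conn, 1, 2, None) raises sqlite3.IntegrityError and leaves the stock untouched, which is exactly the guarantee we lose once the data lives in separate services.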
In distributed systems, product, warehouse state, and order data are most likely stored in different databases or on different machines, so a single local database transaction can no longer cover all three steps.
To solve such an issue, we need to coordinate different services that own this data to process business requests and handle potential failures. There are two coordination patterns, orchestration and choreography, and we will take a look at how they could be implemented, what the upsides and downsides are and when we could use them.
Disclaimer: I distinguish "request" from "business request." In microservices, the first type is a request made to one service; depending on the chosen communication medium, it might or might not receive a response. The second type is a request to achieve a business need, and it might consist of several requests to different microservices to bring value to the user or interested parties.
Orchestration
Let's start with the simpler one. Orchestration, as the name suggests, relies on an orchestrator.
The orchestrator is a dedicated microservice that keeps track of business requests, handles the state of the requests that are going through it and handles exceptions from downstream services.
In the image below, we can see that the distributed system consists of four services: orders-orchestrator, orders-service, product-service and warehouse-service. Orders-orchestrator is the upstream service for the rest of them. None of those services depend on one another; they depend only on the orchestrator and their database. This example is simplified, and real services can have dependencies of their own, but it is enough to illustrate the rules.
The first call is fairly simple and does not require any special coordination of the requests; if the product service states that there is no such product, we can easily fail the request and return an HTTP not found response.
However, the tricky part comes when creating an order fails. At this point, the product stock is already reserved and decreased, and the change is stored in the database. Anyone reading from this database might find that there are no more products in stock, and the warehouse-service has no idea that the business request failed at the orders-service stage. But when the orders-service failed, it, of course, returned information about that to the orchestrator. Now, it is up to the orchestrator to send this information to the warehouse-service so that it reverts the action and returns the stock amount to where it was before the operation started. Such an operation is called a compensating transaction.
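The flow described above can be sketched as follows. This is a minimal in-memory sketch, not a production design: the services are plain Python objects instead of networked services, and all method names and status strings (exists, reserve, release, create, "failed_and_compensated") are assumptions made for this example.

```python
class ProductService:
    def __init__(self, product_ids):
        self.product_ids = set(product_ids)

    def exists(self, product_id):
        return product_id in self.product_ids


class WarehouseService:
    def __init__(self, stock):
        self.stock = dict(stock)

    def reserve(self, product_id, quantity):
        if self.stock.get(product_id, 0) < quantity:
            raise RuntimeError("not enough stock")
        self.stock[product_id] -= quantity

    def release(self, product_id, quantity):
        # Compensating transaction: put the reserved stock back.
        self.stock[product_id] += quantity


class OrdersService:
    def __init__(self, failing=False):
        self.failing = failing  # lets us simulate an order creation failure
        self.orders = []

    def create(self, product_id, quantity):
        if self.failing:
            raise RuntimeError("could not create order")
        self.orders.append((product_id, quantity))
        return len(self.orders)  # used as the order id


class OrdersOrchestrator:
    """Calls each service in turn and compensates when a later step fails."""

    def __init__(self, products, warehouse, orders):
        self.products = products
        self.warehouse = warehouse
        self.orders = orders

    def place_order(self, product_id, quantity):
        # Read-only step: nothing to undo if it fails.
        if not self.products.exists(product_id):
            return {"status": "not_found"}
        try:
            self.warehouse.reserve(product_id, quantity)
        except RuntimeError:
            return {"status": "out_of_stock"}
        try:
            order_id = self.orders.create(product_id, quantity)
        except RuntimeError:
            # Stock was already decreased, so the orchestrator reverts it.
            self.warehouse.release(product_id, quantity)
            return {"status": "failed_and_compensated"}
        return {"status": "created", "order_id": order_id}
```

Note that only the orchestrator knows the order of the steps; the warehouse just exposes reserve and release and never learns why it was asked to release.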
Of course, this example is very simple; we can imagine way more complex situations. It is worth noting that reads do not increase complexity too much, but operations that mutate the state of the database do.
The example above is very simple, but in microservices, we can also be faced with coordinating transactions that take much longer. Imagine that, as part of a similar transaction, we would like to email the user when the order leaves the warehouse.
I have said almost nothing about the communication medium between services in the above examples. We can choose from various synchronous or asynchronous communication mediums, which also affect complexity, scalability, responsiveness, and resilience.
Data consistency is another topic besides communication. Keeping data highly consistent between services is a non-trivial task.
In later blog posts, I will discuss communication and consistency, how they affect microservices architecture, and when to choose which solution.
Strengths and weaknesses of orchestration
Strengths of orchestration
Easy to implement: The orchestrator's code closely resembles what would be written in a monolithic application.
Compensating transactions are straightforward: The orchestrator knows which steps have already completed, so it is easier to keep data consistent across the databases of the orchestrated services.
Easier to maintain: Because all changes must be handled at one point of coordination, it's easier to predict the impact of modifications and ensure consistency across the system.
Weaknesses of orchestration
Coupling of services: All services that participate in the orchestrated requests are coupled with the orchestrator, which is coupled with multiple downstream services.
Network overhead: Orchestration requires a full round of request-response from each service to the orchestrator. This could increase the latency of such a request, especially if it is synchronous and the user is waiting for the response.
Single point of failure: If the orchestrator is unavailable, no business request that goes through it can be processed.
Less scalable: Orchestrators always have to scale to handle all requests that are in progress.
Choreography
Now for the tougher one. Choreography is more difficult to implement and a bit harder to grasp. The main difference between the two approaches is the presence or absence of an orchestrator. In choreography, there is no orchestrator. All services must pass the outcome of their work to the downstream service or services. The whole process of serving a business request must be thoughtfully choreographed upfront. While in orchestration we only need to change the orchestrator to change the order in which microservices are called, in choreography we cannot do such a thing. We have to prepare a new choreography for it, just like in real life.
In our example business request, placing an order in an e-commerce application, we now have to couple every service with one upstream service and one downstream service. When a user starts a business request, it first goes to orders-service, because it makes sense to start handling the order creation process there. Next, it goes to product-service with a request to check product existence, specifically for the order creation process. This is also the first difference in implementation: we cannot just implement an endpoint that reads a product by some identifier. We have to state the reason for the call, so that product-service "knows" that it has to send a request to warehouse-service to decrease the stock amount for this product. The same rule applies to warehouse-service: its endpoint should be created specifically for the purpose of creating an order. So, after reserving the required amount of stock for the given product, warehouse-service sends the request to orders-service to create an order for the requesting user. And that's it; at this point, the business request is fulfilled. Below, you can find an example flow of requests for such a process.
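The chain of requests described above can also be sketched in code. This is a simplified in-memory sketch in which a method call stands in for a request between services; the method names (start_order, check_for_order, reserve_for_order, create_order) and the returned status strings are assumptions made for this example. Notice that every service holds a reference to its downstream service and that each method exists specifically for the order creation process.

```python
class OrdersService:
    def __init__(self):
        self.orders = []
        self.product_service = None  # downstream service, wired after construction

    def start_order(self, product_id, quantity, customer):
        # Entry point of the business request: hand it over downstream.
        return self.product_service.check_for_order(product_id, quantity, customer)

    def create_order(self, product_id, quantity, customer):
        # Last step of the chain, called by warehouse-service.
        self.orders.append((product_id, quantity, customer))
        return {"status": "created", "order_id": len(self.orders)}


class ProductService:
    def __init__(self, product_ids, warehouse_service):
        self.product_ids = set(product_ids)
        self.warehouse_service = warehouse_service  # downstream service

    def check_for_order(self, product_id, quantity, customer):
        # Not a generic "read product" endpoint: it knows the request is part
        # of order creation and therefore forwards it to the warehouse.
        if product_id not in self.product_ids:
            return {"status": "not_found"}
        return self.warehouse_service.reserve_for_order(product_id, quantity, customer)


class WarehouseService:
    def __init__(self, stock, orders_service):
        self.stock = dict(stock)
        self.orders_service = orders_service  # downstream service

    def reserve_for_order(self, product_id, quantity, customer):
        if self.stock.get(product_id, 0) < quantity:
            return {"status": "out_of_stock"}
        self.stock[product_id] -= quantity
        return self.orders_service.create_order(product_id, quantity, customer)


def build_services(product_ids, stock):
    """Wire the chain: orders -> product -> warehouse -> orders."""
    orders = OrdersService()
    warehouse = WarehouseService(stock, orders)
    orders.product_service = ProductService(product_ids, warehouse)
    return orders, warehouse
```

Changing the order of the steps here means touching several services and rewiring them, whereas with an orchestrator it would be a change in one place.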
Things get significantly more complex if something fails along the way. The orchestrator was specifically designed to handle such issues, but in choreography, we have to somehow pass information about the failure to every interested upstream service. Let's consider a situation that is a bit more complex than the one above, with two additional services placed between the warehouse service and the orders service. Those two services do not require any compensating transaction on order creation failure. Should we pass the error message upstream through those services? Or should we inform the interested service directly? Or maybe we should inform the first service and let it pass the exception to the services that need to compensate for this error? In the first case, we create unnecessary network overhead and end up with very complex handling. In the second case, we create coupling that otherwise would not be there. The third scenario is an anti-pattern for choreography, as it creates fake orchestration. At this point, I have not even mentioned availability issues: if a service is not available, it cannot send the information that order creation failed. It is then up to its upstream service to send that information on its behalf. This further increases the complexity of the situation.
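To make the first option more tangible, here is a minimal sketch in which the failure travels back upstream through every service in the chain. The two intermediate services are not named in this article, so service-a and service-b are hypothetical stand-ins, and a Python exception stands in for an error response. All class and method names are assumptions made for this example.

```python
class OrderCreationFailed(Exception):
    """Signals that orders-service could not create the order."""


class OrdersService:
    def create_order(self, product_id, quantity):
        # Always fails in this sketch, to show the failure path.
        raise OrderCreationFailed("orders-service could not create the order")


class PassThroughService:
    """Hypothetical intermediate service with nothing to compensate."""

    def __init__(self, name, downstream, log):
        self.name = name
        self.downstream = downstream  # callable of the next service in the chain
        self.log = log

    def handle(self, product_id, quantity):
        try:
            return self.downstream(product_id, quantity)
        except OrderCreationFailed:
            # Nothing to undo here, yet the service still has to relay the error.
            self.log.append(f"{self.name} relayed failure")
            raise


class WarehouseService:
    def __init__(self, stock, downstream, log):
        self.stock = dict(stock)
        self.downstream = downstream
        self.log = log

    def reserve_for_order(self, product_id, quantity):
        self.stock[product_id] -= quantity
        try:
            return self.downstream(product_id, quantity)
        except OrderCreationFailed:
            # Compensating transaction: return the reserved stock.
            self.stock[product_id] += quantity
            self.log.append("warehouse-service released stock")
            raise


def build_chain(stock, log):
    """Wire warehouse -> service-a -> service-b -> orders."""
    orders = OrdersService()
    service_b = PassThroughService("service-b", orders.create_order, log)
    service_a = PassThroughService("service-a", service_b.handle, log)
    return WarehouseService(stock, service_a.handle, log)
```

Two of the three log entries produced by a failed request come from services that had nothing to compensate, which is the unnecessary overhead mentioned above.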
From my experience, it would be best not to handle such cases at all. If we need to make sure that our data is consistent across microservices, choreography might not be the best solution, and it would be better to pick orchestration. As you can see, handling exceptions is harder in the world of choreographed microservices, and the flow is less obvious. Choreography pairs well with asynchronous communication and a loose approach to consistency. Both should be adopted not only as a coordination choice for microservices but also as an approach to the system as a whole.
Strengths and weaknesses of choreography
Strengths of choreography
Scalability: There is no orchestrator which has to scale with everything.
Loosely coupled: Services only need to know about their upstream and downstream services; there is no orchestrator that knows everything and is coupled to everything.
Less network overhead: There is no orchestrator-service ping-pong. Thanks to this, we can expect better latency for the whole business request in choreography.
Weaknesses of choreography
Hard exception handling: Without centralised process management, we must implement exception handling in every service. Properly catching every possibility can be very difficult, and it may even be impossible.
Requires a shift in thinking about systems: Choreography tends to be much more event-driven. Coming from a monolithic application, it might be hard to switch to this approach. Also, to counter the above-mentioned exception-handling weakness, it would be best to embrace eventual consistency, which might be yet another complication for developers.
Harder to implement properly: To implement choreography properly, we should already have a pretty mature system. From my experience, choreography "likes" to be implemented on message brokers rather than HTTP. It's also very often used to scale the system better, so we are assuming that some horizontal scaling is already implemented.
Which one to use and when
When choosing the right coordination approach in microservices, we should ask ourselves a few questions that might help us make the best decision. Does our business request or process require high data consistency? Do we have complex error scenarios that have to be handled? Have we just decided to decompose our monolithic application? Are we, as a company, inexperienced in creating distributed systems?
If the answer to most of the above questions is yes, then you should be good with orchestration.
If it is quite the opposite and the answer to most questions is no, then choreography should work for you.
It is worth noting that for teams inexperienced in the world of eventual consistency and event-driven systems, even if the rest of the answers are no, it still might be good to choose orchestration. This might save you a lot of headaches and night phone calls for on-call developers. Orchestration can do well with most of the things that choreography can do, but nothing will save you if you implement choreography and it turns out that it should not be there. From my experience, choreography works best for pipelines that mostly store data in databases, do some sort of processing, etc. Also, if your services do not mutate data in the database at all and just do some computations and return results in the end, there is no point in choosing orchestration, even if you are not experienced. The whole complexity is connected to data management and consistency.
Final words
The example in this article was very simple, but in microservices, we can also face the challenge of coordinating requests that take much more time. There could be one service that works for a few hours on one request, and we still need to continue coordination from the place where it left off.
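One way to picture "continuing from where it left off" is a coordinator that saves its progress after every completed step. The sketch below is only an illustration under several assumptions: the step names, the NotReadyYet exception, and the dictionary used as a store are all hypothetical, and a real system would use a durable database instead of an in-memory dictionary.

```python
import json

# Hypothetical step names for the order creation process.
STEPS = ["check_product", "reserve_stock", "create_order"]


class NotReadyYet(Exception):
    """Raised by a step whose downstream work has not finished yet."""


def start(store, request_id):
    """Record a new business request with no completed steps."""
    store[request_id] = json.dumps({"next_step": 0, "done": []})


def resume(store, request_id, handlers):
    """Run the remaining steps, saving progress after each completed one."""
    state = json.loads(store[request_id])
    while state["next_step"] < len(STEPS):
        step = STEPS[state["next_step"]]
        try:
            handlers[step]()
        except NotReadyYet:
            break  # stop here; a later call to resume() retries this step
        state["done"].append(step)
        state["next_step"] += 1
        # Persist after every step so a restart does not repeat finished work.
        store[request_id] = json.dumps(state)
    return state
```

Because the state is stored outside the running process, resume can be called again hours later, or after a restart, and it picks up at the first step that has not completed.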
I have said almost nothing about the communication medium between services in the above examples. We can choose from various synchronous or asynchronous mediums for communication, which also affect complexity, scalability, responsiveness, and resilience.
Yours
K