Many of you have probably heard of Microsoft Workflow. It has been heavily promoted, and Microsoft publishes some attractive diagrams.
The technology is designed to let software developers build an API while business analysts assemble business processes on their own, without intermediaries. For example, suppose a client requests the following business process:
The business analyst then draws it, and the developers only have to implement the "Accept" and "Reject" procedures and other similar items.
Sounds cool, doesn't it?
Unfortunately, at first I knew the technology only from books, marketing publications, and, of course, from posts like "We first looked at Microsoft WF six months ago and have been using it for a month. So far so good!" Now, however, I work on a product that has had Workflow built in for six years. The product carries a heavy process load, so I would like to show you the main difficulties of using this library.
How Does It Work?
The idea is very simple: the software developer creates Activity items. Each Activity may have parameters. For example, it is possible to create a RejectActivity with a User property, which usually means the Reject applies to that specific User. Essentially, each Activity has an external representation (the way a business analyst will see it) and an implementation.
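To make the idea concrete, here is a minimal, language-agnostic sketch in Python (the real API is .NET; the class and method names below mirror the article's example but are otherwise illustrative): each Activity carries its parameters and separates its external description from its implementation.

```python
class Activity:
    """Base class: an activity has a display name (what the analyst sees
    in the designer) and an execute() implementation (what the developer writes)."""
    display_name = "Activity"

    def execute(self, context):
        raise NotImplementedError

class RejectActivity(Activity):
    display_name = "Reject"

    def __init__(self, user):
        # Parameter that the business analyst can set on the graph node.
        self.user = user

    def execute(self, context):
        # The developer's implementation: reject on behalf of this user.
        context["rejected_by"] = self.user
        return "rejected"

# A "graph" is just activities connected by arrows; running one node:
ctx = {}
result = RejectActivity(user="alice").execute(ctx)
```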
When the business analyst creates a graph (draws a set of activities connected by arrows), it can be saved as a XAML file. The file can then be loaded into the Microsoft Workflow Runtime and executed.
It is possible to pause some Activities (for instance, the Delay Activity). In this case, the workflow's state is serialized to a database. After the time we specify, the process wakes up and continues. To save state, we need a database, or we have to implement a Persistence Service ourselves. Everything sounds awesome, right?
Sometimes, the business process needs to be paused. A typical example: we wait for a user response (i.e., we use an Event Activity). In this case, the state is serialized normally (XML serialization for Workflow 4.0+, binary for the older versions). The reader will appreciate how easy it is to pick up some extra state, or to make a small mistake, when release A saves the state and release B loads it. A typical example from Workflow 3.0: you subscribe to an Event using a lambda or an anonymous method. Doing so creates a new class field, which gets saved to the database, and the load then crashes during deserialization. Hence some important advice: keep all operational code strictly outside your Activities. Store all fields somewhere away from the Workflow, and save only a minimum of the simplest types in an Activity. It is better to sacrifice internal design than stability.
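The advice above can be sketched as follows (Python, purely illustrative; the store and class names are assumptions): the Activity keeps only a primitive key, and everything else lives in an external store that survives redeployments independently of workflow serialization.

```python
import pickle

# External store owned by the application, not by the workflow engine.
ORDER_STORE = {"order-42": {"user": "alice", "comment": "duplicate request"}}

class FragileApproveActivity:
    def __init__(self, order):
        # BAD: the whole object (and anything it drags in, like a closure
        # captured by an event subscription) gets serialized with the workflow.
        self.order = order

class RobustApproveActivity:
    def __init__(self, order_id):
        # GOOD: only a primitive key is serialized; the data is looked up
        # in the external store at execution time.
        self.order_id = order_id

    def execute(self):
        return ORDER_STORE[self.order_id]["user"]

blob = pickle.dumps(RobustApproveActivity("order-42"))  # tiny, stable payload
restored = pickle.loads(blob)
user = restored.execute()
```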
In fact, the problems don't stop there. The most interesting part begins when we have to change the set of fields. For example, suppose we need to add a Reason to the RejectActivity from the example above. The code must be ready for the fact that old serialized RejectActivity instances don't contain this field. For Workflow 4.0+, you have to rewrite the serialized representation in the database for all in-flight processes, and for Workflow 3.0 even that doesn't always work (since a compressed binary representation is stored). That is why it is impossible to roll out changes to the document workflow quickly.
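The versioning hazard can be illustrated like this (Python sketch; the field names follow the example, and the defaulting strategy is one possible workaround, not the library's behavior): state saved by release A has no Reason field, so release B's loading code must tolerate its absence.

```python
import json

# Release A serialized the activity state without a "reason" field.
old_payload = json.dumps({"type": "RejectActivity", "user": "alice"})

def load_reject_activity(payload):
    state = json.loads(payload)
    # Release B added "reason"; old instances in the database don't have it,
    # so the loading code must supply a default instead of crashing.
    return {
        "user": state["user"],
        "reason": state.get("reason", "<not recorded>"),
    }

activity = load_reject_activity(old_payload)
```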
Microsoft Workflow has a number of other shortcomings. Problems show up both in one-off execution (a single sequence of operations is inefficient) and in load distribution. Let's take them one at a time.
When Do We Need to Save?
Let's imagine we have running business processes with zero-length delays (it doesn't matter how they came to be; what matters is that they exist). At every Delay Activity, even a zero-length one, Workflow serializes and saves the instance, then re-loads the scheme. Naturally, response time suffers the most. Advice: use Delay Activity as sparingly as possible, and ideally together with an If Activity that checks whether there is any actual need to wait. Another bad point: if you said to "wait for 5 days", you won't easily be able to stop the Workflow from waiting once the instance is already in the database. Moreover, believe it or not, if the Workflow Runtime loads your workflow into memory and sees that it will have to wait for a long time, it simply keeps it in memory and waits. Additional advice: don't use long delays, or you will hit exactly these problems. It is better to use several short waits, or to wake up from an external Event, with an external service waking the process when required.
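Both pieces of advice can be sketched together (Python, illustrative; the `should_wait` callback and the interval values are assumptions, not WF API): check whether waiting is needed at all, and split one long delay into short, interruptible ones.

```python
import datetime

def wait_plan(total, step, should_wait):
    """Instead of one long Delay, emit short delays and re-check the
    condition before each one, so an external event can cut the wait short."""
    waited = datetime.timedelta()
    steps = []
    while waited < total and should_wait():
        chunk = min(step, total - waited)
        steps.append(chunk)          # in real WF this would be a short Delay
        waited += chunk
    return steps

# If the condition is already false, there is no wait at all (If + Delay):
no_wait = wait_plan(datetime.timedelta(days=5),
                    datetime.timedelta(hours=1),
                    should_wait=lambda: False)

# Otherwise a 5-day wait becomes 120 one-hour waits instead of one long sleep:
short_waits = wait_plan(datetime.timedelta(days=5),
                        datetime.timedelta(hours=1),
                        should_wait=lambda: True)
```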
If we believe those very same articles from Microsoft, Workflow distributes work brilliantly. In other words, you can have several independent servers, each of which takes several tasks, performs them, takes the next ones, and so forth. This plan has only one drawback: it doesn't work. Workflow's distributed execution is an extremely strange thing. It works according to the following algorithm:
- Take ALL unlocked workflow instances from the database that can be executed right now AND lock them for 5 minutes.
- After 2 minutes: repeat step 1.
Yes, the numbers 2 and 5 are configurable. What matters is this: the first lucky node takes all of the work from the database. In fact, two minutes later it will sweep the database again, even if it still has plenty to do. And if some business process doesn't finish within the five minutes, something strange happens: the runtime executes the workflow instance (activating all WCF connections, etc.) and then fails to save it to the database (the lock has already expired!). As a result, the memory of that Workflow Runtime will hold this broken object forever, and it will not release it voluntarily until you physically kill the process. The Workflow Runtime won't stop by itself. It won't be able to do anything. Perfect implementation!
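A minimal simulation of that algorithm (Python; the node names, instance count, and timings are illustrative) shows the consequence: whichever node polls first locks the entire backlog, and the others find nothing.

```python
import time

# instance id -> lock expiry timestamp (None means unlocked)
pending = {f"wf-{i}": None for i in range(100)}

def poll(node, now, lock_seconds=300):
    """The engine's strategy: take ALL unlocked, ready instances and lock them."""
    taken = []
    for wf, expiry in pending.items():
        if expiry is None or expiry <= now:
            pending[wf] = now + lock_seconds   # 5-minute lock
            taken.append(wf)
    return taken

now = time.time()
first = poll("node-A", now)        # the first lucky node grabs everything
second = poll("node-B", now + 1)   # one second later: nothing left to take
```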
An even prettier scenario occurs if you set the lock duration to more than five minutes. In that case, when the process stops, those entries remain locked. This means you won't be able to simply restart the process, since doing so may affect production very badly. The problem is solved easily enough, though: for correct distribution, you have to write the database interaction yourself (in other words, you have to implement all the procedures for parallel and distributed operation and provide your own implementation of WorkflowPersistenceService).
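A custom persistence layer usually replaces "lock everything" with "claim one item at a time". Here is a sketch of that idea (Python; the class and method names are assumptions, and a real implementation would use an atomic UPDATE ... WHERE against the database rather than an in-process lock):

```python
import threading

class FairQueue:
    """Each worker atomically claims a single instance, so N workers
    actually share the load instead of one node hoarding all of it."""
    def __init__(self, instances):
        self._free = list(instances)
        self._lock = threading.Lock()   # stand-in for a database row lock

    def claim_one(self, node):
        with self._lock:
            return self._free.pop(0) if self._free else None

queue = FairQueue([f"wf-{i}" for i in range(4)])
a = queue.claim_one("node-A")   # node A gets one instance
b = queue.claim_one("node-B")   # node B gets the next one, not nothing
```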
Successful Work Under Load
It simply doesn't exist. Certainly, Microsoft claims everything is fine, but they forgot about one chart: the dependence of total execution time on the number of Activities in memory (it used to look like that earlier, and in fact nothing has changed):
The graph is quadratic. What matters here is not the absolute times but the dependence: how long your whole system will run as the complexity of the business process grows. Moreover, in practice the Execute time depends on the total number of Activities in memory, regardless of whether they belong to a single workflow instance or to several. For instance, with 10 parallel workflow instances, processing each small Activity consumes more resources than with a single instance; in other words, 10 parallel tasks take longer than 10 consecutive tasks, and the dependence is non-linear.
In the previous section, I described how the Persistence Service works: it takes everything from the database. In practice, this trick with 5,000 parallel complicated workflow instances kills the system: throughput drops to 1–10 Activities per minute (!!!), with the processor at nearly 100% load. The problem is clear, but how can it be solved? The solution is to build your own Activity processor: re-use Activities and emulate your workflow instances. You effectively have to re-implement the basic Workflow component responsible for starting and stopping Activities. This is required, first, to keep a large number of launched Activities from clogging the business process (it slows everything down), and second, to speed up serialization/deserialization (to quickly evict unneeded processes from memory). Microsoft Workflow never deletes a completed Activity, so you will have to organize everything so that completed Activities do not remain in memory.
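The cleanup requirement can be sketched as follows (Python, a toy model under obvious assumptions, not the engine's actual scheduler): a runtime that never removes completed Activities keeps re-scanning them on every tick, while one that evicts them touches far fewer objects for the same work.

```python
def run(activities, evict_completed):
    """Toy scheduler: each tick executes one activity and (optionally)
    drops it from memory; 'touched' counts objects scanned per tick."""
    in_memory = list(activities)
    touched = 0
    for act in activities:
        touched += len(in_memory)    # the runtime walks everything it holds
        if evict_completed:
            in_memory.remove(act)    # the completed Activity leaves memory
    return touched

work = [f"act-{i}" for i in range(100)]
hoarding = run(work, evict_completed=False)  # 100 * 100 = 10000 scans
evicting = run(work, evict_completed=True)   # 100 + 99 + ... + 1 = 5050 scans
```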
All of these problems can be overcome.
But you need to evaluate the costs and risks first. Use the Workflow Cost Calculator to weigh the advantages against the losses.
Author of the original article: Igor Manushin.