Today, I want to announce that Unified Concurrency is going cross-platform!
Unified Concurrency is now compliant with .NET Standard 2.0, which makes it available on .NET 4.7+, .NET Core 2.0+, and Mono 5.4+.
Cross-platform development in the .NET world seems unstoppable, and since .NET Standard 2.0 the adoption rate of the platform appears to have accelerated across both the open-source and commercial spheres. This was a great incentive for Unified Concurrency to become cross-platform. I still intend to keep the .NET 4.6 version alive for as long as possible, but the main development has shifted to the .NET Standard projects.
Moving GreenSuperGreen (which contains Unified Concurrency) to .NET Standard 2.0 also required the complete benchmarking and cross-benchmarking libraries to undergo the same update. This opens the opportunity to run benchmarks, cross-benchmarks, and cross-platform benchmarks on .NET, .NET Core, and potentially Mono. It would also be interesting to benchmark on Linux, but the benchmarking currently depends on PerformanceCounters, which are platform dependent (Windows only), and a solution to this problem remains an open question for future versions of .NET Standard.
Examples of all the implemented synchronization primitives are included as a unit test project targeting .NET Core 2.1 here:
The Unified Concurrency framework is implemented in the open-source GreenSuperGreen library, available on GitHub and NuGet.
Three more synchronization primitives are now available for use, and one more is internal only (for benchmarking purposes).
AsyncSemaphoreSlimLockUC : IAsyncLockUC
A lock based on SemaphoreSlim WaitAsync/Release, which seems to behave in a FIFO style, giving fair access.
Performance-wise similar to the
SemaphoreSlimLockUC : ILockUC
A lock based on SemaphoreSlim Wait/Release. It incorporates a hybrid approach with atomic instructions, which does not behave in a FIFO style: access is unfair and can cause threads to stall!
SemaphoreLockUC : ILockUC
A lock based on Semaphore WaitOne/Release. It is operating-system dependent; on Windows it is roughly FIFO, but fairness is not guaranteed.
MutexLockUC : ILockUC - internal, specific usage, benchmarking only
This synchronization primitive is accessible only to predefined benchmarking projects, because it requires thread affinity between the entering and exiting calls, which Unified Concurrency does not support by design. For benchmarking, however, it is maintainable and interesting for gathering data.
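To make the difference between the WaitAsync/Release and Wait/Release styles concrete, here is a minimal sketch that uses only the standard SemaphoreSlim class. It is not the GreenSuperGreen implementation, and the type and method names (SemaphoreLockSketch, IncrementAsync, Increment, RunAsync) are hypothetical names chosen for this illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo type; NOT part of GreenSuperGreen.
public static class SemaphoreLockSketch
{
    // A single permit turns the semaphore into a mutual-exclusion lock.
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(1, 1);
    private static int _counter;

    // Async style: awaiting WaitAsync does not block a thread while waiting.
    public static async Task IncrementAsync()
    {
        await Gate.WaitAsync().ConfigureAwait(false);
        try { _counter++; }
        finally { Gate.Release(); }
    }

    // Blocking style: Wait blocks the calling thread until the permit is free.
    public static void Increment()
    {
        Gate.Wait();
        try { _counter++; }
        finally { Gate.Release(); }
    }

    // Runs 'workers' tasks; even-numbered ones use the async entry,
    // odd-numbered ones use the blocking entry.
    public static async Task<int> RunAsync(int workers, int iterations)
    {
        _counter = 0;
        var tasks = new Task[workers];
        for (int w = 0; w < workers; w++)
        {
            bool useAsync = w % 2 == 0;
            tasks[w] = Task.Run(async () =>
            {
                for (int i = 0; i < iterations; i++)
                {
                    if (useAsync) await IncrementAsync().ConfigureAwait(false);
                    else Increment();
                }
            });
        }
        await Task.WhenAll(tasks).ConfigureAwait(false);
        return _counter;
    }

    public static void Main()
    {
        int total = RunAsync(4, 10000).GetAwaiter().GetResult();
        Console.WriteLine(total); // 40000 when no increment is lost
    }
}
```

SemaphoreSlim has no thread affinity, so Release may run on a different thread than the one that entered, which is what makes the awaitable variant possible. A Mutex, by contrast, must be released by the thread that owns it, which is the thread-affinity requirement mentioned for MutexLockUC.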
Based on reports from Microsoft, outside sources, and technical communities, it has become general knowledge that .NET Core can speed up an existing code base.
With cross-platform benchmarks, I can report improvements on two fronts.
In benchmarking scenarios, the sequential baseline for the throughput period is a useful tool for measuring potential speedup on the same hardware. The given code was 1.997 times faster on .NET Core 2.1 than on .NET 4.7.2. This does not mean that all code will be that much faster, only that certain code can be JIT-compiled more efficiently and thus run faster. The potential speedup is always code dependent, and there will be cases where further optimization is not possible. Similar speedups have been reported by Stephen Toub for some specific cases.
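As a rough illustration of how such a sequential baseline could be measured, the sketch below times a single-threaded workload with Stopwatch and expresses the speedup as the ratio of two throughput periods. The workload, the type name SequentialBaselineSketch, and the periods passed to Speedup are all hypothetical; this is not the benchmark code used for the charts.

```csharp
using System;
using System.Diagnostics;

// Hypothetical demo type; NOT the benchmarking library from the article.
public static class SequentialBaselineSketch
{
    // Hypothetical CPU-bound workload standing in for the benchmarked code.
    public static long Work(int n)
    {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += (i * 31L) ^ (i >> 3);
        return sum;
    }

    // Average duration of one Work call in milliseconds over 'repeats' runs.
    public static double MeasurePeriodMs(int n, int repeats)
    {
        Work(n); // warm-up call so JIT compilation is not part of the timing
        long sink = 0;
        var sw = Stopwatch.StartNew();
        for (int r = 0; r < repeats; r++) sink += Work(n);
        sw.Stop();
        GC.KeepAlive(sink); // keep the result observable
        return sw.Elapsed.TotalMilliseconds / repeats;
    }

    // Speedup = period on the baseline runtime / period on the other runtime.
    public static double Speedup(double baselinePeriodMs, double newPeriodMs)
        => baselinePeriodMs / newPeriodMs;

    public static void Main()
    {
        double period = MeasurePeriodMs(1000000, 20);
        Console.WriteLine("Sequential period [ms]: " + period);

        // Run the same program on both runtimes, then divide the two periods.
        // The two periods below are made-up placeholders, not measured data.
        Console.WriteLine("Speedup: " + Speedup(4.0, 2.0));
    }
}
```

Running the same binary on .NET and on .NET Core on the same machine and dividing the two measured periods gives a speedup figure of the kind quoted above, valid only for that particular workload.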
Chart 1: Sequential Throughput Speedup .NET / .NET Core (Speedup is code dependent)
The results of the cross-platform benchmarking show considerable improvement of the C# lock (Monitor class) under the Heavy Load scenario as well as the Bad Neighbor scenario.
The .NET implementation is prone to CPU gridlock, a state where the C# lock (Monitor class) wastes most CPU resources while very little work gets done; effectively, synchronization costs consume most of the CPU. This has been reported in previous articles.
.NET Core 2.1 seems to have a much better implementation of the C# lock (Monitor class, implemented by the AwareLock class in the C++ code of the .NET Core 2.1 runtime).
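The kind of CPU waste described here can be observed with a simplified stand-in like the one below: several threads contend for one C# lock while the process CPU time is compared with the elapsed wall-clock time. This is only a sketch under my own assumptions, not the Heavy Load or Bad Neighbor benchmark itself; the type name LockContentionSketch, the spin count, and the thread count are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical demo type; NOT the benchmark used for the charts.
public static class LockContentionSketch
{
    private static readonly object Sync = new object();
    private static volatile bool _stop;
    private static long _workDone;

    // Returns the fraction of the total available CPU time that the process
    // consumed during the run (1.0 = all logical cores busy the whole time).
    public static double MeasureCpuUsage(int threads, TimeSpan duration)
    {
        _stop = false;
        _workDone = 0;
        Process process = Process.GetCurrentProcess();
        TimeSpan cpuBefore = process.TotalProcessorTime;
        Stopwatch wall = Stopwatch.StartNew();

        var workers = new Thread[threads];
        for (int t = 0; t < threads; t++)
        {
            workers[t] = new Thread(() =>
            {
                while (!_stop)
                {
                    lock (Sync) // compiles to Monitor.Enter / Monitor.Exit
                    {
                        Thread.SpinWait(1000); // simulated work inside the lock
                        _workDone++;
                    }
                }
            });
            workers[t].Start();
        }

        Thread.Sleep(duration);
        _stop = true;
        foreach (Thread w in workers) w.Join();
        wall.Stop();

        process.Refresh();
        TimeSpan cpuUsed = process.TotalProcessorTime - cpuBefore;
        return cpuUsed.TotalMilliseconds
               / (wall.Elapsed.TotalMilliseconds * Environment.ProcessorCount);
    }

    public static void Main()
    {
        double usage = MeasureCpuUsage(Environment.ProcessorCount, TimeSpan.FromSeconds(2));
        Console.WriteLine("CPU usage fraction: " + usage);
        Console.WriteLine("Work items done: " + _workDone);
    }
}
```

Because all the simulated work happens inside the lock, at most one core's worth of CPU time can be useful at any moment. CPU usage well above 1 / ProcessorCount therefore indicates time spent on synchronization rather than on work, and comparing the figure across runtimes gives a rough feel for the difference the charts show.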
Chart 2: CPU resource waste of the C# lock (Monitor class) on .NET and .NET Core, and of LockUC on .NET Core.
Based on Chart 2, it is easy to see where .NET Core 2.1 gains performance. C# locks are usually spread all over the code in most projects and are part of many common libraries, including the runtime itself. Here we see a considerable reduction in CPU resource usage, with more than 80% less CPU waste in certain timing cases! Please compare the blue and green trend lines for the C# lock.
JIT improvements are important, but multithreaded code full of C# locks is still prevalent in many projects, from simple ones to line-of-business code bases.
Even for simple projects the performance gains can be considerable, and the incentive to upgrade to .NET Core 3.0, with its access to WinForms and WPF, can be very interesting.
This is an important improvement in .NET Core 2.1, but there is still room for improvement. As an example, consider LockUC: comparing the green and red trend lines suggests that there is still about 10% to gain. That gain is usually bought in part with slightly worse throughput for throughput periods below 1 ms, where synchronization primitives based on atomic instructions can help gain throughput while keeping CPU waste reasonable, but that requires modern architectural designs that account for many-core processors.
Chart 3: Cross-benchmark of C# lock (Monitor) / LockUC under Heavy Load scenario, .NET 4.7.2, 16 cores.
Chart 4: Cross-benchmark of C# lock (Monitor) / LockUC under Heavy Load scenario, .NET Core 2.1, 16 cores.
This article serves as an announcement of the .NET Standard 2.0 version of the GreenSuperGreen library (with built-in Unified Concurrency), including some improvements to the library.
We have discussed, and shown with cross-platform benchmarks, the great potential of and incentive for upgrading from .NET to .NET Core 2.1+, thanks to JIT compilation improvements and to improvements for multithreaded code through the reduced CPU waste of the C# lock (Monitor class).
- 16-03-2019: First version
- 24-03-2019: Correcting a few typos and 2 more cross-benchmarks.