Today, I want to announce that Unified Concurrency is going cross-platform!
Unified Concurrency is now compliant with NetStandard 2.0, which makes it usable from .NET 4.7+, .NET Core 2.0+ and Mono 5.4+.
Cross-platform development in the .NET world seems unstoppable, and since NetStandard 2.0 the adoption rate appears to have accelerated across open-source and commercial projects alike. This was a great incentive for Unified Concurrency to become cross-platform. I still intend to keep the Net46 version alive for as long as possible, but the main development has shifted to the NetStandard projects.
Moving the GreenSuperGreen library (which contains the Unified Concurrency code) to NetStandard 2.0 also required the benchmarking and cross-benchmarking libraries to undergo the same update. This opens the opportunity to run benchmarks, cross-benchmarks and cross-platform benchmarks on .NET vs. .NET Core, and potentially Mono. It would also be interesting to benchmark on Linux, but the benchmarking currently depends on PerformanceCounters, which are not yet available outside Windows, and a solution to this problem under NetStandard remains an open question for the future.
Examples of all the implemented synchronization primitives are included as a unit test project targeting .NET Core 2.1 here:
The Unified Concurrency framework is implemented in the open-source GreenSuperGreen library, available on GitHub and NuGet.
Three more synchronization primitives are now available, and one more is internal only (for benchmarking purposes).
AsyncSemaphoreSlimLockUC : IAsyncLockUC
A lock based on SemaphoreSlim WaitAsync/Release, which seems to serve waiters in FIFO order, giving fair access.
Performance-wise similar to the
SemaphoreSlimLockUC : ILockUC
A lock based on SemaphoreSlim Wait/Release. It incorporates a hybrid approach with atomic instructions, which does not play well with FIFO ordering: access is unfair and can cause threads to stall!
SemaphoreLockUC : ILockUC
A lock based on Semaphore WaitOne/Release. It is operating-system dependent; on Windows it is roughly FIFO, but fairness is not guaranteed.
MutexLockUC : ILockUC - internal, specific usage, benchmarking only
This synchronization primitive is accessible only to predefined benchmarking projects, because it requires thread affinity between the entering and exiting calls, which Unified Concurrency does not support by design. For benchmarking, however, it is maintainable and interesting for gathering data.
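To make the SemaphoreSlim WaitAsync/Release pattern more concrete, here is a minimal sketch of an async lock built on a semaphore with a single permit. It does not use the GreenSuperGreen API and is not its implementation; the names `SketchAsyncLock`, `EnterAsync`, `Releaser` and `SketchAsyncLockDemo` are hypothetical and serve only as an illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical illustration only; not the GreenSuperGreen implementation.
public sealed class SketchAsyncLock
{
    // One permit: at most one holder at a time.
    private readonly SemaphoreSlim _semaphore = new SemaphoreSlim(1, 1);

    // Asynchronously waits for the permit and returns a handle
    // that gives the permit back when disposed.
    public async Task<IDisposable> EnterAsync()
    {
        await _semaphore.WaitAsync().ConfigureAwait(false);
        return new Releaser(_semaphore);
    }

    private sealed class Releaser : IDisposable
    {
        private SemaphoreSlim _semaphore;

        public Releaser(SemaphoreSlim semaphore) => _semaphore = semaphore;

        // Releases the permit exactly once, even if Dispose is called repeatedly.
        public void Dispose() => Interlocked.Exchange(ref _semaphore, null)?.Release();
    }
}

public static class SketchAsyncLockDemo
{
    // Runs `workers` tasks that each increment a shared counter
    // `iterations` times while holding the lock, and returns the final count.
    public static async Task<int> RunAsync(int workers, int iterations)
    {
        var asyncLock = new SketchAsyncLock();
        int counter = 0;

        async Task Worker()
        {
            for (int i = 0; i < iterations; i++)
            {
                using (await asyncLock.EnterAsync().ConfigureAwait(false))
                {
                    counter++;          // protected by the lock
                    await Task.Yield(); // the continuation, and the release, may run on another thread
                }
            }
        }

        var tasks = new Task[workers];
        for (int w = 0; w < workers; w++) tasks[w] = Task.Run(() => Worker());
        await Task.WhenAll(tasks).ConfigureAwait(false);
        return counter;
    }
}
```

Because a SemaphoreSlim permit can be released from a different thread than the one that acquired it, this pattern needs no thread affinity. The Mutex-based primitive above does need it, which is why it stays internal.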
It has become general knowledge, based on reports from Microsoft employees, outside sources and technical communities, that .NET Core can speed up an existing code base.
With cross-platform benchmarks, I can report improvements on two fronts.
In benchmarking scenarios, the sequential baseline for the throughput period is a useful tool for measuring potential speedup on the same hardware. The given code was 1.997 times faster on .NET Core 2.1 than on .NET 4.7.2. This does not mean that all code will be that much faster, only that certain code can be JITted more efficiently and thus run faster. The potential speedup is always code-dependent, and there will be cases where further optimization is not possible. Similar speedups have been reported by Stephen Toub for some specific cases.
Chart 1: Sequential Throughput Speedup Net / NetCore (Speedup is code dependent)
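A speedup figure of this kind is the ratio of two throughput periods measured for the same sequential workload on each runtime. The sketch below shows one rough way to take such a measurement with `Stopwatch`. The helper names `MeasureThroughputPeriod` and `Speedup` are hypothetical, and this is not the benchmarking code used for the charts.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper; not the benchmarking code used for the charts.
public static class SequentialBaselineSketch
{
    // Average time per call of `work`, in milliseconds,
    // over `iterations` sequential calls on a single thread.
    public static double MeasureThroughputPeriod(Action work, int iterations)
    {
        if (work == null) throw new ArgumentNullException(nameof(work));
        if (iterations <= 0) throw new ArgumentOutOfRangeException(nameof(iterations));

        work(); // warm-up call so that JIT compilation is not counted

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) work();
        stopwatch.Stop();

        return stopwatch.Elapsed.TotalMilliseconds / iterations;
    }

    // Speedup of the candidate runtime relative to the baseline runtime:
    // values above 1 mean the candidate needs less time per call.
    public static double Speedup(double baselinePeriodMs, double candidatePeriodMs)
        => baselinePeriodMs / candidatePeriodMs;
}
```

The same workload would be built and measured once per runtime, and the two periods divided. For instance, made-up periods of 2.0 ms and 1.0 ms give a speedup of 2.0.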
The cross-platform benchmarking results show a considerable improvement to the C# lock (the Monitor class) under both the Heavy Load scenario and the Bad Neighbor scenario.
The .NET implementation is prone to CPU gridlock, a situation where the C# lock (Monitor class) wastes most CPU resources while very little work gets done; in effect, synchronization costs consume most of the CPU. This has been reported in previous articles.
.NET Core 2.1 seems to have a much better implementation of the C# lock (Monitor class / AwareLock class in the C++ code of the .NET Core 2.1 runtime).
Chart 2: Sequential Throughput Speedup Net / NetCore (Speedup is code dependent)
Based on Chart 2, it is easy to see where .NET Core 2.1 gains performance. C# locks are used throughout typical code and are part of many common libraries, and here we see a considerable reduction in wasted CPU resources in certain timing cases, by more than 80%! Please compare the blue and green trend lines for the C# lock.
JIT improvements are important, but multithreaded code full of C# locks is still prevalent in many projects, from simple ones to line-of-business code bases.
Even for simple projects, the performance gains can be considerable, and the incentive to upgrade to .NET Core 3.0, with its access to WinForms and WPF, can be very strong.
This is an important improvement in .NET Core 2.1, but there is still room for more. As an example, consider LockUC: comparing the green and red trend lines suggests that there is still about 10% to gain. That gain is usually paid for in part by slightly worse throughput for throughput periods below 1 ms. There, synchronization primitives based on atomic instructions can help gain throughput while keeping CPU waste reasonable, but that requires modern architectural designs that account for many-core processors.
This article serves as an announcement of the NetStandard 2.0 version of the GreenSuperGreen library (with built-in Unified Concurrency), including some improvements to the library.
We have discussed, and shown with cross-platform benchmarks, the great potential of and incentive for upgrading from .NET to .NET Core 2.1+. The gains come from JIT compilation improvements and from improvements to multithreaded code, thanks to the reduced CPU waste of the C# lock (Monitor class).
- 16-03-2019: First version