Introduction
Modern operating systems offer synchronization primitives to allow programs to utilize the CPU efficiently. Much has been written regarding the proper use of synchronization in multithreaded programs. Jeffrey Richter and Christophe Nasarre's Windows via C/C++ is the current offering from Microsoft Press, and nearly half of its 800+ pages are devoted to the topic. As if multithreading in a sterile environment were not difficult enough, Michael Howard and David LeBlanc add further requirements in Writing Secure Code.
Many architects, programmers, and QA teams do an excellent job of designing, writing, and testing code for functional requirements. However, some neglect to examine the software system for flaws in a hostile runtime environment. Books on this subject include Michael Howard and Steve Lipner's The Security Development Lifecycle and Gallagher, Jeffries, and Landauer's Hunting Security Bugs.
When modeling an application for security threats, much attention is paid to malicious input. But other defects can be much more damaging to an application, namely unsecured synchronization. Gary McGraw, author of Software Security: Building Security In, predicted that synchronization flaws will become a future threat:
[developers] often overlook subtle time-based attacks. I believe that timing attacks (both data races and starvation attacks) are a future category that will be more commonly encountered than they are now [p. 203]
One of the first papers on the subject was Michal Zalewski's Delivering Signals for Fun and Profit. Zalewski's paper, written in the context of Linux/Unix, was more concerned with using signals as a stepping stone to obtaining root. Root is a lofty goal, but much mayhem can be caused without it.
This article examines the effects an attacker can have on a multithreaded program, and the steps we can take to alleviate most of the issues. We will use a familiar example: a multithreaded program which spawns a thread to perform counting. In general, we will focus on a single program with two threads, but what we learn extends easily to multiple programs. In addition, the techniques cross the user land/kernel boundary, so software such as services and drivers can be affected by the issue if not properly designed and hardened.
Downloads
There are two downloads available with the article. The downloads and their use are as follows:
- Sample.zip - The base program described under Sample Program
- Attacker.zip - The evil program which is hell-bent on breaking the sample program
The Sample Program
The sample program consists of a main program thread and a second (computation) thread. The main thread creates the computation thread and then enters an efficient wait state while the second thread sums all values from 1 to 10,000. Before it exits, the second thread signals the first thread that computation is complete.
The main thread is shown below. The variables start, end, and sum are global in scope. We could have passed them to the thread function using lpParameter, but using globals does not materially affect the discussion. The handle that CreateEvent returns to the main thread carries full access (EVENT_ALL_ACCESS).
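DWORD sum=0, start=1, end=10000;   // globals shared with ComputationThread
...
// Manual-reset (TRUE), initially non-signaled (FALSE), named event
HANDLE hEvent = CreateEvent( NULL, TRUE, FALSE, L"Computation Complete" );
if( NULL == hEvent ) {...}
HANDLE hThread = CreateThread( NULL, 0, ComputationThread, NULL, 0, NULL );
if( NULL == hThread ) {...}
// Efficient wait until the worker signals completion
DWORD dwWait = WaitForSingleObject( hEvent, INFINITE );
if( WAIT_OBJECT_0 != dwWait ) {...}
std::cout << "Sum: " << sum << std::endl;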
ComputationThread simply sums values. We use the "Computation Complete" event to signal the main thread that work is complete. The event was opened with EVENT_MODIFY_STATE since we only need to signal the event (we do not need to wait on the event, so SYNCHRONIZE was not included in the access mask).
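DWORD WINAPI ComputationThread( LPVOID /*lpParameter*/ )
{
    // Signal only: EVENT_MODIFY_STATE without SYNCHRONIZE; the handle need not be inheritable
    const DWORD dwAccess = EVENT_MODIFY_STATE;
    HANDLE hEvent = OpenEvent( dwAccess, FALSE, L"Computation Complete" );
    if( NULL == hEvent ) {...}
    for( DWORD i = start; i <= end; i++ ) {
        sum += i;
    }
    BOOL b = SetEvent( hEvent );
    if( FALSE == b ) {...}
    CloseHandle( hEvent );
    return 0;
}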
Our sample looks like any boilerplate multithreaded code. We have a program that runs in user land, creates a manual-reset event named Computation Complete, performs the work, signals the event, and then outputs the results. The only questionable observation is that we passed NULL for the event's security attributes. As a result, the event received a default security descriptor whose DACL comes from our process token, rather than a DACL we chose deliberately.
Since we have read Writing Secure Code and we are observing security best practices, we #include <sddl.h> and clamp the ACL using ConvertStringSecurityDescriptorToSecurityDescriptor. SDDL, introduced in Windows 2000, makes life much easier when working with DACLs. For example, if we wanted to clamp an ACL on a securable object so that only LocalSystem (SYSTEM) received full access, we would perform the following. In the code below, SY is the SID string used by SDDL to represent SYSTEM (for a complete treatment, refer to Writing Secure Code, Chapter 6).
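SECURITY_ATTRIBUTES sa;
sa.nLength = sizeof(sa);
sa.bInheritHandle = FALSE;
// Protected DACL ("D:P") with one ACE: Allow ("A"), Generic All ("GA"), to SYSTEM ("SY")
WCHAR wszSddl[] = L"D:P(A;OICI;GA;;;SY)";
if( !ConvertStringSecurityDescriptorToSecurityDescriptor(
        wszSddl, SDDL_REVISION_1, &(sa.lpSecurityDescriptor), NULL ) ) {...}
...
LocalFree( sa.lpSecurityDescriptor );   // the descriptor was allocated with LocalAlloc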
In our sample code, we do not want SYSTEM or Administrators interfering with our endeavors, so we build an SDDL string that represents ourselves (and no others) to the system. Unfortunately, the two obvious choices, SDDL_PERSONAL_SELF (PS) and SDDL_CREATOR_OWNER (CO), result in our worker thread receiving ERROR_ACCESS_DENIED when opening or creating the event. So we have to rough it a bit and append our user SID to the ACE string ("A;OICI;GA;;;"). On the author's test machine the SID is "S-1-5-21-1208687838-169375588-2040895995-1146", so the author's full SDDL string is "D:P(A;OICI;GA;;;S-1-5-21-1208687838-169375588-2040895995-1146)".
To build the full SID string for SDDL, we perform the following. Cleanup code has been omitted for clarity.
SECURITY_ATTRIBUTES sa;
sa.nLength = sizeof(sa);
sa.bInheritHandle = FALSE;
HANDLE hToken = NULL;
BYTE* pBuffer = NULL;
WCHAR* pwszSid = NULL;      // allocated by ConvertSidToStringSid; release with LocalFree
BOOL bResult = FALSE;
// Open our own process token to read the user SID
bResult = OpenProcessToken( GetCurrentProcess(), TOKEN_QUERY, &hToken );
if( FALSE == bResult ) { ... }
// First call is expected to fail with ERROR_INSUFFICIENT_BUFFER and report the size
DWORD dwLength = 0;
bResult = GetTokenInformation( hToken, TokenUser, NULL, 0, &dwLength );
if( FALSE != bResult || ERROR_INSUFFICIENT_BUFFER != GetLastError() ) { ... }
pBuffer = new (std::nothrow) BYTE[ dwLength ];   // nothrow (<new>) so the NULL check is meaningful
if( NULL == pBuffer ) { ... }
bResult = GetTokenInformation( hToken, TokenUser, pBuffer, dwLength, &dwLength );
if( FALSE == bResult ) { ... }
bResult = ConvertSidToStringSid( ((TOKEN_USER*)pBuffer)->User.Sid, &pwszSid );
if( FALSE == bResult || NULL == pwszSid ) { ... }
// Build "D:P(A;OICI;GA;;;<user SID>)"; StringCchPrintf is declared in <strsafe.h>
const DWORD SIZE = 255;
WCHAR wszSddl[SIZE];
HRESULT hr = StringCchPrintf( wszSddl, SIZE, L"D:P(A;OICI;GA;;;%s)", pwszSid );
if( FAILED( hr ) ) { ... }
bResult = ConvertStringSecurityDescriptorToSecurityDescriptor(
    wszSddl, SDDL_REVISION_1, &(sa.lpSecurityDescriptor), NULL );
...
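The fixed 255-character buffer is ample for a single ACE. The sketch below is a portable illustration of just the string-building step. It builds the same SDDL string with std::wstring, so truncation cannot occur. BuildOwnerOnlySddl is a hypothetical helper name, and its input is assumed to be the string SID produced by ConvertSidToStringSid.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds a protected DACL ("D:P") containing a single
// ACE that allows Generic All ("GA") to the supplied string SID.
// The OICI flags are kept only to mirror the original format string.
std::wstring BuildOwnerOnlySddl(const std::wstring& stringSid) {
    assert(!stringSid.empty());
    return L"D:P(A;OICI;GA;;;" + stringSid + L")";
}
```

Passing the author's SID from the text produces "D:P(A;OICI;GA;;;S-1-5-21-1208687838-169375588-2040895995-1146)". That string would then be handed to ConvertStringSecurityDescriptorToSecurityDescriptor as shown above.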
Using the code above, the DACL on the event is shown in Figure 1.

Figure 1: DACL on Event
A complete run of the program results in the very boring output shown in Figure 2.

Figure 2: Boring Program Execution
The Attack Program
The attack program is simple - it wants to acquire a handle to the event used by the sample program and pulse the event in the hopes of producing incorrect results. The attack program is lucky - it does not need to worry about details such as security or reputation. The code is shown below:
const DWORD dwAccess = EVENT_MODIFY_STATE;
HANDLE h = OpenEvent( dwAccess, FALSE, L"Computation Complete" );
if( NULL != h ) {
PulseEvent( h );
CloseHandle( h );
}
To help with the timing of the attack, the loop of the worker thread in sample will be modified to throttle the computation process:
for( DWORD i = start; i <= end; i++ ) {
if( 0 == (i % 1000) ) { Sleep(1000); }
sum += i;
}
A run of the programs (starting sample first and then attack) results in the output shown in Figure 3. Note that the result is not 50005000 as expected.

Figure 3: Sample Program under Attack
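The expected value is the closed form of an arithmetic series. Any smaller value printed by the main thread therefore means it stopped waiting before the worker finished:

```latex
\sum_{i=1}^{10000} i = \frac{10000 \cdot (10000 + 1)}{2} = \frac{100{,}010{,}000}{2} = 50{,}005{,}000
```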
Even though we diligently applied a DACL and kept computation in a single process, an evil program was able to corrupt the computation. If you find this alarming, you have good reason.
The Problem
Before we attempt to fix the problem, we need to understand what went wrong and why. First, we observe that sample was like any other user application which runs under a standard user account. Attack was also a user land program running under a standard account. A user logged on and both programs were available for execution with no fanfare.
Next, there was no need to attempt a Write-Up (as in Vista Integrity Levels or Bell-La Padula security models), trampoline into code with elevated privileges, or perform any other exotic tricks. In fact, the operating system gave us the keys to the kingdom because both programs were running under the security context of the logged-on user.
When we think about this, nearly all Ring 3 code is at risk from other Ring 3 code (as is Ring 0 from Ring 0, without the need to subvert access control). In addition, Ring 0 code is at risk from Ring 3 code when sharing synchronization and other objects. It's not hard to imagine a need for user land code to signal a kernel component to perform a requested action, and Microsoft offers guidance in Q106387, How To Share Kernel Objects Between Processes.
It appears that these high performance primitives are not as secure as one might be led to believe. So when we ask "What went wrong and why?", the short answer is that nothing went wrong: everything functioned as advertised and expected.
The Landscape
Most current mechanisms focus on system survivability, and seem to overlook the needs and requirements of non-operating system programs. That is, programs are sandboxed or restricted to protect the operating system, and not the program in question. Unfortunately, the system as a whole is made up of both operating system and non-operating system components. So sacrificing one child for another does not appear to be the best strategy.
In addition, existing implementations push the burden onto the System Administrator rather than placing it with its rightful owner: the programmer. This trend is partly due to the poor job programmers have done of observing security best practices in the past.
Linux has at least three mechanisms in place: chroot, SELinux (NSA/RedHat/Fedora), and AppArmor (Novell). Background reading on SELinux can be found on the wiki, while Novell publishes an AppArmor FAQ and the AppArmor Guides. SELinux versus AppArmor is debated with the fervor of an operating system religious war.
When using chroot, the Administrator places all program resources, including other programs, in a sandbox. SELinux adds Mandatory Access Control (MAC) on top of discretionary access controls. In RedHat's system, a program gets only what it needs and nothing more (Principle of Least Privilege). But again, the operating system and Administrators are forced to devise a solution for programmers. Finally, Novell's AppArmor is another least-privilege tool similar to SELinux, except that kernel recompiles are not required. But as with the others, AppArmor requires Administrator intervention.
The Fixes
There is a range of fixes available to the programmer. Some require operating system support; others require abandoning Win32 best programming practices to achieve an acceptable level of hardening. Unfortunately, application measures such as .NET Framework Roles, Database Roles, COM+ Roles, and Trusted Code Assemblies do not fully address the problems. For example, COM+ roles are similar to security groups: they are designed and implemented by the software author. While roles can limit user activity and protect code within the software module, they cannot limit malicious activity outside of the sandbox. .NET Framework Roles are another example. They are enforced by the application and runtime environment, but languages such as C# defer primitives such as mutexes to the operating system through Interop Services and P/Invoke.
Software ACE
For the programmer, the cleanest fix would be implemented by the operating system. During the NT 4.0 days, only users were considered security principals. It was not uncommon to run a service using a fictitious user account to access a file on the network on behalf of the machine so that work could be performed. Under Windows 2000, machines were added to the list of security principals, so machines could be assigned privileges to access an object.
What we desire is for the operating system to recognize programs as security principals on the local machine. If installed software were a security principal on the local machine, the clamped DACL would contain programs (for example, program 'Sample.exe' with hash ... ) rather than the user 'John Doe'. In this scenario, the program Attack.exe would not have access to the synchronization object. In the hypothetical situation where a user land program needed to synchronize with a kernel component, two programs would be present in the DACL of the securable object: the user land program and the kernel component.
The benefit of using a 'Software ACE' is that Win32 programmers are very accustomed to working with DACLs and ACEs. Based on Microsoft's current architecture, it would be trivial to create program groups similar to other security groups once software ACEs are in place. Microsoft has documented procedures for modifying DACLs, and authors such as Howard, LeBlanc, and Richter offer plenty of guidance; Q193073, How To Modify Default DACL for Sharing Objects, is one such example. Since an attacker might attempt to modify an ACE by changing it from allow to deny, there should be no deny ACEs for programs, only a whitelist of authorized software modules.
Anyone who is familiar with Authenticode should immediately recognize that many of Authenticode's shortcomings due to infrastructure also exist in this enhanced access control scheme. For Authenticode, the problem is that the signature can be easily removed, rendered useless, or untrusted. If the operating system required authentication on all modules, and the signatures could be traced back to a trusted source or vendor, many complaints with Authenticode could be withdrawn.
In our example, the immediate shortcoming is that evil programs with enough rights and privileges will simply add themselves to the DACL of the object. We clearly need an Authenticode-like mechanism that is enforced by the operating system. In the modified system, the operating system would not allow the malicious program to add itself to the DACL.
With executables as security principals and an infrastructure in place, we will see that this solution is very extensible when we begin examining objects other than events.
Nameless Events within a Process
Our first attempt at hardening uses a nameless event within a process. We move the event handle created in the main thread to global scope for convenient access from the computation thread. The computation thread, once running, performs its work and then signals the event. We then run the program and examine the results using Process Explorer. As Figure 4 shows, an attacker will have a more difficult time signaling the event, since there is no event name to pass to OpenEvent.

Figure 4: Event Listing using Process Explorer
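This method works well when events are contained in a single process, or when some sort of parent-child relationship exists between processes so that handle inheritance can be used. Under certain circumstances, however, another process can still obtain a duplicate handle to the event. For example, a process that has PROCESS_DUP_HANDLE access to ours and learns the handle value (which tools such as Process Explorer display) can call DuplicateHandle.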
Nameless Events Across Processes
Richter and Nasarre explain how to use inheritance to share synchronization objects in Windows via C/C++. However, there are times when no parent-child (or parent-grandchild) relationship exists between modules. In this case, we can take advantage of a little-known fact about executables in Windows (as in EXE, not DLL): executables can export functions. Our first executable would export a function similar to the following, which returns a process-relative handle to the nameless event.
HANDLE FetchTheSecretHandle( DWORD dwTargetProcessId )
The function is somewhat inconvenient. The executable exporting it must open a handle to the target process identified by dwTargetProcessId (OpenProcess with PROCESS_DUP_HANDLE), call DuplicateHandle, and then return the duplicate to the caller. More concerning is the fact that if a good guy can call it, a bad guy can call it.
To use FetchTheSecretHandle securely so that a bad guy cannot obtain a handle, we require some form of authentication of the calling process. Interestingly, we can easily build this functionality into FetchTheSecretHandle. The authenticator is the same one we proposed for use by the operating system: the hash of the caller's image. In this scenario (the callee performs the authentication check), the program duplicating the handle must carry around signatures (i.e., hashes) of allowable callers. For information on checksumming a module in memory, see Dynamic TEXT Section Image Verification. Unfortunately, the callee must have read access to the caller's virtual memory to perform the validation, which poses problems for Ring 3 in general.
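The callee-side check reduces to comparing a digest of the caller's image against a whitelist carried by the exporting program. The sketch below shows only that decision logic, in portable C++. FNV-1a is a compact stand-in that keeps the example self-contained. It is not collision-resistant, and a real check would hash the caller's image with a cryptographic hash such as SHA-256. ImageDigest and IsAuthorizedCaller are hypothetical names. Reading the caller's image, which the text notes is itself a problem at Ring 3, is out of scope.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Stand-in digest (64-bit FNV-1a). NOT cryptographically secure; a real
// check would hash the caller's image with SHA-256 or similar.
std::uint64_t ImageDigest(const std::vector<std::uint8_t>& image) {
    std::uint64_t h = 14695981039346656037ULL;   // FNV-1a offset basis
    for (std::uint8_t b : image) {
        h ^= b;
        h *= 1099511628211ULL;                   // FNV-1a prime
    }
    return h;
}

// Hypothetical check FetchTheSecretHandle would perform before calling
// DuplicateHandle: only callers whose image digest is whitelisted pass.
bool IsAuthorizedCaller(const std::vector<std::uint8_t>& callerImage,
                        const std::unordered_set<std::uint64_t>& whitelist) {
    return whitelist.count(ImageDigest(callerImage)) != 0;
}
```

In this design an empty whitelist denies every caller, which is the safe default for a function whose only job is to hand out a secret handle.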
As with events within a single process, a duplicate handle to the event can still be obtained under certain circumstances.
Nameless Events Using Software Devices
Another interesting prospect is using a software device to shepherd signaling within the Windows I/O model. For example, the main thread would open the pseudo device and then issue a synchronous or asynchronous I/O operation. In the synchronous form, the main thread would simply block on a call to ReadFile. For the asynchronous version, the main thread would use the hEvent member of the OVERLAPPED structure to pass events between the processes. Alertable I/O and I/O Completion Ports are also well suited when using a software device.
In addition to I/O calls such as ReadFile and WriteFile, a software device also allows us to use device I/O control (DeviceIoControl) to manage the state of processes. While the device cannot control which processes manipulate the synchronization, it can perform validation services for the modules involved in the sharing.
However, using I/O infrastructure and device control codes to break the chain of custody still does not fully mitigate the fact that a handle can be obtained at times.
Named Events and Shadow Variables
For years Jeffrey Richter has told us not to use a boolean value to signal a thread. Richter's classic example is the main thread spinning on a boolean variable, waiting for the worker thread to modify the value. We can, however, combine an event with a boolean shadow to validate the modification of the event. This scenario:
- retains efficient wait states, and
- resists synchronization attacks.
The modified code of the main thread is shown below (note that the volatile BOOL bReallyStop was declared with global scope).
volatile BOOL bReallyStop = FALSE;
...
hEvent = CreateEvent( &sa, TRUE, FALSE, L"Computation Complete" );
...
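for( ; ; ) {
    dwWait = WaitForSingleObject( hEvent, INFINITE );
    if( WAIT_OBJECT_0 != dwWait ) { ... }
    if( bReallyStop == TRUE ) { break; }     // genuine completion
    // Forged signal: reset the manual-reset event so an attacker's SetEvent cannot
    // leave us spinning, then re-check the shadow to avoid a lost wakeup.
    ResetEvent( hEvent );
    if( bReallyStop == TRUE ) { break; }
}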
The worker thread code is modified to assign TRUE to bReallyStop before setting the event "Computation Complete". In the worker thread code below, an additional modification directly injects the security fault: halfway through summing, the worker pulses the event to simulate the attacker.
for( DWORD i = start; i <= end; i++ ) {
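    if( i == start + (end - start) / 2 ) { PulseEvent( hEvent ); }   // injected fault
    sum += i;
}
bReallyStop = TRUE;      // set the shadow first...
SetEvent( hEvent );      // ...then signal the event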
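To see the validation logic run end to end without Win32, the sketch below replaces the event with a small mutex/condition-variable class and bReallyStop with std::atomic<bool>. ManualResetEvent and SumWithShadow are hypothetical names. Unlike PulseEvent, the forged Set() here leaves the event signaled, which is the harsher case. The waiter therefore resets the event and re-checks the shadow before waiting again.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical portable stand-in for a Win32 manual-reset event.
class ManualResetEvent {
public:
    void Set() {
        { std::lock_guard<std::mutex> lk(m_); signaled_ = true; }
        cv_.notify_all();
    }
    void Reset() {
        std::lock_guard<std::mutex> lk(m_);
        signaled_ = false;
    }
    void Wait() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return signaled_; });
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool signaled_ = false;
};

// Sums start..end on a worker thread. Halfway through, the worker forges a
// Set() to simulate the attacker; the shadow flag lets the waiter reject it.
std::uint64_t SumWithShadow(std::uint32_t start, std::uint32_t end) {
    assert(start <= end);
    ManualResetEvent done;
    std::atomic<bool> reallyStop{false};        // the shadow (bReallyStop)
    std::uint64_t sum = 0;

    std::thread worker([&] {
        for (std::uint32_t i = start; i <= end; ++i) {
            if (i == start + (end - start) / 2) { done.Set(); }   // forged signal
            sum += i;
        }
        reallyStop.store(true);                 // publish the shadow first...
        done.Set();                             // ...then signal the event
    });

    for (;;) {
        done.Wait();
        if (reallyStop.load()) { break; }       // genuine completion
        done.Reset();                           // forged: clear the event...
        if (reallyStop.load()) { break; }       // ...and re-check to avoid a lost wakeup
    }
    worker.join();                              // also orders the worker's writes to sum
    return sum;
}
```

SumWithShadow(1, 10000) returns 50005000 no matter when the forged signal is observed. The waiter only leaves its loop after the worker has published the shadow, and that happens after the last addition.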
Named Events and Shadow Variables Across Processes
The simplest way to share shadow variables across processes is through a DLL with a shared data segment. Microsoft specifically recommends against sharing data across processes, but given the current landscape it may be worth investigating.
Auto-Reset Events
While manual-reset events can use a simple boolean value to authenticate an event action, an auto-reset event (or an event that is pulsed) requires additional logic. In this case, a monotonically increasing counter seems to work best. Before a thread that waits on an auto-reset event enters a wait state, it should note the current sequence number. If the thread is awakened by the event and the sequence number has not changed, it should return to its wait state. When using sequence numbers, the thread which pulses the event should increment the counter before pulsing. In this respect, the threads participating in the wait use the counter much as secure communication protocols use sequence numbers to defend against replay attacks.
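The sequence-number check can also be sketched portably. AutoResetEvent is a hypothetical stand-in for a Win32 auto-reset event, built from std::mutex and std::condition_variable. g_sequence plays the role of the shared counter; on Windows it would live in shared memory and be bumped with InterlockedIncrement. All function names are illustrative only.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a Win32 auto-reset event: a successful Wait()
// consumes the signal, and repeated Set() calls coalesce.
class AutoResetEvent {
public:
    void Set() {
        { std::lock_guard<std::mutex> lk(m_); signaled_ = true; }
        cv_.notify_one();
    }
    void Wait() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return signaled_; });
        signaled_ = false;   // auto-reset
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool signaled_ = false;
};

// Monotonically increasing sequence number shared by the participants.
std::atomic<std::uint64_t> g_sequence{0};

// Legitimate signaler: increment the counter BEFORE signaling.
void GenuineSignal(AutoResetEvent& ev) {
    g_sequence.fetch_add(1);
    ev.Set();
}

// Waiter: `seen` is the counter value noted before the genuine signal could
// occur. Wakeups that leave the counter unchanged are treated as forged and
// the thread returns to its wait state. Returns the number of forged wakeups.
int WaitForGenuineSignal(AutoResetEvent& ev, std::uint64_t seen) {
    int forged = 0;
    for (;;) {
        ev.Wait();
        if (g_sequence.load() != seen) { return forged; }
        ++forged;
    }
}

// Usage sketch: one forged Set() precedes the genuine signal.
int DemoForgedThenGenuine() {
    AutoResetEvent ev;
    const std::uint64_t seen = g_sequence.load();   // note the sequence first
    ev.Set();                                        // attacker: no counter bump
    std::thread signaler([&ev] { GenuineSignal(ev); });
    const int forged = WaitForGenuineSignal(ev, seen);
    signaler.join();
    assert(forged <= 1);   // forged and genuine Set() may coalesce into one wakeup
    return forged;
}
```

DemoForgedThenGenuine returns 0 or 1 depending on timing. A forged Set() and a genuine Set() can coalesce into a single wakeup, just as signals on a real auto-reset event can. In either case the waiter returns only after the counter has advanced.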
Unpredictable Names
It is not uncommon for one process to generate an unpredictable name and share it among the concerned parties. For example, a process might generate a GUID and use its textual representation as a unique synchronization object name. Weaker variations on the GUID approach include appending the process ID to a well-known prefix, or appending the user's SID to a well-known string.
Unfortunately, these techniques share the shortcomings of other named synchronization objects. An attacker may be able to discover the name through runtime inspection, by guessing, or by asking a module for the object name through an exported function similar to FetchTheSecretHandle.
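For completeness, the sketch below generates a GUID-formatted random suffix for an object name. It uses std::random_device, whose quality is implementation-defined. On Windows, a production version would more likely call CoCreateGuid or BCryptGenRandom. MakeUnpredictableName and the prefix used in the usage note are hypothetical. The resulting name still leaks through runtime inspection, so this only defeats guessing.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <random>
#include <string>

// Hypothetical helper: appends a random 128-bit value, formatted like a GUID
// (8-4-4-4-12 hex digits), to a well-known prefix. The RFC 4122 version bits
// are not set. std::random_device quality is implementation-defined.
std::string MakeUnpredictableName(const std::string& prefix) {
    std::random_device rd;
    std::uint8_t b[16];
    for (std::uint8_t& x : b) { x = static_cast<std::uint8_t>(rd() & 0xFFu); }
    char buf[37];   // 36 characters plus the terminating NUL
    const int n = std::snprintf(buf, sizeof(buf),
        "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-%02x%02x%02x%02x%02x%02x",
        b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7],
        b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]);
    assert(n == 36);
    (void)n;        // silence unused-variable warnings when NDEBUG is defined
    return prefix + buf;
}
```

For example, MakeUnpredictableName("ComputationComplete-") yields the prefix followed by 36 characters of the form xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.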
File System Monitoring
It's hard to imagine going to disk as an event signaling mechanism, but it is technically feasible. For example, before computation begins, the main thread creates a lock file on disk. Once computation completes, the worker deletes the lock file to signal completion. The main thread would monitor the directory for changes and respond to the missing lock file. Unfortunately, we encounter the same 'known name' issues as other schemes. In addition, the system suffers from poor performance due to disk access.
Mutexes and Singletons
Q243953, How to limit 32-bit applications to one instance in Visual C++, offers source code for a classical Singleton. However, the classic Singleton is susceptible to denial of service since an attacker can grab the mutex in advance. Microsoft, being aware of the issue, recommends using a lock file if a denial of service is a concern. Using a lock file could expose an application to the problems of FetchTheSecretHandle, except the function is now named FetchTheSecretName.
A variable in a shared data section manipulated using atomic operations such as InterlockedCompareExchange or InterlockedBitTestAndSet (and friends) may prove to be the most secure method of assuring a single instance. Again, Microsoft does not recommend sharing data across processes, but the landscape has changed.
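A minimal sketch of the compare-and-swap approach follows. std::atomic<std::uint32_t>::compare_exchange_strong stands in for InterlockedCompareExchange, and the InstanceSlot object stands in for the variable in the shared data section. ClaimInstance and ReleaseInstance are hypothetical names. The sketch does not handle an owner that exits without releasing the slot. A real implementation would have to address that case, for example by checking whether the recorded process still exists.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Stand-in for a variable in a shared data section. A process ID of 0
// means "no instance is running".
struct InstanceSlot {
    std::atomic<std::uint32_t> ownerPid{0};
};

// Atomically claims the slot (cf. InterlockedCompareExchange). Returns true
// only for the first caller while the slot is free.
bool ClaimInstance(InstanceSlot& slot, std::uint32_t myPid) {
    assert(myPid != 0);                     // 0 is reserved for "free"
    std::uint32_t expected = 0;
    return slot.ownerPid.compare_exchange_strong(expected, myPid);
}

// Releases the slot only if the caller is the recorded owner.
bool ReleaseInstance(InstanceSlot& slot, std::uint32_t myPid) {
    std::uint32_t expected = myPid;
    return slot.ownerPid.compare_exchange_strong(expected, 0u);
}
```

Unlike a named mutex, the slot cannot be grabbed in advance by a process that merely knows a name. The protection is only as strong as access to the shared section itself.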
Mutexes on Shared Data Structures
While Microsoft recommends using a file as a lock for the singleton issue, the solution does not lend itself well to frequently accessed data structures. For efficiency and hardening against an adversary, we would again want to use a shadow variable to validate ownership.
Other Objects
At times, we also need other objects such as Critical Sections, Semaphores, and Shared Memory (semaphores are a generalization of the mutex). As we observed earlier, nameless objects such as semaphores can usually be safely inherited and used. Pipes (named and anonymous) are very similar to sockets and are discussed below.
Critical sections are inherently safe to use because the operating system does not expose them as an entry point. That is, the critical section exists only in the current process, and the malware cannot obtain access to it. In the past, developers used critical sections simply for efficiency. But in light of an attacker's ability to exploit heavier-weight synchronization objects, code bases should be reviewed for cases where a critical section would suffice but another object is being used. We have all heard of the Principle of Least Privilege; now we have the Principle of Least Object.
Named semaphores may be cause for concern. Fortunately, semaphores can be synthesized using the various Interlocked* functions if the variable data is shared among modules (similar to the shadow variable). When using a semaphore, a simple boolean variable to validate ownership is not sufficient, since many threads will be allowed through the gate. In this case, we would want to validate using a shadow integer variable (rather than a boolean) manipulated with the Interlocked functions.
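The shadow integer can be sketched with a bounded compare-and-swap loop. ShadowCount is a hypothetical class. std::atomic<long> stands in for a shared variable manipulated with the Interlocked functions, and the maximum count is assumed to match the semaphore's maximum. HasFreeSlots supports the check described under Enforced Cooperative Sharing. If a wait on the semaphore times out while the shadow shows free slots, a party outside the cooperating set is holding it.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shadow counter kept alongside a semaphore. Legitimate
// participants increment it when they pass the gate and decrement it when
// they leave, so the number of genuine holders is always known.
class ShadowCount {
public:
    explicit ShadowCount(long maxCount) : max_(maxCount) { assert(maxCount > 0); }

    // Admits one more holder unless that would exceed the maximum.
    bool TryEnter() {
        long cur = count_.load();
        while (cur < max_) {
            if (count_.compare_exchange_weak(cur, cur + 1)) { return true; }
            // On failure compare_exchange_weak reloaded cur; retry.
        }
        return false;
    }

    void Leave() { count_.fetch_sub(1); }

    // Free slots in the shadow while the real semaphore is exhausted suggest
    // that an unknown party is holding the semaphore.
    bool HasFreeSlots() const { return count_.load() < max_; }

    long Current() const { return count_.load(); }

private:
    std::atomic<long> count_{0};
    const long max_;
};
```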
Shared memory is a somewhat harder problem since we may not know the exact size of required memory in advance. If the handle to shared memory cannot be inherited, it is probably exposed. We could share a key and then require that threads MAC their shared memory writes. However, an attacker can still write to the memory (even if he cannot forge authenticity). In this case, consider using a named pipe, socket, RPC (even though local), or a software device for IPC. Named pipes and sockets in lieu of shared memory are discussed below.
Pipes and sockets are very similar in this regard. A socket connection is identified by the 4-tuple { source IP, source port, destination IP, destination port }, and pipes can be one-way or bidirectional channels. Once a connection is established using a socket or pipe, the connection is fairly secure on the local machine (an attacker would have to inject messages over the socket without the aid of the wire). Since an attacker could establish its own connection to either module, an authentication mechanism should be used. Similar to other schemes, we could share a MAC key across modules in the form of a 64-bit integer. The data would not be encrypted, only authenticated, so that the source of the data can be verified.
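The text proposes authenticating, but not encrypting, each message with a shared key. The sketch below shows that idea with SipHash-2-4 as the keyed MAC. SipHash is our substitution, since the original does not name an algorithm. It takes a 128-bit key (two 64-bit words) rather than the single 64-bit integer mentioned above. Tag and Verify are hypothetical names. A production system might instead use HMAC-SHA256 from a vetted cryptographic library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

namespace {
// Rotate left; b must be in 1..63 to avoid an undefined shift.
inline std::uint64_t Rotl(std::uint64_t x, int b) {
    assert(b > 0 && b < 64);
    return (x << b) | (x >> (64 - b));
}

inline void SipRound(std::uint64_t& v0, std::uint64_t& v1,
                     std::uint64_t& v2, std::uint64_t& v3) {
    v0 += v1; v1 = Rotl(v1, 13); v1 ^= v0; v0 = Rotl(v0, 32);
    v2 += v3; v3 = Rotl(v3, 16); v3 ^= v2;
    v0 += v3; v3 = Rotl(v3, 21); v3 ^= v0;
    v2 += v1; v1 = Rotl(v1, 17); v1 ^= v2; v2 = Rotl(v2, 32);
}

// Little-endian load of n (<= 8) bytes starting at offset p.
inline std::uint64_t LoadLE(const std::string& s, std::size_t p, std::size_t n) {
    std::uint64_t m = 0;
    for (std::size_t j = 0; j < n; ++j)
        m |= static_cast<std::uint64_t>(static_cast<unsigned char>(s[p + j])) << (8 * j);
    return m;
}
}  // namespace

// SipHash-2-4 tag of msg under the 128-bit key (k0, k1).
std::uint64_t Tag(const std::string& msg, std::uint64_t k0, std::uint64_t k1) {
    std::uint64_t v0 = k0 ^ 0x736f6d6570736575ULL;
    std::uint64_t v1 = k1 ^ 0x646f72616e646f6dULL;
    std::uint64_t v2 = k0 ^ 0x6c7967656e657261ULL;
    std::uint64_t v3 = k1 ^ 0x7465646279746573ULL;
    const std::size_t len = msg.size();
    const std::size_t full = len - (len % 8);
    for (std::size_t i = 0; i < full; i += 8) {   // 2 compression rounds per word
        const std::uint64_t m = LoadLE(msg, i, 8);
        v3 ^= m;
        SipRound(v0, v1, v2, v3); SipRound(v0, v1, v2, v3);
        v0 ^= m;
    }
    // Final word: remaining bytes plus the message length in the top byte.
    const std::uint64_t b = (static_cast<std::uint64_t>(len & 0xFF) << 56) |
                            LoadLE(msg, full, len % 8);
    v3 ^= b;
    SipRound(v0, v1, v2, v3); SipRound(v0, v1, v2, v3);
    v0 ^= b;
    v2 ^= 0xFF;                                    // 4 finalization rounds
    for (int r = 0; r < 4; ++r) SipRound(v0, v1, v2, v3);
    return v0 ^ v1 ^ v2 ^ v3;
}

// Receiver side: recompute the tag and compare.
bool Verify(const std::string& msg, std::uint64_t tag,
            std::uint64_t k0, std::uint64_t k1) {
    return Tag(msg, k0, k1) == tag;
}
```

The sender transmits each message together with Tag(msg, k0, k1), and the receiver discards anything for which Verify fails. An attacker who connects to the pipe or socket without the key therefore cannot forge accepted messages. Tagging alone does not stop replay of a previously tagged message. A sequence number inside the message, as in the auto-reset scheme, would address that.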
Blake Watts has a paper on security issues related to pipes. The paper is somewhat dated, but it is good reading nonetheless; see Discovering and Exploiting Named Pipe Security Flaws for Fun and Profit. We should almost always use anonymous pipes. When a named pipe is required across processes, we should create it with a random name and the FILE_FLAG_FIRST_PIPE_INSTANCE flag. Keep in mind that a random name reduces the attack surface; it does not completely remove vulnerabilities.
We recognized the utility of secure sockets for IPC across processes, but be aware that they involve managing key pairs, which adds overhead and attack surface. Secure sockets offer privacy, authentication, or both. We found that we only required authentication; encryption was simply defense in depth.
Enforced Cooperative Sharing
At times, we want to proceed with an action even though a synchronization object cannot be acquired. A classic example is sharing the Desktop. As another example, suppose we use a mutex as a singleton with a shadow variable to validate ownership. A malicious attacker owns the mutex, but the shadow variable indicates the mutex should be free. Our expectation that other programs are cooperating is most likely wrong. In this case, we would like a method to acquire the resource so we can proceed with the operation.
Though we can wait indefinitely for access to an object, there is no built-in safety for releasing objects. We would like a means to specify a maximum hold time to the operating system to prevent starvation where appropriate. For example, CreateMutex might be extended as follows:
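// Hypothetical extension; this is not an actual Win32 signature
HANDLE WINAPI CreateMutex(
    LPSECURITY_ATTRIBUTES lpMutexAttributes,
    BOOL bInitialOwner,
    LPCTSTR lpName,
    DWORD dwMaxHoldTime      // maximum time an owner may hold the mutex, or INFINITE
);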
As with the timeout of WaitForSingleObject and related functions, the modified function would accept either a 'sane' hold time, which guards against starvation, or INFINITE, which preserves today's behavior. The operating system could then close an offending application if it misbehaves.
Since forceful termination by the operating system is a drastic measure, programs may benefit from a middle ground. In this scheme, an application would call the enhanced CreateMutex. When a mutex owner exceeds a reasonable amount of time, WaitForSingleObject would return a value indicating that the current owner has exceeded dwMaxHoldTime (similar to a return of WAIT_FAILED or WAIT_ABANDONED). The waiting program would then take appropriate action.
Adding a maximum hold time might seem to impose undue hardship on the operating system, but this is not the case. For example, rather than setting a timer to detect a violation precisely, the operating system could maintain a simple counter which is incremented each time the holder receives a time quantum. Once the accumulated time (quantum * count) exceeds the limit, a waiting thread could be notified. In this scenario, enforcement of dwMaxHoldTime is somewhat soft but effective nonetheless. In the code we wrote to perform various simulations, this method required minimal changes to data structures and code to identify offenders.
We do realize there are other issues, such as thread scheduling and priorities. An unrelated high-priority thread could incorrectly give the impression that a malicious program is holding an asset. The most obvious fix is to verify that the program in question has both exceeded its hold time and received adequate CPU time. Simulations indicated that the benefits of ensuring cooperation through maximum hold times (balanced with CPU scheduling) outweigh the risk of runaway high-priority threads in most cases.
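The quantum-counting scheme is simple enough to sketch. The structure and function names below are hypothetical, and kInfinite mirrors the value of the Win32 INFINITE constant (0xFFFFFFFF). In a real system the scheduler, not the application, would call OnQuantumGranted. Only quanta actually granted to the holder are counted. A holder starved by an unrelated high-priority thread therefore accumulates no hold time, which addresses the concern raised in the preceding paragraph.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-owner bookkeeping for the proposed dwMaxHoldTime.
struct HoldRecord {
    std::uint32_t maxHoldTimeMs;        // limit in milliseconds, or kInfinite
    std::uint32_t quantaWhileHolding;   // quanta granted to the current owner
};

constexpr std::uint32_t kInfinite = 0xFFFFFFFF;   // same value as Win32 INFINITE

// Called each time the owner receives a time quantum while holding the object.
void OnQuantumGranted(HoldRecord& r) { ++r.quantaWhileHolding; }

// Soft check: true once accumulated CPU time (quantum * count) exceeds the limit.
bool HoldTimeExceeded(const HoldRecord& r, std::uint32_t quantumMs) {
    assert(quantumMs > 0);
    if (r.maxHoldTimeMs == kInfinite) { return false; }
    const std::uint64_t heldMs =
        static_cast<std::uint64_t>(r.quantaWhileHolding) * quantumMs;
    return heldMs > r.maxHoldTimeMs;
}
```

Take a 100 ms limit and an illustrative 15 ms quantum. Six quanta (90 ms) stay within the limit, and a seventh (105 ms) reports a violation.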
Ring 0 - Ring 3 Sharing
Using nameless objects and handle inheritance is one of the most effective methods for neutralizing synchronization attacks. Unfortunately, Microsoft does not provide a public mechanism for a driver to launch a user mode component. This means a user land process has most likely not inherited handles from a kernel parent (there are exceptions, but the reader will have to fully investigate).
Others have hacked the functionality by injecting a thread from a kernel component into a user mode process, and then using the injected thread to launch a separate user mode process. For an example, see Metasploit's Ring 0 to Ring 3 stager at http://metasploit.com/svn/framework3/trunk/lib/rex/payloads/win32/kernel/stager.rb. However, hacking, injecting, staging, and surreptitious launching are probably things we want to avoid.
Because it is difficult (if not impossible) to inherit handles across the kernel/user boundary, both the user and kernel component must find common ground to share objects. This common ground is usually the currently logged on user's SID. This means a kernel component must relax security to accommodate sharing with user land processes. An alternative is to add SYSTEM to the user component, which is usually a bad idea.
In light of threats that take advantage of this fact, a method for kernel components to launch user land modules would be a welcome addition. Similar to the Software ACE, the operating system should enforce a security policy. That is, a kernel component with hash=0xAAAAAA... may only launch a Ring 3 executable with hash=0xBBBBBBB.
The recommendations of Q106387, How To Share Kernel Objects Between Processes, can be dangerous with an adversary nearby. The issue of Ring 0/Ring 3 sharing is the single most compelling argument to implement Software ACEs.
A Bad Idea
One method we examined (to simulate operating system support) was to use local user accounts in place of Software ACEs. While the system worked as expected within the bounds of the test, it also carried the poor coding practice of managing a hard-wired password. In addition, the mechanism was simply awkward: using a separate user account caused additional logons, which have their own set of side effects.
It was clear that trying to fulfill our needs using the existing security system (i.e., without the Software ACE) was not a good idea. In the end, there does not appear to be a means to synthesize operating system supported Software ACEs using local user accounts.
An Interesting Idea
One interesting idea that occurred to us was the use of canary synchronization objects. The canary object lives its life as a honeypot, attempting to lure malware in. For example, a process might open an event named "SHUTDOWN_PROGRAM" (which it never honors). If the event fires, there is a good chance the signal originated from a hostile program. In this case, the program could save its data state and exit before possible corruption. The program under attack could also attempt to gather intelligence. But a more interesting defense is to kill the originating process. The latter is especially appealing for Ring 0 modules under attack from Ring 3.
A variation on this theme is to build a list of processes which have a particular (and perhaps legitimate) handle open. The module performing the check would know a priori which other modules might possess a handle to the object (i.e., a whitelist). It could therefore terminate any infringers or squatters.
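The whitelist comparison itself is straightforward. HandleHolder and FindSquatters are hypothetical names. Enumerating which processes hold a handle to an object, computing image digests, and terminating offenders are all outside the sketch's scope. It simply returns the process IDs whose image digest is not on the a priori whitelist.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical record for one process found holding a handle to our object.
// imageDigest would be a cryptographic hash of that process's image.
struct HandleHolder {
    std::uint32_t pid;
    std::uint64_t imageDigest;
};

// Returns the processes that are not on the whitelist: the "infringers or
// squatters" a module might choose to terminate.
std::vector<std::uint32_t> FindSquatters(
        const std::vector<HandleHolder>& holders,
        const std::unordered_set<std::uint64_t>& whitelist) {
    std::vector<std::uint32_t> squatters;
    for (const HandleHolder& h : holders) {
        if (whitelist.count(h.imageDigest) == 0) { squatters.push_back(h.pid); }
    }
    return squatters;
}
```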
As intriguing as this approach seems, it gets even better for antivirus and malware detection vendors: an antivirus company could sell services to 'protect' a program or group of programs by acquiring a whitelist from the software authors. Then, when a hostile program performs a suspicious or damaging act (such as pulsing an event or acquiring a mutex), the antivirus could terminate the hostile program on behalf of the besieged.
Conclusions
Because of malicious adversaries, the customary use of synchronization objects must be revisited. In general, current code bases that use only 'naked' synchronization objects will require validators or authenticators to confirm object manipulation. Threads which wait on these objects must also confirm that the event is genuine and not an attacker attempting to manipulate or otherwise influence program execution. Programmers who previously used named objects because they saved time and effort will now be required to use nameless objects and handle inheritance because of the hostile environment.
While it is possible to harden applications by augmenting existing uses of synchronization objects, an operating system supported Software ACE would be the easiest mechanism to use generically across securable objects. In the case of sharing objects between kernel components and user components, the Software ACE is a necessity. And with Software ACEs in place, the burden is shifted from the Administrator and back to where it belongs: the programmer.