Several of my recent consulting projects dealt with composite applications,
specifically desktop composite applications. A composite application consists
of a host (shell) and a number of plugins, often developed by different teams.
In this scenario it is usually desirable to isolate the host from plugin failures.
I often used AppDomains for this purpose. Eventually, I came to the conclusion that
AppDomains are not very good isolators, for two main reasons:
- Error handling is very difficult to do right.
- Unloading plugins is not guaranteed.
Thus, meeting even basic robustness requirements for the application is difficult or even impossible.
This does not mean that AppDomains are useless. They still provide a convenient
partitioning mechanism, especially if one team controls all moving parts. The shortcomings outlined in this
article may or may not be important for a particular project. If the project is able to tolerate a certain
degree of failure, AppDomains may still be a viable isolation solution for it.
The Idea Behind AppDomains
In a nutshell, AppDomains were invented for efficient isolation of third-party code (plugins, components, web applications).
A host process, such as the ASP.NET server, needs to load plugins (web applications) securely and
efficiently. Of course, Win32 processes already provide such isolation, but they were deemed too
heavyweight for the job, as described in
this blog entry by
Chris Brumme from Microsoft.
The main difference between an AppDomain and a process is that processes have
their own threads, and AppDomains don't. To visualize that, let's compare
threads to cars. While you drive your Mercedes Thread in the AppDomain
of USA, you can see only American data. You can drive it to the AppDomain of Canada,
but the moment you cross the border, you are cut off from American data and can now see
only Canadian data. Your Mercedes Thread is not pinned to a particular AppDomain.
However, no matter what road it takes, it cannot leave the process
of North America (we ignore the Panama land bridge for the sake of argument).
Similarly, someone driving a Mercedes Thread in the process of Europe may cross from the
AppDomain of France to the AppDomain of Spain, but they
can never reach North America and access American or Canadian data.
To run a reliable, secure, and efficient host, our isolation mechanism should have the following properties:
- We must be able to load and execute plugins, with restricted security if necessary.
- Plugins should not be able to corrupt host data.
- If a plugin fails, the host must be able to detect this and unload the failing plugin.
- It must be possible to unload plugins on demand.
- Unloading a plugin should clean up any resources allocated for that plugin. If it does not,
the host process will accumulate waste and will eventually fail.
Operating system processes satisfy all these requirements. Achieving restricted security
for a child process may be tricky, but it is typically possible.
Unfortunately, AppDomains do not fare very well with these requirements. They do an excellent job
with #1 and #2. One can easily restrict security of the plugin,
and host data is protected. However, we run into major difficulties with #3 and #4.
The sad reality is that:
- There is no way to reliably detect a failure in an AppDomain. And, even if we could,
- There is no way to reliably unload a failing AppDomain.
Also, there are some issues with #5. Per Chris Brumme,
there is a small memory leak on each AppDomain unload. More importantly, there is no way to unload
any domain-neutral assemblies: once loaded into the process, they are there to stay. This, however, looks minor compared
to the problems we have with exception handling.
Legacy vs. Default Exception Handling
Default Exception Handling
By default, an unhandled exception in any thread terminates
the application unconditionally. This is bad news for runtime hosts. If a plugin creates a thread and that thread causes
an unhandled exception, the whole host process dies. We can do last-ditch error handling in an
AppDomain.UnhandledException handler, but termination of the process cannot be stopped.
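A minimal sketch of such a last-ditch handler, assuming a .NET Framework console host. The worker thread body and the messages are made up for illustration; the point is only that the handler runs while the process still goes down.

```csharp
using System;
using System.Threading;

static class Program
{
    static void Main()
    {
        // Last-ditch logging: the handler runs, but under the default policy
        // it cannot prevent the process from terminating.
        AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
        {
            Console.Error.WriteLine("Unhandled exception: {0}", e.ExceptionObject);
            Console.Error.WriteLine("Process is terminating: {0}", e.IsTerminating);
        };

        // Stands in for a plugin worker thread that fails (hypothetical failure).
        var worker = new Thread(() =>
        {
            throw new InvalidOperationException("Simulated plugin failure");
        });
        worker.Start();
        worker.Join();

        // Under the default policy this line is normally never reached.
        Console.WriteLine("Host is still running");
    }
}
```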
In WPF and Windows Forms applications, UI threads can be protected from unhandled exceptions, because they have a
try/catch block supplied by the UI framework. However, worker threads lack such protection.
In a desktop application it is considered best practice to perform long operations on a worker thread. So, the
scenario where a plugin spawns a worker thread and that thread causes an unhandled exception is very real.
This makes the default exception handling policy a bad choice for a host-plugin architecture.
Legacy Exception Handling
Fortunately, default exception handling is not the only option. Prior to .NET 2.0 unhandled
exceptions in worker threads did not automatically kill the process. To revert to this legacy behavior
we can add the following snippet to the application configuration:
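A minimal sketch of that setting, assuming the standard `legacyUnhandledExceptionPolicy` element of the .NET Framework `<runtime>` configuration section, placed in the host's application configuration file:

```xml
<configuration>
  <runtime>
    <!-- Revert to pre-2.0 behavior: unhandled exceptions on worker threads
         do not automatically terminate the process. -->
    <legacyUnhandledExceptionPolicy enabled="1"/>
  </runtime>
</configuration>
```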
Unfortunately, this still does not buy us full protection from plugin failures - read on.
Exception! Whose Fault Is That?
To effectively unload the crashing plugin we must first detect which plugin has crashed.
Frankly, even with legacy exception handling this is virtually impossible.
When an unhandled exception occurs, the framework raises the AppDomain.UnhandledException event.
Each AppDomain may have its own UnhandledException handler. In a typical scenario,
UnhandledException will first be raised in the failing AppDomain and then again
in the main AppDomain. This works reasonably well if the exception type is serializable.
But if it's not, by the time the flow of execution reaches the main AppDomain, things become muddy:
- The original exception is replaced with SerializationException.
- Information about the AppDomain that caused the exception is lost.
- A parasitic SerializationException will be thrown in the main AppDomain.
SerializationException contains surprisingly little information about what happened.
At this point it is not distinguishable from a genuine unhandled SerializationException
that could have occurred in the host itself.
The original idea of the AppDomain.UnhandledException design was perhaps to allow the main AppDomain
to process all unhandled exceptions regardless of origin. In practice that goal was not achieved.
It is also worth noting that most user-defined exception classes will not be marked as [Serializable],
simply because application programmers don't see a need to do that.
The host may try to pass exception information from the plugin's
AppDomain using some custom mechanism. For example, the
UnhandledException handler in the plugin's AppDomain can explicitly call
a centralized exception monitor object located in the main AppDomain, passing it only
serializable objects like the plugin's AppDomain name and exception string. This scheme, however,
would still be prone to failure, because the plugin's AppDomain may be in an unknown state after
an unhandled exception, and successful communication with the host's exception monitor cannot be guaranteed.
A mechanism supported by the framework is required for reliable operation, but such a mechanism does not exist.
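The exception monitor scheme described above could be sketched roughly as follows. The class names `ExceptionMonitor` and `PluginBootstrap` and the domain name `PluginA` are hypothetical, and the sketch assumes the .NET Framework with the legacy exception policy enabled. It inherits the weakness just discussed: the report call itself may fail.

```csharp
using System;

// Lives in the main AppDomain; plugin domains call it through a remoting proxy.
public sealed class ExceptionMonitor : MarshalByRefObject
{
    public void Report(string domainName, string exceptionText)
    {
        Console.Error.WriteLine("Plugin domain '{0}' failed: {1}", domainName, exceptionText);
    }

    // Keep the remoting lease from expiring while plugins hold the proxy.
    public override object InitializeLifetimeService() { return null; }
}

// Instantiated inside the plugin's AppDomain to hook its UnhandledException event.
public sealed class PluginBootstrap : MarshalByRefObject
{
    private ExceptionMonitor _monitor;

    public void Attach(ExceptionMonitor monitor)
    {
        _monitor = monitor;
        AppDomain.CurrentDomain.UnhandledException += OnUnhandledException;
    }

    private void OnUnhandledException(object sender, UnhandledExceptionEventArgs e)
    {
        try
        {
            // Pass only strings across the boundary: the exception type
            // itself may not be serializable.
            _monitor.Report(AppDomain.CurrentDomain.FriendlyName,
                            e.ExceptionObject.ToString());
        }
        catch (Exception)
        {
            // The domain may be in an unknown state; nothing more can be done here.
        }
    }
}

static class Host
{
    static void Main()
    {
        var monitor = new ExceptionMonitor();
        AppDomain pluginDomain = AppDomain.CreateDomain("PluginA");

        // Create the bootstrap object inside the plugin domain and get a proxy to it.
        var bootstrap = (PluginBootstrap)pluginDomain.CreateInstanceAndUnwrap(
            typeof(PluginBootstrap).Assembly.FullName,
            typeof(PluginBootstrap).FullName);
        bootstrap.Attach(monitor);

        // ... load and run the plugin inside pluginDomain ...
    }
}
```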
Unloading Failing Plugin
Even if we managed to figure out which plugin is causing trouble, this is not the end of the story.
There is no way to gracefully unload a plugin that is in an unknown state.
If the plugin is executing native code that cannot be interrupted (e.g., file I/O), it will not
be unloaded at all. AppDomain.Unload()
will fail with an exception similar to this:
System.CannotUnloadAppDomainException: Error while unloading appdomain. (Exception from HRESULT: 0x80131015)
If the plugin is executing background threads, they will be aborted with AppDomainUnloadedException.
In default exception handling mode this exception will then be quietly swallowed by the framework. However,
in legacy exception handling mode it will raise
AppDomain.UnhandledException in the main AppDomain.
AppDomainUnloadedException carries surprisingly little information. In particular,
it does not say which AppDomain was unloaded. Therefore, it is impossible to figure out
whether this is an expected exception from dying background threads of a plugin that is being unloaded, or
some other peculiar error.
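As a minimal sketch of what the host can do on its side, the unload call can at least be wrapped so that a refused unload is reported rather than left to propagate. The helper name `PluginUnloader.TryUnload` is hypothetical; `AppDomain.Unload` and `CannotUnloadAppDomainException` are the .NET Framework members discussed above.

```csharp
using System;

static class PluginUnloader
{
    // Returns true if the domain was unloaded, false if the runtime refused.
    public static bool TryUnload(AppDomain pluginDomain)
    {
        // Capture the name first; the domain object is unusable once unloaded.
        string name = pluginDomain.FriendlyName;
        try
        {
            AppDomain.Unload(pluginDomain);
            return true;
        }
        catch (CannotUnloadAppDomainException ex)
        {
            // Typically a thread in the domain could not be stopped,
            // for example because it is blocked in native code.
            Console.Error.WriteLine("Could not unload '{0}': {1}", name, ex.Message);
            return false;
        }
    }
}
```

A `false` result leaves the host with a plugin it can neither trust nor remove, which is exactly the problem this section describes.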
ASP.NET uses AppDomains. How Does It Survive?
Experiment shows that ASP.NET takes a hands-off approach to reliability. Each application pool runs
a worker process (w3wp.exe). Each web application in the pool runs in an AppDomain.
When an application causes an exception on a worker thread, the whole process dies, taking down all
other, perfectly good applications with it. If those applications were processing web
requests, these requests will be remembered. ASP.NET will then create a new worker process and pass
it the cached requests (if any) for handling.
This approach works relatively well mostly because the Web is stateless. Any state passed between
requests, such as cookies, is small and well-defined. The demise and resurrection of the ASP.NET worker
process remains invisible to the user or the application programmer, unless they take special steps
to detect it.
Obviously, such a hands-off approach is not viable for a desktop application: restarting the whole
application and losing unsaved data when a single plugin fails would not be welcomed by users.
AppDomains provide a certain degree of isolation between parts of the application,
but this isolation is limited. A number of design decisions and features of the .NET Framework
make proper error handling very difficult. Exceptions pop up in unexpected places, and exception
objects carry very little context with them.
Unloading plugins is not guaranteed. This is hardly the framework designers' fault: Windows
threads were not designed to be gracefully interruptible, but this gives little consolation
to application authors.
Depending on the requirements, AppDomains can still be very useful, especially
if efficiency is more important than absolute reliability, as in the case of ASP.NET.
However, for a truly isolated application one may want to consider using processes instead of
AppDomains, as in Baktun Shell.
Unfortunately, this is not a panacea either: multi-process desktop applications are not
mainstream, and many unexpected pitfalls may arise, especially when using third-party libraries.
For better or for worse, such is the nature of software development: there is no easy way out;
it is all about tradeoffs.