This article marks the important milestones, events and potholes we encountered on the road from EXEs and DLLs to COM and then to assemblies. I will try to be concise and avoid jargon so that we can get to the important points quickly.
The Stone Age of EXEs
In the good old days of MS-DOS, we compiled and linked our nice little C code into an executable. Any additional code the program used (for example, the standard library functions provided by C) was inserted into the EXE itself. This additional code was supplied as object (.OBJ) files by the Turbo C/Borland C environment. These EXEs were self-sufficient, and you could *install* one on any machine merely by copying it to a folder.
Cons:
- No way to share binaries: to use code developed by someone else, you needed the actual source code.
The era of LIBS (Static linking)
As software became bigger and more complex, there was a need to use (consume) what we call third-party code. For example, you might need a sockets library to do network programming in your own program. Such libraries were distributed as library (.LIB) files, each containing the library's object code. The linking step merged that object code into the EXE. This is called static linking.
Pros:
- Binary reuse without the need to replicate the source code. All you need is a LIB file from the vendor and some documentation or a header (.h) file to check the function signatures.
Cons:
- Bigger EXEs and wasted storage: the library code was replicated in every EXE that used it.
- Wasted memory: in a multitasking environment, two running programs that used the same library function would each hold their own copy of it in memory.
- Any change in the library required rebuilding the EXE to pick up that change.
The progress to DLLs (Dynamic Linking)
The idea of code sharing remained the same. However, the library code was now separated from the executables and placed in a DLL (dynamic link library) file. The library was linked with the calling application at *runtime*, and multiple applications calling the same library functions shared a single copy of them in memory. The operating system was responsible for doing this linking at runtime. For the scheme to work, a user just needed to place the DLL in the application's own folder or in the system folder of the OS.
Pros:
- Solved the storage, memory and rebuild problems of static linking.
Cons:
- Difficult cross-language usage: DLLs and EXEs written in different languages were difficult to make interoperate because the metadata stored in the DLL was language dependent.
- DLL versioning problems: if your program has been tested with a particular version of a library, and the installer of some other program replaces that library with a newer version, your program may stop working properly. This is the infamous DLL hell.
- Lack of location transparency: the DLL has to be present on the client's own machine, in a place where the OS can find it.
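Runtime linking can be illustrated with a short sketch. The Python snippet below is only an illustration, not how a C program of that era did it: it uses the standard `ctypes` module to load the platform's C runtime library while the program is running and to look up the exported function `abs` by name. The helper name `load_c_runtime` is made up for this example, and the choice of `msvcrt` on Windows and the `c` library elsewhere is an assumption about the machine it runs on.

```python
import ctypes
import ctypes.util
import sys


def load_c_runtime():
    """Load the platform's C runtime library at run time (hypothetical helper)."""
    if sys.platform == "win32":
        # msvcrt.dll is the C runtime DLL found in the Windows system folder.
        return ctypes.CDLL("msvcrt")
    # On POSIX systems, look up the shared C library by name. If the lookup
    # returns None, CDLL(None) exposes the symbols already loaded into this process.
    return ctypes.CDLL(ctypes.util.find_library("c"))


libc = load_c_runtime()

# Resolve the exported function by name and declare its C signature:
#     int abs(int);
c_abs = libc.abs
c_abs.argtypes = [ctypes.c_int]
c_abs.restype = ctypes.c_int

print(c_abs(-5))  # prints 5
```

Nothing from the library was copied into the calling program: the function was located in a separate binary at runtime, and the caller still had to spell out the function's signature by hand because the library does not describe it.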
Then came COM
COM provided an infrastructure that let client applications bind to components at runtime. Unlike with DLLs, a program consuming COM objects had very little compile-time dependency on the library that implemented them.
Pros:
- Language independent: all COM objects follow the same memory layout, which is laid down by the COM specification.
- Object oriented: unlike DLLs, which came from the procedural programming era, COM offered an object-oriented design.
Cons:
- Complexity: COM was difficult to learn and understand.
- Registry storage: a reference to a COM object is resolved through a GUID-to-location mapping stored in the Windows registry, which ties COM to the Windows platform and makes it tedious to work with.
- Deployment and maintenance were painful.
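The registry-based binding described above can be pictured with a toy model. The Python sketch below is not COM and uses no COM API; it only imitates the idea that a client asks for a component by class ID and for an interface by interface ID, loosely in the spirit of `CoCreateInstance` and `QueryInterface`. Every name and GUID in it (`registry`, `register_class`, `create_instance`, `SpellChecker`, the three IDs) is hypothetical and exists only for this example.

```python
import uuid

# Hypothetical identifiers, made up for this sketch only.
CLSID_SPELL_CHECKER = uuid.UUID("11111111-1111-1111-1111-111111111111")
IID_SPELL_CHECKER = uuid.UUID("22222222-2222-2222-2222-222222222222")
IID_THESAURUS = uuid.UUID("33333333-3333-3333-3333-333333333333")

# Toy stand-in for the registry: maps a class ID to a factory for the component.
registry = {}


class SpellChecker:
    """A pretend component that supports exactly one interface."""

    supported = {IID_SPELL_CHECKER}
    known_words = {"exe", "dll", "com"}

    def query_interface(self, iid):
        # The client asks for an interface by ID, not by a compile-time type.
        if iid not in self.supported:
            raise TypeError("interface not supported: %s" % iid)
        return self

    def check(self, word):
        return word.lower() in self.known_words


def register_class(clsid, factory):
    """Record where a class lives (what a component installer would do)."""
    registry[clsid] = factory


def create_instance(clsid, iid):
    """Look up the class ID, create the component, and ask it for an interface."""
    factory = registry.get(clsid)
    if factory is None:
        raise LookupError("class not registered: %s" % clsid)
    return factory().query_interface(iid)


register_class(CLSID_SPELL_CHECKER, SpellChecker)
checker = create_instance(CLSID_SPELL_CHECKER, IID_SPELL_CHECKER)
print(checker.check("DLL"))  # prints True
```

The client never names the implementing class directly, which is where the low compile-time dependency comes from. The price is also visible: if the registration step is missing or stale, the lookup fails at runtime.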
Have we reached utopia with Assemblies?
Compiled .NET code is stored as an assembly. The assembly stores:
- IL code.
- Assembly metadata (manifest).
  - Identity: name, version and culture info.
  - Names of files within the assembly.
  - Type access data: private or otherwise.
  - Security permissions.
- Type metadata.
  - Details of types, methods and properties within the assembly.
- Resources.
An assembly is therefore self-describing. It does not depend on external things like registry entries or type library files for reuse. The ILDASM tool shipped with .NET gives an insight into the manifest and type metadata.
Pros:
- Completely self-describing and language independent.
- Two versions of the same assembly can be loaded side by side: this is possible because the manifest also stores version information. (This ends DLL hell.)
- Easy installation: a consumer program can just have the assembly in the same folder or in the GAC (Global Assembly Cache) and start using the objects it defines.
Cons:
- Requires the .NET Framework to be installed (just as Java requires the JVM). For lightweight desktop utilities, users may complain about the large .NET runtime download and install.
- Because the assembly stores very detailed metadata, it can be reverse engineered back to source code. Obfuscators address this to some extent by garbling the assembly so that the result is difficult for a human to understand.
- Assemblies run in a managed environment, so the drawbacks of that environment (such as no direct control over when resources are freed) apply to assemblies too.
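The difference between DLL hell and side-by-side versions comes down to what the lookup key is. The toy Python model below is a sketch of that idea only and does not use any .NET API: the DLL-style folder is keyed by file name, so a newer install overwrites the older one, while the assembly-style cache is keyed by identity, simplified here to name and version (the culture part of the identity is left out). All names in it (`system_folder`, `assembly_cache`, `install_dll`, `install_assembly`, `resolve`, `parser.dll`) are hypothetical.

```python
# DLL-style sharing: the system folder has one slot per file name.
system_folder = {}


def install_dll(file_name, version):
    # A later installer silently replaces whatever copy was there before.
    system_folder[file_name] = version


# Assembly-style sharing: the cache is keyed by identity (name + version here).
assembly_cache = {}


def install_assembly(name, version):
    assembly_cache[(name, version)] = "%s, Version=%s" % (name, version)


def resolve(name, version):
    # A consumer asks for the exact version it was built and tested against.
    return assembly_cache[(name, version)]


# Two installers ship different versions of the same library.
install_dll("parser.dll", "1.0.0.0")
install_dll("parser.dll", "2.0.0.0")
print(system_folder["parser.dll"])   # prints 2.0.0.0 (version 1.0.0.0 is gone)

install_assembly("Parser", "1.0.0.0")
install_assembly("Parser", "2.0.0.0")
print(resolve("Parser", "1.0.0.0"))  # prints Parser, Version=1.0.0.0
print(resolve("Parser", "2.0.0.0"))  # prints Parser, Version=2.0.0.0
```

A program tested against version 1.0.0.0 keeps getting exactly that version in the second scheme, whatever other programs install later.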
History
- 03 January 2005: Included assembly cons.
- 16 December 2004: First draft.