COM marshaling is a subject most developers can and do take for granted. Indeed, the Microsoft COM engineers have done such a marvelous job of concealing the internals of marshaling that we are left largely unaware of the precise work that takes place with each cross-apartment method call.
This article is part one of a multi-part series in which I plan to expound on the various methods of achieving COM custom marshaling. In this first installment, I will touch on the concept of a proxy and that of a marshal data packet. To this end, I have prepared a basic sample implementation which we will walk through thoroughly. It is my hope that as we study this subject in depth, we will come to understand and appreciate how marshaling works.
This series of articles is advanced in nature, and I expect the reader to have prior knowledge of the following topics:
- COM development in general.
- Apartments (MTA and STA in particular).
- Interface marshaling in general (including the concepts of proxies and stubs).
Armed with the above-mentioned knowledge, we will now proceed to explore the world of custom marshaling. Let us start by going through some background on COM marshaling in general.
A Primer On COM Marshaling
The term custom marshaling is a misnomer. It is in fact a generic architecture that stipulates the rules and requirements of marshaling in general. What we know as standard marshaling is actually a specific instance of custom marshaling. Any other form of marshaling is simply another manifestation of custom marshaling. Each of these manifestations is transparent to the COM sub-system. For convenience of discussion, we will refer to standard marshaling as just that, even though we know that it is Microsoft's specific implementation of custom marshaling. We will also regard all private, non-Microsoft marshaling as custom marshaling.
Marshaling is inextricably tied to apartments. We know that an apartment is a logical container inside an application for COM objects which share the same thread access rules (i.e. the rules governing how the methods and properties of an object are invoked from threads inside and outside the apartment to which the object belongs). We also know that every COM object lives inside exactly one apartment.
For an object to be used in an apartment other than its own, the object's interfaces must first be exported from its original apartment and then imported into the target (client) apartment. The resulting interface pointer used by the client apartment, however, will not point to the original object itself. It points instead to something known as a proxy. When COM performs the exporting and importing of interface pointers, it does so using a conjugate pair of procedures known as marshaling and unmarshaling. For convenience, we will refer to any importing apartment, whatever its type, simply as a client apartment.
When a method of the imported interface is invoked, the proxy must somehow pass control to the original object (in its original apartment), get it to execute the method and then receive control back. This ensures that a method is always executed in the correct apartment. While control is with the original object, the proxy's thread must halt until control is returned to the proxy.
This passing of control from one apartment to another is also known as method remoting. Method remoting is how all cross-thread, cross-process and cross-machine communications occur in COM.
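The hand-off described above can be modeled in portable C++. The following is only a sketch under stated assumptions, not how COM itself is implemented: `Apartment`, `Counter` and `CounterProxy` are hypothetical names, a single worker thread stands in for the object's apartment, and a future is used to make the caller's thread wait.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for an apartment: one thread that runs every call
// made on the objects it owns. This is a model, not the COM runtime.
class Apartment {
public:
    Apartment() : worker_([this] { Run(); }) {}
    ~Apartment() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        wake_.notify_one();
        worker_.join();
    }

    // Queue a call for the apartment thread and block until it has finished.
    template <typename Fn>
    auto Invoke(Fn fn) -> decltype(fn()) {
        using Result = decltype(fn());
        auto task = std::make_shared<std::packaged_task<Result()>>(std::move(fn));
        std::future<Result> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            calls_.push([task] { (*task)(); });
        }
        wake_.notify_one();
        return result.get();  // the calling thread halts here
    }

private:
    void Run() {
        for (;;) {
            std::function<void()> call;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return done_ || !calls_.empty(); });
                if (calls_.empty()) return;  // shutting down, nothing queued
                call = std::move(calls_.front());
                calls_.pop();
            }
            call();  // executes on the apartment thread
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> calls_;
    bool done_ = false;
    std::thread worker_;  // declared last so the members above exist first
};

// The original object: records which thread executed its method.
struct Counter {
    int value = 0;
    std::thread::id last_thread;
    int Increment() {
        last_thread = std::this_thread::get_id();
        return ++value;
    }
};

// The proxy exposes the same method but forwards it to the owning apartment.
class CounterProxy {
public:
    CounterProxy(Apartment& home, Counter& target) : home_(home), target_(target) {}
    int Increment() {
        return home_.Invoke([this] { return target_.Increment(); });
    }

private:
    Apartment& home_;
    Counter& target_;
};
```

A client that calls `CounterProxy::Increment()` never runs `Counter::Increment()` on its own thread; it waits while the apartment thread does the work and then receives the result.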
Custom marshaling is essentially the generic mechanism that lets a COM object control the way it communicates with its proxy in a client apartment. The client apartment can be in the same process as the COM object, in another process, or even in a process on another machine. When we implement custom marshaling, what we eventually wind up doing is implementing a custom proxy.
We mentioned earlier that when a method of an imported interface is invoked, the proxy must somehow pass control to the original object. Under custom marshaling, the protocol for this is totally within the domain of the object and its proxy. COM plays no part in the communication process. However, when a proxy is created, COM gives the object a one time chance to pass something to the proxy. This is done to help the object establish its protocol with its proxy. This something is known as a marshal data packet. We shall explore these in greater detail in the coming sections below.
A term which will be covered in a future article but is worth mentioning at this time is that of a stub. Stubs are not always required, although they can serve to simplify object-to-proxy communications. As will be made clear later on, stubs are not relevant to the sample code that we present in this article.
The IMarshal Interface
To support custom marshaling, a COM object must implement the IMarshal interface. If it does not, COM deems that standard marshaling is to be used whenever this object's interfaces are to be marshaled. Hence the first point about creating custom marshaling is to ensure that your COM object implements the IMarshal interface.
The second point about creating custom marshaling is that a proxy object must be defined. This proxy is also a COM object and it must also implement the IMarshal interface, although not every IMarshal method needs a non-trivial implementation.
Note also that once an object has declared that it will implement custom marshaling, it must custom marshal all of its interfaces. The IMarshal interface consists of 6 methods. We will study these in more depth when we explore our sample code. For now, a short examination of how COM interacts with the IMarshal methods is appropriate, as much of what follows depends on a good understanding of this interface.
When called upon to perform marshaling, COM first queries the object for its IMarshal interface. Once COM discovers that it supports IMarshal, it will call its IMarshal::GetUnmarshalClass() method. This method returns to COM the CLSID of the object that will be used for the unmarshaling process. In other words, COM is asking for the CLSID of the proxy that will be created when unmarshaling takes place. Having a CLSID implies that this proxy is a COM object.
Later, when unmarshaling is to occur, COM will use this CLSID to create the proxy object within the importing client apartment. Good sense dictates that such a COM object should be housed within a DLL and that it should support both the Single-Threaded and Multi-Threaded Apartments. These requirements ensure that there will be no need for a proxy to the proxy that has just been created!
Still within the context of marshaling, COM will next call the IMarshal::GetMarshalSizeMax() method to determine the size of its marshal data packet. We will start to go deep into marshal data packets in the next section. The IMarshal::MarshalInterface() method of the object will then be invoked. From this method, COM expects to obtain the marshal data packet from the object. This marshal data packet can then either be passed to client code (for later unmarshaling) or be kept by COM itself inside a table in memory (to be retrieved multiple times later by client code for unmarshaling).
Now, when unmarshaling takes place, a proxy COM object must first be created. Then the marshal data packet must be passed to it via its IMarshal::UnmarshalInterface() method. This is how a proxy becomes initialized. Being initialized could mean that a proxy now has what it takes to communicate with its original object, or that a proxy constructs itself as a clone of the original object. Whichever is the case, once a proxy has been initialized, COM will no longer play any part in object-to-proxy communications.
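The order of calls just described can be traced with a small portable model. This is a sketch under assumptions, not the real COM API: `Marshalable` is a hypothetical, simplified analogue of IMarshal that keeps four of its method names, a string stands in for the CLSID, a byte vector stands in for the stream, and `Marshal`, `Unmarshal`, `Packet` and `Label` are invented for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using ByteStream = std::vector<std::uint8_t>;  // stands in for the stream

// Simplified, hypothetical analogue of IMarshal (four of its six methods).
struct Marshalable {
    virtual ~Marshalable() = default;
    virtual std::string GetUnmarshalClass() const = 0;  // "CLSID" of the proxy
    virtual std::size_t GetMarshalSizeMax() const = 0;  // upper bound on the data
    virtual void MarshalInterface(ByteStream& out) const = 0;
    virtual void UnmarshalInterface(const ByteStream& in) = 0;
};

struct Packet {
    std::string proxy_class;  // which class to create in the client apartment
    ByteStream data;          // opaque bytes produced by the object
};

// Exporting side: the calls are made in the order given in the text.
Packet Marshal(const Marshalable& object) {
    Packet packet;
    packet.proxy_class = object.GetUnmarshalClass();
    packet.data.reserve(object.GetMarshalSizeMax());
    object.MarshalInterface(packet.data);
    return packet;
}

using ProxyFactory =
    std::function<std::unique_ptr<Marshalable>(const std::string&)>;

// Importing side: create the proxy, then hand it the packet exactly once.
std::unique_ptr<Marshalable> Unmarshal(const Packet& packet,
                                       const ProxyFactory& create) {
    std::unique_ptr<Marshalable> proxy = create(packet.proxy_class);
    proxy->UnmarshalInterface(packet.data);
    return proxy;
}

std::vector<std::string> g_calls;  // records the order of calls

// A trivial object that also serves as its own proxy class.
class Label : public Marshalable {
public:
    explicit Label(std::string text = "") : text_(std::move(text)) {}
    const std::string& text() const { return text_; }

    std::string GetUnmarshalClass() const override {
        g_calls.push_back("GetUnmarshalClass");
        return "LabelProxy";
    }
    std::size_t GetMarshalSizeMax() const override {
        g_calls.push_back("GetMarshalSizeMax");
        return text_.size();
    }
    void MarshalInterface(ByteStream& out) const override {
        g_calls.push_back("MarshalInterface");
        out.assign(text_.begin(), text_.end());
    }
    void UnmarshalInterface(const ByteStream& in) override {
        g_calls.push_back("UnmarshalInterface");
        text_.assign(in.begin(), in.end());
    }

private:
    std::string text_;
};
```

After `Unmarshal` returns, the driver functions are out of the picture: the caller holds the proxy and talks to it directly, which mirrors the point that COM plays no further part once the proxy is initialized.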
The Marshal Data Packet
From a high-level point of view, marshaling and unmarshaling together form the act of transforming an interface pointer from a source apartment into a series of bytes, transporting this series of bytes to a client apartment, and there reverse-transforming the bytes back into an interface pointer usable by the client apartment.
The transformation of anything into a series of bytes is known as serialization. The series of bytes obtained from serialization is more commonly referred to as a stream. The serialization of an interface pointer is better known as marshaling the interface pointer. The stream obtained from marshaling is also known as a marshal data packet. The marshal data packet stream object is always referenced by a pointer to an IStream interface.
The format of a marshal data packet used for custom marshaling is as follows:
[Diagram: Signature ("MEOW") | Flags | IID | CLSID | Reserved | Byte Count | Custom Marshal Data]
The above diagram is a slightly modified version of the one in Don Box's book Essential COM (page 245).
The first part of the marshal data packet is a 4-byte signature hardcoded to the characters "MEOW", an acronym for Microsoft Extended Object Wire. This is followed by a single-byte flags field. I personally do not know the purpose of this field. Next comes a 16-byte GUID field which is filled with the IID of the interface being marshaled. After that comes another 16-byte GUID field which is filled with the CLSID of the custom proxy which will be created in the client apartment.
Then comes a 4-byte field whose purpose is also unknown to me at this time. I have labeled this field "Reserved" rather than speculate about its intended purpose. The next field, "Byte Count", is important as it indicates the length (in bytes) of the custom data which follows.
This last section of the marshal data packet (i.e. "Custom Marshal Data") is particularly important. It is created by the original COM object and will be passed to a proxy during the proxy's initialization. To COM, this custom marshal data is essentially an opaque array of bytes that can contain anything as long as a proxy knows how to use it to establish communication with the original COM object or to re-create the object within the context of the client apartment. This array of bytes is the something that gets passed between an object and its proxy. This is the essence of custom marshaling.
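To make the relationship between "Byte Count" and "Custom Marshal Data" concrete, here is a minimal portable sketch of writing and reading that tail of the packet. It rests on assumptions not stated in the text: the count is taken to be 4 bytes wide and little-endian, a byte vector stands in for the stream, and `WriteCustomData` and `ReadCustomData` are hypothetical helper names.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Append the byte count (assumed: 4 bytes, little-endian) and then the
// opaque payload that only the object and its proxy understand.
void WriteCustomData(Bytes& stream, const Bytes& payload) {
    const std::uint32_t count = static_cast<std::uint32_t>(payload.size());
    for (int shift = 0; shift < 32; shift += 8) {
        stream.push_back(static_cast<std::uint8_t>((count >> shift) & 0xFF));
    }
    stream.insert(stream.end(), payload.begin(), payload.end());
}

// Read the byte count found at `offset`, then return exactly that many
// payload bytes. Throws if the stream is shorter than the count claims.
Bytes ReadCustomData(const Bytes& stream, std::size_t offset) {
    if (stream.size() < offset || stream.size() - offset < 4) {
        throw std::runtime_error("truncated byte count");
    }
    std::uint32_t count = 0;
    for (int i = 0; i < 4; ++i) {
        count |= static_cast<std::uint32_t>(stream[offset + i]) << (8 * i);
    }
    const std::size_t start = offset + 4;
    if (stream.size() - start < count) {
        throw std::runtime_error("truncated custom marshal data");
    }
    const auto first = stream.begin() + static_cast<std::ptrdiff_t>(start);
    return Bytes(first, first + static_cast<std::ptrdiff_t>(count));
}
```

The reader never interprets the payload; it only uses the count to know where the payload ends. Interpretation is left entirely to the proxy.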
We will revisit this structure later when we go through our sample code.
Note that unmarshaling may not always occur after an object has been marshaled. There is no rule requiring it. However, because a marshal data packet potentially represents an object, its presence may warrant incrementing the reference count of the original object. This has to do with something known as strong and weak connections. We shall discuss this in a later article.
In this section, we present a simple example of marshaling known as "Marshal-by-Value". It is an ideal illustrative example of custom marshaling. The premise behind "Marshal-by-Value" is to create a proxy that is a clone of the original object. This concept applies to immutable objects (i.e. objects whose properties will not change once they have been initialized).
Because immutable objects will never change the values of their properties, it makes no difference whether a proxy is a reference to the original object or is an exact copy of the object itself. This being the case, when a proxy to the object is required by a client apartment, we may as well clone the object and deliver it to the apartment.
And because a "Marshal-by-Value" proxy need not communicate with its original object, this example is simple enough for me to illustrate the proxy creation process in depth. Proxy creation is a very important first step in understanding custom marshaling. Furthermore, since method calls will be made directly from the client code to the proxy, there will be no need for any marshaling of method arguments. Argument marshaling will be covered in a future article of this series.
The complete source code for the basic example implementation is included in the zip file. Once unzipped, it will be stored in the following folder:
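The idea behind "Marshal-by-Value" can be shown without any COM machinery. The sketch below is not the sample implementation from this article; `Point`, `ToBytes` and `FromBytes` are hypothetical names, and the 4-byte little-endian field encoding is an assumption chosen for illustration. It shows an immutable object writing its whole state into a packet and a clone being rebuilt from that packet alone.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// An immutable object: its state is fixed at construction.
class Point {
public:
    Point(std::int32_t x, std::int32_t y) : x_(x), y_(y) {}
    std::int32_t x() const { return x_; }
    std::int32_t y() const { return y_; }

    // Exporting side: write the entire state into the packet.
    Bytes ToBytes() const {
        Bytes out;
        Put(out, x_);
        Put(out, y_);
        return out;
    }

    // Importing side: build an independent clone from the packet alone.
    static Point FromBytes(const Bytes& in) {
        if (in.size() != 8) throw std::invalid_argument("expected 8 bytes");
        return Point(Get(in, 0), Get(in, 4));
    }

private:
    // Assumed encoding: each field as 4 little-endian bytes.
    static void Put(Bytes& out, std::int32_t value) {
        const std::uint32_t bits = static_cast<std::uint32_t>(value);
        for (int i = 0; i < 4; ++i) {
            out.push_back(static_cast<std::uint8_t>((bits >> (8 * i)) & 0xFF));
        }
    }
    static std::int32_t Get(const Bytes& in, std::size_t at) {
        std::uint32_t bits = 0;
        for (int i = 0; i < 4; ++i) {
            bits |= static_cast<std::uint32_t>(in[at + i]) << (8 * i);
        }
        return static_cast<std::int32_t>(bits);
    }

    const std::int32_t x_;
    const std::int32_t y_;
};
```

Because `Point` can never change after construction, a client holding the clone observes exactly what it would observe through a reference to the original, and every call on the clone is an ordinary local call.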