Binary Data Marshaling

Hatem Mostafa

13 Jan 2005

Fast binary data marshaling using a simple CMarshal class.

Introduction

Data marshalling is the process of converting data objects into a data stream that matches the packet structure of a network transfer protocol. Put another way, it represents data objects in a standard format that can be sent and received over the network and translated back into objects on the other side.

Many schemes are used for data marshalling. What they have in common is that each data object is preceded by its type. They differ in how that is represented:

XML: uses XML tags and attributes to describe the data and XML text to hold the values. The XML can be sent in text format (an array of characters) and parsed on the other side to reconstruct the data objects.
Binary: uses a binary header before each data object to identify its type and, if needed, its length. (Figure 1)
Text: uses only text to represent data, like: (string:plapla,int:3232,short:43,...), with simple parsing on the other side.

The first and third approaches are expressive and human readable, but they are slow for applications that need performance, so I chose the second approach for my class to gain speed. In this article, I try to simplify the idea by introducing a simple, fast marshal class that can collect data of many types and send it to another marshal object across a socket connection.

Binary Marshalling:

Binary marshalling means putting data objects into a binary format, with each data object preceded by its type, as in Figure 1:

73:    's' character means the current element is a string
0d 00: 2 bytes to keep string length
...:   string ASCII bytes
69:    'i' character means the current element is a short

and so on: 'type', 'value', ...
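The layout in Figure 1 can be illustrated with a minimal sketch. The helper names (`appendU16`, `encodeString`, `encodeShort`) are hypothetical and are not part of CMarshal. The low-byte-first order is an assumption inferred from the `0d 00` length bytes, and the 2-byte width of the value that follows `'i'` is an assumption based on the figure calling it a short.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using Buffer = std::vector<std::uint8_t>;

// Append a 16-bit value, low byte first (so a length of 13 becomes "0d 00").
void appendU16(Buffer& out, std::uint16_t v) {
    out.push_back(static_cast<std::uint8_t>(v & 0xFF));
    out.push_back(static_cast<std::uint8_t>(v >> 8));
}

// 's' tag, 2-byte length, then the raw characters.
void encodeString(Buffer& out, const std::string& s) {
    assert(s.size() <= 0xFFFF);  // the length field is only 2 bytes wide
    out.push_back('s');
    appendU16(out, static_cast<std::uint16_t>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
}

// 'i' tag followed by a 2-byte value (assumed width, see the lead-in).
void encodeShort(Buffer& out, std::int16_t v) {
    out.push_back('i');
    appendU16(out, static_cast<std::uint16_t>(v));
}
```

Encoding a 13-character string this way produces a buffer that starts with `73 0d 00`, followed by the 13 character bytes.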

The advantage here is that the type is always stored in one byte, each fixed-size type is stored at its full size, and a variable-length type such as a string has its length kept in 2 bytes. Parsing is therefore very fast: just direct access. There are, however, some special cases that need extra handling:

Marshaling objects: To marshal objects such as classes and structures, your class must inherit from a simple class, CMarshalObject, which has two functions for serializing and deserializing the object's data. Your class must implement these two functions, because the CMarshal class calls them internally during marshalling and unmarshaling. In the marshal buffer, the type of an object is the character 'o'.
Marshaling vectors: To marshal a vector of any type, the element type is simply preceded by the character 'v', meaning vector. For an array of characters, the marshaled buffer looks like this:
```
Text: vcHatem Mostafa
Binary: 76 63 48 61 74 65 6d 20 4d 6f 73 74 61 66 61
```
Marshaling object vectors: You can marshal a vector of objects by preceding the object type 'o' with the character 'v', as in the previous point.

Remember, you don't have to do any of this yourself; the class provides helper functions that do it all.
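To make the object case more concrete, here is a minimal sketch of the pattern described above. `MarshalObjectSketch` stands in for CMarshalObject. The method names `Serialize` and `Deserialize`, the helpers `putI16`, `getI16`, `pushObject` and `popObject`, and the choice to write the fields immediately after the `'o'` tag are all assumptions made for illustration, not the real class's interface or layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Buffer = std::vector<std::uint8_t>;

// Stand-in for the article's CMarshalObject; both method names are assumed.
struct MarshalObjectSketch {
    virtual ~MarshalObjectSketch() = default;
    virtual void Serialize(Buffer& out) const = 0;
    // Reads the object's fields starting at pos and advances pos past them.
    virtual void Deserialize(const Buffer& in, std::size_t& pos) = 0;
};

void putI16(Buffer& out, std::int16_t v) {
    const std::uint16_t u = static_cast<std::uint16_t>(v);
    out.push_back(static_cast<std::uint8_t>(u & 0xFF));
    out.push_back(static_cast<std::uint8_t>(u >> 8));
}

std::int16_t getI16(const Buffer& in, std::size_t& pos) {
    assert(pos + 2 <= in.size());
    const std::uint16_t u = static_cast<std::uint16_t>(in[pos] | (in[pos + 1] << 8));
    pos += 2;
    return static_cast<std::int16_t>(u);
}

// A user class that knows how to write and read its own fields.
struct Point : MarshalObjectSketch {
    std::int16_t x = 0;
    std::int16_t y = 0;
    void Serialize(Buffer& out) const override { putI16(out, x); putI16(out, y); }
    void Deserialize(const Buffer& in, std::size_t& pos) override {
        x = getI16(in, pos);
        y = getI16(in, pos);
    }
};

// Marshaler side: write the 'o' tag, then let the object write itself.
void pushObject(Buffer& out, const MarshalObjectSketch& obj) {
    out.push_back('o');
    obj.Serialize(out);
}

// Returns false if the element at pos is not tagged as an object.
bool popObject(const Buffer& in, std::size_t& pos, MarshalObjectSketch& obj) {
    if (pos >= in.size() || in[pos] != 'o') return false;
    ++pos;
    obj.Deserialize(in, pos);
    return true;
}
```

The marshaler never needs to know the fields of `Point`; it only writes the tag and delegates, which is why a single base class with two functions is enough.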

Class functions

High level functions:

`Marshal`	Marshals any number of data items in a single call, using a variable-argument function.
`Unmarshal`	Unmarshals any number of data items in a single call, using a variable-argument function.
`Send`	Send marshaled data through the connected socket.
`Recv`	Receive marshaled data through the connected socket.

bool Marshal(LPCSTR lpcsFormat, ...);
bool Unmarshal(LPCSTR lpcsFormat, ...);

Example:
Client side:
    char c;
    int n;
    vector<string> vs;
    ...
    CMarshal obj;
    obj.Marshal("%c%vs%d", c, &vs, n);
    obj.Send(socket);
Server side:
    char c;
    int n;
    vector<string> vs;
    CMarshal obj;
    obj.Recv(socket);
    // Unmarshal writes into its arguments, so pass their addresses
    obj.Unmarshal("%c%vs%d", &c, &vs, &n);

Using these functions is simple: <marshal, send> on the client side and <receive, unmarshal> on the server side.
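A format-driven pair like `Marshal`/`Unmarshal` can be built on C variable arguments. The sketch below is not the CMarshal implementation: `marshalSketch` and `unmarshalSketch` are hypothetical names, only `%c`, `%d` and `%s` are handled, and each element's tag byte simply reuses the format letter, which is an assumption. It also shows why the unmarshaling side has to receive pointers: values passed through `...` are copies, so the function can only fill in variables whose addresses it is given.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

using Buffer = std::vector<std::uint8_t>;
static_assert(sizeof(int) == 4, "this sketch stores int in 4 bytes");

// Appends one tagged element per format specifier; false on a bad format.
bool marshalSketch(Buffer& out, const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    for (const char* p = fmt; *p != '\0'; p += 2) {
        bool ok = (p[0] == '%');
        if (ok && p[1] == 'c') {
            out.push_back('c');
            // char is promoted to int when passed through "..."
            out.push_back(static_cast<std::uint8_t>(va_arg(args, int)));
        } else if (ok && p[1] == 'd') {
            const int v = va_arg(args, int);
            std::uint8_t raw[4];
            std::memcpy(raw, &v, 4);  // host byte order
            out.push_back('d');
            out.insert(out.end(), raw, raw + 4);
        } else if (ok && p[1] == 's') {
            const char* s = va_arg(args, const char*);
            const std::size_t n = std::strlen(s);
            ok = (n <= 0xFFFF);       // 2-byte length field
            if (ok) {
                out.push_back('s');
                out.push_back(static_cast<std::uint8_t>(n & 0xFF));
                out.push_back(static_cast<std::uint8_t>(n >> 8));
                out.insert(out.end(), s, s + n);
            }
        } else {
            ok = false;
        }
        if (!ok) { va_end(args); return false; }
    }
    va_end(args);
    return true;
}

// Reads elements back; every variadic argument must be a pointer.
bool unmarshalSketch(const Buffer& in, const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    std::size_t pos = 0;
    for (const char* p = fmt; *p != '\0'; p += 2) {
        // The stored tag must match the requested specifier.
        bool ok = p[0] == '%' && pos < in.size() &&
                  in[pos] == static_cast<std::uint8_t>(p[1]);
        if (ok) ++pos;
        if (ok && p[1] == 'c') {
            ok = (pos + 1 <= in.size());
            if (ok) *va_arg(args, char*) = static_cast<char>(in[pos++]);
        } else if (ok && p[1] == 'd') {
            ok = (pos + 4 <= in.size());
            if (ok) { std::memcpy(va_arg(args, int*), &in[pos], 4); pos += 4; }
        } else if (ok && p[1] == 's') {
            ok = (pos + 2 <= in.size());
            if (ok) {
                const std::size_t n = in[pos] | (in[pos + 1] << 8);
                pos += 2;
                ok = (pos + n <= in.size());
                if (ok) {
                    va_arg(args, std::string*)->assign(in.begin() + pos, in.begin() + pos + n);
                    pos += n;
                }
            }
        } else {
            ok = false;
        }
        if (!ok) { va_end(args); return false; }
    }
    va_end(args);
    return true;
}
```

A call such as `marshalSketch(buf, "%c%s%d", 'x', "hi", 3232)` is mirrored by `unmarshalSketch(buf, "%c%s%d", &c, &s, &n)`, where `s` is a `std::string`. Checking the stored tag against the format specifier is what lets the receiver detect a mismatched format string.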

Note: You can send and receive from any side.

Low level functions:

`PopType`	Pop the type at current index in the marshaled buffer.
`Pop`	Pop current data at current index in the marshaled buffer.
`PopObject`	Pop current object at current index in the marshaled buffer.
`PopVector`	Pop current vector at current index in the marshaled buffer.
`PopObjectVector`	Pop current object vector at current index in the marshaled buffer.
`Push`	Push data at the index of the marshal buffer.
`PushVector`	Push vector at the index of the marshal buffer.
`PushObjectVector`	Push object vector at the index of the marshal buffer.

All of these functions work directly on the internal buffer of the marshal object, either laying out the buffer as in Figure 1 or parsing it to fill data objects during unmarshaling.
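As a rough illustration of how index-based popping can work, the sketch below keeps a cursor into the buffer and advances it as elements are read. `ReaderSketch`, `peekType`, `popString` and `popShort` are hypothetical names that only loosely mirror `PopType` and `Pop`. The tags and the low-byte-first 2-byte fields follow the description of Figure 1, and the real functions may behave differently (for example, `PopType` may consume the tag).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical cursor over a marshaled buffer.
class ReaderSketch {
public:
    explicit ReaderSketch(std::vector<std::uint8_t> data) : data_(std::move(data)) {}

    // Type tag at the current index, or '\0' when the buffer is exhausted.
    char peekType() const {
        return pos_ < data_.size() ? static_cast<char>(data_[pos_]) : '\0';
    }

    // Reads: 's', 2-byte length, characters. Leaves the cursor alone on failure.
    bool popString(std::string& out) {
        if (peekType() != 's' || pos_ + 3 > data_.size()) return false;
        const std::size_t len = data_[pos_ + 1] | (data_[pos_ + 2] << 8);
        if (pos_ + 3 + len > data_.size()) return false;
        out.assign(data_.begin() + pos_ + 3, data_.begin() + pos_ + 3 + len);
        pos_ += 3 + len;
        return true;
    }

    // Reads: 'i', 2-byte value.
    bool popShort(std::int16_t& out) {
        if (peekType() != 'i' || pos_ + 3 > data_.size()) return false;
        const std::uint16_t u =
            static_cast<std::uint16_t>(data_[pos_ + 1] | (data_[pos_ + 2] << 8));
        out = static_cast<std::int16_t>(u);
        pos_ += 3;
        return true;
    }

private:
    std::vector<std::uint8_t> data_;
    std::size_t pos_ = 0;
};
```

Because every element starts with its tag and every length is known up front, each pop is a bounds check plus a copy, with no scanning for delimiters.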

Points of Interest

The marshal object uses a String class for all internal buffer handling. I added some helpful operators to the String class, such as:

const String & operator+=(const String & string);
const String & operator+=(LPCTSTR lpsz);
const String & operator+=(LPTSTR lpsz);
const String & operator+=(const unsigned char* lpsz);
const String & operator+=(int n);
const String & operator+=(short s);
const String & operator+=(double d);
const String & operator+=(float f);
const String & operator+=(char c);

which help me push any data type onto the stack of the marshal object.

The String class that I use in this code is like the MFC CString class, with the added operators listed above.

Socket synchronization is the most interesting part of this article.

The Send and Recv functions of the marshal object can be used from either the client or the server side. But what happens if one client uses a marshal object in two threads and wants to send at the same time on the same socket?
According to the sockets library documentation, sockets are not thread safe. So on the client side you should take care when calling Send from multiple threads with the same socket: use synchronization objects to serialize calls to the Send function.

If the client calls Send from many threads (using synchronization objects), and each thread then calls Recv on the same socket, how can each thread get the correct reply? Whichever thread has the current time slice will receive first. I used the following technique to solve this problem:
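Serializing calls to Send can be as simple as holding one lock for the whole message. The sketch below uses a standard C++ mutex rather than Win32 synchronization objects. `FakeSocket` and `LockedSender` are hypothetical stand-ins (not the article's Socket or CMarshal classes), and the one-byte length prefix is only there to show a message that takes more than one write.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>

// Stand-in for a connected socket: it just records the bytes written to it.
struct FakeSocket {
    std::string wire;
    void write(const char* data, std::size_t n) { wire.append(data, n); }
};

class LockedSender {
public:
    explicit LockedSender(FakeSocket& sock) : sock_(sock) {}

    // Header and body are written under one lock, so two threads calling
    // Send at the same time cannot interleave the parts of their messages.
    void Send(const std::string& msg) {
        assert(msg.size() <= 255);  // one-byte length prefix in this sketch
        std::lock_guard<std::mutex> lock(mutex_);
        const char len = static_cast<char>(msg.size());
        sock_.write(&len, 1);
        sock_.write(msg.data(), msg.size());
    }

private:
    FakeSocket& sock_;
    std::mutex mutex_;
};
```

The important detail is the scope of the lock: it has to cover every write that belongs to one message, not each write individually.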

Each thread should send its unique ID to the server in the Send function.
Client should receive replies for this socket in one place (thread).
All threads should block in the Recv function, each waiting for its reply from the common receiving thread.
The server should precede each client reply with the client ID.

This is what I did in my code:

At the Send function:

// create event to be used at the Recv
m_hEvent = ::CreateEvent(NULL, FALSE, FALSE, NULL);
// insert marshal pointer as a unique ID
// (the int cast assumes 32-bit pointers)
m_data.Insert(0, (int)this);

On the client, I use a single common thread to receive from this socket:

void ClientRecv(void *lpv)
{
    SOCKET sock = (SOCKET)lpv;
    CMarshal* pMarshal = NULL;
    try
    {
        while(true)
        {
            // receive the client marshal pointer that the server echoed back
            // (sizeof(int) matches the int ID inserted in Send; 32-bit pointers assumed)
            if(recv(sock, (char*)&pMarshal,
                     sizeof(int), 0) != sizeof(int))
                break;
            // check for version
            if(pMarshal->m_fVer != 1)
                continue;
            // receive data using the received marshal
            if(pMarshal->RecvData(sock) > 0)
                // set the marshal event to
                // let its thread continue execution
                if(::SetEvent(pMarshal->m_hEvent) == FALSE)
                    continue; // just to put a breakpoint
                              // for debugging
        }
    }
    catch(...)
    {
    }
}

In the Recv function, the calling client thread blocks on the event created in the Send function:

// check if m_hEvent initialized in the Send
if(m_hEvent)
{    // wait until the recv thread fires my event
    if(::WaitForSingleObject(m_hEvent, 60000) == WAIT_TIMEOUT)
        return 0;
    ::CloseHandle(m_hEvent);
    m_hEvent = 0;
    return GetLength();
}
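The same send-ID, single-receiver, wait-for-own-reply idea can be sketched with standard C++ primitives instead of Win32 events. This is a variation, not the code from the download: `ReplyRouter` and its methods are hypothetical names. It swaps the marshal pointer for a small integer ID looked up in a table, so the receiving thread never has to dereference a value that arrived from the network, and it uses one shared condition variable rather than one event per marshal object.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <utility>

class ReplyRouter {
public:
    // Worker thread, before sending: returns the ID to put in front of the request.
    std::uint32_t registerRequest() {
        std::lock_guard<std::mutex> lock(mutex_);
        const std::uint32_t id = nextId_++;
        pending_[id];  // creates an empty slot that is not ready yet
        return id;
    }

    // Single receiver thread, for each (id, reply) pair read from the socket.
    // Returns false if nobody is waiting for this ID any more.
    bool deliver(std::uint32_t id, std::string reply) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            auto it = pending_.find(id);
            if (it == pending_.end()) return false;
            it->second.reply = std::move(reply);
            it->second.ready = true;
        }
        cv_.notify_all();  // wake waiters; each one re-checks its own slot
        return true;
    }

    // Worker thread: blocks until its own reply arrives or the timeout expires.
    bool wait(std::uint32_t id, std::string& out, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        auto it = pending_.find(id);
        if (it == pending_.end()) return false;
        const bool ok = cv_.wait_for(lock, timeout, [&] { return it->second.ready; });
        if (ok) out = std::move(it->second.reply);
        pending_.erase(it);  // the slot is released on success and on timeout
        return ok;
    }

private:
    struct Slot {
        bool ready = false;
        std::string reply;
    };
    std::mutex mutex_;
    std::condition_variable cv_;
    std::map<std::uint32_t, Slot> pending_;
    std::uint32_t nextId_ = 1;
};
```

A worker would call `registerRequest()`, send the ID with its request, and then call `wait()`. The receiving thread reads the ID the server echoed back and calls `deliver()`. A reply whose ID is unknown, for example because the waiter already timed out, is rejected instead of being written through a stale pointer.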

Source code files

Marshal.cpp, Marshal.h
Socket.cpp, Socket.h
String.cpp, String.h
mem.cpp, mem.h

Thanks to...

I owe a lot to my colleagues for helping me implement and test this module. (JAK)

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

