Passing parameters between C++, PHP, JavaScript, etc...

, 2 Nov 2006 CPOL
Easy to implement minimal format for simple data exchange, especially between new and obscure scripting languages.

Introduction

A common programming task (the only task?) is moving and manipulating data, sometimes between different languages. This can be a chore since language creators don't implement any standard serializing format for nested data such as arrays. In fact, many languages don't support any native serializing at all.

Scripting is all the rage these days. Perhaps, it's just the particular field I work in, but I seem to run into more and more scripting languages. Yes, there are the ubiquitous languages, JavaScript, ASP, PHP, etc... but there are countless other scripting languages for industrial robotics control, camera control, phone switching, on and on.

At some point, you want to get data into or out of these languages, and the choices can be vague or less than optimal. XML is standard, but XML data is bulky and hard to parse. An XML parser is not something I really want to add to ACME's Burger Flipper Scripting Language, just so you can get burger flipping statistics into your Total Meal Performance package. Ultra simple string formats are not extensible, and require you to constantly upgrade everything to handle new parameters. Other custom formats I see used are often inefficient, limited in usefulness, or buggy.

The goal of this article, then, is to provide an easy-to-implement, minimal format for simple data exchange, especially between new and obscure scripting languages.

Instead of calling it ETIMFFSDEEBNAOSL, which sounds German, I'm going to refer to this format formally as Simple Cross-language Serializing or SCS, so we have a shorthand, and later, in design reviews, we can use cool phrases like, 'I'll just SCS that data to you' and make nearby management people feel dumb.

Goals

Unfortunately, the section above already one-upped this one by not only mentioning the goal, but making a joke about it. So, instead of restating it, I'll just go into more detail.

Above all, simplicity and flexibility will be stressed. The protocol could be extended to reduce the size of the serialized data, for example by using a binary encoding or compression. The trade-off would be a more complex encoder / decoder, and thus longer implementation and debugging time and a higher risk of compatibility flaws and lazy implementations.

Since this data format is chiefly for scripting languages that are new and/or obscure, we want something that is quickly and easily implemented.

So, steps to obtaining our goal will be to...

  • Define a compact portable data format for nested data.
  • Outline an easy to implement encoder / decoder.
  • Provide example encoder / decoder in C++, PHP, and JavaScript.

Use

To make this article a little easier to read and understand, I'm going to cover a few example uses before I go further, since I don't have a cool picture to put on top of the article. Here is a simple example of how SCS could be used to pass data from PHP to JavaScript...

// Create PHP array
$person_info = array( 'Fred'=>array( 'hair'=>'blonde', 'eye'=>'blue', 'age'=>26 ) );

// Send to Javascript
echo 'var javaArray = DeserializeArray( "' . SerializeArray( $person_info ) . '" );';

Or how about sending data from PHP to C++:

// PHP - Return data on fred

$person_info = QueryDatabase( "fred" );
echo SerializeArray( $person_info );

// C++ - Use fred's age

CScsPropertyBag  cPersonInfo( GetPhpReturnString() );
long lAge = cPersonInfo[ "Fred" ][ "age" ].ToLong();

Why another format?

If you Google 'pass array PHP JavaScript', substituting your favorite languages, you'll find various implementations including many lazy implementations of standards like XML. The goal here is to define the simplest possible practical implementation. In fact, ideally, a lazier implementation will not even be possible.

Note on XML. I've run into many people who claim XML is the be-all and end-all format. I've never seen such strong claims of one-size-fits-all before, despite the many thousands of formats that have gone before. But it seems many are determined to make XML the only format for data exchange. Someone once even suggested that I base-64 encode streaming video data and wrap it in XML to make it 'standard'. I don't share, and am not going to consider, such a narrow view of any format or language. There is a clear trade-off between a format's feature set and the complexity of its implementation. I will offer what I hope is an objective comparison with XML for the challenges within the scope of this article.

What about other standards? There is a huge list of similar protocols; check out XML Alternatives for a start. After many hours of poring over many formats, I was unable to find a good fit for the objectives explained here. If I've missed an identical solution, you're welcome to rub it in. But I doubt you will be able to say it was obvious. At some point, one just has to bite the bullet and get things done; at least I'm sharing in an obvious place...

Custom solutions. Another thing I'm going to look at, is a common runaway encoding technique many people use on custom implementations. It seems that their implementation started as a flat representation, such as a=1,b=2,c=3, then recursion was added to handle nested data. Though the simplicity is hard to beat, the data expansion from nested encoding can be pretty significant. It also makes the nested values hard for a human to read. Because of these two factors, I will stray from simplicity to solve this one problem.

Here is such a runaway implementation in PHP:

function RunawayEncode( &$params, $sep = '&', $arr = '?' )
{
    $ret = '';
    foreach( $params as $key => $val )
    {
        if ( strlen( $key ) && $val )
        {
            // Continue link

            if ( strlen( $ret ) ) $ret .= $sep;

            // Multidimensional assignment

            if ( $arr && is_array( $val ) )
                $ret .= $key . '=' . $arr . rawurlencode( RunawayEncode( $val, $sep, $arr ) );

            // Flat assignment

            else $ret .= $key . '=' . rawurlencode( $val );

        } // end if


    } // end foreach


    return $ret;
}

function RunawayDecode( $params, $sep = '&', $arr = '?' )
{
    $ret = array();
    // explode() replaces split(), which was removed in PHP 7
    $parr = explode( $sep, $params );

    foreach( $parr as $val )
    {
        $kv = explode( '=', $val );

        // NULL param

        if ( !isset( $kv[ 0 ] ) || !strlen( $kv[ 0 ] ) );

        // One dimensional

        else if ( !isset( $kv[ 1 ] ) )
            $ret[ $kv[ 0 ] ] = 1;

        // Flat assignment

        else if ( !$arr || $kv[ 1 ][ 0 ] != $arr )
            $ret[ $kv[ 0 ] ] = rawurldecode( $kv[ 1 ] );

        // Multi dimensional assignment

        else $ret[ $kv[ 0 ] ] = 
            RunawayDecode( rawurldecode( substr( $kv[ 1 ], 1 ) ), $sep, $arr );

    }

    return $ret;
}
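To see where the expansion comes from, note that each level of nesting re-encodes everything beneath it, so a '%' produced at one level becomes '%25' at the next. The following is only a rough JavaScript sketch of that effect: the helper name reencode is made up for illustration, and encodeURIComponent stands in for PHP's rawurlencode.

```javascript
// Hypothetical helper: apply URL encoding `depth` times, which is what the
// runaway encoder effectively does to data nested `depth` levels deep.
function reencode(text, depth) {
    let out = text;
    for (let i = 0; i < depth; i++) out = encodeURIComponent(out);
    return out;
}

// A single '=' separator grows by two characters per level of nesting.
for (let depth = 0; depth <= 4; depth++) {
    const encoded = reencode('=', depth);
    console.log(depth, encoded, encoded.length);
}
```

A one-character separator is already nine characters long at four levels deep, and the same growth applies to every encoded character in the nested data.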

The data format

The encoding rules for SCS data:

  • The equality character '=' will separate name from value.
  • The comma character ',' will separate name/value pairs.
  • The curly bracket characters '{' and '}' will enclose nested data.
  • All data will be encoded as strings using the URL encoding scheme as described in RFC 1738.

Pretty simple? I chose RFC 1738 encoding because many scripting languages have built-in functions for it, and where they don't, it's easy to implement (see the C++ ScsSerialize.h header for an example). Additionally, much of the same logic behind using this encoding in URLs applies here. This format also gives us the advantage of being able to read most data rather easily. Here is an example of an encoded array:

Mary{Married=No,DOB=7-2-82}

The PHP declaration of the above array would be:

$test_array = array( 
                        'Mary'=>
                            array(
                                    'Married'=>'No',
                                    'DOB'=>'7-2-82',
                                ),
                    );

Here's a slightly more complex example demonstrating nested encoding. Notice how data is only encoded once.

Mary{Married=No,DOB=7-2-82,Pets{Dog=1},Invalid%20Characters=%21%40%23%24%25%5E%26%2A%28%29}

Declared in PHP...

$test_array = array( 
                        'Mary'=>
                            array(
                                    'Married'=>'No',
                                    'DOB'=>'7-2-82',
                                    'Pets'=>
                                        array(
                                                'Dog'=>1,
                                            ),
                                    'Invalid Characters'=>'!@#$%^&*()'
                                ),
                    );
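As a cross-check, the four encoding rules can be sketched in a few lines of JavaScript. This is a minimal illustration, not the implementation that follows: the names urlEncode and scsEncode are invented here, and the extra escaping of ! ' ( ) * is an assumption made so that the output matches what PHP's rawurlencode produces in the sample string.

```javascript
// URL-encode a string, also escaping the few characters that
// encodeURIComponent leaves alone (assumption: mirror PHP's rawurlencode).
function urlEncode(text) {
    return encodeURIComponent(String(text)).replace(/[!'()*]/g,
        c => '%' + c.charCodeAt(0).toString(16).toUpperCase());
}

// Hypothetical encoder following the four rules: name=value pairs,
// comma separators, braces around nested data, URL-encoded strings.
function scsEncode(data) {
    return Object.entries(data).map(([key, value]) => {
        const name = urlEncode(key);
        if (value !== null && typeof value === 'object')
            return name + '{' + scsEncode(value) + '}';
        const text = String(value);
        // A key with an empty value is written without '='
        return text.length ? name + '=' + urlEncode(text) : name;
    }).join(',');
}

const mary = {
    Mary: {
        Married: 'No',
        DOB: '7-2-82',
        Pets: { Dog: 1 },
        'Invalid Characters': '!@#$%^&*()'
    }
};

console.log(scsEncode(mary));
```

Each value is encoded exactly once, no matter how deeply it is nested, because the braces carry the structure instead of another layer of escaping.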

Parsing

As mentioned, the big pro here is that this can be easily parsed. Here is an example PHP implementation. You'll notice that the encoding function is similar in complexity to the runaway example above; the decoding function, however, is more complex. This extra complexity avoids the re-encoding of data. A few others and I attempted a simpler decoder and struck out. I'd be very interested if anyone is able to do better.

function ScsSerialize( &$params )
{
    $ret = '';

    foreach( $params as $key => $val )
    {
        if ( $ret ) $ret .= ',';

        // Save the key

        $ret .= rawurlencode( $key );

        // Save array

        if ( is_array( $val ) ) $ret .= '{' . ScsSerialize( $val ) . '}';

        // Save single value if any

        else if ( strlen( $val ) ) $ret .= '=' . rawurlencode( $val );

    } // end foreach


    return $ret;
}

function ScsDeserialize( $params, &$last = 0 )
{
    $arr = array();

    $l = strlen( $params ); $s = 0; $e = 0;

    while ( $e < $l )
    {
        switch( $params[ $e ] )
        {
            case ',' : case '}' : 
            {
                // Any data here?

                if ( 0 < $e - $s )
                {
                    // Divide

                    $a = explode( '=', substr( $params, $s, $e - $s ), 2 );
                    
                    // Valid?

                    if ( !isset( $a[ 0 ] ) ) $a[ 0 ] = 0;

                    else $a[ 0 ] = rawurldecode( $a[ 0 ] );

                    // Single value?

                    if ( !isset( $a[ 1 ] ) ) $arr[ $a[ 0 ] ] = '';

                    // Key / value pair

                    else $arr[ $a[ 0 ] ] = rawurldecode( $a[ 1 ] );

                } // end if


                // Move start

                $s = $e + 1;

                // Punt if end of array

                if ( '}' == $params[ $e ] ) 
                {   if ( $last ) $last = $e + 1; return $arr; }

            } break;

            case '{' :
            {
                $k = rawurldecode( substr( $params, $s, $e - $s ) );

                if ( isset( $k ) ) 
                {
                    $end_array = 1;

                    $arr[ $k ] = ScsDeserialize( substr( $params, $e + 1 ), $end_array );

                    $e += $end_array; 

                } // end if

                
                $s = $e + 1;
                
            } break;

        } // end switch


        // Next e

        $e++;

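    } // end while

    // Handle a trailing pair that is not followed by ',' or '}'

    if ( $s < $l )
    {
        $a = explode( '=', substr( $params, $s ), 2 );
        $arr[ rawurldecode( $a[ 0 ] ) ] = isset( $a[ 1 ] ) ? rawurldecode( $a[ 1 ] ) : '';
    }

    return $arr;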
}

Here's the JavaScript version. It is not quite a one-to-one port: JavaScript cannot pass a plain value such as the end index by reference, so the decoder returns that index and fills in an array supplied by the caller.

function ScsSerialize( x_params )
{
    var ret = '';
    for ( var key in x_params )
    {
        if ( key && x_params[ key ] )
        {
            // Continue link

            if ( ret ) ret += ',';

            // Save the key

            ret += escape( key );

            if( x_params[ key ].constructor == Array ||
                x_params[ key ].constructor == Object )
            {
                ret += '{' + ScsSerialize( x_params[ key ] ) + '}'; 
            }
            else ret += '=' + escape( x_params[ key ] );

        } // end if

    }

    return ret;
}

function ScsDeserialize( x_params, x_arr )
{
    var l = x_params.length, s = 0, e = 0;

    while ( e < l )
    {
        switch( x_params[ e ] )
        {
            case ',' : case '}' :
            {
                var a = x_params.substr( s, e - s ).split( '=' );

                if ( 0 < e - s )
                {
                    // Valid?

                    if ( null == a[ 0 ] ) a[ 0 ] = 0;

                    // Decode

                    else a[ 0 ] = unescape( a[ 0 ] );

                    // Single value?

                    if ( null == a[ 1 ] ) x_arr[ a[ 0 ] ] = '';

                    // Key / value pair

                    else x_arr[ a[ 0 ] ] = unescape( a[ 1 ] );

                } // end if


                // Next data

                s = e + 1;

                // Punt if end of array

                if ( '}' == x_params[ e ] ) return e + 1;

            } break;

            case '{' :
            {
                // Get the key

                var k = x_params.substr( s, e - s );

                if ( k.length )
                {
                    // Decode the key

                    k = unescape( k );

                    // Decode array

                    x_arr[ k ] = Array();

                    // Start after the '{' so the nested call stops at the matching '}'
                    e += ScsDeserialize( x_params.substr( e + 1 ), x_arr[ k ] );

                } // end if


                // Next data

                s = e + 1;

            } break;

        } // end switch


        // Next e

        e++;

    } // end while


    return e;
}
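For comparison, here is a more compact sketch of the same single-pass idea. It is an illustration under stated assumptions rather than a drop-in replacement for the function above: the name scsDecode is invented, it returns plain nested objects instead of filling a caller-supplied array, and it uses decodeURIComponent (my substitution for unescape), which throws on malformed percent sequences.

```javascript
// Hypothetical decoder: walks the string once, recursing at '{' and
// returning at '}'. All leaf values come back as strings.
function scsDecode(text) {
    let pos = 0;

    function parseLevel() {
        const result = {};
        let start = pos;

        // Store the pending token text[start..pos) as "key" or "key=value".
        function flush() {
            if (pos > start) {
                const token = text.slice(start, pos);
                const eq = token.indexOf('=');
                if (eq < 0) result[decodeURIComponent(token)] = '';
                else result[decodeURIComponent(token.slice(0, eq))] =
                    decodeURIComponent(token.slice(eq + 1));
            }
        }

        while (pos < text.length) {
            const ch = text[pos];
            if (ch === '{') {
                const key = decodeURIComponent(text.slice(start, pos));
                pos++;                       // step past '{'
                result[key] = parseLevel();  // leaves pos just past the matching '}'
                start = pos;
            } else if (ch === ',' || ch === '}') {
                flush();
                pos++;
                start = pos;
                if (ch === '}') return result;
            } else {
                pos++;
            }
        }

        flush();  // trailing pair with no ',' or '}' after it
        return result;
    }

    return parseLevel();
}

const decoded = scsDecode('Mary{Married=No,DOB=7-2-82,Pets{Dog=1}}');
console.log(decoded.Mary.Pets.Dog);
```

Keeping a single shared position avoids copying substrings for each nested level, at the cost of a closure instead of two free-standing functions.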

I know it's kinda long, but I'll go ahead and post the C++ version just so this article is as complete as possible. The mad cut-and-pasters will appreciate it, I'm sure. The actual encode / decode functions are about the same, but I added functions for converting from strings to integers, doubles, etc., just to make things easy to use. C++ does not have built-in support for this.

#include <map>
#include <string>

//==================================================================

// TScsPropertyBag

//

/// Implements a multi-dimensional property bag with nested serialization

/**

    This class provides functionality for a multi-dimensional
    property bag. It also provides automatic type conversions
    and, hopefully, easily ported serialization.

    Typical use

    CScsPropertyBag arr1, arr2;

    arr1[ "A" ][ "AA" ] = "Hello World!";
    arr1[ "A" ][ "AB" ] = (long)1;
    arr1[ "B" ][ "BA" ] = (double)3.14159;
    
    for ( long i = 0; i < 4; i++ )
        arr1[ "list" ][ i ] = i * 2;

    // Encode
    CScsPropertyBag::t_String str = arr1.serialize();

    // Let's have a look at the encoded string...
    TRACE( str.c_str() ); TRACE( _T( "\n" ) );

    // Decode
    arr2.deserialize( str );    

    // 'Hello World!' check...
    TRACE( arr2[ "A" ][ "AA" ] ); TRACE( _T( "\n" ) );

    // Get long value
    long lVal = arr2[ "A" ][ "AB" ].ToLong();

    // Get double
    double dVal = arr2[ "B" ][ "BA" ].ToDouble();

    // Get string value
    LPCTSTR pString = arr2[ "list" ][ 0 ];

*/
//==================================================================

template < class T > class TScsPropertyBag
{
public:

    //==================================================================

    // CAutoMem

    //

    /// Just a simple auto pointer

    /**
        This class is a simple auto pointer. It has properties that I
        particularly like for this type of job. I'll quit making my
        own when boost comes with VC... 
    */
    //==================================================================

    // Named T_OBJ because standard C++ does not allow a nested template
    // to reuse the enclosing template's parameter name T
    template < class T_OBJ > class CAutoMem
    {
        public:

            /// Default constructor
            CAutoMem() { m_p = NULL; }

            /// Destructor
            ~CAutoMem() { release(); }

            /// Release allocated object
            void release() { if ( m_p ) { delete m_p; m_p = NULL; } }

            /// Returns a reference to the encapsulated object, creating it on first use
            T_OBJ& Obj() { if ( !m_p ) m_p = new T_OBJ; return *m_p; }

            /// Returns a reference to the encapsulated object
            operator T_OBJ&() { return Obj(); }

        private:

            /// Contains a pointer to the controlled object
            T_OBJ   *m_p;

    };

    /// Unicode friendly string

    typedef std::basic_string< T > t_String;

    /// Our multi-dimensional string array type

    typedef std::map< t_String, CAutoMem< TScsPropertyBag< T > >

Comparison

In terms of simplicity, it's hard to get much simpler. The only simpler versions I have seen are of the runaway type or are language specific, such as manually outputting a JavaScript array. In the latter case, our work is lost if we later want to switch to another target language.

One way in which XML excels, as seen below, is human readability. Although it is possible to decipher the SCS string, it is not as clear unless you add new line characters. It would have been possible to make the decoder white space agnostic, but it would have required tokenizing the data. This would have just been something someone could leave out of the implementation, and thus we would have strayed from our goals. Also, the introduction of white space could potentially cause problems when pasting data as strings into source files. This has priority here as being closer to our goals of cross-language communication. Though we attempt to make it somewhat readable, take into account that human readability is not a priority for SCS when considering your options.

Now consider bandwidth, say for an AJAX project. Take the following array...

// Test array
$A = array( 'Department'=>
                array(
                    'Accounting'=>
                        array(
                            'John'=>
                                array(
                                    'Married'=>'Yes',
                                    'DOB'=>'1-14-78',
                                    'Pets'=>
                                        array(
                                            'Fish'=>8,
                                            'Dog'=>1,
                                            'Cat'=>2
                                        ),
                                    'ValidCharacters'=>'.-_',
                                    'InvalidCharacters'=>'[,=]'
                                ),
                            'Mary'=>
                                array(
                                    'Married'=>'No',
                                    'DOB'=>'7-2-82',
                                    'Pets'=>
                                        array(
                                            'Dog'=>1,
                                        ),
                                    'InvalidCharacters'=>'!@#$%^&*()'
                                ),
                        ),
                ),
            );

Our SCS implementation comes in at 218 bytes, about 42% of the size of the XML equivalent (roughly 58% smaller), but it is harder to read. It looks like this:

Department{Accounting{John{Married=Yes,DOB=1-14-78,Pets{Fish=8,Dog=1,Cat=2},\
    ValidCharacters=.-_,InvalidCharacters=%5B%2C%3D%5D},\
    Mary{Married=No,DOB=7-2-82,Pets{Dog=1},\
    InvalidCharacters=%21%40%23%24%25%5E%26%2A%28%29}}}

This typical XML output weighs in at 517 bytes. I struggled a little with whether or not to remove the header and formatting characters. I decided to leave them since this really is a lot of the argument for using XML, to be 'standard'. This is actually cheating a little since I used the same URL encoding instead of the more common base-64. But, XML allows me this.

<?xml version="1.0" encoding="UTF-8" ?>

<Department>
    <Accounting>
        <John>
            <Married>Yes</Married>
            <DOB>1-14-78</DOB>

            <Pets>
                <Fish>8</Fish>
                <Dog>1</Dog>
                <Cat>2</Cat>
            </Pets>
            <ValidCharacters>.-_</ValidCharacters>
            <InvalidCharacters>%5B%2C%3D%5D</InvalidCharacters>

        </John>
        <Mary>
            <Married>No</Married>
            <DOB>7-2-82</DOB>
            <Pets>
                <Dog>1</Dog>
            </Pets>

            <InvalidCharacters>%21%40%23%24%25%5E%26%2A%28%29</InvalidCharacters>
        </Mary>
    </Accounting>
</Department>
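The size figures can be checked with a few lines of JavaScript. This sketch measures the SCS string shown earlier (with the line-continuation breaks removed) and takes the 517-byte XML figure as quoted rather than re-measuring it.

```javascript
// The SCS string from the comparison, joined back into a single line.
const scs =
    'Department{Accounting{John{Married=Yes,DOB=1-14-78,Pets{Fish=8,Dog=1,Cat=2},' +
    'ValidCharacters=.-_,InvalidCharacters=%5B%2C%3D%5D},' +
    'Mary{Married=No,DOB=7-2-82,Pets{Dog=1},' +
    'InvalidCharacters=%21%40%23%24%25%5E%26%2A%28%29}}}';

const xmlBytes = 517;         // figure quoted in the article, not re-measured here
const scsBytes = scs.length;  // every character is ASCII, so length equals bytes
const ratio = scsBytes / xmlBytes;

console.log(scsBytes);                                      // 218
console.log((ratio * 100).toFixed(0) + '% of the XML size'); // 42% of the XML size
console.log(((1 - ratio) * 100).toFixed(0) + '% smaller');   // 58% smaller
```

In other words, the SCS string is about 42% of the size of the XML, which is a saving of about 58%.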

Finally, here is the output of the typical runaway implementation. It should be noted that this example particularly amplifies the redundant encoding issue. There are other instances where it would be competitive, though never significantly better. Notice the severe mangling due to the recursive encoding.

Department=?Accounting%3D%3FJohn%253D%253FMarried%25253DYes%252526DOB%25253D1-14-78%252526\
    Pets%25253D%25253FFish%2525253D8%25252526Dog%2525253D1%25252526Cat%2525253D2%252526\
    ValidCharacters%25253D.-_%252526InvalidCharacters%25253D%2525255B%2525252C%2525253D%\
    2525255D%2526Mary%253D%253FMarried%25253DNo%252526DOB%25253D7-2-82%252526\
    Pets%25253D%25253FDog%2525253D1%252526InvalidCharacters%25253D%25252521%25252540\
    %25252523%25252524%25252525%2525255E%25252526%2525252A%25252528%25252529

Flexibility

We are not going to attempt to encode variable types or other properties such as minimum and maximum values at the parser level. But these things can still be done in the framework of the current protocol. For example, consider the following XML:

<variable name=x type=float min=-10 max=10>3.14</variable>

We can represent this type of information by just adding a sub array. In the case of XML, the content or value field is implicit between the tags. We will need to add an explicit 'value' field. And the result is actually shorter than the minimal XML.

variable{name=x,type=float,min=-10,max=10,value=3.14}

Or better still...

x{type=float,min=-10,max=10,value=3.14}
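Since the parser hands everything back as strings, interpreting fields such as type, min, and max is left to the application. A rough sketch of how that layer might look follows. The helper name readTyped and the 'int' type are assumptions for illustration, and the decoded object is written out by hand rather than produced by a decoder.

```javascript
// Decoded form of x{type=float,min=-10,max=10,value=3.14};
// every field arrives as a string.
const decodedVar = { x: { type: 'float', min: '-10', max: '10', value: '3.14' } };

// Hypothetical helper: convert the 'value' field according to 'type'
// and enforce the optional 'min' / 'max' bounds.
function readTyped(entry) {
    let value;
    switch (entry.type) {
        case 'float': value = parseFloat(entry.value); break;
        case 'int':   value = parseInt(entry.value, 10); break;  // assumed type name
        default:      return entry.value;  // anything else stays a string
    }
    if (Number.isNaN(value)) throw new Error('not a number: ' + entry.value);
    if ('min' in entry && value < Number(entry.min)) throw new RangeError('below min');
    if ('max' in entry && value > Number(entry.max)) throw new RangeError('above max');
    return value;
}

console.log(readTyped(decodedVar.x));
```

The format itself stays untyped, so scripting languages that only understand strings can ignore the extra fields.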

Also, there cannot be duplicate names at a given scope. For instance...

<table><tr><td>One</td><td>Two</td></tr>
<tr><td>Three</td></tr></table>

Would have to be represented as something like:

table{tr{0{td{0=One,1=Two}},1{td{0=Three}}}}

// For clarity
table
{
    tr
    {
        0
        {
            td
            {
                0 = One,
                1 = Two
            }
        },

        1
        {
            td
            {
                0 = Three
            }
        }
    }
}
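In JavaScript this indexed layout falls out naturally, because the entries of an array are keyed by their positions. The sketch below uses a simplified, hypothetical encoder (scsEncodeList, built on plain encodeURIComponent, so it does not escape every character the PHP version does) to show repeated rows and cells becoming numbered keys.

```javascript
// Simplified hypothetical encoder: arrays and objects are both walked
// with Object.entries, so array items get the keys "0", "1", ...
function scsEncodeList(data) {
    return Object.entries(data).map(([key, value]) => {
        const name = encodeURIComponent(key);
        return (value !== null && typeof value === 'object')
            ? name + '{' + scsEncodeList(value) + '}'
            : name + '=' + encodeURIComponent(String(value));
    }).join(',');
}

// Repeated <tr> and <td> elements modelled as arrays.
const tableData = {
    table: {
        tr: [
            { td: ['One', 'Two'] },
            { td: ['Three'] }
        ]
    }
};

console.log(scsEncodeList(tableData));
```

The numbered keys keep every name unique within its scope, which is all the format requires.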

You'll find most data structures can be represented well enough in this protocol. It's usually just a matter of efficiency, especially when dealing with high-bandwidth, binary data like live video or audio. Then again, what format covers everything well?

Conclusion

I think that supplies a good idea of what was being attempted, and what was achieved. A few notes...

Notice that the supplied functions allow you to easily serialize parts of the array as well as the whole array. Also, you can decode one array into a larger array. This is a subtle but powerful construct.

The property bag concept achieved in the C++ implementation is a powerful addition to the language. It can drastically cut development time when dealing with data. The nice thing about C++ is that you can describe exactly how you want operators to behave. I actually use a more advanced form of this class that allows serializing/deserializing into lots of formats like the Windows Registry, INI files, URL GET and POST variables, MIME formats, databases, etc. This can be an enormously powerful way to handle generic data. I know I didn't invent this, by the way; there are many examples out there...

I'd like to add more languages to this example. Perl, Python, and VB come to mind. If anyone wants to donate an implementation, please feel free.

Thanks everybody!

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Robert Umbehant
Software Developer (Senior)
United States

Article Copyright 2006 by Robert Umbehant