Unification of Text and Binary File APIs

Shao Voon Wong

25 Sep 2012

The almost perfect unification of text and binary file APIs to read/write structured data

This is an old version of the currently published article.

Introduction
Text File Usage
Binary File Usage
Code Design
Porting to Linux
Caveat
Points of Interest
Demo
Conclusion
Related Links
History

Introduction

For C/C++ programmers, the standard ways to write and read files are the C file API and C++ file streams. I have always found C++ file streams too difficult to use because of their cryptic and unintuitive class and function names, whereas the C file API is not type-safe. In this article, I introduce my own type-safe file library, which is built on the C file API and unifies the text file and binary file APIs in an almost seamless way. Some differences remain between the text and binary APIs where it makes absolutely no sense to make them similar. The library is meant to write and read structured data, that is, integers, booleans, floats, strings and so on. It can also write and read unstructured data (for example, C++ source files) through its lower-level classes, but that is not the focus of the library or of this article. This article is meant to teach readers how to easily access structured data in files.

If you are a .NET developer who chanced upon this article, you can stop reading now. This article is about native C++, not .NET. I did try to write a C# version of my file library, but I failed miserably.

Text File Usage

In this section, we look at how to write and read text files. Let us begin by learning how to write an integer and a double to a text file.

C++

using namespace Elmax;

xTextWriter writer;
std::wstring file = L"Unicode.txt";
if(writer.Open(file, FT_UNICODE, NEW))
{
    int i = 25698;
    double d = 1254.69;
    writer.Write(L"{0},{1}", i, d);
    writer.Close();
}

The code above tries to open a new Unicode file and, upon success, writes an integer and a double value and closes the file. The other text file types supported are ASCII, Big Endian Unicode and UTF-8. Though not shown in the code, the user should check the boolean return value of Write.

xTextWriter delegates its file work to AsciiWriter, UnicodeWriter, BEUnicodeWriter and UTF8Writer. Likewise, xTextReader delegates its file work to AsciiReader, UnicodeReader, BEUnicodeReader and UTF8Reader. The writers write the BOM on their first write, while the readers consume the BOM automatically if it is present. For readers who are not familiar with the term, BOM stands for byte order mark: a Unicode character used to signal the endianness (byte order) of a text file or stream. The BOM is optional, but writing it is generally accepted as good practice.

The reader might ask why I wrote a Unicode file library instead of picking one up from CodeProject. I decided to write my own Unicode file classes because most of those featured on CodeProject make use of the MFC CStdioFile class, which does not work on other platforms. Let us now look at how to read the same data we have just written.

C++

using namespace Elmax;

xTextReader reader;
std::wstring file = L"Unicode.txt";
if(reader.Open(file))
{
    if(reader.IsEOF()==false)
    {
        int i2 = 0;
        double d2 = 0.0;

        StrtokStrategy strat(L",");
        reader.SetSplitStrategy(&strat);
        size_t totalRead = reader.ReadLine(i2, d2); // i2 = 25698 and d2 = 1254.69
    }
    reader.Close();
}

The reader opens the same file and sets its text split strategy. In this case, it is set to use strtok with a comma as the delimiter. Other split strategies include Boost and Regex, but it is highly recommended to choose strtok because it is fast. We have seen how to write and read an integer and a double. Writing and reading strings is no different, but special care must be taken with a delimiter that may appear inside the string: we must escape the string when writing and unescape it when reading. The StrUtil class has a ReplaceAll function which users can use to escape and unescape their strings.
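The escaping step can be sketched as follows. This is only an illustration under stated assumptions: ReplaceAllSketch is a hypothetical stand-in (the real signature of StrUtil::ReplaceAll is not shown in this article), and the escape sequences "&comma;" and "&amp;" are an arbitrary choice, not something the library prescribes.

```cpp
#include <cassert>
#include <string>

// Replace every occurrence of `from` in `text` with `to`.
// Hypothetical helper: the library's StrUtil::ReplaceAll may have a different signature.
inline std::wstring ReplaceAllSketch(std::wstring text, const std::wstring& from, const std::wstring& to)
{
    assert(!from.empty()); // an empty search string would never advance
    std::wstring::size_type pos = 0;
    while ((pos = text.find(from, pos)) != std::wstring::npos)
    {
        text.replace(pos, from.size(), to);
        pos += to.size(); // skip the inserted text so it is not rescanned
    }
    return text;
}

// Escape '&' first so that the escape sequences themselves stay unambiguous,
// then replace the delimiter.
inline std::wstring EscapeField(const std::wstring& s)
{
    return ReplaceAllSketch(ReplaceAllSketch(s, L"&", L"&amp;"), L",", L"&comma;");
}

// Undo the replacements in the reverse order.
inline std::wstring UnescapeField(const std::wstring& s)
{
    return ReplaceAllSketch(ReplaceAllSketch(s, L"&comma;", L","), L"&amp;", L"&");
}
```

With this scheme, an escaped field never contains a comma, so it can be passed to Write between delimiters and restored with UnescapeField after ReadLine.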

There is an overloaded Open function which takes the Unicode file type as an additional parameter. However, it always respects the BOM if one is detected. Only in the absence of a BOM will xTextReader open the file according to the Unicode file type which the user specified.
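The "BOM first, user hint second" rule might look like the sketch below. The names Encoding, BomResult and ResolveEncoding are hypothetical and are not the library's identifiers; the byte sequences are the standard BOMs for UTF-8 and the two UTF-16 byte orders.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names for illustration only.
enum class Encoding { Ascii, Utf8, Utf16LE, Utf16BE };

struct BomResult
{
    Encoding encoding;   // encoding to use for the rest of the file
    std::size_t bomSize; // number of leading bytes to skip (0 if there is no BOM)
};

// Inspect the first bytes of a file. A recognised BOM always wins;
// the caller's hint is used only when no BOM is present.
inline BomResult ResolveEncoding(const unsigned char* data, std::size_t size, Encoding hint)
{
    assert(data != nullptr || size == 0);
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return { Encoding::Utf8, 3 };
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return { Encoding::Utf16LE, 2 };
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return { Encoding::Utf16BE, 2 };
    return { hint, 0 };
}
```

The sketch ignores UTF-32, whose little-endian BOM also begins with FF FE; a reader that needs to support UTF-32 would have to test for it before the UTF-16 cases.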

Binary File Usage

Writing a binary file is similar to writing a text file, except that the user does not have to write delimiters between the data.

C++

using namespace Elmax;

xBinaryWriter writer;
std::wstring file = L"Binary.bin";
if(writer.Open(file))
{
    int i = 25698;
    double d = 1254.69;
    writer.Write(i, d);
    writer.Close();
}

Write returns the number of values successfully written. As shown below, reading is almost the same as writing.

C++

using namespace Elmax;

xBinaryReader reader;
std::wstring file = L"Binary.bin";
if(reader.Open(file))
{
    if(reader.IsEOF()==false)
    {
        int i2 = 0;
        double d2 = 0.0;
        size_t totalRead = reader.Read(i2, d2); // i2 = 25698 and d2 = 1254.69
    }
    reader.Close();
}

Writing a string in binary usually involves writing the string length first; before reading the string back, we need to read the length and allocate the array.

C++

using namespace Elmax;

xBinaryWriter writer;
std::wstring file = GetTempPath(L"Binary.bin");
if(writer.Open(file))
{
    std::string str = "Coding Monkey";
    double d = 1254.69;
    writer.Write(str.size(), str, d);
    writer.Close();
}

xBinaryReader reader;
if(reader.Open(file))
{
    if(reader.IsEOF()==false)
    {
        size_t len = 0;
        double d2 = 0.0;
        StrArray arr;
        size_t totalRead = reader.Read(len);

        totalRead = reader.Read(arr.MakeArray(len), d2);

        std::string str2 = arr.GetPtr(); // str2 contains "Coding Monkey"
    }
    reader.Close();
}

We use StrArray to read a char array. We read its length first and use the length to allocate the array through the MakeArray method. It is possible to read the length and make the array in the same call, using DeferredMake. Unlike MakeArray, DeferredMake does not allocate the array immediately: the allocation is delayed until it is the array's turn to be read from the file. DeferredMake captures the address of len, so by the time the array is allocated, len already holds the length read from the file. See below.

C++

xBinaryReader reader;
if(reader.Open(file))
{
    if(reader.IsEOF()==false)
    {
        size_t len = 0;
        double d2 = 0.0;
        StrArray arr;
        size_t totalRead = reader.Read(len, arr.DeferredMake(len), d2);

        std::string str2 = arr.GetPtr(); // str2 contains "Coding Monkey"
    }
    reader.Close();
}
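To make the deferred allocation idea concrete, here is a minimal sketch. It is not the library's implementation: MemoryReader, DeferredCharArray and ReadLengthThenArray are hypothetical names, the data comes from memory instead of a file, and the length is a fixed 32-bit integer rather than size_t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a binary reader: it consumes bytes from memory.
class MemoryReader
{
public:
    explicit MemoryReader(const std::vector<unsigned char>& bytes) : m_bytes(bytes), m_pos(0) {}
    std::size_t Remaining() const { return m_bytes.size() - m_pos; }

    // Copy `size` bytes into `dest`; returns false if not enough data remain.
    bool ReadBytes(void* dest, std::size_t size)
    {
        if (size > Remaining()) return false;
        if (size > 0) std::memcpy(dest, &m_bytes[m_pos], size);
        m_pos += size;
        return true;
    }
private:
    std::vector<unsigned char> m_bytes;
    std::size_t m_pos;
};

// Array whose allocation is postponed until the length variable has been filled in.
class DeferredCharArray
{
public:
    // Remember where the length will be stored; nothing is allocated yet.
    explicit DeferredCharArray(const std::uint32_t& len) : m_pLen(&len) {}

    // Called when it is this array's turn: the length is known by now.
    bool ReadFrom(MemoryReader& reader)
    {
        assert(m_pLen != nullptr);
        if (*m_pLen > reader.Remaining()) return false; // corrupt or truncated length
        m_data.assign(*m_pLen, '\0');
        return m_data.empty() || reader.ReadBytes(&m_data[0], m_data.size());
    }
    std::string Str() const { return std::string(m_data.begin(), m_data.end()); }
private:
    const std::uint32_t* m_pLen;
    std::vector<char> m_data;
};

// Read the length first, then the array that depends on it.
// Returns the number of items successfully read (0, 1 or 2).
inline std::size_t ReadLengthThenArray(MemoryReader& reader, std::uint32_t& len, DeferredCharArray& arr)
{
    if (!reader.ReadBytes(&len, sizeof(len))) return 0;
    return arr.ReadFrom(reader) ? 2 : 1;
}
```

The key point is that DeferredCharArray stores a pointer to len rather than a copy of its value, so constructing it before the length has been read is harmless.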

WStrArray is available to read a wchar_t array. However, writing std::wstring and reading it back with WStrArray is not recommended if you want to keep your file format portable across different OSes, because the size of wchar_t differs between Windows and Linux or Mac OS X. We will explore this issue in a later section. Note: the text file API does not have this problem because the necessary conversions are done automatically. If the user needs to write Unicode strings, the workaround is to write UTF-8 strings. Another option is to use the BaseArray class to write 16-bit strings. There are two 16-bit encodings for Unicode, namely UCS-2 and UTF-16. A UCS-2 unit is always 16 bits, so UCS-2 can only represent the code points of the Basic Multilingual Plane (U+0000 to U+FFFF). UTF-16 can encode all Unicode code points, but a code point may take one or two 16-bit units. For some use cases, UCS-2 is sufficient to store text in the language of choice. UTF-16 can store everything that is Unicode, but the tradeoff is the conversion time and the need to account for the potential difference in text length before and after conversion.

xBinaryWriter and xBinaryReader also provide Seek and GetCurrPos for file seeking (a common operation in binary file parsing).
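The length difference mentioned above can be checked before choosing an encoding. The sketch below is an illustration only; it uses the C++11 type std::u32string instead of wchar_t so that it behaves the same on every platform, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if every code point fits in a single 16-bit unit, i.e. UCS-2 is sufficient.
inline bool FitsInUcs2(const std::u32string& text)
{
    for (char32_t cp : text)
        if (cp > 0xFFFF) return false;
    return true;
}

// Number of 16-bit code units needed to store `text` as UTF-16.
// Code points above U+FFFF need a surrogate pair, i.e. two units.
inline std::size_t Utf16UnitCount(const std::u32string& text)
{
    std::size_t units = 0;
    for (char32_t cp : text)
    {
        assert(cp <= 0x10FFFF); // anything larger is outside the Unicode range
        units += (cp > 0xFFFF) ? 2 : 1;
    }
    return units;
}
```

When FitsInUcs2 returns true, the UTF-16 unit count equals the number of code points, so a length prefix written before the string means the same thing in both encodings.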

Code Design

xTextWriter and xTextReader make use of DataType and DataTypeRef respectively to convert between data types and strings. Basically, the library depends on the implicit conversion of Plain Old Data (POD) to a DataType object. xTextWriter has many overloaded Write and WriteLine functions which differ by the number of DataType parameters. WriteLine simply adds a linefeed (LF) after writing the string. The Write below has 5 DataType parameters.

C++

bool xTextWriter::Write( const wchar_t* fmt, DataType D1, DataType D2, DataType D3, DataType D4, DataType D5 )
{
    if(pWriter!=NULL)
    {
        std::wstring str = StrUtilRef::Format(fmt, D1, D2, D3, D4, D5);
        return pWriter->Write(str);
    }

    return false;
}

DataType has many overloaded constructors which convert the Plain Old Data (POD) to a string and store it in a string member (m_str).

C++

namespace Elmax
{
class DataType
{
public:
    ~DataType(void);

    DataType( int i );

    DataType( unsigned int ui );

    DataType( const ELMAX_INT64& i64 );

    DataType( const unsigned ELMAX_INT64& ui64 );

    DataType( float f );

    DataType( const double& d );

    DataType( const std::string& s );

    DataType( const std::wstring& ws );

    DataType( const char* pc );

    DataType( const wchar_t* pwc );

    DataType( char c );

    DataType( unsigned char c );

    DataType( wchar_t wc );

    std::wstring& ToString() { return m_str; }

protected:
    std::wstring m_str;
};
} // namespace Elmax
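A miniature version of this design may help to show why implicit conversion makes the call site type-safe. TextArg and FormatSketch are hypothetical names; the real DataType and StrUtilRef::Format support more types and their internals are not shown in this article.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical miniature of the DataType idea: each constructor converts its
// argument to text, so callers can pass plain values where a TextArg is expected.
class TextArg
{
public:
    TextArg(int i)                 { std::wostringstream os; os << i; m_str = os.str(); }
    TextArg(double d)              { std::wostringstream os; os << d; m_str = os.str(); }
    TextArg(const wchar_t* s)      : m_str(s) {}
    TextArg(const std::wstring& s) : m_str(s) {}
    const std::wstring& ToString() const { return m_str; }
private:
    std::wstring m_str;
};

// Substitute "{0}", "{1}", ... with the text of the matching argument.
inline std::wstring FormatSketch(const std::wstring& fmt, const std::vector<TextArg>& args)
{
    std::wstring result = fmt;
    for (std::size_t i = 0; i < args.size(); ++i)
    {
        std::wostringstream token;
        token << L'{' << i << L'}';
        const std::wstring placeholder = token.str();
        std::wstring::size_type pos = 0;
        while ((pos = result.find(placeholder, pos)) != std::wstring::npos)
        {
            result.replace(pos, placeholder.size(), args[i].ToString());
            pos += args[i].ToString().size(); // do not rescan the inserted text
        }
    }
    return result;
}

// Two-argument overload in the style of the overloaded Write functions.
inline std::wstring FormatSketch(const wchar_t* fmt, TextArg a0, TextArg a1)
{
    assert(fmt != nullptr);
    std::vector<TextArg> args;
    args.push_back(a0);
    args.push_back(a1);
    return FormatSketch(std::wstring(fmt), args);
}
```

A call such as FormatSketch(L"{0},{1}", 25698, 1254.69) compiles because the int and the double are converted to TextArg implicitly, while an unsupported type is rejected by the compiler. The sketch is simplified: an argument whose text itself contains a later placeholder such as "{1}" would be substituted again.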

As mentioned earlier, xTextReader makes use of DataTypeRef to convert from string to Plain Old Data (POD). xTextReader has 10 overloaded Read and ReadLine functions which differ only by the number of DataTypeRef parameters. The ReadLine shown below has 5 DataTypeRef parameters.

C++

size_t xTextReader::ReadLine( DataTypeRef D1, DataTypeRef D2, DataTypeRef D3, DataTypeRef D4,
    DataTypeRef D5 )
{
    if(pReader!=NULL)
    {
        std::wstring text;
        bool b = pReader->ReadLine(text);

        if(b)
        {
            StrUtilRef strUtil;
            strUtil.SetSplitStrategy(m_pSplitStrategy);

            return strUtil.Split(text.c_str(), D1, D2, D3, D4, D5);
        }
    }

    return 0;
}

size_t StrUtilRef::Split( const std::wstring& StrToExtract, DataTypeRef& D1, DataTypeRef& D2, DataTypeRef& D3, 
    DataTypeRef& D4, DataTypeRef& D5 )
{
    std::vector<DataTypeRef*> vecDTR;
    vecDTR.push_back(&D1);
    vecDTR.push_back(&D2);
    vecDTR.push_back(&D3);
    vecDTR.push_back(&D4);
    vecDTR.push_back(&D5);

    assert( m_pSplitStrategy );
    return m_pSplitStrategy->Extract( StrToExtract, vecDTR );
}

size_t StrtokStrategy::Extract( 
    const std::wstring& StrToExtract, 
    std::vector<Elmax::DataTypeRef*> vecDTR )
{
    std::vector<std::wstring> vecSplit;
    const size_t size = StrToExtract.size()+1;
    wchar_t* pszToExtract = new wchar_t[size];
    wmemset( pszToExtract, 0, size );
    Wcscpy( pszToExtract, StrToExtract.c_str(), size );

    wchar_t *pszContext = 0;
    wchar_t *pszSplit = 0;
    pszSplit = wcstok( pszToExtract, m_sDelimit.c_str() );

    while( NULL != pszSplit )
    {
        size_t len = wcslen(pszSplit);
        if(pszSplit[len-1]==65535&&vecSplit.size()==vecDTR.size()-1) // bug workaround: wcstok_s/wcstok will put 65535 at the back of last string.
            pszSplit[len-1] = L'\0';

        vecSplit.push_back(std::wstring( pszSplit ) );

        pszSplit = wcstok( NULL, m_sDelimit.c_str() );
    }

    delete [] pszToExtract;

    size_t fail = 0;
    for( size_t i=0; i<vecDTR.size(); ++i )
    {
        if( i < vecSplit.size() )
        {
            if( false == vecDTR[i]->ConvStrToType( vecSplit[i] ) )
                ++fail;
        }
        else
            break;
    }

    return vecSplit.size()-fail;
}
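For comparison, the tokenising step can also be written without a temporary buffer. The sketch below is an alternative technique based on std::wstring::find_first_of, not the library's StrtokStrategy; SplitTokens is a hypothetical name. Like strtok, it skips empty tokens between consecutive delimiters.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split `text` at any character in `delims`, skipping empty tokens the way
// strtok-style tokenisers do. The input string is left untouched.
inline std::vector<std::wstring> SplitTokens(const std::wstring& text, const std::wstring& delims)
{
    assert(!delims.empty());
    std::vector<std::wstring> tokens;
    std::wstring::size_type start = text.find_first_not_of(delims);
    while (start != std::wstring::npos)
    {
        std::wstring::size_type end = text.find_first_of(delims, start);
        if (end == std::wstring::npos)
            end = text.size(); // last token runs to the end of the string
        tokens.push_back(text.substr(start, end - start));
        start = text.find_first_not_of(delims, end);
    }
    return tokens;
}
```

Because it never writes into the input, this approach needs neither the copy made with new[] nor the workaround for the trailing 65535 character.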

DataTypeRef keeps a big union to store the address of each POD parameter as the destination for the result.

C++

namespace Elmax
{
class DataTypeRef
{
public:
    ~DataTypeRef(void);

    union UNIONPTR
    {
        int* pi;
        unsigned int* pui;
        short* psi;
        unsigned short* pusi;
        ELMAX_INT64* pi64;
        unsigned ELMAX_INT64* pui64;
        float* pf;
        double* pd;
        std::string* ps;
        std::wstring* pws;
        char* pc;
        unsigned char* puc;
        wchar_t* pwc;
    };

    enum DTR_TYPE
    {
        DTR_INT,
        DTR_UINT,
        DTR_SHORT,
        DTR_USHORT,
        DTR_INT64,
        DTR_UINT64,
        DTR_FLOAT,
        DTR_DOUBLE,
        DTR_STR,
        DTR_WSTR,
        DTR_CHAR,
        DTR_UCHAR,
        DTR_WCHAR
    };

    DataTypeRef( int& i )                    { m_ptr.pi = &i;       m_type = DTR_INT;   }

    DataTypeRef( unsigned int& ui )          { m_ptr.pui = &ui;     m_type = DTR_UINT;  }

    DataTypeRef( short& si )                 { m_ptr.psi = &si;     m_type = DTR_SHORT; }

    DataTypeRef( unsigned short& usi )       { m_ptr.pusi = &usi;   m_type = DTR_USHORT;}

    DataTypeRef( ELMAX_INT64& i64 )          { m_ptr.pi64 = &i64;   m_type = DTR_INT64; }

    DataTypeRef( unsigned ELMAX_INT64& ui64 ){ m_ptr.pui64 = &ui64; m_type = DTR_UINT64;}

    DataTypeRef( float& f )                  { m_ptr.pf = &f;       m_type = DTR_FLOAT; }

    DataTypeRef( double& d )                 { m_ptr.pd = &d;       m_type = DTR_DOUBLE;}

    DataTypeRef( std::string& s )            { m_ptr.ps = &s;       m_type = DTR_STR;   }

    DataTypeRef( std::wstring& ws )          { m_ptr.pws = &ws;     m_type = DTR_WSTR;  }

    DataTypeRef( char& c )                   { m_ptr.pc = &c;       m_type = DTR_CHAR;  }

    DataTypeRef( unsigned char& uc )         { m_ptr.puc = &uc;     m_type = DTR_UCHAR; }

    DataTypeRef( wchar_t& wc )               { m_ptr.pwc = &wc;     m_type = DTR_WCHAR; }

    bool ConvStrToType( const std::string& Str );

    bool ConvStrToType( const std::wstring& Str );

    DTR_TYPE m_type;

    UNIONPTR m_ptr;
};
} // namespace Elmax
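The bodies of ConvStrToType are not shown here, so the following is only a guess at how a tagged pointer can be filled in from text. ValueRef and ParsePair are hypothetical names, only three types are covered, and the use of std::wistringstream for parsing is my assumption.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical miniature of DataTypeRef: it remembers the address and type of
// the destination variable and fills it in from text later.
class ValueRef
{
public:
    ValueRef(int& i)          : m_type(Int)    { m_ptr.pi = &i; }
    ValueRef(double& d)       : m_type(Double) { m_ptr.pd = &d; }
    ValueRef(std::wstring& s) : m_type(WStr)   { m_ptr.pws = &s; }

    // Convert `text` and store the result through the saved pointer.
    // Returns false if the text is not a complete value of the target type.
    bool ConvStrToType(const std::wstring& text)
    {
        switch (m_type)
        {
        case Int:    return Parse(text, *m_ptr.pi);
        case Double: return Parse(text, *m_ptr.pd);
        case WStr:   *m_ptr.pws = text; return true;
        }
        return false;
    }
private:
    template <typename T>
    static bool Parse(const std::wstring& text, T& out)
    {
        std::wistringstream is(text);
        T value;
        if (!(is >> value)) return false;
        is >> std::ws;                // only trailing whitespace is allowed
        if (!is.eof()) return false;  // leftover characters: reject
        out = value;                  // the destination changes only on success
        return true;
    }
    enum Type { Int, Double, WStr };
    union { int* pi; double* pd; std::wstring* pws; } m_ptr;
    Type m_type;
};

// Convert two tokens into two destinations of arbitrary supported types.
inline bool ParsePair(const std::wstring& a, const std::wstring& b, ValueRef r1, ValueRef r2)
{
    assert(&r1 != &r2);
    return r1.ConvStrToType(a) && r2.ConvStrToType(b);
}
```

Passing int and double variables directly to ParsePair works because the non-const reference constructors bind to the caller's variables, which is the same trick that lets ReadLine(i2, d2) write into i2 and d2.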

xBinaryWriter makes use of BinaryTypeRef. The overloaded Write functions differ by the number of parameters. xBinaryWriter has no WriteLine function. The Write function shown below has 2 BinaryTypeRef parameters.

C++

size_t xBinaryWriter::Write( BinaryTypeRef D1, BinaryTypeRef D2 )
{
    size_t totalWritten = 0;
    if(fp!=NULL)
    {
        if(D1.m_type != BinaryTypeRef::DTR_STR && D1.m_type != BinaryTypeRef::DTR_WSTR && D1.m_type != BinaryTypeRef::DTR_BASEARRAY)
        {
            size_t len = fwrite(D1.GetAddress(), D1.size, 1, fp);
            if(len==1)
                ++totalWritten;
        }
        else
        {
            size_t len = fwrite(D1.GetAddress(), D1.elementSize, D1.arraySize, fp);
            if(len==D1.arraySize)
                ++totalWritten;
        }

        if(D2.m_type != BinaryTypeRef::DTR_STR && D2.m_type != BinaryTypeRef::DTR_WSTR && D2.m_type != BinaryTypeRef::DTR_BASEARRAY)
        {
            size_t len = fwrite(D2.GetAddress(), D2.size, 1, fp);
            if(len==1)
                ++totalWritten;
        }
        else
        {
            size_t len = fwrite(D2.GetAddress(), D2.elementSize, D2.arraySize, fp);
            if(len==D2.arraySize)
                ++totalWritten;
        }

    }

    if(totalWritten != 2)
    {
        errNum = ELMAX_WRITE_ERROR;
        err = StrUtil::Format(L"{0}: Less than 2 elements are written! ({1} elements written)", GetErrorMsg(errNum), totalWritten);
        if(enableException)
            throw std::runtime_error(StrUtil::ConvToString(err)); // throw by value so that catch(const std::exception&) works
    }

    return totalWritten;
}
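The function above repeats the same block once per parameter. With a C++11 compiler, the repetition could be folded into a variadic template. This is an alternative design and not how the library is written; WriteOne, WriteAll, ReadOne and ReadAll are hypothetical names, and the sketch handles only trivially copyable values, not strings or arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <type_traits>

// Write one trivially copyable value; returns 1 on success, 0 on failure.
template <typename T>
std::size_t WriteOne(std::FILE* fp, const T& value)
{
    static_assert(std::is_trivially_copyable<T>::value, "raw writes need a trivially copyable type");
    return std::fwrite(&value, sizeof(T), 1, fp) == 1 ? 1 : 0;
}

// Base case: nothing left to write.
inline std::size_t WriteAll(std::FILE*) { return 0; }

// Write the first value, then the rest; the result is the number of values written.
template <typename T, typename... Rest>
std::size_t WriteAll(std::FILE* fp, const T& first, const Rest&... rest)
{
    assert(fp != nullptr);
    const std::size_t n = WriteOne(fp, first); // sequenced before the remaining writes
    return n + WriteAll(fp, rest...);
}

// Read one trivially copyable value; returns 1 on success, 0 on failure.
template <typename T>
std::size_t ReadOne(std::FILE* fp, T& value)
{
    static_assert(std::is_trivially_copyable<T>::value, "raw reads need a trivially copyable type");
    return std::fread(&value, sizeof(T), 1, fp) == 1 ? 1 : 0;
}

// Base case: nothing left to read.
inline std::size_t ReadAll(std::FILE*) { return 0; }

// Read the first value, then the rest; the result is the number of values read.
template <typename T, typename... Rest>
std::size_t ReadAll(std::FILE* fp, T& first, Rest&... rest)
{
    assert(fp != nullptr);
    const std::size_t n = ReadOne(fp, first); // sequenced before the remaining reads
    return n + ReadAll(fp, rest...);
}
```

The local variable n matters: writing WriteOne(...) + WriteAll(...) in a single expression would leave the order of the two calls unspecified, and the values could reach the file in the wrong order.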

BinaryTypeRef keeps a union to store the address of the POD. No conversion to text is necessary: the POD is written as it is into the binary file.

C++

namespace Elmax
{
class BinaryTypeRef
{
public:
    ~BinaryTypeRef(void);

    union UNIONPTR
    {
        const int* pi;
        const unsigned int* pui;
        const short* psi;
        const unsigned short* pusi;
        const ELMAX_INT64* pi64;
        const unsigned ELMAX_INT64* pui64;
        const float* pf;
        const double* pd;
        std::string* ps;
        const std::wstring* pws;
        const char* pc;
        const unsigned char* puc;
        const wchar_t* pwc;
        const char* arr;
    };

    enum DTR_TYPE
    {
        DTR_INT,
        DTR_UINT,
        DTR_SHORT,
        DTR_USHORT,
        DTR_INT64,
        DTR_UINT64,
        DTR_FLOAT,
        DTR_DOUBLE,
        DTR_STR,
        DTR_WSTR,
        DTR_CHAR,
        DTR_UCHAR,
        DTR_WCHAR,
        DTR_BASEARRAY
    };

    BinaryTypeRef( const int& i )                     { m_ptr.pi = &i; m_type = DTR_INT; size=sizeof(i); }

    BinaryTypeRef( const unsigned int& ui )           { m_ptr.pui = &ui; m_type = DTR_UINT; size=sizeof(ui); }

    BinaryTypeRef( const short& si )                  { m_ptr.psi = &si; m_type = DTR_SHORT; size=sizeof(si); }

    BinaryTypeRef( const unsigned short& usi )        { m_ptr.pusi = &usi; m_type = DTR_USHORT; size=sizeof(usi); }

    BinaryTypeRef( const ELMAX_INT64& i64 )           { m_ptr.pi64 = &i64; m_type = DTR_INT64; size=sizeof(i64); }

    BinaryTypeRef( const unsigned ELMAX_INT64& ui64 ) { m_ptr.pui64 = &ui64; m_type = DTR_UINT64; size=sizeof(ui64); }

    BinaryTypeRef( const float& f )                   { m_ptr.pf = &f; m_type = DTR_FLOAT; size=sizeof(f); }

    BinaryTypeRef( const double& d )                  { m_ptr.pd = &d; m_type = DTR_DOUBLE; size=sizeof(d); }

    BinaryTypeRef( std::string& s )                   { m_ptr.ps = &s; m_type = DTR_STR; elementSize=sizeof(char);size=s.length(); 
                                                            arraySize=s.length();}

    BinaryTypeRef( const std::wstring& ws )           { m_ptr.pws = &ws; m_type = DTR_WSTR; elementSize=sizeof(wchar_t);
                                                            size=ws.length()*sizeof(wchar_t); arraySize=ws.length();}

    BinaryTypeRef( const char& c )                    { m_ptr.pc = &c; m_type = DTR_CHAR; size=sizeof(c); }

    BinaryTypeRef( const unsigned char& uc )          { m_ptr.puc = &uc; m_type = DTR_UCHAR; size=sizeof(uc); }

    BinaryTypeRef( const wchar_t& wc )                { m_ptr.pwc = &wc; m_type = DTR_WCHAR; size=sizeof(wc); }

    BinaryTypeRef( const BaseArray& arr )             { m_ptr.arr = arr.GetPtr(); m_type = DTR_BASEARRAY; 
                                                            size=arr.GetTotalSize(); elementSize=arr.GetElementSize(); 
                                                            arraySize=arr.GetArraySize(); }
    char* GetAddress();

    DTR_TYPE m_type;

    UNIONPTR m_ptr;

    size_t size;

    size_t elementSize;

    size_t arraySize;
};
} // namespace Elmax

Lastly, we come to xBinaryReader. xBinaryReader makes use of BinaryTypeReadRef to do data conversion. Like xTextReader, xBinaryReader has overloaded Read functions to do its work, but it has no ReadLine.

C++

size_t xBinaryReader::Read( BinaryTypeReadRef D1, BinaryTypeReadRef D2 )
{
    size_t totalRead = 0;
    if(fp!=NULL)
    {
        if(D1.m_type != BinaryTypeReadRef::DTR_STRARRAY && D1.m_type != BinaryTypeReadRef::DTR_WSTRARRAY && D1.m_type != BinaryTypeReadRef::DTR_BASEARRAY)
        {
            size_t cnt = fread(D1.GetAddress(), D1.size, 1, fp);
            if(cnt==1)
                ++totalRead;
        }
        else
        {
            D1.DeferredMake();
            size_t cnt = fread(D1.GetAddress(), D1.elementSize, D1.arraySize, fp);
            if(cnt == D1.arraySize)
                ++totalRead;
        }

        if(D2.m_type != BinaryTypeReadRef::DTR_STRARRAY && D2.m_type != BinaryTypeReadRef::DTR_WSTRARRAY && D2.m_type != BinaryTypeReadRef::DTR_BASEARRAY)
        {
            size_t cnt = fread(D2.GetAddress(), D2.size, 1, fp);
            if(cnt==1)
                ++totalRead;
        }
        else
        {
            D2.DeferredMake();
            size_t cnt = fread(D2.GetAddress(), D2.elementSize, D2.arraySize, fp);
            if(cnt==D2.arraySize)
                ++totalRead;
        }

    }

    if(totalRead != 2)
    {
        errNum = ELMAX_READ_ERROR;
        err = StrUtil::Format(L"{0}: Less than 2 elements are read! ({1} elements read)", GetErrorMsg(errNum), totalRead);
        if(enableException)
            throw std::runtime_error(StrUtil::ConvToString(err)); // throw by value so that catch(const std::exception&) works
    }

    return totalRead;
}

For simplicity, I do not show the BinaryTypeReadRef class here: its code is quite complicated because it supports DeferredMake of the array classes.

Porting to Linux

When I was writing the Windows code, I took special care to separate the Windows and non-Windows code with a _MICROSOFT macro. The _WIN32 macro is not used because MinGW defines it as well. At that point, the main difference between the Windows and non-Windows code was that, on Windows, a linefeed ("\n") is converted to a carriage return and linefeed pair ("\r\n") during file writing and the reverse is applied during file reading; on non-Windows platforms, a linefeed remains a linefeed and no conversion is done.

I downloaded and installed Orwell Dev-C++ to test my code with MinGW and GCC on Windows. Orwell Dev-C++ is a continuation of the (currently inactive) Bloodshed Dev-C++ and comes bundled with MinGW and a fairly recent GCC 4.6.x. During compilation, Orwell Dev-C++ complained about unavailable secure C functions (typically those whose names end with _s), such as _itow_s. So I changed them to the non-secure versions for the non-Windows implementation, while the Windows implementation still uses the secure versions. Dev-C++ also complained that it could not find a std::exception constructor which takes a string. It turned out that std::exception is meant to be derived from and not used directly. I changed the uses of std::exception to proper exception types such as logic_error and runtime_error. With these changes done, I assumed most of my Linux work was done. I estimated that, excluding the time to learn G++ and write the makefile, it would take me at most 1 hour to get the code working. That was when I found out I had grossly underestimated the time it would take me to resolve the errors on Ubuntu Linux 12.04.

After converting the Orwell Dev-C++ makefile to work on Ubuntu Linux with GCC 4.6.3, the first thing G++ complained about was that it did not understand the include paths. So I changed the backslashes to forward slashes.

C++

#include "..\\..\\Common\\Common.h"

The path above was changed to the one below.

C++

#include "../../Common/Common.h"

This was an easy change, though I had to update almost all of the 66 source files. The next G++ complaint was that it could not find the data conversion functions (typically those whose names start with an underscore), such as _ultow. It turned out that Microsoft's standard conversion functions were not standard after all. I had to use stringstream to replace _ultow and its cousins. All compilation errors were resolved at this point, and I ran the unit tests. They crashed at the first Unicode test! Upon some investigation, I discovered, to my dismay, that the size of wchar_t on Linux and Mac OS X is 4 bytes instead of 2 bytes! That meant all the wchar_t-related functions did not work correctly on Linux and Mac OS X. It was clearly a showstopper! It took me 3 laborious days to implement the UTF-16 conversions and handle all the instances where the wchar_t size was 4; Unicode files are essentially UTF-16 files. On Windows, UTF-16 is supported natively. On Ubuntu Linux, I have to convert the 4-byte wchar_t (UTF-32) to UTF-16 before writing to a Unicode file. The reverse conversion applies during reading.

If you are interested in running the Linux tests, run the commands below to build the library (FileLib.a) and the test application (UnitTest.exe) and execute it.
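The conversion between UTF-32 and UTF-16 follows the surrogate pair rule of the Unicode standard. The sketch below shows that rule only; it is not the library's code. It uses the C++11 types char32_t and char16_t instead of wchar_t so that it compiles identically on Windows and Linux, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Encode UTF-32 code points as UTF-16 code units.
inline std::u16string Utf32ToUtf16(const std::u32string& in)
{
    std::u16string out;
    for (char32_t cp : in)
    {
        // The input must consist of valid scalar values (no surrogates, no values above U+10FFFF).
        assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
        if (cp < 0x10000)
        {
            out += static_cast<char16_t>(cp); // fits in one unit
        }
        else
        {
            const char32_t v = cp - 0x10000;                     // 20 significant bits
            out += static_cast<char16_t>(0xD800 + (v >> 10));    // high surrogate: top 10 bits
            out += static_cast<char16_t>(0xDC00 + (v & 0x3FF));  // low surrogate: bottom 10 bits
        }
    }
    return out;
}

// Decode UTF-16 code units back to UTF-32; assumes well-formed input.
inline std::u32string Utf16ToUtf32(const std::u16string& in)
{
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        const char32_t unit = in[i];
        if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < in.size())
        {
            const char32_t low = in[++i]; // consume the low surrogate as well
            out += static_cast<char32_t>(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00));
        }
        else
        {
            out += unit;
        }
    }
    return out;
}
```

For example, U+1F600 becomes the pair D83D DE00, which is why a string can be longer in UTF-16 units than in code points. A production decoder would also have to reject unpaired surrogates, which this sketch does not do.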

Shell

cd FileLib
cd FileIO
make all
cd ..
cd PreVS2012UnitTest
make all
./UnitTest.exe

In total, there are 48 unit tests for Windows and 52 unit tests for Linux. Whenever I made a change or fixed a bug for either OS, I ran the unit tests for both to make sure I had not broken anything on either side.

Caveat

This is a list of issues that users need to be aware of when using this file library.

Do not use the size_t type in binary files: size_t is a 32-bit unsigned integer on 32-bit platforms and a 64-bit unsigned integer on 64-bit platforms. The automatic promotion to 64 bits on a 64-bit OS is sometimes desired in memory, but it is wrong for a file format. When a value is 32 bits in the binary format, we want it to remain 32 bits in the file on every platform.
The non-Windows implementation uses fopen: Windows provides a _wfopen function to open a file with a Unicode name. Unfortunately, Linux and GCC (or rather the C Standard Library) do not have such a function, and the C and C++ Standards do not say how to open a file with a Unicode name. The workaround on other platforms is this: when your user is about to open a file whose name contains a Unicode code point above 255, the application should copy the file to an ASCII name and open that copy instead.
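For the first caveat, one way to keep a length field at a fixed width is to narrow size_t to a 32-bit integer explicitly before writing. The sketch below is an illustration with hypothetical names. It also fixes the byte order to little-endian, which goes beyond what this article discusses and is purely my assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Narrow a size_t to 32 bits; returns false if the value does not fit.
inline bool ToLength32(std::size_t n, std::uint32_t& out)
{
    if (n > std::numeric_limits<std::uint32_t>::max())
        return false;
    out = static_cast<std::uint32_t>(n);
    return true;
}

// Append a 32-bit length as 4 bytes, least significant byte first.
inline void AppendLength32(std::vector<unsigned char>& bytes, std::uint32_t n)
{
    for (int shift = 0; shift < 32; shift += 8)
        bytes.push_back(static_cast<unsigned char>((n >> shift) & 0xFF));
}

// Read the 4 bytes back, independent of the host's size_t width or byte order.
inline std::uint32_t ReadLength32(const std::vector<unsigned char>& bytes, std::size_t offset)
{
    assert(offset + 4 <= bytes.size());
    std::uint32_t n = 0;
    for (int i = 3; i >= 0; --i)
        n = (n << 8) | bytes[offset + i]; // most significant byte is stored last
    return n;
}
```

The length occupies 4 bytes in the file whether the program is built for a 32-bit or a 64-bit platform, and a string too long for 32 bits is reported by ToLength32 instead of being silently truncated.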

Points of Interest

The reader may or may not have noticed the Elmax namespace used in the code snippets. As anyone would have guessed, the file library is for the future cross-platform Elmax XML Library, but why include a binary file API as well? The reason is that there will be a version of Elmax which can save XML in binary form. Let us briefly recall the Elmax syntax for writing a value to an XML element.

C++

using namespace Elmax;
Element elem;

elem[L"Price"] = 30.65f;
elem[L"Qty"] = 1200;

As the reader can see from the sample code above, an Elmax element is aware of the data type before it converts the data to textual form. Using this data type information, Elmax can build a metadata section about the XML. The metadata can be kept separate or embedded inside the Binary XML. If the XML contains mainly recurring elements, the metadata can be concise and small. However, if the XML file consists of free-form XML such as SOAP XML, HTML or XAML, the metadata can be quite big relative to the Binary XML. A binary file has the advantage of being fast because the data type conversion from textual form is out of the picture.

Demo

I have modified an old OpenGL demo to read a binary file to showcase the file library. Set the global variable g_bLoadBinary according to which file type you want the demo to load. Please note that the OpenGL code is not cross-platform and only runs on Windows. Previously, I uploaded an OpenGL demo for another article. Since I only have access to an NVidia graphics card, I was not aware that the code did not run correctly on Intel graphics chipsets. This demo should not have the same problem. Please let me know if you have any problem running the OpenGL demo. The demo is written in OpenGL 2.0. An OpenGL 4.0 version is being developed for a future OpenGL article. Stay tuned if you are interested in OpenGL 4.0!

This is the wood clip model loaded. The model was made with the very old Milkshape shareware.

This is the screenshot of the demo.

Conclusion

In this article, we have seen a new file API which makes writing and reading structured data intuitive and productive. By keeping the text and binary APIs similar, the user can maintain both file formats with minimal effort. The file library will be used by the new Elmax XML library to save textual and binary XML files. The XML work is an ongoing effort, and the estimated date of completion is unknown. The Portable Elmax library is hosted at SourceForge. The SourceForge page is currently empty because I am a newcomer to SourceForge and am still figuring out how to use the website.

History

2012-09-25 : Initial release

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Written By

Shao Voon Wong

Software Developer (Senior)

Singapore

Shao Voon is from Singapore. His interest lies primarily in computer graphics, software optimization, concurrency, security, and Agile methodologies.

In recent years, he shifted focus to software safety research. His hobby is writing a free C++ DirectX photo slideshow application which can be viewed here.
