Posted 12 Apr 2016

C++: Simplistic Binary Streams

Updated 16 Feb 2017
Simplistic Binary Streams with endian swap support

Introduction

More than a few C++ developers, accustomed to the << and >> operators on text streams, miss them on binary streams. Simplistic Binary Stream is nothing more than a barebones wrapper over the STL fstream's read and write functions. Readers may compare it to serialization libraries such as Boost Serialization and MFC Serialization because of the seemingly similar << and >> overloads. Simplistic Binary Stream is not a serialization library: it does not handle versioning, backward/forward compatibility or endianness correctness, and leaves all of that to the developer. Developers who used Boost Serialization in the past may remember being bitten when files written by versions 1.42-1.44 were rendered unreadable by newer versions. Using a serialization library is like putting your file format in the hands of a third party outside your control. While Simplistic Binary Stream offers none of the conveniences of a serialization library, it puts the developer in the driver's seat over their file format.

Anyone using the library to read or write a file format is advised to implement another layer above it. In this article, we will look at the usage before looking at the source code. Simplistic Binary Stream comes in two flavors: file and memory streams. The file stream encapsulates the STL fstream, while the memory stream uses an STL vector<char> to hold the data in memory. Developers can use the memory stream to parse, in memory, files downloaded from the network.
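
Because the stream leaves versioning to the developer, the layer above it typically begins each file with a magic number and a format version. The following is a minimal sketch of that idea, using a std::stringstream in place of the library's streams so that it stands alone. The names kMagic, kVersion, write_u32, read_u32, write_header and read_header are hypothetical and are not part of Simplistic Binary Stream.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>

// Hypothetical file-format constants, for illustration only.
const std::uint32_t kMagic = 0x53425331u;  // arbitrary tag value
const std::uint32_t kVersion = 2;          // current format version

// Write a raw 32-bit value in host byte order.
void write_u32(std::ostream& os, std::uint32_t v)
{
    os.write(reinterpret_cast<const char*>(&v), sizeof(v));
}

// Read a raw 32-bit value; throw on a short or failed read.
std::uint32_t read_u32(std::istream& is)
{
    std::uint32_t v = 0;
    if (!is.read(reinterpret_cast<char*>(&v), sizeof(v)))
        throw std::runtime_error("Read Error!");
    return v;
}

void write_header(std::ostream& os)
{
    write_u32(os, kMagic);
    write_u32(os, kVersion);
}

// Returns the version stored in the stream. Rejects unknown files and
// files written by a newer format than this reader understands.
std::uint32_t read_header(std::istream& is)
{
    if (read_u32(is) != kMagic)
        throw std::runtime_error("Not a recognised file");
    const std::uint32_t version = read_u32(is);
    if (version > kVersion)
        throw std::runtime_error("File written by a newer version");
    return version;
}

// Round-trip check through an in-memory stream.
void header_example()
{
    std::stringstream ss(std::ios::in | std::ios::out | std::ios::binary);
    write_header(ss);
    assert(read_header(ss) == kVersion);
}
```

The reader can then branch on the returned version to parse older layouts, which is exactly the kind of decision the library leaves to the developer.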

Simple Examples

The examples of writing and then reading are similar for both memory and file streams, except that we flush and close the output file stream before reading it with the input stream.

#include <iostream>
#include "MiniBinStream.h"

void TestMem()
{
    simple::mem_ostream out;
    out << 23 << 24 << "Hello world!";

    simple::mem_istream in(out.get_internal_vec());
    int num1 = 0, num2 = 0;
    std::string str;
    in >> num1 >> num2 >> str;

    std::cout << num1 << "," << num2 << "," << str << std::endl;
}

void TestFile()
{
    simple::file_ostream out("file.bin", std::ios_base::out | std::ios_base::binary);
    out << 23 << 24 << "Hello world!";
    out.flush();
    out.close();

    simple::file_istream in("file.bin", std::ios_base::in | std::ios_base::binary);
    int num1 = 0, num2 = 0;
    std::string str;
    in >> num1 >> num2 >> str;

    std::cout << num1 << "," << num2 << "," << str << std::endl;
}

The output is the same for both.

23,24,Hello world!
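
Going by the source code listed later in this article, an int is copied as sizeof(int) raw bytes and a string literal is stored as an int length followed by its characters, without a terminating null. The sketch below reproduces that layout with a plain std::vector<char> so the byte count can be checked. append_int, append_cstr and build_example are hypothetical helpers, not library functions.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Append the raw bytes of an int, in host byte order.
void append_int(std::vector<char>& buf, int value)
{
    const char* p = reinterpret_cast<const char*>(&value);
    buf.insert(buf.end(), p, p + sizeof(value));
}

// Append a length-prefixed string: int length, then the characters.
void append_cstr(std::vector<char>& buf, const char* s)
{
    const int len = static_cast<int>(std::strlen(s));
    append_int(buf, len);
    buf.insert(buf.end(), s, s + len);
}

// Build the same byte sequence as: out << 23 << 24 << "Hello world!";
std::vector<char> build_example()
{
    std::vector<char> buf;
    append_int(buf, 23);
    append_int(buf, 24);
    append_cstr(buf, "Hello world!");
    return buf;
}

void layout_example()
{
    const std::vector<char> buf = build_example();
    // Two ints, one length prefix, and 12 characters.
    assert(buf.size() == 3 * sizeof(int) + 12);

    // The first sizeof(int) bytes hold the value 23.
    int first = 0;
    std::memcpy(&first, &buf[0], sizeof(first));
    assert(first == 23);
}
```

On a platform with a 4-byte int this amounts to 24 bytes. Nothing in the data records the type of each field, so the reader must extract values in the same order and with the same types as the writer.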

Overloading the operators

Say we have a Product structure. We can overload the operators for it as shown below.

#include <vector>
#include <string>
#include "MiniBinStream.h"

struct Product
{
    Product() : product_name(""), price(0.0f), qty(0) {}
    Product(const std::string& name, float _price, int _qty) : product_name(name), price(_price), qty(_qty) {}
    std::string product_name;
    float price;
    int qty;
};

simple::mem_istream& operator >> (simple::mem_istream& istm, Product& val)
{
    return istm >> val.product_name >> val.price >> val.qty;
}

simple::file_istream& operator >> (simple::file_istream& istm, Product& val)
{
    return istm >> val.product_name >> val.price >> val.qty;
}

simple::mem_ostream& operator << (simple::mem_ostream& ostm, const Product& val)
{
    return ostm << val.product_name << val.price << val.qty;
}

simple::file_ostream& operator << (simple::file_ostream& ostm, const Product& val)
{
    return ostm << val.product_name << val.price << val.qty;
}

Readers should notice that we overload the operators for the memory and file streams with the same code. That is unfortunate, and it happens because the two types of streams are not derived from the same base class. Even if they were, it wouldn't work because the write and read functions are function templates, and function templates cannot be virtual: a template is instantiated at compile time while virtual dispatch is resolved at runtime, so the two cannot be combined. If the struct contains only fundamental types and the developer can pack the struct members with no padding, as shown below, then the whole struct can be written or read in one go instead of processing the members one by one.

#if defined(__linux__)
#pragma pack(push)
#pragma pack(1)
// Your struct declaration here.
#pragma pack(pop)
#endif

#if defined(_WIN32)
#pragma warning(disable:4103)
#pragma pack(push,1)
// Your struct declaration here.
#pragma pack(pop)
#endif
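
As a sketch of the whole-struct approach, the example below packs a struct of fundamental types, verifies the layout at compile time, and then copies it in a single call. PackedProduct is a hypothetical fixed-size stand-in for Product: the std::string member is replaced by a char array, because a std::string cannot be written as raw bytes. The form #pragma pack(push, 1) is accepted by GCC, Clang and MSVC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

#pragma pack(push, 1)
struct PackedProduct
{
    char name[16];       // fixed-size, null-padded name
    float price;
    std::int32_t qty;
};
#pragma pack(pop)

// Fails to compile if the compiler inserted padding after all.
static_assert(sizeof(PackedProduct) == 16 + sizeof(float) + sizeof(std::int32_t),
              "PackedProduct must have no padding");

// Append the whole struct to a byte buffer in one go.
void write_packed(std::vector<char>& buf, const PackedProduct& p)
{
    const char* bytes = reinterpret_cast<const char*>(&p);
    buf.insert(buf.end(), bytes, bytes + sizeof(p));
}

// Copy the whole struct back out of the buffer at the given offset.
PackedProduct read_packed(const std::vector<char>& buf, std::size_t offset)
{
    assert(offset + sizeof(PackedProduct) <= buf.size());
    PackedProduct p;
    std::memcpy(&p, &buf[offset], sizeof(p));
    return p;
}

void packed_example()
{
    PackedProduct src = {};
    std::strncpy(src.name, "Book", sizeof(src.name) - 1);
    src.price = 10.0f;
    src.qty = 50;

    std::vector<char> buf;
    write_packed(buf, src);

    const PackedProduct dst = read_packed(buf, 0);
    assert(std::strcmp(dst.name, "Book") == 0);
    assert(dst.price == 10.0f && dst.qty == 50);
}
```

The static_assert is the important part: it turns an accidental change in layout into a compile error rather than a silently incompatible file.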

Next, we overload the operators for writing and reading a vector of Product, and also for outputting it on the console. Rule of thumb: never write a size_t to the stream, because its size is platform dependent (32/64 bits).

simple::mem_istream& operator >> (simple::mem_istream& istm, std::vector<Product>& vec)
{
    int size=0;
    istm >> size;

    if(size<=0)
        return istm;

    for(int i=0; i<size; ++i)
    {
        Product product;
        istm >> product;
        vec.push_back(product);
    }

    return istm;
}

simple::file_istream& operator >> (simple::file_istream& istm, std::vector<Product>& vec)
{
    int size=0;
    istm >> size;

    if(size<=0)
        return istm;

    for(int i=0; i<size; ++i)
    {
        Product product;
        istm >> product;
        vec.push_back(product);
    }

    return istm;
}

simple::mem_ostream& operator << (simple::mem_ostream& ostm, const std::vector<Product>& vec)
{
    int size = vec.size();
    ostm << size;
    for(size_t i=0; i<vec.size(); ++i)
    {
        ostm << vec[i];
    }

    return ostm;
}

simple::file_ostream& operator << (simple::file_ostream& ostm, const std::vector<Product>& vec)
{
    int size = vec.size();
    ostm << size;
    for(size_t i=0; i<vec.size(); ++i)
    {
        ostm << vec[i];
    }

    return ostm;
}

void print_product(const Product& product)
{
    using namespace std;
    cout << "Product:" << product.product_name << ", Price:" << product.price << ", Qty:" << product.qty << endl;
}

void print_products(const std::vector<Product>& vec)
{
    for(size_t i=0; i<vec.size() ; ++i)
        print_product(vec[i]);
}
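
The vector overloads above store the element count as an int rather than a size_t. A fixed-width type makes the stored size explicit on every platform. The sketch below assumes a hypothetical helper, to_length_prefix, which converts a container size to std::uint32_t and refuses sizes that do not fit instead of truncating them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Convert a container size to a fixed 32-bit length prefix.
// Throws instead of silently truncating on 64-bit platforms.
std::uint32_t to_length_prefix(std::size_t size)
{
    if (size > std::numeric_limits<std::uint32_t>::max())
        throw std::length_error("container too large for 32-bit length prefix");
    return static_cast<std::uint32_t>(size);
}

void length_prefix_example()
{
    std::vector<int> v(3);
    const std::uint32_t n = to_length_prefix(v.size());
    assert(n == 3u);
    // The prefix occupies 4 bytes on every platform, unlike size_t.
    static_assert(sizeof(n) == 4, "uint32_t is 4 bytes");
}
```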

We test the overloaded operators for Product using the code below.

void TestMemCustomOperatorsOnVec()
{
    std::vector<Product> vec_src;
    vec_src.push_back(Product("Book", 10.0f, 50));
    vec_src.push_back(Product("Phone", 25.0f, 20));
    vec_src.push_back(Product("Pillow", 8.0f, 10));
    simple::mem_ostream out;
    out << vec_src;

    simple::mem_istream in(out.get_internal_vec());
    std::vector<Product> vec_dest;
    in >> vec_dest;

    print_products(vec_dest);
}

void TestFileCustomOperatorsOnVec()
{
    std::vector<Product> vec_src;
    vec_src.push_back(Product("Book", 10.0f, 50));
    vec_src.push_back(Product("Phone", 25.0f, 20));
    vec_src.push_back(Product("Pillow", 8.0f, 10));
    simple::file_ostream out("file.bin", std::ios_base::out | std::ios_base::binary);
    out << vec_src;
    out.flush();
    out.close();

    simple::file_istream in("file.bin", std::ios_base::in | std::ios_base::binary);
    std::vector<Product> vec_dest;
    in >> vec_dest;

    print_products(vec_dest);
}

The output is as follows.

Product:Book, Price:10, Qty:50
Product:Phone, Price:25, Qty:20
Product:Pillow, Price:8, Qty:10
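
The memory and file overloads in this section have identical bodies. One possible way to reduce the duplication, sketched here under the assumption that every stream type already supports << for the member types, is to write the body once as a function template and forward to it from thin overloads. toy_a_stream, toy_b_stream, Item and write_item are hypothetical names; the two toy streams merely stand in for two unrelated stream classes.

```cpp
#include <cassert>
#include <string>

// Two unrelated toy "streams" with no common base class. Each one
// just records what was written as text, to keep the sketch short.
struct toy_a_stream { std::string log; };
struct toy_b_stream { std::string log; };

toy_a_stream& operator<<(toy_a_stream& s, const std::string& v) { s.log += v + ";"; return s; }
toy_a_stream& operator<<(toy_a_stream& s, int v) { s.log += std::to_string(v) + ";"; return s; }
toy_b_stream& operator<<(toy_b_stream& s, const std::string& v) { s.log += v + ";"; return s; }
toy_b_stream& operator<<(toy_b_stream& s, int v) { s.log += std::to_string(v) + ";"; return s; }

struct Item
{
    std::string name;
    int qty;
};

// The shared body, written once for any stream type.
template<typename Stream>
Stream& write_item(Stream& os, const Item& item)
{
    return os << item.name << item.qty;
}

// Thin per-stream overloads that forward to the shared body.
toy_a_stream& operator<<(toy_a_stream& os, const Item& item) { return write_item(os, item); }
toy_b_stream& operator<<(toy_b_stream& os, const Item& item) { return write_item(os, item); }

void shared_body_example()
{
    Item item = { "Book", 50 };
    toy_a_stream a;
    toy_b_stream b;
    a << item;
    b << item;
    assert(a.log == "Book;50;");
    assert(b.log == a.log);
}
```

The per-stream overloads are still needed, but each shrinks to a one-line forwarder, so the field order is stated in only one place.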

Source code

All the source code is in a single header file; just include MiniBinStream.h to use the stream classes. The classes do not use any C++11/14 features. They have been tested on VS2008, GCC 4.4 and Clang 3.2. The classes are just thin wrappers over fstream, so there is little that needs explaining.

// The MIT License (MIT)
// Simplistic Binary Streams 0.9
// Copyright (C) 2014, by Wong Shao Voon (shaovoon@yahoo.com)
//
// http://opensource.org/licenses/MIT
//

#ifndef MiniBinStream_H
#define MiniBinStream_H

#include <fstream>
#include <vector>
#include <string>
#include <cstring>
#include <stdexcept>
#include <iostream>

namespace simple
{

class file_istream
{
public:
    file_istream() {}
    file_istream(const char * file, std::ios_base::openmode mode) 
    {
        open(file, mode);
    }
    void open(const char * file, std::ios_base::openmode mode)
    {
        m_istm.open(file, mode);
    }
    void close()
    {
        m_istm.close();
    }
    bool is_open()
    {
        return m_istm.is_open();
    }
    bool eof() const
    {
        return m_istm.eof();
    }
    std::ifstream::pos_type tellg()
    {
        return m_istm.tellg();
    }
    void seekg (std::streampos pos)
    {
        m_istm.seekg(pos);
    }
    void seekg (std::streamoff offset, std::ios_base::seekdir way)
    {
        m_istm.seekg(offset, way);
    }

    template<typename T>
    void read(T& t)
    {
        if(m_istm.read(reinterpret_cast<char*>(&t), sizeof(T)).bad())
        {
            throw std::runtime_error("Read Error!");
        }
    }
    void read(char* p, size_t size)
    {
        if(m_istm.read(p, size).bad())
        {
            throw std::runtime_error("Read Error!");
        }
    }
private:
    std::ifstream m_istm;
};

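// Explicit specializations defined in a header must be inline to avoid
// multiple-definition link errors when the header is included in several files.
template<>
inline void file_istream::read(std::vector<char>& vec)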
{
    if(m_istm.read(reinterpret_cast<char*>(&vec[0]), vec.size()).bad())
    {
        throw std::runtime_error("Read Error!");
    }
}

template<typename T>
file_istream& operator >> (file_istream& istm, T& val)
{
    istm.read(val);

    return istm;
}

template<>
inline file_istream& operator >> (file_istream& istm, std::string& val)
{
    int size = 0;
    istm.read(size);

    if(size<=0)
        return istm;

    std::vector<char> vec((size_t)size);
    istm.read(vec);
    val.assign(&vec[0], (size_t)size);

    return istm;
}

class mem_istream
{
public:
    mem_istream() : m_index(0) {}
    mem_istream(const char * mem, size_t size) 
    {
        open(mem, size);
    }
    mem_istream(const std::vector<char>& vec) 
    {
        m_index = 0;
        m_vec.clear();
        m_vec.reserve(vec.size());
        m_vec.assign(vec.begin(), vec.end());
    }
    void open(const char * mem, size_t size)
    {
        m_index = 0;
        m_vec.clear();
        m_vec.reserve(size);
        m_vec.assign(mem, mem + size);
    }
    void close()
    {
        m_vec.clear();
    }
    bool eof() const
    {
        return m_index >= m_vec.size();
    }
    std::ifstream::pos_type tellg()
    {
        return m_index;
    }
    bool seekg (size_t pos)
    {
        if(pos<m_vec.size())
            m_index = pos;
        else 
            return false;

        return true;
    }
    bool seekg (std::streamoff offset, std::ios_base::seekdir way)
    {
        if(way==std::ios_base::beg && offset < m_vec.size())
            m_index = offset;
        else if(way==std::ios_base::cur && (m_index + offset) < m_vec.size())
            m_index += offset;
        else if(way==std::ios_base::end && (m_vec.size() + offset) < m_vec.size())
            m_index = m_vec.size() + offset;
        else
            return false;

        return true;
    }

    const std::vector<char>& get_internal_vec()
    {
        return m_vec;
    }

    template<typename T>
    void read(T& t)
    {
        if(eof())
            throw std::runtime_error("Premature end of array!");

        if((m_index + sizeof(T)) > m_vec.size())
            throw std::runtime_error("Premature end of array!");

        std::memcpy(reinterpret_cast<void*>(&t), &m_vec[m_index], sizeof(T));

        m_index += sizeof(T);
    }

    void read(char* p, size_t size)
    {
        if(eof())
            throw std::runtime_error("Premature end of array!");

        if((m_index + size) > m_vec.size())
            throw std::runtime_error("Premature end of array!");

        std::memcpy(reinterpret_cast<void*>(p), &m_vec[m_index], size);

        m_index += size;
    }

    void read(std::string& str, const unsigned int size)
    {
        if (eof())
            throw std::runtime_error("Premature end of array!");

        // Check against the requested size, not the current length of str.
        if ((m_index + size) > m_vec.size())
            throw std::runtime_error("Premature end of array!");

        str.assign(&m_vec[m_index], size);

        m_index += str.size();
    }

private:
    std::vector<char> m_vec;
    size_t m_index;
};

template<>
inline void mem_istream::read(std::vector<char>& vec)
{
    if(eof())
        throw std::runtime_error("Premature end of array!");
        
    if((m_index + vec.size()) > m_vec.size())
        throw std::runtime_error("Premature end of array!");

    std::memcpy(reinterpret_cast<void*>(&vec[0]), &m_vec[m_index], vec.size());

    m_index += vec.size();
}

template<typename T>
mem_istream& operator >> (mem_istream& istm, T& val)
{
    istm.read(val);

    return istm;
}

template<>
inline mem_istream& operator >> (mem_istream& istm, std::string& val)
{
    int size = 0;
    istm.read(size);

    if(size<=0)
        return istm;

    istm.read(val, size);

    return istm;
}

class file_ostream
{
public:
    file_ostream() {}
    file_ostream(const char * file, std::ios_base::openmode mode)
    {
        open(file, mode);
    }
    void open(const char * file, std::ios_base::openmode mode)
    {
        m_ostm.open(file, mode);
    }
    void flush()
    {
        m_ostm.flush();
    }
    void close()
    {
        m_ostm.close();
    }
    bool is_open()
    {
        return m_ostm.is_open();
    }
    template<typename T>
    void write(const T& t)
    {
        m_ostm.write(reinterpret_cast<const char*>(&t), sizeof(T));
    }
    void write(const char* p, size_t size)
    {
        m_ostm.write(p, size);
    }

private:
    std::ofstream m_ostm;

};

template<>
inline void file_ostream::write(const std::vector<char>& vec)
{
    m_ostm.write(reinterpret_cast<const char*>(&vec[0]), vec.size());
}

template<typename T>
file_ostream& operator << (file_ostream& ostm, const T& val)
{
    ostm.write(val);

    return ostm;
}

template<>
inline file_ostream& operator << (file_ostream& ostm, const std::string& val)
{
    int size = val.size();
    ostm.write(size);

    if(val.size()<=0)
        return ostm;

    ostm.write(val.c_str(), val.size());

    return ostm;
}

inline file_ostream& operator << (file_ostream& ostm, const char* val)
{
    int size = std::strlen(val);
    ostm.write(size);

    if(size<=0)
        return ostm;

    ostm.write(val, size);

    return ostm;
}

class mem_ostream
{
public:
    mem_ostream() {}
    void close()
    {
        m_vec.clear();
    }
    const std::vector<char>& get_internal_vec()
    {
        return m_vec;
    }
    template<typename T>
    void write(const T& t)
    {
        std::vector<char> vec(sizeof(T));
        std::memcpy(reinterpret_cast<void*>(&vec[0]), reinterpret_cast<const void*>(&t), sizeof(T));
        write(vec);
    }
    void write(const char* p, size_t size)
    {
        for(size_t i=0; i<size; ++i)
            m_vec.push_back(p[i]);
    }

private:
    std::vector<char> m_vec;
};

template<>
inline void mem_ostream::write(const std::vector<char>& vec)
{
    m_vec.insert(m_vec.end(), vec.begin(), vec.end());
}

template<typename T>
mem_ostream& operator << (mem_ostream& ostm, const T& val)
{
    ostm.write(val);

    return ostm;
}

template<>
inline mem_ostream& operator << (mem_ostream& ostm, const std::string& val)
{
    int size = val.size();
    ostm.write(size);

    if(val.size()<=0)
        return ostm;

    ostm.write(val.c_str(), val.size());

    return ostm;
}

inline mem_ostream& operator << (mem_ostream& ostm, const char* val)
{
    int size = std::strlen(val);
    ostm.write(size);

    if(size<=0)
        return ostm;

    ostm.write(val, size);

    return ostm;
}

} // ns simple

#endif // MiniBinStream_H
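
One detail worth knowing when building on file_istream::read is that the listing checks only bad(). On a standard stream, a read that runs out of data sets failbit and eofbit but normally not badbit, so a truncated file would not trigger the exception. The sketch below demonstrates this behaviour with std::istringstream. read_exact is a hypothetical helper showing a stricter check; it is not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>

// Read exactly `size` bytes or throw. Testing the stream itself
// catches failbit as well as badbit, so short reads are reported.
void read_exact(std::istream& is, char* p, std::size_t size)
{
    if (!is.read(p, static_cast<std::streamsize>(size)))
        throw std::runtime_error("Read Error!");
}

void short_read_example()
{
    // Only 2 bytes are available, but 4 are requested.
    std::istringstream is("ab");
    char buf[4] = {};
    is.read(buf, sizeof(buf));

    assert(!is.bad());          // a bad()-only check sees no error
    assert(is.fail());          // failbit reports the short read
    assert(is.eof());           // eofbit is set as well
    assert(is.gcount() == 2);   // number of bytes actually read

    // The stricter helper throws on the same input.
    std::istringstream again("ab");
    bool threw = false;
    try { read_exact(again, buf, sizeof(buf)); }
    catch (const std::runtime_error&) { threw = true; }
    assert(threw);
}
```

Whether a short read should throw is a design decision for the layer above. Code that loops until eof() may prefer the lenient behaviour of the listing.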

Version 0.9.5 Breaking Changes

Requires C++11 now. The classes are templates.

template<typename same_endian_type>
class file_istream {...}

template<typename same_endian_type>
class mem_istream  {...}

template<typename same_endian_type>
class ptr_istream  {...}

template<typename same_endian_type>
class file_ostream {...}

template<typename same_endian_type>
class mem_ostream  {...}

How do you pass same_endian_type to the class? Use std::is_same<>.

// The 1st parameter is the data endianness and the 2nd is the platform endianness; if they differ, bytes are swapped.
using same_endian_type = std::is_same<simple::BigEndian, simple::LittleEndian>;
simple::mem_ostream<same_endian_type> out;
out << (int64_t)23 << (int64_t)24 << "Hello world!";

simple::ptr_istream<same_endian_type> in(out.get_internal_vec());
int64_t num1 = 0, num2 = 0;
std::string str;
in >> num1 >> num2 >> str;

std::cout << num1 << "," << num2 << "," << str << std::endl;

If your data and platform always share the same endianness, you can skip the test by specifying std::true_type directly.

simple::mem_ostream<std::true_type> out;
out << (int64_t)23 << (int64_t)24 << "Hello world!";

simple::ptr_istream<std::true_type> in(out.get_internal_vec());
int64_t num1 = 0, num2 = 0;
std::string str;
in >> num1 >> num2 >> str;

std::cout << num1 << "," << num2 << "," << str << std::endl;

Advantages of compile-time check

  • For same_endian_type = true_type, the swap function is an empty function which is optimised away.
  • For same_endian_type = false_type, the swapping is done without the cost of any prior runtime check.

Disadvantages of compile-time check

  • Cannot parse files/data whose endianness varies from one input to the next. I believe this scenario is rare.
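
The compile-time selection described above relies on std::is_same deriving from std::true_type or std::false_type, so overload resolution picks the matching function with no runtime branch. Here is a minimal sketch of that mechanism. It reuses the Endian tags from this article, but describe and describe_for are hypothetical functions standing in for the real swap.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

enum class Endian { Big, Little };
using BigEndian = std::integral_constant<Endian, Endian::Big>;
using LittleEndian = std::integral_constant<Endian, Endian::Little>;

// Overload chosen when data and platform endianness match.
inline std::string describe(std::true_type) { return "no swap"; }

// Overload chosen when they differ.
inline std::string describe(std::false_type) { return "swap"; }

// Picks an overload from the tag type alone, at compile time.
template<typename same_endian_type>
std::string describe_for()
{
    return describe(same_endian_type());
}

void dispatch_example()
{
    using differ = std::is_same<BigEndian, LittleEndian>;
    using same = std::is_same<LittleEndian, LittleEndian>;

    static_assert(!differ::value, "different tags compare unequal");
    static_assert(same::value, "identical tags compare equal");

    assert(describe_for<differ>() == "swap");
    assert(describe_for<same>() == "no swap");
}
```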

Swap functions are listed below.

enum class Endian
{
    Big,
    Little
};
using BigEndian = std::integral_constant<Endian, Endian::Big>;
using LittleEndian = std::integral_constant<Endian, Endian::Little>;

// Requires <type_traits>, <cstdint>, <cstddef> and <utility>.
// Each template is declared before its first use so that GCC and Clang accept the listing.

// Primary template: sizes without a specialization (such as 1 byte) are left untouched.
template<typename T, size_t N>
struct swap_endian
{
    void operator()(T&)
    {
    }
};

template<typename T>
void swap_if_integral(T&, std::false_type)
{
    // T is not integral so do nothing
}

template<typename T>
void swap_if_integral(T& val, std::true_type)
{
    swap_endian<T, sizeof(T)>()(val);
}

template<typename T>
void swap(T&, std::true_type)
{
    // same endian so do nothing.
}

template<typename T>
void swap(T& val, std::false_type)
{
    std::is_integral<T> is_integral_type;
    swap_if_integral(val, is_integral_type);
}

template<typename T>
struct swap_endian<T, 8>
{
    void operator()(T& ui)
    {
        union EightBytes
        {
            T ui;
            uint8_t arr[8];
        };

        EightBytes fb;
        fb.ui = ui;
        // swap the endian
        std::swap(fb.arr[0], fb.arr[7]);
        std::swap(fb.arr[1], fb.arr[6]);
        std::swap(fb.arr[2], fb.arr[5]);
        std::swap(fb.arr[3], fb.arr[4]);

        ui = fb.ui;
    }
};

template<typename T>
struct swap_endian<T, 4>
{
    void operator()(T& ui)
    {
        union FourBytes
        {
            T ui;
            uint8_t arr[4];
        };

        FourBytes fb;
        fb.ui = ui;
        // swap the endian
        std::swap(fb.arr[0], fb.arr[3]);
        std::swap(fb.arr[1], fb.arr[2]);

        ui = fb.ui;
    }
};

template<typename T>
struct swap_endian<T, 2>
{
    void operator()(T& ui)
    {
        union TwoBytes
        {
            T ui;
            uint8_t arr[2];
        };

        TwoBytes fb;
        fb.ui = ui;
        // swap the endian
        std::swap(fb.arr[0], fb.arr[1]);

        ui = fb.ui;
    }
};
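
The union-based swaps above read a union member other than the one last written, which the C++ standard does not formally define even though mainstream compilers support it. As an alternative sketch, not the library's implementation, the bytes can be reversed through an unsigned char view of the object. reverse_bytes is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <type_traits>

// Reverse the byte order of any integral value in place.
// Accessing an object through unsigned char* is permitted by the
// aliasing rules, so no union is needed.
template<typename T>
void reverse_bytes(T& val)
{
    static_assert(std::is_integral<T>::value, "integral types only");
    unsigned char* p = reinterpret_cast<unsigned char*>(&val);
    std::reverse(p, p + sizeof(T));
}

void reverse_example()
{
    std::uint32_t a = 0x11223344u;
    reverse_bytes(a);
    assert(a == 0x44332211u);

    std::uint16_t b = 0x1234u;
    reverse_bytes(b);
    assert(b == 0x3412u);

    std::uint8_t c = 0x7Fu;
    reverse_bytes(c);        // single byte: unchanged
    assert(c == 0x7Fu);

    reverse_bytes(a);        // reversing twice restores the value
    assert(a == 0x11223344u);
}
```

A single template covers every integer width, at the cost of relying on the optimiser to turn std::reverse into a byte-swap instruction.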

The code is hosted on GitHub.

2016-08-01: Version 0.9.4 Update: Added ptr_istream which shares the same interface as mem_istream except it does not copy the array.

2016-08-06: Version 0.9.5 Update: Added Endian Swap.

2017-02-16: Version 0.9.6 Using C File APIs, instead of STL file streams

2017-02-16: Version 0.9.7 Add memfile_istream.

Benchmark of 0.9.7 (C file API) against 0.9.5 (C++ file stream)

   # File streams (C++ file stream versus C file API)

      old::file_ostream:  359ms
      old::file_istream:  416ms
      new::file_ostream:  216ms
      new::file_istream:  328ms
   new::memfile_ostream:  552ms
   new::memfile_istream:   12ms

   # In-memory streams (no change in source code)

       new::mem_ostream:  534ms
       new::mem_istream:   16ms
       new::ptr_istream:   15ms

2017-03-07: Version 0.9.8: Fixed GCC and Clang template errors

 


License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Cake Processor
Software Developer (Senior)
United States United States

Semi-retired from writing articles but may contribute tips from time to time.



IT Certifications


  • IT Infrastructure Library Foundational (ITIL v3)

  • Scrum Alliance Certified Scrum Master (CSM)

  • Certified Secure Software Lifecycle Professional (CSSLP)


Article Copyright 2016 by Cake Processor