C++: Simplistic Binary Streams






Simplistic Binary Streams with endian swap support
Table of Contents
- Introduction
- Simple Examples
- Overloading the Operators
- Source Code
- Version 0.9.5 Breaking Changes
- Related Articles
Introduction
Many C++ developers who are accustomed to the << and >> operators on text streams miss them on binary streams. Simplistic Binary Stream is nothing but a bare-bones wrapper over the read and write functions of STL fstream. Readers may compare it to serialization libraries such as the Boost Serialization Library and MFC Serialization because of the similar-looking << and >> overloads, but Simplistic Binary Stream is not a serialization library: it does not handle versioning, backward/forward compatibility or endianness correctness, and leaves all of that to the developer. Developers who used Boost Serialization in the past may still remember being bitten when files written by versions 1.42-1.44 became unreadable by newer versions. Using a serialization library is like putting your file format under the control of a third party. While Simplistic Binary Stream offers none of the serialization conveniences, it puts the developer in the driver's seat over their file format.
Anyone using the library to read/write a file format is advised to implement another layer above it. In this article, we will look at the usage before looking at the source code. Simplistic Binary Stream comes in two flavors: file and memory streams. The file stream encapsulates an STL fstream, while the memory stream uses an STL vector<char> to hold the data in memory. Developers can use the memory stream to parse, in memory, files downloaded from the network.
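As an illustration of such a layer, the sketch below prefixes the data with a magic number and a format version, and refuses files it does not recognise. It is a minimal, stand-alone sketch: it appends bytes to a std::vector<char> directly so that it compiles without MiniBinStream.h, and the names kMagic, kCurrentVersion, put_bytes, write_header and read_header are hypothetical. With the library you would write the same two fields with <<. Like the library, it stores the fields in native byte order.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical format constants; not part of MiniBinStream.h.
const std::uint32_t kMagic = 0x50524F44;   // arbitrary tag identifying the format
const std::uint16_t kCurrentVersion = 2;   // newest layout this code understands

// Minimal byte sink standing in for simple::mem_ostream.
void put_bytes(std::vector<char>& buf, const void* p, std::size_t n)
{
    const char* c = static_cast<const char*>(p);
    buf.insert(buf.end(), c, c + n);
}

// Writes the magic number followed by the current version.
void write_header(std::vector<char>& buf)
{
    put_bytes(buf, &kMagic, sizeof(kMagic));
    put_bytes(buf, &kCurrentVersion, sizeof(kCurrentVersion));
}

// Reads the header at the start of buf and returns the stored version.
// Throws if the buffer is too short, the magic number is wrong, or the
// version is newer than this code understands.
std::uint16_t read_header(const std::vector<char>& buf)
{
    if (buf.size() < sizeof(kMagic) + sizeof(kCurrentVersion))
        throw std::runtime_error("file too short");
    std::uint32_t magic = 0;
    std::uint16_t version = 0;
    std::memcpy(&magic, buf.data(), sizeof(magic));
    std::memcpy(&version, buf.data() + sizeof(magic), sizeof(version));
    if (magic != kMagic)
        throw std::runtime_error("unrecognised file format");
    if (version > kCurrentVersion)
        throw std::runtime_error("file written by a newer version");
    return version;
}
```

A reader built on this can branch on the returned version to handle older layouts, which is exactly the kind of policy Simplistic Binary Stream leaves to you.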
Simple Examples
The write-then-read examples are the same for memory and file streams, except that we flush and close the output file stream before reading the file back with the input stream.
#include <iostream>
#include <string>
#include "MiniBinStream.h"
void TestMem()
{
    simple::mem_ostream out;
    out << 23 << 24 << "Hello world!";
    simple::mem_istream in(out.get_internal_vec());
    int num1 = 0, num2 = 0;
    std::string str;
    in >> num1 >> num2 >> str;
    std::cout << num1 << "," << num2 << "," << str << std::endl;
}
void TestFile()
{
    simple::file_ostream out("file.bin", std::ios_base::out | std::ios_base::binary);
    out << 23 << 24 << "Hello world!";
    out.flush();
    out.close();
    simple::file_istream in("file.bin", std::ios_base::in | std::ios_base::binary);
    int num1 = 0, num2 = 0;
    std::string str;
    in >> num1 >> num2 >> str;
    std::cout << num1 << "," << num2 << "," << str << std::endl;
}
The output is the same for both:
23,24,Hello world!
Overloading the Operators
Say we have a Product structure. We can overload the operators for it like below:
#include <vector>
#include <string>
#include "MiniBinStream.h"
struct Product
{
    Product() : product_name(""), price(0.0f), qty(0) {}
    Product(const std::string& name,
        float _price, int _qty) : product_name(name), price(_price), qty(_qty) {}
    std::string product_name;
    float price;
    int qty;
};
simple::mem_istream& operator >> (simple::mem_istream& istm, Product& val)
{
    return istm >> val.product_name >> val.price >> val.qty;
}
simple::file_istream& operator >> (simple::file_istream& istm, Product& val)
{
    return istm >> val.product_name >> val.price >> val.qty;
}
simple::mem_ostream& operator << (simple::mem_ostream& ostm, const Product& val)
{
    return ostm << val.product_name << val.price << val.qty;
}
simple::file_ostream& operator << (simple::file_ostream& ostm, const Product& val)
{
    return ostm << val.product_name << val.price << val.qty;
}
Readers should notice that we overload the operators for the memory and file streams with identical code. That is unfortunate, but necessary, because the two kinds of streams are not derived from a common base class. Even if they were, it would not help: the read and write functions are member function templates, and member function templates cannot be virtual. A template is instantiated at compile time for each type it is used with, while virtual dispatch works at runtime through a vtable whose entries must be fixed when the class is compiled; the two mechanisms cannot be combined.
If a struct contains only fundamental types (unlike Product, which holds a std::string) and its members can be packed with no padding or alignment as shown below, then the whole struct can be written/read in one go instead of member by member.
#if defined(__linux__)
#pragma pack(push)
#pragma pack(1)
// Your struct declaration here.
#pragma pack(pop)
#endif
#if defined(WIN32)
#pragma warning(disable:4103)
#pragma pack(push,1)
// Your struct declaration here.
#pragma pack(pop)
#endif
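To make the whole-struct case concrete, here is a minimal sketch assuming a hypothetical Record struct made only of fixed-width integers. It uses the single #pragma pack(push, 1) form, which MSVC, GCC and Clang all accept, plus a static_assert that catches accidental padding at compile time. It writes into a std::vector<char> rather than a simple stream so that it is self-contained; write_record and read_record are hypothetical helpers.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// A hypothetical record made only of fundamental, fixed-width types.
#pragma pack(push, 1)
struct Record
{
    std::uint8_t  kind;
    std::int32_t  id;
    std::uint16_t flags;
};
#pragma pack(pop)

// With packing there is no padding, so sizeof is the sum of the members.
static_assert(sizeof(Record) == 1 + 4 + 2, "Record must be packed");

// Append the whole struct in one call instead of member by member.
void write_record(std::vector<char>& buf, const Record& r)
{
    const char* p = reinterpret_cast<const char*>(&r);
    buf.insert(buf.end(), p, p + sizeof(Record));
}

// Copy the whole struct back out of the buffer at the given byte offset.
Record read_record(const std::vector<char>& buf, std::size_t offset)
{
    Record r;
    std::memcpy(&r, buf.data() + offset, sizeof(Record));
    return r;
}
```

With the library, out << record would go through the generic write template and copy sizeof(Record) bytes in the same way. The bytes are still in native byte order and layout, so the writer and the reader must agree on both.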
Next, we overload the operators for writing/reading a vector of Product, and add functions for printing it on the console. Rule of thumb: never write a size_t to the stream, because its size depends on the platform (32/64 bits); the code below writes the element count as an int instead.
simple::mem_istream& operator >> (simple::mem_istream& istm, std::vector<Product>& vec)
{
    int size=0;
    istm >> size;
    if(size<=0)
        return istm;
    for(int i=0; i<size; ++i)
    {
        Product product;
        istm >> product;
        vec.push_back(product);
    }
    return istm;
}
simple::file_istream& operator >> (simple::file_istream& istm, std::vector<Product>& vec)
{
    int size=0;
    istm >> size;
    if(size<=0)
        return istm;
    for(int i=0; i<size; ++i)
    {
        Product product;
        istm >> product;
        vec.push_back(product);
    }
    return istm;
}
simple::mem_ostream& operator << (simple::mem_ostream& ostm, const std::vector<Product>& vec)
{
    int size = static_cast<int>(vec.size());
    ostm << size;
    for(size_t i=0; i<vec.size(); ++i)
    {
        ostm << vec[i];
    }
    return ostm;
}
simple::file_ostream& operator << (simple::file_ostream& ostm, const std::vector<Product>& vec)
{
    int size = static_cast<int>(vec.size());
    ostm << size;
    for(size_t i=0; i<vec.size(); ++i)
    {
        ostm << vec[i];
    }
    return ostm;
}
void print_product(const Product& product)
{
    using namespace std;
    cout << "Product:" << product.product_name << ", Price:" << product.price << ", Qty:" << product.qty << endl;
}
void print_products(const std::vector<Product>& vec)
{
    for(size_t i=0; i<vec.size() ; ++i)
        print_product(vec[i]);
}
We test the overloaded operators for Product using the code below:
void TestMemCustomOperatorsOnVec()
{
    std::vector<Product> vec_src;
    vec_src.push_back(Product("Book", 10.0f, 50));
    vec_src.push_back(Product("Phone", 25.0f, 20));
    vec_src.push_back(Product("Pillow", 8.0f, 10));
    simple::mem_ostream out;
    out << vec_src;
    simple::mem_istream in(out.get_internal_vec());
    std::vector<Product> vec_dest;
    in >> vec_dest;
    print_products(vec_dest);
}
void TestFileCustomOperatorsOnVec()
{
    std::vector<Product> vec_src;
    vec_src.push_back(Product("Book", 10.0f, 50));
    vec_src.push_back(Product("Phone", 25.0f, 20));
    vec_src.push_back(Product("Pillow", 8.0f, 10));
    simple::file_ostream out("file.bin", std::ios_base::out | std::ios_base::binary);
    out << vec_src;
    out.flush();
    out.close();
    simple::file_istream in("file.bin", std::ios_base::in | std::ios_base::binary);
    std::vector<Product> vec_dest;
    in >> vec_dest;
    print_products(vec_dest);
}
The output is as follows:
Product:Book, Price:10, Qty:50
Product:Phone, Price:25, Qty:20
Product:Pillow, Price:8, Qty:10
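The Product and vector<Product> overloads above are written twice, once per stream type, because the two stream classes share no base class. One way to avoid that duplication is static polymorphism: write each body once as a function template on the stream type, and let the per-stream operators forward to it. The sketch below is a stand-alone illustration under stated assumptions: VecSink, CountingSink, put, put_product and put_products are hypothetical names, Product is trimmed down (with qty as std::int32_t), and the only thing the templates require of a sink is a write(const char*, std::size_t) member. The element count is written as a fixed-width std::int32_t, following the rule of thumb about size_t.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Two unrelated sink types with no common base class, standing in for
// simple::mem_ostream and simple::file_ostream. The only thing they
// share is a member function write(const char*, std::size_t).
struct VecSink
{
    std::vector<char> bytes;
    void write(const char* p, std::size_t n) { bytes.insert(bytes.end(), p, p + n); }
};

struct CountingSink
{
    std::size_t count = 0;
    void write(const char*, std::size_t n) { count += n; }
};

// Trimmed-down stand-in for the article's Product.
struct Product
{
    std::string product_name;
    float price;
    std::int32_t qty;
};

// Append the raw bytes of a fundamental value to any sink.
template <typename Sink, typename T>
void put(Sink& sink, const T& val)
{
    sink.write(reinterpret_cast<const char*>(&val), sizeof(T));
}

// Written once; the compiler instantiates it per sink type.
template <typename Sink>
void put_product(Sink& sink, const Product& p)
{
    put(sink, static_cast<std::int32_t>(p.product_name.size()));  // length prefix
    sink.write(p.product_name.data(), p.product_name.size());
    put(sink, p.price);
    put(sink, p.qty);
}

template <typename Sink>
void put_products(Sink& sink, const std::vector<Product>& vec)
{
    // Fixed-width element count: 4 bytes on every platform, unlike size_t.
    put(sink, static_cast<std::int32_t>(vec.size()));
    for (std::size_t i = 0; i < vec.size(); ++i)
        put_product(sink, vec[i]);
}
```

Since simple::mem_ostream and simple::file_ostream in the source listing both have a write(const char*, size_t) member, each of their Product operators could, in principle, shrink to a forwarder such as put_product(ostm, val); return ostm; (an untested assumption about how this would fit the library).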
Source Code
All the source code is in a single header file; just include MiniBinStream.h to use the stream classes. The classes do not use any C++11/14 features. They have been tested on VS2008, GCC 4.4 and Clang 3.2. The classes are just thin wrappers over fstream and vector<char>, so there is little to explain.
// The MIT License (MIT)
// Simplistic Binary Streams 0.9
// Copyright (C) 2014, by Wong Shao Voon (shaovoon@yahoo.com)
//
// http://opensource.org/licenses/MIT
//
#ifndef MiniBinStream_H
#define MiniBinStream_H
#include <fstream>
#include <vector>
#include <string>
#include <cstring>
#include <stdexcept>
#include <iostream>
namespace simple
{
class file_istream
{
public:
    file_istream() {}
    file_istream(const char * file, std::ios_base::openmode mode)
    {
        open(file, mode);
    }
    void open(const char * file, std::ios_base::openmode mode)
    {
        m_istm.open(file, mode);
    }
    void close()
    {
        m_istm.close();
    }
    bool is_open()
    {
        return m_istm.is_open();
    }
    bool eof() const
    {
        return m_istm.eof();
    }
    std::ifstream::pos_type tellg()
    {
        return m_istm.tellg();
    }
    void seekg (std::streampos pos)
    {
        m_istm.seekg(pos);
    }
    void seekg (std::streamoff offset, std::ios_base::seekdir way)
    {
        m_istm.seekg(offset, way);
    }
    template<typename T>
    void read(T& t)
    {
        if(m_istm.read(reinterpret_cast<char*>(&t), sizeof(T)).bad())
        {
            throw std::runtime_error("Read Error!");
        }
    }
    void read(char* p, size_t size)
    {
        if(m_istm.read(p, size).bad())
        {
            throw std::runtime_error("Read Error!");
        }
    }
private:
    std::ifstream m_istm;
};
// Explicit specializations and non-template functions defined in a header
// must be inline; otherwise including the header in two .cpp files causes
// multiple-definition link errors.
template<>
inline void file_istream::read(std::vector<char>& vec)
{
    if(vec.empty())
        return; // &vec[0] is invalid on an empty vector
    if(m_istm.read(&vec[0], vec.size()).bad())
    {
        throw std::runtime_error("Read Error!");
    }
}
template<typename T>
file_istream& operator >> (file_istream& istm, T& val)
{
    istm.read(val);
    return istm;
}
template<>
inline file_istream& operator >> (file_istream& istm, std::string& val)
{
    int size = 0;
    istm.read(size);
    val.clear(); // don't keep a previous value when the stored string is empty
    if(size<=0)
        return istm;
    std::vector<char> vec((size_t)size);
    istm.read(vec);
    val.assign(&vec[0], (size_t)size);
    return istm;
}
class mem_istream
{
public:
    mem_istream() : m_index(0) {}
    mem_istream(const char * mem, size_t size) : m_index(0)
    {
        open(mem, size);
    }
    mem_istream(const std::vector<char>& vec) : m_vec(vec), m_index(0)
    {
    }
    void open(const char * mem, size_t size)
    {
        m_index = 0;
        m_vec.assign(mem, mem + size);
    }
    void close()
    {
        m_vec.clear();
        m_index = 0;
    }
    bool eof() const
    {
        return m_index >= m_vec.size();
    }
    std::ifstream::pos_type tellg()
    {
        return static_cast<std::streamoff>(m_index);
    }
    bool seekg (size_t pos)
    {
        if(pos<m_vec.size())
            m_index = pos;
        else
            return false;
        return true;
    }
    bool seekg (std::streamoff offset, std::ios_base::seekdir way)
    {
        // Compute the target in signed arithmetic so that negative offsets
        // are handled correctly on both 32-bit and 64-bit platforms.
        const std::streamoff size = static_cast<std::streamoff>(m_vec.size());
        std::streamoff target = 0;
        if(way==std::ios_base::beg)
            target = offset;
        else if(way==std::ios_base::cur)
            target = static_cast<std::streamoff>(m_index) + offset;
        else if(way==std::ios_base::end)
            target = size + offset;
        else
            return false;
        if(target < 0 || target >= size)
            return false;
        m_index = static_cast<size_t>(target);
        return true;
    }
    const std::vector<char>& get_internal_vec()
    {
        return m_vec;
    }
    template<typename T>
    void read(T& t)
    {
        if(eof())
            throw std::runtime_error("Premature end of array!");
        if((m_index + sizeof(T)) > m_vec.size())
            throw std::runtime_error("Premature end of array!");
        std::memcpy(reinterpret_cast<void*>(&t), &m_vec[m_index], sizeof(T));
        m_index += sizeof(T);
    }
    void read(char* p, size_t size)
    {
        if(eof())
            throw std::runtime_error("Premature end of array!");
        if((m_index + size) > m_vec.size())
            throw std::runtime_error("Premature end of array!");
        std::memcpy(reinterpret_cast<void*>(p), &m_vec[m_index], size);
        m_index += size;
    }
    void read(std::string& str, const unsigned int size)
    {
        if (eof())
            throw std::runtime_error("Premature end of array!");
        // Check the requested size, not str.size(): checking the old string
        // length would let a longer read run past the end of m_vec.
        if ((m_index + size) > m_vec.size())
            throw std::runtime_error("Premature end of array!");
        str.assign(&m_vec[m_index], size);
        m_index += size;
    }
private:
    std::vector<char> m_vec;
    size_t m_index;
};
template<>
inline void mem_istream::read(std::vector<char>& vec)
{
    if(eof())
        throw std::runtime_error("Premature end of array!");
    if((m_index + vec.size()) > m_vec.size())
        throw std::runtime_error("Premature end of array!");
    if(vec.empty())
        return; // &vec[0] is invalid on an empty vector
    std::memcpy(reinterpret_cast<void*>(&vec[0]), &m_vec[m_index], vec.size());
    m_index += vec.size();
}
template<typename T>
mem_istream& operator >> (mem_istream& istm, T& val)
{
    istm.read(val);
    return istm;
}
template<>
inline mem_istream& operator >> (mem_istream& istm, std::string& val)
{
    int size = 0;
    istm.read(size);
    val.clear(); // don't keep a previous value when the stored string is empty
    if(size<=0)
        return istm;
    istm.read(val, size);
    return istm;
}
class file_ostream
{
public:
    file_ostream() {}
    file_ostream(const char * file, std::ios_base::openmode mode)
    {
        open(file, mode);
    }
    void open(const char * file, std::ios_base::openmode mode)
    {
        m_ostm.open(file, mode);
    }
    void flush()
    {
        m_ostm.flush();
    }
    void close()
    {
        m_ostm.close();
    }
    bool is_open()
    {
        return m_ostm.is_open();
    }
    template<typename T>
    void write(const T& t)
    {
        m_ostm.write(reinterpret_cast<const char*>(&t), sizeof(T));
    }
    void write(const char* p, size_t size)
    {
        m_ostm.write(p, size);
    }
private:
    std::ofstream m_ostm;
};
template<>
inline void file_ostream::write(const std::vector<char>& vec)
{
    if(!vec.empty()) // &vec[0] is invalid on an empty vector
        m_ostm.write(&vec[0], vec.size());
}
template<typename T>
file_ostream& operator << (file_ostream& ostm, const T& val)
{
    ostm.write(val);
    return ostm;
}
template<>
inline file_ostream& operator << (file_ostream& ostm, const std::string& val)
{
    int size = static_cast<int>(val.size());
    ostm.write(size);
    if(size<=0)
        return ostm;
    ostm.write(val.c_str(), val.size());
    return ostm;
}
inline file_ostream& operator << (file_ostream& ostm, const char* val)
{
    int size = static_cast<int>(std::strlen(val));
    ostm.write(size);
    if(size<=0)
        return ostm;
    ostm.write(val, size);
    return ostm;
}
class mem_ostream
{
public:
    mem_ostream() {}
    void close()
    {
        m_vec.clear();
    }
    const std::vector<char>& get_internal_vec()
    {
        return m_vec;
    }
    template<typename T>
    void write(const T& t)
    {
        // Append the object's bytes directly, without a temporary vector.
        write(reinterpret_cast<const char*>(&t), sizeof(T));
    }
    void write(const char* p, size_t size)
    {
        m_vec.insert(m_vec.end(), p, p + size);
    }
private:
    std::vector<char> m_vec;
};
template<>
inline void mem_ostream::write(const std::vector<char>& vec)
{
    m_vec.insert(m_vec.end(), vec.begin(), vec.end());
}
template<typename T>
mem_ostream& operator << (mem_ostream& ostm, const T& val)
{
    ostm.write(val);
    return ostm;
}
template<>
inline mem_ostream& operator << (mem_ostream& ostm, const std::string& val)
{
    int size = static_cast<int>(val.size());
    ostm.write(size);
    if(size<=0)
        return ostm;
    ostm.write(val.c_str(), val.size());
    return ostm;
}
inline mem_ostream& operator << (mem_ostream& ostm, const char* val)
{
    int size = static_cast<int>(std::strlen(val));
    ostm.write(size);
    if(size<=0)
        return ostm;
    ostm.write(val, size);
    return ostm;
}
} // ns simple
#endif // MiniBinStream_H
Version 0.9.5 Breaking Changes
Version 0.9.5 requires C++11. The stream classes are now class templates, parameterized on same_endian_type:
template<typename same_endian_type>
class file_istream {...}
template<typename same_endian_type>
class mem_istream {...}
template<typename same_endian_type>
class ptr_istream {...}
template<typename same_endian_type>
class file_ostream {...}
template<typename same_endian_type>
class mem_ostream {...}
How do you pass same_endian_type to the class? Use std::is_same<>.
// The 1st argument is the data endian and the 2nd is the platform endian; if they differ, the bytes are swapped.
using same_endian_type = std::is_same<simple::BigEndian, simple::LittleEndian>;
simple::mem_ostream<same_endian_type> out;
out << (int64_t)23 << (int64_t)24 << "Hello world!";
simple::ptr_istream<same_endian_type> in(out.get_internal_vec());
int64_t num1 = 0, num2 = 0;
std::string str;
in >> num1 >> num2 >> str;
std::cout << num1 << "," << num2 << "," << str << std::endl;
If your data and platform always share the same endianness, you can skip the test by specifying std::true_type directly.
simple::mem_ostream<std::true_type> out;
out << (int64_t)23 << (int64_t)24 << "Hello world!";
simple::ptr_istream<std::true_type> in(out.get_internal_vec());
int64_t num1 = 0, num2 = 0;
std::string str;
in >> num1 >> num2 >> str;
std::cout << num1 << "," << num2 << "," << str << std::endl;
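The first example hard-codes LittleEndian as the platform endian. If you would rather have the compiler pick it, the sketch below shows one possible way, under an assumption: GCC and Clang predefine the __BYTE_ORDER__ and __ORDER_BIG_ENDIAN__ macros, and for any other compiler it falls back to little endian, the byte order of mainstream Windows targets. PlatformEndian and runtime_endian are hypothetical names; C++20 code could use std::endian::native from <bit> instead.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Same definitions as in the article.
enum class Endian { Big, Little };
using BigEndian    = std::integral_constant<Endian, Endian::Big>;
using LittleEndian = std::integral_constant<Endian, Endian::Little>;

// Assumption: GCC/Clang predefine __BYTE_ORDER__; other compilers fall
// back to little endian.
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
using PlatformEndian = BigEndian;
#else
using PlatformEndian = LittleEndian;
#endif

// Data stored as big endian (e.g. network byte order), platform detected above.
using same_endian_type = std::is_same<BigEndian, PlatformEndian>;

// Runtime cross-check: inspect the first byte of the value 1.
inline Endian runtime_endian()
{
    const std::uint16_t one = 1;
    unsigned char first = 0;
    std::memcpy(&first, &one, 1);
    return first == 1 ? Endian::Little : Endian::Big;
}
```

same_endian_type can then be passed to the stream class templates exactly as in the examples above.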
Advantages of compile-time Check
- For same_endian_type = true_type, the swap function is an empty function, which is optimised away.
- For same_endian_type = false_type, the swapping is done without any prior runtime check cost.
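Both advantages come from tag dispatch: the tag type is a template argument, so the compiler picks the overload and no branch is left in the generated code. The following cut-down class template is a hypothetical illustration of the mechanism (tiny_mem_ostream and fix_endian are not the library's names, and the real classes differ). It also accepts std::is_same<...> directly, because is_same derives from std::true_type or std::false_type.

```cpp
#include <algorithm>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for simple::mem_ostream<same_endian_type>.
template <typename same_endian_type>
class tiny_mem_ostream
{
public:
    template <typename T>
    void write(T val)
    {
        // Overload selected at compile time from the tag type; no runtime test.
        fix_endian(val, same_endian_type());
        const char* p = reinterpret_cast<const char*>(&val);
        m_vec.insert(m_vec.end(), p, p + sizeof(T));
    }
    const std::vector<char>& get_internal_vec() const { return m_vec; }

private:
    // Same endian: empty function, which the optimiser removes.
    template <typename T>
    static void fix_endian(T&, std::true_type) {}

    // Different endian: reverse the object's bytes.
    template <typename T>
    static void fix_endian(T& val, std::false_type)
    {
        char bytes[sizeof(T)];
        std::memcpy(bytes, &val, sizeof(T));
        std::reverse(bytes, bytes + sizeof(T));
        std::memcpy(&val, bytes, sizeof(T));
    }

    std::vector<char> m_vec;
};
```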
Disadvantages of compile-time Check
- Cannot parse files/data whose endianness varies, that is, is only known at runtime. I believe this scenario is rare.
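If you do meet such data, for example a format that records its byte order in a header, one possible workaround is to branch once at runtime and then call the matching compile-time instantiation. The sketch below assumes a hypothetical load_u32 function template and a hypothetical load_u32_runtime bridge; it tests SameEndian::value, a compile-time constant, rather than using tag dispatch, and an optimising compiler will normally fold that test away in each instantiation.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Compile-time policy: SameEndian is std::true_type or std::false_type.
template <typename SameEndian>
std::uint32_t load_u32(const unsigned char* p)
{
    std::uint32_t v;
    std::memcpy(&v, p, sizeof(v));
    if (!SameEndian::value)  // constant in each instantiation
        v = (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) | (v << 24);
    return v;
}

// Hypothetical bridge for data whose byte order is only known at runtime:
// one branch here, then the matching compile-time instantiation.
inline std::uint32_t load_u32_runtime(const unsigned char* p, bool same_endian)
{
    return same_endian ? load_u32<std::true_type>(p) : load_u32<std::false_type>(p);
}
```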
Swap functions are listed below:
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <utility>
// Declared in dependency order. A template can only call helpers declared
// before it: unqualified lookup happens at the point of definition, and for
// built-in types argument-dependent lookup at instantiation only searches
// namespace std (because of the tag argument), never namespace simple.
enum class Endian
{
    Big,
    Little
};
using BigEndian = std::integral_constant<Endian, Endian::Big>;
using LittleEndian = std::integral_constant<Endian, Endian::Little>;
template<typename T, std::size_t N>
struct swap_endian
{
    void operator()(T& ui)
    {
        // sizes other than 2, 4 and 8 bytes are left unchanged
    }
};
template<typename T>
struct swap_endian<T, 8>
{
    void operator()(T& ui)
    {
        union EightBytes
        {
            T ui;
            std::uint8_t arr[8];
        };
        EightBytes fb;
        fb.ui = ui;
        // swap the endian
        std::swap(fb.arr[0], fb.arr[7]);
        std::swap(fb.arr[1], fb.arr[6]);
        std::swap(fb.arr[2], fb.arr[5]);
        std::swap(fb.arr[3], fb.arr[4]);
        ui = fb.ui;
    }
};
template<typename T>
struct swap_endian<T, 4>
{
    void operator()(T& ui)
    {
        union FourBytes
        {
            T ui;
            std::uint8_t arr[4];
        };
        FourBytes fb;
        fb.ui = ui;
        // swap the endian
        std::swap(fb.arr[0], fb.arr[3]);
        std::swap(fb.arr[1], fb.arr[2]);
        ui = fb.ui;
    }
};
template<typename T>
struct swap_endian<T, 2>
{
    void operator()(T& ui)
    {
        union TwoBytes
        {
            T ui;
            std::uint8_t arr[2];
        };
        TwoBytes fb;
        fb.ui = ui;
        // swap the endian
        std::swap(fb.arr[0], fb.arr[1]);
        ui = fb.ui;
    }
};
template<typename T>
void swap_if_integral(T& val, std::false_type)
{
    // T is not integral so do nothing
}
template<typename T>
void swap_if_integral(T& val, std::true_type)
{
    swap_endian<T, sizeof(T)>()(val);
}
template<typename T>
void swap(T& val, std::true_type)
{
    // same endian so do nothing.
}
template<typename T>
void swap(T& val, std::false_type)
{
    std::is_integral<T> is_integral_type;
    swap_if_integral(val, is_integral_type);
}
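The unions above read a member other than the one last written. C allows this, but in C++ it is formally undefined behaviour, although GCC documents it as supported. The sketch below does the same byte reversal through std::memcpy and std::reverse, which is well defined, and uses std::is_arithmetic so that float and double are accepted too (the version history notes that 1.0.1 moved from is_integral to is_arithmetic). swap_bytes and maybe_swap are hypothetical names, not the library's.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Reverse the bytes of an arithmetic value via a byte buffer,
// avoiding union type punning.
template <typename T>
T swap_bytes(T val)
{
    static_assert(std::is_arithmetic<T>::value, "swap_bytes needs an arithmetic type");
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &val, sizeof(T));
    std::reverse(bytes, bytes + sizeof(T));
    std::memcpy(&val, bytes, sizeof(T));
    return val;
}

// Tag dispatch as in the article: true_type means no swap is needed.
template <typename T>
void maybe_swap(T&, std::true_type) {}

template <typename T>
void maybe_swap(T& val, std::false_type) { val = swap_bytes(val); }
```

When swapping a float or double, it is safer to keep the swapped value in a byte buffer or an integer until it is swapped back, because the swapped bit pattern may form a signalling NaN, and loading one into floating-point registers can alter its bits on some platforms (32-bit x87 code is the classic case).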
The code is hosted at GitHub.
- 2016-08-01: Version 0.9.4 Update: Added ptr_istream, which shares the same interface as mem_istream except that it does not copy the array
- 2016-08-06: Version 0.9.5 Update: Added Endian Swap
- 2017-02-16: Version 0.9.6: Using C File APIs instead of STL file streams
- 2017-02-16: Version 0.9.7: Added memfile_istream
Benchmark of 0.9.7 (C file API) against 0.9.5 (C++ file stream)
# File streams (C++ file stream versus C file API)
old::file_ostream: 359ms
old::file_istream: 416ms
new::file_ostream: 216ms
new::file_istream: 328ms
new::memfile_ostream: 552ms
new::memfile_istream: 12ms
# In-memory streams (no change in source code)
new::mem_ostream: 534ms
new::mem_istream: 16ms
new::ptr_istream: 15ms
- 2017-03-07: Version 0.9.8: Fixed GCC and Clang template errors
- 2017-08-17: Version 0.9.9: Fixed bug of getting the previous value when reading an empty string
- 2018-01-23: Version 1.0.0: Fixed buffer overrun bug when reading a string (reported by imtrobin)
- 2018-05-14: Version 1.0.1: Fixed memfile_istream tellg and seekg bug reported by macxfadz, and used is_arithmetic instead of is_integral to determine whether a type is an integer or floating-point type that can be swapped
- 2018-08-12: Version 1.0.2: Added overloaded file open functions that take the file parameter as a wide-char string (only available on Win32)
Related Articles
- C++: Minimalistic CSV Streams
- C++14: CSV Stream based on C File API
- C++: Simplistic Binary Streams
- C++: New Text Streams