
Base64 Encoder and Boost

3 Apr 2012

Well, I was looking for a Base64 library recently, and I thought, “I know, I bet it is in Boost, I have Boost, and Boost has everything.” And it turns out that it does! Kind of. But it’s a bit odd and sadly incomplete.

So let’s fix it.

To start with, I didn’t really have a clue how to plug these Boost components together, so a quick scout on the beloved StackOverflow (a Programmer’s Guide to the Galaxy) yields the following code, which I have slightly modified:

using namespace boost::archive::iterators;

typedef
  insert_linebreaks<         // insert line breaks every 76 characters
    base64_from_binary<    // convert binary values to base64 characters
      transform_width<   // retrieve 6 bit integers from a sequence of 8 bit bytes
        const unsigned char *
        ,6
        ,8
        >
      >
      ,76
    >
  base64Iterator; // compose all the above operations into a new iterator

Disgusting, isn’t it? It also doesn’t work: it only works when the length of the data is a multiple of three bytes, so we’ll have to pad it ourselves. Here’s the full code for that:

#include <cstddef>
#include <string>
#include <vector>

#include <boost/archive/iterators/base64_from_binary.hpp>
#include <boost/archive/iterators/insert_linebreaks.hpp>
#include <boost/archive/iterators/transform_width.hpp>

namespace Base64Utilities
{
  // Takes the data by value because we append padding bytes to our own copy
  std::string ToBase64(std::vector<unsigned char> data)
  {
    using namespace boost::archive::iterators;

    // Nothing to encode (and &data[0] would be invalid on an empty vector)
    if(data.empty())
    {
      return std::string();
    }

    // Pad with 0 until a multiple of 3
    std::size_t paddedCharacters = 0;
    while(data.size() % 3 != 0)
    {
      paddedCharacters++;
      data.push_back(0x00);
    }

    // Crazy typedef black magic
    typedef
      insert_linebreaks<         // insert line breaks every 76 characters
        base64_from_binary<    // convert binary values to base64 characters
          transform_width<   // retrieve 6 bit integers from a sequence of 8 bit bytes
            const unsigned char *
            ,6
            ,8
            >
          >
          ,76
        >
        base64Iterator; // compose all the above operations into a new iterator

    // Encode only the original bytes; the zero padding after them gives the
    // iterator valid bytes to read while it completes the last 6-bit group
    const unsigned char* begin = data.data();
    std::string encodedString(
      base64Iterator(begin),
      base64Iterator(begin + (data.size() - paddedCharacters)));

    // Add '=' for each padded character used
    encodedString.append(paddedCharacters, '=');

    return encodedString;
  }
}

It’s not that elegant, but it seems to work. Can you improve on this code? Can you write the decode function? Check your answers with this excellent Online Base64 Converter.
Leave your comments below! 
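For comparison, here is a minimal sketch of an encoder that avoids Boost entirely (the name `ToBase64Plain` is hypothetical, not from the article or any library). It assumes the standard RFC 4648 alphabet and makes explicit the grouping that the padding loop above works around: every three input bytes become four characters, and when the final group is short, the positions that would encode only missing bytes are written as '='. Unlike the Boost version, it does not insert line breaks every 76 characters.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: encode bytes as standard Base64 (RFC 4648 alphabet)
// with '=' padding. No line breaks are inserted.
std::string ToBase64Plain(const std::vector<unsigned char>& data) {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((data.size() + 2) / 3 * 4);
    for (std::size_t i = 0; i < data.size(); i += 3) {
        // Pack up to three bytes into a 24-bit group; missing bytes count as zero.
        std::size_t remaining = data.size() - i;
        unsigned long group = static_cast<unsigned long>(data[i]) << 16;
        if (remaining > 1) group |= static_cast<unsigned long>(data[i + 1]) << 8;
        if (remaining > 2) group |= data[i + 2];
        // Emit four 6-bit indices; positions covering only missing bytes become '='.
        out.push_back(alphabet[(group >> 18) & 0x3F]);
        out.push_back(alphabet[(group >> 12) & 0x3F]);
        out.push_back(remaining > 1 ? alphabet[(group >> 6) & 0x3F] : '=');
        out.push_back(remaining > 2 ? alphabet[group & 0x3F] : '=');
    }
    return out;
}
```

For example, encoding the bytes of "foo" gives "Zm9v", and "fo" gives "Zm8=". Checking `remaining` inside the loop is what lets it handle any input length without padding the buffer first.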

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer, Web Biscuit
United Kingdom
At Web Biscuit, you can find software, articles, a good dollop of quality and an unhealthy obsession over biscuits.
Website: http://www.webbiscuit.co.uk
Twitter: http://twitter.com/WebBiscuitCoUk

Comments and Discussions

 
Using raw pointers as iterators is bad form
Oded Arbel, 4-Nov-13 9:22
The "input must be divisible by 3" problem arises because the base64 transform iterator expects an iterator as its source, but a raw pointer (which is only a "kind of" iterator) does not let it detect when there is no more data; so instead of padding with zeros, the transform iterator consumes bytes from past the end of the vector and you get unexpected behavior.
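To make the length arithmetic concrete: n input bytes hold 8n bits, which fill ceil(8n/6) six-bit groups, and the last group has to be completed with zero bits unless n is a multiple of 3. The two helpers below (hypothetical names, not part of Boost) are a minimal sketch of that calculation; a transform that only knows where the data starts has no way of learning n, so it cannot tell when to stop or how many zero bits to add.

```cpp
#include <cstddef>

// Number of 6-bit groups (Base64 characters before '=' padding) for n bytes:
// 8n bits divided by 6, rounded up because the last group is zero-filled.
std::size_t sextet_count(std::size_t n) { return (8 * n + 5) / 6; }

// Number of zero bits appended to complete the final 6-bit group.
std::size_t zero_fill_bits(std::size_t n) { return (6 - (8 * n) % 6) % 6; }
```

So one byte yields two characters with four zero bits of fill, two bytes yield three characters with two, and three bytes fit exactly into four characters.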

The solution should be easy:

typedef base64_from_binary<transform_width<decltype(std::begin(data)),6,8>> base64Iterator;


Unfortunately, base64_from_binary doesn't check whether the source iterator has reached its end so that it can pad the missing bits; the transformation does stop, but only after we try to dereference the end iterator, which crashes the program (compare with how boost::filter_iterator handles a source that runs out of data). The whole thing looks like an exercise in decomposing algorithms that was never intended for actual work; the code's location under "archive" is probably a hint to that.

That being said, we can solve the padding problem by providing our own wrapper iterator that reports when the stream has ended, but answers dereference requests on invalid (past-the-end) iterators with zeros.

#include <iterator>
#include <boost/iterator/iterator_facade.hpp>

template<class It, class Elem = typename std::iterator_traits<It>::value_type> class zero_pad_iterator
	: public boost::iterator_facade<zero_pad_iterator<It, Elem>, Elem, std::input_iterator_tag, Elem>
{
	friend class boost::iterator_core_access;
	Elem dereference() const {
		if (!_valid) return Elem(); // return "zero" from invalid iterators
		return _current_value;
	}
	bool equal(const zero_pad_iterator & rhs) const {
		// two invalid iterators are always the same
		return _valid == rhs._valid && (_valid ? _source == rhs._source : true);
	}

	void increment(){
		if (!_valid)	return; // don't move past the end
		_valid = ++_source != _end;
		if (_valid)	_current_value = *_source;
	}

	It _source, _end;
	Elem _current_value;
	bool _valid;
public:
	template<class Cont> // convenience constructor; delegates so an empty container is safe
	zero_pad_iterator(const Cont& container) : zero_pad_iterator(std::begin(container), std::end(container)) {}
	// constructor that allows detection of when the source has met the end
	// (start is only dereferenced when the range is non-empty)
	zero_pad_iterator(const It start, const It end) : _source(start), _end(end), _current_value(start != end ? *start : Elem()), _valid(start != end) {}
	// helper constructor for making easy ends (invalid iterators are equal to iterators that have reached the end)
	zero_pad_iterator() : _source(), _end(), _current_value(), _valid(false) {}
};


All that is left is to add the '=' padding characters at the end (not strictly needed for a safe binary transformation, but some protocol implementations may be picky about it), so you would use it something like this:

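#include <boost/archive/iterators/base64_from_binary.hpp>
#include <boost/archive/iterators/transform_width.hpp>

namespace b64 = boost::archive::iterators;

std::string encode(const std::vector<unsigned char>& data) {
	typedef zero_pad_iterator<decltype(data.begin())> padIt;
	typedef b64::base64_from_binary<b64::transform_width<padIt, 6, 8>> base64Iterator;
	auto out = std::string(base64Iterator(padIt(data)), base64Iterator(padIt()));
	// pad with '=' up to the next multiple of 4 characters
	return out + std::string((4 - out.length() % 4) % 4, '=');
}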
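The '=' count depends only on the length of the unpadded output, which has to be brought up to a multiple of four characters: a remainder of 2 needs "==", a remainder of 3 needs a single "=", and a remainder of 0 needs none. A one-line helper capturing that rule, as a sketch (the name `base64_pad_count` is hypothetical):

```cpp
#include <cstddef>

// Number of '=' characters needed to bring an unpadded Base64 string of
// length encoded_len up to the next multiple of 4 characters.
// A remainder of 1 cannot come from whole input bytes, so the result 3
// never occurs for valid output.
std::size_t base64_pad_count(std::size_t encoded_len) {
    return (4 - encoded_len % 4) % 4;
}
```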


Then you move on to the reverse transformation, with something like:
std::vector<unsigned char> decode(const std::string& data) {
	// filter out base64 padding
	auto predicate = [](char c) { return c != '='; };
	typedef boost::filter_iterator<decltype(predicate), decltype(std::begin(data))> filterbase64; 
	filterbase64 begin(predicate,std::begin(data), std::end(data)), 
	             end(predicate, std::end(data), std::end(data)); 

	// create transformer and use it
	typedef b64::transform_width<b64::binary_from_base64<filterbase64>, 8, 6> base64Iterator;
	return std::vector<unsigned char>(base64Iterator(begin), base64Iterator(end));
}


At that point I figured out that the reverse transformation also dereferences values from the source before checking whether it has hit the end. This time there is no magic padding value that prevents the table lookup from throwing an exception, and the exception only hides the real problem: the reverse transformation doesn't work out where the original bit stream ended, so it always tries to produce values past the end of the original stream.

So I gave up and went off to write my own Base64 encoder/decoder.
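For illustration, here is a minimal sketch of a hand-written decoder along the lines the commenter describes; it is not the commenter's code, and the names `FromBase64` and `sextet_value` are hypothetical. It assumes the standard RFC 4648 alphabet and unwrapped input (no line breaks, length a multiple of 4), and throws `std::invalid_argument` otherwise. The key difference from the Boost reverse transform is that it reads the '=' padding of the final group to work out how many bytes that group really holds, so it never produces values past the end of the original stream.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: map one Base64 character to its 6-bit value.
static unsigned sextet_value(char c) {
    if (c >= 'A' && c <= 'Z') return static_cast<unsigned>(c - 'A');
    if (c >= 'a' && c <= 'z') return static_cast<unsigned>(c - 'a') + 26;
    if (c >= '0' && c <= '9') return static_cast<unsigned>(c - '0') + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    throw std::invalid_argument("invalid Base64 character");
}

// Hypothetical decoder for standard Base64 ('=' padding, no line breaks).
std::vector<unsigned char> FromBase64(const std::string& text) {
    if (text.size() % 4 != 0)
        throw std::invalid_argument("Base64 length must be a multiple of 4");
    std::vector<unsigned char> out;
    out.reserve(text.size() / 4 * 3);
    for (std::size_t i = 0; i < text.size(); i += 4) {
        // '=' may only fill the last one or two positions of the final group
        std::size_t pad = 0;
        if (text[i + 3] == '=') pad = (text[i + 2] == '=') ? 2 : 1;
        if (pad > 0 && i + 4 != text.size())
            throw std::invalid_argument("padding before end of input");
        // Reassemble the 24-bit group from the non-padding characters
        unsigned long group = 0;
        for (std::size_t j = 0; j < 4 - pad; ++j)
            group |= static_cast<unsigned long>(sextet_value(text[i + j])) << (18 - 6 * j);
        // The padding says how many real bytes this group holds: 3 - pad
        for (std::size_t k = 0; k < 3 - pad; ++k)
            out.push_back(static_cast<unsigned char>((group >> (16 - 8 * k)) & 0xFF));
    }
    return out;
}
```

For example, "Zm8=" decodes to the two bytes 'f' and 'o'. Output produced with line breaks (for instance by `insert_linebreaks`) would need the newlines stripped before calling it.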
Same bug, same place
net_storm, 3-Oct-13 2:05

