A BSTR Wrapper for Operations with Binary Data






Presenting a C++ class for correct operations on BSTR strings with binary data contents
Introduction
In this article I try to show why none of the existing BSTR wrappers is suitable for managing BSTR objects with binary contents, and why I therefore implemented a specialized one for this purpose.
Why another BSTR wrapper?
Recently I was involved in developing some ATL components for encryption/decryption operations. Since the results of encryption operations are not restricted to a given set of characters, as text strings are, special methods for managing strings of binary data are needed. The BSTR data structure seems very appropriate for this kind of operation: it can contain any characters, including embedded zeros, because the string length is stored before the string data instead of being deduced from a terminating 0. For example, you can build a string of a given length like this:
BSTR bstr = ::SysAllocStringLen(L"ABCB\0DEFG", 9);
This string contains a 0 in the middle. The problem with this data structure is that it needs a wrapper in order to avoid some programming mistakes that are otherwise easy to make. To explain what can go wrong, I will first introduce a simple COM component for converting data from binary strings to hexadecimal strings and back. It is a subset of a real, more complex component (the code given here is enough for somebody willing to build the actual hex converter component, but this is not the purpose of this article). The IDL interface of this component is something like this (not all code details are given):
//...
interface IHexObj : IDispatch
{
    [id(10), helpstring("String conversion from Binary to Hexa")]
    HRESULT Binary2Hex([in]BSTR* pbsBin, [out]BSTR* pbsHex);
    [id(11), helpstring("String conversion from Hexa to Binary")]
    HRESULT Hex2Binary([in]BSTR* pbsHex, [out]BSTR* pbsBin);
};
//...
and the implementation is something like this (not all code details are given):
//Error Messages
static char szError1[] = "HexCom ERROR: WideCharToMultiByte() conversion Error!";
static char szError2[] = "HexCom ERROR: MultiByteToWideChar() conversion Error!";
static char szError3[] = "HexCom ERROR in Hex2Binary(): in is not a Hex string!";
//...

//Some Auxiliary Functions

//Optimized Function to convert an unsigned char to a Hex string of length 2
void Char2Hex(unsigned char ch, char* szHex)
{
    static unsigned char saucHex[] = "0123456789ABCDEF";
    szHex[0] = saucHex[ch >> 4];
    szHex[1] = saucHex[ch&0xF];
    szHex[2] = 0;
}

//Function to convert a Hex string of length 2 to an unsigned char
//(only the upper case digits A-F are accepted)
bool Hex2Char(char const* szHex, unsigned char& rch)
{
    if(*szHex >= '0' && *szHex <= '9')
        rch = *szHex - '0';
    else if(*szHex >= 'A' && *szHex <= 'F')
        rch = *szHex - 55; //-'A' + 10
    else
        //Is not really a Hex string
        return false;
    szHex++;
    if(*szHex >= '0' && *szHex <= '9')
        (rch <<= 4) += *szHex - '0';
    else if(*szHex >= 'A' && *szHex <= 'F')
        (rch <<= 4) += *szHex - 55; //-'A' + 10
    else
        //Is not really a Hex string
        return false;
    return true;
}

//Function to convert binary string to hex string
//pszHexStr must have room for 2*iBinSize+1 characters
void Binary2Hex(unsigned char const* pucBinStr, int iBinSize, char* pszHexStr)
{
    int i;
    char szHex[3];
    unsigned char const* pucBinStr1 = pucBinStr;
    *pszHexStr = 0;
    for(i=0; i<iBinSize; i++,pucBinStr1++)
    {
        Char2Hex(*pucBinStr1, szHex);
        strcat(pszHexStr, szHex);
    }
}

//Function to convert hex string to binary string
//pszHexStr must contain at least 2*iBinSize characters
bool Hex2Binary(char const* pszHexStr, unsigned char* pucBinStr, int iBinSize)
{
    int i;
    unsigned char ch;
    for(i=0; i<iBinSize; i++,pszHexStr+=2,pucBinStr++)
    {
        if(false == Hex2Char(pszHexStr, ch))
            return false;
        *pucBinStr = ch;
    }
    return true;
}

STDMETHODIMP CHexObj::Binary2Hex(BSTR* pbsBin, BSTR* pbsHex)
{
    USES_CONVERSION;
    int iBinLen = ::SysStringLen(*pbsBin);
    char* pcBin = static_cast<char*>(_alloca(iBinLen));
    //The last argument is a pointer (LPBOOL), so NULL is passed rather than FALSE
    if(!WideCharToMultiByte(CP_ACP, 0, *pbsBin, iBinLen, pcBin, iBinLen, NULL, NULL))
    {
        return Error(szError1, IID_IHexObj);
    }
    char* pcHex = static_cast<char*>(_alloca((iBinLen<<1)+1));
    ::Binary2Hex(reinterpret_cast<unsigned char*>(pcBin), iBinLen, pcHex);
    //pcHex is a char string, so A2OLE is used (T2OLE is the same only in non-Unicode builds)
    ::SysReAllocString(pbsHex, A2OLE(pcHex));
    return S_OK;
}

STDMETHODIMP CHexObj::Hex2Binary(BSTR* pbsHex, BSTR* pbsBin)
{
    USES_CONVERSION;
    int iBinLen = ::SysStringLen(*pbsHex);
    //Parentheses are needed: != binds tighter than &
    if((iBinLen&1) != 0)
    {
        return Error(szError3, IID_IHexObj);
    }
    iBinLen >>= 1;
    //The target is a char string, so OLE2A is used (OLE2T is the same only in non-Unicode builds)
    string ostrHex(OLE2A(*pbsHex));
    char* pcBin = static_cast<char*>(_alloca(iBinLen));
    if(false == ::Hex2Binary(ostrHex.c_str(), reinterpret_cast<unsigned char*>(pcBin), iBinLen))
    {
        return Error(szError3, IID_IHexObj);
    }
    WCHAR* pW = (WCHAR*)_alloca(iBinLen*sizeof(WCHAR));
    if(!MultiByteToWideChar(CP_ACP, 0, pcBin, iBinLen, pW, iBinLen))
    {
        return Error(szError2, IID_IHexObj);
    }
    ::SysReAllocStringLen(pbsBin, pW, iBinLen);
    return S_OK;
}
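The conversion itself does not depend on COM. As a minimal, portable sketch of the same binary/hex round trip (assuming a C++11 compiler; `ToHex`, `FromHex`, `HexDigitValue` and `CheckRoundTrip` are hypothetical names, not part of the component), the logic can be written with standard containers so that no caller-sized buffers are involved:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helpers for illustration; this is not the code of the HexCom component.

// Converts every byte to two upper case hex digits.
std::string ToHex(std::vector<unsigned char> const& bytes)
{
    static char const digits[] = "0123456789ABCDEF";
    std::string hex;
    hex.reserve(bytes.size() * 2);
    for (unsigned char b : bytes)
    {
        hex.push_back(digits[b >> 4]);
        hex.push_back(digits[b & 0xF]);
    }
    return hex;
}

// Returns the value of one upper case hex digit, or -1 if the character is not one.
int HexDigitValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Converts a hex string back to bytes.
// Returns false (leaving bytes untouched) for an odd length or a non-hex character.
bool FromHex(std::string const& hex, std::vector<unsigned char>& bytes)
{
    if ((hex.size() & 1) != 0) return false;
    std::vector<unsigned char> result;
    result.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2)
    {
        int const hi = HexDigitValue(hex[i]);
        int const lo = HexDigitValue(hex[i + 1]);
        if (hi < 0 || lo < 0) return false;
        result.push_back(static_cast<unsigned char>((hi << 4) | lo));
    }
    bytes.swap(result);
    return true;
}

// Usage example: a zero byte in the middle survives the round trip.
void CheckRoundTrip()
{
    std::vector<unsigned char> const in = {0x41, 0x00, 0xFF};
    std::vector<unsigned char> out;
    assert(ToHex(in) == "4100FF");
    assert(FromHex(ToHex(in), out) && out == in);
}
```

Like the component code, this sketch accepts only upper case hex digits; accepting lower case as well would be a small extension of `HexDigitValue`.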
First notice that the input argument in method Binary2Hex() is also a pointer, like the output argument:
HRESULT Binary2Hex([in]BSTR* pbsBin, [out]BSTR* pbsHex);
Somebody could argue that this is not necessary, but I have found experimentally that if I use a signature like:
HRESULT Binary2Hex([in]BSTR bsBin, [out]BSTR* pbsHex);
and I want to pass a string like bstr defined above, then the line:
int iBinLen = ::SysStringLen(bsBin);
inside method Binary2Hex() would give length 4 instead of the correct value 9. It seems that during marshalling COM creates a copy of the original string, but stops at the first 0. This works fine for text strings, but not for binary strings.
In conclusion, when you work with binary data you should pass both the input and the output BSTR arguments as pointers!
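The difference between 9 and 4 is the difference between reading the stored length and scanning for a terminator. The following portable sketch imitates the BSTR memory layout (a 32-bit byte count, then the characters, then a terminating 0, with the handle pointing at the characters). `AllocCounted`, `CountedLen`, `FreeCounted` and `CheckLengths` are hypothetical stand-ins written for this illustration; they are not the real `::SysAllocStringLen()`, `::SysStringLen()` and `::SysFreeString()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <cwchar>

// Allocates [32-bit byte count][len characters][terminating 0] and returns
// a pointer to the characters, or nullptr if the allocation fails.
wchar_t* AllocCounted(wchar_t const* src, std::uint32_t len)
{
    std::uint32_t const byteCount = len * static_cast<std::uint32_t>(sizeof(wchar_t));
    unsigned char* block = static_cast<unsigned char*>(
        std::malloc(sizeof(std::uint32_t) + byteCount + sizeof(wchar_t)));
    if (block == nullptr) return nullptr;
    std::memcpy(block, &byteCount, sizeof byteCount);          // length prefix
    wchar_t* text = reinterpret_cast<wchar_t*>(block + sizeof(std::uint32_t));
    std::memcpy(text, src, byteCount);                          // data, embedded zeros included
    text[len] = 0;                                              // trailing terminator
    return text;
}

// Reads the length from the prefix; the contents are never scanned.
std::uint32_t CountedLen(wchar_t const* text)
{
    std::uint32_t byteCount = 0;
    std::memcpy(&byteCount,
                reinterpret_cast<unsigned char const*>(text) - sizeof(std::uint32_t),
                sizeof byteCount);
    return byteCount / static_cast<std::uint32_t>(sizeof(wchar_t));
}

// Releases a string obtained from AllocCounted.
void FreeCounted(wchar_t* text)
{
    if (text != nullptr)
        std::free(reinterpret_cast<unsigned char*>(text) - sizeof(std::uint32_t));
}

// Usage example with the string from the beginning of the article.
void CheckLengths()
{
    wchar_t* s = AllocCounted(L"ABCB\0DEFG", 9);
    assert(s != nullptr);
    assert(CountedLen(s) == 9);     // length taken from the prefix
    assert(std::wcslen(s) == 4);    // a text-style scan stops at the first 0
    FreeCounted(s);
}
```

Any copy that treats the handle as an ordinary zero-terminated string behaves like the `std::wcslen` line and loses everything after the first 0.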
Now why do you need a wrapper for BSTR?
Notice that both the Binary2Hex() and Hex2Binary() methods use a reallocation function, ::SysReAllocString() or ::SysReAllocStringLen(), to reallocate the output string before returning. If the BSTR argument has not already been allocated on the client side (for example, it is an uninitialized variable), these functions produce a wonderful crash on the client side. "So what?", somebody could argue, "you could use the function ::SysAllocStringLen() instead, which works fine in any situation." That is true, but if the string was already allocated on the client side, ::SysAllocStringLen() replaces the pointer without freeing the old string, which leaks memory. To avoid the leak, the client programmer would have to deallocate the string before calling the method, or make sure that the string is not allocated at all, but you cannot impose this on him, and there is no compile-time or run-time error if he does not do it. Therefore I think the best solution is to use the reallocation functions on the server side and to recommend that the client programmer systematically use a BSTR wrapper which initializes the encapsulated BSTR to an empty string (or, if he prefers, he can make sure by hand that all the BSTR strings are allocated, which is more error prone and time consuming).
A second problem occurs when your component methods throw exceptions. In this case nobody deallocates the BSTR strings that were allocated on the server side before the exception occurred. A wrapper can do this in its destructor, which is the second reason you should use one.
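Both reasons can be illustrated with a small portable sketch. It assumes a C++11 compiler and uses hypothetical stand-ins (`FakeAlloc`, `FakeFree`, `Owner`) in place of the real BSTR functions and of any real wrapper; a global counter makes leaks visible. The owner always holds an allocated, possibly empty string, replaces it without leaking, and frees it when an exception unwinds the stack.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <stdexcept>

// Number of buffers currently allocated; a value above 0 at the end means a leak.
int g_liveBuffers = 0;

// Hypothetical stand-in for an allocation function such as ::SysAllocStringLen().
wchar_t* FakeAlloc(wchar_t const* src, std::size_t len)
{
    wchar_t* p = static_cast<wchar_t*>(std::malloc((len + 1) * sizeof(wchar_t)));
    if (p == nullptr) throw std::bad_alloc();
    if (len != 0) std::memcpy(p, src, len * sizeof(wchar_t));
    p[len] = 0;
    ++g_liveBuffers;
    return p;
}

// Hypothetical stand-in for ::SysFreeString().
void FakeFree(wchar_t* p)
{
    if (p != nullptr) { std::free(p); --g_liveBuffers; }
}

// Minimal owner: starts as an allocated empty string and frees it in the destructor.
class Owner
{
public:
    Owner() : m_p(FakeAlloc(L"", 0)) {}
    ~Owner() { FakeFree(m_p); }
    Owner(Owner const&) = delete;
    Owner& operator=(Owner const&) = delete;

    // Replaces the contents: the old buffer is freed, as a reallocation would do.
    void Assign(wchar_t const* src, std::size_t len)
    {
        wchar_t* fresh = FakeAlloc(src, len);
        FakeFree(m_p);
        m_p = fresh;
    }

private:
    wchar_t* m_p;
};

// Simulates a call that fails after the output string was filled in.
// Returns the number of buffers still allocated afterwards.
int LiveBuffersAfterFailure()
{
    try
    {
        Owner out;                          // allocated from the start
        out.Assign(L"4100", 4);             // reallocation, no leak
        throw std::runtime_error("simulated failure");
    }
    catch (std::runtime_error const&)
    {
        // The destructor of out has already run during stack unwinding.
    }
    return g_liveBuffers;
}
```

With a raw pointer in place of `Owner`, the buffer allocated before the `throw` would still be counted as live after the `catch` block.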
Now, analysing the existing wrappers, I could not find one appropriate for the task I was concerned with, namely managing BSTR objects containing binary data. Let's consider for example the _bstr_t wrapper. It has no constructor that takes the string length, so for example the code:
_bstr_t _bstr(L"ABC\0DEF");
cout << _bstr.length() << endl;
will print the length as 3, i.e. it stops at the first 0. A second idea is to first allocate the string and then take ownership of it, like:
BSTR bstr = ::SysAllocStringLen(L"ABC\0DEF", 7);
_bstr_t _bstr(bstr, false);
cout << _bstr.length() << endl;
with the fCopy flag false. This time the length has the correct value 7, but if you just make a copy of the external BSTR, like this:
BSTR bstr = ::SysAllocStringLen(L"ABC\0DEF", 7);
_bstr_t _bstr(bstr); //no fCopy argument: bstr is treated as a wchar_t const*
cout << _bstr.length() << endl;
you get the same wrong result, 3. It seems that _bstr_t has inherent difficulties in making copies of BSTR objects with binary contents.
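One plausible explanation for the last result is overload selection: a BSTR is a plain wchar_t pointer type, so a single-argument construction can bind to a constructor that expects zero-terminated text. The sketch below reproduces the effect with a hypothetical class, `TextWrapper`, which is not _bstr_t. It has one constructor taking a `wchar_t const*` and one taking a handle plus a second argument; because the stand-in handle carries no length prefix, the second argument here is the length itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Stand-in handle type: like a BSTR, it is just a wchar_t pointer to the compiler.
typedef wchar_t* FakeBstr;

// Hypothetical wrapper with a text constructor and a two-argument constructor.
class TextWrapper
{
public:
    // Text constructor: the length is found by scanning for the first 0.
    TextWrapper(wchar_t const* s) : m_data(s, std::wcslen(s)) {}

    // Counted constructor: the caller states the length, embedded zeros are kept.
    TextWrapper(FakeBstr s, std::size_t len) : m_data(s, len) {}

    std::size_t length() const { return m_data.size(); }

private:
    std::wstring m_data;
};

// Usage example: the same handle gives two different lengths.
void CheckOverloads()
{
    wchar_t buffer[] = L"ABC\0DEF";
    FakeBstr handle = buffer;

    TextWrapper asText(handle);        // one argument: the text constructor is selected
    TextWrapper asCounted(handle, 7);  // two arguments: the counted constructor is selected

    assert(asText.length() == 3);
    assert(asCounted.length() == 7);
}
```

The compiler has no way to tell a binary handle from a text pointer, which is why a wrapper meant for binary data needs constructors that always carry the length.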
Another problem with _bstr_t is that it cannot be passed as a BSTR* argument (for a BSTR argument it is OK, but as I showed above, we need pointers for correctly passing binary strings). In this case the only purpose of using the _bstr_t wrapper would be to ensure the automatic deallocation. For example, if we wanted to use _bstr_t for calling the Binary2Hex() method, the code snippet on the client side would be something like this:
//...
try
{
    //Create the object
    IHexObjPtr pIHexObj(__uuidof(HexObj));
    //Allocation outside the wrapper
    BSTR bstrBin = ::SysAllocStringLen(L"ABC\0DEF", 7);
    //Take ownership
    _bstr_t _bstrBin(bstrBin, false);
    //Allocation outside the wrapper
    BSTR bstrHex = ::SysAllocString(L"");
    //Take ownership
    _bstr_t _bstrHex(bstrHex, false);
    //Still need direct access to the encapsulated BSTRs
    //Caution: if the server reallocates the string, bstrHex changes
    //while _bstrHex still holds the old pointer
    pIHexObj->Binary2Hex(&bstrBin, &bstrHex);
    cout << (char*)_bstrHex << endl;
}
catch(_com_error const& re)
{
    cout << "HRESULT Message: " << re.ErrorMessage() << endl;
    cout << "Description: " << (char*)re.Description() << endl;
}
//...
Implementation
In order to address all the problems presented above, I decided to implement my own BSTR wrapper specialized for binary data (it works correctly with text data too). I give only the interface below; the implementation details are in the associated project:
class CBinBstr
{
public:
    //Constructor
    CBinBstr(wchar_t const* const& rpwStr=L"", int iLen=0);
    //From Bytes
    CBinBstr(unsigned char const* bytes, int iLen=0);
    //Copy or Take Ownership depending on the bCopy flag
    CBinBstr(BSTR* pBSTR, bool bCopy= false);
    //Copy Constructor
    CBinBstr(CBinBstr const& rBstr);
    //Destructor
    virtual ~CBinBstr();
    //Comparison Functions
    int Compare(wchar_t const* pwStr, int iLen=0) const;
    int Compare(CBinBstr const& rBstr) const;
    //Length
    int Length() const;
    //Returns a copy of the encapsulated BSTR
    BSTR Copy() const;
    //Check if Empty
    bool IsEmpty() const;
    //Make Empty
    void Empty();
    wchar_t GetAt(int nIndex) const;
    void SetAt(int nIndex, wchar_t ch);
    void ToBytes(unsigned char* bytes, int& riLen) const;
    //Transform from Binary to Hex
    void BinaryToHex();
    //Transform from Hex to Binary
    void HexToBinary();
    //Operators:
    wchar_t operator[](int nIndex) const;
    //Pointer to BSTR
    operator BSTR*();
    //Reference to BSTR
    operator BSTR&();
    //Assignment Operator
    CBinBstr& operator=(CBinBstr const& rBstr);
    //Conversions from wchar_t*
    CBinBstr& operator=(wchar_t const* pwszStr);
    friend bool operator==(CBinBstr const& rBstr1, CBinBstr const& rBstr2);
    friend bool operator==(CBinBstr const& rBstr, wchar_t const* pwszStr);
    friend bool operator==(wchar_t const* pwszStr, CBinBstr const& rBstr);
    friend bool operator!=(CBinBstr const& rBstr1, CBinBstr const& rBstr2);
    friend bool operator!=(CBinBstr const& rBstr, wchar_t const* pwszStr);
    friend bool operator!=(wchar_t const* pwszStr, CBinBstr const& rBstr);
    friend bool operator<(CBinBstr const& rBstr1, CBinBstr const& rBstr2);
    friend bool operator<(CBinBstr const& rBstr, wchar_t const* pwszStr);
    friend bool operator<(wchar_t const* pwszStr, CBinBstr const& rBstr);
    friend bool operator>(CBinBstr const& rBstr1, CBinBstr const& rBstr2);
    friend bool operator>(CBinBstr const& rBstr, wchar_t const* pwszStr);
    friend bool operator>(wchar_t const* pwszStr, CBinBstr const& rBstr);
    friend bool operator<=(CBinBstr const& rBstr1, CBinBstr const& rBstr2);
    friend bool operator<=(CBinBstr const& rBstr, wchar_t const* pwszStr);
    friend bool operator<=(wchar_t const* pwszStr, CBinBstr const& rBstr);
    friend bool operator>=(CBinBstr const& rBstr1, CBinBstr const& rBstr2);
    friend bool operator>=(CBinBstr const& rBstr, wchar_t const* pwszStr);
    friend bool operator>=(wchar_t const* pwszStr, CBinBstr const& rBstr);
    //Concatenation Operator
    CBinBstr& operator+=(CBinBstr const& rBstr);
    CBinBstr& operator+=(wchar_t const* pwszStr);
    friend CBinBstr operator+(CBinBstr const& rBstr1, CBinBstr const& rBstr2);
    friend CBinBstr operator+(CBinBstr const& rBstr, wchar_t const* pwszStr);
    friend CBinBstr operator+(wchar_t const* pwszStr, CBinBstr const& rBstr);
    //Printing with wide streams. Printing stops at the first 0. It is recommended
    //to call BinaryToHex first for correct results.
    friend std::wostream& operator<<(std::wostream& s, CBinBstr const& rBstr);
};
Now look how easy and elegant it is to use, compared to the _bstr_t case above! Let's consider the following code snippet on the client side:
//...
try
{
    //Create the object
    IHexObjPtr pIHexObj(__uuidof(HexObj));
    CBinBstr oBstrBin(L"ABC\0DEF", 7);
    CBinBstr oBstrHex; //initialized to L""
    //Can be passed as BSTR* argument
    pIHexObj->Binary2Hex(oBstrBin, oBstrHex);
    //Can be easily printed
    wcout << (BSTR&)oBstrHex << endl;
}
catch(_com_error const& re)
{
    cout << "HRESULT Message: " << re.ErrorMessage() << endl;
    cout << "Description: " << (char*)re.Description() << endl;
}
//...
The string length this time will be the correct value 7, not 3 as for _bstr_t.
The default constructor automatically initializes the string to L"".
The object can be passed as a BSTR* argument (the conversion is done automatically by the BSTR* operator).
It can be easily printed, and if an exception is thrown, the destructor takes care of the deallocation.
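The automatic conversion mentioned above relies on a user-defined conversion operator. The real CBinBstr implementation is in the project zip and is not reproduced here; the following is only a portable sketch of the mechanism, assuming a C++11 compiler, with hypothetical names (`Handle`, `HandleOwner`, `WriteHex`) and plain `new[]`/`delete[]` standing in for the BSTR allocation functions.

```cpp
#include <cassert>
#include <cwchar>
#include <string>

// Stand-in for a BSTR handle.
typedef wchar_t* Handle;

// Hypothetical owner that converts implicitly to Handle*.
class HandleOwner
{
public:
    HandleOwner() : m_h(new wchar_t[1]()) {}     // starts as an allocated empty string
    ~HandleOwner() { delete[] m_h; }
    HandleOwner(HandleOwner const&) = delete;
    HandleOwner& operator=(HandleOwner const&) = delete;

    // Lets the object be passed wherever a Handle* argument is expected.
    operator Handle*() { return &m_h; }

    wchar_t const* c_str() const { return m_h; }

private:
    Handle m_h;
};

// Stand-in for a server method with a Handle* output argument:
// it reallocates the string supplied by the caller.
void WriteHex(Handle* out)
{
    wchar_t* fresh = new wchar_t[5];
    std::wcscpy(fresh, L"4142");
    delete[] *out;      // safe only because the caller's string is allocated
    *out = fresh;
}

// Usage example: no address-of operator and no raw handle at the call site.
std::wstring CallThroughOperator()
{
    HandleOwner out;    // already allocated, so the reallocation in WriteHex is safe
    WriteHex(out);      // implicit conversion to Handle*
    return std::wstring(out.c_str());
}
```

Because the callee writes through the pointer to the member itself, the owner sees the reallocated string and its destructor frees the right buffer, which is the point where the _bstr_t client code shown earlier needed separate raw BSTR variables.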
Is this really the end of the troubles? Not really, but I hope it makes life easier. Working with binary data requires a lot of discipline from the programmer. For example, if in the definition
CBinBstr oBstrBin(L"ABC\0DEF", 7);
the programmer puts 20 instead of 7, undefined results can be generated. But this is a general problem when working with binary strings. Some possible sources of errors which should still be considered are:
- Declaring a string size larger than the real string size (as in the example above). In this case you should know what you are doing.
- Taking ownership of an external unallocated BSTR. Generally this should be avoided, but you would need to do it inside functions in order to take ownership of the BSTR* pointer arguments (a case for which it would be safer if, in the calling function, you passed a CBinBstr).
- Abusing the BSTR* and BSTR& conversion operators. These operators should be used only when you need to pass arguments (the conversion is done automatically) or when you need to print the contents with wcout (printing also stops at the first 0, so it is better to convert to hexadecimal format with the BinaryToHex() method before printing). Otherwise all the operations should be done inside the wrapper, without direct access to the encapsulated BSTR.
If you follow the rules the problems can be kept under control!
Conclusion
The project zip file BinBstr.zip attached to this article includes the source code of the presented CBinBstr class and a test program. I am interested in any opinions and new ideas about this implementation.