License: The Code Project Open License (CPOL)
A simple class to encapsulate VARIANTs
By Rob Manderson
Using Variants in your C++ code
VC6, VC7, VC7.1, Win2K, WinXP, Win2003, Visual Studio, MFC, Dev
VARIANTs. I can see you shuddering already. What's a
self-respecting C++ programmer doing dirtying his hands with VB/Scripting language datatypes? Well, it's the lesser of two evils. Either I can learn about
VARIANTs or I can write my own HTML parser and editor. Which do you think is easier? I knew you'd agree :)
Truth be told, I think the VARIANT concept is actually pretty cool. Wrap your data up in a nice little package with a type descriptor or two, throw it
across a function call boundary and let the other side figure it out. If done right it can solve a lot of otherwise nasty problems. I just wish they were easier to work with!
So that's the apology out of the way. Let's look at what a VARIANT is.
Weakly typed languages allow you to pass arguments that don't match the types expected. So the question should arise - if you can pass the wrong argument type to a function how does the language respond? Most weakly typed languages 'coerce' the value that was passed into the expected type. What does this mean? It means that the language runtime will try and convert the data that was passed into the correct data type. For example, if you were to pass an integer to a function that expected to see a string the most natural 'coercion' is to convert the integer into a string representation. Pass a date where a string is expected and the natural 'coercion' is to convert it to a string representation.
As C++ programmers we're already used to coercion on a small scale - we're used to the idea that the compiler can do promotions from short
to int and so on. Weakly typed languages just take it a step or two further.
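For a concrete feel for that small-scale coercion, here's a minimal sketch in portable C++. The names Widen and PromoteExample are made up for illustration; nothing here comes from the class presented later.

```cpp
#include <type_traits>

// Adding two shorts yields an int: both operands are promoted first.
static_assert(std::is_same<decltype(short(1) + short(2)), int>::value,
              "short operands are promoted to int");

// Takes a long; callers may pass narrower integer types.
inline long Widen(long value) { return value; }

inline long PromoteExample()
{
    short s = 42;
    return Widen(s);   // the compiler silently inserts the short-to-long conversion
}
```

The programmer never wrote a cast, yet the value arrived in the type the function expected. A weakly typed runtime does the same kind of thing, only across a much wider range of types.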
So what has this to do with VARIANTs?
Imagine you're designing your own programming language. You know the kinds of datatypes you want to support. You know the kinds of intrinsic operators you want. You can design your compiler to keep track of the datatype of everything in your program, so that when the programmer passes the wrong datatype to a function your compiler knows it and can insert the necessary code to convert the data.
Now imagine you're required to not only support your language but another language (say C++). You have complete control over your own language but no control whatsoever over the second language. Yet you want to be able to interoperate with that language. Since it's you who wants to interoperate with something you cannot change it's up to you to adapt to the 'something you cannot change'. So you design your datatypes in such a way that they contain sufficient information over and above the data they encapsulate to allow anyone else to decipher their contents.
VARIANT is a not so exotic way of solving this problem. Simplified, a VARIANT looks like this.
struct tagVARIANT
{
    VARTYPE vt;
    WORD wReserved1;
    WORD wReserved2;
    WORD wReserved3;
    union
    {
        LONG lVal;
        BYTE bVal;
        SHORT iVal;
        FLOAT fltVal;
        DOUBLE dblVal;
        VARIANT_BOOL boolVal;
        DATE date;
        BSTR bstrVal;
        SAFEARRAY *parray;
        VARIANT *pvarVal;
    };
};
This is a very simplified version of the full VARIANT definition to be found in your nearest copy of oaidl.h. I have no idea what
the wReserved values mean, nor do I care.
What we're interested in are the vt values and the union. vt is the valuetype and the union is the value. You'll see that the union
encompasses LONG, BYTE, SHORT, FLOAT and so on (there are a bucketload of em). vt tells us how to
interpret the value, using the member names. In C++, you might do it like this
void SomeFunc(VARIANT& v)
{
    USES_CONVERSION;
    if (v.vt == VT_I4)
        _tprintf(_T("variant value is %ld\n"), v.lVal);
    else if (v.vt == VT_BSTR)
        _tprintf(_T("variant value is %s\n"), W2CT(v.bstrVal));
}
This checks the vt member of the VARIANT. If it's a VT_I4 then the data we want is contained in the lVal member
of the union. Since the lVal member is a LONG we can use %ld as the format spec in the _tprintf call. If it's a
VT_BSTR then the data is a BSTR contained in the bstrVal member of the union, which W2CT converts to match the character set of the build.
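The same tag-then-read pattern can be tried without the Windows headers. The following is a minimal sketch only: MiniType, MiniVariant and Describe are hypothetical stand-ins invented for illustration, not the real oaidl.h types, and only two payload types are covered.

```cpp
#include <string>

// Hypothetical stand-ins for VARTYPE and VARIANT; not the real oaidl.h types.
enum MiniType : unsigned short { MT_EMPTY = 0, MT_I4 = 3, MT_R8 = 5 };

struct MiniVariant
{
    MiniType vt;          // discriminator: says which union member is live
    union
    {
        long   lVal;      // valid when vt == MT_I4
        double dblVal;    // valid when vt == MT_R8
    };
};

// Read the payload only through the member that vt names.
inline std::string Describe(const MiniVariant& v)
{
    switch (v.vt)
    {
    case MT_I4: return "long: " + std::to_string(v.lVal);
    case MT_R8: return "double: " + std::to_string(v.dblVal);
    default:    return "unsupported";
    }
}
```

The point of the sketch is the discipline rather than the types: the receiver never touches a union member until the tag has said which one is valid.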
Notice how VARIANTs use the BSTR datatype to pass string data. A BSTR is length-prefixed and allocated by the COM allocator, so the standard
marshaler already knows how to copy one across a process boundary and you don't have to write any marshaling code yourself. There are many other datatypes (not discussed in this article) which need extra marshaling support to cross a
process boundary but the passing of strings is so common that having BSTR handled for you is a nice optimisation.
Suppose we wrap a VARIANT in a class. We might do it thusly
class CVariant : public VARIANT
{
public:
    CVariant();
    CVariant(int iValue);
    CVariant(LPCTSTR szValue);
    CString ToString() const;
    int ToInt() const;
};
where the implementation of, say, the CVariant(int iValue) overloaded constructor might look like this.
CVariant::CVariant(int iValue)
{
    vt = VT_I4;
    lVal = iValue;
}
and where the implementation of the ToString() function might look like this.
CString CVariant::ToString() const
{
    // Return a CString by value. A pointer obtained from W2A would
    // point into this function's stack and dangle after we return.
    if (VT_BSTR == vt)
        return CString(bstrVal);
    // It's not a string so return an empty string
    return CString();
}
That simplifies the code a little by hiding the dirty details of figuring out the VARIANT type or converting its contents inside a method call on the
object, but it's hardly enough to warrant a new class, let alone an article about it.
That much covers simple VARIANT usage. It's certainly adequate for using the MSHTML control I alluded to in
the introduction. It may not be sufficient for other environments. For example, some years ago I wrote a whole bunch of software using the Microsoft Chat Protocol
control, which seems to have been designed by a committee whose members only knew VB. Almost all data passed between the host and the control is passed as
VARIANTs and some of those VARIANTs are arrays. A VARIANT represents an array using the SAFEARRAY structure.
The SAFEARRAY definition looks like this (this is the Win32 definition - it's a trifle different for WinCE).
typedef struct tagSAFEARRAY
{
    USHORT cDims;               // How many dimensions in this array
    USHORT fFeatures;           // Allocation control flags
    ULONG cbElements;           // The size of each array element
    ULONG cLocks;               // Array lock count.
    PVOID pvData;               // Points at the data in the array
    SAFEARRAYBOUND rgsabound[1];
} SAFEARRAY;
You're going to love the purpose of the SAFEARRAYBOUND member. It's a structure that specifies the number of elements in this dimension and the
lower bound. This allows an index into a particular dimension of the SAFEARRAY to start at any arbitrary number rather than the 0 that we C/C++
programmers know and love. There's an array of these structures, one for each cDim.
So accessing a VARIANT array in C++ involves interpreting the contents of the VARIANT as a pointer to a SAFEARRAY, checking
cDims to see how many dimensions there are, validating each index against that dimension's entry in the rgsabound array (after subtracting
the lower bound), then stepping into pvData in units of cbElements. Phew, what a mouthful!
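That arithmetic can be sketched in portable C++. MiniBound and MiniSafeArray are hypothetical, trimmed-down stand-ins for the real SAFEARRAYBOUND and SAFEARRAY (so the sketch compiles without the Windows headers), and ElementAddress is an illustrative helper, not part of the class or of the COM API. It assumes a 1 dimensional array.

```cpp
#include <cstddef>

// Hypothetical, trimmed-down stand-ins for SAFEARRAYBOUND and SAFEARRAY.
struct MiniBound
{
    unsigned long cElements;    // number of elements in this dimension
    long          lLbound;      // index of the first element
};

struct MiniSafeArray
{
    unsigned short cDims;       // number of dimensions
    unsigned long  cbElements;  // size of one element in bytes
    void*          pvData;      // start of the element storage
    MiniBound      rgsabound[1];
};

// Return the address of element `index` in a 1 dimensional array,
// or nullptr if the array shape or the index is not acceptable.
inline void* ElementAddress(const MiniSafeArray& sa, long index)
{
    if (sa.cDims != 1 || sa.pvData == nullptr)
        return nullptr;
    const MiniBound& b = sa.rgsabound[0];
    long offset = index - b.lLbound;    // rebase the caller's index to zero
    if (offset < 0 || static_cast<unsigned long>(offset) >= b.cElements)
        return nullptr;
    return static_cast<char*>(sa.pvData)
         + static_cast<std::size_t>(offset) * sa.cbElements;
}
```

With lLbound set to 1, index 1 maps to the first element and index 0 is rejected. When working with real SAFEARRAYs, API functions such as SafeArrayGetElement and SafeArrayPtrOfIndex do this work for you.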
Suddenly it's starting to look like maybe a class to encapsulate this stuff might be useful.
The full VARIANT type is orders of magnitude
more complex than the class presented here.
This class can handle simple VARIANTs with signed integer datatypes or strings. It can also handle 1 dimensional arrays where each element of the array
is a VARIANT which can be any of the simple types handled by the class. If you want more you can follow the code to see how to handle extra types. I've
not needed types beyond those supported so I haven't written support for them.
Ok so that's the caveat out of the way. Here's the class header.
class CVariant : public VARIANT
{
public:
    CVariant();
    CVariant(bool bValue);
    CVariant(int nValue);
    CVariant(LPCTSTR szValue);
    CVariant(VARIANT *pV);
    CVariant(int lBound, int iElementCount);
    ~CVariant(void);

    // Attributes
    BOOL IsArray(int iElement = 0);
    BOOL IsString(int iElement = 0);
    BOOL IsInt(int iElement = 0);
    BOOL IsBool(int iElement = 0);

    // Conversions
    VARIANT *operator&() { return this; }

    // Get operations
    VARIANT *ElementAt(int iElement = 0);
    CString ToString(int iElement = 0);
    int ToInt(int iElement = 0);
    BOOL ToBool(int iElement = 0);

    // Set operations
    void Set(LPCTSTR szString, int iElement = 0);
    void Set(int iValue, int iElement = 0);
    void Set(bool bValue, int iElement = 0);
};
You've already seen the simple constructors. There are two other constructors. The first constructor lets you define an array. It takes the lower bound for an index,
and a count of how many elements. The code looks like this.
CVariant::CVariant(int lBound, int iElementCount)
{
    // Set the type to an array of variants...
    vt = VT_ARRAY | VT_VARIANT;
    parray = new SAFEARRAY;

    // We only support 1 dimensional arrays..
    parray->cDims = 1;
    parray->fFeatures = FADF_VARIANT | FADF_HAVEVARTYPE | FADF_FIXEDSIZE | FADF_STATIC;
    parray->cbElements = sizeof(VARIANT);
    parray->cLocks = 0;

    // Allocate the array of variants we point to...
    parray->pvData = new VARIANT[iElementCount];
    memset(parray->pvData, 0, sizeof(VARIANT) * iElementCount);
    parray->rgsabound[0].lLbound = lBound;
    parray->rgsabound[0].cElements = iElementCount;
}
From my description of the SAFEARRAY structure earlier this should all be pretty clear. We only support 1 dimensional arrays so we set the various
members of the newly created SAFEARRAY instance to reflect that fact. The new SAFEARRAY's rgsabound[0] structure is set with
our lower bound and count variables. It's important to remember that the VARIANT we're creating may be used to interoperate with a module created in
another language and we can't assume that indexes start at 0. Where you start your indexes depends on what you're interoperating with.
The fFeatures member needs some explanation. The flag values I used specify that the array contains VARIANTs (FADF_VARIANT), records its element type (FADF_HAVEVARTYPE), can't be resized (FADF_FIXEDSIZE) and is static, not
created on the stack (FADF_STATIC). I specify that it's static because if I need to allocate memory I do it from the heap.
The other constructor lets you take an existing VARIANT (passed perhaps to an event handler for some foreign object you're hosting) and attach it to
a CVariant. The code looks like this.
CVariant::CVariant(VARIANT *pV)
{
    // Validate the input (and make sure it's writeable)
    ASSERT(pV);
    ASSERT(AfxIsValidAddress(pV, sizeof(VARIANT), TRUE));
    vt = VT_VARIANT;
    pvarVal = pV;
}
If it's a debug build we do some asserts to be sure that it's a pointer to a block of valid memory at least large enough to actually contain a VARIANT.
There's not much more runtime validation we can do. Once we're sure it's something that could be a VARIANT we assign the pointer to the
pvarVal member and set the type to VT_VARIANT. Once that's done we can use any of the other member functions on the VARIANT as
though we'd created it ourselves.
Don't use the CVariant::CVariant(VARIANT *pV) constructor to attempt to preserve a VARIANT across a function boundary.
The only reason you'd use this constructor is to put the class wrapper around a VARIANT you got from somewhere else. I don't want to say the only way you'd
get such a VARIANT is from an event but I'd put it at being asymptotically close to 100% of the time. This is why there's no Attach function.
The Attach idiom is a temptation to try and preserve something across function boundaries. It works for objects that are going to be around for a long time,
such as window handles, but it doesn't work for things like VARIANTs that are created on the fly to communicate with some other module (such as yours).
Note well that there is no attempt at a copy constructor. Life is way too short to try and write such a beast. Think about it. Your code would have to cope with every possible variation and do deep copies of arrays within arrays within arrays.
Once we have a CVariant, by whatever method, we use it. You wouldn't use a VARIANT to communicate from one function of your program
to another function in the same program. You probably wouldn't want to use it across a DLL boundary either. There's too much overhead to make a VARIANT an
attractive proposition. So it's almost a given that you're communicating with something you didn't write yourself. Thus, there are a few functions you can call to
check the datatype of something that's been passed to you from the something you didn't write.
// Attributes
BOOL IsArray(int iElement = 0);
BOOL IsString(int iElement = 0);
BOOL IsInt(int iElement = 0);
BOOL IsBool(int iElement = 0);
These IsAsomething() functions mirror the datatypes the class supports. If you're not sure about the type of a particular VARIANT use
these functions to determine if some operation you're about to perform has any chance of succeeding.
Why don't I encourage access to the vt member via an explicit member function? Glad you asked. Access to that member would return the exact type.
Why is that bad? It's bad because you then have to allow for all the myriad options. It could be VT_USERDEFINED or VT_BLOB_OBJECT or
VT_DISPATCH. Since the class doesn't handle those types you can do nothing useful with the information. Much better, in my opinion, to ask the class,
are you a string? Or are you an integer? If the answer is yes then you can proceed to perform meaningful operations. If not, you do whatever error handling is
appropriate.
Of course there's nothing stopping you accessing the vt member explicitly but if you do you're on your own.
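To make the argument concrete, here's a minimal sketch of the yes/no style of query in portable C++. TaggedValue, the TV_ constants and the free functions are hypothetical stand-ins invented for illustration, and treating both 2 byte and 4 byte integers as "int" is an assumption, not a statement about what the class's own IsInt() accepts.

```cpp
// Hypothetical tag values mirroring a few of the many a real VARIANT can carry.
enum TagValue : unsigned short { TV_I2 = 2, TV_I4 = 3, TV_BSTR = 8, TV_DISPATCH = 9 };

struct TaggedValue
{
    TagValue vt;    // payload omitted; only the tag matters for this sketch
};

// Ask a yes/no question instead of handing the raw tag to the caller.
// Assumption: both 2 byte and 4 byte signed integers count as "int".
inline bool IsInt(const TaggedValue& v)    { return v.vt == TV_I2 || v.vt == TV_I4; }
inline bool IsString(const TaggedValue& v) { return v.vt == TV_BSTR; }
```

A value tagged TV_DISPATCH simply answers no to both questions, so the caller drops straight into its error handling without ever needing to know that type exists.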
The accessors serve both for simple VARIANT access and for array access and take
a parameter which defaults to zero. The accessors figure out for themselves whether you've got an array or not and do the right thing depending on the exact contents
of the VARIANT.
For VARIANT and SAFEARRAY operations they treat arrays as being OPTION BASE 0. Internally they need
not be (they could have come from VB, for example, with OPTION BASE 1 set) but internally the functions correct for the OPTION BASE.
The accessors use the ElementAt() helper function to access the data requested and then apply the appropriate data conversion based on the datatype.
The ElementAt() function looks like this.
VARIANT *CVariant::ElementAt(int iElement)
{
    if (vt == VT_VARIANT)
        // It's a pointer to an external VARIANT
        // so return that variant
        return pvarVal;

    if (!(vt & VT_ARRAY))
        // It's not an array so return ourselves
        return this;

    // Calculate our element offset
    int offset = iElement - parray->rgsabound[0].lLbound;

    // Offset must be zero or greater and less than the element count
    if (offset >= 0 && offset < int(parray->rgsabound[0].cElements))
        return &((VARIANT *) parray->pvData)[offset];
    else
        return (VARIANT *) NULL;
}
You can see what I was talking about earlier. If the VARIANT is wrapping a VARIANT obtained from somewhere else we return that
VARIANT. If the VARIANT isn't an array we return a pointer to ourselves (remember the class is derived from the VARIANT
structure and has no vtable so this is equivalent to a pointer to the base VARIANT structure).
Otherwise we have an array so we calculate an offset into the SAFEARRAY taking into account the lower bound stored in the rgsabound structure.
Then we check that the offset is greater than or equal to 0 and less than the number of elements in the array and if it is we return a pointer to the
SAFEARRAY element. If you've specified an index that's invalid you get back a NULL pointer.
The actual accessor looks like this.
CString CVariant::ToString(int iElement)
{
    USES_CONVERSION;

    // Get the VARIANT at the iElement offset
    VARIANT *v = ElementAt(iElement);

    // Must be a valid pointer and must be valid readable memory
    if (v != (VARIANT *) NULL && AfxIsValidAddress(v, sizeof(VARIANT), FALSE) && v->vt == VT_BSTR)
        return W2A(v->bstrVal);

    return _T("");
}
Pretty simple. The other accessors work in much the same way.
Notice that the bool overloads use the lowercase bool datatype, not the typedef'd BOOL. This is necessary to
distinguish between the int and bool overloads. We need the different overloads so we can in fact create a VARIANT with the
VT_BOOL type.
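A minimal sketch of why the built-in type is needed. BOOL_T is a hypothetical stand-in for the Win32 BOOL typedef (which is just an int) and Kind is an illustrative function, not part of the class.

```cpp
#include <string>
#include <type_traits>

typedef int BOOL_T;   // hypothetical stand-in for the Win32 BOOL typedef

// BOOL_T is only another name for int, so a Kind(BOOL_T) overload would be
// the very same function as Kind(int) and could not be told apart from it.
static_assert(std::is_same<BOOL_T, int>::value, "BOOL_T is int in disguise");

// The built-in bool is a distinct type, so these two overloads can coexist.
inline std::string Kind(int)  { return "VT_I4"; }
inline std::string Kind(bool) { return "VT_BOOL"; }
```

Passing true or false selects the bool overload, while a BOOL_T variable quietly selects the int one, which is exactly the confusion the lowercase bool overloads avoid.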
The class returns a fallback value if the VARIANT isn't in fact numeric data of the expected kind or an empty string if it's not a string
VARIANT but it doesn't do type coercion. However, if you wanted to implement type coercion you might do it like this.
CString CVariant::ToString() const
{
    // Return a CString by value so the caller never receives a
    // pointer into a buffer that dies when this function returns
    CString csTemp;

    switch (vt)
    {
    case VT_BSTR:
        return CString(bstrVal);

    // It's not a string, maybe a number?
    case VT_I4:
        csTemp.Format(_T("%ld"), lVal);
        break;

    case VT_I2:
        csTemp.Format(_T("%d"), iVal);
        break;

    // and so forth...
    }

    return csTemp;
}
This little snippet returns a converted string if the VARIANT does indeed contain a string. Otherwise it attempts to convert numeric data into a string
representation and returns that. Finally, if the VARIANT type isn't any covered by the switch statement it returns an empty string.
20 March 2004 - Added bool overloads.
28 March 2004 - Fixed a bug in the ElementAt() function.
Last Updated: 27 Mar 2004, Editor: Rob Manderson
Copyright 2004 by Rob Manderson. Everything else Copyright © CodeProject, 1999-2009