Introduction
Welcome to part IV of my series on the STL. This time around I want to lok
at two containers called set and map. I'll explain set first and then
look at what map brings to the party.
Creating a set
Both of these containers are implemented as a binary tree by every STL
implementation I know of. As a result, the push_back/pop_back paradigm
does not work here - items are inserted in order, so you are not able to
specify a location. Instead we use the insert method, which can take a
location as an option, this is a suggestion to the STL where to start looking
for an insert location, and can obviously speed things up tremendously if a
large set is being built. To remove items from a set, we need to use the
find algorithm and pass the returned iterator into the erase function.
Whenever you use the find algorithm, you should check if the return value is the
same as end(), just in case the item was not found.
set<STRING> setString;
setString.insert("this");
setString.insert("is");
setString.insert("a");
setString.insert("test");
setString.insert("boys");
cout << "setString" << endl << endl;
set<STRING>::iterator it = setString.begin();
while(it != setString.end())
{
cout << *it << endl;
++it;
}
it = setString.find("boys");
if (it != setString.end())
setString.erase(it);
it = setString.begin();
cout << endl << endl << "We removed the 'boys'" << endl << endl;
while(it != setString.end())
{
cout << *it << endl;
++it;
}
This produces the following output:
setString
a
boys
is
test
this
We removed the 'boys'
a
is
test
this
Specifying sort order
The other thing we can do when we create a set is specify the manner in which
items are sorted. I could use string in this container also, but I've
elected to use char * from here on instead. We are going to create a set
with the same data as before, but sort it in reverse order through a functor as
follows.
struct gtstr
{
bool operator()(const char * s1, const char * s2) const
{
return (strcmp(s1, s2) > 0);
}
};
now in our main function:
set<char gtstr*,> setString2;
setString2.insert("this");
setString2.insert("is");
setString2.insert("a");
setString2.insert("test");
setString2.insert("boys");
cout
<<
endl
<<
"setString2" << endl << endl;
copy(setString2.begin(), setString2.end(),
ostream_iterator<char*>(cout, "\r\n"));
The output looks like this:
setString2
this
test
is
boys
a
As you can see, our policy has allowed the items to be sorted in reverse
order. While this is a trivial case, user defined types will always
require such a functor because no default sort order will be possible.
Advantages of set
The set has a number of advantages over other containers. It has a much
better insert cost than vector, and a much better search cost than list, making
it an excellent trade off in some situations. As Scott Meyers points out
though, if you're creating a container which does not change much, then a
sorted array may be a better choice. In both cases the main advantage
comes from the ease of searching a container that is sorted, and the STL
provides a number of functions especially for use with set.
Set algorithms
The STL contains a number of algorithms that perform set related functions on a
sorted range, as follows:
Function name |
Function returns |
includes |
true if range from set one exists in set two |
set_union |
all elements in both sets |
set_intersection |
elements that appear in both sets |
set_difference |
elements only in set one |
set_symmetric_difference |
elements only in either of the two |
Set and map offer bidirectional iterators, so any other algorithms that accept
these iterators can be used with them, and any sorted container that supports
at least these iterators can be used with the set functions listed above.
The following code uses the last four of these algorithms on two sets that are
similar, but not the same
set<char gtstr*,> setString3;
setString3.insert("this");
setString3.insert("could");
setString3.insert("well");
setString3.insert("be");
setString3.insert("the");
setString3.insert("test");
cout << endl << "Set 3" << endl << endl;
copy(setString3.begin(), setString3.end(), ostream_iterator<char*>(cout, "\r\n"));
cout << endl << endl << "Set Difference" << endl << endl;
std::set_difference(setString2.begin(), setString2.end(),
setString3.begin(), setString3.end(),
ostream_iterator<char *>(cout, " "));
cout << endl << endl << "Set Intersection" << endl << endl;
std::set_intersection(setString2.begin(), setString2.end(),
setString3.begin(), setString3.end(),
ostream_iterator<char *>(cout, " "));
cout << endl << endl << "Set Symmetric Difference" << endl << endl;
std::set_symmetric_difference(setString2.begin(), setString2.end(),
setString3.begin(), setString3.end(),
ostream_iterator<char *>(cout, " "));
cout << endl << endl << "Set Union" << endl << endl;
std::set_union(setString2.begin(), setString2.end(),
setString3.begin(), setString3.end(),
ostream_iterator<char *>(cout, " "));
and the output is as follows:
Set 3
well
this
the
test
could
be
Set Difference
is boys a
Set Intersection
this test
Set Symmetric Difference
well the could be is boys a
Set Union
well this the test could be is boys a
A note about hashed sets
As has been noted, set and map are generally implemented as binary trees.
The other option is to use has tables, and while they are not part of the
standard yet, many implimentation offer hashed sets and maps. The
advantage of hashing is that it provides potentially better performance than a
binary tree, but a binary tree provides stable performance where a hash table at
worst can provide O(n) performance. I'm guessing that is the reason that
when time became tight, the binary tree implementation got into the standard
and the hash one was made to wait.
Map
A map is essentially a set which also stores a second value in each
location. The first value can then be considered a key whereby the second
can be found. An example we have all probably used exists in the heart of
MFC. Because MFC wraps a HWND
into a class, but must ultimately process
it in a global WndProc
, it creates a map which links a HWND
to an instance of a
CWnd*
. Then the WinProc
can access the class by looking it up through the
HWND
which it has had passed into it.
To add to a map, we can use the insert method, but it is much easier to use
array type notation, like this:
mapstring,> mapInt2String;
mapInt2String[0] = "string number 0";
mapInt2String[2] = "string number 2";
mapInt2String[3] = "string number 3";
mapInt2String[5] = "string number 5";
mapInt2String[6] = "the quick brown fox jumped over the lazy dog";
Items can be extracted in the same manner, but there is a slight problem.
The [] operator will create a new item if one does not exist in that
location. The following code illustrates the problem:
cout << "\r\n\r\n\r\nMap Int2String has " << mapInt2String.size()
<< " elements\r\n";
cout << "mapInt2String[6] = " << mapInt2String[6];
cout << "\r\nMap Int2String has " << mapInt2String.size()
<< " elements\r\n";
cout << "mapInt2String[1] = " <<mapInt2String[1];
cout << "\r\nMap Int2String has " << mapInt2String.size()
<< " elements\r\n";
This outputs the following:
mapInt2String has 5 elements
mapInttoString[6] = the quick brown fox jumped over the lazy dog
mapInt2String has 5 elements
mapInttoString[1] =
mapInt2String has 6 elements
Correctly accessing elements in a map
The way to remove an element is to use the erase function, and the way to check
if an element exists is to use find prior to accessing it.
cout << "\r\nErase rogue element...\r\n";
mapInt2String.erase(mapInt2String.find(1));
cout << "\r\nMap Int2String has " << mapInt2String.size()
<< " elements\r\n";
if (mapInt2String.find(4) != mapInt2String.end())
cout << "mapInt2String[4] = " << mapInt2String[4];
cout << "\r\nMap Int2String has " << mapInt2String.size()
<< " elements\r\n\r\n\r\n\r\n";
This outputs the following:
Erase rogue element....
mapInt2String has 5 elements
mapInt2String has 5 elements
In other words, our second attempt to access a non-existent element did not
result in an unwanted insertion.
Map Iterators
Being an STL container, a map uses iterators to control access to it's
elements. However, a map stores two values in each location, so what does
the iterator give us to deal with this ? An iterator from a map returns a
pair<key, value>, and the two values involved can be accessed using
pair.first
and pair.second
respectively. The following code illustrates
this
std::map<int string,>::iterator itMap = mapInt2String.begin();
while (itMap != mapInt2String.end())
{
cout << "First element = " << itMap->first << endl;
cout << "Second element = " << itMap->second << endl;
++itMap;
}
This will output the following
First element = 0
Second element = string number 0
First element = 2
Second element = string number 2
First element = 3
Second element = string number 3
First element = 5
Second element = string number 5
First element = 6
Second element = the quick brown fox jumped over the lazy dog
The first and second elements provide read and write access, so that an element
can be changed as follows:
itMap = mapInt2String.find(0);
if (itMap != mapInt2String.end())
itMap->second = "Would you, could you in a car ? "
"Eat them, eat them, here they are !!";
Printing out the list again would reveal that this element has now been
changed. Accessing first allows changing of the key for an element in the
same way.
Having established this, a better way to access an element would be to catch the
iterator returned by our find test, and call second on it, as using the
[]
operators after establishing the existence of an element requires two lookups,
whereas storing the iterator means only one is required.
Using multiset/multimap
The set and map both require a unique key, that is to say, inserting the same
value into a set twice results in only one insertion, the same is true of two
elements with the same key into a map. Where more than one element
of the same value is required, we must use multiset/multimap. These
containers work the same as thier single element cousins, but have some extra
functions to assist us in using them
Function |
Returns |
count |
number of elements with a given value |
lower_bound |
first element that contains a value |
upper_bound |
last element that contains a value |
equal_range |
pair of iterators that define the range that contains a given
value. |
I hope you've found this article useful, and that it inspires you to make use of
the STL. Up until now I have spoken mainly about the different containers
the STL provides ( although some remain uncovered as of yet ), and how they can
be specialised. However, the real power of the STL lies in the way it
interfaces with the algorithms the standard library provides. A selection
of these algorithms have appeared in the series to date, but the next article
will seek to list a good number of them and explore the ways in which they can
be used.
See you then.