(untagged)

STL 101 Part D - sorted associative containers, Set and Map

Christian Graus

0.00/5 (No votes)

23 Mar 2002

Coverage of two more containers from the STL, namely set and map, and the functions provided for them.

Introduction

Welcome to part IV of my series on the STL. This time around I want to lok at two containers called set and map. I'll explain set first and then look at what map brings to the party.

Creating a set

Both of these containers are implemented as a binary tree by every STL implementation I know of. As a result, the push_back/pop_back paradigm does not work here - items are inserted in order, so you are not able to specify a location. Instead we use the insert method, which can take a location as an option, this is a suggestion to the STL where to start looking for an insert location, and can obviously speed things up tremendously if a large set is being built. To remove items from a set, we need to use the find algorithm and pass the returned iterator into the erase function. Whenever you use the find algorithm, you should check if the return value is the same as end(), just in case the item was not found.

set<STRING> setString; 
setString.insert("this"); 
setString.insert("is"); 
setString.insert("a"); 
setString.insert("test"); 
setString.insert("boys"); 
cout << "setString" << endl << endl; 

set<STRING>::iterator it = setString.begin(); 

while(it != setString.end()) 
{
    cout << *it << endl; 
    ++it; 
} 

it = setString.find("boys"); 
if (it != setString.end()) 
    setString.erase(it); 
    
it = setString.begin(); 

cout << endl << endl << "We removed the 'boys'" << endl << endl; 

while(it != setString.end()) 
{ 
    cout << *it << endl; 
    ++it;
}

This produces the following output:

setString
a
boys
is
test
this

We removed the 'boys'
a
is
test
this

Specifying sort order

The other thing we can do when we create a set is specify the manner in which items are sorted. I could use string in this container also, but I've elected to use char * from here on instead. We are going to create a set with the same data as before, but sort it in reverse order through a functor as follows.

struct gtstr
{
    bool operator()(const char * s1, const char * s2) const
    {
        return (strcmp(s1, s2) > 0);
    }
};

now in our main function:

set<char gtstr*,> setString2;
setString2.insert("this");
setString2.insert("is");
setString2.insert("a");
 setString2.insert("test");
setString2.insert("boys");
cout
<<
endl
<<

"setString2"  <<  endl  <<  endl;

copy(setString2.begin(), setString2.end(),
     ostream_iterator<char*>(cout, "\r\n"));

The output looks like this:

setString2
this
test
is
boys
a

As you can see, our policy has allowed the items to be sorted in reverse order. While this is a trivial case, user defined types will always require such a functor because no default sort order will be possible.

Advantages of set

The set has a number of advantages over other containers. It has a much better insert cost than vector, and a much better search cost than list, making it an excellent trade off in some situations. As Scott Meyers points out though, if you're creating a container which does not change much, then a sorted array may be a better choice. In both cases the main advantage comes from the ease of searching a container that is sorted, and the STL provides a number of functions especially for use with set.

Set algorithms

The STL contains a number of algorithms that perform set related functions on a sorted range, as follows:

Function name	Function returns
includes	true if range from set one exists in set two
set_union	all elements in both sets
set_intersection	elements that appear in both sets
set_difference	elements only in set one
set_symmetric_difference	elements only in either of the two

Set and map offer bidirectional iterators, so any other algorithms that accept these iterators can be used with them, and any sorted container that supports at least these iterators can be used with the set functions listed above. The following code uses the last four of these algorithms on two sets that are similar, but not the same

set<char gtstr*,> setString3;

setString3.insert("this");
setString3.insert("could");
setString3.insert("well");
setString3.insert("be");
setString3.insert("the");
setString3.insert("test");

cout << endl << "Set  3"  << endl  <<  endl;

copy(setString3.begin(), setString3.end(), ostream_iterator<char*>(cout, "\r\n"));

cout << endl << endl <<  "Set  Difference"  << endl  <<  endl;

std::set_difference(setString2.begin(), setString2.end(), 
         setString3.begin(), setString3.end(),
    ostream_iterator<char *>(cout, " "));

cout << endl << endl <<  "Set  Intersection"  << endl  <<  endl;

std::set_intersection(setString2.begin(), setString2.end(), 
         setString3.begin(), setString3.end(),
    ostream_iterator<char *>(cout, " "));

cout << endl << endl <<  "Set  Symmetric  Difference" << endl  <<  endl;

std::set_symmetric_difference(setString2.begin(), setString2.end(), 
         setString3.begin(), setString3.end(),
    ostream_iterator<char *>(cout, " "));

cout << endl << endl <<  "Set  Union"  << endl  <<  endl;

std::set_union(setString2.begin(), setString2.end(), 
         setString3.begin(), setString3.end(),
    ostream_iterator<char *>(cout, " "));

and the output is as follows:

Set 3
well
this
the
test
could
be
Set Difference
is boys a
Set Intersection
this test
Set Symmetric Difference
well the could be is boys a
Set Union
well this the test could be is boys a

A note about hashed sets

As has been noted, set and map are generally implemented as binary trees. The other option is to use has tables, and while they are not part of the standard yet, many implimentation offer hashed sets and maps. The advantage of hashing is that it provides potentially better performance than a binary tree, but a binary tree provides stable performance where a hash table at worst can provide O(n) performance. I'm guessing that is the reason that when time became tight, the binary tree implementation got into the standard and the hash one was made to wait.

Map

A map is essentially a set which also stores a second value in each location. The first value can then be considered a key whereby the second can be found. An example we have all probably used exists in the heart of MFC. Because MFC wraps a HWND into a class, but must ultimately process it in a global WndProc, it creates a map which links a HWND to an instance of a CWnd*. Then the WinProc can access the class by looking it up through the HWND which it has had passed into it.

To add to a map, we can use the insert method, but it is much easier to use array type notation, like this:

mapstring,> mapInt2String;

mapInt2String[0] = "string number 0";
mapInt2String[2] = "string number 2";
mapInt2String[3] = "string number 3";
mapInt2String[5] = "string number 5";
mapInt2String[6] = "the quick brown fox jumped over the lazy dog";

Items can be extracted in the same manner, but there is a slight problem. The [] operator will create a new item if one does not exist in that location. The following code illustrates the problem:

cout << "\r\n\r\n\r\nMap Int2String has " << mapInt2String.size() 
     << " elements\r\n";

cout << "mapInt2String[6] = " << mapInt2String[6];

cout << "\r\nMap Int2String has " << mapInt2String.size()
     << " elements\r\n";

cout << "mapInt2String[1] = " <<mapInt2String[1];

cout << "\r\nMap Int2String has " << mapInt2String.size() 
     << " elements\r\n";

This outputs the following:

mapInt2String has 5 elements
mapInttoString[6] = the quick brown fox jumped over the lazy dog
mapInt2String has 5 elements
mapInttoString[1] =
mapInt2String has 6 elements

Correctly accessing elements in a map

The way to remove an element is to use the erase function, and the way to check if an element exists is to use find prior to accessing it.

cout << "\r\nErase rogue element...\r\n";

mapInt2String.erase(mapInt2String.find(1));

cout << "\r\nMap Int2String has " << mapInt2String.size() 
     << " elements\r\n";

if (mapInt2String.find(4) != mapInt2String.end())
    cout << "mapInt2String[4] = " << mapInt2String[4];

cout << "\r\nMap Int2String has " << mapInt2String.size() 
     << " elements\r\n\r\n\r\n\r\n";

This outputs the following:

Erase rogue element....
mapInt2String has 5 elements
mapInt2String has 5 elements

In other words, our second attempt to access a non-existent element did not result in an unwanted insertion.

Map Iterators

Being an STL container, a map uses iterators to control access to it's elements. However, a map stores two values in each location, so what does the iterator give us to deal with this ? An iterator from a map returns a pair<key, value>, and the two values involved can be accessed using pair.first and pair.second respectively. The following code illustrates this

std::map<int string,>::iterator itMap = mapInt2String.begin();

while (itMap != mapInt2String.end())
{
    cout << "First element = " << itMap->first << endl;
    cout << "Second element = " << itMap->second << endl;
    ++itMap;
}

This will output the following

First element = 0
Second element = string number 0
First element = 2
Second element = string number 2
First element = 3
Second element = string number 3
First element = 5
Second element = string number 5
First element = 6
Second element = the quick brown fox jumped over the lazy dog

The first and second elements provide read and write access, so that an element can be changed as follows:

itMap = mapInt2String.find(0);

if (itMap != mapInt2String.end())
    itMap->second = "Would you, could you in a car ? "
                         "Eat them, eat them, here they are !!";

Printing out the list again would reveal that this element has now been changed. Accessing first allows changing of the key for an element in the same way.

Having established this, a better way to access an element would be to catch the iterator returned by our find test, and call second on it, as using the [] operators after establishing the existence of an element requires two lookups, whereas storing the iterator means only one is required.

Using multiset/multimap

The set and map both require a unique key, that is to say, inserting the same value into a set twice results in only one insertion, the same is true of two elements with the same key into a map. Where more than one element of the same value is required, we must use multiset/multimap. These containers work the same as thier single element cousins, but have some extra functions to assist us in using them

Function	Returns
count	number of elements with a given value
lower_bound	first element that contains a value
upper_bound	last element that contains a value
equal_range	pair of iterators that define the range that contains a given value.

I hope you've found this article useful, and that it inspires you to make use of the STL. Up until now I have spoken mainly about the different containers the STL provides ( although some remain uncovered as of yet ), and how they can be specialised. However, the real power of the STL lies in the way it interfaces with the algorithms the standard library provides. A selection of these algorithms have appeared in the series to date, but the next article will seek to list a good number of them and explore the ways in which they can be used.

See you then.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here