An Efficient String to Unsigned int ID Struct






4.58/5 (5 votes)
C++ optimization for map using string key among others
Table of Contents
Introduction
The base class of the kigs framework (CoreModifiable
) provides access to some member variables, called attributes, using a string name:
// set "IntValue" parameter to 16
instance->setValue("IntValue", 16);
// retrieve value using templated "getValue" method
int v = instance->getValue<int>("IntValue");
// or with reference parameter
int v1;
if(!instance->getValue("IntValue",v1))
{
// instance does not own "IntValue" parameter
}
The instance factory also uses string names for classes:
myApplicationTimer = KigsCore::GetInstanceOf("ApplicationTimer", "Timer");
And we have a lot of other cases where strings are used to access values stored in maps (ordered or unordered).
But for performance reasons, it's not a good choice to use a std::string as a map key.
The KigsID Struct
We looked for a way to convert string
(const char*
or std::string
) to unsigned int ID
in an efficient way. The string
s we use are generally one or two alphanumeric words, and the map contains at most a few dozen values. We wanted to avoid collisions (having the same ID for two different string
s) for this kind of usage.
Here is the KigsID struct
code:
struct KigsID
{
KigsID() = default;
KigsID(const KigsID& other) = default;
KigsID(KigsID&& other) = default;
KigsID& operator=(const KigsID& other) = default;
KigsID& operator=(KigsID&& other) = default;
bool operator==(const KigsID& other) const
{
return (_id == other._id);
}
bool operator<(const KigsID& other) const
{
return _id < other._id;
}
unsigned int toUInt() const { return _id; }
#ifdef KEEP_NAME_AS_STRING
template<size_t _Size> KigsID(const char(&aid)[_Size]) : _id_name(aid),
_id(CharToID::GetID(aid)) {};
KigsID(const kstl::string& aid) : _id_name(aid), _id(CharToID::GetID(aid)) {};
KigsID(const kstl::string_view& aid) : _id_name(aid), _id(CharToID::GetID(aid)) {};
KigsID(unsigned int aid) : _id_name("*unknown*"), _id(aid) {};
KigsID& operator=(const kstl::string& aid)
{ _id_name = aid; _id = CharToID::GetID(aid); return *this; };
template<size_t _Size> KigsID& operator=(const char(&aid)[_Size])
{ _id_name = aid; _id = CharToID::GetID(aid); return *this; };
KigsID& operator=(unsigned int aid)
{ _id_name = "*unknown*"; _id = aid; return *this; };
// Extra name
// Don't set this field manually!
kstl::string _id_name;
#else
template<size_t _Size>
KigsID(const char(&aid)[_Size]) : _id(CharToID::GetID<_Size>(aid)) {};
KigsID(const kstl::string& aid) : _id(CharToID::GetID(aid)) {};
KigsID(const kstl::string_view& aid) : _id(CharToID::GetID(aid)) {};
//KigsID(const char*& aid) : _id(CharToID::GetID(aid)) {};
KigsID(unsigned int aid) : _id(aid) {};
template<size_t _Size>
KigsID& operator=(const char(&aid)[_Size])
{ _id = CharToID::GetID<_Size>(aid); return *this; };
KigsID& operator=(const kstl::string& aid)
{ _id = CharToID::GetID(aid); return *this; };
KigsID& operator=(const kstl::string_view& aid)
{ _id = CharToID::GetID(aid); return *this; };
//KigsID& operator=(const char*& aid) { _id = CharToID::GetID(aid); return *this; };
KigsID& operator=(unsigned int aid) { _id = aid; return *this; };
#endif
// Don't set this field manually!
unsigned int _id;
};
The define KEEP_NAME_AS_STRING
is set when building in debug mode, so that it's easy when debugging to see the original string
used to initialize the ID.
Here is a sample code to illustrate this:
void Sample::KigsIDTest(const KigsID& id)
{
printf("KigsID unsigned int value = %08x\n", id._id);
printf("sizeof(KigsID) = %d\n", sizeof(KigsID));
}
The KigsIDTest
method is called like this:
KigsIDTest("HelloWorld");
And here is the id
variable inspection in debug mode in Visual Studio C++ debugger:
If KEEP_NAME_AS_STRING
is not defined, only unsigned int _id
member is available.
Points of Interest
So the interesting part now is that in release mode, with optimizations activated, a call to KigsIDTest("HelloWorld");
outputs this:
So sizeof(KigsID)
is 4 bytes (32 bits unsigned int
).
Another interesting thing is that here, the "HelloWorld
" string
conversion to int is evaluated at compile time as we can see if we look at the disassembly code:
It is the case for const char*
strings up to 19 characters with Visual C++ compiler (x86 build).
For longer string
s or std::string
, the ID is evaluated at run time using a few logical operations (XOR
, ROL
) on 4 bytes packets.
Benchmark
To test the performances of the structure, we have set up a little benchmark code. For the use we make of it, the critical part is the access to the map values, so the benchmark focuses on that.
Benchmark Code
template<typename maptype, typename vectorkeys>
double TestKigsIDPerfs::testUMap()
{
// get start time
double startTime = mApplicationTimer->GetTime();
// create empty unordered_map
maptype testMap;
// create a vector containing all the keys we are going to test
std::vector<vectorkeys> testVector;
// some "similar keys"
testVector.push_back("testString1");
testVector.push_back("testString2");
testVector.push_back("testString3");
testVector.push_back("testString4");
testVector.push_back("testString5");
// some random keys
testVector.push_back("ajzhgjhsqfd");
testVector.push_back("xcwvksqjdfh");
testVector.push_back("nzsbdocdsdf");
testVector.push_back("zieulkjvsqd");
testVector.push_back("poizerbhncs");
// some incremental length keys
testVector.push_back("az");
testVector.push_back("aze");
testVector.push_back("azer");
testVector.push_back("azert");
testVector.push_back("azerty");
testVector.push_back("azertyu");
testVector.push_back("azertyui");
testVector.push_back("azertyuio");
testVector.push_back("azertyuiop");
testVector.push_back("azertyuiopq");
// fill map, associating a unsigned int to each key
for (unsigned loop = 0; loop < testVector.size(); loop++)
{
testMap[testVector[loop]] = (loop + 1) * (loop + 1);
}
// test a lot of map access
int vectorSize = testVector.size();
unsigned int voidComputation = 0;
for (unsigned int accessloop = 0; accessloop < (1000 * 1000 * 100); accessloop++)
{
// some computation, just to be sure the compiler really access the map
voidComputation = voidComputation << 2;
voidComputation ^= testMap[testVector[accessloop % vectorSize]];
}
// get end time
double endTime = mApplicationTimer->GetTime();
// print computation result, should be the same for all the different tests
printf("voidComputation = %d\n", voidComputation);
// return benchmark time
return endTime - startTime;
}
We tested two kinds of unordered maps: std::unordered_map
and robin_hood::unordered_map
(https://github.com/martinus/robin-hood-hashing)
We also tested with the default hash method, and with the KigsIDHash
method, just returning the ID itself:
struct KigsIDHash
{
std::size_t operator()(const KigsID& k) const
{
return k._id;
}
};
Here are all the different benchmark calls:
// unordered map
double ti=testUMap<std::unordered_map<std::string,unsigned int>, std::string >();
printf("Unordered map with string map keys and string access: %f s\n", ti);
ti = testUMap<std::unordered_map<KigsID, unsigned int>, std::string >();
printf("Unordered map with KigsID map keys and string access: %f s\n", ti);
ti = testUMap<std::unordered_map<KigsID, unsigned int>, KigsID>();
printf("Unordered map with KigsID map keys and KigsID access: %f s\n", ti);
ti = testUMap<std::unordered_map<KigsID, unsigned int, KigsIDHash>, KigsID>();
printf("Unordered map with KigsID map keys, KigsID access and KigsIDHash: %f s\n", ti);
// robin hood unordered map
ti = testUMap<robin_hood::unordered_map<std::string,unsigned int>, std::string>();
printf("Robin hood unordered map with string map keys and string access: %f s\n", ti);
ti = testUMap<robin_hood::unordered_map<KigsID, unsigned int>, std::string>();
printf("Robin hood unordered map with KigsID map keys and string access: %f s\n", ti);
ti = testUMap<robin_hood::unordered_map<KigsID, unsigned int>, KigsID>();
printf("Robin hood unordered map with KigsID map keys and KigsID access: %f s\n", ti);
ti = testUMap<robin_hood::unordered_map<KigsID, unsigned int, KigsIDHash>, KigsID>();
printf("Robin hood unordered map with KigsID map keys,
KigsID access and KigsIDHash: %f s\n", ti);
Building the Benchmark
To easily build and test the code (Git, Microsoft Visual Studio and CMake needed, see Kigs framework Getting Started or watch this video tutorial), clone https://github.com/Kigs-framework/kigs. Then download and extract TestKigsIDPerfs.zip in the kigs/projects directory.
Open kigs/projects/CMakeLists.txt with your favorite text editor, and add the following line:
add_subdirectory(TestKigsIDPerfs)
Save and close CMakeLists.txt.
The solution generation script : kigs/scripts/generateWinCMake.bat works with Visual Studio 2022 now. If you work with Visual Studio 2019, please edit generateWinCMake.bat and change "Visual Studio 17 2022" by "Visual Studio 16 2019" before launching the script.
Go to kigs/scripts directory and launch generateWinCMake.bat script (double click on it).
Then go to Build/solutionWinCMake/kigs/projects/TestKigsIDPerfs directory and double click TestKigsIDPerfs.sln to open the solution in Visual Studio.
In Visual Studio, select TestKigsIDPerfs
as your "startup project" and StaticRelease
as your build configuration. Generate and launch.
Benchmark Results
The benchmark was built with full optimizations and launched on:
- a Windows 10 PC (Core i7-4790K 4.00 Ghz), build with Visual Studio 2019 compiler.
- Firefox (HTML5) on the same Windows 10 PC, build with Emscripten under WSL.
- an Android Samsung Galaxy Tab S2 (armebiv7a), build with CLANG 5.0 compiler under Visual Studio 2019.
- Hololens 2 (Arm64), build under Visual Studio 2019.
All times are in seconds.
Win10 | HTML5 | Android | Hololens 2 | |||||
std::unordered_map, string key, string access, std::hash | 2.765395 | 100% | 5.633000 | 100% | 25.914861 | 100% | 8.705835 | 100% |
---|---|---|---|---|---|---|---|---|
std::unordered_map, KigsID key, string access, std::hash | 1.785608 | 155% | 2.444000 | 230% | 19.587981 | 132% | 5.171553 | 168% |
std::unordered_map, KigsID key, KigsID access, std::hash | 0.722537 | 383% | 1.947000 | 289% | 17.237732 | 150% | 2.843732 | 306% |
std::unordered_map, KigsID key, KigsID access, KigsIDHash | 1.002372 | 276% | 1.916000 | 294% | 17.229091 | 150% | 2.866783 | 304% |
robin_hood::unordered_map, string key, string access, robin_hood::hash | 3.930436 | 70% | 4.103000 | 137% | 15.034525 | 172% | 7.228063 | 120% |
robin_hood::unordered_map, KigsID key, string access, robin_hood::hash | 2.069901 | 134% | 2.766000 | 204% | 10.752449 | 241% | 4.625862 | 188% |
robin_hood::unordered_map, KigsID key, KigsID access, robin_hood::hash | 1.138290 | 243% | 1.638000 | 344% | 8.765440 | 296% | 2.956348 | 294% |
robin_hood::unordered_map, KigsID key, KigsID access, KigsIDHash | 1.142882 | 242% | 1.647000 | 342% | 8.845147 | 293% | 2.946729 | 295% |
Conclusion
Using the KigsID
structure as the access key for unordered maps is always a good choice for our usage in the Kigs framework. On the contrary, using KigsIDHash
seems to be a bad idea.
Choosing between std::unordered_map and robin_hood::unordered_map depends a lot on the compiler and targeted platform.
If you want to have a closer look at the KigsID
code, you can find all the Kigs framework code here:
The KigsID
structure is defined here:
You can also start reading the introduction to the Kigs framework article series on CodeProject:
And follow us on Twitter or YouTube !
History
- 27th January, 2020: Initial version
- 19th February, 2020: Added Table of Contents, Benchmark and source code download
- 21th June, 2022: change links to the Kigs-framework GitHub repo, fixed the code and added link to video tutorial