Sharing, and the locking that comes with it, can be avoided by giving each thread its own copy of an object. This article shows how to avoid unnecessary object allocation and sharing by using C++11 thread_local.
This is a simple regex_match example. Note: the std:: namespace prefix has been removed for easier reading. regex is an expensive object to construct, and local_match() constructs a new one on every call.
string local_match(const string& text)
{
    string price;
    smatch what;
    // A new regex is compiled on every call: this is the expensive part.
    const regex re(REG_EXP);
    if (regex_match(text, what, re))
    {
        price = what[1].str();  // first capture group: the price
    }
    return price;
}
What if we construct the regex only once by making it a static variable? How much performance do we gain?
string static_match(const string& text)
{
    string price;
    smatch what;
    // Constructed once, on the first call; later calls reuse it.
    static const regex re(REG_EXP);
    if (regex_match(text, what, re))
    {
        price = what[1].str();  // first capture group: the price
    }
    return price;
}
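The difference between the two functions is how often the expensive object is built. A minimal sketch that counts constructions instead of timing them; `Expensive`, `use_local`, `use_static` and `constructions_for` are hypothetical names introduced only for illustration, with `Expensive` standing in for `regex`.

```cpp
#include <cassert>

// Hypothetical stand-in for an expensive object such as regex:
// it only counts how many times it has been constructed.
struct Expensive
{
    static int constructions;
    Expensive() { ++constructions; }
};
int Expensive::constructions = 0;

// Like local_match(): a new object is built on every call.
void use_local()
{
    const Expensive obj;
    (void)obj;
}

// Like static_match(): the object is built on the first call only
// (C++11 makes this one-time initialization thread-safe).
void use_static()
{
    static const Expensive obj;
    (void)obj;
}

// Returns how many constructions `calls` invocations of f cause.
int constructions_for(void (*f)(), int calls)
{
    int before = Expensive::constructions;
    for (int i = 0; i < calls; ++i)
        f();
    return Expensive::constructions - before;
}
```

Calling `use_local` five times builds five objects, while `use_static` builds one on its first call and none afterwards.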
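We benchmark by calling each function 2 million times (1 million loop iterations over 2 input strings). Making the regex static cuts the run time by about 48%, almost 2x faster.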
local regex object:20051ms
static regex object:10304ms
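The percentage follows from the two timings above, expressed both as the reduction in run time and as a speedup factor:

```latex
\[
\text{reduction} = \frac{20051 - 10304}{20051} = \frac{9747}{20051} \approx 0.486 \;(48.6\%),
\qquad
\text{speedup} = \frac{20051}{10304} \approx 1.95
\]
```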
This is the benchmarking code.
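#include <chrono>
#include <iomanip>
#include <iostream>
#include <regex>
#include <string>
#include <vector>
using namespace std;  // the article drops the std:: prefix for readability

const int LOOP = 1000000;  // 1,000,000 iterations x 2 strings = 2 million calls
const int THREADS = 4;     // thread count for the multithreaded benchmark

// Prints the elapsed wall-clock time between start_timing() and stop_timing().
class timer
{
public:
    timer() = default;
    void start_timing(const string& text_)
    {
        text = text_;
        begin = chrono::steady_clock::now();
    }
    void stop_timing()
    {
        auto end = chrono::steady_clock::now();
        auto dur = end - begin;
        auto ms = chrono::duration_cast<chrono::milliseconds>(dur).count();
        cout << setw(35) << text << ":" << setw(5) << ms << "ms" << endl;
    }
private:
    string text;
    // begin and end must come from the same clock: high_resolution_clock is
    // not guaranteed to be steady_clock (in libstdc++ it is system_clock).
    chrono::steady_clock::time_point begin;
};

// Hands the result to something the optimizer cannot see through,
// so the timed calls are not removed as dead code.
#ifdef _MSC_VER
#pragma optimize("", off)
template <class T>
void do_not_optimize_away(T* datum) {
    datum = datum;
}
#pragma optimize("", on)
#else
// const void* so that the const char* returned by c_str() can be passed.
static void do_not_optimize_away(const void* p) {
    asm volatile("" : : "g"(p) : "memory");
}
#endif

const string REG_EXP = ".*PRICE:.*US\\$(\\d+\\.\\d+|[-+]*\\d+).*PER SHARE";

// local_match() and static_match() from above go here, after REG_EXP.

int main()
{
    string str1 = "Zoomer PRICE: US$1.23 PER SHARE";
    string str2 = "Boomer PRICE: US$4.56 PER SHARE";
    vector<string> vec;
    vec.push_back(str1);
    vec.push_back(str2);
    timer stopwatch;

    stopwatch.start_timing("local regex object");
    for (int j = 0; j < LOOP; ++j)
    {
        for (size_t i = 0; i < vec.size(); ++i)
        {
            do_not_optimize_away(local_match(vec[i]).c_str());
        }
    }
    stopwatch.stop_timing();

    stopwatch.start_timing("static regex object");
    for (int j = 0; j < LOOP; ++j)
    {
        for (size_t i = 0; i < vec.size(); ++i)
        {
            do_not_optimize_away(static_match(vec[i]).c_str());
        }
    }
    stopwatch.stop_timing();
    return 0;
}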
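But static_match() shares one regex object among all the threads that call it. That is only safe if the regex implementation allows concurrent matching against a single object, and it is exactly the kind of sharing this article sets out to avoid. C++11 thread_local gives each thread its own copy instead.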
string thread_local_match(const string& text)
{
    string price;
    smatch what;
    // One regex per thread, constructed on that thread's first call.
    thread_local const regex re(REG_EXP);
    if (regex_match(text, what, re))
    {
        price = what[1].str();  // first capture group: the price
    }
    return price;
}
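To see that thread_local really means one object per thread, here is a minimal sketch that counts constructions across several threads. `PerThreadObject`, `use_thread_local`, `constructions_with_threads` and `g_constructions` are hypothetical names; `PerThreadObject` stands in for the regex.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Total number of objects constructed, across all threads.
std::atomic<int> g_constructions{0};

// Hypothetical stand-in for the expensive regex object.
struct PerThreadObject
{
    PerThreadObject() { ++g_constructions; }
};

// Like thread_local_match(): each calling thread gets its own object,
// constructed on that thread's first call and reused afterwards.
const PerThreadObject* use_thread_local()
{
    thread_local const PerThreadObject obj;
    return &obj;
}

// Starts `threads` new threads, each calling use_thread_local() `calls`
// times, and returns how many objects were constructed in total.
int constructions_with_threads(int threads, int calls)
{
    int before = g_constructions.load();
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
    {
        pool.emplace_back([calls] {
            for (int i = 0; i < calls; ++i)
                use_thread_local();
        });
    }
    for (auto& th : pool)
        th.join();
    return g_constructions.load() - before;
}
```

With 4 threads each calling the function 10 times, exactly 4 objects are constructed: one per thread, not one per call.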
This is the performance result of the thread_local regex object. It is slightly slower than the static regex object (12266ms vs 10304ms, about 19% more time), but hey, we can use it in multithreaded code!
local regex object:20051ms
static regex object:10304ms
thread_local regex object:12266ms
Let's benchmark the thread_local code under multithreading.
local regex object(4 threads): 6446ms
thread_local regex object(4 threads): 3696ms
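The multithreaded benchmark code is not shown in the article. The sketch below is one way it might look, assuming the loop count is split evenly across the threads; `run_threads` is a hypothetical helper, and this is not the code that produced the numbers above.

```cpp
#include <cassert>
#include <chrono>
#include <regex>
#include <string>
#include <thread>
#include <vector>

const std::string REG_EXP = ".*PRICE:.*US\\$(\\d+\\.\\d+|[-+]*\\d+).*PER SHARE";

// Same idea as thread_local_match(): one regex per thread.
std::string thread_local_match(const std::string& text)
{
    thread_local const std::regex re(REG_EXP);
    std::smatch what;
    return std::regex_match(text, what, re) ? what[1].str() : std::string();
}

// Hypothetical helper. Assumption: `loop` iterations are split evenly across
// `threads` threads. Each thread counts its matches in a local variable and
// writes the total to its own slot of `matches`, so nothing is shared or
// locked while the clock is running. Returns the elapsed milliseconds.
long long run_threads(int threads, int loop, const std::vector<std::string>& inputs,
                      std::vector<long>& matches)
{
    matches.assign(threads, 0);
    auto begin = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
    {
        pool.emplace_back([&, t] {
            long count = 0;
            for (int j = 0; j < loop / threads; ++j)
                for (const auto& s : inputs)
                    if (!thread_local_match(s).empty())
                        ++count;  // using the result keeps the call alive
            matches[t] = count;
        });
    }
    for (auto& th : pool)
        th.join();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - begin).count();
}
```

Under the even-split assumption, `run_threads(4, 1000000, inputs, matches)` would correspond to the article's THREADS and LOOP values.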
Avoid Allocation/Sharing/Locking
- Let each thread have its own copy of the object/data.
- For pre-C++11 code, use a per-thread singleton.
- Extra note: the thread_local example may be impractical, because real-world code usually passes objects that were already created in the main thread to the worker threads. In that case, create one object per thread (either in the main thread or through a factory method) and keep it in a member variable of that thread's worker.
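The extra note's alternative to thread_local can be sketched as follows: the main thread creates one worker per thread, each worker owns its own regex as a member variable, and the threads never touch each other's objects. `PriceWorker` and `run_workers` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical worker: owns a private copy of the regex, created up front
// by the main thread, so no thread_local and no locking are needed.
class PriceWorker
{
public:
    explicit PriceWorker(const std::string& pattern) : re_(pattern) {}

    // Runs on the worker thread; only touches this worker's own members.
    void run(const std::vector<std::string>& inputs)
    {
        std::smatch what;
        for (const auto& text : inputs)
            if (std::regex_match(text, what, re_))
                prices_.push_back(what[1].str());
    }

    const std::vector<std::string>& prices() const { return prices_; }

private:
    std::regex re_;                    // this worker's own copy
    std::vector<std::string> prices_;  // results, read only after join()
};

// Creates one worker per thread in the calling (main) thread, runs each
// worker on its own thread, and returns the workers after all have joined.
std::vector<PriceWorker> run_workers(const std::string& pattern, int threads,
                                     const std::vector<std::string>& inputs)
{
    std::vector<PriceWorker> workers;
    workers.reserve(threads);
    for (int t = 0; t < threads; ++t)
        workers.emplace_back(pattern);

    std::vector<std::thread> pool;
    for (auto& w : workers)
        pool.emplace_back([&w, &inputs] { w.run(inputs); });
    for (auto& th : pool)
        th.join();
    return workers;
}
```

The inputs are only read by the threads, and each regex and result vector belongs to exactly one worker, so the only synchronization needed is the final join().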
The full benchmark results are reposted here.
local regex object:20051ms
static regex object:10304ms
thread_local regex object:12266ms
local regex object(4 threads): 6446ms
thread_local regex object(4 threads): 3696ms