
Thread Local Storage - The C++ Way

By Roland Schwarz, 27 Aug 2004


Global data, while usually considered poor design, is nevertheless often a useful means of preserving state between related function calls. With threads, the issue is unfortunately complicated by the need for access synchronisation, so that no two threads modify the data at the same time.

There are times when you will want to have a globally visible object, while still having the data content accessible only to the calling thread, without holding off other threads that contend for the "same" global object. This is where thread local storage (TLS) comes in. TLS is something the operating system / threading subsystem provides, and by its very nature is rather low level.
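For comparison, standard C++11 (which postdates this article) has a thread_local keyword that gives this "one name, one copy per thread" behaviour directly. The following is only a minimal sketch of the idea in portable C++. The names counter, bump_n_times and run_in_thread are made up for illustration and are not part of the library presented here.

```cpp
#include <thread>

// One globally visible name, but every thread gets its own copy.
thread_local int counter = 0;

// Increment the calling thread's copy n times and return its value.
int bump_n_times(int n)
{
    for (int i = 0; i < n; ++i)
        ++counter;
    return counter;
}

// Run bump_n_times(n) on a fresh thread and hand back what that thread saw.
int run_in_thread(int n)
{
    int result = -1;
    std::thread t([&result, n] { result = bump_n_times(n); });
    t.join();
    return result;
}
```

Calling bump_n_times(3) on the main thread and then run_in_thread(5) leaves the main thread's counter untouched by the second call, and no lock is needed because the threads never share the data.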

From a globally visible object (in C++) you expect that its constructor is called before you enter "main", and that it is disposed of properly after you exit from "main". Consequently one would expect a thread local "global" object to be constructed when a thread starts up, and destroyed when the thread exits. But this is not the case! Using the native API, one can only have TLS that needs neither code to construct nor code to destruct.

While at first glance this is somewhat disappointing, there are reasons not to instantiate all these objects automatically on every thread creation. A clean solution to this problem is presented e.g. in the "boost" library. The standard "pthread" C library also addresses this problem properly. But when you need to use the native Windows threading API, or need to write a library that, while making use of TLS, has no control over the threading API the client code is using, you are apparently lost.
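As an aside, here is a rough sketch of how the pthread library handles the destruction side: a destructor function is attached to the key and is called at thread exit for every thread that stored a non-null value. The type Payload and the helpers destroy_payload, worker and run_worker_once are hypothetical names chosen for this illustration only.

```cpp
#include <pthread.h>

// Hypothetical payload; its destructor bumps a counter so cleanup is observable.
struct Payload {
    static int destroyed;
    ~Payload() { ++destroyed; }
};
int Payload::destroyed = 0;

static pthread_key_t key;

// Called by the pthread library at thread exit if the thread's slot is non-null.
extern "C" void destroy_payload(void* p)
{
    delete static_cast<Payload*>(p);
}

extern "C" void* worker(void*)
{
    // Create this thread's object; nobody deletes it explicitly.
    pthread_setspecific(key, new Payload);
    return 0;
}

// Create the key once, run one worker thread to completion, and report
// how many Payload objects have been destroyed so far (-1 on error).
int run_worker_once()
{
    static bool key_created = (pthread_key_create(&key, destroy_payload) == 0);
    if (!key_created)
        return -1;
    pthread_t t;
    if (pthread_create(&t, 0, worker, 0) != 0)
        return -1;
    pthread_join(t, 0);
    return Payload::destroyed;
}
```

Note that construction is still done lazily by the thread itself. Only the destruction is automatic, which is the same division of labour the tls_ptr wrapper in this article ends up with.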

Fortunately you are not lost at all, and this is the topic of this article. The Windows Portable Executable (PE) format provides support for TLS callbacks. Although the documentation is hard to read, it can be done with current compilers, i.e. MSVC 6.0, 7.1, ... Since seemingly no one else was using this feature before, and not even the C runtime library (CRT) makes use of it, you should be a little careful and watch out for undesired behaviour. That the CRT does not use the feature does not mean that it does not implement it. Unfortunately there is a small bug in the MSVC 6.0 implementation, which my code also works around.

If the concepts presented in this article prove to be workable in "real life", I would be glad if this article has helped to remove some dust from the topic and to make it usable for a broader range of applications. I could, for example, think of a generalized atexit_thread function that makes use of the concepts presented here.
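To make the atexit_thread idea a little more concrete, here is one possible shape of such a function. This is a sketch only: it relies on C++11 thread_local instead of the TLS callback described in this article, and the names thread_exit_list and demo_order are invented for the example.

```cpp
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical per-thread exit list; its destructor runs when the owning thread ends.
struct thread_exit_list {
    std::vector<std::function<void()>> handlers;
    ~thread_exit_list()
    {
        // Like atexit(): run the handlers in reverse order of registration.
        for (auto it = handlers.rbegin(); it != handlers.rend(); ++it)
            (*it)();
    }
};

// Register f to be called when the *calling* thread exits.
void atexit_thread(std::function<void()> f)
{
    thread_local thread_exit_list list;
    list.handlers.push_back(std::move(f));
}

// Demo helper: registers two handlers on a new thread and returns
// the order in which they ran.
std::vector<int> demo_order()
{
    std::vector<int> order;
    std::thread t([&order] {
        atexit_thread([&order] { order.push_back(1); });
        atexit_thread([&order] { order.push_back(2); });
    });
    t.join();
    return order;
}
```

With a TLS callback the same list could be walked on DLL_THREAD_DETACH, which would also cover threads that were not created by C++ code.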

Before going to explain the gory details, I want to mention Aaron W. LaFramboise who made me aware of the existence of the TLS-Callback mechanism.

Using the code

If you are using the precompiled binaries, you simply need to copy the *.lib files to a directory where your compiler looks for libraries. Likewise, copy the files from the include directory to a directory where your compiler searches for includes. Alternatively you may simply copy the files to your project directory.

The following is a simple demonstration of usage, to get you started.

#include <process.h>
// first include the header file
#include <tls.h>

// this is your class
struct A {
    A() : n(42) {
    }
    ~A() {
    }
    int the_answer_is() {
        int m = n;
        n = 0;
        return m;
    }
    int n;
};

// now define a tls wrapper of class A
tls_ptr<A> pA;

// this is the threaded procedure
void run(void*)
{
    // instantiate a new "A"
    pA.reset(new A);

    // access the tls-object
    int ans = pA->the_answer_is();
    (void)ans; // the value is not used any further in this demo

    // note, that we do not need to deallocate
    // the object. This is getting done automagically
    // when the thread exits.
}

int main(int argc, char* argv[])
{
    // the main thread also gets a local copy of the tls.
    pA.reset(new A);

    // start the thread
    _beginthread(&run, 0, 0);

    // call into the main threads version
    pA->the_answer_is();

    // the "run" thread should have ended when we
    // are exiting.
    // again we do not need to free our tls object.
    // this is comparable in behaviour to objects
    // at global scope.
    return 0;
}

At first glance it might seem more natural for the tls-objects not to be wrapped as pointers, but it is not. Although the objects are globally visible, they are still "delegates" that forward to a thread local copy. The natural way to express delegation in C++ is a pointer object. (The technical reason, of course, is that you cannot overload the "." operator, whereas "->" can be overloaded.)
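The delegation idea can be illustrated with a small stand-in class. This is a sketch under loud assumptions: sketch_tls_ptr is a hypothetical name, it is built on C++11 thread_local and a per-thread map, and it is not how the tls_ptr of this article is implemented. It only shows how an overloaded "->" lets one global object forward to a different instance in each thread.

```cpp
#include <memory>
#include <thread>
#include <unordered_map>

// Hypothetical stand-in for tls_ptr; NOT the implementation from this article.
template <class T>
class sketch_tls_ptr {
public:
    // Replace the calling thread's object (deleting any previous one).
    void reset(T* p = 0) { slot().reset(p); }
    // Forward to the calling thread's object.
    T* operator->() const { return slot().get(); }
    T& operator*() const { return *slot().get(); }
    T* get() const { return slot().get(); }
private:
    // Each thread owns a private map from wrapper address to its own T.
    std::unique_ptr<T>& slot() const
    {
        thread_local std::unordered_map<const void*, std::unique_ptr<T>> slots;
        return slots[this];
    }
};

struct A {
    A() : n(42) {}
    int the_answer_is() { int m = n; n = 0; return m; }
    int n;
};

sketch_tls_ptr<A> pA;   // one global name, one A per thread

// The answer is consumed per thread, so every fresh thread sees its own 42.
int answer_on_new_thread()
{
    int ans = -1;
    std::thread t([&ans] {
        pA.reset(new A);
        ans = pA->the_answer_is();
    });
    t.join();
    return ans;
}
```

The map lookup on every access makes this sketch slow. It is meant to show the interface, not the performance of compiler-supported TLS.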

You can of course use this mechanism when building a "*.exe" file, but you can also use it when building a "*.dll" image. However, when you are planning to load your DLL with LoadLibrary(), you should define the macro TLS_ALLOC when building your DLL. This is not necessary when using your DLL by means of an import library. A similar restriction applies when delay-loading your DLL. Please consult your compiler documentation if you are interested in the reasons for this. (Defining TLS_ALLOC forces the use of the TlsAlloc() family of functions from the Win32 API.)

The complete API is kept very simple:

tls_ptr<A> pA;         // declare an object of class A
pA.reset(new A);       // create a tls of class A when needed
pA.reset(new A(45));   // create a tls of class A with a custom constructor
                       // note, that this also deletes any prior objects
                       // that might have been allocated to pA
pA.release();          // same as pA.reset(0), releases the thread local
                       // object
A& refA = *pA;         // get a temporary reference to the contained object
                       // for faster access
pA->the_answer_is();   // access the object 

Please note again that it is not necessary to explicitly call the destructors of your class (or release()). This is very handy when you are writing a piece of code that has no control over the calling threads, but must still be multithread safe. One caveat however: the destructors of your class are called _after_ the CRT code has ended the thread. Consequently, when you are doing something fancy in your destructors that causes the CRT to reallocate its internal thread local storage pointers, you will be left with a small memory leak in the CRT. This is comparable in effect to the case where you use the native Win32 API functions to create a thread instead of _beginthread().

In principle that is all you need. But wait! I mentioned a small bug in version 6 of the compiler. Luckily it is easy to work around. I provide an include file tlsfix.h which you will need to include in your program. You need to make sure it is included before windows.h. To be more precise: the TLS library must be searched before the default CRT library. So alternatively you may specify the library first on the command line and omit the inclusion of tlsfix.h.


I will not discuss the user interface here. It suffices to say that it is essentially the same as in the boost library. However, I omitted the ability to specify arbitrary deleter functions, since this would have made it necessary to include the boost library in my code. I wanted to keep it small and just demonstrate the principles. My implementation also deviates from boost insofar as it uses native compiler support for TLS variables, thus gaining an almost fourfold speed improvement. Needless to say, my implementation is Windows specific.

When thinking about TLS for C++, the main question is how to run the constructors and destructors. A careful study of the PE format (e.g. in the MSDN library) reveals that it has provided for TLS support almost from the beginning. (Thanks again to Aaron W. LaFramboise, who read it carefully enough.) Of special interest is the section about TLS callbacks:

The program can provide one or more TLS callback functions (though Microsoft 
compilers do not currently use this feature) to support additional 
initialization and termination for TLS data objects. A typical reason to use 
such a callback function would be to call constructors and destructors for 

Well, it is true that the compilers do not use the feature, but nothing prevents user code from using it. One somehow must convince the compiler (to be honest, it is the linker) to place your callback in such a manner that the operating system will call it. It turns out that this is surprisingly simple (omitting the details for a moment).

// declare your callback
void NTAPI on_tls_callback(PVOID h, DWORD dwReason, PVOID pv)
{
    if( DLL_THREAD_DETACH == dwReason )
    {
        // the thread is exiting: run the destructors
        // of its thread local objects here
    }
}

// put a pointer in a special segment
#pragma data_seg(".CRT$XLB")
PIMAGE_TLS_CALLBACK p_thread_callback = on_tls_callback;
#pragma data_seg()

You can even add more callbacks by appending pointers to the ".CRT$XLB" segment. The fancy definitions are available from the windows.h and winnt.h include files.

Now for the details: you will find at times that your callbacks are not getting called. The reason is that the linker does not always wire up your segments correctly. It turns out that this happens exactly when you are not using any __declspec(thread) in your code. A further study of the PE format description reveals:

The Microsoft run-time library facilitates this process by defining a memory image of the TLS Directory and giving it the special name “__tls_used” (Intel x86 platforms) or “_tls_used” (other platforms). The linker looks for this memory image and uses the data there to create the TLS Directory. Other compilers that support TLS and work with the Microsoft linker must use this same technique.

Consequently, when the linker does not find the _tls_used symbol, it won't wire in your callbacks. Luckily this is easy to circumvent:

#pragma comment(linker, "/INCLUDE:__tls_used")

This will pull in the code from the CRT that manages TLS. When using a version 7 compiler, that is all you need. (Actually I tried this with 7.1.) It turns out, however, that this does not work with a version 6 compiler. The operating system cannot be the culprit, since code compiled by version 7 does work properly. After a little guesswork you will find that the CRT code from version 6 is slightly broken, because it inserts a wrong offset to the callback table. It is then easy to replace the erroneous code and convince the linker to wire in the workaround before the broken version from the CRT. You can study the tlsfix.c file from my submission if you are interested in the details.
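The /INCLUDE directive shown earlier names the x86 form of the symbol. Going by the PE description quoted above, other platforms use the name with a single underscore, so a build that targets more than x86 would presumably need to select the name per target. The following fragment is an untested sketch of that selection. It assumes the MSVC predefined macro _M_IX86 as the way to detect an x86 build.

```cpp
// Sketch only: pick the name of the TLS directory symbol per target,
// following the PE description ("__tls_used" on x86, "_tls_used" elsewhere).
#if defined(_M_IX86)
#pragma comment(linker, "/INCLUDE:__tls_used")
#else
#pragma comment(linker, "/INCLUDE:_tls_used")
#endif
```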

Points of Interest

Which is the first function of your program that is called by the operating system? Of course it is not main(). That was easy. Then mainCRTStartup, specified as the entry point in the linker, comes to mind. Wrong again. Interestingly, the first function being called is the TLS callback with Reason == DLL_PROCESS_ATTACH. But wait. Don't rely on this. It is not true on WinXP. I observed this on Win2000 only.

I have not yet tried the code on Win95/98, WinXP Home Edition and Win2003. I would be interested in feedback about using this code on these platforms. In principle it should work, because it is a feature of PE and not of the operating system, but ...


08.28.2004 Uploaded documentation, source and sample code.


This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


About the Author

Roland Schwarz

Austria

Comments and Discussions

Re: Thanks for the code! (Roland Schwarz, 1-Dec-04 8:56)
>I included the class in a base dll of a larger project, which in turn is used by many plugins implemented as shared libraries.
Possibly this is also the reason that ThreadTerm() was never called ?!
This is strange. Can you please give more detail?
Which compiler are you using?
Did you link to the libraries? (On msvc6: did you include tlsfix.h?)
Is your base dll using the C-runtime? (mandatory!)
Where did you declare TLS variables?
Are you using COM? In this case use the TLS_ALLOC since COM uses LoadLibrary internally.
(Although this should not affect callbacks.)
>1) I added a static member function OnDllMain(DWORD dwReason) to the class which is called from DllMain instead of the callback:
This is ok, but shouldn't be necessary.
>2) If the tls_ptr is a global variable in another module, it can be destroyed before the ThreadTerm function is called. I therefore added a destructor to the basic_tls class that calls ThreadTerm().
There might be some issues, if destructors of global objects use the classes. I am also not totally sure whether this is the final call to ThreadTerm relevant to the basic_tls instance under all circumstances.
Perhaps this is not obvious from the code, but there is no problem when the destructor is called
too early, since it is a no-op. One of the tricks is to use a static global structure to call
a list of deletes. The global object is only needed to 1) establish a common access point and
2) let the constructor initialize this static list.
Adding a destructor that calls basic_tls::thread_term might result in releasing memory
too early since global destructor calls are only specified with respect to their respective
constructor calls (LIFO). However the latter are in an unspecified order. So it might happen
that you try to reference a TLS from a global destructor of your own while the memory has
already been released. If however you are not using global ctor/dtors, you won't notice the effect.

Last Updated 28 Aug 2004
Article Copyright 2004 by Roland Schwarz