Click here to Skip to main content
15,886,362 members
Articles / Programming Languages / C++

Fast, Low Latency and Non-synchronizing Trace Class

Rate me:
Please Sign up or sign in to vote.
4.75/5 (3 votes)
16 Apr 20044 min read 59.8K   341   30   9
An article about a ready to use trace class specialized for multithreaded programming.

Introduction

Did you ever try to find a wired multithreading or synchronization bug using the normally provided method like TRACE() or OutputDebugString()? Did your debug compile running fine, but your release application got a page fault?

A lot of useful information, which can be generated automatically, is missing in the traditional approaches and you have to provide it on your own. Most of the macros only work in debug mode and are not available in release mode.

But the major point is that, the timing and the flow of the application to be debugged is changed by using the debug tools.

I tried to find a thread synchronization bug in one of my applications and I developed the trace routines provided here as a solution.

Problems

The first problem is the timing, which is changed significantly, for example, by OutputDebugString(). Unfortunately, the time depends on the debugger attached to show the output.

I have done 1.000.000 traces in a loop and measured the time for it:

TRACE-Macro
Debug-Code
No Debugger attached
28 sec.
TRACE-Macro
Debug-Code
VS.NET 2003 attached
1600 sec. (based on 20.000 traces)
TRACE-Macro
Debug-Code
DbgView
210 sec. (based on 100.000 traces)
Tracer
Debug-Code
9 sec.
Tracer
Release-Code
5 sec.

As you can see, Tracer is much faster and available even in release mode code.

The second problem which comes with OutputDebugString() is that, this routine seems to be synchronized itself. By using this routine in two concurrent threads very often, you can shift the timing by the internal synchronization of OutputDebugString() ***.

Basics

The trace classes write their output to normal text files on your drive. By using some special tricks, there is as less synchronization and latency as possible. For each thread tracing some information, the tracer creates one file in the application's directory named like "<exe-name>. trace.<threadid>". Each line in the tracing has the form:

C++
00000128fd9f283f-00000320 : CCameraView::SetOptions() entered

The first part is the hardware timer value generated by QueryPerformanceCounter(). The second value is the thread id, which is generated by the tracing. By using the provided TraceMerge.exe utility, the different files can be merged to one big file, sorted by time.

The tracing is switched on and off using an environment variable. The name can be provided in the startup call.

Internals

The tracing code is implemented as a singleton class. The singleton can be accessed using CTracer::Get()->Trace(). The singleton maintains a map containing a TLSDataObject object for each thread. This class contains a big write buffer containing all the trace strings and the handle to the trace file. A critical section is provided for protecting the buffer.

When doing the first trace for a specific thread, the tracer create a TLSDataObject, stores it in the map and in a thread local storage (TLS) index. By using the TLS, a thread can access its own TLSDataObject without any synchronization to other threads. Only the insertion in the map (so the first trace) has to be synchronized.

Tracer starts a background thread which writes down the buffers of each thread. So for a short time, the tracing has to be stopped. But notice, that only the background thread and one application thread at a time are linked together.

Threads, which have been closed, are removed from the map from time to time to release the write buffer memory.

Using the Code

Include "singleton.h", "tracer.cpp", "tracer.h", "CritSecLocker.cpp" and "CritSecLocker.h" to your project. Include "tracer.h" in all sources you want to trace.

In InitInstance() (or something similar), startup the tracing using:

C++
CTracer::Get()->Startup("MYAPP_TRACE");

In ExitInstance() (or something similar) don't forget to call:

C++
CTracer::Get()->Shutdown();

In your code, use:

C++
CTracer::Get()->Trace("A number %d",123);

to produce a trace line. Note that appending "\n" is not allowed because it breaks the merging tool.

If you want to flush all available buffers, you may use:

C++
CTracer::Get()->Flush();

This writes all buffers of all threads and flushes the Win32 file buffers too. It's useful, for example, before aborting the application in an exception handler.

If you have a member function doing a lot of traces, providing a local reference is useful:

C++
void foo(void)
{
    CTracer &l_tracer = CTracer::Get();

    ...

    l_tracer->Trace("fewfewoifnwefewf");

    ...

    l_tracer->Trace("fewfewoifnwefewf");
}

Using TraceMerger

To create one trace file from all the different files for each thread, a small utility has been provided.

Use it like this:

TraceMerger.exe <Application Exe Path>

TraceMerger will get all the trace files and merge sort them based on time. The results will be saved to a new file named <Application Exe Path>.trace.merge.

License

This package is licensed under The Artistic License.

*** If you know something about the OutputDebugString() synchronization, tell it to me...


Written By
Web Developer
Germany Germany
Being a software developer since 1992, I have seen many different programming languages, operating systems and methodologies.

My primary technical interest are C++ and Windows-Programming, but due to my work as a project manager, I have even seen J2EE.

My everyday work since a couple of years is leading software development teams, mostly in an agile way.


Comments and Discussions

 
GeneralUnderstanding Win32 "OutputDebugString" Pin
Dandy Cheung28-Jul-08 5:13
Dandy Cheung28-Jul-08 5:13 
GeneralOutputDebugString Synchronization Pin
Toby Opferman9-May-04 9:36
Toby Opferman9-May-04 9:36 
GeneralRe: OutputDebugString Synchronization Pin
Toby Opferman9-May-04 9:56
Toby Opferman9-May-04 9:56 
GeneralRe: OutputDebugString Synchronization Pin
Toby Opferman9-May-04 12:36
Toby Opferman9-May-04 12:36 
Ok, so I showed you the difference above when having the kernel debugger setup and when having a local debugger setup. However, I didn't show when just having DbgView setup.

This is how it works when DBGVIEW is setup.

Basically, the same exception call is made as with the local debugger, however after attempting the debugger it fires off an exception.

kd> kb
ChildEBP RetAddr Args to Child
0012f8c4 77f92538 0012f988 0012ff1c 0012f9a4 KERNEL32!_except_handler3+0x3e
0012f8e8 77f8af89 0012f988 0012ff1c 0012f9a4 ntdll!ExecuteHandler+0x26
0012f970 77f9fb9a 0012f988 0012f9a4 0012f988 ntdll!RtlDispatchException+0x76
0012f970 77e89b01 0012f988 0012f9a4 0012f988 ntdll!KiUserExceptionDispatcher+0xe
0012fcc8 77ea0ce8 40010006 00000000 00000002 KERNEL32!RaiseException+0x56
0012ff2c 00401095 00403024 ffffffff 0012f88f KERNEL32!OutputDebugStringA+0x274
WARNING: Stack unwind information not available. Following frames may be wrong.
0012ff4c 004013c3 00000001 00803930 00802990 b+0x1095
0012ffc0 77e87903 ffffffff 0012f88f 7ffdf000 b+0x13c3
0012fff0 00000000 004012e0 00000000 000000c8 KERNEL32!BaseProcessStart+0x3d

As you can see, the exception handler comes out and this is then performed.

kd> u 77ea0b88
KERNEL32!OutputDebugStringA+0x5e:
77ea0b88 8b65e8 mov esp,[ebp-0x18]
77ea0b8b 33db xor ebx,ebx
77ea0b8d 895ddc mov [ebp-0x24],ebx
77ea0b90 895dd8 mov [ebp-0x28],ebx
77ea0b93 899dd0fdffff mov [ebp-0x230],ebx
77ea0b99 899dccfdffff mov [ebp-0x234],ebx
77ea0b9f a14cf0ed77 mov eax,[KERNEL32!gInteractiveLogonUserProcess+0x4 (77edf04c)]
77ea0ba4 3bc3 cmp eax,ebx
kd> u
KERNEL32!OutputDebugStringA+0x7c:
77ea0ba6 0f84a9100000 je KERNEL32!OutputDebugStringA+0x7e (77ea1c55)
77ea0bac 6aff push 0xff
77ea0bae 50 push eax
77ea0baf e8c47bfeff call KERNEL32!WaitForSingleObject (77e88778)
77ea0bb4 6850ffe977 push 0x77e9ff50
77ea0bb9 53 push ebx
77ea0bba 6a02 push 0x2
77ea0bbc e8f3fefeff call KERNEL32!OpenFileMappingA (77e90ab4)

What exactly is performed? Eventually we make it to the above function which, as you can see, opens a file mapping and is protected by a mutex. So, the process simply copies the string into the file mapping and signals an event. dbgview reads this event, copies the data and then signals another event. This process gets that event, wakes up and releases the mutex and leaves.

kd> dc 0x77e9ff50
77e9ff50 49574244 55425f4e 52454646 107d8b00 DBWIN_BUFFER

The file mapping is called "DBWIN_BUFFER"

A global variable holds the mutex. It's first checked if it exists before attempting to perform this operation.

ax=ffffffff ebx=00000000 ecx=00000000 edx=00000000 esi=00000000 edi=77ea1498
eip=77ea0b9f esp=0012fce0 ebp=0012ff2c iopl=0 nv up ei pl zr na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
KERNEL32!OutputDebugStringA+0x75:
001b:77ea0b9f a14cf0ed77 mov eax,[KERNEL32!gInteractiveLogonUserProcess+0x4 (77edf04c)] ds:0023:77edf04c=00000060
kd>
eax=00000060 ebx=00000000 ecx=00000000 edx=00000000 esi=00000000 edi=77ea1498
eip=77ea0ba4 esp=0012fce0 ebp=0012ff2c iopl=0 nv up ei pl zr na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
KERNEL32!OutputDebugStringA+0x7a:
001b:77ea0ba4 3bc3 cmp eax,ebx


kd> dd 77edf04c
77edf04c 00000060 77ee0458 00132d28 0010000e
77edf05c 77e9263c 77e80000 00000000 00000000
77edf06c 00000000 00000000 00000000 00000000
77edf07c 00000000 00000000 00000000 00000000
77edf08c 00000000 00000000 ffffffff 77fcd580
77edf09c ffffffff 00000000 00000000 00000000
77edf0ac 00000000 00000005 00000000 ffffffff
77edf0bc fc009c05 38000000 10000000 00000000
kd> !handle 60 ff
processor number 0
PROCESS fccbf4a0 SessionId: 0 Cid: 039c Peb: 7ffdf000 ParentCid: 00cc
DirBase: 024df000 ObjectTable: fcc9f008 TableSize: 22.
Image: b.exe

Handle Table at e1db9000 with 22 Entries in use
0060: Object: fcd51930 GrantedAccess: 001f0001 (Inherit)
Object: fcd51930 Type: (fcebda40) Mutant
ObjectHeader: fcd51918
HandleCount: 2 PointerCount: 3
Directory Object: fcdc2f10 Name: DBWinMutex


The global mutex is called "DBWinMutex".

Once this mutex is acquired, the file map is opened and mapped. Then, they open another mutex:

eax=00820000 ebx=00000000 ecx=00010101 edx=ffffffff esi=00000000 edi=77ea1498
eip=77eab91a esp=0012fce0 ebp=0012ff2c iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000206
KERNEL32!OutputDebugStringA+0xcd:
001b:77eab91a 686c2aea77 push 0x77ea2a6c
kd>
eax=00820000 ebx=00000000 ecx=00010101 edx=ffffffff esi=00000000 edi=77ea1498
eip=77eab91f esp=0012fcdc ebp=0012ff2c iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000206
KERNEL32!OutputDebugStringA+0xd2:
001b:77eab91f 53 push ebx
kd>
eax=00820000 ebx=00000000 ecx=00010101 edx=ffffffff esi=00000000 edi=77ea1498
eip=77eab920 esp=0012fcd8 ebp=0012ff2c iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000206
KERNEL32!OutputDebugStringA+0xd3:
001b:77eab920 6800001000 push 0x100000
kd>
eax=00820000 ebx=00000000 ecx=00010101 edx=ffffffff esi=00000000 edi=77ea1498
eip=77eab925 esp=0012fcd4 ebp=0012ff2c iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000206
KERNEL32!OutputDebugStringA+0xd8:
001b:77eab925 e8e4ccfdff call KERNEL32!OpenEventA (77e8860e)
kd> dc 0x77ea2a6c
77ea2a6c 49574244 55425f4e 52454646 4145525f DBWIN_BUFFER_REA
77ea2a7c 90005944 72666552 49687365 6946696e DY..

This event is called "DBWIN_BUFFER_READY" to signal when the buffer is ready to be copied. Then it also opens after this another event called "DBWIN_DATA_READY"

So, the whole layout goes like this:

Get DBWinMutex
Open mapview file "DBWIN_BUFFER"
Wait for DBWIN_BUFFER_READY
copy data
signal DBWIN_DATA_READY
close handles
close map view
release DBWinMutex

If the above is not setup and the globals are not setup do do this, the function defaults to the kernel debugger.

So, this is how it occurs.

1. Try Local debugger
2. If local debugger failed, generate exception handler.
3. Exception handler detects if a dbgview type viewer is
listenting.
4. If dbgview type debugger is not listening, go back to kernel with DbgPrint().
5. Attempt to send data to kernel debugger if it is listening.

And, they all do synchronous access but only 1 will get the output. And, it appears that this is the order of precedence, at least in the setup I tried on Windows 2000 Professional.




QuestionVC 6? Pin
WREY17-Apr-04 9:08
WREY17-Apr-04 9:08 
AnswerRe: VC 6? Pin
Thomas Freudenberg18-Apr-04 5:25
Thomas Freudenberg18-Apr-04 5:25 
GeneralRe: VC 6? Pin
WREY18-Apr-04 7:12
WREY18-Apr-04 7:12 
GeneralRe: VC 6? Pin
Tim Hagemann19-Apr-04 1:29
Tim Hagemann19-Apr-04 1:29 
GeneralRe: VC 6? Pin
Toby Opferman9-May-04 8:16
Toby Opferman9-May-04 8:16 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.