Click here to Skip to main content
15,891,136 members
Articles / Programming Languages / C++

Fast, Low Latency and Non-synchronizing Trace Class

Rate me:
Please Sign up or sign in to vote.
4.75/5 (3 votes)
16 Apr 20044 min read 59.8K   341   30   9
An article about a ready to use trace class specialized for multithreaded programming.

Introduction

Did you ever try to find a wired multithreading or synchronization bug using the normally provided method like TRACE() or OutputDebugString()? Did your debug compile running fine, but your release application got a page fault?

A lot of useful information, which can be generated automatically, is missing in the traditional approaches and you have to provide it on your own. Most of the macros only work in debug mode and are not available in release mode.

But the major point is that, the timing and the flow of the application to be debugged is changed by using the debug tools.

I tried to find a thread synchronization bug in one of my applications and I developed the trace routines provided here as a solution.

Problems

The first problem is the timing, which is changed significantly, for example, by OutputDebugString(). Unfortunately, the time depends on the debugger attached to show the output.

I have done 1.000.000 traces in a loop and measured the time for it:

TRACE-Macro
Debug-Code
No Debugger attached
28 sec.
TRACE-Macro
Debug-Code
VS.NET 2003 attached
1600 sec. (based on 20.000 traces)
TRACE-Macro
Debug-Code
DbgView
210 sec. (based on 100.000 traces)
Tracer
Debug-Code
9 sec.
Tracer
Release-Code
5 sec.

As you can see, Tracer is much faster and available even in release mode code.

The second problem which comes with OutputDebugString() is that, this routine seems to be synchronized itself. By using this routine in two concurrent threads very often, you can shift the timing by the internal synchronization of OutputDebugString() ***.

Basics

The trace classes write their output to normal text files on your drive. By using some special tricks, there is as less synchronization and latency as possible. For each thread tracing some information, the tracer creates one file in the application's directory named like "<exe-name>. trace.<threadid>". Each line in the tracing has the form:

C++
00000128fd9f283f-00000320 : CCameraView::SetOptions() entered

The first part is the hardware timer value generated by QueryPerformanceCounter(). The second value is the thread id, which is generated by the tracing. By using the provided TraceMerge.exe utility, the different files can be merged to one big file, sorted by time.

The tracing is switched on and off using an environment variable. The name can be provided in the startup call.

Internals

The tracing code is implemented as a singleton class. The singleton can be accessed using CTracer::Get()->Trace(). The singleton maintains a map containing a TLSDataObject object for each thread. This class contains a big write buffer containing all the trace strings and the handle to the trace file. A critical section is provided for protecting the buffer.

When doing the first trace for a specific thread, the tracer create a TLSDataObject, stores it in the map and in a thread local storage (TLS) index. By using the TLS, a thread can access its own TLSDataObject without any synchronization to other threads. Only the insertion in the map (so the first trace) has to be synchronized.

Tracer starts a background thread which writes down the buffers of each thread. So for a short time, the tracing has to be stopped. But notice, that only the background thread and one application thread at a time are linked together.

Threads, which have been closed, are removed from the map from time to time to release the write buffer memory.

Using the Code

Include "singleton.h", "tracer.cpp", "tracer.h", "CritSecLocker.cpp" and "CritSecLocker.h" to your project. Include "tracer.h" in all sources you want to trace.

In InitInstance() (or something similar), startup the tracing using:

C++
CTracer::Get()->Startup("MYAPP_TRACE");

In ExitInstance() (or something similar) don't forget to call:

C++
CTracer::Get()->Shutdown();

In your code, use:

C++
CTracer::Get()->Trace("A number %d",123);

to produce a trace line. Note that appending "\n" is not allowed because it breaks the merging tool.

If you want to flush all available buffers, you may use:

C++
CTracer::Get()->Flush();

This writes all buffers of all threads and flushes the Win32 file buffers too. It's useful, for example, before aborting the application in an exception handler.

If you have a member function doing a lot of traces, providing a local reference is useful:

C++
void foo(void)
{
    CTracer &l_tracer = CTracer::Get();

    ...

    l_tracer->Trace("fewfewoifnwefewf");

    ...

    l_tracer->Trace("fewfewoifnwefewf");
}

Using TraceMerger

To create one trace file from all the different files for each thread, a small utility has been provided.

Use it like this:

TraceMerger.exe <Application Exe Path>

TraceMerger will get all the trace files and merge sort them based on time. The results will be saved to a new file named <Application Exe Path>.trace.merge.

License

This package is licensed under The Artistic License.

*** If you know something about the OutputDebugString() synchronization, tell it to me...


Written By
Web Developer
Germany Germany
Being a software developer since 1992, I have seen many different programming languages, operating systems and methodologies.

My primary technical interest are C++ and Windows-Programming, but due to my work as a project manager, I have even seen J2EE.

My everyday work since a couple of years is leading software development teams, mostly in an agile way.


Comments and Discussions

 
GeneralUnderstanding Win32 "OutputDebugString" Pin
Dandy Cheung28-Jul-08 5:13
Dandy Cheung28-Jul-08 5:13 
GeneralOutputDebugString Synchronization Pin
Toby Opferman9-May-04 9:36
Toby Opferman9-May-04 9:36 
Calls to "OutputDebugString" will result in a call to "DbgPrint", whis is like "printf" for debugging it supports formatted strings.

The "DbgPrint" call will make it's way into the debugger by calling interrupt 2dh and eventually makes it's way to a trap API "KdpTrap".

This API will call "DkLogDbgPrint" as shown in this stack trace:

kd> kb
ChildEBP RetAddr Args to Child
fb419868 80535628 fb4198f0 6c6c6548 000a0d6f nt!KdLogDbgPrint+0xe9
fb41992c 8043021a fb419d64 00000000 fb419d10 nt!KdpTrap+0x2c1
fb419cf4 804624cb fb419d10 00000000 fb419d64 nt!KiDispatchException+0x160
fb419d5c 80462aa5 f78e7d70 8045265f 00000000 nt!CommonDispatchException+0x4d
fb419d5c 77f982fc f78e7d70 8045265f 00000000 nt!KiTrap03+0x97
0012fa10 77f98293 00000001 0012faa8 00000000 ntdll!DebugService+0x10
0012fa20 77f9825b 0012faa8 0012fd0b 0040302b ntdll!DebugPrint+0xd
0012fcd0 77ea0c5f 77e9ff4c 0012fd04 00130748 ntdll!DbgPrint+0x146
0012ff2c 00401095 00403024 00130748 0012f88f KERNEL32!OutputDebugStringA+0x1e6
WARNING: Stack unwind information not available. Following frames may be wrong.
0012ff4c 004013c3 00000001 00803930 00802990 beans+0x1095
0012ffc0 77e87903 00130748 0012f88f 7ffdf000 beans+0x13c3
0012fff0 00000000 004012e0 00000000 000000c8 KERNEL32!BaseProcessStart+0x3d

This is where the first part of Synchronous access begins. The "KdLogDbgPrint" will raise the IRQL level and acquire a spinlock. So, even if you have multiple processors everyone on the system is accessing this location synchronously.

There is now a global memory location which is the debug output string pointer. The latest debug message is added to this location and the tail pointer to this location is incremented.

(The above debug trace is from Windows 2000 professional).


The following assembly steps show the write pointer being incremented:

8042afa8 55 push ebp
kd>
eax=8054404f ebx=fb4198f0 ecx=80544056 edx=00000007 esi=805440c0 edi=805430c0
eip=8042afa9 esp=fb419854 ebp=00000007 iopl=0 nv up ei ng nz na po cy
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000287
nt!KdLogDbgPrint+0x5f:
8042afa9 ff7304 push dword ptr [ebx+0x4] ds:0023:fb4198f4=fb419874
kd>
eax=8054404f ebx=fb4198f0 ecx=80544056 edx=00000007 esi=805440c0 edi=805430c0
eip=8042afac esp=fb419850 ebp=00000007 iopl=0 nv up ei ng nz na po cy
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000287
nt!KdLogDbgPrint+0x62:
8042afac 50 push eax
kd>
eax=8054404f ebx=fb4198f0 ecx=80544056 edx=00000007 esi=805440c0 edi=805430c0
eip=8042afad esp=fb41984c ebp=00000007 iopl=0 nv up ei ng nz na po cy
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000287
nt!KdLogDbgPrint+0x63:
8042afad e888a21000 call nt!KdpMoveMemory (8053523a)
kd>
eax=00000007 ebx=fb4198f0 ecx=c03ed064 edx=00000000 esi=805440c0 edi=805430c0
eip=8042afb2 esp=fb419858 ebp=00000007 iopl=0 nv up ei pl nz na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000202
nt!KdLogDbgPrint+0x68:
8042afb2 0105c0405480 add [nt!KdPrintWritePointer (805440c0)],eax ds:0023:805440c0=8054404f

The function will then release the spinlock and lower the IRQL level before returning. The processing after this will vary, however for a kernel debugger the next step is to basically send this data to the debugger.

The first step in doing this is to call "KeEnterDebugger" which does some things such read the performance counters and call "KeFreezeExecution" to disable interrupts. The next step is for this function to return and "KdpPrintString". This is the function that actually sends the data to the kernel debugger. If you unassemble it you will notice a port send function.

Stepping over this function in the kernel debugger you will see that the debug output string was now printed in the kernel debugger. "KdExitDebugger" is now called, these functions are probably just used to synchronize access to the kernel debugger.

So, this is why OutputDebugString is synchronized with all threads on the system when you are using a kernel debugger. How about when you're running a local debugger?

If you step through OutputDebugString using the regular debugger you will notice that it calls "NtRaiseException"

The first thing the kernel will do is call "DbgkForwardException" which will call "DbgkpSendApiMessage".

ChildEBP RetAddr Args to Child
fba9856c 804948b4 fba9858c e1b40ec0 00000001 nt!DbgkpSendApiMessage+0x14
fba98604 804300d6 fba98a00 00000001 00000000 nt!DbgkForwardException+0x78
fba989c0 80431ca2 fba98a00 00000000 fba98d64 nt!KiDispatchException+0x172
fba98d34 804652c9 fba98a00 fba98a50 00000000 nt!KiRaiseException+0x16c
fba98d50 80461691 0012fc78 0012f990 00000001 nt!NtRaiseException+0x31
fba98d50 77e89b01 0012fc78 0012f990 00000001 nt!KiSystemService+0xc4
0012fcc8 77ea0ce8 40010006 00000000 00000002 KERNEL32!RaiseException+0x56
0012ff2c 00401095 00403024 0000020a 0000021a KERNEL32!OutputDebugStringA+0x274
WARNING: Stack unwind information not available. Following frames may be wrong.
0012ff4c 004013c3 00000001 00802468 008029d8 b+0x1095
0012ffc0 77e87903 0000020a 0000021a 7ffdf000 b+0x13c3
0012fff0 00000000 004012e0 00000000 000000c8 KERNEL32!BaseProcessStart+0x3d

At this point, the process will be suspended.

nt!DbgkpSendApiMessage+0x14:
80494870 e869000000 call nt!DbgkpSuspendProcess (804948de)

So, before sending the debug output string as an exception to the debugger, we suspend the process. This is done by a "call nt!KeFreezeAllThreads (8042ef3d)" call.

So, we could actually stop here, this is why OutputDebugString() synchronizes everything in the process when hooked up with a local debugger.

But, what happens next is the debugger API recieves the notification by an LPC.

8049483b e86e7d0200 call nt!LpcRequestWaitReplyPort (804bc5ae)

Then, a little clean up occurs and finally resume the process.

80494878 e898000000 call nt!DbgkpResumeProcess (80494915)

If the function returns an error code, they simply will try again.

In conclusion, you can see why "OutputDebugString" calls will be synchronous and it will be harder to find timing bugs or multiple thread issues if they are all synchronized on these debug calls. It doesn't mean they would not be found, it would depend on the particular issue. It's not as if these are critical sections for your code after all but they do stop threads from continuing until another thread allows them to write their string.




GeneralRe: OutputDebugString Synchronization Pin
Toby Opferman9-May-04 9:56
Toby Opferman9-May-04 9:56 
GeneralRe: OutputDebugString Synchronization Pin
Toby Opferman9-May-04 12:36
Toby Opferman9-May-04 12:36 
QuestionVC 6? Pin
WREY17-Apr-04 9:08
WREY17-Apr-04 9:08 
AnswerRe: VC 6? Pin
Thomas Freudenberg18-Apr-04 5:25
Thomas Freudenberg18-Apr-04 5:25 
GeneralRe: VC 6? Pin
WREY18-Apr-04 7:12
WREY18-Apr-04 7:12 
GeneralRe: VC 6? Pin
Tim Hagemann19-Apr-04 1:29
Tim Hagemann19-Apr-04 1:29 
GeneralRe: VC 6? Pin
Toby Opferman9-May-04 8:16
Toby Opferman9-May-04 8:16 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.