Writing a .NET Debugger (Part 3) – Symbol and Source Files

Sebastian Solnica

Nov 10, 2010

CPOL

How to load module debugging symbols (PDB files) into the debugger and how to bind them with source files

In this part, I will show you how to load module debugging symbols (PDB files) into the debugger and how to bind them to source files. This can’t be done without diving into process, thread and module internals, so we will examine these structures as well.

After the last part (part 2), our small debugger, mindbg, attaches to the appdomains and receives events from the debuggee. Before we start dealing with symbols and sources, I will quickly explain the changes made to the already implemented logic.

I created a new class that will be a parent for all debuggee events:

/// <summary>
/// A base class for all debugging events.
/// </summary>
public class CorEventArgs
{
    private readonly CorController controller;

    /// <summary>
    /// Initializes the event instance.
    /// </summary>
    /// <param name="controller">Controller of the debugging process.</param>
    public CorEventArgs(CorController controller)
    {
        this.controller = controller;
    }

    /// <summary>
    /// Gets the controller.
    /// </summary>
    /// <value>The controller.</value>
    public CorController Controller { get { return this.controller;  } }

    /// <summary>
    /// Gets or sets a value indicating whether debugging process should continue.
    /// </summary>
    /// <value><c>true</c> if continue; otherwise, <c>false</c>.</value>
    public bool Continue { get; set; }
}

All events are now dispatched to the process that they belong to. As an example, take a look at the Breakpoint event handler in CorDebugger:

void ICorDebugManagedCallback.Breakpoint
(ICorDebugAppDomain pAppDomain, ICorDebugThread pThread, ICorDebugBreakpoint pBreakpoint)
{
    var ev = new CorBreakpointEventArgs(new CorAppDomain(pAppDomain, p_options),
                                        new CorThread(pThread),
                                        new CorFunctionBreakpoint(
                                               (ICorDebugFunctionBreakpoint)pBreakpoint));

    GetOwner(ev.Controller).DispatchEvent(ev);

    FinishEvent(ev);
}

The DispatchEvent method is implemented in CorProcess, with one overload for each type of event that we are interested in. For example:

/// <summary>
/// Handler for CorBreakpoint event.
/// </summary>
public delegate void CorBreakpointEventHandler(CorBreakpointEventArgs ev);

/// <summary>
/// Occurs when breakpoint is hit.
/// </summary>
public event CorBreakpointEventHandler OnBreakpoint;

internal void DispatchEvent(CorBreakpointEventArgs ev)
{
    // stops executing by default (further handlers may change this)
    ev.Continue = false;

    // calls external handlers (the event is null when nobody has subscribed)
    var handler = OnBreakpoint;
    if (handler != null)
        handler(ev);
}
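
To make the role of the Continue flag concrete, here is a minimal, self-contained sketch of the same dispatch pattern. DemoEventArgs, DemoProcess and DispatchDemo are hypothetical stand-ins invented for illustration, not mindbg types; the sketch only assumes that a subscriber may overwrite the default set by DispatchEvent.

```csharp
using System;

// Hypothetical stand-in for CorBreakpointEventArgs (illustration only).
public class DemoEventArgs
{
    public bool Continue { get; set; }
}

// Hypothetical stand-in for CorProcess (illustration only).
public class DemoProcess
{
    public delegate void DemoBreakpointEventHandler(DemoEventArgs ev);

    public event DemoBreakpointEventHandler OnBreakpoint;

    public void DispatchEvent(DemoEventArgs ev)
    {
        // stop by default; subscribers may overwrite this
        ev.Continue = false;

        // copy the delegate first: the event is null without subscribers
        DemoBreakpointEventHandler handler = OnBreakpoint;
        if (handler != null)
            handler(ev);
    }
}

public static class DispatchDemo
{
    // Dispatches one event and reports whether the debuggee would resume.
    public static bool Run(bool resumeInHandler)
    {
        DemoProcess process = new DemoProcess();
        if (resumeInHandler)
            process.OnBreakpoint += delegate(DemoEventArgs ev) { ev.Continue = true; };

        DemoEventArgs args = new DemoEventArgs();
        process.DispatchEvent(args);
        return args.Continue;
    }

    public static void Main()
    {
        Console.WriteLine("No subscriber: Continue = " + Run(false));
        Console.WriteLine("Resuming subscriber: Continue = " + Run(true));
    }
}
```

With no subscriber the event stays stopped; a handler that sets Continue to true is what lets the debuggee resume.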

We also want to stop the debugger at the Main method of the executable module, so we create a function breakpoint in the ModuleLoad event handler (the next part of the series will say more about breakpoints):

internal void DispatchEvent(CorModuleLoadEventArgs ev)
{
    if (!p_options.IsAttaching)
    {
        var symreader = ev.Module.GetSymbolReader();
        if (symreader != null)
        {
            // we will set breakpoint on the user entry code
            // when debugger creates the debuggee process
            Int32 token = symreader.UserEntryPoint.GetToken();
            if (token != 0)
            {
                // FIXME should be better written (control over this breakpoint)
                CorFunction func = ev.Module.GetFunctionFromToken(token);
                CorBreakpoint breakpoint = func.CreateBreakpoint();
                breakpoint.Activate(true);
            }
        }
    }
    ev.Continue = true;
}

That’s all about events. I also made some minor changes in other parts of the application, but I don’t think they are important enough to mention in this post. So let’s focus on the main topic.

I want to display the source code for the location where the breakpoint was hit. So first, let’s subscribe to the breakpoint event on the newly created process:

var debugger = DebuggingFacility.CreateDebuggerForExecutable(args[0]);
var process = debugger.CreateProcess(args[0]);

process.OnBreakpoint += new MinDbg.CorDebug.CorProcess.CorBreakpointEventHandler(process_OnBreakpoint);

The handler code is as follows:

static void process_OnBreakpoint(MinDbg.CorDebug.CorBreakpointEventArgs ev)
{
    Console.WriteLine("Breakpoint hit.");

    var source = ev.Thread.GetCurrentSourcePosition();

    DisplayCurrentSourceCode(source);
}

Two methods here are still mysterious: CorThread.GetCurrentSourcePosition and DisplayCurrentSourceCode. Let’s start with GetCurrentSourcePosition. When a thread executes application code, it uses a stack to store each function’s local variables, arguments and return address, so each stack frame is associated with the function that is currently using it. The most recent frame is the active frame, and we can retrieve it using the ICorDebugThread.GetActiveFrame method:

public CorFrame GetActiveFrame()
{
    ICorDebugFrame coframe;
    p_cothread.GetActiveFrame(out coframe);
    return new CorFrame(coframe, s_options);
}

and use it to get the current source position:

public CorSourcePosition GetCurrentSourcePosition()
{
    return GetActiveFrame().GetSourcePosition();
}

Inside the active CorFrame, we have access to the function associated with it:

/// <summary>
/// Gets the currently executing function.
/// </summary>
/// <returns></returns>
public CorFunction GetFunction()
{
    ICorDebugFunction cofunc;
    p_coframe.GetFunction(out cofunc);
    return cofunc == null ? null : new CorFunction(cofunc, s_options);
}

/// <summary>
/// Gets the source position.
/// </summary>
/// <returns>The source position.</returns>
public CorSourcePosition GetSourcePosition()
{
    UInt32 ip;
    CorDebugMappingResult mappingResult;

    frame.GetIP(out ip, out mappingResult);

    if (mappingResult == CorDebugMappingResult.MAPPING_NO_INFO ||
        mappingResult == CorDebugMappingResult.MAPPING_UNMAPPED_ADDRESS)
        return null;

    return GetFunction().GetSourcePositionFromIP((Int32)ip);
}

The ip variable represents the instruction pointer which, according to MSDN, is the stack frame’s offset into the function’s Microsoft intermediate language (MSIL) code. In other words, ip points to the code that is currently being executed. The question now is how to bind this instruction pointer to the actual source code line stored in a physical file. Here, symbol files come into play. Symbol files (PDB files) can be thought of as translators from the binary code to the human-readable source code.

Unfortunately, the whole logic behind symbol files is quite complex, and explaining it thoroughly would take a lot of space (which might actually be a good subject for a few further posts). For now, let’s assume that symbol files provide us with the source file path and line coordinates corresponding to our instruction pointer value. I tried to implement the symbol readers and binders on my own, but the subject overwhelmed me and I finally imported all symbol classes and interfaces from the MDBG source code. So I will just show you how to use these classes; anyone who wants more detail may analyze the contents of the mindbg\Symbols folder.

Each module (CorModule instance) has its own instance of the SymReader class (created with the help of the SymbolBinder):

public ISymbolReader GetSymbolReader()
{
    if (!p_isSymbolReaderInitialized)
    {
        p_isSymbolReaderInitialized = true;
        p_symbolReader = (GetSymbolBinder() as ISymbolBinder2).GetReaderForFile(
                                GetMetadataInterface<IMetadataImport>(),
                                GetName(),
                                s_options.SymbolPath);
    }
    return p_symbolReader;
}

Going back to the CorFrame.GetSourcePosition snippet, you might have noticed that, at the end, it calls the GetSourcePositionFromIP method of the CorFunction instance associated with this frame. Let’s now load the source information from the symbol files for this function:

// Initializes all private symbol variables
private void SetupSymbolInformation()
{
    if (p_symbolsInitialized)
        return;

    p_symbolsInitialized = true;
    CorModule module = GetModule();
    ISymbolReader symreader = module.GetSymbolReader();
    p_hasSymbols = symreader != null;
    if (p_hasSymbols)
    {
        ISymbolMethod sm = null;
        sm = symreader.GetMethod(new SymbolToken((Int32)GetToken())); // FIXME add version
        if (sm == null)
        {
            p_hasSymbols = false;
            return;
        }
        p_symMethod = sm;
        p_SPcount = p_symMethod.SequencePointCount;
        p_SPoffsets = new Int32[p_SPcount];
        p_SPdocuments = new ISymbolDocument[p_SPcount];
        p_SPstartLines = new Int32[p_SPcount];
        p_SPendLines = new Int32[p_SPcount];
        p_SPstartColumns = new Int32[p_SPcount];
        p_SPendColumns = new Int32[p_SPcount];

        p_symMethod.GetSequencePoints(p_SPoffsets, p_SPdocuments, p_SPstartLines,
                                        p_SPstartColumns, p_SPendLines, p_SPendColumns);
    }
}

As you can see, our function is represented in the Symbol API as a SymMethod, which contains a collection of sequence points. Each sequence point is defined by an IL offset, a source file path, a start line number, an end line number, a start column index and an end column index. The IL offset is the value that interests us most because it is directly comparable to the ip variable (which holds the instruction pointer value). So finally, we are ready to implement the CorFunction.GetSourcePositionFromIP method:

public CorSourcePosition GetSourcePositionFromIP(Int32 ip)
{
    SetupSymbolInformation();
    if (!p_hasSymbols)
        return null;

    if (p_SPcount > 0 && p_SPoffsets[0] <= ip)
    {
        Int32 i;
        // find a sequence point that the given instruction
        // pointer belongs to
        for (i = 0; i < p_SPcount; i++)
        {
            if (p_SPoffsets[i] >= ip)
                break;
        }

        // ip is not exactly at the start of a sequence point, so step
        // back to the sequence point that contains it
        if (i == p_SPcount || p_SPoffsets[i] != ip)
            i--;

        CorSourcePosition sp = null;
        if (p_SPstartLines[i] == SpecialSequencePoint)
        {
            // special type of sequence point
            // it indicates that the source code
            // for this part is hidden from the debugger

            // search backward for the last known line
            // which is not a special sequence point
            Int32 noSpecialSequencePointInd = i;
            while (--noSpecialSequencePointInd >= 0)
                if (p_SPstartLines[noSpecialSequencePointInd] != SpecialSequencePoint)
                    break;

            if (noSpecialSequencePointInd < 0)
            {
                // if not found in backward search
                // search forward for the first known line
                // which is not a special sequence point
                noSpecialSequencePointInd = i;
                while (++noSpecialSequencePointInd < p_SPcount)
                    if (p_SPstartLines[noSpecialSequencePointInd] != SpecialSequencePoint)
                        break;
            }

            Debug.Assert(noSpecialSequencePointInd >= 0);
            if (noSpecialSequencePointInd < p_SPcount)
            {
                sp = new CorSourcePosition(true,
                                           p_SPdocuments[noSpecialSequencePointInd].URL,
                                           p_SPstartLines[noSpecialSequencePointInd],
                                           p_SPendLines[noSpecialSequencePointInd],
                                           p_SPstartColumns[noSpecialSequencePointInd],
                                           p_SPendColumns[noSpecialSequencePointInd]);
            }
        }
        else
        {
            sp = new CorSourcePosition(false, p_SPdocuments[i].URL, p_SPstartLines[i], p_SPendLines[i],
                                        p_SPstartColumns[i], p_SPendColumns[i]);
        }
        return sp;
    }
    return null;
}
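
The linear scan above can also be expressed as a binary search, since it already relies on the offsets being in ascending order. The following is a minimal sketch, not the mindbg implementation: SequencePointLookup and FindIndex are hypothetical names, and it assumes sorted, distinct IL offsets. The value 0xFEEFEE is the line number that compilers write to PDB files for hidden sequence points, which is presumably what SpecialSequencePoint stands for.

```csharp
using System;

// Hypothetical helper for illustration; not part of mindbg.
public static class SequencePointLookup
{
    // Line number used in PDB files to mark hidden sequence points.
    public const int HiddenLine = 0xFEEFEE;

    // Returns the index of the sequence point that contains ilOffset,
    // or -1 when the offset lies before the first sequence point.
    // Assumption: offsets is sorted in ascending order without duplicates.
    public static int FindIndex(int[] offsets, int ilOffset)
    {
        int i = Array.BinarySearch(offsets, ilOffset);

        // A negative result is the bitwise complement of the index of the
        // first element greater than ilOffset, so the containing sequence
        // point is the one just before that index.
        return i >= 0 ? i : ~i - 1;
    }

    public static void Main()
    {
        int[] offsets = { 0, 1, 7, 12 };
        int[] startLines = { 10, 11, HiddenLine, 13 };

        foreach (int ip in new[] { 0, 5, 7, 20 })
        {
            int i = FindIndex(offsets, ip);
            string line = startLines[i] == HiddenLine
                ? "hidden"
                : startLines[i].ToString();
            Console.WriteLine("IL offset {0} -> sequence point {1}, line {2}", ip, i, line);
        }
    }
}
```

For the handful of sequence points in a typical method the linear scan is just as good; the binary search mainly makes the "containing sequence point" rule explicit.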

And the second mysterious function – DisplayCurrentSourceCode – from the beginning of the post is as follows:

static void DisplayCurrentSourceCode(CorSourcePosition source)
{
    // no symbols or no mapping for the current instruction pointer
    if (source == null)
        return;

    SourceFileReader sourceReader = new SourceFileReader(source.Path);

    // Print the lines covered by the current sequence point
    Debug.Assert(source.StartLine < sourceReader.LineCount
                 && source.EndLine < sourceReader.LineCount);
    if (source.StartLine >= sourceReader.LineCount ||
        source.EndLine >= sourceReader.LineCount)
        return;

    for (Int32 i = source.StartLine; i <= source.EndLine; i++)
    {
        String line = sourceReader[i];
        bool highlighting = false;

        // for each line highlight the code
        for (Int32 col = 0; col < line.Length; col++)
        {
            if (source.EndColumn == 0 ||
                (col >= source.StartColumn - 1 && col <= source.EndColumn))
            {
                // highlight
                if (!highlighting)
                {
                    Console.ForegroundColor = ConsoleColor.Yellow;
                    highlighting = true;
                }
                Console.Write(line[col]);
            }
            else
            {
                // normal display
                if (highlighting)
                {
                    Console.ForegroundColor = ConsoleColor.Gray;
                    highlighting = false;
                }
                Console.Write(line[col]);
            }
        }
    }

    // restore the normal color in case the highlight ran to the end of a line
    Console.ForegroundColor = ConsoleColor.Gray;
}
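
DisplayCurrentSourceCode only needs three things from SourceFileReader: a constructor that takes a path, a LineCount property and an indexer. A minimal sketch of a reader with that shape follows. SimpleSourceFileReader is a hypothetical stand-in, not the mindbg class, and the 1-based indexer is an assumption made to match the 1-based line numbers stored in PDB files; the real class may index differently.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for mindbg's SourceFileReader (illustration only).
public sealed class SimpleSourceFileReader
{
    private readonly string[] lines;

    public SimpleSourceFileReader(string path)
    {
        // reads the whole file at once; line terminators are stripped
        lines = File.ReadAllLines(path);
    }

    // Number of lines in the file.
    public int LineCount { get { return lines.Length; } }

    // Assumption: line numbers are 1-based, as they are in PDB files.
    public string this[int lineNumber]
    {
        get { return lines[lineNumber - 1]; }
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllLines(path, new[] { "first", "second", "third" });

            SimpleSourceFileReader reader = new SimpleSourceFileReader(path);
            for (int i = 1; i <= reader.LineCount; i++)
                Console.WriteLine("{0}: {1}", i, reader[i]);
        }
        finally
        {
            // clean up the temporary file used by the demo
            File.Delete(path);
        }
    }
}
```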

The SourceFileReader class is just a simple text file reader which reads the whole file at once and stores all lines in a collection of strings. What’s the final result? Have a look:

There is a lot more to say about symbols and source files. I hope that in further posts, I will show you how to download symbols from a symbol store and source files from repositories. As usual, the source code for this post may be found at mindbg.codeplex.com (revision 55200).
