My previous article was meant to be an introduction to the latest Microsoft Debug Interface Access (DIA) infrastructure. It focused on Program Database (PDB) files, showed a set of functions of the DIA family, and presented a project that wraps DIA in a set of programmer-friendly virtual C++ interfaces.
In this article, I will continue the investigation of the Microsoft DIA potential, focusing on the portable executable files. As in the previous article, I will also present a project, with sources, that wraps the DIA interfaces into a set of virtual C++ interfaces. The snapshot below shows the console application presented here and the kind of information that can be retrieved with my sample:
- The location of the public Symbols Store
- The location of the local Symbols Store
- The GUID of the PDB symbols file that is referenced by the assembly
- The full path of the PDB symbols file that is referenced by the assembly
- The real path of the PDB symbols file that has been found by the system
- The list of path attempts made by the system when searching for the PDB symbols file
Debug Interface Access (DIA) is a new application programming interface that client applications should use when working with symbol information hosted in Program Database (PDB) files. The image below shows an overview of the different existing APIs.
For historical reasons, most developers working with symbols files typically use the well-known DbgHelp.dll interface. The new DIA interface can be used not only to query PDB files but also to investigate portable executable (EXE, DLL, SCR, CPL, SYS…) files and to collect debug-related information from them.
Just as a reminder, the image below shows the global view of the different parts involved when producing code and debugging it. The compiler creates an assembly together with its associated debug symbols file. The debugger opens the assembly and tries to find the file containing the debugging information.
Search of the Symbols File
Before it can set a breakpoint or watch a variable, the debugger must first locate the debug symbols file that is associated with the assembly being debugged. As previously mentioned, native and managed assemblies embed the GUID and the full path name of the PDB file that contains the debug data of the code.
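To make the embedded GUID and path more concrete, here is a minimal sketch that decodes a CodeView "RSDS" debug record, the structure in which current Microsoft linkers store this information inside the image. It is not taken from the project presented here (which relies on DIA for this); the names `PdbReference` and `ParseRsdsRecord` are hypothetical, and the sketch assumes the caller has already located the raw bytes of the record through the PE debug directory.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical result type, not part of the article's project.
struct PdbReference {
    std::string   guid;  // formatted as {XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}
    std::uint32_t age;   // age counter stored next to the GUID
    std::string   path;  // PDB path exactly as written by the linker
};

// Decodes an "RSDS" record: 4-byte signature, 16-byte GUID,
// 4-byte little-endian age, then a null-terminated PDB path.
bool ParseRsdsRecord(const std::vector<std::uint8_t>& rec, PdbReference& out)
{
    const std::size_t headerSize = 4 + 16 + 4;
    if (rec.size() <= headerSize || std::memcmp(rec.data(), "RSDS", 4) != 0)
        return false;

    const std::uint8_t* g = rec.data() + 4;

    // The first three GUID fields are little-endian; the last 8 bytes are
    // printed in storage order.
    static const int  order[16] = {3, 2, 1, 0, 5, 4, 7, 6,
                                   8, 9, 10, 11, 12, 13, 14, 15};
    static const char hex[] = "0123456789ABCDEF";
    std::string guid = "{";
    for (int i = 0; i < 16; ++i) {
        if (i == 4 || i == 6 || i == 8 || i == 10)
            guid += '-';
        guid += hex[g[order[i]] >> 4];
        guid += hex[g[order[i]] & 0x0F];
    }
    guid += '}';

    out.guid = guid;
    out.age  = static_cast<std::uint32_t>(g[16])
             | (static_cast<std::uint32_t>(g[17]) << 8)
             | (static_cast<std::uint32_t>(g[18]) << 16)
             | (static_cast<std::uint32_t>(g[19]) << 24);

    // The path runs from the end of the header to the first null byte
    // (or to the end of the buffer if the terminator is missing).
    const std::uint8_t* begin = rec.data() + headerSize;
    const std::uint8_t* end   = std::find(begin, rec.data() + rec.size(), 0);
    out.path.assign(begin, end);
    return true;
}
```

A caller would pass the bytes of the debug directory entry and, on success, compare `guid` and `age` with the values stored in a candidate PDB file.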
By default, the linker writes the fully qualified name of the PDB file into the image file. If you prefer, you can strip the path to the PDB file and keep only the name (and the extension) of the PDB file by using the following technique. Have you ever taken a look at the Windows image files (e.g. notepad.exe, kernel32.dll, ...)? As far as I can see, Microsoft always strips the path to the PDB when building its images. This saves a few bytes in the image files and hides the name of the directory on the build machine where an image has been built (e.g. "d:\temp\version2\free_demo_build_whithout_some_features\.... xy.pdb" ).
When a debugger attaches to an assembly, it reads its image file and checks whether it has been compiled with debug information. If so, it reads the embedded GUID of the associated PDB file. Based on this information, it searches several locations for a PDB whose GUID matches the one found in the image file being debugged. The debug engine searches the following locations:
- Location pointed by the full path (when available) of the PDB file as embedded in the image file
- The directory from which the image is loaded
- Local symbol cache when available
- Remote symbols server when available. When the file has been found on the remote symbols server, the symbols engine copies the PDB file to the local symbols cache. (The next time the process is debugged, the debug session will start much faster since the PDB file will be found in the local symbols cache.)
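The search order above can be sketched as a function that builds the list of candidate locations. This is only an illustration, not the algorithm used by DIA or by the project: `FileNameOf` and `BuildPdbCandidates` are hypothetical names, the remote server step is left out because it needs network access, and the `<cache>\<name>\<GUID+age>\<name>` layout of the local cache is stated here as an assumption about the usual symbol store convention.

```cpp
#include <string>
#include <vector>

// Returns the file name portion of a Windows path
// (the text after the last '\' or '/').
std::wstring FileNameOf(const std::wstring& path)
{
    const std::size_t pos = path.find_last_of(L"\\/");
    return pos == std::wstring::npos ? path : path.substr(pos + 1);
}

// Builds the candidate PDB locations in the order described above.
// guidAge is assumed to be the GUID digits followed by the age, which
// is how symbol store sub-directories are commonly named.
std::vector<std::wstring> BuildPdbCandidates(
    const std::wstring& embeddedPdbPath,
    const std::wstring& imageDirectory,
    const std::wstring& localCache,      // empty when no cache is configured
    const std::wstring& guidAge)
{
    std::vector<std::wstring> candidates;
    const std::wstring name = FileNameOf(embeddedPdbPath);

    // 1. The embedded path, if the linker kept the directory part.
    if (name != embeddedPdbPath)
        candidates.push_back(embeddedPdbPath);

    // 2. The directory from which the image was loaded.
    candidates.push_back(imageDirectory + L"\\" + name);

    // 3. The local symbol cache, when one is configured.
    if (!localCache.empty())
        candidates.push_back(localCache + L"\\" + name + L"\\" +
                             guidAge + L"\\" + name);

    return candidates;
}
```

A real implementation would open each candidate in turn and accept the first file whose GUID matches the one embedded in the image.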
The software architecture of the project presented here follows the same concept as in my previous article. Interfaces cannot be created directly; they can only be obtained. The idea behind this concept is to free the consumer from any memory management responsibility.
All classes are abstract (they expose only pure virtual functions) and therefore cannot be instantiated directly.
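The "obtain, never create" idea can be sketched as follows. All names here (`IExampleFile`, `IExampleParser`, `GetExampleParser`) are hypothetical stand-ins rather than the real declarations of SymbolsParser.dll; the sketch only shows one way in which abstract interfaces, protected destructors, and a single entry point keep ownership inside the library.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical interfaces: only pure virtual functions, and a protected
// destructor so that consumers can neither create nor delete instances.
class IExampleFile {
public:
    virtual std::wstring GetName() const = 0;
protected:
    virtual ~IExampleFile() {}
};

class IExampleParser {
public:
    virtual IExampleFile* OpenFile(const std::wstring& name) = 0;
protected:
    virtual ~IExampleParser() {}
};

// The only entry point a consumer would see (hypothetical).
IExampleParser* GetExampleParser();

// Implementation details, normally hidden inside the DLL.
namespace {

class ExampleFile : public IExampleFile {
public:
    explicit ExampleFile(const std::wstring& name) : name_(name) {}
    std::wstring GetName() const override { return name_; }
private:
    std::wstring name_;
};

class ExampleParser : public IExampleParser {
public:
    IExampleFile* OpenFile(const std::wstring& name) override
    {
        // The parser owns every object it hands out.
        files_.push_back(std::unique_ptr<ExampleFile>(new ExampleFile(name)));
        return files_.back().get();
    }
private:
    std::vector<std::unique_ptr<ExampleFile>> files_;
};

} // namespace

IExampleParser* GetExampleParser()
{
    static ExampleParser parser;  // created on first use, owned by the library
    return &parser;
}
```

With this layout, a consumer calls `GetExampleParser()->OpenFile(...)` and uses the returned pointer without ever calling `new` or `delete`.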
Using the Code
The project presented here consists of two parts:
- SymbolsParser: C++ project - implements the SymbolsParser.dll, which is a wrapper around a few DIA interfaces
- ConsoleSymbolsParser: C++ Win32 console project - consumes the SymbolsParser and shows a little information about the PDB file referenced by an assembly
Opening a Portable Executable (PE) file is done in two steps:
One can then invoke the
- Retrieve the GUID of the referenced PDB file using the GetGuid() method:
wstring sGuid = pIPeFile->GetGuid();
wcout << L"GUID:" << sGuid.c_str() << endl;
- Retrieve the full path of the referenced PDB file using the GetBuiltinSymbolsPath() method:
wcout << L"Built-in PDB Path:" << pIPeFile->GetBuiltinSymbolsPath().c_str() << endl;
- Retrieve the full path of the location where the PDB file has been found using the GetFoundSymbolsPath() method:
wcout << L"Found PDB Path:" << pIPeFile->GetFoundSymbolsPath().c_str() << endl;
In its quest to locate the PDB file that is referenced by an executable file, the symbols engine searches at different locations. Using the ISymbolsParser::GetSymbolsSearch() interface, one can obtain an object that can be used to enumerate the paths visited during this search.
ISymbolsSearch* pISymbolsSearchPath = pISymbolsParser->GetSymbolsSearch();
typedef std::map<wstring, bool> Paths;
Paths paths = pISymbolsSearchPath->GetPaths();
// Print every path that was visited during the search
for (Paths::iterator it = paths.begin(); it != paths.end(); ++it)
{
    wcout << it->first.c_str() << endl;
}
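As a small illustration of working with the returned map, the sketch below filters the entries by their boolean flag. The article does not say what the `bool` stands for, so treating it as "the PDB was found at this path" is purely an assumption, and `PathsWithFlag` is a hypothetical helper that is not part of the project. Note also that a `std::map` enumerates its keys in sorted order, which is not necessarily the order in which the symbols engine visited the paths.

```cpp
#include <map>
#include <string>
#include <vector>

typedef std::map<std::wstring, bool> Paths;

// Returns the paths whose flag equals 'flag', in the map's key order.
// ASSUMPTION: the meaning of the flag is not documented in the article;
// here it is read as "the PDB file was found at this location".
std::vector<std::wstring> PathsWithFlag(const Paths& paths, bool flag)
{
    std::vector<std::wstring> result;
    for (const auto& entry : paths) {
        if (entry.second == flag)
            result.push_back(entry.first);
    }
    return result;
}
```

Calling `PathsWithFlag(paths, false)` on the map returned by `GetPaths()` would, under that assumption, list the failed attempts only.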
The location of the local and remote Symbols servers can also be retrieved by obtaining a pointer to the ISymbolsStore interface:
ISymbolsStore* pIEnvironment = pISymbolsParser->GetSymbolsStore();
wcout << L"Public Symbols Store:" << pIEnvironment->GetPublicSymbolsStore().c_str() << endl;
wcout << L"Local Symbols Store:" << pIEnvironment->GetLocalSymbolsStore().c_str() << endl;
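The article does not say where the project reads these two locations from. On many machines they are configured through the `_NT_SYMBOL_PATH` environment variable, using entries of the form `srv*<local cache>*<server>`. Under that assumption, the following sketch extracts both locations from such a value. `SymbolStores` and `ParseSrvEntry` are hypothetical names, and only the simple two-part `srv*` form is handled.

```cpp
#include <cwctype>
#include <string>

// Hypothetical holder for the two locations.
struct SymbolStores {
    std::wstring local;   // local cache directory
    std::wstring remote;  // remote (public) symbols server
};

// Scans a ';'-separated symbol path for the first entry of the form
// "srv*<local>*<remote>" (prefix matched case-insensitively).
bool ParseSrvEntry(const std::wstring& symbolPath, SymbolStores& out)
{
    std::size_t start = 0;
    while (start <= symbolPath.size()) {
        std::size_t end = symbolPath.find(L';', start);
        if (end == std::wstring::npos)
            end = symbolPath.size();
        const std::wstring entry = symbolPath.substr(start, end - start);

        if (entry.size() > 4 &&
            std::towlower(entry[0]) == L's' &&
            std::towlower(entry[1]) == L'r' &&
            std::towlower(entry[2]) == L'v' &&
            entry[3] == L'*') {
            const std::size_t star = entry.find(L'*', 4);
            // Require a non-empty local part and a non-empty remote part.
            if (star != std::wstring::npos && star > 4 &&
                star + 1 < entry.size()) {
                out.local  = entry.substr(4, star - 4);
                out.remote = entry.substr(star + 1);
                return true;
            }
        }
        start = end + 1;
    }
    return false;
}
```

On Windows, the input string could be obtained with `_wgetenv(L"_NT_SYMBOL_PATH")`; entries that name only a plain directory are skipped by this sketch.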
When appropriate, the resources allocated by the SymbolsParser must be freed using the
The project has been developed with Visual Studio 2008 and tested on Windows Vista Ultimate 32-bit only.
Once an IPeFile has been obtained, one could retrieve an IPdbFile and continue to collect more details about the PDB file that is referenced by the executable. This project does not implement this bridge.
I left the implementation of this bridge as an exercise for the reader. One can have a look at my previous article, which presents the
- 6th July, 2009: Initial post
Marc Ochsenmeier is the author of pestudio (www.winitor.com) and worked as a developer with a focus on Windows Security. He now works as a Malware Analyst.
pestudio is on twitter at: https://twitter.com/ochsenmeier