Enumerating Message Table Contents






An article on enumerating message table resources.
Introduction
The code for this article enumerates the entries in the message tables of arbitrary Win32 PE files.
Background
Recently, at the company I currently work for, I introduced a more or less generalized scheme for error codes in native Win32 applications. It is based on the same principles that Microsoft uses for its own error codes, most of which are defined in the winerror.h header file and have a string representation embedded as a message table resource in kernel32.dll and a few other DLLs. This way, the FormatMessage API can resolve an error code into a meaningful, human-readable error description. While working on this topic, I was curious to find out which error strings reside in the DLLs of a typical Windows installation. I wanted to enumerate all DLLs in a given directory (say, c:\windows\system32) and, for each DLL, enumerate the message table entries for any given language ID.
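The principles behind these error codes come down to a fixed bit layout for 32-bit message IDs, documented in the comment at the top of winerror.h: two severity bits, a customer flag, a reserved bit, a 12-bit facility and a 16-bit code. As an illustration, the following sketch splits an ID into these fields and assembles one from them; MsgIdFields, decode_msg_id and encode_msg_id are hypothetical helpers, not SDK functions.

```c
#include <stdint.h>

/* Field layout of a 32-bit message ID as documented in winerror.h:
   bits 31-30 severity, bit 29 customer flag, bit 28 reserved,
   bits 27-16 facility, bits 15-0 code. (Hypothetical helper type.) */
typedef struct {
    unsigned severity;  /* 0 = success, 1 = informational, 2 = warning, 3 = error */
    unsigned customer;  /* 1 for customer (non-Microsoft) codes */
    unsigned reserved;
    unsigned facility;
    unsigned code;
} MsgIdFields;

static MsgIdFields decode_msg_id(uint32_t id)
{
    MsgIdFields f;
    f.severity = (id >> 30) & 0x3u;
    f.customer = (id >> 29) & 0x1u;
    f.reserved = (id >> 28) & 0x1u;
    f.facility = (id >> 16) & 0xFFFu;
    f.code     = id & 0xFFFFu;
    return f;
}

/* Assembles an ID from its fields; the reserved bit is left at zero. */
static uint32_t encode_msg_id(unsigned severity, unsigned customer,
                              unsigned facility, unsigned code)
{
    return ((uint32_t)(severity & 0x3u) << 30) |
           ((uint32_t)(customer & 0x1u) << 29) |
           ((uint32_t)(facility & 0xFFFu) << 16) |
           (uint32_t)(code & 0xFFFFu);
}
```

For example, a customer-defined error (severity 3) in facility 1 with code 1 is encoded as 0xE0010001.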
However, the Win32 API offers no function to enumerate the individual entries of a message table resource. Hence, my first approach was brute force: call FormatMessage for every possible message ID between 0 and 0xFFFFFFFF. If FormatMessage fails, so be it; if it succeeds, there must be a message table entry for the given ID, and FormatMessage returns its string to the caller. As you can imagine, this naive approach not only burned needless CPU cycles, it also took hours to execute for just a handful of DLLs. After the first tests, I estimated that it would take some three or four years to scan my entire system32 directory. On top of that, this approach requires a priori knowledge of the language ID to scan the DLLs for, because one and the same message table can exist in any number of languages, with identical message table entry IDs within the same binary. Therefore, a different solution had to be found.
Low-level APIs to the rescue
The Windows APIs that deal with resource loading come in two flavours: low-level functions such as FindResource, LoadResource, and LockResource, and high-level functions such as LoadMenu, LoadString, and LoadImage. Unlike the high-level APIs, the low-level APIs are resource-type agnostic, i.e., they know nothing about the type or binary layout of the resource they are used with. Not surprisingly, the high-level APIs call the low-level APIs internally. As a consequence, the low-level APIs make it quite straightforward to enumerate and load any resource in any language, once you know the binary layout of the raw resource acquired this way. Fortunately, the binary layout of message tables is pretty well documented. A message table resource starts with a MESSAGE_RESOURCE_DATA structure (defined in winnt.h), which describes one or more blocks of entries:
typedef struct _MESSAGE_RESOURCE_DATA {
    DWORD NumberOfBlocks;
    MESSAGE_RESOURCE_BLOCK Blocks[ 1 ];
} MESSAGE_RESOURCE_DATA, *PMESSAGE_RESOURCE_DATA;
An actual MESSAGE_RESOURCE_DATA block does not contain just one member of type MESSAGE_RESOURCE_BLOCK, as the struct definition suggests; Blocks[ 1 ] is merely the customary C idiom for a variable-length trailing array. Instead, the NumberOfBlocks member indicates how many MESSAGE_RESOURCE_BLOCK entries actually follow it in a MESSAGE_RESOURCE_DATA block loaded via a sequence of calls to the FindResource, LoadResource, and LockResource APIs. The MESSAGE_RESOURCE_BLOCK data type is defined in winnt.h as well, and looks like this:
typedef struct _MESSAGE_RESOURCE_BLOCK {
    DWORD LowId;
    DWORD HighId;
    DWORD OffsetToEntries;
} MESSAGE_RESOURCE_BLOCK, *PMESSAGE_RESOURCE_BLOCK;
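To make this layout concrete, here is a minimal, portable C sketch that reads the MESSAGE_RESOURCE_DATA header and its MESSAGE_RESOURCE_BLOCK array from a raw byte buffer, such as the one returned by LockResource. It assumes the buffer holds the complete resource and decodes the DWORDs as little-endian values instead of casting to the winnt.h types, so it also compiles without windows.h. MsgBlock, read_block and count_message_ids are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads a little-endian DWORD; resource data in a PE file is little-endian. */
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Portable mirror of MESSAGE_RESOURCE_BLOCK from winnt.h (three DWORDs). */
typedef struct {
    uint32_t LowId;
    uint32_t HighId;
    uint32_t OffsetToEntries;
} MsgBlock;

/* Copies block 'index' of the MESSAGE_RESOURCE_DATA at 'base' into *out.
   Returns 0 if the index is out of range or the block lies beyond 'size'. */
static int read_block(const uint8_t *base, size_t size, uint32_t index, MsgBlock *out)
{
    size_t pos;
    if (size < 4) return 0;
    if (index >= rd32(base)) return 0;               /* NumberOfBlocks */
    if (index >= (size - 4) / 12) return 0;          /* block would be truncated */
    pos = 4 + (size_t)index * 12;                    /* 4-byte header, 12 bytes per block */
    out->LowId           = rd32(base + pos);
    out->HighId          = rd32(base + pos + 4);
    out->OffsetToEntries = rd32(base + pos + 8);
    return 1;
}

/* Number of message IDs covered by all blocks (HighId is inclusive). */
static uint64_t count_message_ids(const uint8_t *base, size_t size)
{
    uint64_t total = 0;
    uint32_t i, n = (size >= 4) ? rd32(base) : 0;
    MsgBlock b;
    for (i = 0; i < n && read_block(base, size, i, &b); i++)
        if (b.HighId >= b.LowId)
            total += (uint64_t)b.HighId - b.LowId + 1;
    return total;
}
```

Reading field by field rather than casting the buffer to PMESSAGE_RESOURCE_DATA keeps the sketch independent of host byte order and structure alignment.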
Each MESSAGE_RESOURCE_BLOCK represents a run of consecutive message table entries, starting at the ID in its LowId member and ending with (and including) the ID in its HighId member. Adding the value of the OffsetToEntries member to the start address of the MESSAGE_RESOURCE_DATA structure, i.e., to the beginning of the raw resource data, yields the address of the message table entry for the block's first ID, the one contained in the LowId member. This address points to a MESSAGE_RESOURCE_ENTRY data structure, also defined in winnt.h, as follows:
typedef struct _MESSAGE_RESOURCE_ENTRY {
    WORD Length;
    WORD Flags;
    BYTE Text[ 1 ];
} MESSAGE_RESOURCE_ENTRY, *PMESSAGE_RESOURCE_ENTRY;
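With these three structures in place, locating the first entry of a block takes only a few lines. The following sketch uses the same portable, little-endian decoding (MsgEntryHeader and first_entry_of_block are hypothetical names) and follows the Microsoft documentation for MESSAGE_RESOURCE_BLOCK, which defines OffsetToEntries relative to the beginning of the MESSAGE_RESOURCE_DATA structure.

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian readers; resource data in a PE file is little-endian. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decoded header of a MESSAGE_RESOURCE_ENTRY (WORD Length, WORD Flags). */
typedef struct {
    uint16_t Length;
    uint16_t Flags;
    size_t   textOffset;   /* offset of Text[] from the start of the resource */
} MsgEntryHeader;

/* Locates the first entry of block 'index'. OffsetToEntries is added to
   'base', the start of MESSAGE_RESOURCE_DATA, not to the block's address. */
static int first_entry_of_block(const uint8_t *base, size_t size,
                                uint32_t index, MsgEntryHeader *out)
{
    size_t entryPos;
    if (size < 4 || index >= rd32(base) || index >= (size - 4) / 12) return 0;
    entryPos = rd32(base + 4 + (size_t)index * 12 + 8);   /* OffsetToEntries */
    if (entryPos > size || size - entryPos < 4) return 0; /* header must fit */
    out->Length     = rd16(base + entryPos);
    out->Flags      = rd16(base + entryPos + 2);
    out->textOffset = entryPos + 4;
    return 1;
}
```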
Each MESSAGE_RESOURCE_ENTRY block represents a single message table string item. As you might already have guessed, the string of the message table item starts at the address of the Text member of this structure. The entry is of variable length, which is determined by its Length member: it holds the size of the entire MESSAGE_RESOURCE_ENTRY in bytes, i.e., the two WORD members plus the string data, including its terminating zero character and any padding. But here comes an additional twist: the string itself can either be a code-page based ANSI string or a UTF-16 Unicode string, and this is what the Flags member of the structure is good for: if it is zero, the string is an ANSI string; if it is one (MESSAGE_RESOURCE_UNICODE in winnt.h), it is a Unicode string. Other values for the Flags member are not defined. The next message table item then immediately follows the current MESSAGE_RESOURCE_ENTRY, at the address of the current entry plus the number of bytes encoded in its Length member.
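Putting the pieces together, the following sketch walks every entry of every block and hands each one to a visitor function. It advances from one entry to the next by adding the entry's Length to the entry's own start address, since the Microsoft documentation defines Length as the size of the whole MESSAGE_RESOURCE_ENTRY. This is a portable approximation of what an enumeration function has to do internally, not the article's actual implementation; msg_visit_fn, walk_message_table, WalkStats and count_entries are hypothetical names, and the bounds checks reject truncated or malformed data instead of reading past the buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian readers; resource data in a PE file is little-endian. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical visitor: message ID, Flags (0 = ANSI, 1 = Unicode), raw Text
   bytes (still including the terminating zero and any padding), context.
   Return 0 to stop the walk. */
typedef int (*msg_visit_fn)(uint32_t id, uint16_t flags,
                            const uint8_t *text, size_t textBytes, void *ctx);

/* Walks all entries of a raw message table resource. Returns 1 after a
   complete walk, 0 if the data is malformed or the visitor stopped early. */
static int walk_message_table(const uint8_t *base, size_t size,
                              msg_visit_fn visit, void *ctx)
{
    uint32_t nBlocks, i;
    if (size < 4) return 0;
    nBlocks = rd32(base);                          /* NumberOfBlocks */
    if (nBlocks > (size - 4) / 12) return 0;       /* block array must fit */
    for (i = 0; i < nBlocks; i++) {
        const uint8_t *blk = base + 4 + (size_t)i * 12;
        uint32_t id = rd32(blk), high = rd32(blk + 4);
        size_t pos = rd32(blk + 8);                /* OffsetToEntries, from base */
        if (high < id) return 0;
        for (;;) {
            uint16_t len;
            if (pos > size || size - pos < 4) return 0;
            len = rd16(base + pos);                /* size of the whole entry */
            if (len < 4 || len > size - pos) return 0;
            if (!visit(id, rd16(base + pos + 2), base + pos + 4, (size_t)len - 4, ctx))
                return 0;
            pos += len;                            /* next entry follows directly */
            if (id == high) break;                 /* HighId is inclusive */
            id++;
        }
    }
    return 1;
}

/* Example visitor: counts entries and remembers the last ID seen. */
typedef struct { unsigned count; uint32_t lastId; } WalkStats;

static int count_entries(uint32_t id, uint16_t flags,
                         const uint8_t *text, size_t textBytes, void *ctx)
{
    WalkStats *s = (WalkStats *)ctx;
    (void)flags; (void)text; (void)textBytes;
    s->count++;
    s->lastId = id;
    return 1;
}
```

Like EnumMessageTableStrings, this sketch returns the same value for "malformed data" and "stopped by the visitor"; the article's function tells the two cases apart via GetLastError.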
Using the code
The function that I wrote for this article in order to enumerate message tables in a PE binary has the following prototype:
BOOL EnumMessageTableStrings(HMODULE hMod, LPCTSTR lpName, ENUM_MESSAGES enfn, LONG_PTR lParam);
The first parameter, hMod, is a module/instance handle to the DLL or EXE file whose message table is to be enumerated. The second parameter is the name or ID of the resource. For message tables whose strings can be loaded via FormatMessage, this is always MAKEINTRESOURCE(1). While it is theoretically possible to have a message table resource with a different numerical ID or a string ID, this doesn't happen in practice, because it would require manual intervention and manipulation of the message compiler's output during the build process.
The third parameter is an enumeration callback function that the caller has to supply, and which is invoked once for each message table entry per language while EnumMessageTableStrings executes. The last parameter, lParam, is a user-defined value that EnumMessageTableStrings always passes through to the callback function. You can pass anything you want as this parameter, e.g., a pointer to an object whose member functions are invoked in your callback function, or whatever strikes your fancy.
The function returns a nonzero value if it succeeds, and FALSE if it either fails or the enumeration callback returned FALSE to stop the enumeration. To distinguish between these two cases, extended information is available via GetLastError: if GetLastError returns ERROR_SUCCESS, the enumeration was aborted by the callback returning FALSE; if an error occurred during enumeration, GetLastError returns a nonzero error code defined in winerror.h.
The prototype for the ENUM_MESSAGES callback function looks like this:
typedef BOOL (CALLBACK * ENUM_MESSAGES)(LPVOID lpMsg, DWORD dwMsgId, WORD wFlags, WORD wIDLanguage, LONG_PTR lParam);
The first parameter, lpMsg, is the string of the enumerated message table entry. It is declared as LPVOID because it is either a Unicode (UTF-16) string or an ANSI string, so it has to be cast to either LPCWSTR or LPCSTR. The third parameter, wFlags, determines whether lpMsg is to be interpreted as an ANSI string (wFlags = 0) or as a Unicode string (wFlags = 1). Magic numbers in code really drive me crazy, so I defined the macros EMT_MSG_IS_ANSI (0) and EMT_MSG_IS_UNICODE (1) in the header files that contain the EnumMessageTableStrings prototype and the definition of the ENUM_MESSAGES callback. The second parameter, dwMsgId, specifies the message ID of the enumerated message, and the fourth parameter, wIDLanguage, is the Win32 language ID for which the message table entry was found. The last parameter, as explained before, is the custom value that the caller passed to EnumMessageTableStrings as its lParam parameter. To abort the enumeration, the callback function should return FALSE; to continue it, the callback should return a nonzero value.
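As an illustration of this callback contract, here is a sketch of an ENUM_MESSAGES callback that looks for one message ID, casts lpMsg according to wFlags, and stops the enumeration by returning FALSE once it has found the entry. The ENUM_MESSAGES typedef and the values of EMT_MSG_IS_ANSI and EMT_MSG_IS_UNICODE come from the text; FindCtx and FindMessageCallback are hypothetical, and the fallback typedefs merely stand in for windows.h on non-Windows compilers. The Unicode branch reads UTF-16 code units as uint16_t because wchar_t is not 16 bits wide on every platform, and both branches assume a zero-terminated string.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Minimal stand-ins for windows.h types so the sketch compiles elsewhere. */
typedef int BOOL;
typedef uint32_t DWORD;
typedef uint16_t WORD;
typedef void *LPVOID;
typedef intptr_t LONG_PTR;
#define CALLBACK
#define TRUE 1
#define FALSE 0
#endif

#define EMT_MSG_IS_ANSI    0
#define EMT_MSG_IS_UNICODE 1

typedef BOOL (CALLBACK * ENUM_MESSAGES)(LPVOID lpMsg, DWORD dwMsgId, WORD wFlags,
                                        WORD wIDLanguage, LONG_PTR lParam);

/* Hypothetical context passed through lParam: searches for one message ID. */
typedef struct {
    DWORD  wantedId;
    WORD   foundFlags;
    size_t foundChars;   /* string length in characters, without terminator */
    int    found;
} FindCtx;

static BOOL CALLBACK FindMessageCallback(LPVOID lpMsg, DWORD dwMsgId, WORD wFlags,
                                         WORD wIDLanguage, LONG_PTR lParam)
{
    FindCtx *ctx = (FindCtx *)lParam;
    size_t n = 0;
    (void)wIDLanguage;
    if (dwMsgId != ctx->wantedId)
        return TRUE;                                   /* keep enumerating */
    if (wFlags == EMT_MSG_IS_UNICODE) {
        const uint16_t *w = (const uint16_t *)lpMsg;   /* UTF-16 code units */
        while (w[n] != 0) n++;
    } else {
        const char *a = (const char *)lpMsg;           /* code-page based ANSI */
        while (a[n] != '\0') n++;
    }
    ctx->foundFlags = wFlags;
    ctx->foundChars = n;
    ctx->found = 1;
    return FALSE;                                      /* found it: stop */
}
```

On Windows, the address of a FindCtx instance would be passed as the lParam argument of EnumMessageTableStrings, cast to LONG_PTR.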
ANSI and Unicode message table entries
As explained above, each MESSAGE_RESOURCE_ENTRY block represents a single message table string entry, and comes in two flavours: with the Flags WORD set to 0, the Text member has to be interpreted as an ANSI string; with Flags set to 1, it represents a Unicode (UTF-16) string. Note that, normally, this does not change between the individual MESSAGE_RESOURCE_ENTRY blocks of a message table resource in a PE file. Message tables are typically created with the MS message compiler (mc.exe), which by default creates a message table with ANSI strings and, with the "-U" command line parameter, creates Unicode message table entries instead, which results in a slightly larger binary.
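Since ANSI entries depend on a code page while Unicode entries are plain UTF-16, a tool that prints entries has to treat the two flavours differently. The following sketch converts the Text bytes of a Unicode entry, stored little-endian in the PE file, to UTF-8 and stops at the terminating zero so that padding bytes are ignored; utf16le_entry_to_utf8 is a hypothetical helper. Converting ANSI entries correctly would additionally require knowing their code page, which this sketch does not attempt.

```c
#include <stddef.h>
#include <stdint.h>

/* Converts UTF-16LE entry text to NUL-terminated UTF-8. Stops at the first
   zero code unit (anything after it is padding). Unpaired surrogates become
   U+FFFD. Returns the UTF-8 byte count without the terminator, or
   (size_t)-1 if 'out' is too small. */
static size_t utf16le_entry_to_utf8(const uint8_t *text, size_t textBytes,
                                    char *out, size_t outSize)
{
    size_t i = 0, o = 0;
    while (i + 1 < textBytes) {
        uint32_t cp = (uint32_t)text[i] | ((uint32_t)text[i + 1] << 8);
        unsigned char buf[4];
        size_t n, k;
        i += 2;
        if (cp == 0) break;                              /* terminator */
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < textBytes) {
            uint32_t lo = (uint32_t)text[i] | ((uint32_t)text[i + 1] << 8);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {          /* surrogate pair */
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i += 2;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;                                 /* unpaired surrogate */
        }
        if (cp < 0x80) {
            buf[0] = (unsigned char)cp; n = 1;
        } else if (cp < 0x800) {
            buf[0] = (unsigned char)(0xC0 | (cp >> 6));
            buf[1] = (unsigned char)(0x80 | (cp & 0x3F)); n = 2;
        } else if (cp < 0x10000) {
            buf[0] = (unsigned char)(0xE0 | (cp >> 12));
            buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[2] = (unsigned char)(0x80 | (cp & 0x3F)); n = 3;
        } else {
            buf[0] = (unsigned char)(0xF0 | (cp >> 18));
            buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[3] = (unsigned char)(0x80 | (cp & 0x3F)); n = 4;
        }
        if (o + n + 1 > outSize) return (size_t)-1;
        for (k = 0; k < n; k++) out[o++] = (char)buf[k];
    }
    if (o + 1 > outSize) return (size_t)-1;
    out[o] = '\0';
    return o;
}
```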
The demo application
The application that comes with the source code of this article, msgdump.exe, simply enumerates all DLLs in the current working directory and prints their message table entries to stdout. To show whether a particular message table entry is a Unicode or an ANSI string, each entry printed to stdout starts with the lowercase letters "id" for ANSI strings, and with the uppercase letters "ID" for Unicode strings. An interesting experiment is to build the application, copy it into a directory listed in the %PATH% environment variable, open a console (cmd.exe), navigate to the Windows system32 directory, and run msgdump there. This dumps the message tables of all system DLLs.
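A minimal sketch of how such an output line could be assembled from an entry's Flags, message ID, language ID and text. Only the lowercase "id" versus uppercase "ID" convention comes from the description of msgdump; the rest of the line layout and the name format_dump_line are assumptions.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical line layout. Flags: 1 = Unicode -> "ID", 0 = ANSI -> "id".
   Returns the length snprintf would produce, as snprintf itself does. */
static int format_dump_line(char *out, size_t outSize, uint16_t flags,
                            uint32_t msgId, uint16_t langId, const char *utf8Text)
{
    const char *prefix = (flags == 1) ? "ID" : "id";
    return snprintf(out, outSize, "%s 0x%08lX lang 0x%04X: %s",
                    prefix, (unsigned long)msgId, (unsigned)langId, utf8Text);
}
```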
Other applications for this code
At the company I currently work for, we always ship binaries that are localized into German and English. Traditionally, keeping both resource variants in sync has been a problem. It frequently happened that, for a given resource, the German variant existed but the English one did not, and vice versa. Also, resources containing format strings suitable for sprintf or FormatMessage sometimes did not have their placeholders (such as %d, %s, %1, %2) in the same order, or in the same number, in both language variants, which occasionally led to "very interesting behaviour" (read: crashes) of the software, depending on the user's language preferences. I therefore wrote a tool named compres, which I will enhance in the near future to support scanning message tables as well, using the functionality outlined in this article. In a nutshell, compres is designed to run as part of an automated build process over all the EXE and DLL files produced by that build. If it finds a resource in one language but not in the other, or different format strings in the two languages it compares, it prints an error message describing the problem to stdout. Other possible applications for the code are localization tools, resource editors, or simply academic curiosity.
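The format string check described above can be approximated by extracting the placeholders of two strings in order and comparing the two sequences. The following is a simplified, hypothetical scanner, not compres itself: it recognizes printf-style conversions (with flags, width, precision and length modifiers) or FormatMessage inserts %1 to %99 with an optional !fmt! suffix, treats "%%" as a literal percent sign in both styles, and ignores other FormatMessage escapes such as %n. The caller chooses the style, because "%5d" means different things to sprintf and FormatMessage.

```c
#include <stddef.h>
#include <string.h>

/* Finds the next placeholder at or after *p; sets *tok/*len and returns 1,
   or returns 0 when none is left. fmStyle != 0 selects FormatMessage
   inserts, fmStyle == 0 selects printf conversions. */
static int next_placeholder(const char **p, const char **tok, size_t *len, int fmStyle)
{
    const char *s = *p;
    while ((s = strchr(s, '%')) != NULL) {
        const char *q = s + 1;
        if (*q == '%') { s = q + 1; continue; }            /* "%%" literal */
        if (fmStyle) {
            if (*q < '1' || *q > '9') { s = q; continue; } /* %n, %0, ... */
            q++;
            if (*q >= '0' && *q <= '9') q++;               /* up to %99 */
            if (*q == '!') {                               /* optional !fmt! */
                const char *end = strchr(q + 1, '!');
                if (end) q = end + 1;
            }
        } else {
            q += strspn(q, "-+ #0123456789.*");            /* flags, width, precision */
            q += strspn(q, "hlLqjzt");                     /* length modifiers */
            if (*q == '\0') { s = q; continue; }
            q++;                                           /* conversion character */
        }
        *tok = s;
        *len = (size_t)(q - s);
        *p = q;
        return 1;
    }
    return 0;
}

/* Returns 1 if both strings contain the same placeholders in the same order. */
static int same_placeholders(const char *a, const char *b, int fmStyle)
{
    const char *ta, *tb;
    size_t la, lb;
    for (;;) {
        int ha = next_placeholder(&a, &ta, &la, fmStyle);
        int hb = next_placeholder(&b, &tb, &lb, fmStyle);
        if (ha != hb) return 0;                 /* different number of placeholders */
        if (!ha) return 1;                      /* both exhausted */
        if (la != lb || memcmp(ta, tb, la) != 0) return 0;
    }
}
```

Comparing ordered sequences is deliberately strict. FormatMessage inserts are numbered, so a translation may legitimately reorder them; a real tool might compare those as sets while keeping the ordered comparison for printf-style strings, where order does matter.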
Points of interest
Using the demo application for this article, I looked at various operating system versions to see whether there are any peculiarities in their usage of message tables. As expected, a Windows 95 installation has all its message table entries encoded as ANSI strings, in order to save both disk and memory space. A typical Windows XP installation, as of today, has the majority of its message table entries encoded as Unicode strings. Another interesting point is that the demo application, which is built as a native x86 application, can also enumerate the message tables of native x64 DLLs on the x64 editions of Windows XP/2003. This demonstrates that the Win32 PE ("portable executable") format really deserves the "P" in its name.
History
- 06/10/2006 - Initial version of article and code.