How to use DbgHelp to access type information

Updated: 13.09.2004

Introduction

In this article I will describe how to access type information with the help of the DbgHelp library. This library provides a simple and easy to use interface to debug information, and is the foundation of various debugging utilities. But since type information access is a relatively new feature in DbgHelp, there is currently no good documentation or articles about it (at least none known to me). I assume that the reader is already familiar with the basics of using DbgHelp (more information about using the library can be found in its documentation or in other sources on the Internet; you can also take a look at these examples which show how to use DbgHelp in various scenarios).

What is type information? When we debug an application, the debugger does not only allow us to set breakpoints and step through the source code, but it also allows us to see the values of various variables. To be able to do it, the debugger needs to know the exact types of the variables. This is where type information comes to help the debugger. For every variable, debug information can include the description of its type. For example, for a simple integer variable the description will contain the type of the variable (int, unsigned int, etc.) and size. For a user-defined type, the description will contain the list of member variables and functions, base types, and so on. After reading this information, the debugger will know the exact data layout of the variable in memory, and will be able to read the correct values and display them to the user.

DbgHelp and DIA

Before we proceed to the details of type information access in DbgHelp, I would like to mention another library that serves a similar purpose. The DIA SDK, which is supplied with Visual Studio .NET, can also read and analyse debug information, including type information. The functionality provided by DIA is similar to that of DbgHelp. But while the DbgHelp interface is extremely simple, DIA uses a large set of COM interfaces, which makes DIA hard to learn. On the other hand, the DIA interface is richer and gives access to more information than DbgHelp (I mention some differences later in this article). And probably the main benefit of DIA is its documentation. Unlike DbgHelp, most DIA interfaces are well documented. Since both libraries work with the same kind of information, the DIA documentation is equally useful for DbgHelp users. Therefore I recommend the DIA documentation to everybody who wants to read type information using DbgHelp.

Symbols and tags

DbgHelp exposes functionality related to type information via an object-based interface, as most Win32 APIs do. As usual, the interface consists of a set of functions, which operate on objects of several predefined types. The type of an object defines the set of properties supported by the object, and the set of possible relationships between the object and other objects.

Following the popular naming convention, DbgHelp calls its objects “symbols”. An identifier, called a “tag”, represents the type of an object (symbol). At the time of this writing, 30 different tags are available (for the complete list of tags, see the SymTagEnum enumeration in the DbgHelp.h file). Every symbol supports the Tag property, which identifies the type of the symbol.

All symbol types (tags) can be divided into two groups. The first group consists of symbols that have a predefined location (e.g. a memory address or a register). Examples of such symbols are Data (SymTagData), Function (SymTagFunction) and PublicSymbol (SymTagPublicSymbol) symbols (Figure 1 lists some symbol types from this group, and more information about these and other symbols can be found in the DIA documentation). The most commonly used DbgHelp functions, such as SymEnumSymbols, SymFromName and SymFromAddr, work only with this group of symbols.

The second group consists of symbols that represent type information. With their help we can determine, for example, the type of a variable, the number of elements in an array, or offsets of a class’ member variables. Figure 2 lists some symbol types from this group.

In the beginning, DbgHelp could work only with the first group of symbols. It was possible to determine the address and size of a symbol and to obtain its name, but type information was not available. DbgHelp 5.1 introduced support for type information. Now, if we have a Data or Function symbol, we can use its Type property to look up the symbol that represents the type of the variable or function. Then we can use properties of that symbol to obtain more information about the type.

Accessing symbols

In order to access an object in a typical object-based interface, a handle to the object is usually needed. In DbgHelp, every symbol has a unique identifier – “index” – which serves as the object’s handle to various DbgHelp functions. For simplicity, we can assume that DbgHelp maintains an internal array of references to all available symbols, and that indices are used to look up references in that array.

How can we get the index of a symbol? A number of DbgHelp functions return information about symbols in SYMBOL_INFO structure, and Index member of the structure contains the index of the symbol. It is also possible to go in the opposite direction – using SymFromIndex function. SYMBOL_INFO structure also contains another index, TypeIndex, which is the index of the symbol that represents the type of the symbol described by the structure.

If we know the index of a symbol, we can access properties of the symbol with the help of SymGetTypeInfo function.

BOOL IMAGEAPI SymGetTypeInfo(
    IN  HANDLE                      hProcess,
    IN  DWORD64                     ModBase,
    IN  ULONG                       Index,
    IN  IMAGEHLP_SYMBOL_TYPE_INFO   PropId,
    OUT PVOID                       pData
  );

The first two parameters (process handle and module base address) are common to most DbgHelp functions (see DbgHelp documentation for more information about them). Index parameter specifies the index of the symbol whose properties we want to access. PropId parameter identifies the requested property (for the list of possible values, see DbgHelp documentation and IMAGEHLP_SYMBOL_TYPE_INFO enumeration). pData parameter is a pointer to the user allocated buffer, where SymGetTypeInfo will put the value of the property. The type of every available property (and therefore the size of the buffer) is specified in SymGetTypeInfo documentation.

The function returns TRUE in case of success, and FALSE in case of failure. If the function fails, GetLastError can be used to get more information about the error. The most typical error code is 1 (ERROR_INVALID_FUNCTION), which in this context means that the requested property is not supported by the symbol.

For some properties, the function allocates a block of memory and returns a pointer to it to the user (using pData parameter). Then it is the user’s responsibility to free the memory using LocalFree function.

Complete example of using SymGetTypeInfo to obtain the name of a symbol is shown in Figure 3.

The concept of symbol properties is defined in DIA documentation, where every symbol supports a set of properties. DbgHelp documentation does not use symbol property names and operates only with the members of IMAGEHLP_SYMBOL_TYPE_INFO enumeration. But since there is a clear correspondence between DIA properties and members of IMAGEHLP_SYMBOL_TYPE_INFO enumeration, I will use property names defined by DIA in the remainder of this article. Figure 4 contains the list of property names and the corresponding enumerators used by DbgHelp.

Parents and children

Symbols can participate in parent-child relationships. Such relationships allow symbols to expose additional information, beyond the reach of their properties. For example, a symbol representing a function type can have child symbols that represent parameters of the function. Another example is a symbol representing a user-defined type (a class, for example). One set of its child symbols is used to represent its base classes, while other sets of child symbols represent member variables and functions.

We have to call SymGetTypeInfo function twice to obtain the list of child symbols for a parent symbol. First, we call the function to obtain the number of children and allocate the buffer big enough to contain their indices, and then we call the function again to copy indices of the child symbols to the buffer. The whole process is shown in Figure 5.
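The two-call pattern described above can be sketched as follows. This is only an illustration under stated assumptions: to keep it compilable without the Windows headers, PropId and FindChildrenParams are local stand-ins for TI_GET_CHILDRENCOUNT, TI_FINDCHILDREN and TI_FINDCHILDREN_PARAMS from DbgHelp.h, and GetTypeInfoFn is a hypothetical callback that, in a real program, would simply forward to SymGetTypeInfo. FakeGetTypeInfo is an in-memory substitute used only to exercise the code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

// Local stand-ins so the sketch compiles anywhere. In a real program these
// come from DbgHelp.h: TI_GET_CHILDRENCOUNT, TI_FINDCHILDREN and
// TI_FINDCHILDREN_PARAMS.
enum PropId { GetChildrenCount, FindChildren };

struct FindChildrenParams {
    uint32_t Count;       // in: number of indices the buffer can hold
    uint32_t Start;       // in: position of the first child to return
    uint32_t ChildId[1];  // out: child indices (variable-length tail)
};

// Hypothetical callback; a real one would forward to
// SymGetTypeInfo(hProcess, modBase, index, prop, data).
using GetTypeInfoFn = std::function<bool(uint32_t index, PropId prop, void* data)>;

std::vector<uint32_t> GetChildren(const GetTypeInfoFn& getTypeInfo, uint32_t index)
{
    // First call: ask for the number of children.
    uint32_t count = 0;
    if (!getTypeInfo(index, GetChildrenCount, &count) || count == 0)
        return {};

    // Allocate the fixed header plus room for 'count' indices.
    const std::size_t headerSize = offsetof(FindChildrenParams, ChildId);
    std::vector<unsigned char> buffer(headerSize + count * sizeof(uint32_t));
    const FindChildrenParams header = { count, 0, { 0 } };
    std::memcpy(buffer.data(), &header, headerSize);

    // Second call: let the callee copy the child indices into the buffer.
    if (!getTypeInfo(index, FindChildren, buffer.data()))
        return {};

    std::vector<uint32_t> children(count);
    std::memcpy(children.data(), buffer.data() + headerSize, count * sizeof(uint32_t));
    return children;
}

// In-memory fake for demonstration only: symbol 1 has children 10, 11 and 12.
bool FakeGetTypeInfo(uint32_t index, PropId prop, void* data)
{
    static const uint32_t kChildren[] = { 10, 11, 12 };
    if (index != 1)
        return false;
    if (prop == GetChildrenCount) {
        const uint32_t n = 3;
        std::memcpy(data, &n, sizeof n);
        return true;
    }
    // FindChildren: read Count and Start, then fill in the requested indices.
    uint32_t request[2];
    std::memcpy(request, data, sizeof request);
    unsigned char* out = static_cast<unsigned char*>(data) + offsetof(FindChildrenParams, ChildId);
    for (uint32_t i = 0; i < request[0] && request[1] + i < 3; ++i)
        std::memcpy(out + i * sizeof(uint32_t), &kChildren[request[1] + i], sizeof(uint32_t));
    return true;
}
```

Calling GetChildren(FakeGetTypeInfo, 1) yields the three child indices. The buffer is sized from the offset of ChildId rather than from sizeof the structure, so that exactly Count indices fit after the header.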

Basic types

Now we know enough theory to start working with real symbols. Let’s begin with the simplest. The BaseType symbol type (identified by the SymTagBaseType tag) is used to represent basic types like integer or floating point numbers, strings, and so on. This symbol type supports only two properties – BaseType and Length. The BaseType property contains a value from the BasicType enumeration (see Figure 6), which specifies the type represented by the symbol. Looking at the contents of the BasicType enumeration, it is clear that it does not specify the size of the type (for example, single and double precision floating point types are represented by the same value, btFloat). This is where the second property, Length, comes into play. This property specifies the size of the type in bytes, which makes it possible to distinguish between similar types of different sizes.

Figure 7
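As a rough illustration of how BaseType and Length combine, the sketch below maps the two property values to a C++ type name. The enumeration is a partial local copy whose values are assumed to match the BasicType enumeration in the DIA SDK header cvconst.h (verify against your own headers), and BasicTypeName is a hypothetical helper, not a DbgHelp function. The mapping covers only a few common cases.

```cpp
#include <cstdint>
#include <string>

// Partial local copy of BasicType; values assumed to match cvconst.h.
enum BasicType {
    btNoType = 0, btVoid = 1, btChar = 2, btWChar = 3,
    btInt = 6, btUInt = 7, btFloat = 8, btBool = 10,
    btLong = 13, btULong = 14
};

// baseType: value of the BaseType property; length: value of the Length property.
std::string BasicTypeName(BasicType baseType, uint64_t length)
{
    switch (baseType) {
    case btVoid:  return "void";
    case btChar:  return "char";
    case btWChar: return "wchar_t";
    case btBool:  return "bool";
    case btLong:  return "long";
    case btULong: return "unsigned long";
    case btInt:
    case btUInt: {
        // The same enumerator covers several integer sizes; Length picks one.
        const std::string prefix = (baseType == btUInt) ? "unsigned " : "";
        if (length == 2) return prefix + "short";
        if (length == 4) return prefix + "int";
        if (length == 8) return prefix + "long long";
        break;
    }
    case btFloat:
        // Single and double precision share btFloat and differ only in Length.
        if (length == 4) return "float";
        if (length == 8) return "double";
        break;
    default:
        break;
    }
    return "<unknown basic type>";
}
```

Combinations that the sketch does not recognise fall through to a placeholder string rather than a guess.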

Type definitions

Another simple symbol type is Typedef (SymTagTypedef), which represents type definitions. It also supports two properties. The first property, Name, specifies the name of the type definition. The second property, Type, contains the index of the symbol that represents the underlying type of the type definition. For example, the Type property of the type definition shown in Figure 8 contains the index of a BaseType symbol.

Figure 8

Pointers

Pointer types are represented by PointerType symbol (SymTagPointerType). Its Type property contains the index of the symbol that represents the type the pointer points to. Length property contains the size of the pointer in bytes.

Figure 9

Pointer-to-pointer types are represented with the help of two PointerType symbols.

Figure 10

Unfortunately, DbgHelp offers no way to determine whether a pointer is actually a reference. PointerType symbols support a boolean Reference property, but it is not exposed via the SymGetTypeInfo function. If your application needs this information, DIA is a better option.

Arrays

ArrayType symbol (SymTagArrayType) represents arrays. The most important properties are Type and Length. Type property contains the index of the symbol that represents the type of array elements. Length property contains the size of the array in bytes. To obtain the number of elements in the array, the value of its Length property can be divided by the size of the symbol referenced by Type property. There is also a simpler way – Count property contains the number of elements in the array.

Figure 11

In case of multidimensional arrays, Type property can point to another ArrayType symbol.

Figure 12
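The division of Length values extends naturally to nested ArrayType symbols. The sketch below is a simplified model rather than DbgHelp code: ArrayDimensions is a hypothetical helper that takes the Length values collected while following the Type property from the outermost ArrayType symbol down to the element type, and returns the number of elements in each dimension.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// lengths[0] is the Length of the outermost ArrayType symbol; each following
// entry is the Length of the symbol referenced by the previous symbol's Type
// property; the last entry is the Length of the (non-array) element type.
std::vector<uint64_t> ArrayDimensions(const std::vector<uint64_t>& lengths)
{
    std::vector<uint64_t> dims;
    for (std::size_t i = 0; i + 1 < lengths.size(); ++i) {
        if (lengths[i + 1] == 0)
            return {};  // malformed input: avoid division by zero
        // Elements in this dimension = size of this array / size of its element.
        dims.push_back(lengths[i] / lengths[i + 1]);
    }
    return dims;
}
```

For a declaration such as int a[3][4] with a 4-byte int, the collected lengths would be 48, 16 and 4, and the function returns the dimensions 3 and 4.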

Functions

The FunctionType symbol (SymTagFunctionType) represents the type of a function. Its properties give the number of function arguments (Count property), the type of the function’s return value (Type property), and the calling convention (CallingConvention property).

More information about function arguments can be obtained with the help of child symbols. For each argument, there is a child FunctionArgType symbol, whose Type property contains the index of the argument type symbol.

For member functions, ClassParent property contains the index of the symbol that represents the class (UDT symbol type, described below). Also, Count property takes “this” parameter into account.

Figure 13

User defined types (UDT)

User defined types are represented by UDT symbol (SymTagUDT). The most important properties of this symbol type are UdtKind, Name and Length. UdtKind property specifies whether the user defined type is a class, structure or union (see UdtKind enumeration for more details). Name property contains the name of the type. Length property specifies the size of the type in bytes.

Significant amount of information about user defined types is available via child symbols. Member variables and members of a union are represented as child Data symbols, and member functions are represented as child Function symbols.

Unfortunately, DbgHelp offers no way to determine whether a member function is virtual. It is also impossible to determine the access specifier of a member function or variable. If your application needs this information, use DIA.

There is also no explicit way to tell whether a member function is static or not. But there is a workaround – Count property of the function type symbol takes into account “this” pointer. If the value of Count property is the same as the number of child FunctionArgType symbols, there is no “this” pointer and the function is static. If the value is not the same, there is “this” pointer and the function is not static.
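The comparison behind this workaround is small enough to show directly. In the sketch below, ClassifyMemberFunction is a hypothetical helper that takes the Count property of the function type symbol and the number of its child FunctionArgType symbols. The extra Inconsistent result is my own defensive addition for the case where the two numbers differ by more than one; the article's rule only distinguishes equal from not equal.

```cpp
#include <cstdint>

enum class MemberKind { Static, NonStatic, Inconsistent };

// count:       value of the Count property of the FunctionType symbol
// argChildren: number of child FunctionArgType symbols
MemberKind ClassifyMemberFunction(uint32_t count, uint32_t argChildren)
{
    if (count == argChildren)
        return MemberKind::Static;      // no hidden "this" parameter
    if (count == argChildren + 1)
        return MemberKind::NonStatic;   // Count includes "this"
    return MemberKind::Inconsistent;    // unexpected data; treat with care
}
```

The check is only meaningful for member functions, that is, when the ClassParent property of the function type is available.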

Base classes are also represented as child symbols. For every base class of a UDT, there is a child BaseClass symbol (SymTagBaseClass). The Type property of this symbol contains the index of the UDT symbol that represents the base class. The Offset property gives the offset of the base class data inside the derived class.

The VirtualBaseClass property of the BaseClass symbol indicates whether the base class is virtual (the property value is non-zero if it is virtual, and zero if it is not). If the base class is virtual, its Offset property does not work, and the VirtualBasePointerOffset property should be used to determine the displacement of the virtual base pointer inside the class data. Then the virtual base pointer can be used to obtain the offset of the base class with the help of the virtual base table.

Figure 14

Locations of variables

When working with variables, it is often useful to know whether the variable is global, static, local, static local, etc. DbgHelp allows us to find this out with the help of the DataKind property. This property contains a value from the DataKind enumeration. The table in Figure 15 describes the available data kinds.

DataKind             Description
DataIsLocal          Local variable.
DataIsStaticLocal    Static local variable.
DataIsParam          Parameter of a function.
DataIsObjectPtr      "this" pointer of a member function.
DataIsFileStatic     Static variable.
DataIsGlobal         Global variable, or static member of a user-defined type.
DataIsMember         Non-static member of a user-defined type.
DataIsStaticMember   Static member of a user-defined type that is imported from another DLL. Such symbols do not have a location in the current module (because they are actually located in another DLL).
DataIsConstant       Constant.

Putting it all together

We have now seen a number of symbol types and tags. But how do we analyse the type of a function or a variable? The process boils down to the following steps:

1. Obtain the index of the type symbol (it is usually stored in TypeIndex member of SYMBOL_INFO structure; SymGetTypeInfo function can also return it, when TI_GET_TYPEID is specified).

2. Obtain the tag of the type symbol (using SymGetTypeInfo function with TI_GET_SYMTAG).

3. Access properties of the type symbol to get more information about the type.

4. If necessary, analyse related symbols too. For example, if we have a pointer type, use its Type property to get information about the type the pointer points to.
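The steps above can be sketched with a simplified in-memory model. Everything here is an assumption made for illustration: Tag, TypeSymbol and SymbolTable are hypothetical stand-ins for what a real program would read with SymGetTypeInfo (the tag, and the Type, Length and Name properties), and for BaseType symbols the model stores a ready-made display name instead of deriving it from the BaseType and Length properties. DescribeType looks at the tag of a type symbol and follows its Type property recursively.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Simplified model of the type symbols discussed in this article.
enum class Tag { BaseType, PointerType, ArrayType, Typedef };

struct TypeSymbol {
    Tag tag;
    uint32_t typeIndex;  // Type property (unused for BaseType)
    uint64_t length;     // Length property
    std::string name;    // Name property (display name for BaseType here)
};

using SymbolTable = std::map<uint32_t, TypeSymbol>;

std::string DescribeType(const SymbolTable& symbols, uint32_t index, int depth = 0)
{
    const auto it = symbols.find(index);
    if (it == symbols.end() || depth > 32)  // unknown index or runaway chain
        return "<unknown>";

    const TypeSymbol& s = it->second;
    switch (s.tag) {  // step 2: decide what to do based on the tag
    case Tag::BaseType:
        return s.name;
    case Tag::Typedef:
        return s.name + " (typedef of " + DescribeType(symbols, s.typeIndex, depth + 1) + ")";
    case Tag::PointerType:
        // Step 4: follow the Type property to the pointed-to type.
        return "pointer to " + DescribeType(symbols, s.typeIndex, depth + 1);
    case Tag::ArrayType: {
        // Element count = array Length / element Length.
        const auto elem = symbols.find(s.typeIndex);
        const uint64_t count = (elem != symbols.end() && elem->second.length != 0)
                                   ? s.length / elem->second.length
                                   : 0;
        return "array of " + std::to_string(count) + " " +
               DescribeType(symbols, s.typeIndex, depth + 1);
    }
    }
    return "<unknown>";
}

// Sample data: int, int*, int**, int[10] and a typedef of int*.
SymbolTable SampleSymbols()
{
    return {
        { 1, { Tag::BaseType,    0, 4,  "int"  } },
        { 2, { Tag::PointerType, 1, 4,  ""     } },
        { 3, { Tag::PointerType, 2, 4,  ""     } },
        { 4, { Tag::ArrayType,   1, 40, ""     } },
        { 5, { Tag::Typedef,     2, 4,  "PINT" } },
    };
}
```

With the sample table, index 3 is described as a pointer to a pointer to int, and index 4 as an array of 10 int. The depth limit is a defensive choice for the sketch, guarding against a malformed chain of Type references.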

To see type information access in action, take a look at the TypeInfoDump sample application. The application loads symbols for the executable specified on the command line, enumerates all functions and variables, and displays detailed information about their types. Type information access is independent of the reporting part and is implemented as a C++ class (CTypeInfoDump), so it can be reused.

Probably the most interesting part of the sample code is CTypeInfoDump::DumpSymbolType function. Starting with the index of a function or variable symbol, it obtains the index of its type symbol and extracts (with the help of CTypeInfoDump::DumpType function) as much information about the type as possible.

There is another class, CTypeInfoText, which contains overridable functions that convert various type information data to human readable format. CTypeInfoText::GetTypeName function is interesting, because it can obtain the complete type definition for a function or variable.

Contact

Have questions or comments? Feel free to contact Oleg Starodumov at firstname@debuginfo.com.