MFC D3D Application: Direct3D Tutorial: Part I

hector [.j.] rivas

Oct 4, 2006

CPOL

Yet another Direct3D framework, this time for MFC apps, with a step by step tutorial

Download source files - 58.6 KB

Screenshot - mfcd3d.gif

Introduction

Taming the Microsoft® Direct3D® animal is a huge undertaking for those new to it, or to 3D graphics programming, as I was some time ago. Once you browse the SDK samples, read some of the tutorials, try out some of the code samples, and ultimately decide that you want to program your application or game engine using it, you are left with the question of where to start. That is where I was about a year ago, so I went deep into a code walkthrough of a sample framework application. I read tons and tons of material along the way in a soul-searching mode to try and understand what is going on (literally) behind the scenes.

The thing is that the SDK help is really telegraphic and the forums are for seasoned graphics programmers. Furthermore, it is version 9 we're talking about (actually 10 nowadays), meaning that the technology has been out there for probably more than a decade. It almost feels like the whole thing was designed by the Direct3D team to be used by themselves and themselves only. But do not despair: there are lots of tutorials out there and here is yet another one. Before we begin, though, I will bore you to death with a full-fledged testimony on managed vs. unmanaged Direct3D, for the sake of those still wondering what managed code is. Feel free to skip to the next section to get things going!

Managed vs. unmanaged code

Apart from the various versions, nowadays Direct3D comes in two different flavors, namely, managed and unmanaged Direct3D. Unmanaged Direct3D is for those of us C++ programmers having nothing to do with the .NET Framework, accessing the core Direct3D API directly, and using the Direct3D extensions (D3DX) utility library. Managed Direct3D is an abstraction layer to unmanaged Direct3D, which for the most part provides a one-to-one mapping of the unmanaged interfaces, structures and enumerations to the managed classes, structures and enumerations.

So here is your second chance to skip this section, for that is all you really need to know. If you are still stuck here, just so you know, the entire .NET paradigm works around managed code. This leads to the question of, "What is managed code?" The answer comes courtesy of Brad Abrams; I just did a little editing.

Managed code is code that has its execution managed by the .NET Framework Common Language Runtime or CLR. It refers to a contract of cooperation between natively executing code and the runtime. This contract specifies that at any point of execution, the runtime may stop and retrieve information specific to the current CPU instruction address, namely the runtime state, such as register or stack memory contents.

The necessary information is encoded in an Intermediate Language (IL) and a set of symbols, a.k.a. metadata, that describe all of the entry points and the constructs (e.g., methods and properties) and their characteristics. The CLR is the most commercially successful version of the Common Language Infrastructure (CLI) standard, which describes how the information is to be encoded, so that compilers can emit the correct encoding. This setup allows every popular programming language -- from COBOL to Camel, in addition to C#, J#, VB .Net, Jscript .Net, and C++ from Microsoft -- to produce managed code as Portable Executable (PE) files containing IL and metadata.

Before the code is run, the IL is compiled into native executable code by a runtime-aware compiler that knows how to target the managed execution environment. This allows the latter to insert traps and appropriate garbage collection hooks, handle exceptions, ensure type safety, check array bounds and ultimately make guarantees about what the code is going to do. Effectively, this eliminates an entire set of programming mistakes that often lead to security holes.

In contrast, unmanaged executable files are basically a binary image loaded into memory. The OS knows about a program counter and that is all. Surely there are protections in place around memory management and port I/O and so on, but the system does not actually know what the application is doing. Therefore it cannot make any guarantees about what happens when the application runs.

In any case, this article deals with unmanaged Direct3D, but most of the concepts discussed apply to managed Direct3D also; they are just handled in a different manner. Besides that, any tidy C++ programmer should know better than to leave memory leaks for hackers to wreak havoc, so let us just get...

Back to the start

This article includes a custom version of the Direct3D 9 SDK framework, which comprises a set of wrapper classes that make Direct3D ready for embedding in an MFC (Microsoft Foundation Classes) application, and somewhat of a beginners' Direct3D tutorial. You do need to know C++ and Windows programming, of course. You are entitled to ask why I would reinvent the wheel. Well, in the first place, so that I could understand it. Secondly, because I like to add comments as I myself begin to understand it, and ultimately, because I like my code to be correctly indented, for heaven's sake. Yes, I am a perfectionist, so bear with me. You have gotten this far already and I proclaim this as a disclaimer: I "pimped" the SDK framework! ;-)

Another disclaimer, and credits: some passages are directly copied from the SDK help. Sometimes you do need to be telegraphic, like in the next paragraph:

"To use Direct3D, you first create an application window and then you create and initialize Direct3D objects. You use the Component Object Model (COM) interfaces that these objects implement to manipulate them and to create other objects required to render a scene. Applications written in C++ access these interfaces and objects directly."

Ok, so we need a window to render to and some COM savvy, but not a serious amount of it. The window can be the top-most window for a fullscreen application, which is the usual case for a game. It can also be a control in a dialog for which a window handle can be obtained. This is the case for, say, a level editor in which you need a rendering of the level's geometry and tons of controls (namely, editor tools). The demo project included covers the second case, but with a little extra effort it can be extended to switch modes.

For the purposes of this article, the sample included is a Document/View architecture MFC SDI (Single Document Interface) form-based application, created with the Visual C++ 6.0 application wizard. You can also create an MDI (Multiple Document Interface) application and use the Direct3D headers and CPP files. The sample renders to the client area of a control owned by the view. The first interesting thing my MFC program does in terms of being a Direct3D application is to derive a class inheriting from both the CWnd and the CXD3D classes:

class CD3DWnd : public CXD3D, public CWnd

So, you can say that CD3DWnd has a mommy and a daddy.

The CXD3D class

CD3DWnd is the base class for a control in the form, e.g. a picture box. The control will perform as a regular CWnd and provide its CXD3D nature with the render window handle, invoke some functions to start up Direct3D and override some methods to actually render the scene. So, let's examine the CXD3D class:

//---------------------------------------------------------
// CXD3D class: the class a view class will derive from to
// provide a window handle to render into, and that will
// override the 3D scene rendering.
//---------------------------------------------------------
class CXD3D
{
protected:
    // internal state variables

    bool m_bActive; // toggled on Pause, can be queried upon
                    // initializing to issue a CreateD3D
 
    bool m_bStartFullscreen; // queried on
                             // ChooseInitialSettings
 
    bool m_bShowCursor;   // in fullscreen mode
    bool m_bClipCursor;   // in fullscreen mode

    bool m_bWindowed;   // queried on
                        // BuildPresentParamsFromSettings

    
    bool m_bIgnoreSizeChange; // queried on
                              // HandlePossibleSizeChange
    
    bool m_bDeviceLost;            // set when Present reports a lost device
    bool m_bDeviceObjectsInited;   // InitDeviceObjects succeeded
    bool m_bDeviceObjectsRestored; // RestoreDeviceObjects succeeded
 
    // internal timing variables

    FLOAT m_fTime;        // absolute execution time
    FLOAT m_fElapsedTime; // elapsed time
    FLOAT m_fFPS;         // the frames per second rate
 
    // statistics

    TCHAR m_strDeviceStats[256]; // device description
    TCHAR m_strFrameStats[16];   // frames per second
 
    // main D3D objects

    HWND m_hWndRender; // device window
    HWND m_hWndFocus;  // focus window

    LPDIRECT3D9           m_pd3d;       // main D3D object
    LPDIRECT3DDEVICE9     m_pd3dDevice; // rendering device
    D3DPRESENT_PARAMETERS m_d3dpp;      // present. params.
 
    DWORD m_dwCreateFlags; // sw/hw VP + pure device

    DWORD m_dwWindowStyle; // saved for mode switches
 
    RECT m_rcWindow;   // window and client rects,
    RECT m_rcClient;   // saved for mode switches
 
    // setup objects

    CXD3DEnum Enumeration;   // adapters, modes, etc.
    CXD3DSettings Settings;  // current display settings
...
};

Do not try to bend the spoon just yet. Let's start with just a few pointers about the state flags. Most of them are easy to master, but I will elaborate on some of them:

The active flag is queried by the view upon initialization to issue a CXD3D::CreateD3D. The fullscreen start and the windowed flags may seem somewhat contradictory. For the time being, just accept that there are many situations in which a Direct3D application may need to switch from the fullscreen to the windowed mode and vice versa. The clip cursor (in fullscreen mode) flag indicates whether the application confines the cursor to the render target, just in case the desktop of the user's PC spans multiple monitors. The ignore size changes flag, when set to false (the default), will let a windowed application reset the Direct3D environment when the user changes the window size.

Apart from that, the last three state flags refer to a "device," meaning a Direct3D device. So what in the world is a Direct3D device?

Direct3D devices

A Direct3D device is the rendering component of Direct3D; it encapsulates and stores the rendering state. In addition, a Direct3D device performs transformations and lighting operations, as well as rasterizes an image to a surface. You can tell things are starting to get better and worse at the same time, so sit back while I expand a little on that definition.

The rendering state is none other than the what, when, where and how of displaying 3D objects. Transformations and lighting (or TL) operate on the objects' vertices, performing all of the pertinent 3D math and color calculations, respectively. Rasterization is the process of turning transformed and lit 3D vertices (TL vertices) into something a PC's graphics card (and ultimately the screen) can handle, namely 2D pixels.

Architecturally, each operation comprises a separate device module: the transformation module, the lighting module and the rasterization module. The first two modules operate on vertices, which is why they are also referred to as Vertex Processing, or VP for short. Now is a good time to go back to bed (or school) if you do not know what a vertex is, but in a geometry for dummies fashion, I will let you in on the secret: a vertex is the location of a point in space, usually described by its x, y and z Cartesian coordinates. 3D graphics programming pipelines use vertices as the base unit for creating triangles or other polygons, literally by connecting the dots. A good number of correctly oriented, colored and lit triangular faces create the smoothness illusion of a real 3D object.

So if you are still here, let's get back on track. Direct3D devices may support hardware VP, depending on the display adapter and driver. This means that a particular adapter may have specialized hardware that can take in "raw" vertex data and perform the transformation and lighting, relieving Direct3D from such tasks. The problem is that a different display adapter may not offer such a feature. In general, applications should provide within a single device both hardware and software VP functionality in order to take advantage of both or even to mix them. However, they should also default to software VP -- which is available in virtually all cases -- if hardware VP is not available.
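The fallback rule above can be sketched in a few lines. This is only an illustration: CapsBits, VertexProcessing and ChooseVertexProcessing are hypothetical names, and the capability bit is a stand-in rather than the SDK's actual constant.

```cpp
#include <cstdint>

// Hypothetical stand-in for a device capability bit; the value is
// illustrative and is not taken from the SDK headers.
enum CapsBits : std::uint32_t
{
    CAPS_HW_TRANSFORM_AND_LIGHT = 1u << 0  // device offers hardware VP
};

enum class VertexProcessing { Software, Mixed, Hardware };

// Picks a vertex processing type for a device described by devCaps.
// preferMixed is an assumed application-level switch.
VertexProcessing ChooseVertexProcessing(std::uint32_t devCaps, bool preferMixed)
{
    const bool hasHardwareVP = (devCaps & CAPS_HW_TRANSFORM_AND_LIGHT) != 0;

    // Software VP is the fallback when the adapter has no hardware VP.
    if (!hasHardwareVP)
        return VertexProcessing::Software;

    return preferMixed ? VertexProcessing::Mixed : VertexProcessing::Hardware;
}
```

Calling ChooseVertexProcessing(0, true) yields the software type regardless of the preference, which is the "default to software" behavior described above.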

Incidentally, the performance of hardware VP is comparable to that of software VP, provided the vertex data is available at specific locations for each VP type. Software VP works best when the data is in system memory, while hardware VP works best when the data resides in driver-optimal memory: either local video memory, non-local video memory, system memory or even AGP memory. More important than where, though, is the fact that the choice is up to the device driver. Notice that this is not the case for rasterization, for which specialized graphics hardware is usually faster than the PC's processor.

Ok, so it's a good time for a coffee break and letting Direct3D devices sink in. To summarize, they process vertices either with specialized hardware or in software, and turn the output into a pixel raster so that it can be rendered on a surface. Besides that, they are the Direct3D interface through which you set or query the rendering state.

By the way, the SDK definition of a device ends in the word "surface," instead of "screen," or "window." A surface is somewhat of a generalization, for at times it might represent only a part of the screen or window. To be rigorous, in the Direct3D context, a surface represents a linear area of video memory. In practice, it is a rectangular portion of the video memory.
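To see how a linear block of memory can behave as a rectangle, here is a minimal sketch. LinearSurface is a hypothetical type, not a Direct3D interface; it assumes each row starts a fixed number of bytes (the pitch) after the previous one, and that the pitch may be larger than the visible row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified surface: a linear block of bytes viewed as rows.
struct LinearSurface
{
    std::size_t width;          // pixels per row
    std::size_t height;         // number of rows
    std::size_t bytesPerPixel;  // e.g. 4 for a 32-bit format
    std::size_t pitch;          // bytes from one row start to the next
    std::vector<std::uint8_t> bytes;

    LinearSurface(std::size_t w, std::size_t h, std::size_t bpp, std::size_t rowPitch)
        : width(w), height(h), bytesPerPixel(bpp), pitch(rowPitch), bytes(rowPitch * h, 0)
    {
        assert(rowPitch >= w * bpp);  // a row must fit inside its pitch
    }

    // Byte offset of pixel (x, y) inside the linear block.
    std::size_t OffsetOf(std::size_t x, std::size_t y) const
    {
        assert(x < width && y < height);
        return y * pitch + x * bytesPerPixel;
    }
};
```

For a 640x480 surface at 4 bytes per pixel and a pitch of 2560 bytes, the pixel at (0, 1) sits 2560 bytes into the block, right where the second row begins.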

Back from the break, let's get into device types.

Direct3D device types

Applications using Direct3D do not access video graphics cards directly; they call Direct3D functions and methods. Direct3D in turn accesses the hardware through the Hardware Abstraction Layer (HAL). The HAL is a hardware-specific, manufacturer-specific interface -- either embedded in the driver or supplied in a DLL -- that Direct3D uses to work directly with the display hardware, therefore insulating applications from card-specific implementation details. Microsoft maintains that if the computer that your application is running on supports a HAL-type device, it will gain the best performance by using it. However, there are other types of creatable Direct3D devices.

The second type of device you can create is a reference device, which uses special CPU instructions whenever it can. What the special instructions might be depends heavily on the hardware. They include the 3DNow! instruction set on some AMD (Advanced Micro Devices) processors, the MMX (Multi Media Extensions) instruction set supported by many Intel processors and the SSE (Streaming SIMD (Single Instruction Multiple Data) Extensions) instruction set on some Intel processors. Any type of Direct3D device may use these instruction sets to accelerate VP, but a reference device is the one allowed to use them to rasterize whenever it can.

Applications cannot rely on reference devices -- a.k.a. reference rasterizers -- to perform on every user machine because of the obvious hardware disparities. This is why Microsoft recommends using reference devices for feature testing or demonstration purposes only. In fact, this is so much the case that Direct3D may succeed in creating a reference device, but the device may not be able to render at all! In a nutshell, you need a state-of-the-art graphics card to create a working reference device.

The third type of device is the software device. If the user's PC provides no special hardware acceleration for rasterization, your application might use a software device to emulate it. It will also make use of the special instruction sets when it can, but it will definitely run slower than a HAL device. Software devices are not readily available. In fact, they are intended for use by driver developers, relying on the Direct3D Driver Development Kit (DDK). So, do not even bother looking for them, as not even Microsoft has one to offer. Besides that, they must be loaded by the application and registered as a plug-in to the Direct3D interface. We just will not mention them anymore.

Finally, to make matters worse, although not admitted as a type, there's a variant of creatable Direct3D devices known as pure devices. A pure device can improve performance, but requires hardware VP, does not support querying some Direct3D states and does not filter any redundant state changes. This means that if you have, say, 1000 render state changes per frame, you may be better off with the redundancy filtering done automatically by a non-pure device. What are render state changes, again? Well, anything "animated" from one frame to the next, but worse, for animation does not apply exclusively to geometry. You may animate by changing colors, textures, visual effects, you name it.
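To picture what "filtering redundant state changes" means, consider this small sketch. It is not how the Direct3D runtime is implemented; FilteredStateCache is a hypothetical class that simply remembers the last value set for each state and forwards a change only when the value differs.

```cpp
#include <cstdint>
#include <map>

// Hypothetical sketch: forwards a state change only when the value
// differs from the last one recorded for that state.
class FilteredStateCache
{
public:
    // Returns true if the change was forwarded, false if it was redundant.
    bool Set(std::uint32_t state, std::uint32_t value)
    {
        std::map<std::uint32_t, std::uint32_t>::iterator it = m_current.find(state);
        if (it != m_current.end() && it->second == value)
        {
            ++m_filtered;   // same value as before: nothing to do
            return false;
        }
        m_current[state] = value;
        ++m_forwarded;      // a real device would apply the change here
        return true;
    }

    unsigned Forwarded() const { return m_forwarded; }
    unsigned Filtered() const { return m_filtered; }

private:
    std::map<std::uint32_t, std::uint32_t> m_current;
    unsigned m_forwarded = 0;
    unsigned m_filtered = 0;
};
```

An application that talks to a pure device gets no such safety net, so a cache of this kind would have to live in the application if redundant changes are frequent.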

Pure devices are designed to maximize the performance of ship-ready applications, meaning that they are not suitable for debugging. As with all performance issues, the only way to know whether or not your application will perform better with a pure device is to compare execution under both types.

Hopefully your machine supports the HAL device type, which encapsulates the hardware capabilities and which, for most cases, is the best choice. That is, until you prove that your application runs smoothly and 3 times faster on a pure hardware VP device or you become a serious Id® Software competitor! Yes, the makers of Doom, but who didn't know that, right?

Back to the CXD3D class...

Interesting things start to happen at CXD3D::CreateD3D.

//----------------------------------------------------------
// CreateD3D(): provided m_hWndRender has been initialised, it
// instantiates the d3d object, chooses initial d3d
// settings and initializes the d3d stuff.
//----------------------------------------------------------
HRESULT CXD3D::CreateD3D()
{
    HRESULT hr;

    // check for a window to render to
    if (m_hWndRender == NULL)
        return DisplayErrorMsg(D3DAPPERR_NOWINDOW, MSGERR_CANNOTCONTINUE);
 
    // instantiate a D3D Object
    if ((m_pd3d = Direct3DCreate9(D3D_SDK_VERSION)) == NULL)
        return DisplayErrorMsg(D3DAPPERR_NODIRECT3D, MSGERR_CANNOTCONTINUE);

    // build a list of D3D adapters, modes and devices
    if (FAILED(hr = Enumeration.Enumerate(m_pd3d)))
    {
        SAFE_RELEASE(m_pd3d);
        
        return DisplayErrorMsg(hr, MSGERR_CANNOTCONTINUE);
    }

    // use the device window as the focus window, unless otherwise specified
    if (m_hWndFocus == NULL)
        m_hWndFocus = m_hWndRender;

    // save some window properties into class members
    m_dwWindowStyle = GetWindowLong(m_hWndRender, GWL_STYLE);
    
    GetWindowRect(m_hWndRender, &m_rcWindow);
    GetClientRect(m_hWndRender, &m_rcClient);

    // choose the best settings to render
    if (FAILED(hr = ChooseInitialSettings()))
    {
        SAFE_RELEASE(m_pd3d);
    
        return DisplayErrorMsg(hr, MSGERR_CANNOTCONTINUE);
    }

    // initialize the timer
    DXUtil_Timer(TIMER_START);

    // initialize the app's custom (pre-device creation) stuff
    if (FAILED(hr = OneTimeSceneInit()))
    {
        SAFE_RELEASE(m_pd3d);
        
        return DisplayErrorMsg(hr, MSGERR_CANNOTCONTINUE);
    }

    // initialize the 3D environment, creating the device
    if (FAILED(hr = InitializeEnvironment()))
    {
        SAFE_RELEASE(m_pd3d);
        
        return DisplayErrorMsg(hr, MSGERR_CANNOTCONTINUE);
    }

    // D3D is ready to go so unpause it
    Pause(false);

    return S_OK;
}

The first thing CXD3D::CreateD3D checks for is a non-null m_hWndRender member. Then it will instantiate a D3D object with Direct3DCreate9(D3D_SDK_VERSION). Upon success, our member D3D object pointer to the IDirect3D9 COM interface is initialised, so we can continue. Next, we need to build a list of display adapters, modes and devices using the CXD3D "Enumeration" internal setup object, a CXD3DEnum class object.
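Every failure path in CreateD3D releases the D3D object through SAFE_RELEASE. The sketch below shows what a macro of that kind typically does, using a mock reference-counted object in place of a real COM interface. MockComObject is hypothetical, and the macro body follows the common pattern from the SDK utility headers, so the one shipped with the framework may differ in detail.

```cpp
#include <cstddef>

// Release the interface if the pointer is set, then null the pointer so
// that a second SAFE_RELEASE on the same variable is harmless.
#ifndef SAFE_RELEASE
#define SAFE_RELEASE(p) { if (p) { (p)->Release(); (p) = NULL; } }
#endif

// Hypothetical stand-in for a COM object with a reference count.
class MockComObject
{
public:
    static MockComObject* Create() { return new MockComObject; }

    unsigned long AddRef() { return ++m_refs; }

    // Destroys the object when the last reference goes away.
    unsigned long Release()
    {
        const unsigned long remaining = --m_refs;
        if (remaining == 0)
            delete this;
        return remaining;
    }

private:
    MockComObject() : m_refs(1) {}
    ~MockComObject() {}
    unsigned long m_refs;
};
```

Nulling the pointer is the part that makes the repeated SAFE_RELEASE(m_pd3d) calls safe: once released, the member can no longer be released twice by accident.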

Here's the thing: we need a way to know how many display adapters are present in the machine, even though there's usually just one. Then each adapter may be hosting one or more devices. For each device there will be a myriad of formats, settings and capabilities that may or may not be suited for the purposes of the application. So, we need a bunch of lists to keep track of the information about each device in each adapter, all in order to choose the "best" in terms of either application constraints or a "faster is better" criterion.

The enumeration class

So, let's look at the enumeration class: it sets up application constraints for resolution, color, alpha, display formats, back buffer formats, depth/stencil buffer formats, multisampling types, presentation intervals, the usage of a depth buffer and the usage of mixed (both hardware and software) VP. Too scary? Do not worry; I will go through the meaning of each at a moderate pace. What you need to understand right now is that the enumeration class holds the application requirements pertaining to a D3D device.

//----------------------------------------------------------
// CXD3DEnum class: enumerates D3D adapters, devices, etc.
//----------------------------------------------------------
class CXD3DEnum
{
...
    // application constraints

    bool AppUsesMixedVP;     // whether the app can take advantage
                             // of the mixed VP type
   
    UINT AppMinFullscreenWidth;   // app min fullscreen width
    UINT AppMinFullscreenHeight;  // app min fullscreen height

    UINT AppMinRGBBits;      // min RGB bits per channel
    UINT AppMinAlphaBits;    // min alpha bits per pixel
     
    bool AppUsesDepthBuffer; // whether the app uses a depth buffer
     
    UINT AppMinDepthBits;   // min depth bits
    UINT AppMinStencilBits; // min stencil bits

    // app-allowed constraint lists

    DWORDARRAY AppDisplayFormats;
    DWORDARRAY AppBackBufferFormats;
    DWORDARRAY AppDepthStencilFormats;
    DWORDARRAY AppMultiSamplingTypes;
    
    // list of enumerated AdapterInfos
    AdapterInfoArray AdapterInfos;
...
};

Let's keep in mind that the key to getting Direct3D set up and ready to run is to enumerate, enumerate and then enumerate. We start with a set of minimal requirements, so that anything below them gets filtered out. This is implemented through a series of application constraints, the set of variables with the "App" prefix of CXD3DEnum.

The first constraint, AppUsesMixedVP, simply lets the framework turn the usage of a mixed VP on or off, although not at runtime. We mentioned the mixed VP when discussing devices, so track back if you missed it. The fullscreen width and height in pixels -- a.k.a. the display mode resolution -- are constrained. By default, the minimum is set as 640x480. Color bit depth (the number of bits used for each color channel) and alpha bit depth (the number of bits used for the level of opacity or transparency) are also constrained, by AppMinRGBBits and AppMinAlphaBits, respectively. Each of the remaining application constraints deserves a heading, which means that we will examine them in detail.
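The "anything below the minimum gets filtered out" idea can be sketched as follows for the resolution constraint. DisplayMode, ModeConstraints and FilterModes are hypothetical names for illustration only; the framework's own enumeration code is more involved.

```cpp
#include <vector>

// Hypothetical, simplified display mode description.
struct DisplayMode
{
    unsigned width;
    unsigned height;
    unsigned refreshRate;
};

// Minimum fullscreen resolution, defaulting to 640x480 as in the text.
struct ModeConstraints
{
    unsigned minWidth = 640;
    unsigned minHeight = 480;
};

// Keeps only the modes that meet both minimums.
std::vector<DisplayMode> FilterModes(const std::vector<DisplayMode>& all,
                                     const ModeConstraints& constraints)
{
    std::vector<DisplayMode> kept;
    for (const DisplayMode& mode : all)
    {
        if (mode.width >= constraints.minWidth && mode.height >= constraints.minHeight)
            kept.push_back(mode);
    }
    return kept;
}
```

With the default constraints, a 320x240 mode and a 640x400 mode are both dropped, while 640x480 and 800x600 survive.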

Display formats

Ok, so here's something interesting. The number of bits for both color and alpha that a particular device can support is encoded in what is known as a display format. The enumeration class holds a list of display formats in AppDisplayFormats, which by default includes every possible one that Direct3D can handle:

D3DFMT_R5G6B5      // 16-bit, 6 for green
D3DFMT_X1R5G5B5    // 16-bit, 5 per channel
D3DFMT_A1R5G5B5    // 16-bit, 1 for alpha
D3DFMT_X8R8G8B8    // 32-bit, 8 per channel
D3DFMT_A8R8G8B8    // 32-bit, 8 for alpha
D3DFMT_A2R10G10B10 // 32-bit, 2 for alpha

As a futile exercise, you could try finding out how many colors each format can produce. Notice that formats added to the list must be in synch with AppMinRGBBits and AppMinAlphaBits, i.e. if your application requires 8-bit alpha, there's no point in adding the less than 8-bit alpha formats to the list.
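For those who would rather let the compiler do the futile exercise, here is a sketch. PixelFormat is a hypothetical stand-in for the six D3DFORMAT values listed above, and MeetsConstraints assumes that the RGB minimum applies to the narrowest color channel, which may not be exactly how the framework interprets AppMinRGBBits.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for the six display formats listed above.
enum class PixelFormat { R5G6B5, X1R5G5B5, A1R5G5B5, X8R8G8B8, A8R8G8B8, A2R10G10B10 };

struct ChannelBits { unsigned red, green, blue, alpha; };

// Bits per channel, read straight off each format's name.
ChannelBits BitsOf(PixelFormat format)
{
    switch (format)
    {
    case PixelFormat::R5G6B5:      return {5, 6, 5, 0};
    case PixelFormat::X1R5G5B5:    return {5, 5, 5, 0};
    case PixelFormat::A1R5G5B5:    return {5, 5, 5, 1};
    case PixelFormat::X8R8G8B8:    return {8, 8, 8, 0};
    case PixelFormat::A8R8G8B8:    return {8, 8, 8, 8};
    case PixelFormat::A2R10G10B10: return {10, 10, 10, 2};
    }
    return {0, 0, 0, 0};
}

// Distinct colors = 2^(R+G+B); alpha and unused bits add no colors.
std::uint64_t ColorCount(PixelFormat format)
{
    const ChannelBits bits = BitsOf(format);
    return std::uint64_t(1) << (bits.red + bits.green + bits.blue);
}

// Assumption: the RGB minimum is checked against the narrowest channel.
bool MeetsConstraints(PixelFormat format, unsigned minRGBBits, unsigned minAlphaBits)
{
    const ChannelBits bits = BitsOf(format);
    const unsigned narrowest = std::min({bits.red, bits.green, bits.blue});
    return narrowest >= minRGBBits && bits.alpha >= minAlphaBits;
}
```

R5G6B5 comes out at 65,536 colors and X8R8G8B8 at 16,777,216, while a call such as MeetsConstraints(PixelFormat::A1R5G5B5, 5, 8) fails because one alpha bit is short of the eight required.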

A note on lists and Direct3D enumerations

Direct3D formats are defined in a huge enum in d3d9types.h named D3DFORMAT, forced to a 32-bit (DWORD) size as most Direct3D enums are. DWORDARRAY is a template-based array of DWORDs, a typedef for CTArray<DWORD>. The template class, CTArray, implemented in tarray.h, works like a standard array of objects of any type -- i.e. you can use the [] operator -- but it encapsulates other array-handling functions such as Append, Find and Sort, and it cleans up after itself. The CTArray class replaces the default SDK framework use of the CArrayList class included in dxutil.h/dxutil.cpp. As to why I don't use the STL std::list or std::vector template-based classes, well, I don't and that's that.
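To give an idea of the shape of such a container, here is a sketch. It is not the CTArray from tarray.h: TinyArray is a hypothetical class, the method signatures are guesses based only on the names Append, Find and Sort, and it leans on std::vector for storage, which the real class does not.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of a small array template; not the real CTArray.
template <typename T>
class TinyArray
{
public:
    void Append(const T& item) { m_items.push_back(item); }

    // Returns the index of the first match, or -1 if the item is absent.
    int Find(const T& item) const
    {
        typename std::vector<T>::const_iterator it =
            std::find(m_items.begin(), m_items.end(), item);
        return it == m_items.end() ? -1 : static_cast<int>(it - m_items.begin());
    }

    // Sorts in ascending order using operator<.
    void Sort() { std::sort(m_items.begin(), m_items.end()); }

    std::size_t Length() const { return m_items.size(); }

    T& operator[](std::size_t i)
    {
        assert(i < m_items.size());
        return m_items[i];
    }

    const T& operator[](std::size_t i) const
    {
        assert(i < m_items.size());
        return m_items[i];
    }

private:
    std::vector<T> m_items;  // freed automatically when the array is destroyed
};

// Assumed analogue of DWORDARRAY, with unsigned long standing in for DWORD.
typedef TinyArray<unsigned long> DwordArraySketch;
```

A format list would then be built with a few Append calls and queried with Find, which returns -1 when a format is not allowed.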

Back buffers

Back buffering is analogous to the way you can do animation with a pad of paper, known as page flipping. On each page, the artist changes the figure slightly so that when you flip rapidly between sheets, the drawing appears animated.

Direct3D implements this functionality through a swap chain. A swap chain is a series of Direct3D buffers that flip to the screen in the way that the artist's paper flips to the next page. The first buffer is referred to as the color front buffer. Applications write to the buffers behind it, i.e. the back buffers, and then flip the front buffer so that one back buffer appears onscreen. While the system displays the image, your software is again writing to another back buffer. When the process executes continuously, it allows for an efficient method to animate images.

A back buffer is therefore a non-visible surface -- i.e. a memory block -- to which bitmaps and other images can be drawn. This is opposed to the visible front buffer that displays the currently visible image. A swap chain is a collection of buffers that are swapped in turns to create a smooth animation. Back buffers also have a color/alpha format and, by default, the enumeration list holds every possible one that Direct3D defines in AppBackBufferFormats, which amounts to the same list of display formats.
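The page-flipping idea can be sketched with a ring of buffers and an index that marks the visible one. SwapChainSketch is a hypothetical class, not the Direct3D swap chain interface, and each "buffer" here is just an integer holding a frame number.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of a swap chain as a ring of buffers.
class SwapChainSketch
{
public:
    explicit SwapChainSketch(std::size_t bufferCount)
        : m_buffers(bufferCount, 0), m_front(0)
    {
        assert(bufferCount >= 2);  // one front buffer plus at least one back buffer
    }

    std::size_t FrontIndex() const { return m_front; }
    std::size_t BackIndex() const { return (m_front + 1) % m_buffers.size(); }

    // "Draws" by storing a frame number in the next back buffer.
    void DrawToBack(int frame) { m_buffers[BackIndex()] = frame; }

    // Flips: the back buffer just drawn becomes the visible front buffer.
    void Present() { m_front = BackIndex(); }

    int VisibleFrame() const { return m_buffers[m_front]; }

private:
    std::vector<int> m_buffers;
    std::size_t m_front;
};
```

Drawing never touches the visible buffer: after DrawToBack(1) the viewer still sees the old frame, and only Present makes frame 1 visible.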

For windowed applications, the back buffer format does not need to match the display mode format if the hardware supports color conversion, e.g., turning an X8R8G8B8 into an R5G6B5. In this case, the runtime will allow any valid back buffer format to be presented to any desktop format. The exception is for the 8 bits per pixel (256 colors) modes because devices typically do not operate in such modes anymore. They probably had to 10+ years ago, though.

On the other hand, fullscreen applications cannot do color conversion. So, the back buffer format must be identical in all respects to the display format except in the alpha channel bits. This is because fullscreen applications' display formats cannot contain an alpha channel, but back buffers can. So if the display format is D3DFMT_X1R5G5B5, valid back buffer formats include D3DFMT_X1R5G5B5 and D3DFMT_A1R5G5B5, but exclude D3DFMT_R5G6B5 (notice the 6-bit green). In general, applications are better off avoiding color conversion, and matching the back buffer and display formats.
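The matching rules for windowed and fullscreen modes can be condensed into a small predicate. This is a sketch of the rules as stated in the text, not of what Direct3D checks internally; Fmt, WithoutAlpha and IsValidBackBuffer are hypothetical names, and the format list is deliberately short.

```cpp
// Hypothetical stand-in for a handful of D3DFORMAT values.
enum class Fmt { R5G6B5, X1R5G5B5, A1R5G5B5, X8R8G8B8, A8R8G8B8 };

// Maps a format to its alpha-less counterpart, so that only the color
// layout takes part in the comparison.
Fmt WithoutAlpha(Fmt format)
{
    switch (format)
    {
    case Fmt::A1R5G5B5: return Fmt::X1R5G5B5;
    case Fmt::A8R8G8B8: return Fmt::X8R8G8B8;
    default:            return format;
    }
}

bool HasAlpha(Fmt format) { return WithoutAlpha(format) != format; }

// Fullscreen: the display format carries no alpha and the back buffer must
// match it except for alpha bits. Windowed: a mismatch is tolerated only
// if the hardware can convert colors.
bool IsValidBackBuffer(Fmt display, Fmt backBuffer, bool windowed, bool hwColorConversion)
{
    if (!windowed)
        return !HasAlpha(display) && WithoutAlpha(backBuffer) == display;

    return WithoutAlpha(backBuffer) == WithoutAlpha(display) || hwColorConversion;
}
```

With a fullscreen X1R5G5B5 display, both X1R5G5B5 and A1R5G5B5 pass as back buffers and R5G6B5 is rejected, matching the example in the paragraph above.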

Incidentally, the last format -- the 10-bit per channel -- is only available as a display format in fullscreen modes and fast PCs with really cool graphics cards. One can dream, so give it a couple of years before PCs, display adapters and Direct3D all support 32 bits per channel as display formats. Now that will be something, definitely beyond "true" color! I'll coin it as "super true" color! Back to Earth, though, I guess ILM and Dreamworks imagineers already use these formats on a day-to-day basis. So much for that.

Depth/stencil buffer formats

A depth buffer, often called a z-buffer, holds depth information (z-coordinate values) used to determine how 3D objects occlude one another. Usually implemented by the hardware, z-buffers solve the problem of determining which elements in a scene are drawn in front of others, so that those which are not can be hidden. This saves some execution time and memory space. There are also w-buffers, which use the homogeneous w-coordinates from the point's (x,y,z,w) location in projection space. However, they are not supported as widely in hardware as z-buffers and you will really need to polish your matrix algebra to use them.
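The occlusion test a z-buffer performs boils down to one comparison per pixel. The sketch below assumes depth values between 0 (near) and 1 (far) and a strict "nearer wins" comparison; DepthBufferSketch is a hypothetical class, and real devices let you choose the comparison function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of a z-buffer paired with a color buffer.
class DepthBufferSketch
{
public:
    DepthBufferSketch(std::size_t width, std::size_t height)
        : m_width(width), m_height(height),
          m_depth(width * height, 1.0f),  // cleared to the farthest depth
          m_color(width * height, 0u)
    {
    }

    // Writes the pixel only if it is nearer than what is already stored.
    bool Plot(std::size_t x, std::size_t y, float z, unsigned color)
    {
        assert(x < m_width && y < m_height);
        const std::size_t i = y * m_width + x;
        if (z >= m_depth[i])
            return false;  // occluded by something nearer (or equally near)
        m_depth[i] = z;
        m_color[i] = color;
        return true;
    }

    unsigned ColorAt(std::size_t x, std::size_t y) const
    {
        assert(x < m_width && y < m_height);
        return m_color[y * m_width + x];
    }

private:
    std::size_t m_width;
    std::size_t m_height;
    std::vector<float> m_depth;
    std::vector<unsigned> m_color;
};
```

Plotting a pixel at depth 0.5 and then another at depth 0.8 in the same spot leaves the first color in place, no matter in which order the objects were drawn.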

On the other hand, a stencil buffer is analogous to a real life stencil, which is a "cutout" surface that when laid on top of another surface, allows some of it to be visible through the cutouts or holes, but the rest to be occluded. In our context, a stencil is typically used to mask pixels in an image. The more common visual effects achieved through stencils are known as decaling and outlining. In Direct3D, both depth and stencil buffer formats are combined in the same group of constants:

D3DFMT_D16     // 16-bit z-buffer
D3DFMT_D15S1   // 16-bit z-buffer, 1-bit stencil
D3DFMT_D24X8   // 32-bit z-buffer, 24-bit depth
D3DFMT_D24S8   // 32-bit z-buffer, 8-bit stencil
D3DFMT_D24X4S4 // 32-bit z-buffer, 4-bit stencil
D3DFMT_D32     // 32-bit z-buffer

Hence, the depth/stencil format combination and single enumeration function. The enumeration class depth/stencil formats list, AppDepthStencilFormats, allows every one of these formats by default. As in the case of display and back buffer formats, depth/stencil formats must be in synch with the AppMinDepthBits and AppMinStencilBits constraints. Additionally, you may disable the usage of a depth buffer altogether by setting AppUsesDepthBuffer to false.
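Keeping the format list in synch with the minimum depth and stencil bits can be sketched like this. DepthFmt is a hypothetical stand-in for the six constants listed above, with the bit counts read off their names; IsAllowed is an illustrative helper, not a framework function.

```cpp
// Hypothetical stand-in for the six depth/stencil formats listed above.
enum class DepthFmt { D16, D15S1, D24X8, D24S8, D24X4S4, D32 };

struct DepthStencilBits { unsigned depth, stencil; };

// Bit counts as spelled out in each format's name.
DepthStencilBits BitsOf(DepthFmt format)
{
    switch (format)
    {
    case DepthFmt::D16:     return {16, 0};
    case DepthFmt::D15S1:   return {15, 1};
    case DepthFmt::D24X8:   return {24, 0};
    case DepthFmt::D24S8:   return {24, 8};
    case DepthFmt::D24X4S4: return {24, 4};
    case DepthFmt::D32:     return {32, 0};
    }
    return {0, 0};
}

// True if the format offers at least the requested depth and stencil bits.
bool IsAllowed(DepthFmt format, unsigned minDepthBits, unsigned minStencilBits)
{
    const DepthStencilBits bits = BitsOf(format);
    return bits.depth >= minDepthBits && bits.stencil >= minStencilBits;
}
```

An application asking for 24 depth bits and 8 stencil bits is left with D24S8 only, so there is no point in keeping the other five in its list.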

Multisampling

Multisampling is the technique used by Direct3D to perform full-scene antialiasing, that is, to diminish the stairstep-like look of lines that should be smooth, effectively blurring the edges of each polygon in the scene. Multisampling reduces the prominence of such artifacts by sampling the surrounding pixels of each edge and creating color gradients around them. It can also be used in conjunction with multiple rendering passes, altering a different subset of the sample in each rendering pass to simulate some cool visual effects like motion blur, depth-of-field focus effects, reflection blur and so on.
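A rough way to picture where the color gradients come from is the averaging step sketched below. It assumes that each pixel keeps several color samples and that the final color is their plain average; actual hardware may weigh or place samples differently. Rgb and ResolvePixel are hypothetical names.

```cpp
#include <cassert>
#include <vector>

struct Rgb { unsigned r, g, b; };

// Resolves one pixel by averaging its color samples channel by channel.
Rgb ResolvePixel(const std::vector<Rgb>& samples)
{
    assert(!samples.empty());

    unsigned r = 0, g = 0, b = 0;
    for (const Rgb& s : samples)
    {
        r += s.r;
        g += s.g;
        b += s.b;
    }

    const unsigned n = static_cast<unsigned>(samples.size());
    return { r / n, g / n, b / n };  // integer average per channel
}
```

A pixel on the edge of a white polygon over a black background, with two of its four samples covered, resolves to a mid gray, whereas a pixel fully inside the polygon keeps its color.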

Direct3D multisampling types directly indicate the number of samples available for full-scene antialiasing. The exception is D3DMULTISAMPLE_NONMASKABLE (with a value of 1), which rather enables the multisample quality level. The quality level introduced in version 9.0 of Direct3D can be used to factor the number of samples and achieve a particular visual quality-to-performance ratio. Consider that you find antialiasing support for 6 samples with 4 quality levels. You can use and compare the 6/1, 6/2 and 6/3 ratios for the presentation. This means that the final color intensity (each RGB channel) will be factored that much and you can determine which one makes for the best quality-performance tradeoff.

The enumeration class adds every multisampling type to the corresponding list, AppMultiSamplingTypes, by default.

Enumeration structures

Now that the enumeration has a set of application constraints, we will get into its implementation structures and methods. To hold the information about adapters, devices per adapter, settings and the capabilities of each device, the enumeration class implements a series of hierarchical structures. Look again into the CXD3DEnum class declaration. The last member of the class is an array of AdapterInfo structures. So, here is the declaration of AdapterInfo:

//----------------------------------------------------------
// AdapterInfo: info about a display adapter, and a typedef
// for an array of them
//----------------------------------------------------------
struct AdapterInfo
{
     int                    AdapterOrdinal;
     D3DADAPTER_IDENTIFIER9 AdapterIdentifier;
     D3DDISPLAYMODEARRAY    DisplayModes;
     DeviceInfoArray        DeviceInfos;
};
typedef CTArray<AdapterInfo> AdapterInfoArray;

AdapterInfo holds an adapter's ordinal number (0 for the primary or default display adapter), a D3DADAPTER_IDENTIFIER9 structure, an array of adapter DisplayModes and an array of DeviceInfos. Incidentally, right above the declaration you will find a typedef CTArray<D3DDISPLAYMODE> D3DDISPLAYMODEARRAY; statement. Essentially, AdapterInfo holds information about an adapter, the display modes it can handle and a list of devices it can provide, encapsulated by the last array.

//----------------------------------------------------------
// DeviceInfo: info about a D3D device; we'll use arrays of
// these, hence the typedef
//----------------------------------------------------------
struct DeviceInfo
{
    int              AdapterOrdinal;
    D3DDEVTYPE       DevType;
    D3DCAPS9         Caps;   
    DeviceComboArray DeviceCombos;
};
typedef CTArray<DeviceInfo> DeviceInfoArray;

A DeviceInfo structure inherits the adapter's ordinal from its parent, AdapterInfo. It holds a device type -- HAL, reference or software -- as defined by the D3DDEVTYPE enum, as well as a D3DCAPS9 device capabilities structure and an array of DeviceCombos.

//----------------------------------------------------------
// DeviceCombo: a combination of adapter format and
// back buffer format that is compatible with a particular
// D3D device and the application. We will also use arrays
// of them, hence the typedef.
//----------------------------------------------------------
struct DeviceCombo
{
    int        AdapterOrdinal;
    D3DDEVTYPE DevType;   
    D3DFORMAT  DisplayFormat;
    D3DFORMAT  BackBufferFormat;   
    bool       Windowed;
    DWORDARRAY VPTypes;
    DWORDARRAY DSFormats;
    DWORDARRAY MSTypes;
    DWORDARRAY MSQualityLevels;
    DSMSConflictArray DSMSConflicts;
    DWORDARRAY PresentIntervals;
};
typedef CTArray<DeviceCombo> DeviceComboArray;

A DeviceCombo structure inherits the adapter's ordinal and the device type from its parent DeviceInfo. It also holds a display format and a back buffer format. The combination of both formats gives the structure its name. Besides that, it holds a windowed/fullscreen flag and DWORD lists of VP types, depth/stencil formats, multisampling types and multisampling quality levels. It also keeps track of which depth/stencil formats are not compatible with which multisampling types in a DSMSConflict array and, finally, a DWORD list of presentation intervals. We will discuss the last two later on.

Enumeration walkthrough

So the enumeration class goes on filling up these structures, checking capabilities against application constraints and querying the Direct3D device for support. Use the next graph as a reference if you get lost. Just remember that DeviceCombo members are also lists, not expanded for clarity.

Enumeration
|
+-- AdapterInfos[0]
|   |
|   +-- DisplayModes[0]
|   +-- DisplayModes[1]
|   ...
|   |
|   +-- DeviceInfos[0]
|   |   |
|   |   +-- DeviceCombos[0]
|   |   |   |
|   |   |   +-- VPTypes
|   |   |   +-- DSFormats
|   |   |   +-- MSTypes
|   |   |   +-- MSQualityLevels
|   |   |   +-- DSMSConflicts
|   |   |   +-- PresentIntervals
|   |   +-- DeviceCombos[1]
|   |   ...
|   +-- DeviceInfos[1]
|   ...
+-- AdapterInfos[1]
...
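To make the nesting concrete, here is a minimal sketch of how such a hierarchy could be walked with three nested loops. The structures are simplified stand-ins (hypothetical names ending in Sketch, std::vector instead of CTArray, most members left out), so take it as an illustration rather than the framework's own code.

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-ins for the article's structures. std::vector replaces
// CTArray and only the members needed for the walk are kept.
struct DeviceComboSketch { bool Windowed; };
struct DeviceInfoSketch  { std::vector<DeviceComboSketch> DeviceCombos; };
struct AdapterInfoSketch { std::vector<DeviceInfoSketch>  DeviceInfos; };

// Walks adapters -> devices -> combos and counts the windowed combos.
std::size_t CountWindowedCombos(const std::vector<AdapterInfoSketch>& adapters)
{
    std::size_t count = 0;
    for (const AdapterInfoSketch& adapter : adapters)
        for (const DeviceInfoSketch& device : adapter.DeviceInfos)
            for (const DeviceComboSketch& combo : device.DeviceCombos)
                if (combo.Windowed)
                    ++count;
    return count;
}
```

The same three loops are all it takes to search the tree for a combo with any other property, such as a particular back buffer format.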

Adapter enumeration

It all starts with the CXD3DEnum::Enumerate function taking in a pointer to the Direct3D interface. It will keep a local reference to it in its m_pd3d object, use it to get an adapter count and traverse adapters to store identifiers in the AdapterInfos array. Once the function IDs an adapter, it uses the GetAdapterModeCount and EnumAdapterModes Direct3D methods to retrieve and store display modes. Both methods take in a display format -- incidentally, an enhancement of Direct3D version 9.0 -- so we will pass them each of our application-defined display formats. Enumerated modes are returned in a Direct3D D3DDISPLAYMODE structure:

typedef struct _D3DDISPLAYMODE
{
    UINT Width;
    UINT Height;
    UINT RefreshRate;
    D3DFORMAT Format;
} D3DDISPLAYMODE;

Right away we can check a display mode's dimensions, color bit depth and alpha bit depth for compatibility with the application. Again: we ID an adapter, traverse app-defined display formats and enumerate display modes for each format. An available display mode meeting the application's requirement goes into the DisplayModes list. Look at the graph again for more insight. Some display modes might not make it into the list, either because the display format is not supported at all (and GetAdapterModeCount returns 0) or because it did not pass the tests, e.g. the 320x200 fullscreen mode when AppMinFullscreenWidth is 640. However, when one does, the display format is appended to a temporary list used later on to enumerate devices on each adapter.
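The filtering step described above might look roughly like the following sketch. ModeSketch stands in for D3DDISPLAYMODE with the format already reduced to its bit counts, and ModeConstraints is a hypothetical bundle of the application constraints. Only the minimum fullscreen width is named in the text; the other fields are assumptions for the sake of the example.

```cpp
#include <vector>

// Stand-in for D3DDISPLAYMODE, with the format reduced to its bit counts.
struct ModeSketch
{
    unsigned Width;
    unsigned Height;
    unsigned RefreshRate;
    unsigned ColorChannelBits; // bits per color channel of the format
    unsigned AlphaBits;        // alpha bits of the format
};

// Hypothetical bundle of application constraints.
struct ModeConstraints
{
    unsigned MinFullscreenWidth;
    unsigned MinFullscreenHeight;
    unsigned MinColorChannelBits;
    unsigned MinAlphaBits;
};

// True when a display mode passes every constraint.
bool MeetsConstraints(const ModeSketch& m, const ModeConstraints& c)
{
    return m.Width            >= c.MinFullscreenWidth
        && m.Height           >= c.MinFullscreenHeight
        && m.ColorChannelBits >= c.MinColorChannelBits
        && m.AlphaBits        >= c.MinAlphaBits;
}

// Keeps only the modes the application can live with, preserving order.
std::vector<ModeSketch> FilterModes(const std::vector<ModeSketch>& modes,
                                    const ModeConstraints& c)
{
    std::vector<ModeSketch> kept;
    for (const ModeSketch& m : modes)
        if (MeetsConstraints(m, c))
            kept.push_back(m);
    return kept;
}
```

With a minimum fullscreen width of 640, a 320x200 mode is dropped while 640x480 and anything larger stays in the list.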

After traversing every application-defined display format and collecting every display mode of the adapter that makes the application happy, we sort the display modes so that the smallest and fastest bubbles up to the top of the list. The CTArray class implements sorting with qsort, taking in a callback sorting function. To sort display modes, the enumeration uses the SortModesCallback function at the top of the CXD3DEnum.cpp file. At this point, Enumerate has identified a display adapter and enumerated every display mode it can handle and that the application allowed. The function passes the adapter and the temp subset of display formats to the EnumerateDevices function.
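For those who have never written a qsort callback, here is a minimal sketch of one. It is not the actual SortModesCallback: the ordering (width and height ascending, then refresh rate descending) is my reading of "smallest and fastest first", and ModeSketch is a stand-in for D3DDISPLAYMODE without the format member.

```cpp
#include <cstdlib>

// Stand-in for D3DDISPLAYMODE, format omitted.
struct ModeSketch
{
    unsigned Width;
    unsigned Height;
    unsigned RefreshRate;
};

// qsort-style callback: negative when a goes first, positive when b goes
// first, zero when both are equivalent. Assumed ordering: smaller
// dimensions first, then higher refresh rates first.
int SortModesSketch(const void* a, const void* b)
{
    const ModeSketch* m1 = static_cast<const ModeSketch*>(a);
    const ModeSketch* m2 = static_cast<const ModeSketch*>(b);

    if (m1->Width != m2->Width)
        return m1->Width < m2->Width ? -1 : 1;
    if (m1->Height != m2->Height)
        return m1->Height < m2->Height ? -1 : 1;
    if (m1->RefreshRate != m2->RefreshRate)
        return m1->RefreshRate > m2->RefreshRate ? -1 : 1;
    return 0;
}
```

A plain array of modes can then be sorted with std::qsort(modes, count, sizeof(ModeSketch), SortModesSketch).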

Device enumeration

This is the process of filling up the DeviceInfo list for the passed-in adapter. There will be at most three DeviceInfos for each adapter, namely a HAL, a reference and a software device. So, we ask Direct3D for support of each type and store its capabilities (if supported) in the Caps member, with a single IDirect3D9::GetDeviceCaps call. Device types that do not make it through the call are skipped. Those that do move on to the enumeration of their respective DeviceCombos, again using the passed-in display formats list.

DeviceCombo enumeration

So, we turn to EnumerateDeviceCombos for a particular device and the passed-in list of display formats. This will most probably be a subset of AppDisplayFormats. We will start by traversing these formats and will retrieve each one in turn. Now we traverse the application-allowed back buffer formats, retrieve one, skip it if it does not meet the alpha bit depth, and check if both display and back buffer formats can be used concurrently on the device in both windowed and fullscreen modes (hence, the third inner loop). Support is found through the IDirect3D9::CheckDeviceType API call.
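The three nested loops can be sketched as follows. To keep the example self-contained, formats are plain integers and the IDirect3D9::CheckDeviceType query is replaced by a caller-supplied callback. Every name here is hypothetical and the real function does more work per combo.

```cpp
#include <functional>
#include <initializer_list>
#include <vector>

// A display/back buffer format pair plus the windowed flag.
struct FormatComboSketch
{
    int  DisplayFormat;
    int  BackBufferFormat;
    bool Windowed;
};

// Stand-in for the IDirect3D9::CheckDeviceType query.
using CheckTypeFn = std::function<bool(int display, int backBuffer, bool windowed)>;
// Stand-in for a helper returning the alpha bits of a format.
using AlphaBitsFn = std::function<unsigned(int format)>;

std::vector<FormatComboSketch> EnumerateCombosSketch(
    const std::vector<int>& displayFormats,
    const std::vector<int>& backBufferFormats,
    unsigned minAlphaBits,
    const AlphaBitsFn& alphaBits,
    const CheckTypeFn& deviceSupports)
{
    std::vector<FormatComboSketch> combos;
    for (int display : displayFormats)
    {
        for (int backBuffer : backBufferFormats)
        {
            // Skip back buffer formats below the required alpha depth.
            if (alphaBits(backBuffer) < minAlphaBits)
                continue;

            // Third loop: fullscreen first, then windowed.
            for (bool windowed : { false, true })
                if (deviceSupports(display, backBuffer, windowed))
                    combos.push_back({ display, backBuffer, windowed });
        }
    }
    return combos;
}
```

Passing the support query as a callback is only a convenience for the sketch, so that the loop structure can be tried out without a Direct3D object at hand.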

The format combo that makes it through the call yields a system-supported DeviceCombo, but it still needs to be checked against other application constraints for compatibility: namely a VP type, a depth/stencil format, a multisampling type (and quality levels), conflict between the last two, and a presentation interval. Each of these constraints corresponds to the lists maintained by a DeviceCombo and each has its own enumeration function.

VP types enumeration

HAL devices may support the three different VP types: software, hardware and mixed, as described previously. To find support for hardware VP, the framework checks Caps.DevCaps for the D3DDEVCAPS_HWTRANSFORMANDLIGHT flag. If set, the device supports it and therefore it also supports the mixed VP type, although it will only be used if the AppUsesMixedVP flag is set. The VP types enumeration function also checks capabilities for the pure device type, setting the VP type to pure hardware VP so that the framework can take advantage of this and create a pure device. If hardware transformations and lighting are not supported, the framework defaults to software VP, which is always available.
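As a rough sketch of that decision logic, consider the following. The capability bits are stand-in constants (real code would test D3DDEVCAPS_HWTRANSFORMANDLIGHT and D3DDEVCAPS_PUREDEVICE in Caps.DevCaps), the enum is hypothetical, and the order in which the types are listed is an assumption.

```cpp
#include <vector>

// Stand-in capability bits. Real code would use the D3DDEVCAPS_* flags.
const unsigned long kHwTransformAndLight = 0x1;
const unsigned long kPureDevice          = 0x2;

// Hypothetical VP type enum for the sketch.
enum VPTypeSketch { SoftwareVP, MixedVP, HardwareVP, PureHardwareVP };

// Lists the VP types a device can offer, most capable first.
std::vector<VPTypeSketch> EnumerateVPTypesSketch(unsigned long devCaps,
                                                 bool appUsesMixedVP)
{
    std::vector<VPTypeSketch> types;
    if (devCaps & kHwTransformAndLight)
    {
        if (devCaps & kPureDevice)
            types.push_back(PureHardwareVP);
        types.push_back(HardwareVP);
        // Mixed VP is supported too, but only listed on request.
        if (appUsesMixedVP)
            types.push_back(MixedVP);
    }
    // Software VP is always available.
    types.push_back(SoftwareVP);
    return types;
}
```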

A note on device capabilities

The framework inspects device capabilities for hardware VP support, presentation intervals or null reference devices only. However, your application may need to check for other stuff in order to work. Suppose your application requires hardware support for volume textures, whatever they are, so at some point it will have to check the Caps.TextureCaps member for the D3DPTEXTURECAPS_VOLUMEMAP flag. If it is not set, the hardware simply does not support them and your application should exit gracefully and tell the user to go get a new display adapter.

Multisampling enumeration

Another application constraint is the allowed multisampling types, filled up by default with every possible type in the CXD3DEnum constructor. This enumeration function actually asks Direct3D for support of each one on the render-target surface (the back buffer), filtering out the unsupported ones with a Direct3D CheckDeviceMultiSampleType call. The call also returns, upon success, the number of quality levels for the type. Both values -- type and quality levels -- make it into their corresponding lists. Different multisampling types can have different quality levels, so make sure you access both lists in sync when passing the values to other API calls and structures.
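The point about keeping both lists in sync is easier to see in code. In this sketch the CheckDeviceMultiSampleType call is replaced by a caller-supplied callback, the lists are std::vector instead of DWORDARRAY, and all the names are hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Two parallel lists: MSQualityLevels[i] belongs to MSTypes[i].
struct MultiSampleListsSketch
{
    std::vector<unsigned> MSTypes;
    std::vector<unsigned> MSQualityLevels;
};

// Stand-in for CheckDeviceMultiSampleType: returns true when the type is
// supported and writes the number of quality levels.
using SupportFn = std::function<bool(unsigned type, unsigned* qualityLevels)>;

MultiSampleListsSketch EnumerateMultiSampleSketch(
    const std::vector<unsigned>& allowedTypes, const SupportFn& query)
{
    MultiSampleListsSketch lists;
    for (unsigned type : allowedTypes)
    {
        unsigned levels = 0;
        if (query(type, &levels))
        {
            // Always append to both lists together so the indices match.
            lists.MSTypes.push_back(type);
            lists.MSQualityLevels.push_back(levels);
        }
    }
    return lists;
}

// Highest valid quality value for entry i. Quality values run from 0 to
// the number of levels minus one.
unsigned MaxQualityFor(const MultiSampleListsSketch& lists, std::size_t i)
{
    const unsigned levels = lists.MSQualityLevels.at(i);
    return levels > 0 ? levels - 1 : 0;
}
```

Whenever a multisampling type is picked from MSTypes, the same index into MSQualityLevels gives the upper bound for the quality value that goes along with it.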

Depth/stencil formats enumeration

This function traverses the application-allowed depth/stencil formats and checks each against the corresponding "AppMin" values, just in case. Then it asks Direct3D two questions about the format: can it be used on the device and is it compatible with both the device's display and back buffer formats. If and only if the answer to both questions is yes, a depth/stencil buffer format makes it into the current DeviceCombo's list of DSFormats. So, typically we will end up with subsets of the app-defined depth/stencil formats.
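Here is a hedged sketch of that filter. The two questions to Direct3D are modeled as callbacks (in real code they would presumably be IDirect3D9::CheckDeviceFormat and IDirect3D9::CheckDepthStencilMatch calls), and DSFormatSketch is a stand-in that spells out the bit counts of a format instead of deriving them from a D3DFORMAT value.

```cpp
#include <functional>
#include <vector>

// Stand-in for a depth/stencil format with its bit counts spelled out,
// e.g. D24S8 has 24 depth bits and 8 stencil bits.
struct DSFormatSketch
{
    const char* Name;
    unsigned    DepthBits;
    unsigned    StencilBits;
};

// Stand-ins for the two questions asked to Direct3D.
using FormatQuestion = std::function<bool(const DSFormatSketch&)>;

std::vector<DSFormatSketch> FilterDepthStencilSketch(
    const std::vector<DSFormatSketch>& allowed,
    unsigned minDepthBits,
    unsigned minStencilBits,
    const FormatQuestion& usableOnDevice,
    const FormatQuestion& matchesDisplayAndBackBuffer)
{
    std::vector<DSFormatSketch> kept;
    for (const DSFormatSketch& f : allowed)
    {
        // Application constraints first: they are cheap to check.
        if (f.DepthBits < minDepthBits || f.StencilBits < minStencilBits)
            continue;
        // Both answers must be yes for the format to make the list.
        if (usableOnDevice(f) && matchesDisplayAndBackBuffer(f))
            kept.push_back(f);
    }
    return kept;
}
```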

Depth/stencil-multisampling conflicts enumeration

If your application will use depth/stencil surfaces, they must allow multisampling. Furthermore, if such surfaces are to be used in conjunction with any render target (back buffer) surface, both surfaces require the same multisampling type. Therefore we must check every depth/stencil format against every multi-sample type that made it into the current DeviceCombo for compatibility. This is done with a device API call which, upon failure, registers the conflict in the corresponding list. The purpose of all this is to prevent an invalid combination from making it into the display settings.
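Once the conflicts are registered, using them boils down to a lookup. The sketch below assumes that a conflict entry is simply a depth/stencil format paired with a multisampling type. The actual DSMSConflict layout is not shown here, so the struct and both functions are hypothetical.

```cpp
#include <vector>

// Assumed shape of a conflict entry: a depth/stencil format that cannot
// be used with a multisampling type.
struct DSMSConflictSketch
{
    unsigned DSFormat;
    unsigned MSType;
};

// True when the pair was registered as a conflict.
bool IsConflict(const std::vector<DSMSConflictSketch>& conflicts,
                unsigned dsFormat, unsigned msType)
{
    for (const DSMSConflictSketch& c : conflicts)
        if (c.DSFormat == dsFormat && c.MSType == msType)
            return true;
    return false;
}

// Picks the first depth/stencil format and multisampling type that do not
// conflict. Returns false when every pair conflicts or a list is empty.
bool PickCompatiblePair(const std::vector<unsigned>& dsFormats,
                        const std::vector<unsigned>& msTypes,
                        const std::vector<DSMSConflictSketch>& conflicts,
                        unsigned* dsOut, unsigned* msOut)
{
    for (unsigned ds : dsFormats)
        for (unsigned ms : msTypes)
            if (!IsConflict(conflicts, ds, ms))
            {
                *dsOut = ds;
                *msOut = ms;
                return true;
            }
    return false;
}
```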

Presentation intervals enumeration

These refer to the driver's ability to update the display, i.e. swap the presentation, at a certain rate in terms of the screen refresh rate (a.k.a. the vertical sync).

The vertical sync rate is the number of times per second that the screen gets refreshed, typically around 60 for most CRTs and TVs. Did you ever wonder how your CRT (Cathode Ray Tube) monitor works? Well, an electron beam is emitted by a heated cathode and deflected to hit the phosphor-coated face of a vacuum tube in a continuous sweep from left to right and top to bottom. When the beam gets to the bottom-right corner, it retraces its way back to the top-left corner and repeats the sweep. The number of completed sweeps per second equals the infamous 60 Hz, the vertical sync. Incidentally, the number of horizontal lines in the sweep is what makes the difference between conventional NTSC TV (525 lines) and HDTV (1125 lines). Now don't go asking me about flat screens or LCDs because they work differently, but they also have a refresh rate. Anyway, so much for a video electronics crash course. Let's get on with it.

Choosing a presentation interval that matches the screen refresh rate limits the possibility of display artifacts, making your application generally more reliable. On the other hand, when a display driver supports the immediate presentation interval -- i.e. one not in sync with the screen refresh rate -- the runtime might update the scene more than once during the adapter refresh period. This is the same as saying that we might get much higher frame rates. So, the choice carries within itself the universal tradeoff between speed and stability or performance vs. quality. As always, you must test, test, and when done testing, test again.

The immediate presentation interval is always available, but it is worth checking to prevent blowing up some old cards. Just kidding! Nowadays most PCs should be able to handle it. Your application may get around a 10 fps kick out of it without display artifacts or, if any, the same ones that I get with the default interval. The default presentation interval, equivalent to the "one" presentation interval, is also always available. The enumeration will put it on top of the list, just in case the application is windowed. In this case, it helps in reducing mouse flicker when compared to the immediate interval. Support for intervals 2, 3 and 4 is hardware-dependent, so we check device capabilities for them.
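A minimal sketch of this last enumeration could go like this. The interval constants are stand-ins for the D3DPRESENT_INTERVAL_* flags, the caps argument plays the role of the PresentationIntervals member of D3DCAPS9, and the exact set and order of the checks is an assumption.

```cpp
#include <vector>

// Stand-ins for the D3DPRESENT_INTERVAL_* flags.
const unsigned long kIntervalDefault   = 0x0;
const unsigned long kIntervalTwo       = 0x2;
const unsigned long kIntervalThree     = 0x4;
const unsigned long kIntervalFour      = 0x8;
const unsigned long kIntervalImmediate = 0x80000000UL;

// Builds the list of presentation intervals from the capability bits.
std::vector<unsigned long> EnumeratePresentIntervalsSketch(unsigned long capsIntervals)
{
    std::vector<unsigned long> intervals;

    // The default interval is always available and goes on top.
    intervals.push_back(kIntervalDefault);

    // Everything else is checked against the device capabilities,
    // including the immediate interval, just in case.
    const unsigned long checked[] = {
        kIntervalImmediate, kIntervalTwo, kIntervalThree, kIntervalFour
    };
    for (unsigned long interval : checked)
        if (capsIntervals & interval)
            intervals.push_back(interval);

    return intervals;
}
```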

And that's it for the enumeration object! We have a full set of adapters, display modes and device capabilities that meet the application's requirements so that we can choose among them to display our 3D scene. That does it for part I. I'd say we're halfway to setting up Direct3D, but there's way too much material in this first installment. Still, I hope it enlightens the newbies with enough courage to read through it, and that it stirs up the old D3D wolves' status quo, inspiring corrections and the occasional death threat.

Keep it real, and stay tuned for part II.

History

4 October, 2006 -- Original version posted
16 July, 2007 -- Article edited and moved to the main CodeProject.com article base