
DirectX 11 Compute Shaders

22 Feb 2013
HPC via Compute Shaders (GPGPU).

Introduction

This article introduces GPGPU via DirectX 11 Compute Shaders.

GPGPU (General-Purpose Computing on Graphics Processing Units) means using the GPU for repeated, data-parallel calculations, so that the work is spread across the large number of processing elements the GPU provides.

This article will demonstrate a very simple trigonometric calculation executed on the GPU.

The additional attached code shows a classic use of GPGPU: squaring a square matrix (multiplying it by itself) by spawning nrow*nrow GPU threads, one per output element. This example was chosen because each output element can be calculated independently of the others.
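To make the "one thread per output element" idea concrete, here is a minimal CPU-side sketch in plain C++. It is not the attached shader: the names SquareElement and SquareMatrix are hypothetical, the matrix is assumed to be stored row-major in a flat array, and the two loops merely stand in for the nrow*nrow GPU threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes one output element of C = A * A for an n x n row-major matrix.
// On the GPU, each thread would run this body for its own (row, col) pair;
// it reads A only and writes a single element, so threads never conflict.
float SquareElement(const std::vector<float>& a, std::size_t n,
                    std::size_t row, std::size_t col)
{
    float sum = 0.0f;
    for (std::size_t k = 0; k < n; ++k)
        sum += a[row * n + k] * a[k * n + col];
    return sum;
}

// CPU stand-in for launching n*n threads: the loops enumerate the thread IDs.
std::vector<float> SquareMatrix(const std::vector<float>& a, std::size_t n)
{
    assert(a.size() == n * n);
    std::vector<float> c(n * n);
    for (std::size_t row = 0; row < n; ++row)
        for (std::size_t col = 0; col < n; ++col)
            c[row * n + col] = SquareElement(a, n, row, col);
    return c;
}
```

Because SquareElement depends only on the input matrix and its own indices, the iterations can run in any order or all at once, which is what makes the problem a good fit for the GPU.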

Background

GPGPU has been around for a few years, with NVIDIA introducing CUDA, AMD introducing Close to Metal and AMD Stream, and many enthusiasts using DirectX 9 pixel shaders to achieve GPGPU.

Using the code

The attached code was compiled with VS2010 Beta 1 against the libraries from the DirectX SDK (August 2009) on Windows 7 RC. It will not run on Windows XP, since DirectX 11 is not available for Windows XP. Some parts of the source code are taken from the DirectX SDK August 2009 samples and adapted to suit the program.

The entry point of the code is Start(void*). The program is divided into the following parts:

Creation of a device (the easiest part)

Use D3D_DRIVER_TYPE_REFERENCE for software emulation, and D3D_DRIVER_TYPE_HARDWARE to run the code on the GPU (this requires hardware support).

D3D11CreateDevice( NULL,D3D_DRIVER_TYPE_REFERENCE/*D3D_DRIVER_TYPE_HARDWARE*/, 
  NULL, D3D11_CREATE_DEVICE_SINGLETHREADED|D3D11_CREATE_DEVICE_DEBUG, 
  NULL, 0,D3D11_SDK_VERSION, &pDeviceOut, &flOut, &pContextOut );

Load the GPU

The tough bit is that the programmer must create the buffers on the GPU, load the input data into them, and create the views through which the shader accesses them. The attached source code sheds a lot more light on this:

// Creates a structured buffer on the GPU. The bind flags allow it to back
// either an input view (SRV) or an output view (UAV).

HRESULT CreateStructuredBufferOnGPU( ID3D11Device* pDevice, 
        UINT uElementSize, UINT uCount, VOID* pInitData, 
        ID3D11Buffer** ppBufOut )
{
    *ppBufOut = NULL;

    D3D11_BUFFER_DESC desc;
    ZeroMemory( &desc, sizeof(desc) );

    desc.BindFlags = D3D11_BIND_UNORDERED_ACCESS | D3D11_BIND_SHADER_RESOURCE;
    desc.ByteWidth = uElementSize * uCount;      // total size in bytes
    desc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
    desc.StructureByteStride = uElementSize;     // size of one element

    if ( pInitData )
    {
        // Upload the initial contents as part of buffer creation.
        D3D11_SUBRESOURCE_DATA InitData;
        ZeroMemory( &InitData, sizeof(InitData) );
        InitData.pSysMem = pInitData;
        return pDevice->CreateBuffer( &desc, &InitData, ppBufOut );
    }

    // No initial data: the contents are undefined until written.
    return pDevice->CreateBuffer( &desc, NULL, ppBufOut );
}

// Creates a shader resource view (SRV), used to read an input buffer
HRESULT CreateBufferSRV( ID3D11Device* pDevice, ID3D11Buffer* pBuffer, 
                         ID3D11ShaderResourceView** ppSRVOut )
{

    D3D11_BUFFER_DESC descBuf;
    ZeroMemory( &descBuf, sizeof(descBuf) );
    pBuffer->GetDesc( &descBuf );
    D3D11_SHADER_RESOURCE_VIEW_DESC desc;
    ZeroMemory( &desc, sizeof(desc) );
    desc.ViewDimension = D3D11_SRV_DIMENSION_BUFFEREX;
    desc.BufferEx.FirstElement = 0;

    if ( descBuf.MiscFlags & D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS )
    {
        // This is a Raw Buffer
        desc.Format = DXGI_FORMAT_R32_TYPELESS;
        desc.BufferEx.Flags = D3D11_BUFFEREX_SRV_FLAG_RAW;
        desc.BufferEx.NumElements = descBuf.ByteWidth / 4;
    }
    else
        if ( descBuf.MiscFlags & D3D11_RESOURCE_MISC_BUFFER_STRUCTURED )
        {

            // This is a Structured Buffer
            desc.Format = DXGI_FORMAT_UNKNOWN;
            desc.BufferEx.NumElements = 
               descBuf.ByteWidth / descBuf.StructureByteStride;
        }
        else
        {
            return E_INVALIDARG;
        }
    return pDevice->CreateShaderResourceView( pBuffer, &desc, ppSRVOut );
}

// Creates an unordered access view (UAV), used to write the output buffer
HRESULT CreateBufferUAV( ID3D11Device* pDevice, ID3D11Buffer* pBuffer, 
                         ID3D11UnorderedAccessView** ppUAVOut )
{
    D3D11_BUFFER_DESC descBuf;
    ZeroMemory( &descBuf, sizeof(descBuf) );
    pBuffer->GetDesc( &descBuf );

    D3D11_UNORDERED_ACCESS_VIEW_DESC desc;
    ZeroMemory( &desc, sizeof(desc) );
    desc.ViewDimension = D3D11_UAV_DIMENSION_BUFFER;
    desc.Buffer.FirstElement = 0;

    if ( descBuf.MiscFlags & D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS )
    {
        // This is a Raw Buffer
        desc.Format = DXGI_FORMAT_R32_TYPELESS;
        // Format must be DXGI_FORMAT_R32_TYPELESS,
        // when creating Raw Unordered Access View

        desc.Buffer.Flags = D3D11_BUFFER_UAV_FLAG_RAW;
        desc.Buffer.NumElements = descBuf.ByteWidth / 4; 
    }
    else
        if ( descBuf.MiscFlags & D3D11_RESOURCE_MISC_BUFFER_STRUCTURED )
        {
            // This is a Structured Buffer
            desc.Format = DXGI_FORMAT_UNKNOWN;
            // Format must be DXGI_FORMAT_UNKNOWN
            // when creating a view of a Structured Buffer

            desc.Buffer.NumElements = 
                 descBuf.ByteWidth / descBuf.StructureByteStride; 
        }
        else
        {
            return E_INVALIDARG;
        }
    return pDevice->CreateUnorderedAccessView( pBuffer, &desc, ppUAVOut );
}
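
The NumElements arithmetic used by both view helpers above can be checked in isolation. The following is a small sketch in portable C++ with no Direct3D dependency; ViewElementCount is a hypothetical helper, not part of the attached code, and the isRaw flag stands in for the D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS test.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the NumElements calculation in
// CreateBufferSRV and CreateBufferUAV.
// A raw view addresses the buffer as 32-bit words (4 bytes each);
// a structured view addresses it as whole elements of the given stride.
std::uint32_t ViewElementCount(std::uint32_t byteWidth,
                               std::uint32_t structureByteStride,
                               bool isRaw)
{
    if (isRaw)
        return byteWidth / 4;

    // A structured buffer should hold a whole number of elements.
    assert(structureByteStride != 0);
    assert(byteWidth % structureByteStride == 0);
    return byteWidth / structureByteStride;
}
```

For example, a buffer of 1024 elements of 8 bytes each has a ByteWidth of 8192, so a structured view over it has 1024 elements, while a raw view over the same number of bytes would have 2048.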

Run

This command launches the compute shader over a grid of X * Y * Z thread groups; each group runs the number of threads declared in the shader's numthreads attribute. Its performance is directly related to the hardware and driver support (this applies to a device created using D3D_DRIVER_TYPE_HARDWARE).

pd3dImmediateContext->Dispatch( X, Y, Z );
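
Choosing the Dispatch arguments usually comes down to a rounded-up division. The sketch below is plain C++ and makes some assumptions: GroupCount and kMaxGroupsPerDimension are hypothetical names, the work is one-dimensional, and the threads-per-group value must match whatever the shader declares in numthreads. The 65535 limit is the Direct3D 11 maximum for each Dispatch dimension.

```cpp
#include <cassert>
#include <cstdint>

// Direct3D 11 accepts at most 65535 thread groups per Dispatch dimension.
const std::uint32_t kMaxGroupsPerDimension = 65535;

// Returns how many thread groups are needed so that every one of
// elementCount items gets a thread, when each group runs threadsPerGroup
// threads. This is a ceiling division written to avoid integer overflow.
std::uint32_t GroupCount(std::uint32_t elementCount,
                         std::uint32_t threadsPerGroup)
{
    assert(threadsPerGroup > 0);
    std::uint32_t groups = elementCount / threadsPerGroup;
    if (elementCount % threadsPerGroup != 0)
        ++groups;  // one extra, partially used group for the remainder
    assert(groups <= kMaxGroupsPerDimension);
    return groups;
}
```

With 1000 elements and 64 threads per group, this gives 16 groups, that is, Dispatch( 16, 1, 1 ). The last group then has 24 surplus threads, so the shader should check its thread ID against the element count before touching the buffers.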

Read output buffer

With DirectX 9, this part was the most painful bit, but DirectX 11 Compute Shaders make it a lot easier.

First, create a temporary read-back buffer with usage D3D11_USAGE_STAGING and the CPU access flag set to D3D11_CPU_ACCESS_READ. Then, copy the output buffer into it, and map it to a pointer as shown below:

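// Copy the GPU result into the CPU-readable buffer, then map it.
pd3dImmediateContext->CopyResource( debugbuf, pBuffer );
D3D11_MAPPED_SUBRESOURCE MappedResource;
pd3dImmediateContext->Map( debugbuf, 0, D3D11_MAP_READ, 0, &MappedResource );
BufType* p = (BufType*)MappedResource.pData; // p points to the output data
// ... read the results through p ...
pd3dImmediateContext->Unmap( debugbuf, 0 );  // p is no longer valid after this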
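
Once the output is mapped, a quick sanity check is to compare it against the same calculation done on the CPU. The sketch below is portable C++ and rests on assumptions: FirstMismatch is a hypothetical helper, the output is assumed to be a plain array of floats, and std::sin stands in for whichever trigonometric function the shader actually evaluates. A tolerance is needed because GPU and CPU results generally differ in the last bits.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the index of the first element whose GPU result differs from the
// CPU reference std::sin(input[i]) by more than tolerance, or -1 if every
// element matches. gpuOutput must hold at least input.size() floats.
long FirstMismatch(const std::vector<float>& input, const float* gpuOutput,
                   float tolerance)
{
    for (std::size_t i = 0; i < input.size(); ++i)
    {
        float expected = std::sin(input[i]);
        float diff = std::fabs(gpuOutput[i] - expected);
        // Written as a negated <= so that a NaN result counts as a mismatch.
        if (!(diff <= tolerance))
            return static_cast<long>(i);
    }
    return -1;
}
```

The comparison has to run while the buffer is still mapped, or on a copy of the data taken before Unmap is called.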

Points of interest

With Compute Shaders, we can implement physics-based simulations involving liquids (probably my next project).

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Article Copyright 2009 by Asif Bahrainwala