DirectX 11 Compute Shaders

By Asif Bahrainwala, 22 Feb 2013
 

Introduction

This article introduces GPGPU via DirectX 11 Compute Shaders.

GPGPU (General-Purpose Computing on Graphics Processing Units) means using a graphics processing unit to perform the same calculation repeatedly over a large data set, taking advantage of the vast array of processing elements available on the GPU.

This article will demonstrate a very simple trigonometric calculation executed on the GPU.

Additional attached code shows a classic use of GPGPU: squaring a square matrix (multiplying it by itself) by spawning nrow*nrow GPU threads, one per output element. This example was chosen because each output element can be calculated independently of the others.
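To illustrate why matrix squaring suits one thread per output element, here is a minimal CPU-side sketch. It is not the attached code: the function names and the row-major layout are assumptions made for illustration, and the loop over id merely stands in for the nrow*nrow GPU threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes a single element of C = A * A for a square, row-major matrix.
// It reads only from the input, so every (row, col) can run independently;
// on the GPU, each thread would evaluate this for its own (row, col).
float SquareElement(const std::vector<float>& a, std::size_t nrow,
                    std::size_t row, std::size_t col)
{
    float sum = 0.0f;
    for (std::size_t k = 0; k < nrow; ++k)
        sum += a[row * nrow + k] * a[k * nrow + col];
    return sum;
}

// CPU stand-in for launching nrow*nrow threads: each value of id plays
// the role of one thread ID and writes exactly one output element.
std::vector<float> SquareMatrix(const std::vector<float>& a, std::size_t nrow)
{
    assert(a.size() == nrow * nrow);
    std::vector<float> c(nrow * nrow);
    for (std::size_t id = 0; id < nrow * nrow; ++id)
        c[id] = SquareElement(a, nrow, id / nrow, id % nrow);
    return c;
}
```

Because no iteration of the outer loop depends on another, the iterations can be handed to separate threads in any order, which is the property the GPU version relies on.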

Background

GPGPU has been around for more than a year, with NVIDIA introducing CUDA, AMD introducing Close to Metal and AMD Stream, and many enthusiasts using DirectX 9 pixel shaders to achieve GPGPU.

Using the code

The attached code was compiled with VS2010 Beta 1 using libraries from the DirectX SDK (August 2009) on Windows 7 RC. It will not run on Windows XP, since DirectX 11 is not available there. Some parts of the source code were taken from the DirectX SDK August 2009 samples and adapted to suit the program.

The entry point of the code is Start(void*). The program is divided into the following parts:

Creation of a device (the easiest part)

Use D3D_DRIVER_TYPE_REFERENCE for software emulation, or D3D_DRIVER_TYPE_HARDWARE to run the code on the GPU (this requires hardware and driver support).

D3D11CreateDevice( NULL,D3D_DRIVER_TYPE_REFERENCE/*D3D_DRIVER_TYPE_HARDWARE*/, 
  NULL, D3D11_CREATE_DEVICE_SINGLETHREADED|D3D11_CREATE_DEVICE_DEBUG, 
  NULL, 0,D3D11_SDK_VERSION, &pDeviceOut, &flOut, &pContextOut );

Load the GPU

The tough part is that the programmer must create the buffers on the GPU and load them with data before processing. The attached source code sheds a lot more light on this:

// Creates a structured buffer on the GPU, usable as shader input (SRV) or output (UAV)
HRESULT CreateStructuredBufferOnGPU( ID3D11Device* pDevice, 
        UINT uElementSize, UINT uCount, VOID* pInitData, 
        ID3D11Buffer** ppBufOut )
{

    *ppBufOut = NULL;
    D3D11_BUFFER_DESC desc;
    ZeroMemory( &desc, sizeof(desc) );

    desc.BindFlags = D3D11_BIND_UNORDERED_ACCESS | D3D11_BIND_SHADER_RESOURCE;
    desc.ByteWidth = uElementSize * uCount;
    desc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
    desc.StructureByteStride = uElementSize;

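    if ( pInitData )
    {
        // Upload the caller's initial data along with buffer creation
        D3D11_SUBRESOURCE_DATA InitData;
        ZeroMemory( &InitData, sizeof(InitData) );
        InitData.pSysMem = pInitData;
        return pDevice->CreateBuffer( &desc, &InitData, ppBufOut );
    }
    else
        return pDevice->CreateBuffer( &desc, NULL, ppBufOut );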
}

// Creates a shader resource view (SRV) so the shader can read the buffer as input
HRESULT CreateBufferSRV( ID3D11Device* pDevice, ID3D11Buffer* pBuffer, 
                         ID3D11ShaderResourceView** ppSRVOut )
{

    D3D11_BUFFER_DESC descBuf;
    ZeroMemory( &descBuf, sizeof(descBuf) );
    pBuffer->GetDesc( &descBuf );
    D3D11_SHADER_RESOURCE_VIEW_DESC desc;
    ZeroMemory( &desc, sizeof(desc) );
    desc.ViewDimension = D3D11_SRV_DIMENSION_BUFFEREX;
    desc.BufferEx.FirstElement = 0;

    if ( descBuf.MiscFlags & D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS )
    {
        // This is a Raw Buffer
        desc.Format = DXGI_FORMAT_R32_TYPELESS;
        desc.BufferEx.Flags = D3D11_BUFFEREX_SRV_FLAG_RAW;
        desc.BufferEx.NumElements = descBuf.ByteWidth / 4;
    }
    else if ( descBuf.MiscFlags & D3D11_RESOURCE_MISC_BUFFER_STRUCTURED )
    {
        // This is a Structured Buffer
        desc.Format = DXGI_FORMAT_UNKNOWN;
        desc.BufferEx.NumElements =
            descBuf.ByteWidth / descBuf.StructureByteStride;
    }
    else
    {
        return E_INVALIDARG;
    }
    return pDevice->CreateShaderResourceView( pBuffer, &desc, ppSRVOut );
}

// Creates an unordered access view (UAV) so the shader can write the buffer as output
HRESULT CreateBufferUAV( ID3D11Device* pDevice, ID3D11Buffer* pBuffer, 
                         ID3D11UnorderedAccessView** ppUAVOut )
{
    D3D11_BUFFER_DESC descBuf;
    ZeroMemory( &descBuf, sizeof(descBuf) );
    pBuffer->GetDesc( &descBuf );

    D3D11_UNORDERED_ACCESS_VIEW_DESC desc;
    ZeroMemory( &desc, sizeof(desc) );
    desc.ViewDimension = D3D11_UAV_DIMENSION_BUFFER;
    desc.Buffer.FirstElement = 0;

    if ( descBuf.MiscFlags & D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS )
    {
        // This is a Raw Buffer
        desc.Format = DXGI_FORMAT_R32_TYPELESS;
        // Format must be DXGI_FORMAT_R32_TYPELESS,
        // when creating Raw Unordered Access View

        desc.Buffer.Flags = D3D11_BUFFER_UAV_FLAG_RAW;
        desc.Buffer.NumElements = descBuf.ByteWidth / 4; 
    }
    else if ( descBuf.MiscFlags & D3D11_RESOURCE_MISC_BUFFER_STRUCTURED )
    {
        // This is a Structured Buffer
        desc.Format = DXGI_FORMAT_UNKNOWN;
        // Format must be DXGI_FORMAT_UNKNOWN
        // when creating a View of a Structured Buffer

        desc.Buffer.NumElements =
            descBuf.ByteWidth / descBuf.StructureByteStride;
    }
    else
    {
        return E_INVALIDARG;
    }
    return pDevice->CreateUnorderedAccessView( pBuffer, &desc, ppUAVOut );
}
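The size arithmetic used by the three helpers above can be sketched in isolation. This is a portable illustration only: StructuredByteWidth and ViewElementCount are hypothetical names that are not part of Direct3D, and the fields of BufType are assumed, since the article does not show them.

```cpp
#include <cassert>
#include <cstdint>

// Assumed element layout, for illustration only.
struct BufType
{
    int   i;
    float f;
};

// Mirrors desc.ByteWidth = uElementSize * uCount in the buffer helper.
std::uint32_t StructuredByteWidth(std::uint32_t elementSize, std::uint32_t count)
{
    return elementSize * count;
}

// Mirrors the NumElements calculation in the SRV and UAV helpers:
// a raw view counts 4-byte words, a structured view counts whole elements.
std::uint32_t ViewElementCount(std::uint32_t byteWidth, std::uint32_t stride, bool isRaw)
{
    if (isRaw)
        return byteWidth / 4;
    assert(stride > 0);
    return byteWidth / stride;
}
```

For a structured buffer the view covers exactly as many elements as were requested at creation, while a raw view of the same memory reports the number of 32-bit words instead.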

Run

Dispatch launches the compute shader on the GPU's processing elements as X * Y * Z thread groups; the number of threads in each group is fixed by the numthreads attribute in the shader. Performance depends directly on the hardware and driver support (this applies to a device created using D3D_DRIVER_TYPE_HARDWARE).

pd3dImmediateContext->Dispatch( X, Y, Z );
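One common way to choose the Dispatch arguments is to divide the number of elements by the number of threads per group, rounding up. The sketch below assumes a one-dimensional workload; GroupCount is a hypothetical helper, not a Direct3D function, and the group size passed to it must match whatever the shader declares in numthreads.

```cpp
#include <cassert>

// Hypothetical helper: number of thread groups needed to cover
// elementCount items when each group runs threadsPerGroup threads.
unsigned int GroupCount(unsigned int elementCount, unsigned int threadsPerGroup)
{
    assert(threadsPerGroup > 0);
    // Full groups, plus one more if a partial group is left over.
    // Written this way to avoid overflow for very large element counts.
    return elementCount / threadsPerGroup
         + (elementCount % threadsPerGroup != 0 ? 1u : 0u);
}
```

Assuming a shader declared with 64 threads per group in X, the call would then look like pd3dImmediateContext->Dispatch( GroupCount( n, 64 ), 1, 1 ). When n is not a multiple of the group size, the last group contains surplus threads, so the shader should compare its thread ID against n before writing.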

Read output buffer

With DirectX 9, reading the results back was the most painful part; with DirectX 11 Compute Shaders, it has become a lot easier.

First, create a temporary read buffer with its usage set to D3D11_USAGE_STAGING and its CPU access flag set to D3D11_CPU_ACCESS_READ. Then copy the output buffer into it, and map it to a pointer as shown below:

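// Copy the GPU output into the CPU-readable buffer, then map it
pd3dImmediateContext->CopyResource( debugbuf, pBuffer );
D3D11_MAPPED_SUBRESOURCE MappedResource;
pd3dImmediateContext->Map( debugbuf, 0, D3D11_MAP_READ, 0, &MappedResource );
BufType *p = (BufType*)MappedResource.pData; // p points to the output data
// ... read the results through p, then release the mapping ...
pd3dImmediateContext->Unmap( debugbuf, 0 );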

Points of interest

With Compute Shaders, we can implement physics-based simulations involving liquids (probably my next project).

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Asif Bahrainwala
India

Article Copyright 2009 by Asif Bahrainwala