Disclaimer: This article uses material published on the Diligent Engine website.
Efficiently supplying data to the graphics processing unit (GPU) is essential for a 3D renderer or any other application that utilizes the power of modern GPUs. As the CPU and GPU usually have separate memory systems and operate on different timelines, updating GPU data is not always as straightforward as simply writing bytes at a given address. In fact, the optimal way depends on the expected usage scenario. This article describes different ways a resource can be updated in Diligent Engine, as well as important internal details and performance implications related to each method.
Diligent Engine is a modern cross-platform low-level graphics library that has Direct3D11, Direct3D12, OpenGL/GLES and Vulkan backends and supports Windows, Linux, MacOS, iOS and Android platforms. A separate article gives an introduction to the engine.
Buffers represent linear memory and are the most basic resource type. Data can be written to a buffer during initialization, as well as at run time using one of the methods described in this section.
The most basic way to supply data into a buffer is to provide it during the buffer initialization:
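// Indices is assumed to be an array of Uint32 indices defined by the application
BufferDesc IndBuffDesc;
IndBuffDesc.Name = "Cube index buffer";
IndBuffDesc.Usage = USAGE_STATIC;
IndBuffDesc.BindFlags = BIND_INDEX_BUFFER;
IndBuffDesc.uiSizeInBytes = sizeof(Indices);
// Initial data is mandatory for USAGE_STATIC buffers
BufferData IBData;
IBData.pData = Indices;
IBData.DataSize = sizeof(Indices);
pDevice->CreateBuffer(IndBuffDesc, IBData, &m_CubeIndexBuffer);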
The buffer usage defines how often its content is expected to change:
- USAGE_STATIC buffers cannot be updated after being created, so initial data must always be provided.
- USAGE_DEFAULT buffers are expected to be updated occasionally.
- USAGE_DYNAMIC buffers are optimized for very frequent updates.
To initialize the buffer, Diligent Engine performs the steps described below.
In OpenGL/GLES backend, the operation directly translates to glBufferData.
In Direct3D11 backend, the initial data is passed over to the ID3D11Device::CreateBuffer method.
In next-gen backends (Direct3D12 and Vulkan), USAGE_DEFAULT buffers are allocated in GPU-only memory that is not directly accessible by the CPU. To initialize the buffer, Diligent Engine creates a temporary staging buffer in CPU-visible memory, copies the data to this temporary buffer and then issues a GPU-side copy command. As soon as the command completes, the temporary buffer is released and the staging memory is returned to the system.
USAGE_DYNAMIC buffers cannot be initialized this way; they are described later in this article.
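As a rough illustration of the two-step upload used by the next-gen backends, the sketch below models it on the CPU only. It is not Diligent code: `GpuOnlyBuffer`, `CopyCommand` and `StagingUploadSketch` are hypothetical names, and "GPU execution" is simulated by running the recorded copies later. It shows the two copies involved and that the staging memory only lives until the copy has executed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GPU-only buffer: application code is not
// supposed to write Bytes directly; only executed copy commands modify it.
struct GpuOnlyBuffer {
    std::vector<std::uint8_t> Bytes;
};

// A recorded GPU-side copy from a temporary CPU-visible staging block.
struct CopyCommand {
    std::vector<std::uint8_t> Staging;
    GpuOnlyBuffer* pDst;
    std::size_t DstOffset;
};

class StagingUploadSketch {
public:
    // First copy (CPU): client data -> temporary staging memory.
    // Then a GPU copy command is recorded; nothing reaches Dst yet.
    void Upload(GpuOnlyBuffer& Dst, std::size_t DstOffset, const void* pData, std::size_t Size) {
        assert(pData != nullptr && Size > 0);
        assert(DstOffset + Size <= Dst.Bytes.size());
        std::vector<std::uint8_t> Staging(Size);
        std::memcpy(Staging.data(), pData, Size);
        m_Pending.push_back(CopyCommand{std::move(Staging), &Dst, DstOffset});
    }

    // Simulates the GPU executing the recorded commands: second copy
    // (staging -> destination), after which the staging blocks are freed.
    void ExecutePendingCopies() {
        for (const CopyCommand& Cmd : m_Pending)
            std::memcpy(Cmd.pDst->Bytes.data() + Cmd.DstOffset, Cmd.Staging.data(), Cmd.Staging.size());
        m_Pending.clear();
    }

    std::size_t NumPendingCopies() const { return m_Pending.size(); }

private:
    std::vector<CopyCommand> m_Pending;
};
```

Note that the destination contents only change once the copy "executes", which mirrors why the engine must keep the staging memory alive until the GPU command completes.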
Updating Buffers with IBuffer::UpdateData()
The first way to update buffer contents at run time is to use the IBuffer::UpdateData() method. Only buffers created with the USAGE_DEFAULT flag can be updated this way. This method writes new data to a given buffer subregion, as in the example below:
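// Vertex is the application's vertex struct; Vertices is assumed to point to
// NumVertsToUpdate new vertices. Offset and size are given in bytes.
VertexBuffer->UpdateData(m_pImmediateContext,
                         FirstVertToUpdate * sizeof(Vertex),
                         NumVertsToUpdate * sizeof(Vertex),
                         Vertices);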
Under the hood, Diligent Engine translates this call into the following operations:
In OpenGL/GLES backend, the operation directly translates to glBufferSubData.
In Direct3D11 backend, the operation directly translates to ID3D11DeviceContext::UpdateSubresource.
In Direct3D12 and Vulkan backends, default buffers are allocated in GPU-only memory, so the data cannot be written directly. To perform the operation, the engine first allocates temporary storage in CPU-visible memory, copies the data to this temporary storage and then issues a GPU command to copy the data from the storage to the final destination. It also performs the necessary resource state transitions (such as shader resource -> copy destination).
IBuffer::UpdateData() is currently the only way to update data in a default (GPU-only) buffer. The operation involves two copies: from application memory to the temporary storage, and from the temporary storage to the buffer. However, the main and less obvious performance issue with this method is state transitions. Every time a buffer is used in a copy operation, it needs to be transitioned to the copy destination state. Every time it is used in a shader, it needs to be transitioned to the shader resource state. Transitioning back and forth stalls the GPU pipeline and can degrade performance dramatically.
This method should be used when buffer contents stay constant most of the time and only need to be updated occasionally, usually no more often than once per frame; for example, when reusing an existing buffer to write new mesh data (vertices/indices). This method should not be used for high-frequency updates such as animation or constant buffer updates.
In short, IBuffer::UpdateData() is inefficient for frequent updates.
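To see how the transitions add up, consider a buffer that is updated before each of a series of draw calls. The toy tracker below is a hypothetical sketch (`ResourceState`, `StateTracker` and `TransitionsForUpdatePerDraw` are invented names, not Diligent types) that counts the barriers such a pattern would require: two per update, one into the copy destination state and one back to the shader resource state.

```cpp
#include <cassert>

enum class ResourceState { CopyDest, ShaderResource };

// Counts the state transitions a single resource goes through.
// Every change of the required state costs one transition (barrier).
class StateTracker {
public:
    explicit StateTracker(ResourceState Initial) : m_State(Initial) {}

    void Use(ResourceState Required) {
        if (Required != m_State) {
            ++m_NumTransitions; // e.g. shader resource -> copy destination
            m_State = Required;
        }
    }

    int NumTransitions() const { return m_NumTransitions; }

private:
    ResourceState m_State;
    int m_NumTransitions = 0;
};

// Simulates NumDraws draw calls, each preceded by an UpdateData()-style copy.
int TransitionsForUpdatePerDraw(int NumDraws) {
    assert(NumDraws >= 0);
    StateTracker Tracker(ResourceState::ShaderResource);
    for (int i = 0; i < NumDraws; ++i) {
        Tracker.Use(ResourceState::CopyDest);       // update the buffer
        Tracker.Use(ResourceState::ShaderResource); // draw with it
    }
    return Tracker.NumTransitions();
}
```

With 100 draws the tracker reports 200 transitions, each of which may force the GPU to wait.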
Updating Dynamic Buffers via Mapping
When buffer contents need to be updated frequently (once or more per frame), the buffer should be created with the USAGE_DYNAMIC flag. Dynamic buffers cannot be updated with IBuffer::UpdateData(). Instead, they need to be mapped to obtain a pointer that can be used to write data directly to the buffer, as in the example below:
// VertexBuffer is assumed to be a USAGE_DYNAMIC buffer; CubeVerts is the
// application's source vertex array
Vertex* Vertices = nullptr;
VertexBuffer->Map(m_pImmediateContext, MAP_WRITE, MAP_FLAG_DISCARD,
                  reinterpret_cast<PVoid&>(Vertices));
for(Uint32 v=0; v < _countof(CubeVerts); ++v)
{
    const auto& SrcVert = CubeVerts[v];
    Vertices[v].uv = SrcVert.uv;
    Vertices[v].pos = SrcVert.pos;
}
VertexBuffer->Unmap(m_pImmediateContext, MAP_WRITE, MAP_FLAG_DISCARD);
In OpenGL/GLES backend, the operation translates to glMapBufferRange with the GL_MAP_INVALIDATE_BUFFER_BIT flag set.
In Direct3D11 backend, this call directly translates to ID3D11DeviceContext::Map with D3D11_MAP_WRITE_DISCARD.
When a dynamic buffer is created in Direct3D12 or Vulkan backend, no memory is allocated. Instead, both backends have a special dynamic storage, which is a buffer created in CPU-accessible memory that is persistently mapped. When a dynamic buffer is mapped, a region is reserved in this storage. This operation boils down to simply moving the current offset and is very cheap. A pointer that references this memory is then returned, and the application can write data directly, avoiding all copies. When a dynamic buffer is used for rendering, the internal dynamic storage buffer is bound instead and the proper offset is applied. The internal buffer is pre-transitioned to a read-only state, so no transitions are ever performed at run time. The engine takes care of synchronization, making sure that a region in the buffer is never given to the application while it is being used by the GPU.
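The suballocation scheme described above can be pictured as a ring buffer over one persistently mapped allocation. The sketch below is a simplified, hypothetical model (`DynamicRingSketch` and its methods are invented names; real backends also deal with fences, multiple contexts and actual GPU memory): Allocate() only bumps an offset, and the space used by a frame is reclaimed once the GPU reports that frame as completed.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <optional>

class DynamicRingSketch {
public:
    explicit DynamicRingSketch(std::uint64_t Capacity) : m_Capacity(Capacity) {}

    // Reserves Size bytes at an offset aligned to Alignment (a power of two).
    // Only the head offset moves; nothing is copied or transitioned.
    // Returns std::nullopt if the region would overlap memory that frames
    // still in flight on the GPU may be reading.
    std::optional<std::uint64_t> Allocate(std::uint64_t Size, std::uint64_t Alignment) {
        assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0);
        std::uint64_t Offset = (m_Head + Alignment - 1) & ~(Alignment - 1);
        std::uint64_t Consumed = Offset - m_Head + Size; // alignment padding + data
        if (Offset + Size > m_Capacity) {
            // Not enough room at the end: skip the remaining bytes and wrap to 0.
            Offset = 0;
            Consumed = (m_Capacity - m_Head) + Size;
        }
        if (m_Used + Consumed > m_Capacity)
            return std::nullopt;
        m_Head = Offset + Size;
        m_Used += Consumed;
        m_CurrFrameBytes += Consumed;
        return Offset;
    }

    // Closes the current CPU frame; FrameId is the value the GPU will later
    // report as completed.
    void FinishFrame(std::uint64_t FrameId) {
        m_Frames.push_back(FrameRecord{FrameId, m_CurrFrameBytes});
        m_CurrFrameBytes = 0;
    }

    // Returns the space of all frames the GPU has finished with.
    void ReleaseCompletedFrames(std::uint64_t CompletedFrameId) {
        while (!m_Frames.empty() && m_Frames.front().Id <= CompletedFrameId) {
            m_Used -= m_Frames.front().Bytes;
            m_Frames.pop_front();
        }
        if (m_Used == 0)
            m_Head = 0; // nothing is in flight: restart from the beginning
    }

    std::uint64_t UsedBytes() const { return m_Used; }

private:
    struct FrameRecord {
        std::uint64_t Id;
        std::uint64_t Bytes;
    };
    std::uint64_t m_Capacity;
    std::uint64_t m_Head = 0;           // next free byte
    std::uint64_t m_Used = 0;           // bytes reserved and not yet released
    std::uint64_t m_CurrFrameBytes = 0; // bytes reserved in the current frame
    std::deque<FrameRecord> m_Frames;   // frames submitted but not yet completed
};
```

Because the whole storage stays in one read-only state and allocations are plain offset arithmetic, "mapping" in this model costs a few integer operations, which is the property the next-gen backends rely on.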
In Direct3D12/Vulkan backends, mapping dynamic buffers with the MAP_FLAG_DISCARD flag is very cheap, as it only involves updating the current offset. It is hard to say what exactly Direct3D11 and OpenGL do under the hood, but most likely something similar. There is one significant difference, however: Direct3D11 and OpenGL preserve the contents of dynamic buffers between frames, while Direct3D12 and Vulkan backends do not. As a result, mapping is many times more efficient in next-gen backends.
Dynamic buffers should be used for content that changes often, typically multiple times per frame. The most common example is a constant buffer that is updated with different transformation matrices before every draw call. Dynamic buffers should not be used for constant data that never changes.
Only the entire buffer can currently be mapped.
In Direct3D12 and Vulkan backends, the contents of all dynamic resources are lost at the end of every frame. A dynamic buffer must be mapped in every frame before its first use.
The total amount of CPU-accessible memory can be limited. In addition, GPU access to this memory may be slower than access to GPU-only memory, so dynamic buffers should not be used to store resources that are constant or change infrequently.
A streaming buffer is not an API object, but rather a strategy that allows uploading variable amounts of data to the GPU in an efficient manner. The idea of a streaming buffer can be summarized as follows:
- Create a dynamic buffer large enough to encompass the maximum amount of data that can be uploaded to the GPU.
- The first time, map the buffer with the MAP_FLAG_DISCARD flag.
  - This will discard the previous buffer contents and allocate new memory.
  - Set the current buffer offset to zero, write data to the buffer and update the offset accordingly.
  - Unmap the buffer and issue the draw command.
  - Note that in Direct3D12 and Vulkan backends, unmapping the buffer is not required and can be safely skipped to improve performance.
- When mapping the buffer next time, check if the remaining space is enough to encompass the new data.
  - If there is enough space, map the buffer with the MAP_FLAG_DO_NOT_SYNCHRONIZE flag. This tells the system to return the previously allocated memory; it is the responsibility of the application not to overwrite memory that is in use by the GPU. Write the new data at the current offset (which guarantees that bytes previously written and currently used by the GPU are not affected) and update the offset.
  - If there is not enough space, reset the offset to zero and map the buffer with the MAP_FLAG_DISCARD flag to request a new chunk of memory.
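The decision logic of the streaming strategy can be isolated from the graphics API. In the sketch below, `StreamingBufferSketch`, `StreamingAllocation` and `MapMode` are hypothetical names; `MapMode` only stands in for the choice between discarding (MAP_FLAG_DISCARD) and reusing the previously returned memory without synchronization. A real implementation would pass the corresponding flag to the buffer's Map() call and write the data at the returned offset. The BeginFrame() call is an assumption based on the rule that, in Direct3D12 and Vulkan backends, a dynamic buffer must be mapped with discard in every frame before its first use.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the map flag: Discard = request a fresh chunk of memory,
// NoSync = reuse the memory returned earlier without waiting for the GPU.
enum class MapMode { Discard, NoSync };

struct StreamingAllocation {
    MapMode Mode;         // how the buffer should be mapped for this write
    std::uint32_t Offset; // where to write inside the mapped buffer
};

class StreamingBufferSketch {
public:
    explicit StreamingBufferSketch(std::uint32_t BufferSize) : m_BufferSize(BufferSize) {}

    // Dynamic contents do not survive the frame in Direct3D12/Vulkan,
    // so the first map of every frame must discard.
    void BeginFrame() { m_NeedDiscard = true; }

    StreamingAllocation Allocate(std::uint32_t DataSize) {
        assert(DataSize <= m_BufferSize); // the buffer must fit the largest upload
        MapMode Mode = MapMode::NoSync;
        if (m_NeedDiscard || DataSize > m_BufferSize - m_CurrOffset) {
            // First map, or not enough space left: discard and restart at zero.
            Mode = MapMode::Discard;
            m_CurrOffset = 0;
            m_NeedDiscard = false;
        }
        const StreamingAllocation Alloc{Mode, m_CurrOffset};
        // Bytes below the new offset may still be read by the GPU;
        // subsequent NoSync writes only go past them.
        m_CurrOffset += DataSize;
        return Alloc;
    }

private:
    std::uint32_t m_BufferSize;
    std::uint32_t m_CurrOffset = 0;
    bool m_NeedDiscard = true; // the very first map always discards
};
```

For a 1000-byte buffer, three 400-byte uploads yield (Discard, 0), (NoSync, 400) and then (Discard, 0) again, because the third upload no longer fits after offset 800.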
While buffers are simply linear regions of memory, textures are optimized for efficient sampling operations and use opaque layouts that are typically not exposed to the application. As a result, only the driver knows how to write data to the texture. Linear texture layouts are allowed in Direct3D12 and Vulkan, but they are less efficient.
Similar to buffers, initial data can be supplied to textures at creation time. For USAGE_STATIC textures, this is the only way.
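// Pixels is assumed to point to 1024 x 1024 RGBA8 texels provided by the application
TextureDesc TexDesc;
TexDesc.Type = RESOURCE_DIM_TEX_2D;
TexDesc.Format = TEX_FORMAT_RGBA8_UNORM_SRGB;
TexDesc.Width = 1024;
TexDesc.Height = 1024;
TexDesc.MipLevels = 1;
TexDesc.BindFlags = BIND_SHADER_RESOURCE;
TexDesc.Usage = USAGE_STATIC;
// One subresource per mip level (only one level here)
TextureSubResData subresources[1];
subresources[0].pData = Pixels;
subresources[0].Stride = 1024 * 4; // bytes per row: 4 bytes per RGBA8 texel
TextureData InitData;
InitData.pSubResources = subresources;
InitData.NumSubresources = _countof(subresources);
Device->CreateTexture(TexDesc, InitData, &Texture);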
Texture initialization is performed similarly to buffer initialization. In Direct3D11 and OpenGL/GLES backends, there are corresponding native API calls. In Direct3D12/Vulkan backends, the engine creates a temporary staging texture in CPU-writable memory, copies the data to this memory and then issues a GPU copy command.
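For textures with several mip levels or array slices, the initialization data contains one subresource per mip level of every slice. The helper below is a hedged sketch that computes the dimensions and tightly packed row stride of each subresource of a 2D texture. `SubresourceInfo` and `DescribeSubresources` are hypothetical names; the slice-major ordering (all mips of slice 0, then all mips of slice 1, and so on) follows the Direct3D subresource convention and is assumed, not confirmed here, to match what the engine expects. BytesPerTexel = 4 corresponds to TEX_FORMAT_RGBA8_UNORM_SRGB.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct SubresourceInfo {
    std::uint32_t ArraySlice;
    std::uint32_t MipLevel;
    std::uint32_t Width;
    std::uint32_t Height;
    std::uint32_t Stride; // bytes per row, assuming tightly packed rows
};

std::vector<SubresourceInfo> DescribeSubresources(std::uint32_t Width, std::uint32_t Height,
                                                  std::uint32_t MipLevels, std::uint32_t ArraySize,
                                                  std::uint32_t BytesPerTexel) {
    assert(Width > 0 && Height > 0 && ArraySize > 0);
    assert(MipLevels > 0 && MipLevels <= 32);
    std::vector<SubresourceInfo> Subresources;
    Subresources.reserve(static_cast<std::size_t>(MipLevels) * ArraySize);
    for (std::uint32_t Slice = 0; Slice < ArraySize; ++Slice) {
        for (std::uint32_t Mip = 0; Mip < MipLevels; ++Mip) {
            // Each mip level halves the dimensions, clamped to 1 texel.
            const std::uint32_t W = std::max<std::uint32_t>(Width >> Mip, 1);
            const std::uint32_t H = std::max<std::uint32_t>(Height >> Mip, 1);
            Subresources.push_back({Slice, Mip, W, H, W * BytesPerTexel});
        }
    }
    return Subresources;
}
```

For the 1024 x 1024 single-mip texture above, the helper yields one subresource with a 4096-byte stride, matching the Stride value set in the example.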
Updating Textures with ITexture::UpdateData()
The first way to update a texture at run time is to use the ITexture::UpdateData() method. The method works similarly to IBuffer::UpdateData() and writes new data to a given texture region:
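Uint32 Width = 128;
Uint32 Height = 64;
// Region of the texture to update
Box UpdateBox;
UpdateBox.MinX = 16;
UpdateBox.MinY = 32;
UpdateBox.MaxX = UpdateBox.MinX + Width;
UpdateBox.MaxY = UpdateBox.MinY + Height;
// New RGBA8 texel data for the region (4 bytes per texel)
std::vector<Uint8> Data(Width * Height * 4);
TextureSubResData SubresData;
SubresData.Stride = Width * 4; // bytes per row of source data
SubresData.pData = Data.data();
Uint32 MipLevel = 0;
Uint32 ArraySlice = 0;
Texture->UpdateData(m_pImmediateContext, MipLevel,
                    ArraySlice, UpdateBox, SubresData);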
Under the hood, this maps to the following native API commands:
In OpenGL/GLES backend, the operation directly translates to the glTexSubImage* family of functions.
As with buffer updates, in Direct3D11 backend this call directly maps to ID3D11DeviceContext::UpdateSubresource.
As with buffers, to update a texture the next-gen backends (Direct3D12 and Vulkan) first allocate a region in CPU-accessible memory and copy the client data to this region. They then perform the necessary state transitions and issue a GPU copy command that writes the pixels to the texture using the GPU-specific layout.
Usage scenarios are similar to buffer updates: the operation should be used for textures whose contents stay mostly constant and only occasionally require updates.
As the operation involves two copies and state transitions, it is not efficient for frequent texture updates.
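In the ITexture::UpdateData() example, SubresData.Stride is the distance in bytes between the starts of consecutive rows of the source data; it equals Width * 4 there because the rows are tightly packed RGBA8 texels. A larger stride lets the application pass a sub-rectangle of a bigger CPU-side image without repacking it. The function below is an API-independent, hypothetical sketch (`CopyRegionRows` is an invented name, not engine code) of a row-by-row copy that honors both a source and a destination stride, which is roughly the kind of copy needed whenever client rows are placed into memory with a different row pitch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies a Width x Height texel region row by row. SrcStride and DstStride are
// the byte distances between the starts of consecutive rows; they may exceed
// Width * BytesPerTexel (sub-rectangle of a larger image, or padded rows).
void CopyRegionRows(const std::uint8_t* pSrc, std::uint32_t SrcStride,
                    std::uint8_t* pDst, std::uint32_t DstStride,
                    std::uint32_t Width, std::uint32_t Height, std::uint32_t BytesPerTexel) {
    const std::uint32_t RowBytes = Width * BytesPerTexel;
    assert(SrcStride >= RowBytes && DstStride >= RowBytes);
    for (std::uint32_t y = 0; y < Height; ++y) {
        std::memcpy(pDst + static_cast<std::size_t>(y) * DstStride,
                    pSrc + static_cast<std::size_t>(y) * SrcStride, RowBytes);
    }
}
```

Padding bytes at the end of each destination row are left untouched, so only Width * BytesPerTexel bytes per row are actually transferred.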
Mapping a texture is a second way to update its contents. From the API side, mapping textures looks similar to mapping buffers:
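Uint32 MipLevel = 0;
Uint32 ArraySlice = 0;
Uint32 Width = 128;
Uint32 Height = 256;
// Region of the texture to map
Box MapRegion;
MapRegion.MinX = 32;
MapRegion.MinY = 64;
MapRegion.MaxX = MapRegion.MinX + Width;
MapRegion.MaxY = MapRegion.MinY + Height;
MappedTextureSubresource MappedSubres;
// The map type and flags below are assumed for this example
Texture->Map(m_pImmediateContext, MipLevel, ArraySlice,
             MAP_WRITE, MAP_FLAG_DISCARD, &MapRegion, MappedSubres);
// Write Height rows of Width texels to MappedSubres.pData,
// advancing by MappedSubres.Stride bytes per row
Texture->Unmap(m_pImmediateContext, MipLevel, ArraySlice);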
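What happens under the hood is very different from mapping buffers.
Mapping textures is currently not supported in OpenGL/GLES backends.
In Direct3D11 backend, this call directly maps to ID3D11DeviceContext::Map with the D3D11_MAP value that corresponds to the requested map type and flags (for example, D3D11_MAP_WRITE_DISCARD for MAP_WRITE with MAP_FLAG_DISCARD).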
In next-gen backends, there are no dynamic textures in the sense that there are dynamic buffers. A buffer can easily be suballocated from another buffer by binding the parent buffer and applying an offset, but there is no similar mechanism for textures. So even if the required memory were suballocated from the dynamic buffer, there would be no way to treat this memory as a texture. Binding the memory to an existing texture is also not allowed. As a result, mapping textures in Direct3D12/Vulkan backends does not differ significantly from updating textures with ITexture::UpdateData(). When mapping a texture, the engine returns a pointer directly to the CPU-accessible memory, which avoids one copy. However, the GPU-side copy and, most importantly, the state transitions are still performed.
It is not exactly clear what Direct3D11 does under the hood. The two most likely options are either creating a linear-layout texture and suballocating it from CPU-accessible memory every time Map is called, or performing the same operations as Diligent's next-gen backends.
Mapping dynamic textures is not as efficient as mapping dynamic buffers, and typical usage scenarios are similar to those of ITexture::UpdateData().
There is no simple way to implement high-frequency texture updates across all APIs, so Diligent expects that this will be implemented by the application using low-level API interoperability. For Direct3D12 and Vulkan backends, one possible way is to create a number of linear-layout textures in CPU-writable memory and use them in a round-robin fashion. As this method is very application-specific, Diligent Engine does not expose it through common API.
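One way to picture the round-robin scheme is a small pool of slots guarded by fence values. The sketch below is hypothetical and API-agnostic: `RoundRobinTexturePool`, the slot indices and the fence values are stand-ins for native Direct3D12 or Vulkan objects (a linear-layout texture in CPU-writable memory per slot, and a fence signaled by the GPU) that the application would have to create through native API interoperability.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

class RoundRobinTexturePool {
public:
    explicit RoundRobinTexturePool(std::size_t NumSlots) : m_SlotFences(NumSlots, 0) {
        assert(NumSlots > 0);
    }

    // Returns the next slot the CPU may write to, or std::nullopt if the GPU
    // may still be reading it (its fence value has not been reached yet).
    std::optional<std::size_t> AcquireForWrite(std::uint64_t CompletedFenceValue) {
        const std::size_t Slot = m_Next;
        if (m_SlotFences[Slot] > CompletedFenceValue)
            return std::nullopt;
        m_Next = (m_Next + 1) % m_SlotFences.size();
        return Slot;
    }

    // Records that the commands reading Slot are complete once the GPU
    // signals SubmitFenceValue.
    void MarkInUse(std::size_t Slot, std::uint64_t SubmitFenceValue) {
        assert(Slot < m_SlotFences.size());
        m_SlotFences[Slot] = SubmitFenceValue;
    }

private:
    std::vector<std::uint64_t> m_SlotFences; // last fence value that uses each slot
    std::size_t m_Next = 0;                  // next slot in round-robin order
};
```

With more slots, the CPU can run further ahead of the GPU before AcquireForWrite() has to report that the next texture is still busy; the right count is application-specific, which is in line with the engine leaving this scheme to the application.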
Texture mapping is not currently implemented in OpenGL/GLES backends.
In Direct3D11, only an entire texture mip level can be mapped.
In Direct3D12/Vulkan backends, mapping dynamic textures is not as efficient as mapping dynamic buffers. In fact, it is very similar to updating textures with ITexture::UpdateData() and only avoids one CPU-side copy.
The following table summarizes update methods for buffers:
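| Update scenario | Usage | Update method | Comment |
|---|---|---|---|
| Constant data | USAGE_STATIC | n/a | Data can only be written during buffer initialization |
| < Once per frame | USAGE_DEFAULT | IBuffer::UpdateData() | |
| >= Once per frame | USAGE_DYNAMIC | IBuffer::Map() / IBuffer::Unmap() | In Direct3D12 and Vulkan backends, the content of dynamic buffers is invalidated at the end of every frame |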
The following table summarizes update methods for textures:
| Update scenario | Usage / Update method | Comment |
|---|---|---|
| Constant data | USAGE_STATIC / n/a | Data can only be written during texture initialization |
| < Once per frame | USAGE_DEFAULT / ITexture::UpdateData(), or USAGE_DYNAMIC / ITexture::Map() | |
| >= Once per frame | Implemented by the application | Dynamic textures cannot be implemented the same way as dynamic buffers |
Full engine source code is available for download at GitHub.
The following tutorials illustrate the ideas described in this article:
Tutorial 10 - Data Streaming
Tutorial 11 - Resource Updates