Stencil Buffer Glows - Part 2

JimRL

Feb 11, 2011

CPOL

Create a more robust glow with basic post processing

Introduction

In the previous article, we accentuated objects for the end user with colorful outlines. By masking pixels against the stencil buffer, we were able to generate an exact glow around the model of interest. This procedure worked well for some objects. For others, however, the effect was not quite right. Consider a torus rendered with that technique: its inner ring is missing an outline.

In this article, we will construct a glow effect which outlines arbitrarily shaped models.

Background

Briefly recall the logic of the original article. First, we rendered the model normally while simultaneously creating a per pixel mask of that model. Next, we generated a glow effect. Finally, we composited the glow onto the original image while respecting the mask. In this article, we have similar logic for masking and compositing. However, the procedure for generating the actual glow is much more robust.

Step 1

Render the model into two frame buffers, the depth buffer, and the stencil buffer (i.e. use multiple render targets).

Step 2

Blur the image in the second render target.

Step 3

Render the second render target onto the frame buffer while masking pixels against values currently in the stencil buffer.

Post Processing

Before we jump into the actual implementation, let’s do a very basic and general overview of image processing, specifically the ideas applicable to blurs. If you have never done post processing, this should ease the learning curve and, hopefully, clarify some of the shaders and code to come.

I chose to generate blurs using convolution. Convolution takes the weighted sum of surrounding pixels as the new value of a given pixel. Consider the following kernel of convolution:

[ 1 2 1 ]
[ 2 4 2 ]
[ 1 2 1 ]

For every pixel in the image, the new value of that pixel will be one times the value of its upper left neighbor, plus two times the value of its upper neighbor, plus one times the value of its upper right neighbor, plus two times the value of its left neighbor, plus four times the value of itself, and so on for the remaining neighbors. With this particular filter, you will usually want to divide the end result (or all kernel values) by sixteen. The kernel weights sum to sixteen, so without the division each convolved pixel would receive roughly sixteen times the original color value; dividing retains the original luminance of the image.
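To make the arithmetic concrete, here is a minimal CPU-side sketch of that convolution. It is only an illustration, not the project's shader code: the `Image` struct, the `Convolve3x3` name, and the clamp-to-edge border handling are all assumptions introduced for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical single-channel image stored row by row.
struct Image {
    int width;
    int height;
    std::vector<float> pixels;

    // Reads a pixel, clamping coordinates to the image edge
    // (an assumed border policy; the article does not specify one).
    float At(int x, int y) const {
        x = std::max(0, std::min(x, width - 1));
        y = std::max(0, std::min(y, height - 1));
        return pixels[y * width + x];
    }
};

// Convolves the image with the 3x3 kernel and divides by the weight sum (16).
Image Convolve3x3(const Image& src) {
    assert(src.width > 0 && src.height > 0);
    static const float kKernel[3][3] = {
        { 1.0f, 2.0f, 1.0f },
        { 2.0f, 4.0f, 2.0f },
        { 1.0f, 2.0f, 1.0f },
    };
    Image dst = src;
    for (int y = 0; y < src.height; ++y) {
        for (int x = 0; x < src.width; ++x) {
            float sum = 0.0f;
            // Weighted sum over the pixel and its eight neighbors.
            for (int ky = -1; ky <= 1; ++ky)
                for (int kx = -1; kx <= 1; ++kx)
                    sum += kKernel[ky + 1][kx + 1] * src.At(x + kx, y + ky);
            dst.pixels[y * src.width + x] = sum / 16.0f;
        }
    }
    return dst;
}
```

Because of the division by sixteen, a uniformly colored image passes through unchanged, while a single bright pixel is spread over its neighbors in a 1:2:4 pattern.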

Next, we need to talk about separability. A kernel is separable if it can be written as the product of a column vector and a row vector. In the above example, consider [1 2 1]T * [1 2 1], where 'T' represents transpose. Multiplying these two vectors together produces the original kernel. So, we can convolve our image by [1 2 1]T and then by [1 2 1] to produce the same result as the original matrix. Why does this matter? Well, in this case, we have reduced the number of texture look ups per pixel from nine to six. That’s not too dramatic. However, in a fifteen by fifteen kernel, the number of look ups per pixel will drop from two hundred and twenty-five to thirty.
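The equivalence can be checked on the CPU with a small sketch. Everything named here (`Grid`, `Sample`, `Blur1D`, `BlurSeparable`, `BlurFull`, and the two look-up counters) is hypothetical and exists only for this illustration; edge pixels are clamped, which is an assumption rather than something the article specifies.

```cpp
#include <cassert>
#include <vector>

// Hypothetical image type for illustration: grid[y][x].
using Grid = std::vector<std::vector<float>>;

// Reads a pixel with coordinates clamped to the image edge (assumed policy).
float Sample(const Grid& g, int x, int y) {
    const int h = static_cast<int>(g.size());
    const int w = static_cast<int>(g[0].size());
    x = x < 0 ? 0 : (x >= w ? w - 1 : x);
    y = y < 0 ? 0 : (y >= h ? h - 1 : y);
    return g[y][x];
}

// One 1D pass with [1 2 1] / 4 along direction (dx, dy): three look-ups per pixel.
Grid Blur1D(const Grid& src, int dx, int dy) {
    assert(!src.empty() && !src[0].empty());
    Grid dst = src;
    for (int y = 0; y < static_cast<int>(src.size()); ++y)
        for (int x = 0; x < static_cast<int>(src[y].size()); ++x)
            dst[y][x] = (Sample(src, x - dx, y - dy) + 2.0f * Sample(src, x, y) +
                         Sample(src, x + dx, y + dy)) / 4.0f;
    return dst;
}

// Horizontal pass followed by vertical pass: six look-ups per pixel in total.
Grid BlurSeparable(const Grid& src) {
    return Blur1D(Blur1D(src, 1, 0), 0, 1);
}

// Full 3x3 kernel built as the outer product of [1 2 1] with itself: nine look-ups.
Grid BlurFull(const Grid& src) {
    assert(!src.empty() && !src[0].empty());
    static const float w[3] = { 1.0f, 2.0f, 1.0f };
    Grid dst = src;
    for (int y = 0; y < static_cast<int>(src.size()); ++y)
        for (int x = 0; x < static_cast<int>(src[y].size()); ++x) {
            float sum = 0.0f;
            for (int ky = -1; ky <= 1; ++ky)
                for (int kx = -1; kx <= 1; ++kx)
                    sum += w[ky + 1] * w[kx + 1] * Sample(src, x + kx, y + ky);
            dst[y][x] = sum / 16.0f;
        }
    return dst;
}

// Look-ups per pixel for an n-by-n kernel, full versus separated.
int LookupsFull(int n) { return n * n; }
int LookupsSeparable(int n) { return 2 * n; }
```

For inputs where the arithmetic is exact, both paths yield identical images. In general floating-point use, the two results may differ by rounding error only.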

Finally, in this project, the kernel weights are generated procedurally through the Gaussian distribution. Don’t panic. Everything we just discussed also applies to Gaussian kernels. However, instead of hard coding kernel weights, the weights are a function of the distance away from the original pixel.
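As a rough sketch of what "procedurally generated" weights can look like, the function below evaluates the Gaussian at integer distances from the center and normalizes the result. The function name, the `radius` and `sigma` parameters, and the normalization step are assumptions made for illustration; the project's actual code may compute its weights differently.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Returns 2 * radius + 1 weights for a 1D Gaussian blur vector.
// Hypothetical helper: radius, sigma, and the normalization are assumptions.
std::vector<float> GaussianWeights(int radius, float sigma) {
    assert(radius >= 0 && sigma > 0.0f);
    std::vector<float> weights(2 * radius + 1);
    float sum = 0.0f;
    for (int i = -radius; i <= radius; ++i) {
        // The weight depends only on the distance |i| from the center pixel.
        const float w = std::exp(-static_cast<float>(i * i) / (2.0f * sigma * sigma));
        weights[i + radius] = w;
        sum += w;
    }
    // Normalize so the weights sum to one, which preserves luminance
    // (the same role the division by sixteen played for the 3x3 kernel).
    for (float& w : weights)
        w /= sum;
    return weights;
}
```

Calling `GaussianWeights(7, 3.0f)` would give the fifteen weights of one blur vector. Since a Gaussian kernel is separable, the same vector can serve both the horizontal and the vertical pass.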

Sample/Demo Requirements

Since I based this project on DirectX 10, you will need Windows Vista or higher. In addition, I was using the February 2010 DirectX SDK, so you will need to install the February 2010, or more recent, redistributable. Finally, although not a requirement, I would recommend having a DirectX 10 compatible GPU.

Source Code Requirements

First, all the sample requirements also apply to the source code. In addition, you will need to install the entire DirectX SDK, the February 2010 version or more recent. After you install the SDK, include the DirectX and DXUT headers and link their respective libraries in your IDE. Note: Microsoft does not distribute the DXUT libraries. There will be a Visual Studio project within the SDK folder containing the DXUT headers. Build this project to create the libraries.

Using the code

From a technical perspective, there are three main concepts interacting in this project: stencil masking, multiple render targets, and post processing.

First, all stencil operations are identical or similar to the ones discussed in the previous article. If you need a refresher or are new to these concepts, I would recommend reviewing part one of the series.

Next, while the concept of using multiple render targets may seem trivial, the implementation can be tricky. In the conceptual explanation, we described using a main frame buffer and a secondary render target. Here, we are actually using three render targets. One remains the primary render target. The other two together play the role of the secondary render target, with intermediate results ping-ponged back and forth between them.

Here’s how to set up the render targets as well as views to read and write to them in shaders.

// Set up the second and third render targets.
D3D10_TEXTURE2D_DESC desc;
ZeroMemory( &desc, sizeof(desc) );
desc.Width = pBufferSurfaceDesc->Width;
desc.Height = pBufferSurfaceDesc->Height;
desc.MipLevels = 1;
desc.ArraySize = 1;
desc.Format = pBufferSurfaceDesc->Format;
desc.SampleDesc.Count = 1;
desc.Usage = D3D10_USAGE_DEFAULT;
desc.BindFlags = D3D10_BIND_RENDER_TARGET | D3D10_BIND_SHADER_RESOURCE;

V_RETURN( pd3dDevice->CreateTexture2D( &desc, NULL, 
          &g_pSecondRenderTarget ) );
V_RETURN( pd3dDevice->CreateTexture2D( &desc, NULL, 
          &g_pThirdRenderTarget ) );

// Set up the resource view for the second and third render targets.
D3D10_RENDER_TARGET_VIEW_DESC rtDesc;
rtDesc.Format = desc.Format;
rtDesc.ViewDimension = D3D10_RTV_DIMENSION_TEXTURE2D;
rtDesc.Texture2D.MipSlice = 0;

V_RETURN( pd3dDevice->CreateRenderTargetView( g_pSecondRenderTarget, 
          &rtDesc, &g_pSecondRenderTargetRTView ) );
V_RETURN( pd3dDevice->CreateRenderTargetView( g_pThirdRenderTarget, 
          &rtDesc, &g_pThirdRenderTargetRTView ) );

// Set up the shader resource view for the second and third render targets.
D3D10_SHADER_RESOURCE_VIEW_DESC srDesc;
srDesc.Format = desc.Format;
srDesc.ViewDimension = D3D10_SRV_DIMENSION_TEXTURE2D;
srDesc.Texture2D.MostDetailedMip = 0;
srDesc.Texture2D.MipLevels = 1;

V_RETURN( pd3dDevice->CreateShaderResourceView( g_pSecondRenderTarget, 
          &srDesc, &g_pSecondRenderTargetSRView ) );
V_RETURN( pd3dDevice->CreateShaderResourceView( g_pThirdRenderTarget, 
          &srDesc, &g_pThirdRenderTargetSRView ) );

In order to write data to two targets simultaneously, we need to set up the output merger. This can be done in a single API call. (In passing, note that you can only bind one depth stencil buffer for all render targets.)

// Set primary and secondary render targets.
ID3D10RenderTargetView* pRTViews[2];
pRTViews[0] = pOrigRTView; 
pRTViews[1] = g_pSecondRenderTargetRTView;

pd3dDevice->OMSetRenderTargets( 2, pRTViews, g_pDSView );

In addition, shaders need to reflect this setup. Pixel shaders need to have the same number of outputs as render targets. In this project, both targets receive identical output; however, that is not a requirement.

Now, we need to discuss the post processing step. In a pixel shader implementation, there’s an innate problem of actually getting the render target data to the pixel shader. You cannot just "read" the data. You have to render it. Drawing a full screen quad solves this problem. Here’s how to set one up:

// Full Screen Quad Vertices
{
    ScreenVertex svQuad[4];
    svQuad[0].pos = D3DXVECTOR4( -1.0f, 1.0f, 0.5f, 1.0f );
    svQuad[0].tex = D3DXVECTOR2( 0.0f, 0.0f );
    svQuad[1].pos = D3DXVECTOR4( 1.0f, 1.0f, 0.5f, 1.0f );
    svQuad[1].tex = D3DXVECTOR2( 1.0f, 0.0f );
    svQuad[2].pos = D3DXVECTOR4( -1.0f, -1.0f, 0.5f, 1.0f );
    svQuad[2].tex = D3DXVECTOR2( 0.0f, 1.0f );
    svQuad[3].pos = D3DXVECTOR4( 1.0f, -1.0f, 0.5f, 1.0f );
    svQuad[3].tex = D3DXVECTOR2( 1.0f, 1.0f );

    D3D10_BUFFER_DESC vbdesc =
    {
        4 * sizeof( ScreenVertex ),
        D3D10_USAGE_DEFAULT,
        D3D10_BIND_VERTEX_BUFFER,
        0,
        0
    };
    D3D10_SUBRESOURCE_DATA InitData;
    InitData.pSysMem = svQuad;
    InitData.SysMemPitch = 0;
    InitData.SysMemSlicePitch = 0;
    V_RETURN( pd3dDevice->CreateBuffer( &vbdesc, &InitData,
              &g_pScreenQuadVB ) );
}

In addition, make sure to omit the transformation pipeline in your vertex shader. The quad's vertices are already specified in clip space and span the entire screen, which removes the need for any transformations.

For the pixel shader, the blur kernel is first set up by the CPU and uploaded to the shader. Then, the shader convolves each fragment by sampling the original image at the pre-computed kernel positions. Finally, it multiplies each sample by the appropriate kernel weight and sums the results.
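One plausible way to pre-compute the kernel positions on the CPU is sketched below. This is an assumption about the general approach, not a copy of the project's code: `Float2` is a stand-in for a two-component vector type, `KernelOffsets` is a hypothetical helper, and the offsets are expressed in normalized texture coordinates, where one texel spans 1 / width horizontally and 1 / height vertically.

```cpp
#include <cassert>
#include <vector>

// Stand-in for a two-component float vector (hypothetical type).
struct Float2 {
    float x;
    float y;
};

// Builds the texture-coordinate offsets for one blur pass.
// horizontal == true steps along x; otherwise the pass steps along y.
std::vector<Float2> KernelOffsets(int radius, int texWidth, int texHeight,
                                  bool horizontal) {
    assert(radius >= 0 && texWidth > 0 && texHeight > 0);
    std::vector<Float2> offsets;
    offsets.reserve(2 * radius + 1);
    for (int i = -radius; i <= radius; ++i) {
        Float2 offset = { 0.0f, 0.0f };
        if (horizontal)
            offset.x = static_cast<float>(i) / static_cast<float>(texWidth);
        else
            offset.y = static_cast<float>(i) / static_cast<float>(texHeight);
        offsets.push_back(offset);
    }
    return offsets;
}
```

A shader using such a table would add each offset to the fragment's texture coordinate, sample there, and accumulate the sample scaled by the matching weight. Running the pass once with horizontal offsets and once with vertical offsets gives the two halves of the separable blur.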

To learn more about how the positions and weights are computed, I would recommend looking at the code directly or viewing the "HDRToneMappingCS11" DirectX sample. This blur approach is loosely based on that sample's pixel shader bloom implementation.

Finally, we have a composition step which combines the blurred image into the original. While rendering a full screen quad, we discard noncontributing pixels with an alpha test while shading the remaining pixels the color of the glow. Here’s a snippet of the pixel shader.

PS_OUTPUT PSOutline( PS_INPUT input ) : SV_Target
{
    PS_OUTPUT output;
    output.result1 = txDiffuse.Sample( samLinear, input.Tex );

    // Alpha test
    if( output.result1.w == 0.0f )
    {
        discard;
    }

    // Assign Glow Color
    output.result1.x = 1.0f;
    output.result1.y = 0.0f;
    output.result1.z = 0.0f;
    return output;
}

Technical Recap

Using multiple render targets, render the model into targets one and two.

Convolve the result in target two with the horizontal blur vector. Store the results in render target three.

Convolve the result in target three with the vertical blur vector. Store the results in render target two.

Composite the result in target two with the original render target (target one).
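The four steps above can be mimicked on the CPU to see why the result outlines the inner ring of a torus. This is a simplified sketch under several assumptions: a hypothetical `Grid` holds only a coverage value per pixel, the model's silhouette doubles as both the second render target and the stencil mask, the blur vector is the small [1 2 1] / 4 example rather than a Gaussian, and pixels outside the image count as empty.

```cpp
#include <cassert>
#include <vector>

// Hypothetical coverage image: grid[y][x], 1 where the model was drawn.
using Grid = std::vector<std::vector<float>>;

// Reads a pixel; anything outside the image is treated as empty (assumed).
float At(const Grid& g, int x, int y) {
    if (y < 0 || y >= static_cast<int>(g.size())) return 0.0f;
    if (x < 0 || x >= static_cast<int>(g[y].size())) return 0.0f;
    return g[y][x];
}

// One blur pass with [1 2 1] / 4 along direction (dx, dy).
Grid Blur1D(const Grid& src, int dx, int dy) {
    Grid dst = src;
    for (int y = 0; y < static_cast<int>(src.size()); ++y)
        for (int x = 0; x < static_cast<int>(src[y].size()); ++x)
            dst[y][x] = (At(src, x - dx, y - dy) + 2.0f * At(src, x, y) +
                         At(src, x + dx, y + dy)) / 4.0f;
    return dst;
}

// Returns the glow intensity that would be composited onto target one.
Grid OutlineGlow(const Grid& silhouette) {
    assert(!silhouette.empty());
    // Step 1 is assumed done: 'silhouette' is target two and the stencil mask.
    Grid targetThree = Blur1D(silhouette, 1, 0);  // Step 2: two -> three
    Grid targetTwo = Blur1D(targetThree, 0, 1);   // Step 3: three -> two
    // Step 4: composite, skipping every pixel the stencil marks as model.
    Grid glow = targetTwo;
    for (std::size_t y = 0; y < glow.size(); ++y)
        for (std::size_t x = 0; x < glow[y].size(); ++x)
            if (silhouette[y][x] != 0.0f)
                glow[y][x] = 0.0f;
    return glow;
}
```

For a ring-shaped silhouette, the blur bleeds coverage into the hole as well as past the outer edge, and the mask then removes the glow from the model itself. Pixels the blur never reaches stay at zero, which corresponds to the fragments the alpha test discards.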

Conclusion

By leveraging post processing, we have extended the outline glow effect to arbitrarily shaped models. However, we have also increased the time and space consumption of our implementation. If you are looking for ways to improve this implementation, it might be worthwhile to incorporate downsampling into the post processing step, so that the blur passes run over fewer pixels.

In addition, if you want a more complicated stenciling project, look into stencil shadow implementations.

Cheers!

History

Current revision: Version 1.0