DirectShow Filters Development Part 1: Video Rendering with Direct2D

Roman Ginzburg

Jan 31, 2011

CPOL

Download source code - 81.6 KB

Introduction

This article is about DirectShow development in general and filter development in detail. I decided to share my knowledge and experience in this area so that there will be a simple tutorial for developers who wish to write their own filters and can't find enough references on the web. I assume you have a basic knowledge of DirectShow graph management and C++/COM programming. All source code samples were built with Visual Studio 2010 on Windows 7; you also have to install the Windows SDK for Windows 7 and .NET Framework 4.

The strength of an API is measured not only by its capabilities but also by its extensibility model, and when it comes to extensibility, DirectShow really shines: you can extend the framework by building its basic building blocks, called filters. Each filter is actually a C++ object with one or more nested objects called pins, which are responsible for connections with other filters and for data delivery between them.

Most of the time, you will be using existing filters, as there is a plethora of them already installed on your OS, and many more free filters are available for download. However, there are times when you need to do something unusual, and when working in a startup company with a low budget, it is not possible to buy what you need. So one day I found myself struggling with filter development...

DirectShow filters come in three flavors:

Source filters – responsible for generating media samples and pushing them to downstream filters in the graph. Source filters themselves are divided into three groups:

File source filters – filters which are responsible for parsing media files, reading media samples, and pushing them to the appropriate output pins dealing with video, audio, text, or data.
Capture source filters – filters that are usually bound to some external device such as a webcam or video acquisition card, and are responsible for generating media samples at a constant rate and pushing them to output pins.
Live source filters – filters that get video samples at an unspecified pace from a network stream or external function calls and push them downstream.

Transform filters – probably the vast majority of DirectShow filters which are responsible for en/decoding, de/multiplexing, parsing, and splitting media streams. There are two kinds of transform filters:

In-place transformation filters – filters that perform some action on the media sample and deliver it to the output pin without any buffer copy.
Transformation filters – filters that receive a media sample, perform some action, and save the output in another media sample which is pushed downstream.

Renderer filters – filters that act as a "final station" for the media samples, and are responsible for either sending the sample to the network, saving it to a file, or showing it on screen.

Renderer filters

Renderer filters are the easiest to implement: you simply inherit from the DirectShow base class and override a few methods. For that reason, I decided to start this series of articles with rendering filters.

Filter development prerequisites

After installing the Windows SDK, you have to build the baseclasses solution located in C:\Program Files\Microsoft SDKs\Windows\v7.1\Samples\multimedia\directshow\baseclasses in both Debug and Release configurations. After successful build, you will have the strmbasd.lib library in the Debug folder and strmbase.lib in the Release folder.

Create a new Visual C++ -> Win32 project, and press OK.

After that, select the DLL application type and check the Empty project checkbox so the wizard will not create any main function for us.

After the project is created, right-click on it and select Properties. Now you have to set up the include path for the header files to C:\Program Files\Microsoft SDKs\Windows\v7.1\Samples\multimedia\directshow\baseclasses and the linker path for the *.lib files to C:\Program Files\Microsoft SDKs\Windows\v7.1\Samples\multimedia\directshow\baseclasses\Debug.

Your project will link statically with the following libraries: strmbasd.lib, winmm.lib, d2d1.lib.

Now the configuration is over, and you need to create the actual renderer class and inherit it from CBaseVideoRenderer:

The CBaseVideoRenderer class implements most of the filter functionality for you. All you need to do is implement two pure virtual methods:

CheckMediaType – responsible for media type negotiation between pins during the connection stage.
DoRenderSample – the actual rendering method called each time a media sample should be rendered.

In addition, I have also overridden two virtual methods:

SetMediaType – called when the pins agree on a specific media and a connection is complete.
StartStreaming - called when the graph starts the media streaming.

Besides the methods mentioned above, I also added a method for setting a video window handle which will be used for video presentation, and a setter/getter for letter-boxing: either keep the aspect ratio or stretch to the display window. These methods are declared in the IVideoRenderer interface along with GUIDs and other declarations.

// {269BA141-1FDE-494B-9024-453A17838B9F}
static const GUID CLSID_Direct2DVideoRenderer = 
{ 0x269ba141, 0x1fde, 0x494b, { 0x90, 0x24, 0x45, 0x3a, 0x17, 0x83, 0x8b, 0x9f } };
// {34E5B77C-CCBA-4EC0-88B5-BABF6CF3A1D2}
static const GUID IID_IVideoRenderer = 
{ 0x34e5b77c, 0xccba, 0x4ec0, { 0x88, 0xb5, 0xba, 0xbf, 0x6c, 0xf3, 0xa1, 0xd2 } };
#define FILTER_NAME L"Direct2D Video Renderer"
enum DisplayMode
{
        KeepAspectRatio = 0,
        Fill = 1
};
DECLARE_INTERFACE_(IVideoRenderer, IUnknown)
{
        STDMETHOD(SetVideoWindow)(HWND hWnd) PURE;
        STDMETHOD_(void, SetDisplayMode)(DisplayMode) PURE;
        STDMETHOD_(DisplayMode, GetDisplayMode)(void) PURE;
};
class CD2DVideoRender : public CBaseVideoRenderer, public IVideoRenderer
{
public:
       DECLARE_IUNKNOWN;
       
       CD2DVideoRender(LPUNKNOWN pUnk, HRESULT* phr);
       virtual ~CD2DVideoRender(void);

       virtual HRESULT DoRenderSample(IMediaSample *pMediaSample);
       virtual HRESULT CheckMediaType(const CMediaType *pmt);
       virtual HRESULT SetMediaType(const CMediaType *pmt);
       virtual HRESULT StartStreaming();

       static CUnknown * WINAPI CreateInstance(LPUNKNOWN lpunk, HRESULT *phr); 
       STDMETHODIMP NonDelegatingQueryInterface(REFIID riid, void **ppv);

       STDMETHODIMP SetVideoWindow(HWND hWnd);
       STDMETHOD_(void, SetDisplayMode)(DisplayMode);
       STDMETHOD_(DisplayMode, GetDisplayMode)(void);

       void CreateDefaultWindow();

private:
       HWND m_hWnd;
       CD2DRenderer* m_renderer;
       HANDLE m_event;
       BITMAPINFOHEADER m_bmpInfo;
       CMediaType m_mediaType;
       CColorSpaceConverter* m_converter;
};

Using Direct2D for video rendering

Windows offers a wide variety of APIs suitable for real-time video rendering: Video for Windows, GDI/GDI+, Direct3D, and DXVA are all good choices. However, I wanted to take a look at a relatively new and promising API called Direct2D, which is designed for hardware-accelerated 2D graphics. Since it is in some ways a replacement for the old GDI/GDI+, we can also use it for image rendering. Direct2D is a lightweight COM API based on Direct3D 10; however, it is much simpler than the Direct3D API.

Steps for creating Direct2D applications:

Create a factory object
Create a render target (here I am using a HWND render target)
Create a Direct2D bitmap object.

Step 1 is performed in the renderer class constructor since it should be called only once. Steps 2 and 3 should be performed once the filter has agreed on a connection, so that the media type and frame bounds are known. When a sample is ready for presentation, its data buffer is copied to the Direct2D bitmap and presented on screen.

Direct2D HWND Render Target

A render target is a surface on which the rendering takes place. It is a device-specific resource and may be lost after changes such as a display resolution change, so it may need to be recreated while the program runs. Since the HWND render target is bound to a window, its initial size is the same as the window size; however, the window may be resized at runtime, so the render target needs to be resized too. Since we're dealing with video rendering, I decided to check the window size inside the DrawSample method, which is executed for each frame in the video sequence. The method shown below creates the render target, sets the transformation matrix, and also creates the bitmap which will be used for display:

HRESULT CD2DRenderer::CreateResources()
{
    D2D1_PIXEL_FORMAT pixelFormat =
    {
        DXGI_FORMAT_B8G8R8A8_UNORM,
        D2D1_ALPHA_MODE_IGNORE
    };

    D2D1_RENDER_TARGET_PROPERTIES renderTargetProps =
    {
        D2D1_RENDER_TARGET_TYPE_DEFAULT,
        pixelFormat,
        0,
        0,
        D2D1_RENDER_TARGET_USAGE_NONE,
        D2D1_FEATURE_LEVEL_DEFAULT
    };

    RECT rect;
    ::GetClientRect(m_hWnd, &rect);

    D2D1_SIZE_U windowSize =
    {
        rect.right - rect.left,
        rect.bottom - rect.top
    };

    D2D1_HWND_RENDER_TARGET_PROPERTIES hWndRenderTargetProps =
    {
        m_hWnd,
        windowSize,
        D2D1_PRESENT_OPTIONS_IMMEDIATELY
    };

    HR(m_d2dFactory->CreateHwndRenderTarget(renderTargetProps,
       hWndRenderTargetProps, &m_hWndTarget));

    //  (0,0) + --------> X
    //        |
    //        |
    //        |
    //        Y

    if(m_bFlipHorizontally)
    {
        // Flip the image around the X axis
        D2D1::Matrix3x2F scale = D2D1::Matrix3x2F(1, 0,
                                                  0, -1,
                                                  0, 0);

        // Move it back into place
        D2D1::Matrix3x2F translate =
            D2D1::Matrix3x2F::Translation(0, windowSize.height);
        m_hWndTarget->SetTransform(scale * translate);
    }

    FLOAT dpiX, dpiY;
    m_d2dFactory->GetDesktopDpi(&dpiX, &dpiY);

    D2D1_BITMAP_PROPERTIES bitmapProps =
    {
        pixelFormat,
        dpiX,
        dpiY
    };

    D2D1_SIZE_U bitmapSize =
    {
        m_pBitmapInfo.biWidth,
        m_pBitmapInfo.biHeight
    };

    return m_hWndTarget->CreateBitmap(bitmapSize, bitmapProps, &m_bitmap);
}

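Some image formats are stored bottom-up, so they have to be flipped vertically, that is, mirrored around the horizontal X axis (this is what the m_bFlipHorizontally flag in the code controls). Instead of reordering the rows manually in the code, we can change the transformation matrix of the render target: negate the Y scale to flip the image around the X axis, then translate it back into place. All further drawing operations on the render target will be affected by the transformation matrix.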
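The effect of the flip transform can be checked with a little arithmetic. The following is a minimal sketch and not part of the article's sources: Mat3x2, Point, Multiply, Transform, and MakeVerticalFlip are hypothetical stand-ins that mimic the row-vector convention of D2D1::Matrix3x2F, so the example compiles without the Direct2D headers.

```cpp
#include <cassert>

// Hypothetical stand-in for D2D1::Matrix3x2F (illustration only).
// Direct2D treats points as row vectors: [x' y'] = [x y 1] * M.
struct Mat3x2 {
    float m11, m12;
    float m21, m22;
    float dx, dy;
};

struct Point { float x, y; };

// Product a * b: the transform a is applied first, then b.
inline Mat3x2 Multiply(const Mat3x2& a, const Mat3x2& b) {
    return {
        a.m11 * b.m11 + a.m12 * b.m21,       a.m11 * b.m12 + a.m12 * b.m22,
        a.m21 * b.m11 + a.m22 * b.m21,       a.m21 * b.m12 + a.m22 * b.m22,
        a.dx * b.m11 + a.dy * b.m21 + b.dx,  a.dx * b.m12 + a.dy * b.m22 + b.dy
    };
}

inline Point Transform(const Mat3x2& m, Point p) {
    return { p.x * m.m11 + p.y * m.m21 + m.dx,
             p.x * m.m12 + p.y * m.m22 + m.dy };
}

// Builds the same product as CreateResources: negate Y, then shift down by height.
inline Mat3x2 MakeVerticalFlip(float height) {
    assert(height >= 0.0f);
    const Mat3x2 scale     = { 1, 0, 0, -1, 0, 0 };
    const Mat3x2 translate = { 1, 0, 0,  1, 0, height };
    return Multiply(scale, translate);
}
```

With a height of 480, the point (10, 0) on the top row maps to (10, 480) on the bottom edge and vice versa, while X coordinates are unchanged. In other words, every point (x, y) ends up at (x, height - y).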

After that, we create a Direct2D bitmap object with the same pixel format as the render target, but with the size of the actual video frames. Since Direct2D is GPU accelerated, rendering RGB32 bitmaps gives the best performance, so both the render target and the bitmap object must be created with an RGB32 or ARGB pixel format.

Presenting the frames

After we have set up all the necessary resources, video streaming begins, and the DrawSample method is executed 25 or 30 times a second, depending on the video frame rate.

HRESULT CD2DRenderer::DrawSample(const BYTE* pRgb32Buffer)
{
    CheckPointer(pRgb32Buffer, E_POINTER);

    if(!m_bitmap || !m_hWndTarget)
    {
        HR(CreateResources());
    }

    HR(m_bitmap->CopyFromMemory(NULL, pRgb32Buffer, m_pitch));
    
    if (!(m_hWndTarget->CheckWindowState() & D2D1_WINDOW_STATE_OCCLUDED))
    {
        RECT rect;
        ::GetClientRect(m_hWnd, &rect);
        D2D1_SIZE_U newSize = { rect.right, rect.bottom };
        D2D1_SIZE_U size = m_hWndTarget->GetPixelSize();

        if(newSize.height != size.height || newSize.width != size.width)
        {
            m_hWndTarget->Resize(newSize);
        }

        D2D1_RECT_F rectangle = D2D1::RectF(0, 0, newSize.width, newSize.height);

        if(m_displayMode == KeepAspectRatio)
        {
            ApplyLetterBoxing(rectangle, m_bitmap->GetSize());
        }

        m_hWndTarget->BeginDraw();

        m_hWndTarget->Clear(D2D1::ColorF(D2D1::ColorF::Black));
        m_hWndTarget->DrawBitmap(m_bitmap, rectangle);

        HRESULT hr = m_hWndTarget->EndDraw();
        if(hr == D2DERR_RECREATE_TARGET)
        {
            DiscardResources();
        }
    }

    return S_OK;
}

First, I need to make sure the render target and the bitmap are valid, since they may have been discarded on the previous call. Then I copy the pixel data to the bitmap and check that the window is not occluded. Next, I check whether the window size has changed, resize the render target if it has, and apply letter-boxing (discussed below) if needed. As I mentioned before, Direct2D is much simpler than its big brother Direct3D, so there are no swap chains, back buffers, and other heavy stuff :). All you have to do is call BeginDraw, clear the render target, draw the bitmap, and call EndDraw, whose return value tells you whether the render target needs to be recreated.

Video letter-boxing

Video frames have an aspect ratio which is calculated by dividing the frame width by its height. Common aspect ratios are 4:3 and 16:9 for wide screen. Since the display window may be resized, the video frame may lose its aspect ratio and will be stretched or shrunken. In order to keep the original aspect ratio, I wrote a simple function called ApplyLetterBoxing which calculates the target rectangle for each video frame:

static inline void ApplyLetterBoxing(D2D1_RECT_F& rendertTargetArea, const D2D1_SIZE_F& frameArea)
{
    const float aspectRatio = frameArea.width / frameArea.height;

    const float targetW = fabs(rendertTargetArea.right - rendertTargetArea.left);
    const float targetH = fabs(rendertTargetArea.bottom - rendertTargetArea.top);

    float tempH = targetW / aspectRatio;    
            
    if(tempH <= targetH)
    // desired frame height is smaller than display
    // height so fill black on top and bottom of display 
    {               
        float deltaH = fabs(tempH - targetH) / 2;
        rendertTargetArea.top += deltaH;
        rendertTargetArea.bottom -= deltaH;
    }
    else
    //desired frame height is bigger than display
    // height so fill black on left and right of display 
    {
        float tempW = targetH * aspectRatio;    
        float deltaW = fabs(tempW - targetW) / 2;

        rendertTargetArea.left += deltaW;
        rendertTargetArea.right -= deltaW;
    }
}

So, if letter-boxing is enabled (and it is, by default), the aspect ratio of the video will always be kept, and the remaining areas of the display window will be black.
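To see the numbers involved, here is a small self-contained sketch of the same arithmetic. RectF, SizeF, and LetterBox are hypothetical stand-ins for D2D1_RECT_F, D2D1_SIZE_F, and ApplyLetterBoxing, used only so the example builds without the Direct2D headers.

```cpp
#include <cassert>
#include <cmath>

struct RectF { float left, top, right, bottom; };  // stand-in for D2D1_RECT_F
struct SizeF { float width, height; };             // stand-in for D2D1_SIZE_F

// Returns the largest rectangle inside 'target' that keeps the frame's
// aspect ratio, centered so the black bars are of equal size.
inline RectF LetterBox(RectF target, const SizeF& frame) {
    assert(frame.width > 0.0f && frame.height > 0.0f);
    const float aspect  = frame.width / frame.height;
    const float targetW = std::fabs(target.right - target.left);
    const float targetH = std::fabs(target.bottom - target.top);

    const float fitH = targetW / aspect;  // frame height at full target width
    if (fitH <= targetH) {
        // Frame is relatively wider than the window: bars on top and bottom.
        const float bar = (targetH - fitH) / 2;
        target.top    += bar;
        target.bottom -= bar;
    } else {
        // Frame is relatively taller than the window: bars on left and right.
        const float fitW = targetH * aspect;
        const float bar  = (targetW - fitW) / 2;
        target.left  += bar;
        target.right -= bar;
    }
    return target;
}
```

For example, a 1280x720 (16:9) frame shown in an 800x600 window is drawn about 450 pixels high, leaving black bars of roughly 75 pixels above and below. A 2:1 frame in a 1000x200 window is drawn 400 pixels wide, with 300-pixel bars on the left and right.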

Color space conversion

Most video decoders output video frames in one of the YUV420 pixel formats. These are planar formats with subsampled chrominance values. For more on that subject, visit the FOURCC web site. The filter in this code sample supports the most common DirectShow pixel formats, both planar and packed:

YV12 – 12 bits per pixel planar format with Y plane followed by V and U planes
I420(IYUV) – same as YV12 but V and U are swapped
NV12 – 12 bits per pixel planar format with Y plane and interleaved UV plane
YUY2 – 16 bits per pixel packed YUYV array
RGB555 – 16 bits per pixel with 1 bit unused and 5 bits for each RGB channel
RGB565 – 16 bits per pixel with 5 bits Red, 6 bits Green, and 5 bits Blue
RGB24 – 24 bits per pixel with 8 bits for each RGB channel
RGB32 – 32 bits per pixel with 8 bits for Alpha and 8 bits for each RGB channel
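The bits-per-pixel figures above translate directly into buffer sizes and plane offsets. The sketch below is illustrative only: Yv12Layout, ComputeYv12Layout, and FrameSizeBytes are hypothetical helper names, and the code assumes tightly packed frames with even dimensions, whereas real decoders and RGB bitmaps may pad each row to a larger stride.

```cpp
#include <cassert>
#include <cstddef>

// Byte offsets of the three planes in a tightly packed YV12 frame.
struct Yv12Layout {
    std::size_t yOffset;    // luma plane: width * height bytes
    std::size_t vOffset;    // V plane: (width / 2) * (height / 2) bytes
    std::size_t uOffset;    // U plane: same size as the V plane
    std::size_t totalSize;  // whole frame: width * height * 3 / 2 bytes
};

inline Yv12Layout ComputeYv12Layout(std::size_t width, std::size_t height) {
    // 4:2:0 subsampling shares one chroma sample between a 2x2 block of pixels.
    assert(width % 2 == 0 && height % 2 == 0);
    const std::size_t lumaSize   = width * height;
    const std::size_t chromaSize = (width / 2) * (height / 2);
    return { 0, lumaSize, lumaSize + chromaSize, lumaSize + 2 * chromaSize };
}

// Size in bytes of a tightly packed frame for a given bits-per-pixel value.
inline std::size_t FrameSizeBytes(std::size_t width, std::size_t height,
                                  std::size_t bitsPerPixel) {
    assert(bitsPerPixel > 0);
    return width * height * bitsPerPixel / 8;
}
```

For a 640x480 frame, the Y plane takes 307,200 bytes and each chroma plane 76,800 bytes, giving 460,800 bytes in total for YV12, compared with 1,228,800 bytes for the same frame in RGB32. For I420, the U and V offsets are simply swapped.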

All these formats, except the last one, need conversion before they can be rendered by Direct2D, since for now it does not support YUV images. Since this is an introduction-level article, I am using simple C methods for this purpose. Unfortunately, they are slow and CPU intensive, so in real-world applications you should consider more appropriate APIs and tools like IPP or swscale from the FFmpeg library.
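As an illustration of what such a conversion involves, here is a minimal sketch of a YV12 to RGB32 conversion. It is not the converter shipped with the article: ClampToByte, Bgra, YuvToBgra, and Yv12ToBgra are hypothetical names, the coefficients are the widely used integer approximation of the BT.601 studio-range formula, and the input is assumed to be tightly packed with even dimensions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

inline std::uint8_t ClampToByte(int v) {
    return static_cast<std::uint8_t>(v < 0 ? 0 : (v > 255 ? 255 : v));
}

struct Bgra { std::uint8_t b, g, r, a; };  // RGB32 byte order in memory

// Converts one studio-range BT.601 YUV sample to RGB (integer approximation).
inline Bgra YuvToBgra(int y, int u, int v) {
    const int c = y - 16, d = u - 128, e = v - 128;
    Bgra out;
    out.r = ClampToByte((298 * c + 409 * e + 128) / 256);
    out.g = ClampToByte((298 * c - 100 * d - 208 * e + 128) / 256);
    out.b = ClampToByte((298 * c + 516 * d + 128) / 256);
    out.a = 255;
    return out;
}

// Converts a tightly packed YV12 frame (Y plane, then V, then U)
// to a top-down buffer with 4 bytes per pixel.
inline std::vector<std::uint8_t> Yv12ToBgra(const std::uint8_t* src,
                                            std::size_t width,
                                            std::size_t height) {
    assert(src != nullptr && width % 2 == 0 && height % 2 == 0);
    const std::uint8_t* yPlane = src;
    const std::uint8_t* vPlane = yPlane + width * height;
    const std::uint8_t* uPlane = vPlane + (width / 2) * (height / 2);

    std::vector<std::uint8_t> dst(width * height * 4);
    for (std::size_t row = 0; row < height; ++row) {
        for (std::size_t col = 0; col < width; ++col) {
            // Each 2x2 block of pixels shares one U and one V sample.
            const std::size_t chroma = (row / 2) * (width / 2) + col / 2;
            const Bgra px = YuvToBgra(yPlane[row * width + col],
                                      uPlane[chroma], vPlane[chroma]);
            std::uint8_t* p = &dst[(row * width + col) * 4];
            p[0] = px.b; p[1] = px.g; p[2] = px.r; p[3] = px.a;
        }
    }
    return dst;
}
```

The per-pixel multiplications and clamping in the inner loop are what make a plain C implementation CPU intensive; optimized libraries typically process several pixels at once with SIMD instructions.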

Filter registration

Since each DirectShow filter is a COM object, it must be registered in the system Registry so that a CoCreateInstance call can look it up by its CLSID, locate the *.ax file at the registered path, load it into the process memory space, and create an instance of the filter class. Most of this registration code is the same for all filter types and is taken from the DirectShow SDK samples. Setup.cpp contains the registration code:

#include <olectl.h>
#include <initguid.h>
#include "D2DVideoRender.h"

#pragma warning(disable:4710) 

const AMOVIESETUP_MEDIATYPE sudOpPinTypes =
{
    &MEDIATYPE_Video,       // Major type
    &MEDIASUBTYPE_NULL      // Minor type
};

const AMOVIESETUP_PIN sudOpPin =
{
    L"Input",               // Pin string name
    TRUE,                   // Is it rendered
    FALSE,                  // Is it an output
    FALSE,                  // Can we have none
    FALSE,                  // Can we have many
    &CLSID_NULL,            // Connects to filter
    NULL,                   // Connects to pin
    1,                      // Number of types
    &sudOpPinTypes };       // Pin details

const AMOVIESETUP_FILTER sudBallax =
{
    &CLSID_Direct2DVideoRenderer,    // Filter CLSID
    FILTER_NAME,            // String name
    MERIT_DO_NOT_USE,       // Filter merit
    1,                      // Number pins
    &sudOpPin               // Pin details
};

CFactoryTemplate g_Templates[] = 
{
  { 
        FILTER_NAME,
    &CLSID_Direct2DVideoRenderer,
    CD2DVideoRender::CreateInstance,
    NULL,
    &sudBallax 
  }
};
int g_cTemplates = sizeof(g_Templates) / sizeof(g_Templates[0]);

STDAPI DllRegisterServer()
{
    return AMovieDllRegisterServer2(TRUE);
} 

STDAPI DllUnregisterServer()
{
    return AMovieDllRegisterServer2(FALSE);
}

Filter debug

After you have successfully registered your filter, you can debug it with GraphEdit which is located at C:\Program Files\Microsoft SDKs\Windows\v7.1\Bin\GraphEdt.exe. You can launch it from Visual Studio by adding this path to the project properties Debugging section:

This way when your DLL project is a startup project and you hit F5, GraphEdit will be launched and you can add the filter and debug it if necessary.

Using the code

First, you have to register the filter by calling the regsvr32 utility with the full path to the .ax file. After that, you can try using the filter in GraphEdit. When used in GraphEdit, the video window handle is not set, so the video is presented in a default window. That window has the same dimensions as the video frame and is created in the center of the screen.

To use a filter in your own code, do the following:

Include IVideoRenderer.h in your project.
Create a filter graph manager.
Create a Direct2D Video Renderer filter and add it to the graph.
Add some source file or other source filter and render it – it will use the renderer that is already present in the graph.
Set a video window handle.
Run the graph.

Note: Error handling is skipped for clarity.

CComPtr<IGraphBuilder> m_graph;
CComPtr<IFilterGraph2> m_filterGraph2;
CComPtr<IMediaControl> m_mediaCtrl;
CComPtr<IBaseFilter>   m_renderFilter;
CComPtr<IVideoRenderer> m_render;
CComPtr<IQualProp> m_quality;

CoInitialize(NULL);

m_graph.CoCreateInstance(CLSID_FilterGraph);
m_graph->QueryInterface(IID_IFilterGraph2, (void**)&m_filterGraph2);
m_graph->QueryInterface(IID_IMediaControl, (void**)&m_mediaCtrl);

m_renderFilter.CoCreateInstance(CLSID_Direct2DVideoRenderer);

m_renderFilter->QueryInterface(IID_IVideoRenderer, (void**)&m_render);
m_renderFilter->QueryInterface(IID_IQualProp, (void**)&m_quality);
m_render->SetVideoWindow(m_hWnd);

m_filterGraph2->AddFilter(m_renderFilter, FILTER_NAME);
m_filterGraph2->RenderFile(fileName, NULL);             
m_mediaCtrl->Run();

Hope you will find this article useful. Any suggestions and comments are welcome.