DirectShow Filters Development Part 3: Transform Filters

Roman Ginzburg

4.83/5 (20 votes)

Feb 22, 2011

CPOL

6 min read

136881

4580

A text overlay filter and a JPEG/JPEG2000 encoder using transform filters.

Download source code - 820 KB

Introduction

Transform filters are probably the most interesting pieces of the DirectShow puzzle. They encapsulate complex image and video processing algorithms. From a filter development point of view, they are not harder to implement than others; however, they do require some additional coding and method overrides. As with rendering and source filters, transform filters also have base classes from which you should inherit when implementing your custom work.

Transform filters have at least two pins, one input pin and one output pin. Transform filters are divided into two categories - copy-transform filters and in-place transform filters. As their name implies, a copy-transform filter takes the data from the input pin, transforms it, and writes the outcome to the output pin, whereas an in-place filter performs its work on the input sample and passes it on to the output filter.

DirectShow provides three base classes for writing transform filters:

CTransformFilter - base class for copy-transform filters
CTransInPlaceFilter - base class for in-place transforms
CVideoTransfromFilter - designed for video decoding and has built-in quality control management for dropping frames in case of flooding

I will cover the first two classes in this article: the CTransInPlace descendent will be used for a text overlay filter, and CTransformFilter will be used for a JPEG/JPEG2000 encoder.

Before we continue, you should take a look at part 1 of this series as the filter development prerequisites, filter registration, and filter debugging are all the same.

Text Overlay Filter

A text overlay filter adds some user defined text to each and every frame that goes through the filter. It can be used for displaying subtitles or a logo. Adding text to the video frame does not change its media subtype or format, therefore an in-place transform suits perfectly. I will be using GDI+ for overlays, as it provides a convenient API for creating in-place bitmaps and drawing characters on a bitmap.

using namespace Gdiplus;
using namespace std;

class CTextOverlay : public CTransInPlaceFilter, public ITextAdditor
{
public:
        DECLARE_IUNKNOWN;

        CTextOverlay(LPUNKNOWN pUnk, HRESULT *phr);
        virtual ~CTextOverlay(void);

        virtual HRESULT CheckInputType(const CMediaType* mtIn);
        virtual HRESULT SetMediaType(PIN_DIRECTION direction, const CMediaType *pmt);
        virtual HRESULT Transform(IMediaSample *pSample);

        static CUnknown *WINAPI CreateInstance(LPUNKNOWN pUnk, HRESULT *phr); 
        STDMETHODIMP NonDelegatingQueryInterface(REFIID riid, void ** ppv);

        STDMETHODIMP AddTextOverlay(WCHAR* text, DWORD id, RECT position, 
                     COLORREF color = RGB(255, 255, 255), float fontSize = 20);
        STDMETHODIMP Clear(void);
        STDMETHODIMP Remove(DWORD id);

private:
        ULONG_PTR m_gdiplusToken;
        VIDEOINFOHEADER m_videoInfo;
        PixelFormat m_pixFmt;
        int m_stride;
        map<DWORD, Overlay*> m_overlays;
};

The only pure virtual method is the Transform method and it must be implemented in your class. In addition, I have also overridden the CheckInputType called for each media type during the pin connection negotiation. Since a transform filter has two pins at least, SetMediaType has the direction argument which indicates whether the connection is performed on the input or the output pin. You may want to save both the input and output video headers. In this case, I only need the input video header since it is exactly the same as the output:

HRESULT CTextOverlay::SetMediaType(PIN_DIRECTION direction, const CMediaType *pmt)
{
   if(direction == PINDIR_INPUT)
   {
      VIDEOINFOHEADER* pvih = (VIDEOINFOHEADER*)pmt->pbFormat;
      m_videoInfo = *pvih;
      HRESULT hr = GetPixleFormat(m_videoInfo.bmiHeader.biBitCount, &m_pixFmt);
      if(FAILED(hr))
      {
             return hr;
      }

      BITMAPINFOHEADER bih = m_videoInfo.bmiHeader;
      m_stride = bih.biBitCount / 8 * bih.biWidth;
   }

   return S_OK;
}

The filter accepts RGB only formats with 15, 16, 24, and 32 bits per pixel, and using the GDI+ Bitmap class, it is possible to create in-place bitmap objects without any buffer copy. After that, I create a graphics object from that bitmap and call the Graphics::DrawString method to draw the user defined text on the bitmap:

HRESULT CTextOverlay::Transform(IMediaSample *pSample)
{
       CAutoLock lock(m_pLock);

       BYTE* pBuffer = NULL;
       Status s = Ok;
       map<DWORD, Overlay*>::iterator it;

       HRESULT hr = pSample->GetPointer(&pBuffer);
       if(FAILED(hr))
       {
           return hr;
       }

       BITMAPINFOHEADER bih = m_videoInfo.bmiHeader;
       Bitmap bmp(bih.biWidth, bih.biHeight, m_stride, m_pixFmt, pBuffer);
       Graphics g(&bmp);
       
       for ( it = m_overlays.begin() ; it != m_overlays.end(); it++ )
       {
          Overlay* over = (*it).second;
          SolidBrush brush(over->color);
          Font font(FontFamily::GenericSerif(), over->fontSize);
          s = g.DrawString(over->text, -1, &font, over->pos, 
                           StringFormat::GenericDefault(), &brush);
          if(s != Ok)
          {
             TCHAR msg[100];
             wsprintf(L"Failed to draw text : %s", over->text);
             ::OutputDebugString(msg);
          }
       }

    return S_OK;
}

Using the ITextAditor interface, you can add a text overlay with ID, remove them by ID, or remove all. Each overlay contains the text, the bounding rectangle, color, and font size:

DECLARE_INTERFACE_(ITextAdditor, IUnknown)
{
   STDMETHOD(AddTextOverlay)(WCHAR* text, DWORD id, RECT position, 
             COLORREF color, float fontSize) PURE;
   STDMETHOD(Clear)(void) PURE;
   STDMETHOD(Remove)(DWORD id) PURE;
};

Overlay objects are stored in a map in a thread safe manner so you can freely add and remove overlays during playback. Thread-safety in the DirectShow framework is achieved using Critical Sections and the CAutoLock class which is usually declared in the beginning of the method, and when going out of scope at the end of the method - the Critical Section is released.

JPEG / JPEG2000 Encoder

It took me a while to decide what type of video encoding to implement, and eventually I decided to make a simple intra-frame encoder - each video frame is encoded with no reference to the previous or next frame. This type of encoding is easier to implement than inter-frame encoding standards like MPEG4 or H264, but suffers from larger stream throughput since there is much redundant pixel information between neighbor frames. I also created a base class for other intra-frame encoder types, and you can easily swap the implementation by inheriting from CBaseCompressor and updating the Factory method which creates the concrete implementations:

struct CBaseCompressor
{
     virtual HRESULT Init(BITMAPINFOHEADER* pBih) PURE;
 
     virtual HRESULT Compress(BYTE* pInput, DWORD inputSize, BYTE* pOutput, 
     DWORD* outputSize) PURE;
 
     virtual HRESULT SetQuality(BYTE quality) PURE;
 
     virtual HRESULT GetMediaSubTypeAndCompression(GUID* mediaSubType,
     DWORD* compression) PURE;
};

By default, the encoding standard is JPEG, and it is based on a code I found here on CodeProject. Using the IJ2KEncoder::SetEncoderType method, you can change the implementation to the JPEG2000 encoding standard which is based on the OpenJpeg library. Please note that if one of the filter's pins is connected, you cannot change the encoder implementation, so it is best to set the desired encoding algorithm right after filter creation.

JPEG2000 and Media Sub Types

When using a JPEG compressor, DirectShow provides a built-in media sub type called MEDIASUBTYPE_MJPG, and it is declared in the uuids.h file. Regarding JPEG2000, I could not find any appropriate GUID, so I created one using the following macro definition:

DEFINE_GUID( MEDIASUBTYPE_MJ2C, MAKEFOURCC('M', 'J', '2', 'C'),
             0x0000, 0x0010, 0x80, 0x00, 0x00, 0xaa, 
             0x00, 0x38, 0x9b, 0x71);

When using the BITMAPINFOHEADER structure for compressed images, you have to set the biCompression field to MAKEFOURCC('M', 'J', '2', 'C'). This way, the filter can connect to JPEG2000 decoders, like this one.

MJ2C means a JPEG2000 code stream, and it is actually a motion JPEG2000 definition where each frame consists of compressed image data. Another standard is J2K, and it is usually used for still image encoding and also contains headers.

Although JPEG2000 provides better compression ratios and better image quality, especially at lower bit rates, it is more CPU intensive than JPEG and hence less suitable for large resolution videos. During a research I made on JPEG2000 implementations, I found a project called CUJ2K - a JPEG2000 implementation based on CUDA - a GPU based API developed by NVIDIA. Since this library was designed for still images located on the hard drive, it uses the command line to pass the source and destination paths for the images. To make use of it for in-memory buffers, it required some additional work, so I decided to go with OpenJpeg; however, it is worth looking at if you need better performance.

Filter Implementation

To implement a transform filter, you have to implement six methods:

Transform - receives input and output media samples.
CheckInputType - checks whether the input pin can connect to an upstream filter.
CheckTransform - checks whether a transformation is possible between input and output media types.
DecideBufferSixe - sets the memory buffer size for the output media samples.
GetMediaType - returns the media type used to connect the output pin with the downstream filter.
SetMediaType - called when the input and output pins are successfully connected.

class CJ2kCompressor : public CTransformFilter, public IJ2KEncoder
{
public:
       DECLARE_IUNKNOWN;

       CJ2kCompressor(LPUNKNOWN pUnk, HRESULT *phr);
       virtual ~CJ2kCompressor(void);

       // CTransfromFilter overrides
       virtual HRESULT Transform(IMediaSample * pIn, IMediaSample *pOut);
       virtual HRESULT CheckInputType(const CMediaType* mtIn);
       virtual HRESULT CheckTransform(const CMediaType* mtIn, 
                       const CMediaType* mtOut);
       virtual HRESULT DecideBufferSize(IMemAllocator * pAlloc, 
                       ALLOCATOR_PROPERTIES *pProp);
       virtual HRESULT GetMediaType(int iPosition, CMediaType *pMediaType);
       virtual HRESULT SetMediaType(PIN_DIRECTION direction, const CMediaType *pmt);

       static CUnknown * WINAPI CreateInstance(LPUNKNOWN pUnk, HRESULT *pHr);
       STDMETHODIMP NonDelegatingQueryInterface(REFIID riid, void ** ppv);

       // IJ2KEncoder
       STDMETHODIMP SetQuality(BYTE quality);
       STDMETHODIMP SetEncoderType(EncoderType encoderType);

private:
       VIDEOINFOHEADER m_VihIn;   
       VIDEOINFOHEADER m_VihOut; 
       CBaseCompressor* m_encoder;
};

The Transform method implementation is pretty straightforward: I get the buffer pointers from the input and output media samples and then pass them to the CBaseCompressor implementation. After that, I set the actual output media sample size and set the sync point to true since every frame is a reference frame:

HRESULT CJ2kCompressor::Transform(IMediaSample* pIn, IMediaSample* pOut)
{
     HRESULT hr = S_OK;

     BYTE *pBufIn, *pBufOut;
     long sizeIn;
     DWORD sizeOut;

     hr = pIn->GetPointer(&pBufIn);
     if(FAILED(hr))
     {
        return hr;
     }

     sizeIn = pIn->GetActualDataLength();

     hr = pOut->GetPointer(&pBufOut);
     if(FAILED(hr))
     {
        return hr;
     }
 
     hr = m_encoder->Compress(pBufIn, sizeIn, pBufOut, &sizeOut);

     if(FAILED(hr))
     {
        return hr;
     }

     hr = pOut->SetActualDataLength(sizeOut);

     if(FAILED(hr))
     {
        return hr;
     }

     hr = pOut->SetSyncPoint(TRUE);

     return hr;
}

Filter Registration

Since this filter is a video encoder, it should be registered in the video compressor filters category, and this is done using the IFilterMapper object:

STDAPI RegisterFilters( BOOL bRegister )
{
   HRESULT hr = NOERROR;
   WCHAR achFileName[MAX_PATH];
   char achTemp[MAX_PATH];
   ASSERT(g_hInst != 0);

   if( 0 == GetModuleFileNameA(g_hInst, achTemp, sizeof(achTemp))) 
   {
          return AmHresultFromWin32(GetLastError());
   }

   MultiByteToWideChar(CP_ACP, 0L, achTemp, lstrlenA(achTemp) + 1, 
                       achFileName, NUMELMS(achFileName));

   hr = CoInitialize(0);
   if(bRegister)
   {
          hr = AMovieSetupRegisterServer(CLSID_Jpeg2000Encoder, 
                  J2K_FILTER_NAME, achFileName, L"Both", L"InprocServer32");
   }

   if( SUCCEEDED(hr) )
   {
      IFilterMapper2 *fm = 0;
      hr = CoCreateInstance( CLSID_FilterMapper2, NULL, 
              CLSCTX_INPROC_SERVER, IID_IFilterMapper2, (void **)&fm);

      if( SUCCEEDED(hr) )
      {
         if(bRegister)
         {
           IMoniker *pMoniker = 0;
           REGFILTER2 rf2;
           rf2.dwVersion = 1;
           rf2.dwMerit = MERIT_DO_NOT_USE;
           rf2.cPins = 2;
            rf2.rgPins = psudPins;
           hr = fm->RegisterFilter(CLSID_Jpeg2000Encoder, J2K_FILTER_NAME, 
                &pMoniker, &CLSID_VideoCompressorCategory, NULL, &rf2);
         }
         else
         {
            hr = fm->UnregisterFilter(&CLSID_VideoCompressorCategory, 0, 
                                      CLSID_Jpeg2000Encoder);
         }
      }

      if(fm)
         fm->Release();
   }

   if( SUCCEEDED(hr) && !bRegister )
   {
          hr = AMovieSetupUnregisterServer( CLSID_Jpeg2000Encoder );
   }

   CoFreeUnusedLibraries();
   CoUninitialize();
   return hr;
}

STDAPI DllRegisterServer() 
{
       return RegisterFilters(TRUE);
}

STDAPI DllUnregisterServer()
{
       return RegisterFilters(FALSE);
}

References

History

23.2.2011

Initial version

13.3.2011

Changed source to use smart pointers
Fixed SetQuality implementation