
Real-time video image processing / frame grabber using a minimalistic approach

By Ladislav Nevery, 6 Apr 2010
 

Introduction

All my life, I have been obsessed with anything to do with robots and computer vision. So lately, I have been looking for some simple code to read images from movies in any format and from a webcam. Here, I want to share my experiences in this API jungle with you. Also, please consider voting using the bar below so I get an idea of whether this information was useful to anyone.

Background

As hard as I looked, all I found were samples for the VFW interface, which is a practically dead API with a lot of issues. First of all, it only handles AVI files, and most codecs (like DivX) just don't work with it anyway (those who have tried it know). It also crashes at some resolutions on most webcams. There are a few samples for the newer DirectShow API, but all the important code is buried in a zillion unimportant wrappers, support classes, and files. Not to mention that the informational value of the samples themselves is lost in an often redundant error-handling jungle. And finally, they are based on using/compiling additional MS DirectShow filter samples, like SampleGrabber or its bug-fixed version, GrabberSample, which makes the whole approach non-portable between SDK versions. Those samples also seem to have disappeared from the latest SDK and from the Web itself. So here is what I think is a much simpler and more versatile alternative.


New method not requiring DirectShow SDK

I added this new method to the article because a lot of people had problems compiling the DirectShow SDK related stuff. Most of them just need to get the RGB data and forget the whole DirectShow thing, so I got an idea: if we know how data flows inside a DirectShow graph, why don't we just redirect the particular method to grab the data?
In DirectShow, data can flow in one of two ways:

Push method:    FILTER1->out pin pushes data to in<-FILTER2 by calling Receive on its IMemInputPin interface (our renderer case).
Pull method:    The second filter, in<-FILTER2, asks the first source, FILTER1->out, for more data by calling the SyncReadAligned method of its IAsyncReader interface.

In our case (the last filter is a renderer), we capture the data of the Push method. You can find a sample capturing Pull mode in my other article, Fun with Google TTS.
I tested the following code in Visual Studio 2008 and 2010, and it compiled without the DirectShow SDK just fine. Also remember that you will probably need to convert the received data to RGB if you wish, but it's trivial.
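
As an illustration of that conversion step, here is a minimal sketch that assumes the webcam delivers packed YUY2 frames (a common format, but not something the article states) with rows stored without padding. The function name yuy2_to_bgr24 and the helper clamp8 are made up for this example; the integer coefficients are the usual BT.601 video-range approximation. The output is in B, G, R byte order, as used by Windows RGB24 bitmaps.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Clamp an intermediate value into the 0..255 byte range.
static std::uint8_t clamp8(int v) {
    return static_cast<std::uint8_t>(v < 0 ? 0 : (v > 255 ? 255 : v));
}

// Convert one packed YUY2 frame (bytes Y0 U Y1 V per pixel pair) to 24-bit BGR.
// Assumptions (not from the article): BT.601 video-range input, even width,
// rows stored back to back without padding.
std::vector<std::uint8_t> yuy2_to_bgr24(const std::uint8_t* src, int width, int height) {
    assert(src != nullptr && width > 0 && height > 0 && width % 2 == 0);
    std::vector<std::uint8_t> dst(static_cast<std::size_t>(width) * height * 3);
    std::uint8_t* out = dst.data();
    const std::size_t pairs = static_cast<std::size_t>(width) * height / 2;
    for (std::size_t i = 0; i < pairs; ++i, src += 4) {
        const int u = src[1] - 128;                    // chroma shared by both pixels
        const int v = src[3] - 128;
        for (int k = 0; k < 2; ++k) {
            const int c = 298 * (src[k * 2] - 16);     // scaled luma of pixel k
            *out++ = clamp8((c + 516 * u + 128) >> 8);            // blue
            *out++ = clamp8((c - 100 * u - 208 * v + 128) >> 8);  // green
            *out++ = clamp8((c + 409 * v + 128) >> 8);            // red
        }
    }
    return dst;
}
```

Inside Receive you would pass buf to such a function, together with the width and height of the connected media type. If the graph already delivers RGB24, no conversion is needed.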

The code

#include <windows.h>
#include <dshow.h>

#pragma comment(lib,"Strmiids.lib")

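// Replace entry b in the vtable of COM object a with function c; the original entry is saved in c##_.
// Uses the DWORD variable 'no' from the calling scope. sizeof(INT_PTR) keeps it correct in 64-bit builds too.
#define DsHook(a,b,c) if (!c##_) { INT_PTR* p=b+*(INT_PTR**)a;   VirtualProtect(&c##_,sizeof(INT_PTR),PAGE_EXECUTE_READWRITE,&no);\
                                          *(INT_PTR*)&c##_=*p;   VirtualProtect(p,    sizeof(INT_PTR),PAGE_EXECUTE_READWRITE,&no);   *p=(INT_PTR)c; }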


// Here you get the video image data in buf / len. Process it before calling Receive_, because the renderer deallocates it.
HRESULT ( __stdcall * Receive_ ) ( void* inst, IMediaSample *smp ) ; 
HRESULT   __stdcall   Receive    ( void* inst, IMediaSample *smp ) {     
    BYTE*     buf;    smp->GetPointer(&buf); DWORD len = smp->GetActualDataLength();
    HRESULT   ret  =  Receive_   ( inst, smp );   
    return    ret; 
}

int WINAPI WinMain(HINSTANCE inst,HINSTANCE prev,LPSTR cmd,int show){
    HRESULT hr = CoInitialize(0); MSG msg={0}; DWORD no;

    IGraphBuilder*  graph= 0;  hr = CoCreateInstance( CLSID_FilterGraph, 0, CLSCTX_INPROC,IID_IGraphBuilder, (void **)&graph );
    IMediaControl*  ctrl = 0;  hr = graph->QueryInterface( IID_IMediaControl, (void **)&ctrl );

    ICreateDevEnum* devs = 0;  hr = CoCreateInstance (CLSID_SystemDeviceEnum, 0, CLSCTX_INPROC, IID_ICreateDevEnum, (void **) &devs);
    IEnumMoniker*   cams = 0;  hr = devs?devs->CreateClassEnumerator (CLSID_VideoInputDeviceCategory, &cams, 0):0;  
    IMoniker*       mon  = 0;  hr = cams->Next (1,&mon,0);  // get first found capture device (webcam?)    
    IBaseFilter*    cam  = 0;  hr = mon->BindToObject(0,0,IID_IBaseFilter, (void**)&cam);
                               hr = graph->AddFilter(cam, L"Capture Source"); // add web cam to graph as source
    IEnumPins*      pins = 0;  hr = cam?cam->EnumPins(&pins):0;   // we need output pin to autogenerate rest of the graph
    IPin*           pin  = 0;  hr = pins?pins->Next(1,&pin, 0):0; // via graph->Render
                               hr = graph->Render(pin); // graph builder now builds whole filter chain including MJPG decompression on some webcams
    IEnumFilters*   fil  = 0;  hr = graph->EnumFilters(&fil); // from all newly added filters
    IBaseFilter*    rnd  = 0;  hr = fil->Next(1,&rnd,0); // we find last one (renderer)
                               hr = rnd->EnumPins(&pins);  // because the data we are interested in is pumped to the renderer's input pin 
                               hr = pins->Next(1,&pin, 0); // via Receive member of IMemInputPin interface
    IMemInputPin*   mem  = 0;  hr = pin->QueryInterface(IID_IMemInputPin,(void**)&mem);

    DsHook(mem,6,Receive); // so we redirect it to our own proc to grab image data

    hr = ctrl->Run();   

    while ( GetMessage(   &msg, 0, 0, 0 ) ) {  
        TranslateMessage( &msg );   
        DispatchMessage(  &msg ); 
    }
    return (int)msg.wParam;
}
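
The 6 passed to DsHook is the index of Receive in the IMemInputPin method table: the three IUnknown methods (QueryInterface, AddRef, Release) occupy slots 0 to 2, followed by GetAllocator, NotifyAllocator, GetAllocatorRequirements, and then Receive. The following sketch shows the same save-and-overwrite idea in portable C++ with a hand-made method table. FakePin, hook_slot, and demo_hook are hypothetical names for illustration only; nothing here talks to DirectShow.

```cpp
#include <cassert>
#include <cstddef>

// Slot numbers in IMemInputPin vtable order; Receive ends up at index 6.
enum Slot { QueryInterface, AddRef, Release, GetAllocator, NotifyAllocator,
            GetAllocatorRequirements, Receive, SlotCount };

struct FakePin;                               // stand-in for a COM object (hypothetical)
using Method = int (*)(FakePin* self, int sample);

struct FakePin {
    Method* vtbl;                             // like a COM object, it starts with a table pointer
    int delivered;                            // last sample the "renderer" received
};

static int original_receive(FakePin* self, int sample) { self->delivered = sample; return 0; }
static int unused(FakePin*, int) { return 0; }

static Method g_saved_receive = nullptr;      // plays the role of Receive_
static int g_seen = 0;                        // what our hook observed

static int hooked_receive(FakePin* self, int sample) {
    g_seen = sample;                          // inspect the data first...
    return g_saved_receive(self, sample);     // ...then forward to the original method
}

// Same idea as DsHook: remember the old slot content once, then overwrite it.
void hook_slot(FakePin* obj, std::size_t slot, Method replacement, Method* saved) {
    assert(obj != nullptr && slot < SlotCount);
    if (*saved == nullptr) {
        *saved = obj->vtbl[slot];
        obj->vtbl[slot] = replacement;
    }
}

// Build a pin, hook slot 6, and push one sample through the table.
int demo_hook() {
    Method table[SlotCount] = { unused, unused, unused, unused, unused, unused, original_receive };
    FakePin pin{ table, 0 };
    hook_slot(&pin, Receive, hooked_receive, &g_saved_receive);
    pin.vtbl[Receive](&pin, 42);
    return (g_seen == 42 && pin.delivered == 42) ? 1 : 0;
}
```

In the real macro the table lives in memory that is normally write-protected, which is why DsHook calls VirtualProtect before writing the slot.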

Changing WebCam resolution

Optionally, you can add the following code to set the resolution right after you get the capture device's output pin. [Thanks to LightWing in the forum below for finding it ;) ]

   ...
    IPin*            pin  = 0;  hr = pins?pins->Next(1,&pin, 0):0; // via graph->Render

    IAMStreamConfig* cfg  = 0;  hr = pin->QueryInterface( IID_IAMStreamConfig, (void **)&cfg);  // (these are optional steps to set a better resolution)
    int       sz, max_res = 0;  hr = cfg->GetNumberOfCapabilities(&max_res, &sz); VIDEO_STREAM_CONFIG_CAPS cap[2]; // find the last =
    AM_MEDIA_TYPE*   fmt  = 0;  hr = cfg->GetStreamCaps(max_res-1, &fmt, (BYTE*)cap);  // max supported resolution (cap contains res x and y sizes)
                                hr = cfg->SetFormat(fmt); // and set it on the device before capture starts
   ...
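
The snippet above takes the last capability as the largest one, and nothing in it checks that the device really lists its modes in ascending order. A more defensive variant would loop over all indices, read the size of each (for example from the MaxOutputSize member of VIDEO_STREAM_CONFIG_CAPS), and keep the index with the largest area. Here is a minimal sketch of just that selection step in portable C++; the Mode struct and the largest_mode function are made-up names, and filling the list from GetStreamCaps is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Width and height of one capability as reported by the device (hypothetical holder).
struct Mode { int width; int height; };

// Return the index of the mode with the largest pixel area, or -1 for an empty list.
// On a tie the first such mode wins.
int largest_mode(const std::vector<Mode>& modes) {
    int best = -1;
    long long best_area = -1;
    for (std::size_t i = 0; i < modes.size(); ++i) {
        assert(modes[i].width >= 0 && modes[i].height >= 0);
        const long long area = 1LL * modes[i].width * modes[i].height;
        if (area > best_area) {
            best_area = area;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

The returned index would then replace max_res-1 in the GetStreamCaps call whose format is handed to SetFormat.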


Previous method requiring DirectShow SDK:

Starting with DirectShow

Most users who try to compile a DirectShow sample from the web for the first time will spend 90% of their time searching for the SDK itself (it has been moved from the DirectX SDK to the Platform SDK, so be sure to get the latest) and then spend the rest of the day solving the undefined variables (add YOUR_PLATFORMSDK_DIR\Samples\Multimedia\DirectShow\BaseClasses to the compiler include paths) and unresolved externals (add strmiids.lib, winmm.lib, and strmbasd.lib (strmbase.lib for the release build) to the linker input). The latter two libs are not redistributed by Microsoft and must usually be built manually: just run vcvars32 and nmake (nmake nodebug=1 for the release lib) in the include directory mentioned above, and copy the libs there from the subdirectories created by nmake.

Using the code

So here is the sample. Just create an empty Windows C++ application in whatever development environment you have (disable UNICODE in VC++ or you will get linking errors), and copy/paste it there. I stripped it to the bare minimum needed to run, so please forgive the compressed formatting of the unimportant code parts. And as usual, you can find information on any API used simply by searching for its name in Google and using the first page found (usually MSDN).

#include <windows.h>
#include <dshow.h>
#include <streams.h>

int w,h; HWND hwnd; MSG msg = {0}; BITMAPINFOHEADER bmih={0}; 

struct __declspec(  uuid("{71771540-2017-11cf-ae26-0020afd79767}")  ) CLSID_Sampler;

struct Sampler : public CBaseVideoRenderer {
    Sampler( IUnknown* unk, HRESULT *hr ) : CBaseVideoRenderer(__uuidof(CLSID_Sampler), NAME("Frame Sampler"), unk, hr) {};
    HRESULT CheckMediaType(const CMediaType *media ) {    
        VIDEOINFO* vi; if(!IsEqualGUID( *media->Subtype(), MEDIASUBTYPE_RGB24) || !(vi=(VIDEOINFO *)media->Format()) ) return E_FAIL;
        bmih=vi->bmiHeader;	SetWindowPos(hwnd,0,0,0,20+(w=vi->bmiHeader.biWidth),60+(h=vi->bmiHeader.biHeight),SWP_NOZORDER|SWP_NOMOVE);
        return  S_OK;
    }
    HRESULT DoRenderSample(IMediaSample *sample){
        BYTE* data; sample->GetPointer( &data ); 
        // Process RGB Frame data* here. For Example: ZeroMemory(data+w*h,w*h);
        BITMAPINFO bmi={0}; bmi.bmiHeader=bmih; RECT r; GetClientRect( hwnd, &r );
        HDC dc=GetDC(hwnd);
        StretchDIBits(dc,0,16,r.right,r.bottom-16,0,0,w,h,data,&bmi,DIB_RGB_COLORS,SRCCOPY);
        ReleaseDC(hwnd,dc);
        return  S_OK;
    }
    HRESULT ShouldDrawSampleNow(IMediaSample *sample, REFERENCE_TIME *start, REFERENCE_TIME *stop) {
        return S_OK; // disable dropping of frames
    }
};

int WINAPI WinMain(HINSTANCE inst,HINSTANCE prev,LPSTR cmd,int show){
    HRESULT hr = CoInitialize(0);
    hwnd=CreateWindowEx(0,"LISTBOX",0,WS_SIZEBOX,0,0,0,0,0,0,0,0); 
           
    IGraphBuilder*  graph= 0; hr = CoCreateInstance( CLSID_FilterGraph, 0, CLSCTX_INPROC,IID_IGraphBuilder, (void **)&graph );
    IMediaControl*  ctrl = 0; hr = graph->QueryInterface( IID_IMediaControl, (void **)&ctrl );

    Sampler*        sampler      = new Sampler(0,&hr); 
    IPin*           rnd  = 0; hr = sampler->FindPin(L"In", &rnd);
                              hr = graph->AddFilter((IBaseFilter*)sampler, L"Sampler");

    ICreateDevEnum* devs = 0; hr = CoCreateInstance (CLSID_SystemDeviceEnum, 0, CLSCTX_INPROC, IID_ICreateDevEnum, (void **) &devs);
    IEnumMoniker*   cams = 0; hr = devs?devs->CreateClassEnumerator (CLSID_VideoInputDeviceCategory, &cams, 0):0;  
    IMoniker*       mon  = 0; hr = cams?cams->Next (1, &mon, 0):0;
    IBaseFilter*    cam  = 0; hr = mon?mon->BindToObject(0,0,IID_IBaseFilter, (void**)&cam):0;
    IEnumPins*      pins = 0; hr = cam?cam->EnumPins(&pins):0; 
    IPin*           cap  = 0; hr = pins?pins->Next(1,&cap, 0):0;
                              hr = graph->AddFilter(cam, L"Capture Source"); 

    IBaseFilter*    vid  = 0; hr = graph->AddSourceFilter (L"c:\\Windows\\clock.avi", L"File Source", &vid);
    IPin*           avi  = 0; hr = vid?vid->FindPin(L"Output", &avi):0;

    hr = graph->Connect(cap?cap:avi,rnd);
    hr = graph->Render( cap?cap:avi );
    hr = ctrl->Run();
    SendMessage(hwnd, LB_ADDSTRING, 0, (LPARAM)(cap?"Capture source ...":"File source ..."));
    ShowWindow(hwnd,SW_SHOW);

    while ( msg.message != WM_QUIT) {  
        if( PeekMessage(      &msg, 0, 0, 0, PM_REMOVE ) ) {
            TranslateMessage( &msg );   
            DispatchMessage(  &msg ); 
            if(msg.message == WM_KEYDOWN && msg.wParam==VK_ESCAPE ) break;
        }   Sleep(30);
    }
    return 0;
}

Points of interest

The most important lines are in DoRenderSample, since that is where you get your image in RGB format for every frame. The sample itself tries to find the first pin on the first video capture device found and connect to it. If none is found, it tries to open and run a video file from the provided path. So in fact, these are two tutorials in one. A cool side feature is that you can use Alt+PrintScreen to grab a screenshot from (any?) movie or webcam and paste it into your favorite image editor.
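
As an example of the kind of processing that could go at the "Process RGB Frame data" comment, here is a minimal sketch that turns an RGB24 frame to grayscale in place. It assumes the usual 24-bit DIB layout: bytes in B, G, R order, each row padded to a multiple of 4 bytes, and a positive height. The names dib_stride24 and to_gray are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes per row of a 24-bit DIB: rows are padded to a multiple of 4 bytes.
std::size_t dib_stride24(int width) {
    return ((static_cast<std::size_t>(width) * 24 + 31) & ~static_cast<std::size_t>(31)) >> 3;
}

// Convert a 24-bit BGR frame to grayscale in place; padding bytes are left untouched.
void to_gray(std::uint8_t* data, int width, int height) {
    assert(data != nullptr && width > 0 && height > 0);
    const std::size_t stride = dib_stride24(width);
    for (int y = 0; y < height; ++y) {
        std::uint8_t* px = data + static_cast<std::size_t>(y) * stride;
        for (int x = 0; x < width; ++x, px += 3) {
            // Integer approximation of 0.114*B + 0.587*G + 0.299*R (weights sum to 256).
            const int gray = (29 * px[0] + 150 * px[1] + 77 * px[2]) >> 8;
            px[0] = px[1] = px[2] = static_cast<std::uint8_t>(gray);
        }
    }
}
```

In DoRenderSample this would be called as to_gray(data, w, h) before the StretchDIBits call, so the window shows the processed frame.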

Place for enhancements 

Consider using some form of smart pointers like CComPtr<IGraphBuilder> instead of raw interface pointers like IGraphBuilder*. I removed them to keep the sample buildable by people who don't have the ATL libs. The following example code gives more control over the video playback (auto-rewind in this case).

...
IMediaControl*  ctrl  = 0;  hr = graph->QueryInterface( IID_IMediaControl, (void **)&ctrl );
IMediaSeeking*  seek  = 0;  hr = graph->QueryInterface( IID_IMediaSeeking, (void **)&seek );     
IMediaEvent*    event = 0;  hr = graph->QueryInterface( IID_IMediaEvent,   (void **)&event );
...
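    while ( msg.message != WM_QUIT) {  
        long code = 0; LONG_PTR a = 0, b = 0;
        if( SUCCEEDED(event->GetEvent(&code, &a, &b, 0)) ) {   // 0 ms timeout: just poll the event queue
            event->FreeEventParams(code, a, b);               // release resources tied to the event
            if( code == EC_COMPLETE ) {                        // playback reached the end: rewind and restart
                  LONGLONG pos=0;
                  hr=seek->SetPositions(&pos,AM_SEEKING_AbsolutePositioning,NULL,AM_SEEKING_NoPositioning);
                  hr=ctrl->Run();
            }
        }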
...

You may add this code to inspect (connect to a remote graph) the whole render graph, graphically, in graphedit (to debug the capture input pins, for example):

WCHAR wsz[256]; (void)StringCchPrintfW(wsz, NUMELMS(wsz),L"FilterGraph %08x  pid %08x\0", (DWORD) 0, GetCurrentProcessId()); // requires #include <strsafe.h>
...
IGraphBuilder*  graph      = 0; hr = CoCreateInstance( CLSID_FilterGraph, 0, CLSCTX_INPROC,IID_IGraphBuilder, (void **)&graph );
IMoniker*       moniker    = 0; hr = CreateItemMoniker(L"!", wsz, &moniker); DWORD regno = 0xfedcba98;
IRunningObjectTable* gedit = 0; hr = GetRunningObjectTable(0, &gedit);
                                hr = gedit->Register(ROTFLAGS_REGISTRATIONKEEPSALIVE, graph, moniker, &regno); 

Some valuable tips from the Forum users.

Unresolved externals: Make sure you built the libs mentioned in the article.
VS6: Get Platform SDK and don't forget to set standard SDK include / lib paths
VS2005: unresolved externals
Go to "Configuration Properties" -> "General" -> "Character Set"
By default, it is set to "Use Unicode Character Set". Change it to "Not Set" and the project should link.

So enjoy it, and don't forget that most of the advanced error handling etc. is up to you to implement. This is just a very basic sample so that you can focus on how it works.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Ladislav Nevery
Software Developer (Senior)
Slovakia
Member
Past Projects:
[Siemens.sk]Mobile network software: HLR-Inovation for telering.at (Corba)
Medical software: CorRea module for CT scanner
[cauldron.sk]Computer Games:XboxLive/net code for Conan, Knights of the temple II, GeneTroopers, CivilWar, Soldier of fortune II
[www.elveon.com]Computer Games:XboxLive/net code for Elveon game based on Unreal Engine 3
ESET Research.
Looking for job


Comments and Discussions

Question: How can I save the video recording in a file? (member shahzebrahat29 Apr '13 - 11:31)
can I record video of few seconds in a file ?
Question: Thanks (member Ensamblador5 Sep '12 - 21:42)
This is exactly what i was searching, i had allready installed windows sdk, and then i just did copy-paste and it worked. Ready to start robotics vision ideas.
General: Nice (mvp Espen Harlinn4 Sep '12 - 10:56)
Good idea :-D
Espen Harlinn
Principal Architect, Software - Goodtech Projects & Services AS

Whenever methodologies become productized, objectivity is removed from the equation. -- Mike Myatt

Question: greate article, no source download. (member Spoolrd1 Aug '12 - 22:26)
this is a great article!.
Question: capturing avi problem (member hoseinhero25 Jul '12 - 9:25)
hey Ladislav. I have been tried VFW capture, but it dosen't support my webcam driver. and in directshow I have trouble when i want to save webcam data to an avi file. I tried your first sample but I can't save data to disk. Would you please show me how can I save it to disk.
Answer: Re: capturing avi problem [modified] (member merano8 Apr '13 - 13:48)
You wanted to save as AVI; what Container exactly ?
 
Here is a (untested) Example for saving as BMP Image:
 
HRESULT DoRenderSample(IMediaSample *sample)
{
  BYTE* data; sample->GetPointer( &data ); 
  // Process RGB Frame data* here. For Example: ZeroMemory(data+w*h,w*h);
  bmih.biSize = sizeof(BITMAPINFOHEADER);  
  // bmih.biBitCount = 24;
  // bmih.biPlanes = 1;
  // bmih.biCompression = BI_RGB;
  bmih.biSizeImage = ((((bmih.biWidth * bmih.biBitCount) + 31) & ~31) >> 3) * bmih.biHeight;
 
  int nBitsOffset = sizeof(BITMAPFILEHEADER) + bmih.biSize; 
  LONG lImageSize = bmih.biSizeImage;
  LONG lFileSize  = nBitsOffset + lImageSize;
  BITMAPFILEHEADER bmfh;
  bmfh.bfType    = 'B'+('M'<<8);
  bmfh.bfOffBits = nBitsOffset;
  bmfh.bfSize    = lFileSize;
  bmfh.bfReserved1 = bmfh.bfReserved2 = 0;
  //Create a new file for writing (szPathName is the output path, which you have to define yourself; needs <stdio.h>)
  FILE *pFile = fopen(szPathName, "wb");
  if (!pFile) return E_FAIL;
  //Write the bitmap file header
  fwrite(&bmfh, 1, sizeof(BITMAPFILEHEADER), pFile);
  //And then the bitmap info header
  fwrite(&bmih, 1, sizeof(BITMAPINFOHEADER), pFile);
  //Finally, write the image data itself (the frame we just received)
  fwrite(data, 1, lImageSize, pFile);
  fclose(pFile);
  return S_OK;
}
 
see:
Saving a Drawing to a Bitmap File[^]
 

I think you can save BMP-Data as (uncompressed RGB) Video.
 
A simple interface to the Video for Windows API for creating AVI movies from individual images[^]
 
Another way is to use a Framework like OpenCV.
 
see also: http://msdn.microsoft.com/en-us/library/dd377618%28v=VS.85%29.aspx[^]

modified 8 Apr '13 - 20:27.

Question: can't use the code (member tedysoegiantoarsenal3 Jul '12 - 19:32)
hello.. i'm trying to use your code(New method not requiring DirectShow SDK) by copying it to visual studio 2005(i started using win32 project) . but when i run it, it has so many errors.. can you help me to solve this problem?
Question: where Receive() and Receive_() arguments come from ? [modified] (member dsant25 Apr '12 - 2:22)
Hello,
I can't find where Receive() and Receive_() second argument value "IMediaSample * smp" come from ? Or, it would answer too, where Receive() is called ?
Any Idea ?

modified 26 Apr '12 - 5:51.

General: My vote of 5 (member dsant25 Apr '12 - 2:07)
What I was looking for one week ! Same remarks as you : a lot of VFW examples but VFW dead, compilation not possible using last Visual Studio Express using the other people code...
I am very happy !
General: My vote of 5 (member coronafire12 Dec '11 - 14:21)
Clear simple code, love it
General: My vote of 5 (member D Greene9 Dec '11 - 15:51)
Very helpful starting point.
Question: 2 displays (member Member 83193911 Dec '11 - 4:04)
Thank you very much for great example! Tons of Microsoft examples can be deleted forever!
I need you advise now. If I try to display video over 2 monitors (in "Extended Desktop" mode under XP) I can see video only on the first monitor and the second one displays just black window. I've found that this problem is related to DirectShow, and again there is no clear solution from MS and lots of redundant and strange examples in the net... What would be the best way to solve this problem?
Thanks a lot
General: My vote of 5 (member liftmike23 Jul '11 - 1:41)
Excellent Example! I wonder from where you got the "6" in DsHook(mem,6,Receive); Is there a doc on pointer tables?
General: My vote of 5 (member merano7 Mar '11 - 7:30)
One of the best working samples for framegrabbing
General: My vote of 5 (member mr colorbooks23 Feb '11 - 14:09)
impressive works! thanks.
General: My vote of 5 (member forgi00713 Jan '11 - 4:36)
I was about to write the same stuff, you have saved me a lot of time, thank you!
forgi
General: It got slow when get buffer data (member stonechao28 Dec '10 - 20:49)
Thanks for the great example!
However, I have a small question about getting frame data.
I found that it getting quite slow if I copy buf to a array.
even I use memcpy, it still affect the preview performance a lot.
It may because by the manipulations to buf.
But It works great if I write the value to buf. writting value to buf won't affect the performance.
 
Would you please give me a hint of the issue?
Thansk a lot.
 
HRESULT __stdcall Receive ( void* inst, IMediaSample *smp ) {
BYTE* buf;
smp->GetPointer(&buf);
DWORD len = smp->GetActualDataLength();
 
clock_t start(clock());
clock_t finish(clock());
int diff1=0;
int sheight = 480;
int swidth = 640;
start = clock();
memcpy(imgdata, buf, 307200); // this line will cause the preview slow
// However, this following part won't slow down the previewing
/*
int i;
for (i=0; i<200000; i++);
{
buf[i]=100; // this part won't slow down the previewing
}
*/
 
finish = clock();
diff1= finish - start;
HRESULT ret = Receive_ ( inst, smp );
return ret;
}
General: Re: It got slow when get buffer data (member odedelyada19 Feb '11 - 6:57)
Reading is slow because the code is taking the frame bits from the Frame Renderer, that buffer is on the GPU.
 
In order to make it work faster, you need to connect to an earlier filter with a buffer in the RAM.
 
However, earlier filters sometimes are in wierd formats so iterating backwards on the graph might be difficult.
 
My solution was adding a pass through filter (in my case, DMO color control) to the graph, disconnecting the old connection between the Frame Renderer and the filter before it. Finally, I connected the new filter to the pins I just freed.
 
On my machine the new filter flipped the image vertically, so I flipped it back.
 
Performance turned into a non-issue.
 
Oded.
General: Download code and example program (member vanwoudenberg21 Dec '10 - 3:02)
The code looks wonderful. How can I download the attachted file (zip) and executable?
 
Best regards!
General: Real time video (member MiriV10 Nov '10 - 6:22)
Hello
This is very useful article, Thanks
I need to do in c#.
Do you have the same sample in c#
 

Miri
Question: capAVI and clipboard? (member Vaclav_Sal5 Oct '10 - 6:14)
I am looking for an expert on using capAVI and clipboard.
I am having a heck of a time trying to retrieve individual pixels from a bitmap.THe problem is that AVI "format" taken from the clipboard is CF_DIB, however, the bits are packed in RGB tripplets.
Of course true DIB wants RGBQUAD! I have a kluge using BITMAP to get the bits and the BITMNAPINFO to get the bitmap dimesions.
I am open for better suggestions.
Thanks for reading.
Vaclav
General: Format of data (member ragnor129 Aug '10 - 3:59)
Hi, this is great article, thanks.
 
I have two questions:
1. what is the format of image video data?
2. how to get image resolution (width x height) within "Receive" function?
 
Thanks for answer
General: Me too (member dsant25 Apr '12 - 23:38)
I am looking for the format data too. Have you found ?
Question: How to set the frame rate? (member cb714916 Aug '10 - 12:21)
Great sample- nice and concise! The issue I'm having is avi plaback is way too fast. It's as though the frame rate info is being ignored. Can anyone show how to get & set the frame rate?
Question: Small question: how to hide, close, or not to create preview window? (member pvagin8 May '10 - 11:14)
I want to capture data but how to make this without creating window with captured video.
And one question, there is button on camera, and when pressing this button during the capturing data using the code above, new window of Microsoft Snapshot utiliy appears with single captured frame.
So how to add some hook to this call and capture this single frame also.
Thanks.


Last Updated 6 Apr 2010
Article Copyright 2006 by Ladislav Nevery
Everything else Copyright © CodeProject, 1999-2013