Introduction
All my life, I have been obsessed with anything that has something to do
with robots and computer vision. So lately, I have been looking for some simple
code to read images from movies in any format and from a webcam. Here, I want
to share my experiences in this API jungle with you. Also, please consider
voting using the bar below so I get an idea of whether this information was
useful to anyone.
Background
As hard as I looked, all I found were samples for the VFW interface,
which is a practically dead API with a lot of issues. First of all, it only
handles AVI files, and most codecs (like DivX) just don't work with it anyway
(those who have tried it know). Also, at some resolutions, it simply crashes
on most webcams. There are a few samples for the newer DirectShow API, but all
the important code is buried in a zillion unimportant wrappers, support
classes, and files. Not to mention that the informational value of the samples
themselves is lost in an often redundant error-handling jungle. And finally, they
are based on using or compiling additional MS DirectShow filter samples, like
SampleGrabber or its bug-fixed version, GrabberSample, which makes the whole
approach non-portable between SDK versions. These samples also seem to have
disappeared from the latest SDK and from the Web itself. So here is what I think
is a much simpler and more versatile alternative.
New method not requiring DirectShow SDK
I recently added this new method to the article because a lot of people had problems compiling the DirectShow SDK related stuff. Most of them just need to get the RGB data and forget the whole DirectShow thing, so I got an idea: if we know how data flows inside a DirectShow graph, why don't we just redirect a particular method to grab the data?
In DirectShow, data can flow in one of two ways:
Push method: FILTER1->out pushes data to in<-FILTER2 by calling Receive on the input pin's IMemInputPin interface (our renderer case).
Pull method: the downstream filter in<-FILTER2 asks the source FILTER1->out for more data via the SyncReadAligned method of the output pin's IAsyncReader interface.
In our case (the last filter is a renderer), we capture the data of the push method. You can find a sample that captures pull mode in my other article, Fun with Google TTS.
I tested the following code on Visual Studio 2008 and 2010, and it compiled just fine without the DirectShow SDK. Also remember that the received data is in whatever format the graph negotiated, so you will probably need to convert it to RGB if that is what you want, but that is trivial.
The code
#include <windows.h>
#include <dshow.h>
#pragma comment(lib,"Strmiids.lib")
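// DsHook(obj, index, fn): saves the original vtable entry 'index' of obj in fn##_ and replaces it with fn.
// It needs a DWORD named 'no' in scope to receive the old page protection flags.
#define DsHook(a,b,c) if (!c##_) { INT_PTR* p=b+*(INT_PTR**)a; VirtualProtect(&c##_,sizeof(INT_PTR),PAGE_EXECUTE_READWRITE,&no);\
*(INT_PTR*)&c##_=*p; VirtualProtect(p,sizeof(INT_PTR),PAGE_EXECUTE_READWRITE,&no); *p=(INT_PTR)c; }
// Pointer to the renderer's original Receive, filled in by DsHook
HRESULT ( __stdcall * Receive_ ) ( void* inst, IMediaSample *smp ) ;
// Our replacement: called for every sample pushed to the renderer's input pin
HRESULT __stdcall Receive ( void* inst, IMediaSample *smp ) {
BYTE* buf; smp->GetPointer(&buf); DWORD len = smp->GetActualDataLength();
// buf points to len bytes of frame data; process or copy them here
HRESULT ret = Receive_ ( inst, smp ); // let the renderer display the frame as usual
return ret;
}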
int WINAPI WinMain(HINSTANCE inst,HINSTANCE prev,LPSTR cmd,int show){
HRESULT hr = CoInitialize(0); MSG msg={0}; DWORD no;
IGraphBuilder* graph= 0; hr = CoCreateInstance( CLSID_FilterGraph, 0, CLSCTX_INPROC,IID_IGraphBuilder, (void **)&graph );
IMediaControl* ctrl = 0; hr = graph->QueryInterface( IID_IMediaControl, (void **)&ctrl );
ICreateDevEnum* devs = 0; hr = CoCreateInstance (CLSID_SystemDeviceEnum, 0, CLSCTX_INPROC, IID_ICreateDevEnum, (void **) &devs);
IEnumMoniker* cams = 0; hr = devs?devs->CreateClassEnumerator (CLSID_VideoInputDeviceCategory, &cams, 0):0;
if (!cams) return 1; // cams stays null when no video capture device is installed
IMoniker* mon = 0; hr = cams->Next (1,&mon,0);
if (!mon) return 1;
IBaseFilter* cam = 0; hr = mon->BindToObject(0,0,IID_IBaseFilter, (void**)&cam);
// Add the camera to the graph and let DirectShow render its first pin
hr = graph->AddFilter(cam, L"Capture Source");
IEnumPins* pins = 0; hr = cam?cam->EnumPins(&pins):0;
IPin* pin = 0; hr = pins?pins->Next(1,&pin, 0):0;
hr = graph->Render(pin);
// This assumes the first enumerated filter is the renderer that Render() just added
IEnumFilters* fil = 0; hr = graph->EnumFilters(&fil);
IBaseFilter* rnd = 0; hr = fil->Next(1,&rnd,0);
hr = rnd->EnumPins(&pins); hr = pins->Next(1,&pin, 0);
IMemInputPin* mem = 0; hr = pin->QueryInterface(IID_IMemInputPin,(void**)&mem);
DsHook(mem,6,Receive); // Receive is entry 6 in the IMemInputPin vtable
hr = ctrl->Run();
while ( GetMessage( &msg, 0, 0, 0 ) > 0 ) {
TranslateMessage( &msg );
DispatchMessage( &msg );
}
return 0;
}
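Many webcams deliver YUY2 rather than RGB, so the buffer you see in Receive may need the conversion mentioned above. Here is a minimal sketch of that step in plain C++, assuming the negotiated subtype is YUY2, the width is even, and the rows are tightly packed. The function name Yuy2ToBgr24 and the helper clamp8 are my own, and the integer coefficients are the usual BT.601 approximation; check the actual media type of your graph before relying on it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Clamp an intermediate value into the 0..255 byte range.
static uint8_t clamp8(int v) { return static_cast<uint8_t>(std::min(255, std::max(0, v))); }

// Convert one packed YUY2 frame (bytes Y0 U Y1 V per pixel pair) to 24-bit
// blue, green, red triplets. Assumption: stride == width * 2, no row padding.
std::vector<uint8_t> Yuy2ToBgr24(const uint8_t* src, int width, int height) {
    assert(src && width > 0 && height > 0 && width % 2 == 0);
    std::vector<uint8_t> dst(static_cast<size_t>(width) * height * 3);
    uint8_t* out = dst.data();
    for (int i = 0; i < width * height / 2; ++i, src += 4) {
        const int u = src[1] - 128, v = src[3] - 128;   // chroma shared by both pixels
        for (int k = 0; k < 2; ++k) {
            const int c = 298 * (src[k * 2] - 16);       // scaled luma of pixel k
            *out++ = clamp8((c + 516 * u + 128) >> 8);            // blue
            *out++ = clamp8((c - 100 * u - 208 * v + 128) >> 8);  // green
            *out++ = clamp8((c + 409 * v + 128) >> 8);            // red
        }
    }
    return dst;
}
```

Inside Receive, you would call something like Yuy2ToBgr24(buf, width, height) with the width and height taken from the connection's media type.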
Changing WebCam resolution
Optionally, you can add the following code to set the resolution right after you get the capture device's output pin. [Thanks to LightWing
in the forum below for finding it ;) ]
...
IPin* pin = 0; hr = pins?pins->Next(1,&pin, 0):0;
IAMStreamConfig* cfg = 0; hr = pin->QueryInterface( IID_IAMStreamConfig, (void **)&cfg);
int sz,max_res = 0; hr = cfg->GetNumberOfCapabilities(&max_res, &sz);
VIDEO_STREAM_CONFIG_CAPS cap[2]; AM_MEDIA_TYPE* fmt = 0;
hr = cfg->GetStreamCaps(max_res-1, &fmt, (BYTE*)cap); // last capability in the list
hr = cfg->SetFormat(fmt);
...
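The snippet above simply takes the last capability in the list. If you need a particular size instead, one option is to call GetStreamCaps for every index, note the size each one reports, and pick the closest match. The selection step could look like the sketch below. The Mode struct and PickClosestMode are hypothetical names of mine, and the code assumes you have already filled the list from the capabilities (for example from MaxOutputSize in VIDEO_STREAM_CONFIG_CAPS).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical holder for the size reported by one capability index.
struct Mode { int width; int height; };

// Returns the index of the mode closest to the requested size,
// or -1 if the list is empty. Distance is |dw| + |dh|.
int PickClosestMode(const std::vector<Mode>& modes, int wantW, int wantH) {
    assert(wantW > 0 && wantH > 0);
    int best = -1;
    long bestDist = 0;
    for (std::size_t i = 0; i < modes.size(); ++i) {
        const long dist = std::labs(static_cast<long>(modes[i].width) - wantW) +
                          std::labs(static_cast<long>(modes[i].height) - wantH);
        if (best < 0 || dist < bestDist) {
            best = static_cast<int>(i);
            bestDist = dist;
        }
    }
    return best;
}
```

The returned index is the one you would then pass to GetStreamCaps before calling SetFormat.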
Previous method requiring DirectShow SDK:
Starting with DirectShow
Most users who try to compile a DirectShow sample from the web for the first
time will spend 90% of the time searching for the SDK itself (it has now been
moved from the DirectX SDK to the Platform SDK, so be sure to get the latest) and
then spend the rest of the day solving the undefined variables (add
YOUR_PLATFORMSDK_DIR\Samples\Multimedia\DirectShow\BaseClasses to the
compiler include paths) and unresolved externals (add strmiids.lib,
winmm.lib, and strmbasd.lib (strmbase.lib for the release
build) to the linker input). But strmbasd.lib and strmbase.lib are not
redistributed by Microsoft and must usually be built manually (just run
vcvars32 and nmake (nmake nodebug=1 for the release lib) in the BaseClasses
directory mentioned above, and copy the libs there from the subdirectories
created by nmake).
Using the code
So here is the sample. Just create an empty Windows C++ application in
whatever development environment you have (disable UNICODE in VC++ or you will
get linking errors), and copy/paste it there. I stripped it to the bare minimum
needed to run, so please forgive the compressed formatting of the unimportant
code parts. And as usual, you can find information on any API used here simply
by searching for its name in Google and using the first page found (usually
MSDN).
#include <windows.h>
#include <dshow.h>
#include <streams.h>
int w,h; HWND hwnd; MSG msg = {0}; BITMAPINFOHEADER bmih={0};
struct __declspec( uuid("{71771540-2017-11cf-ae26-0020afd79767}") ) CLSID_Sampler;
struct Sampler : public CBaseVideoRenderer {
Sampler( IUnknown* unk, HRESULT *hr ) : CBaseVideoRenderer(__uuidof(CLSID_Sampler), NAME("Frame Sampler"), unk, hr) {};
HRESULT CheckMediaType(const CMediaType *media ) {
VIDEOINFO* vi; if(!IsEqualGUID( *media->Subtype(), MEDIASUBTYPE_RGB24) || !(vi=(VIDEOINFO *)media->Format()) ) return E_FAIL;
bmih=vi->bmiHeader; SetWindowPos(hwnd,0,0,0,20+(w=vi->bmiHeader.biWidth),60+(h=vi->bmiHeader.biHeight),SWP_NOZORDER|SWP_NOMOVE);
return S_OK;
}
HRESULT DoRenderSample(IMediaSample *sample){
BYTE* data; sample->GetPointer( &data );
BITMAPINFO bmi={0}; bmi.bmiHeader=bmih; RECT r; GetClientRect( hwnd, &r );
HDC dc=GetDC(hwnd);
StretchDIBits(dc,0,16,r.right,r.bottom-16,0,0,w,h,data,&bmi,DIB_RGB_COLORS,SRCCOPY);
ReleaseDC(hwnd,dc);
return S_OK;
}
HRESULT ShouldDrawSampleNow(IMediaSample *sample, REFERENCE_TIME *start, REFERENCE_TIME *stop) {
return S_OK; }
};
int WINAPI WinMain(HINSTANCE inst,HINSTANCE prev,LPSTR cmd,int show){
HRESULT hr = CoInitialize(0);
hwnd=CreateWindowEx(0,"LISTBOX",0,WS_SIZEBOX,0,0,0,0,0,0,0,0);
IGraphBuilder* graph= 0; hr = CoCreateInstance( CLSID_FilterGraph, 0, CLSCTX_INPROC,IID_IGraphBuilder, (void **)&graph );
IMediaControl* ctrl = 0; hr = graph->QueryInterface( IID_IMediaControl, (void **)&ctrl );
Sampler* sampler = new Sampler(0,&hr);
IPin* rnd = 0; hr = sampler->FindPin(L"In", &rnd);
hr = graph->AddFilter((IBaseFilter*)sampler, L"Sampler");
ICreateDevEnum* devs = 0; hr = CoCreateInstance (CLSID_SystemDeviceEnum, 0, CLSCTX_INPROC, IID_ICreateDevEnum, (void **) &devs);
IEnumMoniker* cams = 0; hr = devs?devs->CreateClassEnumerator (CLSID_VideoInputDeviceCategory, &cams, 0):0;
IMoniker* mon = 0; hr = cams?cams->Next (1, &mon, 0):0;
IBaseFilter* cam = 0; hr = mon?mon->BindToObject(0,0,IID_IBaseFilter, (void**)&cam):0;
IEnumPins* pins = 0; hr = cam?cam->EnumPins(&pins):0;
IPin* cap = 0; hr = pins?pins->Next(1,&cap, 0):0;
hr = graph->AddFilter(cam, L"Capture Source");
IBaseFilter* vid = 0; hr = graph->AddSourceFilter (L"c:\\Windows\\clock.avi", L"File Source", &vid);
IPin* avi = 0; hr = vid?vid->FindPin(L"Output", &avi):0;
hr = graph->Connect(cap?cap:avi,rnd);
hr = graph->Render( cap?cap:avi );
hr = ctrl->Run();
SendMessage(hwnd, LB_ADDSTRING, 0, (LPARAM)(cap?"Capture source ...":"File source ..."));
ShowWindow(hwnd,SW_SHOW);
while ( msg.message != WM_QUIT) {
if( PeekMessage( &msg, 0, 0, 0, PM_REMOVE ) ) {
TranslateMessage( &msg );
DispatchMessage( &msg );
if(msg.message == WM_KEYDOWN && msg.wParam==VK_ESCAPE ) break;
} Sleep(30);
}
return 0;
}
Points of interest
The most important lines are in DoRenderSample, since that is where you
get your image in RGB format for every frame. The sample itself tries to find
and connect to the first pin of the first video capture device it finds. If
none is found, it tries to open and run a video file from the provided
path. So in fact, these are two tutorials in one. A cool side feature is that
you can use the Alt+PrintScreen key to grab a screenshot from (any?) movie or
webcam and paste it into your favorite image editor.
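If you want to read individual pixels from the buffer you get in DoRenderSample, keep in mind how a 24-bit DIB is laid out: each row is padded to a multiple of 4 bytes, the bytes of a pixel are stored as blue, green, red, and with a positive biHeight the rows are stored bottom-up. The following is a small sketch of that addressing, assuming biHeight is positive. The names Rgb, DibStride24, and GetPixel24 are hypothetical helpers, not part of any SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Rgb { uint8_t r, g, b; };

// Bytes per row of a 24-bit DIB: rows are padded to a multiple of 4 bytes.
inline std::size_t DibStride24(int width) {
    return (static_cast<std::size_t>(width) * 3 + 3) & ~static_cast<std::size_t>(3);
}

// Reads pixel (x, y), with y = 0 at the top of the picture, from a
// bottom-up 24-bit DIB. Assumption: biHeight was positive; a negative
// biHeight would mean the rows are already stored top-down.
inline Rgb GetPixel24(const uint8_t* data, int width, int height, int x, int y) {
    assert(data && x >= 0 && x < width && y >= 0 && y < height);
    const uint8_t* p = data
        + DibStride24(width) * static_cast<std::size_t>(height - 1 - y)
        + static_cast<std::size_t>(x) * 3;
    return Rgb{ p[2], p[1], p[0] };  // memory order is blue, green, red
}
```

In the sample above, data would be the pointer returned by GetPointer, and width and height the globals w and h.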
Place for enhancements
Consider using some form of smart pointers, like
CComPtr<IGraphBuilder>, instead of raw pointers to interfaces
like IGraphBuilder*. I removed them to keep the sample buildable by
people who don't have the ATL libs. Here is example code that gives more
control over the video playback (auto-rewind in this case):
...
IMediaControl* ctrl = 0; hr = graph->QueryInterface( IID_IMediaControl, (void **)&ctrl );
IMediaSeeking* seek = 0; hr = graph->QueryInterface( IID_IMediaSeeking, (void **)&seek );
IMediaEvent* event = 0; hr = graph->QueryInterface( IID_IMediaEvent, (void **)&event );
...
while ( msg.message != WM_QUIT) {
long code = 0; LONG_PTR a = 0, b = 0;
if( SUCCEEDED(event->GetEvent(&code, &a, &b, 0)) ) event->FreeEventParams(code, a, b); else code = 0;
if( code == EC_COMPLETE ) {
LONGLONG pos=0;
hr=seek->SetPositions(&pos,AM_SEEKING_AbsolutePositioning,0,0);
hr=ctrl->Run();
}
...
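On the smart-pointer suggestion at the start of this section: if you don't have ATL, a tiny RAII wrapper is enough to stop leaking interfaces. The sketch below shows the idea in portable C++. ComRef is a hypothetical helper of mine, not ATL's CComPtr, and FakeUnknown is only a stand-in so the example compiles without the Windows headers; with real COM interfaces, you would use it with anything that has AddRef and Release.

```cpp
#include <cassert>
#include <utility>

// Minimal RAII holder for any interface exposing AddRef()/Release().
template <class T>
class ComRef {
    T* p_ = nullptr;
public:
    ComRef() = default;
    ComRef(const ComRef& o) : p_(o.p_) { if (p_) p_->AddRef(); }
    ComRef(ComRef&& o) noexcept : p_(o.p_) { o.p_ = nullptr; }
    ComRef& operator=(ComRef o) noexcept { std::swap(p_, o.p_); return *this; }
    ~ComRef() { if (p_) p_->Release(); }

    // For out-parameters, e.g. QueryInterface(iid, (void**)ref.put()).
    // Releases any pointer held so far before handing out the slot.
    T** put() { if (p_) { p_->Release(); p_ = nullptr; } return &p_; }
    T* get() const { return p_; }
    T* operator->() const { assert(p_); return p_; }
    explicit operator bool() const { return p_ != nullptr; }
};

// Stand-in for a COM interface: starts with one reference, like an
// object freshly returned by CoCreateInstance.
struct FakeUnknown {
    int refs = 1;
    int AddRef() { return ++refs; }
    int Release() { return --refs; }
};
```

With this, a declaration such as ComRef<IGraphBuilder> graph; followed by passing (void**)graph.put() to CoCreateInstance would release the graph automatically when the variable goes out of scope.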
You may add this code to inspect (connect to a remote graph) the whole render
graph, graphically, in graphedit (to debug the capture input pins, for
example):
WCHAR wsz[256]; (void)StringCchPrintfW(wsz, NUMELMS(wsz),L"FilterGraph %08x pid %08x\0", (DWORD_PTR) 0, GetCurrentProcessId());
...
IGraphBuilder* graph = 0; hr = CoCreateInstance( CLSID_FilterGraph, 0, CLSCTX_INPROC,IID_IGraphBuilder, (void **)&graph );
IMoniker* moniker = 0; hr = CreateItemMoniker(L"!", wsz, &moniker); DWORD regno = 0xfedcba98;
IRunningObjectTable* gedit = 0; hr = GetRunningObjectTable(0, &gedit);
hr = gedit->Register(ROTFLAGS_REGISTRATIONKEEPSALIVE, graph, moniker, &regno);
Some valuable tips from the forum users:
Unresolved externals: Make sure you built the libs mentioned in the
article.
VS6: Get the Platform SDK and don't forget to set the standard SDK include /
lib paths.
VS2005: Unresolved externals. Go to "Configuration Properties" -> "General"
-> "Character Set". By default, it is set to "Use Unicode Character Set".
Change it to "Not Set" and the project should link.
So enjoy it, and don't forget that most of the advanced error handling etc.
is up to you to implement. This is just a very basic sample so that you can
focus on how it works.