Introduction
After I started using Win8-Desktop, I found that some older technologies do not work well on it, DirectShow in particular.
For instance, capturing live video from a web camera with DirectShow works perfectly on WinXP, Vista, and Win7, and lets you request a specific resolution.
For example, I can get 1080p video from a Microsoft LifeCam Studio web camera. On Win8-Desktop, however, I can get only
640x480 video. The reason is that
the function call in the relevant line of code, which returns the HRESULT
S_OK on Win7, fails on Win8-Desktop. After reading MSDN,
I got the impression that Microsoft has reasons to stop supporting DirectShow and to promote another technology, Media Foundation.
I found some information about capturing video from a web camera with Media Foundation using the required parameters, but it is scattered across many places.
I think it is useful to have a single C++ class that performs all the initialization steps but hides them behind
a simple interface. I wrote such a class and present it in this article.
Background
I lead an Augmented Reality project and needed simple support for capturing video from a web camera. I used the simple videoInput library from
http://muonics.net/school/spring05/videoInput/, which uses DirectShow for this purpose. However, it did not work well on
Win8-Desktop: setting the resolution of the captured video failed. I solved this problem with Media Foundation,
but my project already used videoInput, so I decided to create a new library with the same interface as videoInput but built on Media Foundation.
That met my needs, and I think the new library will be useful to other people who face the same problem while developing
image recognition programs.
Using the code
The videoInput library was written in Visual Studio 2012 (videoInputVS2012.zip; static library
videoInput-staticlib-VS2012x86.zip) and includes nine classes:
videoInput - the interface class. To use the library, it is enough to include videoInput.h and link videoInput.lib in your project. This class is implemented as a singleton, which makes resource management easy.
Media_Foundation - a singleton class that manages the allocation and release of Media Foundation resources.
videoDevices - a singleton class that manages the allocation and release of video devices and access to each individual device.
videoDevice - the class for working with a single video device: capturing, getting raw data, checking for a new frame, getting the supported resolutions, setting the required resolution, and closing the device.
ImageGrabberThread - the class that manages the image-grabbing thread.
ImageGrabber - the class that initializes and grabs images from the video device. It controls the grabbing process and finishes it.
RawImage - a temporary class that holds one frame for writing and reading.
FormatReading - the class that reads information about a supported resolution into the library's own MediaType structure.
DebugPrintOut - the class for printing text to the console.
The file videoInput.h is all that is needed as the interface of the library.
Its listing is presented below.
#pragma once
#include <guiddef.h>

struct IMFMediaSource;

struct MediaType
{
    unsigned int MF_MT_FRAME_SIZE;
    unsigned int height;
    unsigned int width;
    unsigned int MF_MT_YUV_MATRIX;
    unsigned int MF_MT_VIDEO_LIGHTING;
    unsigned int MF_MT_DEFAULT_STRIDE;
    unsigned int MF_MT_VIDEO_CHROMA_SITING;
    GUID MF_MT_AM_FORMAT_TYPE;
    wchar_t *pMF_MT_AM_FORMAT_TYPEName;
    unsigned int MF_MT_FIXED_SIZE_SAMPLES;
    unsigned int MF_MT_VIDEO_NOMINAL_RANGE;
    unsigned int MF_MT_FRAME_RATE;
    unsigned int MF_MT_FRAME_RATE_low;
    unsigned int MF_MT_PIXEL_ASPECT_RATIO;
    unsigned int MF_MT_PIXEL_ASPECT_RATIO_low;
    unsigned int MF_MT_ALL_SAMPLES_INDEPENDENT;
    unsigned int MF_MT_FRAME_RATE_RANGE_MIN;
    unsigned int MF_MT_FRAME_RATE_RANGE_MIN_low;
    unsigned int MF_MT_SAMPLE_SIZE;
    unsigned int MF_MT_VIDEO_PRIMARIES;
    unsigned int MF_MT_INTERLACE_MODE;
    unsigned int MF_MT_FRAME_RATE_RANGE_MAX;
    unsigned int MF_MT_FRAME_RATE_RANGE_MAX_low;
    GUID MF_MT_MAJOR_TYPE;
    wchar_t *pMF_MT_MAJOR_TYPEName;
    GUID MF_MT_SUBTYPE;
    wchar_t *pMF_MT_SUBTYPEName;

    MediaType();
    ~MediaType();
    void Clear();
};

struct Parametr
{
    long CurrentValue;
    long Min;
    long Max;
    long Step;
    long Default;
    long Flag;

    Parametr();
};

struct CamParametrs
{
    Parametr Brightness;
    Parametr Contrast;
    Parametr Hue;
    Parametr Saturation;
    Parametr Sharpness;
    Parametr Gamma;
    Parametr ColorEnable;
    Parametr WhiteBalance;
    Parametr BacklightCompensation;
    Parametr Gain;
    Parametr Pan;
    Parametr Tilt;
    Parametr Roll;
    Parametr Zoom;
    Parametr Exposure;
    Parametr Iris;
    Parametr Focus;
};

class videoInput
{
public:
    virtual ~videoInput(void);
    static videoInput& getInstance();
    void closeDevice(unsigned int deviceID);
    void setEmergencyStopEvent(unsigned int deviceID, void *userData, void(*func)(int, void *));
    void closeAllDevices();
    CamParametrs getParametrs(unsigned int deviceID);
    void setParametrs(unsigned int deviceID, CamParametrs parametrs);
    unsigned int listDevices(bool silent = false);
    unsigned int getCountFormats(unsigned int deviceID);
    unsigned int getWidth(unsigned int deviceID);
    unsigned int getHeight(unsigned int deviceID);
    wchar_t *getNameVideoDevice(unsigned int deviceID);
    IMFMediaSource *getMediaSource(unsigned int deviceID);
    MediaType getFormat(unsigned int deviceID, unsigned int id);
    bool isDevicesAcceable();
    bool isDeviceSetup(unsigned int deviceID);
    bool isDeviceMediaSource(unsigned int deviceID);
    bool isDeviceRawDataSource(unsigned int deviceID);
    void setVerbose(bool state);
    bool setupDevice(unsigned int deviceID, unsigned int id = 0);
    bool setupDevice(unsigned int deviceID, unsigned int w,
                     unsigned int h, unsigned int idealFramerate = 30);
    bool isFrameNew(unsigned int deviceID);
    bool getPixels(unsigned int deviceID, unsigned char * pixels,
                   bool flipRedAndBlue = false, bool flipImage = false);

private:
    bool accessToDevices;
    videoInput(void);
    void processPixels(unsigned char * src, unsigned char * dst, unsigned int width,
                       unsigned int height, unsigned int bpp, bool bRGB, bool bFlip);
    void updateListOfDevices();
};
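The getPixels method declared above takes flipRedAndBlue and flipImage flags, and the private processPixels helper takes matching bRGB and bFlip arguments. The library's own implementation is not listed here, so the following is only a minimal sketch of what such a step might look like, assuming tightly packed 8-bit channels with no row padding and non-overlapping buffers. The name copyFrame and its parameters are hypothetical and are not part of the library.

```cpp
#include <cstddef>

// Hypothetical helper (not the library's processPixels): copies a packed
// 8-bit frame from src to dst, optionally swapping the first and third
// channel of every pixel and/or reversing the order of the rows.
// Assumes rows are tightly packed and that src and dst do not overlap.
void copyFrame(const unsigned char* src, unsigned char* dst,
               unsigned int width, unsigned int height,
               unsigned int bytesPerPixel,
               bool swapRedAndBlue, bool flipVertically)
{
    const std::size_t stride = static_cast<std::size_t>(width) * bytesPerPixel;
    for (unsigned int y = 0; y < height; ++y)
    {
        // When flipping, output row y is read from the mirrored input row.
        const unsigned int srcRow = flipVertically ? (height - 1 - y) : y;
        const unsigned char* in = src + srcRow * stride;
        unsigned char* out = dst + y * stride;
        for (unsigned int x = 0; x < width; ++x)
        {
            const unsigned char* p = in + x * bytesPerPixel;
            unsigned char* q = out + x * bytesPerPixel;
            for (unsigned int c = 0; c < bytesPerPixel; ++c)
                q[c] = p[c];
            if (swapRedAndBlue && bytesPerPixel >= 3)
            {
                q[0] = p[2];
                q[2] = p[0];
            }
        }
    }
}
```

With a 3-byte pixel layout, swapping channels converts between RGB and BGR order, which is the order OpenCV's IplImage expects for display.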
The videoInput class can be used in one of two modes: RawData grabbing and MediaSource. If only
the first mode is used, there is no need to include the Media Foundation headers
or link its libraries, because the IMFMediaSource interface is forward-declared in videoInput.h. In this case
the method IMFMediaSource *getMediaSource(unsigned int deviceID)
returns NULL. In the second mode you can call
this method and use the result in your application as a normal source of media data from
the web camera.
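The reason a raw-data-only client can skip the Media Foundation headers is ordinary C++: a pointer to a forward-declared type may appear in a declaration without the full definition. A minimal sketch, in which the class RawOnlyCapture is a hypothetical stand-in and not the library's code:

```cpp
// Forward declaration only: the full definition lives in the Media
// Foundation headers, which this translation unit never includes.
struct IMFMediaSource;

// Hypothetical stand-in for a capture class used in raw-data mode only.
class RawOnlyCapture
{
public:
    // A pointer to an incomplete type is legal here, so callers can compile
    // against this interface without the Media Foundation SDK.
    IMFMediaSource* getMediaSource(unsigned int /*deviceID*/) const
    {
        return nullptr; // this sketch never exposes a media source
    }
};
```

Only code that actually dereferences the pointer or calls methods on it needs the real Media Foundation headers.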
The next listing shows how to use videoInput to get the raw data of
a frame. The example uses the OpenCV framework to display
live video (the code is in TestVideoInputVS2012x86.zip). OpenCV has
its own function for capturing from a web camera,
but it is based on DirectShow and has the problem described above on Win8-Desktop. The example
is presented in the following listing:
#include "stdafx.h"
#include "videoInput.h"
#include "highgui.h"

#pragma comment(lib, "lib\\opencv\\Release\\opencv_highgui242.lib")
#pragma comment(lib, "lib\\opencv\\Release\\opencv_core242.lib")
#pragma comment(lib, "videoInput.lib")

// Callback invoked by the library when a device stops unexpectedly.
void StopEvent(int deviceID, void *userData)
{
    videoInput *VI = &videoInput::getInstance();
    VI->closeDevice(deviceID);
}

int _tmain(int argc, _TCHAR* argv[])
{
    videoInput *VI = &videoInput::getInstance();
    int i = VI->listDevices();
    if(i > 0)
    {
        // First pass: capture from the last listed device at 640x480.
        if(VI->setupDevice(i-1, 640, 480, 60))
        {
            VI->setEmergencyStopEvent(i - 1, NULL, StopEvent);
            if(VI->isFrameNew(i-1))
            {
                int countLeftFrames = 0;
                cvNamedWindow("VideoTest", CV_WINDOW_AUTOSIZE);
                CvSize size = cvSize(VI->getWidth(i-1), VI->getHeight(i-1));
                IplImage* frame;
                frame = cvCreateImage(size, 8, 3);
                while(1)
                {
                    if(VI->isFrameNew(i-1))
                    {
                        VI->getPixels(i - 1, (unsigned char *)frame->imageData);
                        cvShowImage("VideoTest", frame);
                        countLeftFrames = 0;
                    }
                    else
                        countLeftFrames++;
                    char c = cvWaitKey(33);
                    if(c == 27) // Esc: leave the loop
                        break;
                    if(c == 49) // '1': change the brightness
                    {
                        CamParametrs CP = VI->getParametrs(i-1);
                        CP.Brightness.CurrentValue = 128;
                        CP.Brightness.Flag = 1;
                        VI->setParametrs(i - 1, CP);
                    }
                    if(!VI->isDeviceSetup(i - 1))
                    {
                        break;
                    }
                    // Give up after 60 iterations without a new frame.
                    if(countLeftFrames > 60)
                        break;
                }
                VI->closeDevice(i - 1);
                cvReleaseImage(&frame);
                cvDestroyWindow("VideoTest");
            }
        }
        // Second pass: reuse the same device at 1920x1080.
        if(VI->setupDevice(i-1, 1920, 1080, 60))
        {
            if(VI->isFrameNew(i-1))
            {
                int countLeftFrames = 0;
                cvNamedWindow("VideoTest1", CV_WINDOW_AUTOSIZE);
                CvSize size = cvSize(VI->getWidth(i-1), VI->getHeight(i-1));
                IplImage* frame;
                frame = cvCreateImage(size, 8, 3);
                while(1)
                {
                    if(VI->isFrameNew(i-1))
                    {
                        VI->getPixels(i - 1, (unsigned char *)frame->imageData, false);
                        cvShowImage("VideoTest1", frame);
                        countLeftFrames = 0;
                    }
                    else
                        countLeftFrames++;
                    char c = cvWaitKey(33);
                    if(c == 27) // Esc: leave the loop
                        break;
                    if(!VI->isDeviceSetup(i - 1))
                    {
                        break;
                    }
                    if(countLeftFrames > 60)
                        break;
                }
                VI->closeDevice(i - 1);
                cvReleaseImage(&frame);
                cvDestroyWindow("VideoTest1");
            }
        }
    }
    return 0;
}
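In this code, a pointer to the videoInput class is obtained by calling
the method videoInput::getInstance(). Before using a camera, you need to get
the list of suitable devices with the function VI->listDevices(). The example treats its return value as the number of devices, so i-1 is the index of the last one.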
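The getInstance() access pattern can be illustrated with a function-local static, a common way to write a singleton in C++11 and later. This is only a sketch under that assumption: the class CaptureManager and its members are hypothetical, and the library's actual implementation may differ.

```cpp
// Hypothetical class illustrating the pattern; it is not the library's code.
class CaptureManager
{
public:
    // Returns the single shared instance, constructed on first use.
    static CaptureManager& getInstance()
    {
        static CaptureManager instance;
        return instance;
    }

    // Copying is disabled so that a second instance cannot be created.
    CaptureManager(const CaptureManager&) = delete;
    CaptureManager& operator=(const CaptureManager&) = delete;

    void open() { ++opened_; }
    unsigned int openCount() const { return opened_; }

private:
    // The private constructor forces all access through getInstance().
    CaptureManager() : opened_(0) {}
    unsigned int opened_;
};
```

Because every caller receives the same object, state such as the list of open devices is shared across the whole program, which is what makes centralized resource management possible.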
The device is initialized by calling the method VI->setupDevice(i-1, 640, 480, 60). There are two overloads of
setupDevice: one takes the desired resolution and frames per second, and the other takes the index of the required output type.
The first overload finds an existing MediaType with the required parameters, or uses
the default type with index 0. Grabbing images from the MediaSource starts with the first call to VI->isFrameNew(i-1). After this call, the raw data can be obtained with
the method
VI->getPixels(i - 1, (unsigned char *)frame->imageData, false). The parameters of
the video camera can be read by calling the method VI->getParametrs(i-1).
New parameters can be set with the method VI->setParametrs(i - 1, CP).
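The lookup done by the resolution-based setupDevice overload can be sketched as a search over the supported formats with a fallback to index 0. The Format struct and findFormat function below are hypothetical simplifications of MediaType, and picking the closest frame rate among formats of the requested size is an assumption; the article only states that a matching type is found or type 0 is used.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for the library's MediaType.
struct Format
{
    unsigned int width;
    unsigned int height;
    unsigned int fpsNumerator;
    unsigned int fpsDenominator;
};

// Returns the index of the format that has the requested size and whose
// frame rate is closest to idealFps. Returns 0 (the default type) when no
// format has the requested size.
std::size_t findFormat(const std::vector<Format>& formats,
                       unsigned int w, unsigned int h, unsigned int idealFps)
{
    std::size_t best = 0;
    bool found = false;
    double bestDiff = 0.0;
    for (std::size_t i = 0; i < formats.size(); ++i)
    {
        const Format& f = formats[i];
        if (f.width != w || f.height != h || f.fpsDenominator == 0)
            continue; // wrong size, or an unusable frame rate
        const double fps = static_cast<double>(f.fpsNumerator) / f.fpsDenominator;
        const double diff = fps > idealFps ? fps - idealFps : idealFps - fps;
        if (!found || diff < bestDiff)
        {
            best = i;
            bestDiff = diff;
            found = true;
        }
    }
    return best;
}
```

In the real library the candidate list would presumably come from getCountFormats and getFormat, and the resulting index could be passed to the index-based setupDevice overload.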
The method VI->closeDevice(i - 1) closes the device:
it stops the grabbing thread and releases the context of the video device. The example shows quickly using, stopping, and reusing
the same video device. The global function
StopEvent(int deviceID, void *userData) is used as a callback function in
the method VI->setEmergencyStopEvent(i - 1, NULL, StopEvent).
This function is called when the device stops unexpectedly, e.g., when the web camera is unplugged from the USB socket.
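The callback mechanism uses the classic C-style pairing of a function pointer with a user-data pointer, the same shape as void(*)(int, void *) in the header. The sketch below shows how such a callback could be stored and fired. FakeDevice, simulateUnplug, and recordStoppedDevice are hypothetical names for illustration only.

```cpp
// Hypothetical device wrapper; it only demonstrates the callback plumbing.
class FakeDevice
{
public:
    explicit FakeDevice(int id) : id_(id), userData_(nullptr), onStop_(nullptr) {}

    // Stores the callback together with an opaque pointer that is handed
    // back to the callback unchanged.
    void setEmergencyStopEvent(void* userData, void (*func)(int, void*))
    {
        userData_ = userData;
        onStop_ = func;
    }

    // Simulates an unexpected stop, e.g. the camera being unplugged.
    void simulateUnplug()
    {
        if (onStop_)
            onStop_(id_, userData_);
    }

private:
    int id_;
    void* userData_;
    void (*onStop_)(int, void*);
};

// Example callback: records which device stopped through the user data.
void recordStoppedDevice(int deviceID, void* userData)
{
    *static_cast<int*>(userData) = deviceID;
}
```

The user-data pointer lets the callback reach application state without global variables. The article's example passes NULL because StopEvent fetches the singleton itself.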
The second example is based on the SimpleCapture example from the Windows SDK (code: SimpleCaptureVS2012x86.zip,
application: SimpleCaptureVS2012x86-exe.zip). This example is too big to list here, but I can describe several differences
from the original. First, I removed all the original code that connects to the web camera and used
the videoInput library instead.

Second, I added a second dialog for choosing a suitable resolution from the list of supported Media Types. It is important to note that the
IMFMediaSource interface must not be stopped manually. It is released by
calling the function closeDevice(unsigned int deviceID).

Points of Interest
I spent a lot of time searching for suitable information on Microsoft's
website for developers, and I did not get help from the experts on that site. I was not
alone in searching for a solution to this problem. I was surprised that
using a web camera with Media Foundation had not been covered,
and I hope that my article will be a useful contribution to this site.