About a year ago, I wrote a short article about using the Media Foundation API for capturing live video from a web camera - old article. Many developers paid attention to the article and offered advice, and I was surprised to learn that the code had found its way into the OpenCV project. That inspired me to review the code, and I found its quality to be poor. As a result, I decided to rewrite the project, but after some time I chose to write a new article instead, because the new code differs significantly from the old one. I think it is better to have two separate articles presenting two different solutions to the problem.
The previous article about using Media Foundation for capturing live video from a web camera was written with the goal of keeping the interface of another library - videoInput. However, I now think that a different interface makes for a more flexible solution. I decided to include in the solution: options for setting the image format of the uncompressed image data, an object-oriented interface for handling removal of the video device, a setting for synchronous or asynchronous reading mode, encapsulation of requests to the library in object instances, and result codes.
To develop the most stable code possible, I decided to follow the Test-Driven Development strategy. As a result, I wrote 45 tests for the classes and the code; many Media Foundation objects are tested several times in different ways. This makes the code clearer, more stable and easier to verify, and the project includes a unified result-code enumeration with 51 items for checking the outcome of each call. I also removed the thread that was used in the previous article and simplified the write-read conveyor, which made the code more stable.
The new project is very different from the previous one, and I hope it will help other developers solve their problems.
Using the Code
videoInput is written in Visual Studio 2012 - videoInputVS2012-static.zip. It can be used as a static library: it is enough to include only videoInput.lib and videoInput.h in the new project. The source code of the project can also be downloaded - videoInputVS2012-Source - and included in any project.
The project includes 15 classes and interfaces:
videoInput - a singleton class. Making this class a singleton simplifies the management of resources.
MediaFoundation - a singleton class which manages the allocation and releasing of Media Foundation resources. Almost all calls to Media Foundation functions are made from this class, and the results of these calls are checked here.
VideoCaptureDeviceManager - a singleton class which manages the allocation, access and releasing of the video devices.
VideoCaptureDevice - a class which inherits from the IUnknown interface, which allows the smart pointer CComPtr to control the lifetime of its instances. This class controls the selected video capture device.
VideoCaptureSession - a class which inherits from IMFAsyncCallback. It processes the events generated by IMFMediaSession: it handles the events of the video capture device and starts the capture of video.
VideoCaptureSink - a class which inherits from IMFSampleGrabberSinkCallback. It receives the raw data and writes it into the buffer.
IWrite - an interface class which lets VideoCaptureSink write data into different types of buffers in a uniform way.
IRead - an interface class which lets VideoCaptureDevice read raw data from different types of buffers in a uniform way.
IReadWriteBuffer - an interface class which inherits from IUnknown. It is the base class for the different types of buffers and can be used with the smart pointer CComPtr.
ReadWriteBufferRegularAsync - a class which inherits from IReadWriteBuffer. It uses a critical section to serialise writing and reading of data between the user thread and the inner Media Foundation thread. However, the user thread checks the readyToRead flag before entering the critical section; if the data is not ready, the user thread leaves the buffer without reading. As a result, the user thread is never blocked by the writing Media Foundation thread.
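This non-blocking scheme can be sketched with standard C++ primitives (a simplified illustration, not the library's actual code; std::mutex and std::atomic stand in for the Win32 critical section and flag):

```cpp
#include <atomic>
#include <cassert>
#include <cstring>
#include <mutex>
#include <vector>

// Simplified sketch of the asynchronous read-write buffer: the writer
// (the Media Foundation thread) fills the buffer and raises readyToRead;
// the reader checks the flag before locking and leaves at once if no
// fresh frame is available, so it is never blocked by the writer.
class AsyncBuffer
{
public:
    explicit AsyncBuffer(size_t size) : data(size), readyToRead(false) {}

    void write(const unsigned char *pixels)           // writer side
    {
        std::lock_guard<std::mutex> lock(guard);
        std::memcpy(data.data(), pixels, data.size());
        readyToRead = true;
    }

    bool tryRead(unsigned char *pixels)               // reader side
    {
        if (!readyToRead)                             // checked before locking
            return false;                             // leave without blocking
        std::lock_guard<std::mutex> lock(guard);
        std::memcpy(pixels, data.data(), data.size());
        readyToRead = false;
        return true;
    }

private:
    std::vector<unsigned char> data;
    std::atomic<bool> readyToRead;
    std::mutex guard;
};
```

The reader pays at most the cost of one flag check when no frame is ready, which is what keeps the user thread responsive.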
ReadWriteBufferRegularSync - a class which inherits from IReadWriteBuffer. It also uses a critical section to serialise writing and reading of data between the user thread and the inner Media Foundation thread, but here the user thread is blocked by WaitForSingleObject before entering the critical section. The user thread waits up to about one second for the event from the Media Foundation thread; when the event arrives, the user thread starts reading from the buffer, and if it does not arrive, the user thread leaves the buffer with the result code ResultCode::READINGPIXELS_REJECTED_TIMEOUT.
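The blocking scheme can be sketched in the same way; here std::condition_variable::wait_for stands in for WaitForSingleObject with the one-second timeout (again a simplified illustration, not the library's actual code):

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstring>
#include <mutex>
#include <vector>

// Simplified sketch of the synchronous read-write buffer: the reader
// waits up to one second for the writer's event; on timeout it leaves
// the buffer and reports failure instead of reading stale data.
class SyncBuffer
{
public:
    explicit SyncBuffer(size_t size) : data(size), readyToRead(false) {}

    void write(const unsigned char *pixels)
    {
        {
            std::lock_guard<std::mutex> lock(guard);
            std::memcpy(data.data(), pixels, data.size());
            readyToRead = true;
        }
        frameEvent.notify_one();            // wake a waiting reader
    }

    // Returns true when a frame was read; false on timeout, which is
    // where the library would report READINGPIXELS_REJECTED_TIMEOUT.
    bool read(unsigned char *pixels,
              std::chrono::milliseconds timeout = std::chrono::seconds(1))
    {
        std::unique_lock<std::mutex> lock(guard);
        if (!frameEvent.wait_for(lock, timeout, [this] { return readyToRead; }))
            return false;                   // the event did not come in time
        std::memcpy(pixels, data.data(), data.size());
        readyToRead = false;
        return true;
    }

private:
    std::vector<unsigned char> data;
    bool readyToRead;
    std::mutex guard;
    std::condition_variable frameEvent;
};
```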
ReadWriteBufferFactory - a singleton class which produces the buffer instances according to the chosen reading mode.
FormatReader - a class for reading the media-type data out of the native IMFMediaType into the MediaType structure.
DebugPrintOut - a class for printing text to the console.
CComMassivPtr - a template class for working with an array of objects which implement the IUnknown interface. It encapsulates the SafeRelease() call and controls the releasing of these objects.
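The SafeRelease() idiom which CComMassivPtr wraps is the classic COM pattern; here is a minimal sketch, with a stub object standing in for a real IUnknown-derived interface so that the fragment is self-contained:

```cpp
#include <cassert>

// Classic COM helper: release the object through its Release() method
// and null the pointer so a repeated call is harmless.
template <class T>
void SafeRelease(T **ppObject)
{
    if (*ppObject)
    {
        (*ppObject)->Release();
        *ppObject = nullptr;
    }
}

// Stub with IUnknown-like reference counting, used here instead of a
// real COM interface purely for illustration.
struct StubUnknown
{
    unsigned long refCount = 1;
    unsigned long Release() { return --refCount; }
};
```

CComMassivPtr applies the same call to every element of an array of such pointers, so no object is leaked even when one of the Media Foundation calls fails midway.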
It is enough to use the file videoInput.h as the interface of the library. Its listing is presented below:
using namespace std;

// Information about one media type of the video device
struct MediaType
{
	unsigned int MF_MT_FRAME_SIZE;
	unsigned int height;
	unsigned int width;
	unsigned int MF_MT_YUV_MATRIX;
	unsigned int MF_MT_VIDEO_LIGHTING;
	unsigned int MF_MT_DEFAULT_STRIDE;
	unsigned int MF_MT_VIDEO_CHROMA_SITING;
	unsigned int MF_MT_FIXED_SIZE_SAMPLES;
	unsigned int MF_MT_VIDEO_NOMINAL_RANGE;
	unsigned int MF_MT_ALL_SAMPLES_INDEPENDENT;
	unsigned int MF_MT_SAMPLE_SIZE;
	unsigned int MF_MT_VIDEO_PRIMARIES;
	unsigned int MF_MT_INTERLACE_MODE;
};

// One stream of the device with its list of media types
struct Stream
{
	vector<MediaType> listMediaType;
};

// One active video capture device
struct Device
{
	wstring friendlyName;
	wstring symbolicLink;
	vector<Stream> listStream;
};

// Supported colour formats of the uncompressed captured image
namespace CaptureVideoFormat
{
	enum VideoFormat
	{
		RGB24 = 0,
		RGB32 = 1,
		AYUV = 2
	};
}

// Events which are passed to the stop callback
namespace StopCallbackEvent
{
	enum CallbackEvent
	{
		STOP = 0,
		CAPTUREDEVICEREMOVED = 1
	};
}

// Callback interface which is invoked on stopping or removal of the device
class IStopCallback
{
public:
	virtual void Invoke(StopCallbackEvent::CallbackEvent callbackEvent) = 0;
};

// Modes of reading raw data from the buffer
namespace ReadMode
{
	enum Read
	{
		ASYNC = 0,
		SYNC = 1
	};
}

// Settings for selecting the device and its media type
struct DeviceSettings
{
	wstring symbolicLink;
	unsigned int indexStream;
	unsigned int indexMediaType;
};

// Settings of the capturing mode
struct CaptureSettings
{
	CaptureVideoFormat::VideoFormat videoFormat;
	IStopCallback *pIStopCallback;
	ReadMode::Read readMode;
};

// Settings for reading the raw pixel data
struct ReadSetting
{
	wstring symbolicLink;
	unsigned char *pPixels;
};

// Structure with the camera parameters (definition omitted from this listing)
struct CamParametrsSetting;

// United result-code enumeration
namespace ResultCode
{
	enum Result
	{
		OK = 0,
		UNKNOWN_ERROR = 1,
		MEDIA_FOUNDATION_INITIALIZECOM_ERROR = 2,
		MEDIA_FOUNDATION_INITIALIZEMF_ERROR = 3,
		MEDIA_FOUNDATION_SHUTDOWN_ERROR = 4,
		MEDIA_FOUNDATION_ENUMDEVICES_ERROR = 5,
		MEDIA_FOUNDATION_CREATEATTRIBUTE_ERROR = 6,
		MEDIA_FOUNDATION_READFRIENDLYNAME_ERROR = 7,
		MEDIA_FOUNDATION_READSYMBOLICLINK_ERROR = 8,
		MEDIA_FOUNDATION_GETDEVICE_ERROR = 9,
		MEDIA_FOUNDATION_createPresentationDescriptor_ERROR = 10,
		MEDIA_FOUNDATION_GETTHEAMOUNTOFSTREAMS_ERROR = 11,
		MEDIA_FOUNDATION_GETSTREAMDESCRIPTORBYINDEX_ERROR = 12,
		MEDIA_FOUNDATION_ENUMMEDIATYPE_ERROR = 13,
		VIDEOCAPTUREDEVICEMANAGER_GETLISTOFDEVICES_ERROR = 14,
		MEDIA_FOUNDATION_SETSYMBOLICLINK_ERROR = 15,
		MEDIA_FOUNDATION_SETCURRENTMEDIATYPE_ERROR = 16,
		MEDIA_FOUNDATION_GETCURRENTMEDIATYPE_ERROR = 17,
		MEDIA_FOUNDATION_SELECTSTREAM_ERROR = 18,
		MEDIA_FOUNDATION_CREATESESSION_ERROR = 19,
		MEDIA_FOUNDATION_CREATEMEDIATYPE_ERROR = 20,
		MEDIA_FOUNDATION_SETGUID_ERROR = 21,
		MEDIA_FOUNDATION_SETUINT32_ERROR = 22,
		MEDIA_FOUNDATION_CREATESAMPLERGRABBERSINKACTIVE_ERROR = 23,
		MEDIA_FOUNDATION_CREATETOPOLOGY_ERROR = 24,
		MEDIA_FOUNDATION_CREATETOPOLOGYNODE_ERROR = 25,
		MEDIA_FOUNDATION_SETUNKNOWN_ERROR = 26,
		MEDIA_FOUNDATION_SETOBJECT_ERROR = 27,
		MEDIA_FOUNDATION_ADDNODE_ERROR = 28,
		MEDIA_FOUNDATION_CONNECTOUTPUTNODE_ERROR = 29,
		MEDIA_FOUNDATION_SETTOPOLOGY_ERROR = 30,
		MEDIA_FOUNDATION_BEGINGETEVENT_ERROR = 31,
		VIDEOCAPTUREDEVICEMANAGER_DEVICEISSETUPED = 32,
		VIDEOCAPTUREDEVICEMANAGER_DEVICEISNOTSETUPED = 33,
		VIDEOCAPTUREDEVICEMANAGER_DEVICESTART_ERROR = 34,
		VIDEOCAPTUREDEVICE_DEVICESTART_ERROR = 35,
		VIDEOCAPTUREDEVICEMANAGER_DEVICEISNOTSTARTED = 36,
		VIDEOCAPTUREDEVICE_DEVICESTOP_ERROR = 37,
		VIDEOCAPTURESESSION_INIT_ERROR = 38,
		VIDEOCAPTUREDEVICE_DEVICESTOP_WAIT_TIMEOUT = 39,
		VIDEOCAPTUREDEVICE_DEVICESTART_WAIT_TIMEOUT = 40,
		READINGPIXELS_DONE = 41,
		READINGPIXELS_REJECTED = 42,
		READINGPIXELS_MEMORY_ISNOT_ALLOCATED = 43,
		READINGPIXELS_REJECTED_TIMEOUT = 44,
		VIDEOCAPTUREDEVICE_GETPARAMETRS_ERROR = 45,
		VIDEOCAPTUREDEVICE_SETPARAMETRS_ERROR = 46,
		VIDEOCAPTUREDEVICE_GETPARAMETRS_GETVIDEOPROCESSOR_ERROR = 47,
		VIDEOCAPTUREDEVICE_GETPARAMETRS_GETVIDEOCONTROL_ERROR = 48,
		VIDEOCAPTUREDEVICE_SETPARAMETRS_SETVIDEOCONTROL_ERROR = 49,
		VIDEOCAPTUREDEVICE_SETPARAMETRS_SETVIDEOPROCESSOR_ERROR = 50
	};
}

// Interface of the library
class videoInput
{
public:
	static videoInput& getInstance();

	ResultCode::Result getListOfDevices(vector<Device> &listOfDevices);
	ResultCode::Result setupDevice(DeviceSettings deviceSettings, CaptureSettings captureSettings);
	ResultCode::Result closeDevice(DeviceSettings deviceSettings);
	ResultCode::Result closeAllDevices();
	ResultCode::Result readPixels(ReadSetting readSetting);
	ResultCode::Result getParametrs(CamParametrsSetting &parametrs);
	ResultCode::Result setParametrs(CamParametrsSetting parametrs);
	ResultCode::Result setVerbose(bool state);

private:
	videoInput();
	~videoInput();
	videoInput(const videoInput&);
	videoInput& operator=(const videoInput&);
};
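The stop callback can be implemented by the user of the library. A minimal sketch (the event enumeration and the pure-virtual interface are restated here so that the fragment compiles on its own; the interface name IStopCallback is inferred from the pIStopCallback field):

```cpp
#include <cassert>

// Restated from videoInput.h so this sketch is self-contained.
namespace StopCallbackEvent
{
    enum CallbackEvent
    {
        STOP = 0,
        CAPTUREDEVICEREMOVED = 1
    };
}

class IStopCallback
{
public:
    virtual ~IStopCallback() {}
    virtual void Invoke(StopCallbackEvent::CallbackEvent callbackEvent) = 0;
};

// Example implementation: remembers that the device was unplugged so
// the main loop can stop calling readPixels() and release resources.
class StopHandler : public IStopCallback
{
public:
    bool deviceRemoved = false;

    void Invoke(StopCallbackEvent::CallbackEvent callbackEvent) override
    {
        if (callbackEvent == StopCallbackEvent::CAPTUREDEVICEREMOVED)
            deviceRemoved = true;
    }
};
```

A pointer to such an object is passed through the pIStopCallback field of the capture settings; passing 0 disables the notification.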
The interface of the library has become simple, and some of the work with the raw data is now the duty of the developer. The methods of videoInput have the following purposes:
getListOfDevices - fills the list of the active video devices. In the old project this list was filled once at initialisation and never changed; now each call to this method generates a fresh list of the active video devices. Each device includes a friendlyName, a symbolicLink and a list of streams with their media types. friendlyName presents the "readable" name of the device, while symbolicLink is the unique name used for managing the device.
setupDevice - sets up the device and starts capturing from it. It has two arguments: deviceSettings and captureSettings. The first selects the device and puts it into the chosen mode via indexStream and indexMediaType. The second defines the capturing mode: videoFormat chooses the format of the raw image data; pIStopCallback is a pointer to the callback interface which is invoked when the video capture device is removed; readMode selects the type of buffer with synchronous or asynchronous reading of the raw data.
closeDevice - stops and closes the selected device. The device is identified by the symbolicLink in deviceSettings.
closeAllDevices - closes all set-up devices.
readPixels - reads data from the buffer into the provided pointer. The readSetting argument contains the symbolicLink for accessing the device and pPixels, the pointer which receives the raw image data.
getParametrs - gets the parameters of the video capture device.
setParametrs - sets the parameters of the video capture device.
setVerbose - enables or disables the printing of information to the console.
All these methods return an item of the ResultCode::Result enumeration. This makes it possible to check the result of each call and to get information about almost every process in the code.
I would like to highlight three special features of this library:
- The resolution of the captured image is defined by selecting the appropriate MediaType of the device. In the old article, the resolution was set manually, which wrongly suggested that any resolution could be set. I decided that the new solution should present the limits of the video capture device more clearly.
- It is possible to choose one of three colour formats for the captured image - RGB24, RGB32 and AYUV - so the most appropriate format can be selected. The colour formats RGB24 and RGB32 look the same, but the second is more suitable for processing on modern processors with 16-byte alignment.
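In practice the difference shows up in the size of the buffer the caller must allocate before reading pixels: RGB24 packs three bytes per pixel, while RGB32 pads every pixel to four bytes, which keeps pixel data aligned for SIMD processing. A trivial helper (an illustration, not part of the library):

```cpp
#include <cassert>

// Bytes needed for one uncompressed frame of the given format,
// e.g. 3 bytes per pixel for RGB24 and 4 for RGB32.
unsigned int frameBufferSize(unsigned int width, unsigned int height,
                             unsigned int bytesPerPixel)
{
    return width * height * bytesPerPixel;
}
```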
- Reading from the buffer can be done in one of two ways - synchronously or asynchronously. The choice is made by setting readMode. In ReadMode::ASYNC mode, reading from the buffer does not block: if the data is not ready, the method readPixels() returns the result code ResultCode::READINGPIXELS_REJECTED. In ReadMode::SYNC mode, reading from the buffer blocks: if the data is not ready, readPixels() blocks the user thread for up to one second and then returns the result code ResultCode::READINGPIXELS_REJECTED_TIMEOUT.
The next listing presents the use of this library together with OpenCV for capturing live video from a web camera.
#include "stdafx.h"
#include <vector>
#include "videoInput.h"
#include <opencv2/highgui/highgui_c.h>

#pragma comment(lib, "../Debug/videoInput.lib")
#pragma comment(lib, "lib/opencv_highgui248d.lib")
#pragma comment(lib, "lib/opencv_core248d.lib")

int _tmain(int argc, _TCHAR* argv[])
{
	using namespace std;

	vector<Device> listOfDevices;
	ResultCode::Result result = videoInput::getInstance().getListOfDevices(listOfDevices);

	DeviceSettings deviceSettings;
	deviceSettings.symbolicLink = listOfDevices[0].symbolicLink;
	deviceSettings.indexStream = 0;
	deviceSettings.indexMediaType = 0;

	CaptureSettings captureSettings;
	captureSettings.pIStopCallback = 0;
	captureSettings.readMode = ReadMode::SYNC;
	captureSettings.videoFormat = CaptureVideoFormat::RGB32;

	MediaType MT = listOfDevices[0].listStream[0].listMediaType[0];

	cvNamedWindow("VideoTest", CV_WINDOW_AUTOSIZE);
	CvSize size = cvSize(MT.width, MT.height);
	IplImage* frame = cvCreateImage(size, 8, 4);

	ReadSetting readSetting;
	readSetting.symbolicLink = deviceSettings.symbolicLink;
	readSetting.pPixels = (unsigned char *)frame->imageData;

	result = videoInput::getInstance().setupDevice(deviceSettings, captureSettings);

	while(true)
	{
		ResultCode::Result readState = videoInput::getInstance().readPixels(readSetting);

		if(readState == ResultCode::READINGPIXELS_DONE)
			cvShowImage("VideoTest", frame);

		char c = cvWaitKey(33);

		if(c == 27)
			break;
	}

	result = videoInput::getInstance().closeDevice(deviceSettings);

	cvReleaseImage(&frame);

	return 0;
}
Finally, I would like to note that in this project I removed the code for reordering the colour channels from BGR to RGB and for vertical flipping. These specifics of the Windows platform are compensated for in many different ways: for example, the OpenCV library processes images in BGR colour order by default, and DirectX and OpenGL support texturing in both BGR and RGB order as well as vertical flipping. I decided that developers can write the code for channel reordering and vertical flipping for their own purposes.
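For developers who do need these conversions, both can be written in a few lines; a simple reference implementation for the four-bytes-per-pixel RGB32 layout (not part of the library):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Swap the B and R channels in place (4 bytes per pixel, BGRA layout).
void swapRedBlue(unsigned char *pixels, unsigned int width, unsigned int height)
{
    for (unsigned int i = 0; i < width * height; ++i)
        std::swap(pixels[i * 4], pixels[i * 4 + 2]);
}

// Flip the image vertically in place by exchanging whole rows.
void flipVertical(unsigned char *pixels, unsigned int width, unsigned int height)
{
    const unsigned int stride = width * 4;
    std::vector<unsigned char> row(stride);
    for (unsigned int y = 0; y < height / 2; ++y)
    {
        unsigned char *top = pixels + y * stride;
        unsigned char *bottom = pixels + (height - 1 - y) * stride;
        std::copy(top, top + stride, row.begin());
        std::copy(bottom, bottom + stride, top);
        std::copy(row.begin(), row.end(), bottom);
    }
}
```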
I published the code of the project in the Git repository of this site; anyone can clone it by using the link videoInput.
Points of Interest
I developed this project on the basis of the TDD strategy, and I think it will help other developers to resolve some problems with this kind of code.
This project is hosted in the CodeProject Git repos: videoInput.
This article is based on the old article.