Looking at current trends in video surveillance, it is easy to notice that the popularity of IP-based solutions is growing fast. Quite a lot of manufacturers offer a wide range of IP video cameras and video servers intended for professional IP-enabled video surveillance systems. Moreover, many companies provide solutions for converting existing CCTV video surveillance systems to IP-based systems while keeping the current equipment and infrastructure. In addition to these companies, which provide software as well as hardware, many other companies focus mostly on the software part of IP video surveillance, providing complete solutions for small businesses and enterprises as well as for personal use.
In this article, I would like to share some of my experience of working with IP video cameras and video servers from different manufacturers. The information relates mostly to accessing these cameras from your own application, which may be a simple application for your personal needs or something more sophisticated, even close to a video surveillance application.
As a demo for the article, I am providing a C# application that can display a single camera or several cameras simultaneously. It can show not only several cameras from a single video server, but also many cameras from different manufacturers at the same time. The application supports the following video sources:
- Continuously updating JPEG source;
- MJPEG (motion JPEG) streams;
- Some Axis cameras and video servers (205, 206, 2100, 2110, 2120, 2130R, 2400, 2401, 2460);
- D-Link cameras (JPEG support only);
- Panasonic cameras;
- PiXORD cameras;
- StarDot cameras (NetCam, Express 6);
- Local devices, which support the DirectShow interface;
- MMS (Microsoft Media Services) streams.
What are IP cameras?
The main difference and advantage of IP cameras is that they provide output in digital form, and can be plugged directly into an Ethernet switch and accessed over an IP network. To achieve this, an IP camera contains not only the camera itself but also a small on-board computer, which usually runs embedded Linux. The purpose of this computer is to:
- convert an analogue image to a compressed digital image (some cameras/servers have an additional compression microprocessor besides the main CPU for this purpose);
- provide access to the image via an IP network (usually these cameras run a web server, which provides access not only to digital images, but also to camera configuration information through the HTTP protocol).
Video servers are more sophisticated devices, and usually come without cameras. Instead, they are equipped with several video input connectors (usually from 1 to 6), to which the user may connect any analogue cameras. Like IP cameras, video servers convert the image from analogue cameras to digital form and provide access to it through an IP network, but they also provide additional options for creating a video archive (for this reason, video servers are equipped with hard drives).
The fact that IP cameras and servers can be accessed over an IP network is a great benefit. It allows monitoring not only from the actual location of the cameras, where you have specially equipped monitoring systems, but also from any other IP-enabled point in the world, using special video surveillance applications or web browsers (see image below). This can be done from an ordinary workstation as well as from PDAs and cell phones. The range of applications for IP-enabled video solutions goes far beyond monitoring and storing video archives. The digital output of these cameras/servers allows them to be easily integrated with hundreds of applications:
- motion detection/tracking (in the whole video frame, or in specified areas of interest);
- traffic control and license plate recognition;
- people tracking with the ability to identify a person.
The simplest video format, supported by almost all IP video cameras/servers, cannot even be called a video format: it is just ordinary JPEG. Most cameras allow retrieving a single image by accessing a special URL (which should be documented by the camera manufacturer). For example, the following URL retrieves an image from an Axis camera: http://webcam.mmhk.cz/axis-cgi/jpg/image.cgi .
This approach has advantages and disadvantages. The disadvantage is that a new HTTP request must be sent to the camera's web server each time you need a new image, which costs some speed because of the extra data (HTTP headers) being sent and received. The advantage is that a monitoring application can easily control the maximum frame rate on its own: it requests the URL for the next frame at any rate it likes (once per minute or 15 times per second, if the network and camera speed allow it).
The second popular format is MJPEG (Motion JPEG). This format allows downloading not just a single JPEG image but a stream of JPEGs. As with plain JPEG, the client application makes an HTTP request to a special camera URL, like this one: http://220.127.116.11/axis-cgi/mjpg/video.cgi . The camera, however, replies not with a single JPEG but with a stream of JPEGs separated by a special delimiter, which is specified in one of the HTTP headers. When the client application no longer wants to receive video data, it closes the connection to the camera.
The MJPEG approach seems to be much better, because it has one obvious advantage: it requires sending an HTTP request only once, after which JPEGs are received from the camera continuously. But with this approach, you cannot control the frame rate so easily. When you access such an MJPEG URL, the camera feeds you at some predefined frame rate. If you would like to change it, you need to add some extra parameters to the URL. This does not sound problematic, but in reality it may lead to some problems.
I'll try to describe the most common one. Suppose you requested 15 frames per second from a certain camera (or it was the default setting). But it so happened that somewhere between you and your camera the network speed went down, and you receive only 5 frames per second. Suppose also that your camera has a buffer for 30 frames. The camera generates 30 frames in 2 seconds, but you need 6 seconds to consume them. That means you will see the last of these frames with a 4 second delay, which is too late in most cases. Of course, this is just an example; cameras flush their buffers from time to time and take other measures to avoid such delays. But here is a real case I saw. A man entered a room monitored by a camera, spent a short time there, then went to another room and saw himself walking in the previous room in the camera monitoring application (the application provided by the camera manufacturer).
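The delay arithmetic in the buffering example can be written down as a small helper. This is only an illustrative sketch: the class and method names are hypothetical, and it assumes the simplified model from the example (a full buffer, constant frame rates, no flushing by the camera).

```csharp
using System;

public static class BufferDelaySketch
{
    // Delay with which the last buffered frame is seen, in seconds.
    // bufferedFrames: frames the camera buffers (30 in the example)
    // cameraFps:      rate at which the camera produces frames (15)
    // receivedFps:    rate at which the client actually receives them (5)
    public static double LastFrameDelaySeconds(
        int bufferedFrames, double cameraFps, double receivedFps )
    {
        double producedAt = bufferedFrames / cameraFps;   // 30 / 15 = 2 s
        double consumedAt = bufferedFrames / receivedFps; // 30 / 5  = 6 s
        return consumedAt - producedAt;                   // 6 - 2   = 4 s
    }
}
```

With the numbers from the example, LastFrameDelaySeconds( 30, 15, 5 ) gives 4 seconds.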
Many cameras from different manufacturers support much more than just the JPEG and MJPEG formats. Some cameras support MPEG-2, and others support MPEG-4. Some cameras also support sound transmission in addition to video, in some cases bidirectional.
Further in this article, I will say a little more about accessing some cameras, covering single JPEG frames and MJPEG streams (the MPEG formats are not covered by the article). Most camera manufacturers provide APIs and SDKs on their sites, so this information can be studied in more detail there.
Accessing JPEGs and MJPEGs
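Displaying data from any JPEG source (camera) is really simple: you just need to keep creating HTTP requests to the source, downloading the response data, and extracting a bitmap from it. Here is a quick sample of retrieving a single JPEG frame from an IP camera:
string sourceURL = "http://webcam.mmhk.cz/axis-cgi/jpg/image.cgi";
// frame buffer; the size is an assumption, pick one large enough for your camera's frames
byte[] buffer = new byte[100000];
int read, total = 0;
// create the HTTP request (needs System.Net, System.IO and System.Drawing)
HttpWebRequest req = (HttpWebRequest) WebRequest.Create( sourceURL );
// get the response and its stream
WebResponse resp = req.GetResponse( );
Stream stream = resp.GetResponseStream( );
// read until the stream ends or the buffer is full
while ( ( total < buffer.Length ) &&
    ( ( read = stream.Read( buffer, total, buffer.Length - total ) ) != 0 ) )
    total += read;
// decode the downloaded bytes as a bitmap
Bitmap bmp = (Bitmap) Bitmap.FromStream(
    new MemoryStream( buffer, 0, total ) );
resp.Close( );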
But don't forget that most cameras are not freely accessible like the one in the sample above. Most probably, you will want to protect your camera with a password, which must then be supplied with the request:
HttpWebRequest req =
    (HttpWebRequest) WebRequest.Create( sourceURL );
// pass the camera's login and password with the request
req.Credentials = new NetworkCredential( login, password );
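Because a JPEG source is polled by the client, the client decides the maximum frame rate. The following is a minimal sketch of such a polling loop, not the demo application's code. The names (JpegPollingSketch, Poll, fetchFrame, onFrame) are hypothetical, and fetchFrame stands for any routine that downloads one JPEG, for example the request code shown above.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class JpegPollingSketch
{
    // Fetches frameCount frames, never faster than maxFramesPerSecond.
    // Returns the number of non-empty frames passed to onFrame.
    public static int Poll( Func<byte[]> fetchFrame, Action<byte[]> onFrame,
        double maxFramesPerSecond, int frameCount )
    {
        double intervalMs = 1000.0 / maxFramesPerSecond;
        int delivered = 0;

        for ( int i = 0; i < frameCount; i++ )
        {
            Stopwatch timer = Stopwatch.StartNew( );

            // one HTTP request per frame in a real application
            byte[] frame = fetchFrame( );
            if ( ( frame != null ) && ( frame.Length > 0 ) )
            {
                onFrame( frame );
                delivered++;
            }

            // sleep away whatever is left of this frame's time slot;
            // if the download was slow, continue immediately
            int remainingMs = (int) ( intervalMs - timer.ElapsedMilliseconds );
            if ( remainingMs > 0 )
                Thread.Sleep( remainingMs );
        }
        return delivered;
    }
}
```

The loop only caps the rate: when the network or the camera is slower than the requested rate, frames simply arrive as fast as they can be downloaded.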
Accessing MJPEG sources is much more complicated. First of all, let's take a look at the response content type. It should look something like this:
multipart/x-mixed-replace; boundary=--myboundary
It may not look exactly the same, but its type will be multipart/x-mixed-replace for sure, followed by a certain boundary. In this case, the boundary value is "--myboundary". Now, let's take a look at the actual stream data:
--myboundary
Content-Type: image/jpeg

... image binary data ...
--myboundary
Content-Type: image/jpeg

... image binary data ...
Putting this all together, the algorithm for accessing an MJPEG source becomes clear:
1. Parse the response content type to extract the actual boundary value;
2. Read the initial portion of the stream, searching for the first boundary;
3. Read binary data until the next boundary;
4. Extract an image from the read buffer;
5. Process the image (display it, or do whatever else);
6. Continue with steps 3-5 in a loop.
In fact, the idea of accessing an MJPEG source is not as complicated as I stated before, but its implementation is certainly not as trivial as for a JPEG source.
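The steps above can be sketched roughly as follows. This is a simplified illustration, not the code of the demo application. The names (MjpegSketch, GetBoundary, ReadFrames) are hypothetical, and the sketch assumes that the boundary text never occurs inside the image data and that each part holds one JPEG, which it locates by the JPEG start (FF D8) and end (FF D9) markers rather than by parsing the part headers.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class MjpegSketch
{
    // Step 1: extract the boundary from a content type such as
    // "multipart/x-mixed-replace; boundary=--myboundary".
    public static string GetBoundary( string contentType )
    {
        foreach ( string part in contentType.Split( ';' ) )
        {
            string p = part.Trim( );
            if ( p.StartsWith( "boundary=", StringComparison.OrdinalIgnoreCase ) )
                return p.Substring( "boundary=".Length ).Trim( '"' );
        }
        throw new FormatException( "No boundary in content type: " + contentType );
    }

    // Steps 2-6: read the stream and return up to maxFrames JPEG images.
    public static List<byte[]> ReadFrames( Stream stream, string boundary, int maxFrames )
    {
        byte[] delimiter = Encoding.ASCII.GetBytes( boundary );
        List<byte> pending = new List<byte>( ); // bytes not yet split into parts
        List<byte[]> frames = new List<byte[]>( );
        byte[] chunk = new byte[4096];
        bool firstBoundarySeen = false;
        int read, pos;

        while ( ( frames.Count < maxFrames ) &&
            ( ( read = stream.Read( chunk, 0, chunk.Length ) ) > 0 ) )
        {
            for ( int i = 0; i < read; i++ )
                pending.Add( chunk[i] );

            // rescanning from the start is wasteful, but keeps the sketch short
            while ( ( frames.Count < maxFrames ) &&
                ( ( pos = IndexOf( pending, delimiter ) ) >= 0 ) )
            {
                if ( firstBoundarySeen )
                {
                    // everything before this boundary is one part
                    byte[] jpeg = ExtractJpeg( pending.GetRange( 0, pos ) );
                    if ( jpeg.Length > 0 )
                        frames.Add( jpeg );
                }
                firstBoundarySeen = true;
                pending.RemoveRange( 0, pos + delimiter.Length );
            }
        }
        return frames;
    }

    // Cuts the image out of a part: first FF D8 up to the last FF D9.
    private static byte[] ExtractJpeg( List<byte> part )
    {
        int start = -1, end = -1;
        for ( int i = 0; i + 1 < part.Count; i++ )
        {
            if ( ( start < 0 ) && ( part[i] == 0xFF ) && ( part[i + 1] == 0xD8 ) )
                start = i;
            if ( ( part[i] == 0xFF ) && ( part[i + 1] == 0xD9 ) )
                end = i + 2;
        }
        return ( ( start >= 0 ) && ( end > start ) )
            ? part.GetRange( start, end - start ).ToArray( )
            : new byte[0];
    }

    // Naive search for pattern inside data; returns -1 if it is absent.
    private static int IndexOf( List<byte> data, byte[] pattern )
    {
        for ( int i = 0; i + pattern.Length <= data.Count; i++ )
        {
            int j = 0;
            while ( ( j < pattern.Length ) && ( data[i + j] == pattern[j] ) )
                j++;
            if ( j == pattern.Length )
                return i;
        }
        return -1;
    }
}
```

In a real application, the stream would come from the response of an HttpWebRequest, the content type from the response's ContentType property, and each returned byte array would be decoded with Bitmap.FromStream as in the JPEG sample. A continuous viewer would process each frame as soon as it is found instead of collecting a list.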
Axis cameras and video servers
Axis cameras and video servers seem to be the best IP video cameras I have had a chance to work with. From the user's perspective, these cameras provide good video quality and frame rate, and are very easy to install and configure. From the programmer's perspective, these devices are even better: the company provides the best developer documentation I've ever seen for IP cameras. It provides complete documentation on how to access these cameras over HTTP, as well as an SDK.
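The following are the URL formats for accessing the JPEG and MJPEG streams of Axis IP cameras/servers:
http://<servername>/axis-cgi/jpg/image.cgi
http://<servername>/axis-cgi/mjpg/video.cgi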
Both of these URLs accept parameters that affect the result. The most popular parameters are resolution (to specify the desired size of the video output), camera (to specify the camera's number in the case of a video server), and the desired frame rate (only for MJPEG sources).
To get the complete HTTP API and the list of all supported parameters, please refer to the Axis support web site.
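As an illustration, such URLs could be assembled as in the sketch below. The paths are the ones used in the sample URLs earlier in the article, and resolution and camera are the parameter names mentioned above. The helper itself (AxisUrlSketch.Build) is hypothetical, and the name of the frame rate parameter, "des_fps", is an assumption that should be checked against the Axis HTTP API documentation for your device.

```csharp
using System;
using System.Collections.Generic;

public static class AxisUrlSketch
{
    // ASSUMPTION: frame rate parameter name; verify it in the Axis documentation.
    public const string FrameRateParameter = "des_fps";

    // Builds a JPEG or MJPEG URL; null arguments are simply left out.
    public static string Build( string server, bool mjpeg,
        string resolution, int? camera, int? frameRate )
    {
        string path = mjpeg ? "/axis-cgi/mjpg/video.cgi" : "/axis-cgi/jpg/image.cgi";
        List<string> query = new List<string>( );

        if ( !string.IsNullOrEmpty( resolution ) )
            query.Add( "resolution=" + Uri.EscapeDataString( resolution ) );
        if ( camera.HasValue )
            query.Add( "camera=" + camera.Value );
        // the frame rate applies to MJPEG sources only
        if ( mjpeg && frameRate.HasValue )
            query.Add( FrameRateParameter + "=" + frameRate.Value );

        string url = "http://" + server + path;
        return ( query.Count == 0 ) ? url : url + "?" + string.Join( "&", query );
    }
}
```

For example, Build( "cam.example", false, "320x240", 1, null ) produces a JPEG URL with the resolution and camera parameters, where "cam.example" is a placeholder host name.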
StarDot cameras and video servers
It looks like StarDot does not have a wide range of IP cameras and video servers, and the range has not changed over the past two years. All they have for now is one model of IP camera and one model of video server. In my opinion, their only advantage is that their video server supports up to 6 analogue cameras. Nothing else makes them a competitor to companies like Axis. For example, the frame rate of their IP cameras is really low (not acceptable for security) and these cameras do not support MJPEG. The information available to developers also seems to be poor.
URL formats to access their products are really simple:
StarDot Express 6 (video server)
PiXORD cameras
The product range of PiXORD consists mostly of different models of IP cameras, which seem to be rather nice cameras, providing good quality and frame rate and supporting MJPEG streams. The company provides an SDK for their products, but it becomes accessible only after registration.
Here are URL formats to access their IP cameras:
Panasonic cameras
I have not worked a lot with Panasonic cameras; I just found several cameras on the Internet, which you can also browse from the sample application provided with the article.
URL formats to access Panasonic cameras:
D-Link cameras
D-Link has a wide range of IP video cameras, and is known as one of the first companies to use MPEG-4 in their cameras. In fact, MPEG video is the primary goal of these cameras; they don't support anything else, such as MJPEG. Most of their cameras also have audio support, and some models even support bidirectional audio. From the user's perspective, these cameras are rather simple to install and configure, and they support a lot of different settings. From the developer's perspective, these cameras are not so easy. The company does not want to share much development information, and it is really hard to find any developer information on their site. This makes these cameras less attractive if you want to develop your own surveillance software instead of using theirs (which is buggy). By the way, the story I told before about a man who saw himself walking in another room was about a D-Link camera.
At this moment, I know only one way to access D-Link cameras (JPEG format):
Some other video sources
Many other video sources can be accessed by means other than HTTP. For example, you can easily access local web cameras connected to your PC through a USB port, or remote video streams over MMS (Microsoft Media Services). One of the most common ways to access these two types of video sources is to use DirectShow. The sample application demonstrates the technique, and you can learn more about it from several other articles on CodeProject dedicated to the topic.
The application code
The main goal of the application was to make it flexible and extensible. The application itself can communicate with any video source: it may be an IP video camera or a server, a local camera attached to USB, an MMS stream from a remote server, or any other video source. Moreover, the application can work with all these video sources simultaneously, displaying them all on a single screen.
Another main feature of the application is that it can be easily extended on the fly. The main application module knows nothing about any video sources and how to configure them; it knows only how to display them. All the logic of communication with a particular video source is hidden in separate modules, and the application is not tightly coupled with them. If you have a new video source and you want the application to work with it, you don't need to change a single line of code in the application itself. You just need to create a new module that is responsible for communication with your custom video source, and place the module in the application's folder.
The key to implementing the idea was to create an interface that describes the functionality common to all video sources. The interface is IVideoSource. Then, a set of classes was created that implement this interface and encapsulate all the routines for communicating with a particular video source and extracting image data from it. Each such class is fully responsible for all the work required to provide the application with images to display. The code of these classes does not go into the application itself but into separate assemblies, which are the application modules that can easily be added to extend its functionality.
Each video source module may contain an arbitrary number of video providers, that is, classes that provide access to a video source. Most modules contain only one video provider, but some have several, since it may be preferable to group video providers somehow (for example, all video providers for cameras/servers of one manufacturer go into the same module).
All these video providers can be used as complete classes to access different video sources from your application. But two things are still missing that are required to make the application extensible and configurable. First of all, all our video providers should be self-describing and self-configuring. For this purpose, two more interfaces were added: IVideoSourceDescription and IVideoSourcePage. Each class that implements the IVideoSourceDescription interface provides the name and description of the provider, and allows saving and loading its configuration and creating the configured video provider. Classes that implement the IVideoSourcePage interface represent a property page for configuring the video provider. These additional classes also go into the video provider's module. Putting all this together, the simplest module, which contains only one video provider, should contain three classes: the provider description, the provider configuration page, and the video provider itself.
The last thing needed to make the solution work has to be implemented on the application side: the application should find all modules and collect information about the video source providers that live there. This can be done very easily through reflection. First, the application searches for all DLL files in the application folder. Then, it tries to load each file as an assembly and enumerates all types in the assembly, searching for types that implement the IVideoSourceDescription interface. Once such a type is found, it is instantiated and asked to provide the video provider's name, description, and other information. This module discovery procedure is called only once, on application startup, but the application can easily be modified to call it on user request (which may be useful if the user has added a new video provider module but does not want to restart the application).
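The discovery procedure can be sketched as follows. This is a simplified illustration rather than the application's actual code: IProviderDescription is a hypothetical stand-in for IVideoSourceDescription with a single property, and DemoProviderDescription and ProviderDiscovery are likewise made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;

// Hypothetical, reduced stand-in for the article's IVideoSourceDescription.
public interface IProviderDescription
{
    string Name { get; }
}

// A sample provider description, as a module would contain it.
public class DemoProviderDescription : IProviderDescription
{
    public string Name { get { return "Demo provider"; } }
}

public static class ProviderDiscovery
{
    // Instantiates every concrete type in the assembly that implements
    // the interface and has a parameterless constructor.
    public static List<IProviderDescription> FromAssembly( Assembly assembly )
    {
        List<IProviderDescription> found = new List<IProviderDescription>( );
        foreach ( Type type in assembly.GetTypes( ) )
        {
            if ( type.IsClass && !type.IsAbstract &&
                typeof( IProviderDescription ).IsAssignableFrom( type ) &&
                ( type.GetConstructor( Type.EmptyTypes ) != null ) )
            {
                found.Add( (IProviderDescription) Activator.CreateInstance( type ) );
            }
        }
        return found;
    }

    // Tries to load every DLL in the folder as an assembly and scans it.
    public static List<IProviderDescription> FromFolder( string folder )
    {
        List<IProviderDescription> found = new List<IProviderDescription>( );
        foreach ( string file in Directory.GetFiles( folder, "*.dll" ) )
        {
            try
            {
                found.AddRange( FromAssembly( Assembly.LoadFrom( file ) ) );
            }
            catch ( Exception )
            {
                // not a loadable .NET assembly, or its types failed to load: skip it
            }
        }
        return found;
    }
}
```

For the type check to work across modules, the interface has to live in an assembly shared by the application and all modules. Calling FromFolder again later is one way to pick up modules added while the application is running.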
Some pitfalls
There is one known bug in the .NET 1.0 (and 1.1) framework, which is actually not a bug but a feature. This feature, however, causes a serious problem when communicating with some cameras in MJPEG mode. The problem is that some MJPEG video sources do not conform 100% to the HTTP standard. Or, to put it a little differently, Microsoft was too picky and implemented the first version of their framework to conform very strictly to the HTTP standard. Some cameras have some small thing missing in the HTTP header, and .NET immediately refuses to work with them, throwing a WebException with the following description:
The underlying connection was closed:
The server committed an HTTP protocol violation.
Fortunately, it is a known feature of .NET and it can be worked around. First, you will need at least version 1.1 of the framework with its first service pack installed. Then, you will need to create an application configuration file for your application and place it in the application folder. Here is the minimal content of the file needed to make MJPEG sources work:
<configuration>
    <startup><supportedRuntime version="v1.1.4322" /></startup>
    <system.net><settings><httpWebRequest useUnsafeHeaderParsing="true" /></settings></system.net>
</configuration>
The second problem is that the HttpWebRequest class of .NET has a feature called connection groups. By default, all HTTP requests are created in the same connection group, and each connection group has a limit on the number of simultaneously open connections. As a result, you cannot monitor many cameras at the same time. The good news is that this problem can also be solved easily: the HttpWebRequest class has a property called ConnectionGroupName, so you can manage connection grouping on your own.
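A minimal sketch of setting the connection group is shown below. The helper name (CameraRequestSketch.Create) and the idea of using one group name per camera are assumptions made for illustration; the demo application may group its connections differently.

```csharp
using System;
using System.Net;

public static class CameraRequestSketch
{
    // Creates a request that uses its own connection group, so that
    // streams from different cameras do not compete for the same
    // limited set of connections.
    public static HttpWebRequest Create( string url, string groupName )
    {
        HttpWebRequest req = (HttpWebRequest) WebRequest.Create( url );
        req.ConnectionGroupName = groupName;
        return req;
    }
}
```

A viewer could, for instance, pass a unique string per video window as groupName when it opens each MJPEG stream.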
The attached application demonstrates all of the techniques described in the article, and allows monitoring of many different cameras from different manufacturers. The application lets you monitor a single camera, or several cameras on a single screen at a time (full screen mode supported). Please don’t consider the application to be a complete video surveillance application, but treat it as a demo, as a proof-of-concept, as a starting point for your own software. However, the application may be useful for many personal reasons.
You can find one more interesting application here on CodeProject, which also works with video cameras, and is based on techniques I described in the article.
The demo application includes many free cameras from all over the world: Las Vegas, Stuttgart Airport, and many more. You can easily find more freely accessible cameras on the Internet, add them to the application, and enjoy monitoring them.