In the previous articles in this series, we wrote Python code for detecting humans in images using SSD DNN models, and we showed that this code runs on a Raspberry Pi device. In this article, we’ll adapt the code to detect persons in video streams.

We’ll use video files in the mp4 format as our video "stream" data. This is the simplest way to test the detection algorithm, for two reasons. First, we can use data from any camera – or download a video clip from the Internet. Second, we can repeat the experiment with the same input data multiple times, which is important for testing. The Raspberry Pi device features a camera module, and we could use it to test the detection algorithm on a live video stream. However, it would be almost impossible to test the algorithm multiple times under the same conditions, because it is very hard to repeat a scene exactly in front of a live camera.

We already have some utility classes for handling Caffe DNN models, processing images, and drawing detections on them. We also have the main class, SSD, which receives a frame (image) and detects persons in it. Now, using the existing code, let’s write a new Python class for video processing:

import cv2

# FPS, Utils, and the SSD class are assumed to come from the existing code of this series
class VideoSSD:
    def __init__(self, ssd):
        self.ssd = ssd

    def detect(self, video, class_num, min_confidence):
        detection_num = 0
        fps = FPS()
        capture = cv2.VideoCapture(video)

        # Process every frame until the video ends
        while True:
            (ret, frame) = capture.read()
            if frame is None:
                break

            # Only the detection step sits between fps.start() and fps.stop()
            fps.start()
            obj_data = self.ssd.detect(frame)
            persons = self.ssd.get_objects(frame, obj_data, class_num, min_confidence)
            fps.stop()
            p_count = len(persons)
            detection_num += p_count

            # Draw the detections, if any, on the frame
            if p_count > 0:
                Utils.draw_objects(persons, "PERSON", (0, 0, 255), frame)

            # Display the resulting frame; press 'q' to stop early
            cv2.imshow('Person detection', frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break

        capture.release()
        cv2.destroyAllWindows()

        f = fps.fps()
        return (detection_num, f)

The VideoSSD class is initialized with an instance of the SSD class, which the video processor uses for person detection. The main detect method has three input arguments: video – a full path to the video file, class_num – the class number to detect, and min_confidence – the threshold of the class detection. In this method, we first initialize the fps object for performance evaluation and create a cv2.VideoCapture object for the video file. Then we loop over all the frames in the video stream, detect persons in each frame, count all the detected persons, and measure the performance. The code can be adapted for a live camera instead of a video file: all you need to change is the argument passed to cv2.VideoCapture.
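As a rough illustration of the live-camera change, the sketch below chooses between a file path and a camera index before opening the capture. The helper names parse_source and open_capture are hypothetical and not part of the series code. It assumes the camera is exposed to OpenCV as a numbered device; whether index 0 maps to the Raspberry Pi camera module depends on the driver setup.

```python
def parse_source(source):
    """Return an int camera index for ints or digit strings, else the path as given."""
    if isinstance(source, int):
        return source
    text = str(source).strip()
    return int(text) if text.isdigit() else text


def open_capture(source):
    """Open a cv2.VideoCapture for a video file path or a camera index."""
    # Imported here so that parse_source can be used without OpenCV installed
    import cv2

    capture = cv2.VideoCapture(parse_source(source))
    if not capture.isOpened():
        raise IOError("Cannot open video source: " + str(source))
    return capture


if __name__ == "__main__":
    # A file path stays a string; "0" becomes the camera index 0 (assumed device number)
    print(parse_source("/home/pi/Desktop/PI_RPD/video/person_1.mp4"))
    print(parse_source("0"))
```

Because the detect method passes its video argument straight to cv2.VideoCapture, calling video_ssd.detect(0, person_class, 0.5) should be enough to try a camera, provided the device really is available at index 0.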

The detect method returns a tuple that contains the total number of the detected persons and the average FPS. Note that we do not save the frames with the detected persons to a file (the way we did for the detections in images). We just draw the detections in the frames and show them in the GUI window by calling the cv2.imshow method. So we’ll see the detections on the screen.
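The FPS helper itself is not listed in this article. To make the average FPS value easier to interpret, here is a minimal stand-in with the same three methods (start, stop, fps). The class name SimpleFPS and its internals are assumptions for illustration; the actual helper used in the series may be implemented differently.

```python
import time


class SimpleFPS:
    """Hypothetical timer: accumulates the time between start() and stop() calls."""

    def __init__(self):
        self._elapsed = 0.0      # total measured time, in seconds
        self._frames = 0         # number of completed start/stop pairs
        self._started_at = None  # timestamp of the pending start(), if any

    def start(self):
        self._started_at = time.perf_counter()

    def stop(self):
        if self._started_at is None:
            return  # stop() without a matching start() is ignored
        self._elapsed += time.perf_counter() - self._started_at
        self._frames += 1
        self._started_at = None

    def fps(self):
        # Average frames per second over the measured intervals only
        return self._frames / self._elapsed if self._elapsed > 0 else 0.0


if __name__ == "__main__":
    timer = SimpleFPS()
    for _ in range(3):
        timer.start()
        time.sleep(0.01)  # stands in for the detection work on one frame
        timer.stop()
    print(timer.fps())
```

With a timer of this kind, the reported value covers only the code between start and stop. In the detect method that is the detection step, so reading the frame and displaying it would not be counted.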

Now let’s write the code for detecting persons in a video file:

# testing video detector
proto_file = r"/home/pi/Desktop/PI_RPD/mobilenet.prototxt"
model_file = r"/home/pi/Desktop/PI_RPD/mobilenet.caffemodel"
ssd_net = CaffeModelLoader.load(proto_file, model_file)
print("Caffe model loaded from: "+model_file)
 
proc_frame_size = 300
# frame processor for MobileNet
ssd_proc = FrameProcessor(proc_frame_size, 1.0/127.5, 127.5)
person_class = 15
 
ssd = SSD(ssd_proc, ssd_net)
 
video_file = r"/home/pi/Desktop/PI_RPD/video/person_1.mp4"
video_ssd = VideoSSD(ssd)
(detections, fps) = video_ssd.detect(video_file, person_class, 0.5)
print("Detection count:" + str(detections))
print("FPS:" + str(fps))

Running the above code on the Raspberry Pi 3B device against a test video, we get the following results: the total detection count is 54; the average FPS is about 1.27.

While running the program, we see the GUI window with detections of the persons. Here is the saved screen video:

As you can see from the video, detections are very slow because the processing unit needs about 0.8 seconds per frame. That’s the same performance we benchmarked while processing still images… but it looks slow for video processing.
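The 0.8 seconds figure follows directly from the measured average FPS. The short calculation below shows the conversion. It also estimates how many frames of the source video go by during one detection; the 25 frames per second used for that estimate is an assumed value, since the frame rate of the test clip is not stated here.

```python
def seconds_per_frame(average_fps):
    """Convert an average frames-per-second value into seconds per frame."""
    return 1.0 / average_fps


measured_fps = 1.27     # average FPS reported for the Raspberry Pi 3B run
assumed_video_fps = 25  # assumption: a typical frame rate, not taken from the test video

time_per_frame = seconds_per_frame(measured_fps)
print(round(time_per_frame, 2))                      # 0.79
print(round(assumed_video_fps * time_per_frame, 1))  # 19.7 video frames per detection
```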

At this point, we should ask ourselves whether processing one frame every 0.8 seconds is enough for real-time person detection. The answer depends on the purpose of the detection. If the goal is to count all the customers who enter a shopping center, the answer is "No." But if we are developing a video surveillance system, and the goal is just to alert the operator when an intruder appears, the answer would be "Yes." Note that in our test video, each person appears for several seconds. If we can process one frame per second, we can detect the person’s appearance with high probability.
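The "high probability" argument can be made concrete with a small back-of-the-envelope calculation. The sketch below counts how many frames get processed while a person is visible and estimates the chance that at least one of them yields a detection. The 3-second visibility and the 0.7 per-frame hit rate are illustrative assumptions, not measurements, and treating frames as independent is a simplification.

```python
import math


def frames_while_visible(visible_seconds, seconds_per_frame):
    """Number of frames fully processed while the person is in view."""
    return math.floor(visible_seconds / seconds_per_frame)


def chance_of_at_least_one_detection(n_frames, per_frame_hit_rate):
    """Probability of one or more hits in n_frames, assuming independent frames."""
    return 1.0 - (1.0 - per_frame_hit_rate) ** n_frames


visible_seconds = 3.0     # assumption: "several seconds" taken as 3
seconds_per_frame = 1.0   # one processed frame per second, as in the text
per_frame_hit_rate = 0.7  # assumption: hypothetical detection rate per frame

n = frames_while_visible(visible_seconds, seconds_per_frame)
print(n)                                                                  # 3
print(round(chance_of_at_least_one_detection(n, per_frame_hit_rate), 3))  # 0.973
```

The same numbers also show why counting customers is harder: counting needs every person detected exactly once, while an alert only needs one hit during the whole time a person is in view.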

It looks like we can use AI models to detect a person’s appearance on live cameras. However, our person detection code is not yet suited to real-time use: it processes every frame, and at about 0.8 seconds per frame it cannot keep up with a live stream.

Next Steps

In the next article, we’ll modify the code for real-time processing on an edge device.