Performing Object Detection on a video file using CodeProject.AI Server

9 Apr 2024, CPOL, 2 min read
Learn how to process a video file offline using CodeProject.AI Server

Introduction

A lot of attention is given to processing live webcam feeds for object detection or face recognition. There are also use cases for processing saved or downloaded videos, so let's do a very quick walkthrough of using CodeProject.AI Server to process a video clip.

For this we'll use Python, and specifically the OpenCV library, due to its built-in support for many things video. The goal is to load a video file, run it against an object detector, and generate a file containing objects and timestamps, as well as viewing the video itself with the detection bounding boxes overlaid.

Setup

This will be bare-bones so we can focus on using CodeProject.AI Server rather than on tedious setup steps. We will be running the YOLOv5 6.2 Object Detection module within CodeProject.AI Server. This module provides decent performance and, most conveniently, it runs in the shared Python virtual environment set up in the runtimes/ folder. We will shamelessly use this same venv ourselves.

All our code will run inside the CodeProject.AI Server codebase, with our demo sitting under the /demos/clients/Python/ObjectDetect folder in the video_process.py file.

To run this code, go to the /demos/clients/Python/ObjectDetect folder and run

Shell
# For Windows
..\..\..\..\src\runtimes\bin\windows\python39\venv\Scripts\python video_process.py

# For Linux/macOS
../../../../src/runtimes/bin/macos/python38/venv/bin/python video_process.py

To halt the program, type "q" in the terminal from which you launched the file.

The Code

So how did we do it?

Below is a minimal version of the code for opening a video file and sending each frame to CodeProject.AI Server:

Python
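# Imports used by the snippets in this article
import io

import cv2
import imutils
import numpy as np
import requests
from imutils.video import FileVideoStream
from PIL import Image, ImageDraw, ImageFont

vs = FileVideoStream(file_path).start()

with open("results.txt", 'w') as log_file:
    while True:
        # Stop once the stream has no more frames queued
        if not vs.more():
            break

        frame = vs.read()
        if frame is None:
            break

        # Round-trip through a PIL image so do_detection can draw on it
        image = Image.fromarray(frame)
        image = do_detection(image, log_file)
        frame = np.asarray(image)

        if frame is not None:
            frame = imutils.resize(frame, width = 640)
            cv2.imshow("Movie File", frame)
            # imshow only repaints when the GUI event loop runs, so pump it briefly
            cv2.waitKey(1)

vs.stop()
cv2.destroyAllWindows()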

We open the video file using FileVideoStream, then iterate over the stream object until we run out of frames. Each frame is sent to a do_detection function, which does the actual object detection. We've also opened a log file called "results.txt" and pass it to do_detection, which logs the items and locations detected in each frame to this file.
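The loop above logs what was detected and where, but not when. One way to add a timestamp to each log entry is to derive it from the frame's position in the file. The sketch below is an illustration only, not part of the demo: format_timestamp is a hypothetical helper, and it assumes you keep your own frame counter in the loop and have a constant frame rate, which could be read with cv2.VideoCapture(file_path).get(cv2.CAP_PROP_FPS).

```python
def format_timestamp(frame_index: int, fps: float) -> str:
    """Return the playback time of a frame as HH:MM:SS.mmm.

    Assumes a constant frame rate. Variable-rate files would need the
    per-frame timestamps reported by the decoder instead.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")

    # Work in whole milliseconds to avoid floating-point drift in the output
    total_ms = round(frame_index * 1000 / fps)
    seconds, ms = divmod(total_ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{ms:03d}"
```

The returned string could then be prefixed to each line written to the log file.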

Python
def do_detection(image, log_file):
   
    # Convert to format suitable for a POST to CodeProject.AI Server
    buf = io.BytesIO()
    image.save(buf, format='PNG')
    buf.seek(0)

    # Send to CodeProject.AI Server for object detection. It's better to have a
    # session object created once at the start and closed at the end, but we
    # keep the code simpler here for demo purposes    
    with requests.Session() as session:
        response = session.post(server_url + "vision/detection",
                                files={"image": ('image.png', buf, 'image/png') },
                                data={"min_confidence": 0.5}).json()

    # Get the predictions (but be careful of a null return)
    predictions = None
    if response is not None and "predictions" in response:
       predictions = response["predictions"]

    if predictions is not None:
        # Draw each bounding box and label onto the image we passed in
        font = ImageFont.truetype("Arial.ttf", font_size)
        draw = ImageDraw.Draw(image)

        for detection in predictions:
            label = detection["label"]
            conf  = detection["confidence"]
            y_max = int(detection["y_max"])
            y_min = int(detection["y_min"])
            x_max = int(detection["x_max"])
            x_min = int(detection["x_min"])

            # Guard against inverted coordinates
            if y_max < y_min:
                y_min, y_max = y_max, y_min

            if x_max < x_min:
                x_min, x_max = x_max, x_min

            # Text used for both the on-screen label and the log entry
            object_info = f"{label} {round(conf * 100)}%"

            draw.rectangle([(x_min, y_min), (x_max, y_max)], outline="red", width=line_width)
            draw.text((x_min + padding, y_min - padding - font_size), object_info, font=font)

            log_file.write(f"{object_info}: ({x_min}, {y_min}), ({x_max}, {y_max})\n")

    # Return our (now labelled) image
    return image
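As the comment in do_detection notes, it is better to create the session once and reuse it for every frame. A minimal sketch of that arrangement follows. The name detect_objects and its parameters are hypothetical, and it assumes the caller has already PNG-encoded the frame and owns a requests.Session. The route and form fields are the ones used above.

```python
def detect_objects(session, png_bytes, server_url, min_confidence=0.5):
    """POST one PNG-encoded frame and return the list of predictions.

    `session` is expected to be a requests.Session created once by the
    caller and reused across frames, for example:

        with requests.Session() as session:
            for png_bytes in encoded_frames:
                predictions = detect_objects(session, png_bytes, server_url)
    """
    response = session.post(
        server_url + "vision/detection",
        files={"image": ("image.png", png_bytes, "image/png")},
        data={"min_confidence": min_confidence},
    ).json()

    # The response may be null or lack "predictions", so fall back to an empty list
    if not response:
        return []
    return response.get("predictions") or []
```

Reusing one session lets the underlying HTTP connection stay open between frames instead of being re-established for each request.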

The only tricky parts are:

  1. Extracting frames from a video file
  2. Encoding each frame correctly so it can be sent as an HTTP POST to the CodeProject.AI Server API
  3. Drawing the bounding boxes and labels of detected objects onto the frame, and displaying each frame in turn

All the real grunt work has been done by CodeProject.AI Server in detecting the objects in each frame.
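For the drawing step, the coordinate handling in do_detection can be pulled out into small helpers. The sketch below is one possible arrangement rather than the demo's actual code: normalise_box and label_position are hypothetical names, and moving the label inside the box when it would otherwise fall above the top of the image is an added assumption about the desired behaviour.

```python
def normalise_box(prediction):
    """Return (x_min, y_min, x_max, y_max) as ints with min <= max on each axis."""
    x_min, x_max = sorted((int(prediction["x_min"]), int(prediction["x_max"])))
    y_min, y_max = sorted((int(prediction["y_min"]), int(prediction["y_max"])))
    return x_min, y_min, x_max, y_max


def label_position(x_min, y_min, padding, font_size):
    """Place the label above the box, or just inside it if that would be off-image."""
    y = y_min - padding - font_size
    if y < 0:
        # The box touches the top of the frame, so draw the label inside it
        y = y_min + padding
    return x_min + padding, y
```

The tuple from normalise_box can be passed to draw.rectangle, and the point from label_position to draw.text.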

Summing up

The techniques explained here are transferable to many of the modules in CodeProject.AI Server: take some data, convert it to a form suitable for an HTTP POST, make the API call, and then display the results. As long as you have the data to send and a module that satisfies your need, you're good to go.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Founder CodeProject
Canada
Chris Maunder is the co-founder of CodeProject and ContentLab.com, and has been a prominent figure in the software development community for nearly 30 years. Hailing from Australia, Chris has a background in Mathematics, Astrophysics, Environmental Engineering and Defence Research. His programming endeavours span everything from FORTRAN on Super Computers, C++/MFC on Windows, through to high-load .NET web applications and Python AI applications on everything from macOS to a Raspberry Pi. Chris is a full-stack developer who is as comfortable with SQL as he is with CSS.

In the late 1990s, he and his business partner David Cunningham recognized the need for a platform that would facilitate knowledge-sharing among developers, leading to the establishment of CodeProject.com in 1999. Chris's expertise in programming and his passion for fostering a collaborative environment have played a pivotal role in the success of CodeProject.com. Over the years, the website has grown into a vibrant community where programmers worldwide can connect, exchange ideas, and find solutions to coding challenges. Chris is a prolific contributor to the developer community through his articles and tutorials, and his latest passion project, CodeProject.AI.

In addition to his work with CodeProject.com, Chris co-founded ContentLab and DeveloperMedia, two projects focussed on helping companies make their Software Projects a success. Chris's roles included Product Development, Content Creation, Client Satisfaction and Systems Automation.
