Object Detection on Video Files Using CodeProject.AI Server

Chris Maunder

April 9, 2024

CPOL

Learn how to process video files offline using CodeProject.AI Server

Introduction

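Work on object detection and face recognition usually focuses on processing live camera feeds. There are also use cases that call for processing saved or downloaded video, so let's take a quick look at how to process a video clip using CodeProject.AI Server.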

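We'll use Python, and specifically the OpenCV library, because it has built-in support for many video-related tasks. The goal is to load a video file, pass it to an object detector, and produce a file listing the detected objects and timestamps, as well as display the video itself with the detection bounding boxes overlaid.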

Setup

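This will be a bare-bones setup so that we can focus on using CodeProject.AI Server rather than on tedious setup steps. We'll run the YOLOv5 6.2 object detection module in CodeProject.AI Server. This module performs well and, most conveniently, runs in the shared Python virtual environment in the runtimes/ folder. We'll shamelessly reuse that same venv.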

All of our code runs inside the CodeProject.AI Server codebase. The demo is the video_process.py file in the /demos/clients/Python/ObjectDetect folder.

To run this code, go to the /demos/clients/Python/ObjectDetect folder and run:

# For Windows
 ..\..\..\..\src\runtimes\bin\windows\python39\venv\Scripts\python video_process.py

# for Linux/macOS
 ../../../../src/runtimes/bin/macos/python38/venv/bin/python video_process.py

To stop the program, type "q" in the terminal from which you launched the file.

The code

How did we do it?

Below is a minimal version of the code that opens a video file and sends each frame to CodeProject.AI Server:

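import cv2
import imutils
import numpy as np
from imutils.video import FileVideoStream
from PIL import Image

def main():
    # file_path (the video to process) is assumed to be defined elsewhere
    vs = FileVideoStream(file_path).start()

    with open("results.txt", 'w') as log_file:
        # Keep going until the stream has no more frames
        while vs.more():
            frame = vs.read()
            if frame is None:
                break

            # Detect objects, then convert the labelled image back to an array
            image = Image.fromarray(frame)
            image = do_detection(image, log_file)
            frame = np.asarray(image)

            frame = imutils.resize(frame, width = 640)
            cv2.imshow("Movie File", frame)
            # imshow only renders once OpenCV gets to process GUI events
            cv2.waitKey(1)

    vs.stop()
    cv2.destroyAllWindows()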

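We open the video file with FileVideoStream, then loop over the stream until we run out of frames. Each frame is sent to the do_detection method, which performs the actual object detection. We also open a log file named "results.txt" and pass it to do_detection, which records the detected items and their locations in that file.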
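
The introduction mentions logging timestamps alongside the detected objects, while the loop above only sees raw frames. One possible way to derive a timestamp is from the frame index and the video's frame rate. The sketch below assumes a known, constant frame rate. The helpers `frame_timestamp` and `format_log_line` are hypothetical names and are not part of the demo.

```python
def frame_timestamp(frame_index: int, fps: float) -> str:
    """Return the position of a frame in the video as HH:MM:SS.mmm."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    total_ms = round(frame_index * 1000 / fps)
    seconds, ms = divmod(total_ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{ms:03d}"


def format_log_line(timestamp: str, prediction: dict) -> str:
    """Build one log line from a prediction dict (fields as used in do_detection)."""
    label = prediction["label"]
    conf = round(prediction["confidence"] * 100)
    x_min, y_min = prediction["x_min"], prediction["y_min"]
    x_max, y_max = prediction["x_max"], prediction["y_max"]
    return f"{timestamp} {label} {conf}%: ({x_min}, {y_min}), ({x_max}, {y_max})"
```

In the loop, a counter incremented once per frame would supply `frame_index`, and the frame rate could be read with OpenCV, for example via `cv2.VideoCapture(file_path).get(cv2.CAP_PROP_FPS)`.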

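import io

import requests
from PIL import ImageDraw, ImageFont

# server_url, font_size, line_width and padding are assumed to be
# module-level settings defined elsewhere in the script.

def do_detection(image, log_file):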
   
    # Convert to format suitable for a POST to CodeProject.AI Server
    buf = io.BytesIO()
    image.save(buf, format='PNG')
    buf.seek(0)

    # Send to CodeProject.AI Server for object detection. It's better to have a
    # session object created once at the start and closed at the end, but we
    # keep the code simpler here for demo purposes    
    with requests.Session() as session:
        response = session.post(server_url + "vision/detection",
                                files={"image": ('image.png', buf, 'image/png') },
                                data={"min_confidence": 0.5}).json()

    # Get the predictions (but be careful of a null return)
    predictions = None
    if response is not None and "predictions" in response:
       predictions = response["predictions"]

    if predictions is not None:
        # Draw each bounding box and label onto the image we passed in
        font = ImageFont.truetype("Arial.ttf", font_size)
        draw = ImageDraw.Draw(image)

        for object in predictions:
            label = object["label"]
            conf  = object["confidence"]
            y_max = int(object["y_max"])
            y_min = int(object["y_min"])
            x_max = int(object["x_max"])
            x_min = int(object["x_min"])

            if y_max < y_min:
                temp = y_max
                y_max = y_min
                y_min = temp

            if x_max < x_min:
                temp = x_max
                x_max = x_min
                x_min = temp

            draw.rectangle([(x_min, y_min), (x_max, y_max)], outline="red", width=line_width)
            # Build the label text once; it is used for both the overlay and the log
            object_info = f"{label} {round(conf*100.0,0)}%"
            draw.text((x_min + padding, y_min - padding - font_size), object_info, font=font)

            log_file.write(f"{object_info}: ({x_min}, {y_min}), ({x_max}, {y_max})\n")

    # Return our (now labelled) image
    return image
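
The comment in do_detection notes that it is better to create the session once and reuse it for every frame. A minimal sketch of that idea follows. The `Detector` class is hypothetical and not part of the demo. It accepts any object with a `post` method shaped like that of `requests.Session`, so one session can serve the whole video.

```python
import io


class Detector:
    """Sends images to an object-detection endpoint through one shared session."""

    def __init__(self, session, server_url: str, min_confidence: float = 0.5):
        # session: any object with a requests.Session-style post() method
        self.session = session
        self.url = server_url + "vision/detection"
        self.min_confidence = min_confidence

    def detect(self, png_bytes: bytes) -> list:
        """POST one PNG-encoded frame and return its predictions (possibly empty)."""
        buf = io.BytesIO(png_bytes)
        response = self.session.post(
            self.url,
            files={"image": ("image.png", buf, "image/png")},
            data={"min_confidence": self.min_confidence},
        ).json()
        # Guard against a null reply or a reply without predictions
        if not response:
            return []
        return response.get("predictions") or []
```

With the requests library, this could be used as `with requests.Session() as session: detector = Detector(session, server_url)`, created before the frame loop, with `detector.detect(...)` called once per frame inside it.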

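The only tricky parts are:

Extracting frames from the video file
Encoding each frame correctly so it can be sent as an HTTP POST to the CodeProject.AI Server API
Drawing the bounding boxes and labels of the detected objects onto each frame, and displaying the frames in sequence

CodeProject.AI Server does all the real work of detecting the objects in each frame.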
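
On the drawing step: placing the label above the box, as do_detection does, pushes the text off the image for detections near the top edge. The sketch below shows one way this could be handled, together with a compact form of the coordinate swap from the code above. Both helper names are hypothetical.

```python
def normalize_box(x_min: int, y_min: int, x_max: int, y_max: int) -> tuple:
    """Return the box with min/max ordered correctly on both axes."""
    return (min(x_min, x_max), min(y_min, y_max),
            max(x_min, x_max), max(y_min, y_max))


def label_position(x_min: int, y_min: int, padding: int, font_size: int) -> tuple:
    """Place the label above the box, or just inside it when there is no room above."""
    y = y_min - padding - font_size
    if y < 0:
        # Not enough space above the box: draw inside its top edge instead
        y = y_min + padding
    return (x_min + padding, y)
```

The tuple returned by `label_position` can be passed directly as the first argument to `draw.text`.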

Summary

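The technique described here applies to many of the modules in CodeProject.AI Server: take some data, convert it into a form suitable for an HTTP POST, make the API call, and display the results. As long as you have data to send and a module that meets your needs, you're all set.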