AI Social Distancing Detector: Detecting People in Video Sequences

Dawid Borycki

8 Dec 2020

CPOL


In this article, we will perform object detection on frames from the test dataset, including a video sequence stored in a video file.

Download source code - 1.2 KB

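Object detectors are usually applied to video streams from various cameras. Sometimes, however, you perform object detection in post-processing: you are given a complete video file and must look for specific objects in it. In this article, we start from that scenario. Then we will see how to filter the detection results so that only people are shown. We will achieve the results shown in the figure below (note that the bicycle is not detected in the image on the right).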

For object detection, we use TensorFlow and the MobileNet model. The video sequence comes from this link. All the companion code is here.

Reading a Video File

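To read a video file, I created the VideoReader class (see video_reader.py in the *Part_05* folder). Internally, this class uses OpenCV's VideoCapture. I use VideoCapture much as I did previously when reading frames from a camera. The main difference is that I need to pass the file path to the VideoCapture initializer: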

def __init__(self, file_path):
    # Assumes the cv2 module was imported as opencv (import cv2 as opencv)
    try:
        self.video_capture = opencv.VideoCapture(file_path)
    except Exception as error:
        print(error)

Then, I read consecutive frames from the file by calling the read method of the VideoCapture instance:

def read_next_frame(self):
    (capture_status, frame) = self.video_capture.read()

    # Return the frame on success, or None when no frame could be read
    if capture_status:
        return frame

    return None

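To use the VideoReader class, first call the initializer with the input video file, and then call read_next_frame as many times as needed to read frames. When the method reaches the end of the file, it returns None.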
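
The read-until-None pattern can be wrapped in a small generator. This is only a minimal sketch and is not part of the companion code: iterate_frames and FakeReader are hypothetical names, and FakeReader merely stands in for VideoReader so that the snippet runs without OpenCV or a video file.

```python
def iterate_frames(reader):
    """Yield frames from any object exposing read_next_frame() until it returns None."""
    while True:
        frame = reader.read_next_frame()
        if frame is None:
            # No more frames: stop iterating
            return
        yield frame


class FakeReader:
    """Hypothetical stand-in for VideoReader, used only for illustration."""

    def __init__(self, frames):
        self._frames = list(frames)
        self._index = 0

    def read_next_frame(self):
        # Mimic VideoReader: return None once all frames were consumed
        if self._index >= len(self._frames):
            return None
        frame = self._frames[self._index]
        self._index += 1
        return frame


frames = list(iterate_frames(FakeReader(['frame 0', 'frame 1', 'frame 2'])))
print(len(frames))  # 3
```

With the real class, a loop such as `for frame in iterate_frames(VideoReader(path))` could replace an explicit while loop, provided VideoReader behaves as described above.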

Detecting People

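To detect people, I started with the modules created previously, including the Inference and ImageHelper classes. We will reference them in main.py. The source code of these modules is in the *Part_03* folder and was explained in a previous article.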

To reference these modules, I added the following statements to main.py, assuming that the main script is executed from the *Part_05* folder:

import sys
sys.path.insert(1, '../Part_03/')
 
from inference import Inference as model
from image_helper import ImageHelper as imgHelper
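
The relative path '../Part_03/' is resolved against the current working directory, which is why the script has to be started from the Part_05 folder. If that assumption is inconvenient, one option is to build the path from the script's own location instead. The helper below is a hedged sketch: add_sibling_folder_to_path is a hypothetical name, and the example path is made up.

```python
import os
import sys


def add_sibling_folder_to_path(script_file, folder_name):
    """Add <parent of the script's folder>/<folder_name> to sys.path and return it."""
    script_dir = os.path.dirname(os.path.abspath(script_file))
    target = os.path.normpath(os.path.join(script_dir, os.pardir, folder_name))
    if target not in sys.path:
        sys.path.insert(1, target)
    return target


# Made-up script location for illustration; in main.py you would pass __file__
example = add_sibling_folder_to_path(
    os.path.join('project', 'Part_05', 'main.py'), 'Part_03')
print(example)
```

In main.py, calling `add_sibling_folder_to_path(__file__, 'Part_03')` before the imports should make them work regardless of the directory the script is launched from.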

As a result, we can easily run object detection on a frame from the video file:

# Load and prepare model
model_file_path = '../Models/01_model.tflite'
labels_file_path = '../Models/02_labels.txt'
 
# Initialize model
ai_model = model(model_file_path, labels_file_path)   
 
# Initialize video reader
video_file_path = '../Videos/01.mp4'
video_reader = videoReader(video_file_path)

# Get frame from the video file
frame = video_reader.read_next_frame()
 
# Detect objects
score_threshold = 0.5
results = ai_model.detect_objects(frame, score_threshold)

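The problem, however, is that we detect every object the model was trained on. To detect only people, we need to filter the results returned by the detect_objects method. For filtering, we use the label of each detected object. The filtering method can be implemented as follows: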

def detect_people(self, image, threshold):
    # Detect objects
    all_objects = self.detect_objects(image, threshold)
 
    # Return only those with label of 'person'
    people = filter(lambda r: r['label'] == 'person', all_objects)
 
    return list(people)

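I added the detect_people method above to the Inference class (see inference.py in the Part_03 folder). The detect_people function calls detect_objects internally and then filters the results using filter, a built-in Python function. Its first argument is the filtering function. Here, I use an anonymous lambda function that returns a Boolean: True when the label of the current detection result is 'person', and False otherwise. Because filter returns a lazy iterator in Python 3, the method converts it to a list before returning.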
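
To see the filtering step in isolation, the sketch below applies it to a few made-up detection results. Only the 'label' and 'rectangle' keys come from the article's code; the sample values, the tuple layout of the rectangles, and the keep_label helper are assumptions made for illustration. It also shows an equivalent list comprehension, which some readers may find easier to read than filter with a lambda.

```python
# Made-up detection results; only the 'label' and 'rectangle' keys
# are taken from the article's code, the values are placeholders
sample_results = [
    {'label': 'person', 'rectangle': (10, 20, 50, 120)},
    {'label': 'bicycle', 'rectangle': (60, 80, 140, 150)},
    {'label': 'person', 'rectangle': (200, 30, 240, 130)},
]


def keep_label(results, label):
    """Return only the results whose 'label' equals the given label."""
    return [r for r in results if r['label'] == label]


# Same approach as detect_people: filter with a lambda, then convert to a list
people_with_filter = list(filter(lambda r: r['label'] == 'person', sample_results))

# Equivalent list comprehension
people_with_comprehension = keep_label(sample_results, 'person')

print(len(people_with_filter))                          # 2
print(people_with_filter == people_with_comprehension)  # True
```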

Displaying Detection Results

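To display the detected people, I use the static display_image_with_detected_objects method from the image_helper module. However, this method was designed to display an image until the user presses a key. If I used it for a video sequence, the user would have to press a key for every frame. To adapt it to video, I modified the method by adding another parameter, delay. I pass the value of this parameter to OpenCV's waitKey function to impose a wait timeout: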

@staticmethod
def display_image_with_detected_objects(image, inference_results, delay = 0):
    # Prepare window
    opencv.namedWindow(common.WINDOW_NAME, opencv.WINDOW_GUI_NORMAL)
 
    # Draw rectangles and labels on the image
    for i in range(len(inference_results)):
        current_result = inference_results[i]
        ImageHelper.draw_rectangle_and_label(image, 
            current_result['rectangle'], current_result['label'])
 
    # Display image
    opencv.imshow(common.WINDOW_NAME, image)
        
    # Wait for a key press: indefinitely if delay is 0, otherwise for at most delay milliseconds
    opencv.waitKey(delay)

By default, delay is 0, so the method still works for callers that expect it to wait for a key press.

Putting Things Together

With all the components ready, we can put them together:

import sys
sys.path.insert(1, '../Part_03/')
 
from inference import Inference as model
from image_helper import ImageHelper as imgHelper
 
from video_reader import VideoReader as videoReader
 
if __name__ == "__main__": 
    # Load and prepare model
    model_file_path = '../Models/01_model.tflite'
    labels_file_path = '../Models/02_labels.txt'
 
    # Initialize model
    ai_model = model(model_file_path, labels_file_path)   
 
    # Initialize video reader
    video_file_path = '../Videos/01.mp4'
    video_reader = videoReader(video_file_path)
 
    # Detection and preview parameters
    score_threshold = 0.4
    detect_only_people = False
    delay_between_frames = 5
 
    # Perform object detection in the video sequence
    while(True):
        # Get frame from the video file
        frame = video_reader.read_next_frame()
 
        # If frame is None, then break the loop
        if(frame is None):
            break
        
        # Perform detection
        if(detect_only_people):
            results = ai_model.detect_people(frame, score_threshold)
        else:
            results = ai_model.detect_objects(frame, score_threshold)        
        
        # Display results        
        imgHelper.display_image_with_detected_objects(frame, results, delay_between_frames)

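There are two switches that control the script's execution. First, the detect_only_people variable controls whether the script detects all objects (False) or only people (True). Second, the delay_between_frames variable controls the delay between frames and thus the speed of the results preview. By default, I set it to 5 milliseconds.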
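
If you would rather think in frames per second than in milliseconds, the delay can be derived from a target frame rate. The helper below is a hypothetical sketch, not part of the companion code, and the frame rates in the example are arbitrary. Keep in mind that the delay only covers the wait after each frame is shown; detection takes additional time, so the actual preview runs slower than the computed rate.

```python
def delay_for_frame_rate(frames_per_second):
    """Return a waitKey-style delay in whole milliseconds for a target frame rate."""
    if frames_per_second <= 0:
        raise ValueError('frames_per_second must be positive')
    # A delay of 0 means "wait indefinitely" for waitKey, so never go below 1 ms
    return max(1, round(1000 / frames_per_second))


print(delay_for_frame_rate(25))   # 40
print(delay_for_frame_rate(200))  # 5
```

The video's own frame rate can usually be queried through VideoCapture.get with the CAP_PROP_FPS property, although some files report 0, in which case a fixed delay such as the one used above is a reasonable fallback.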

Wrapping Up

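In this article, we used the MobileNet object detector to find people in a video sequence. After running the code, we noticed that detection is not perfect: some people are not recognized, and lowering the detection score threshold does not improve the situation. We will address this later by using a more powerful object detector. But first, we will learn how to calculate the distance between people in an image to check whether they are too close.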