Hybrid Edge AI Face Detection

Sergey L. Gladkiy


July 19, 2021

CPOL

4 min read


In this article, we'll run a pretrained DNN model to detect faces in video.

Introduction

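Face recognition is one area of artificial intelligence (AI) where deep learning (DL) has had great success over the past decade. The best face recognition systems can recognize people in images and video with the same precision as humans, or even better. The two main base stages of face recognition are verification and identification.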

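In the first (current) part of this article series, we will:

Discuss the existing AI face detection methods and develop a program to run a pretrained DNN model
Consider face alignment and implement some alignment algorithms using face landmarks
Run the face detection DNN on a Raspberry Pi device, explore its performance, and consider possible ways to run it faster and to detect faces in real time
Create a simple face database and fill it with faces extracted from images or videos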

We assume that you are familiar with DNN, Python, Keras, and TensorFlow. You are welcome to download this project code...

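In the previous article, we discussed the principles of face detection and face recognition. In this one, we'll focus on specific face detection methods and implement one of them.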

Face Detection Methods

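Face detection is the first stage of any face recognition process. It is a critical step that affects all subsequent steps, so it requires a robust approach that minimizes detection errors. There are many face detection methods; we'll focus on AI-based approaches.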

We'd like to mention the following modern face detection methods: Max-Margin Object Detection (MMOD), Single-Shot Detector (SSD), Multi-task Cascaded Convolutional Networks (MTCNN), and You Only Look Once (YOLO).

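The MMOD model requires too many resources to run on an edge device. The fastest of these DNNs is YOLO; it provides rather good precision when detecting faces in real-scene video. The most precise of the methods above is SSD, and its processing speed is sufficient for low-power devices.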

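The main drawback of the YOLO and SSD methods is that they do not provide facial landmark information. As we'll see later, this information is important for face alignment.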

MTCNN provides good precision and finds the facial landmarks. It is lightweight enough to run on resource-constrained edge devices.

MTCNN Detector

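In this series, we'll use a free Keras implementation of the MTCNN detector. You can install this library in your Python environment with the standard pip command (pip install mtcnn). It requires OpenCV 4.1 and TensorFlow 2.0 (or later versions).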

You can check that MTCNN is installed correctly by running this simple Python code:

import mtcnn

print(mtcnn.__version__)

The output should show the version of the installed library, 0.1.0.
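Since the library also depends on OpenCV and TensorFlow, it can help to list the installed versions of all three packages at once. The following is a minimal sketch using only the standard library; the distribution names in PACKAGES are assumptions (OpenCV in particular may be installed under a different name, such as a headless or contrib build), and the helper names are hypothetical.

```python
from importlib import metadata

# Distribution names are assumptions; OpenCV may be installed under
# another name (for example a headless or contrib build).
PACKAGES = ("mtcnn", "opencv-python", "tensorflow")


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report(packages=PACKAGES):
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, version in report().items():
        print(name, version if version else "not installed")
```

A None entry only means that no distribution with that exact name was found, so it is worth checking alternative package names before reinstalling anything.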

With the library installed, we can write MTCNN-based code that implements a simple face detector:

import os
import time
import copy

import cv2
from mtcnn import MTCNN

class MTCNN_Detector:    
    def __init__(self, min_size, min_confidence):
        self.min_size = min_size
        self.f_detector = MTCNN(min_face_size=min_size)
        self.min_confidence = min_confidence
    
    def detect(self, frame):
        # Each result is a dict with the keys 'box', 'confidence' and 'keypoints'
        faces = self.f_detector.detect_faces(frame)
        
        # Keep only the detections that reach the confidence threshold
        detected = []
        for face in faces:
            if face['confidence'] >= self.min_confidence:
                detected.append(face)
        
        return detected
    
    def extract(self, frame, face):
        (x1, y1, w, h) = face['box']
        (l_eye, r_eye, nose, mouth_l, mouth_r) = Utils.get_keypoints(face)
        
        # Work on a copy so the original detection stays in frame coordinates
        f_cropped = copy.deepcopy(face)
        # Shift the landmarks so they are relative to the top-left corner of the box
        move = (-x1, -y1)
        l_eye = Utils.move_point(l_eye, move)
        r_eye = Utils.move_point(r_eye, move)
        nose = Utils.move_point(nose, move)
        mouth_l = Utils.move_point(mouth_l, move)
        mouth_r = Utils.move_point(mouth_r, move)
            
        # In the cropped image, the box starts at the origin
        f_cropped['box'] = (0, 0, w, h)
        f_img = frame[y1:y1+h, x1:x1+w].copy()
            
        f_cropped = Utils.set_keypoints(f_cropped, (l_eye, r_eye, nose, mouth_l, mouth_r))
        
        return (f_cropped, f_img)

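The detector class has a constructor with two parameters: min_size, the minimum face size in pixels, and min_confidence, the minimum confidence required to accept a detected object as a face. The detect method uses the internal MTCNN detector to get the faces in a frame and keeps only those whose confidence is at least the minimum value. The last method, extract, crops a face image out of the frame.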
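The filtering and cropping logic can be tried without loading the network. The sketch below works on made-up detections shaped like the dictionaries used in the code above (box, confidence, keypoints). It also clips the box to the frame before shifting the landmarks; that clipping is not part of the article's extract method and is added here on the assumption that a detector may occasionally report a box extending past the frame edge. All function names are hypothetical.

```python
import copy


def filter_by_confidence(faces, min_confidence):
    """Keep only detections whose confidence is at least min_confidence."""
    return [f for f in faces if f["confidence"] >= min_confidence]


def clamp_box(box, frame_w, frame_h):
    """Clip an (x, y, w, h) box to the frame and return the clipped box."""
    x, y, w, h = box
    x1, y1 = max(0, x), max(0, y)
    x2, y2 = min(frame_w, x + w), min(frame_h, y + h)
    return (x1, y1, max(0, x2 - x1), max(0, y2 - y1))


def to_crop_coordinates(face, frame_w, frame_h):
    """Return a copy of the detection expressed relative to its clipped box."""
    x, y, w, h = clamp_box(face["box"], frame_w, frame_h)
    cropped = copy.deepcopy(face)
    cropped["box"] = (0, 0, w, h)
    cropped["keypoints"] = {
        name: (px - x, py - y) for name, (px, py) in face["keypoints"].items()
    }
    return cropped, (x, y, w, h)


# Made-up detections; the values are for illustration only.
faces = [
    {"box": (-4, 10, 40, 50), "confidence": 0.99,
     "keypoints": {"left_eye": (8, 28), "right_eye": (26, 28), "nose": (17, 38),
                   "mouth_left": (10, 48), "mouth_right": (24, 48)}},
    {"box": (100, 80, 30, 30), "confidence": 0.42, "keypoints": {}},
]

kept = filter_by_confidence(faces, 0.5)
cropped, box = to_crop_coordinates(kept[0], frame_w=320, frame_h=240)
print(len(kept))                         # 1
print(box)                               # (0, 10, 36, 50)
print(cropped["keypoints"]["left_eye"])  # (8, 18)
```

The clipped box would then be used for the image slice, for example frame[y:y+h, x:x+w], so that the crop and the shifted landmarks stay consistent with each other.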

We also need the following Utils class:

class Utils:    
    @staticmethod
    def draw_face(face, color, frame, draw_points=True, draw_rect=True, n_data=None):
        (x1, y1, w, h) =  face['box']
        confidence = face['confidence']
        x2 = x1+w
        y2 = y1+h
        if draw_rect:
            cv2.rectangle(frame, (x1, y1), (x2, y2), color, 1)
        y3 = y1-12
        if not (n_data is None):
            (name, conf) = n_data
            text = name+ (" %.3f" % conf)
        else:
            text = "%.3f" % confidence
        
        cv2.putText(frame, text, (x1, y3), cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 1, cv2.LINE_AA)
        if draw_points:
            (l_eye, r_eye, nose, mouth_l, mouth_r) = Utils.get_keypoints(face)
            Utils.draw_point(l_eye, color, frame)
            Utils.draw_point(r_eye, color, frame)
            Utils.draw_point(nose, color, frame)
            Utils.draw_point(mouth_l, color, frame)
            Utils.draw_point(mouth_r, color, frame)
        
    @staticmethod
    def get_keypoints(face):
        keypoints = face['keypoints']
        l_eye = keypoints['left_eye']
        r_eye = keypoints['right_eye']
        nose = keypoints['nose']
        mouth_l = keypoints['mouth_left']
        mouth_r = keypoints['mouth_right']
        return (l_eye, r_eye, nose, mouth_l, mouth_r)
    
    @staticmethod
    def set_keypoints(face, points):
        (l_eye, r_eye, nose, mouth_l, mouth_r) = points
        keypoints = face['keypoints']
        keypoints['left_eye'] = l_eye
        keypoints['right_eye'] = r_eye
        keypoints['nose'] = nose
        keypoints['mouth_left'] = mouth_l
        keypoints['mouth_right'] = mouth_r
        
        return face
        
    @staticmethod
    def move_point(point, move):
        (x, y) = point
        (dx, dy) = move
        res = (x+dx, y+dy)
        return res
        
    @staticmethod
    def draw_point(point, color, frame):
        (x, y) =  point
        x1 = x-1
        y1 = y-1
        x2 = x+1
        y2 = y+1
        cv2.rectangle(frame, (x1, y1), (x2, y2), color, 1)
        
    @staticmethod
    def draw_faces(faces, color, frame, draw_points=True, draw_rect=True, names=None):
        for (i, face) in enumerate(faces):
            n_data = None
            if not (names is None):
                n_data = names[i]
            Utils.draw_face(face, color, frame, draw_points, draw_rect, n_data)

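In the MTCNN detector output, each face object is a dictionary with the following keys: box, confidence, and keypoints. The keypoints item is a dictionary containing the facial landmark data: left_eye, right_eye, nose, mouth_left, and mouth_right. The Utils class provides simple access to the face data and implements several functions to manipulate the data and to draw bounding boxes around faces in images.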
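To make the dictionary layout concrete, here is a small sketch that builds one detection by hand and reads it back. The numbers are made up for illustration, and the two helper functions are hypothetical rather than part of the Utils class.

```python
# A hand-written detection in the layout described above (made-up values).
face = {
    "box": (120, 60, 80, 100),   # x, y, width, height in pixels
    "confidence": 0.998,
    "keypoints": {
        "left_eye": (145, 100),
        "right_eye": (178, 99),
        "nose": (162, 120),
        "mouth_left": (148, 140),
        "mouth_right": (176, 139),
    },
}


def box_corners(face):
    """Convert the (x, y, w, h) box to top-left and bottom-right corners."""
    x, y, w, h = face["box"]
    return (x, y), (x + w, y + h)


def keypoints_inside_box(face):
    """Check that every landmark lies within the bounding box."""
    (x1, y1), (x2, y2) = box_corners(face)
    return all(
        x1 <= px <= x2 and y1 <= py <= y2
        for (px, py) in face["keypoints"].values()
    )


print(box_corners(face))           # ((120, 60), (200, 160))
print(keypoints_inside_box(face))  # True
```

The corner form is the one that cv2.rectangle expects, which is why draw_face computes x2 and y2 from the width and height before drawing.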

Face Detection in Images

Now we can write Python code to detect faces in an image:

d = MTCNN_Detector(30, 0.5)
print("Detector loaded.")

f_file = r"C:\PI_FR\frames\frame_5_02.png"
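fimg = cv2.imread(f_file)
# cv2.imread returns None when the file cannot be read
if fimg is None:
    raise FileNotFoundError(f_file)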

faces = d.detect(fimg)

for face in faces:
    print(face)

Utils.draw_faces(faces, (0, 0, 255), fimg, True, True)

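res_path = r"C:\PI_FR\detect"
# Make sure the output folder exists before writing the results
os.makedirs(res_path, exist_ok=True)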
f_base = os.path.basename(f_file)
r_file = os.path.join(res_path, f_base+"_detected.png")
cv2.imwrite(r_file, fimg)

for (i, face) in enumerate(faces):
    (f_cropped, f_img) = d.extract(fimg, face)
    Utils.draw_faces([f_cropped], (255, 0, 0), f_img, True, False)
    dfname = os.path.join(res_path, f_base + ("_%06d" % i) + ".png")
    cv2.imwrite(dfname, f_img)

Running the code above produces this image in the detect folder.

As you can see, the detector found all three faces with a good confidence of about 99%. We also get the cropped faces in the same directory.
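Note that the listing appends its suffixes to the full base name of the input file, so the results carry names such as frame_5_02.png_detected.png. If cleaner names are preferred, one possible variant (a sketch only, with a hypothetical helper name and a hypothetical input path) splits off the extension first:

```python
import os


def result_names(image_path, face_count):
    """Build output file names from the input name without doubling the extension."""
    stem, ext = os.path.splitext(os.path.basename(image_path))
    annotated = stem + "_detected" + ext
    crops = [f"{stem}_{i:06d}{ext}" for i in range(face_count)]
    return annotated, crops


# Hypothetical input path, used only to show the resulting names.
annotated, crops = result_names("frames/frame_5_02.png", 3)
print(annotated)  # frame_5_02_detected.png
print(crops[0])   # frame_5_02_000000.png
print(len(crops)) # 3
```

The returned names can then be joined with the output folder using os.path.join, exactly as in the listing.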

Running the same code on different frames lets us test detection in various cases. Here are the results for two frames.

The results show that the detector is able to find faces with glasses, and it also successfully detects a baby's face.

Face Detection in Video

Having tested the detector on separate images, let's now write code to detect faces in video:

class VideoFD:    
    def __init__(self, detector):
        self.detector = detector
    
    def detect(self, video, save_path = None, align = False, draw_points = False):
        detection_num = 0
        capture = cv2.VideoCapture(video)

        dname = 'AI face detection'
        cv2.namedWindow(dname, cv2.WINDOW_NORMAL)
        cv2.resizeWindow(dname, 960, 720)
        
        frame_count = 0
        dt = 0
        face_num = 0
        # Capture all frames
        while True:
            (ret, frame) = capture.read()
            # Stop at the end of the video or on a read error
            if not ret or frame is None:
                break
            frame_count = frame_count+1
            
            t1 = time.time()
            faces = self.detector.detect(frame)
            t2 = time.time()
            p_count = len(faces)
            detection_num += p_count
            dt = dt + (t2-t1)
            
            if (not (save_path is None)) and (len(faces)>0) :
                f_base = os.path.basename(video)
                for (i, face) in enumerate(faces):
                    (f_cropped, f_img) = self.detector.extract(frame, face)
                    if (not (f_img is None)) and (not f_img.size==0):
                        if draw_points:
                            Utils.draw_faces([f_cropped], (255, 0, 0), f_img, draw_points, False)
                        face_num = face_num+1
                        dfname = os.path.join(save_path, f_base + ("_%06d" % face_num) + ".png") 
                        cv2.imwrite(dfname, f_img)
            
            if len(faces)>0:
                Utils.draw_faces(faces, (0, 0, 255), frame)
            
            # Display the resulting frame
            cv2.imshow(dname,frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
            
        capture.release()
        cv2.destroyAllWindows()    
        
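        # Average FPS over detection time only; guard against an empty or unreadable video
        fps = frame_count/dt if dt > 0 else 0.0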
        
        return (detection_num, fps)

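The VideoFD class simply wraps our MTCNN detector implementation and feeds it the frames extracted from a video file. It uses the VideoCapture class from the OpenCV library.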
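The FPS value returned by the class counts only the time spent inside the detector, not the time used for reading, drawing, or saving frames. The following sketch isolates that measurement so it can be run without a video file or a model. The function names are hypothetical, the stand-in detector simply sleeps for a few milliseconds, and the integers in the frame list are placeholders for real frames.

```python
import time


def measure_detection_fps(frames, detect):
    """Return (total detections, frames per second of detection time only)."""
    frame_count = 0
    detection_num = 0
    elapsed = 0.0
    for frame in frames:
        t1 = time.perf_counter()
        faces = detect(frame)
        elapsed += time.perf_counter() - t1
        frame_count += 1
        detection_num += len(faces)
    # Guard against an empty sequence, where no time was measured
    fps = frame_count / elapsed if elapsed > 0 else 0.0
    return detection_num, fps


def fake_detect(frame):
    """Stand-in detector: sleeps briefly and returns `frame` dummy faces."""
    time.sleep(0.005)
    return [{"confidence": 0.99}] * frame


detections, fps = measure_detection_fps([1, 0, 2], fake_detect)
print(detections)  # 3
print(fps > 0)     # True
```

Because display and file output are excluded, the end-to-end frame rate seen on screen can be lower than the reported value.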

We can launch the video detector with the following code:

d = MTCNN_Detector(50, 0.95)
vd = VideoFD(d)
v_file = r"C:\PI_FR\video\5_3.mp4"

save_path = r"C:\PI_FR\detect"
(f_count, fps) = vd.detect(v_file, save_path, False, False)

print("Face detections: "+str(f_count))
print("FPS: "+str(fps))

Here is the resulting video captured from the screen:

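The test shows good results: faces are detected in most frames of the video file. The processing speed is about 20 FPS on a Core i7 CPU. That is impressive for a task as difficult as face detection.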
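To put the 20 FPS figure in perspective, it can be converted to a per-frame processing time and compared with the time available between frames of a source. The source frame rates below are hypothetical examples, since the article does not state the frame rate of the test video.

```python
def frame_time_ms(fps):
    """Time per frame in milliseconds for a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def keeps_up(detector_fps, source_fps):
    """True if the detector needs no more time per frame than the source provides."""
    return frame_time_ms(detector_fps) <= frame_time_ms(source_fps)


DETECTOR_FPS = 20.0  # measured value reported in the article
print(frame_time_ms(DETECTOR_FPS))   # 50.0
print(keeps_up(DETECTOR_FPS, 15.0))  # True  (hypothetical 15 FPS camera)
print(keeps_up(DETECTOR_FPS, 25.0))  # False (hypothetical 25 FPS video)
```

In other words, 50 ms per frame is enough to process every frame of a slower source, while a faster source would require skipping frames or a quicker detector.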

Next Steps

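It looks like we can use the MTCNN detector for real-time video detection. Our final goal is to run the detector on a low-power edge device. Before we start experimenting with edge devices, we must implement another part of the face recognition pipeline: face alignment. In the next article, we'll explain how to perform alignment based on the facial landmarks found by the detector. Stay tuned!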