Arm-Based Driver Distraction Detection

Sergey L. Gladkiy

February 7, 2022

CPOL

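How to build a driver distraction detector and run it on Arm-based devices such as the Raspberry Pi or Jetson Nano.

Introduction

When we talk about computer vision (CV) based on artificial intelligence (AI) or deep learning (DL), we usually picture a powerful desktop or server processing images or video. Sometimes, though, we need to run complex CV algorithms on a portable device.

For example, to build a computer system that guards against driver distraction, the most practical solution is a standalone device running dedicated software. A driver, fleet manager, or manufacturer can place it in the vehicle to alert drivers when they may be distracted.

So, can we run complex algorithms on a portable Arm device? In this article, we demonstrate how to create a distracted driver detector and show how to run it on a Raspberry Pi. We develop the program in Python, use OpenCV for the computer vision algorithms, and use convolutional neural networks (CNNs) to detect possible driver distraction.

Designing the Algorithm

We use a simple type of detection: checking whether the eyes stay closed for a short period of time. Many other symptoms can characterize distraction, but this one is probably the most reliable.

Modern AI algorithms handle this task easily. One approach is to use a specialized CNN to detect so-called facial landmarks. The figure below shows the widely used 68-point facial landmark map.

Using the coordinates of the eye landmarks, we can compute the eye's aspect ratio (the ratio of its height to its width). When the eye closes, this ratio drops significantly. By tracking it over time, we can detect moments of potential distraction.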
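
To make the idea concrete, here is a minimal sketch of one common formulation, the eye aspect ratio computed from six eye contour points. It is not the method implemented later in this article, which fits a rectangle around the eye points instead. The function name, the point ordering, and the sample coordinates are assumptions chosen purely for illustration.

```python
import math

def eye_aspect_ratio(eye):
    """Illustrative only. eye is six (x, y) points ordered p1..p6:
    p1 and p4 are the corners, p2 and p3 lie on the upper lid,
    p6 and p5 lie on the lower lid."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)

# Made-up coordinates: the same eye, wide open and almost closed.
open_eye = [(0, 0), (2, 2), (4, 2), (6, 0), (4, -2), (2, -2)]
closed_eye = [(0, 0), (2, 0.3), (4, 0.3), (6, 0), (4, -0.3), (2, -0.3)]

print(eye_aspect_ratio(open_eye))
print(eye_aspect_ratio(closed_eye))
```

With these sample points, the ratio for the nearly closed eye is several times smaller than for the open eye, which is the effect a threshold can exploit.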

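A common way to obtain facial landmarks is to detect the face bounding box (the box surrounding the face) and then locate the landmark coordinates inside it. The algorithm therefore needs two components: a face detector and a landmark estimator. We use deep neural networks (DNNs) for both subtasks. You can find the face detection TensorFlow model on GitLab. For the facial landmark estimator, we use this Caffe model.

Detecting Facial Landmarks

Let's start coding the facial landmark detection algorithm. We begin with a face detector based on a DNN model.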

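# Imports used by the listings in this article.
# Note: Utils is a helper class that is not listed in this article.
import os
import time

import cv2
import numpy as np


class TF_FD: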
    def __init__(self, model, graph, min_size, min_confidence):
        self.min_size = min_size
        self.min_confidence = min_confidence
        self.detector = cv2.dnn.readNetFromTensorflow(model, graph)
        l_names = self.detector.getLayerNames()
        if len(l_names)>0:
            print('Face detector loaded:')
        else:
            print('Face detector loading FAILED')
        
    def detect(self, frame):
        width = frame.shape[1]
        height = frame.shape[0]
        
        inputBlob = cv2.dnn.blobFromImage(frame, 1.0, (300, 300),
                                          (104.0, 177.0, 123.0), True, False)

        self.detector.setInput(inputBlob, 'data')
        detection = self.detector.forward('detection_out')

        n = detection.shape[2]
        
        detected = []
        for i in range(n):
            conf = detection[0, 0, i, 2]
            if conf >= self.min_confidence:
                x1 = detection[0, 0, i, 3]
                y1 = detection[0, 0, i, 4]
                x2 = detection[0, 0, i, 5]
                y2 = detection[0, 0, i, 6]
                # skip faces out of the frame
                if Utils.point_is_out(x1,y1) or Utils.point_is_out(x2, y2):
                    continue
                fw = (x2-x1)*width
                fh = (y2-y1)*height
                if (fw>=self.min_size) and (fh>=self.min_size):
                    r = (x1, y1, x2, y2)
                    d = (conf, r)
                    detected.append(d)
        
        return detected

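This simple class provides a constructor that loads a TensorFlow neural network from the specified model and graph files. The cv2.dnn module of the OpenCV framework provides functions for loading DNN models in various popular formats. The constructor takes two additional parameters: the minimum face size and the minimum detection confidence.

The class's detect method takes one argument, frame, which is an image or a video frame. It creates a blob object (a special 4D array that we use as the detector's input).

Note that we pass some model-specific values to the blobFromImage function. If you use a different face detection model, remember to change these values as needed.

Next, we run the detector by calling the forward method and extract the data (detection confidence and bounding box) for every face that meets our criteria (minimum size and confidence).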
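
The listings call a Utils helper class that is not shown in the article. The sketch below is a hypothetical stand-in for the three coordinate helpers, inferred only from how they are called (relative coordinates in the range 0 to 1). The real implementation may differ, and the drawing helpers draw_faces and draw_points are left out.

```python
class Utils:
    """Hypothetical stand-in for the Utils helper (assumed behavior)."""

    @staticmethod
    def point_is_out(x, y):
        # A relative coordinate is inside the frame only within [0, 1].
        return x < 0.0 or x > 1.0 or y < 0.0 or y > 1.0

    @staticmethod
    def fit(v):
        # Clamp a relative coordinate to the range [0, 1].
        return max(0.0, min(1.0, v))

    @staticmethod
    def rect_to_abs(rect, width, height):
        # Convert a relative (x1, y1, x2, y2) box to integer pixel coordinates.
        (x1, y1, x2, y2) = rect
        return (int(x1 * width), int(y1 * height),
                int(x2 * width), int(y2 * height))


print(Utils.rect_to_abs((0.25, 0.5, 0.75, 1.0), 640, 480))
```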

Next, we develop the second class, the facial landmark detector:

class CAFFE_FLD:    
    def __init__(self, model, proto):
        self.detector = cv2.dnn.readNetFromCaffe(proto, model)
        l_names = self.detector.getLayerNames()
        if len(l_names)>0:
            print('Face landmarks detector loaded:')
        else:
            print('Face landmarks detector loading FAILED')
    
    def get_face_rect(self, frame, face):
        width = frame.shape[1]
        height = frame.shape[0]
        
        (conf, rect) =  face
        (x1, y1, x2, y2) = rect
        fw = (x2-x1)*width
        fh = (y2-y1)*height
        
        if fw>fh:
            dx = (fw-fh)/(2*width)
            x1 = x1+dx
            x2 = x2-dx
        else:
            dy = (fh-fw)/(2*height)
            y1 = y1+dy
            y2 = y2-dy
        
        x1 = Utils.fit(x1)
        y1 = Utils.fit(y1)
        x2 = Utils.fit(x2)
        y2 = Utils.fit(y2)
        
        rect = (x1, y1, x2, y2)
        
        return rect
    
    def get_frame_points(self, face_rect, face_points):
        (x1, y1, x2, y2) = face_rect
        fw = (x2-x1)
        fh = (y2-y1)
        
        n = len(face_points)
        frame_points = []
        
        for i in range(n):
            v = face_points[i]
            if (i % 2) == 0:
                dv = x1
                df = fw
            else:
                dv = y1
                df = fh
            v = dv+v*df
            frame_points.append(v)
            
        return frame_points
    
    def get_face_image(self, frame, face):
        width = frame.shape[1]
        height = frame.shape[0]
        
        (conf, rect) =  face
        (x1, y1, x2, y2) = rect
        
        rect = self.get_face_rect(frame, face)
        (xi1, yi1, xi2, yi2) = Utils.rect_to_abs(rect, width, height)
        
        roi = frame[yi1:yi2, xi1:xi2]
        # Frames read by OpenCV are in BGR channel order
        gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
        resized = cv2.resize(gray, (60, 60), interpolation=cv2.INTER_CUBIC)
        
        return (rect, gray, resized)
    
    def detect(self, f_img):
        width = f_img.shape[1]
        height = f_img.shape[0]
        
        inputBlob = cv2.dnn.blobFromImage(f_img, 1/127.5, (60, 60), (127.5))
        self.detector.setInput(inputBlob, 'data');
        
        detection = self.detector.forward();
        points = detection[0]
        
        return points

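This class also loads a DNN model on initialization, but it uses a different function because the model is in the Caffe format. The main method, detect, again creates a blob and runs the neural network to obtain the facial landmarks. Here, detect receives not the whole frame but a specially processed part of the frame containing a single face.

We can produce this "face image" with the get_face_image method, which is designed for exactly this purpose. It finds the square box containing the face, crops it from the frame, converts the blue, green, red (BGR) data to grayscale (because our DNN model was trained on grayscale images), and resizes the image to 60x60 pixels using a high-quality interpolation method.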
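
The coordinate handling around the crop can be followed with plain numbers. The sketch below restates, as standalone functions with illustrative names, the arithmetic used by get_face_rect (shrinking the longer side so the box is square in pixels) and get_frame_points (mapping face-relative landmarks back to frame-relative coordinates). It omits the clamping step and uses made-up input values.

```python
def square_face_rect(rect, width, height):
    """Shrink the longer side of a relative (x1, y1, x2, y2) box
    so that the box is square when measured in pixels."""
    (x1, y1, x2, y2) = rect
    fw = (x2 - x1) * width
    fh = (y2 - y1) * height
    if fw > fh:
        dx = (fw - fh) / (2 * width)
        x1, x2 = x1 + dx, x2 - dx
    else:
        dy = (fh - fw) / (2 * height)
        y1, y2 = y1 + dy, y2 - dy
    return (x1, y1, x2, y2)

def to_frame_coords(face_rect, face_points):
    """Map a flat [x0, y0, x1, y1, ...] list of points, given relative
    to the face box, to coordinates relative to the whole frame."""
    (x1, y1, x2, y2) = face_rect
    out = []
    for i, v in enumerate(face_points):
        if i % 2 == 0:
            out.append(x1 + v * (x2 - x1))   # even index: x value
        else:
            out.append(y1 + v * (y2 - y1))   # odd index: y value
    return out

# A wide box on a 400x400 frame becomes a 100x100 pixel square.
square = square_face_rect((0.25, 0.25, 0.75, 0.5), 400, 400)
print(square)
# The center of the face box and its bottom-left corner in frame coordinates.
print(to_frame_coords(square, [0.5, 0.5, 0.0, 1.0]))
```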

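Running Landmark Detection on the Raspberry Pi

Now that we have designed the facial landmark detector, we should test it on an Arm device to verify that it runs at a sufficiently high frame rate (frames per second, FPS). We test on a Raspberry Pi 4 Model B. We installed the Python OpenCV framework on the device from a precompiled binary package. If you use a different device, install the packages by following the guide for that device.

In this article, we don't use any special AI framework, and the neural networks run without GPU or TPU acceleration. All ML workloads therefore run on the device's CPU only.

We use a video file for all tests to keep the experiments reproducible. The video was shot in an office but imitates a driving scenario.

The following class runs facial landmark detection on a video file: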

class VideoFLD:    
    def __init__(self, fd, fld):
        self.fd = fd
        self.fld = fld
    
    def process(self, video):
        frame_count = 0
        detection_num = 0;
        dt = 0
        dt_l = 0
        
        capture = cv2.VideoCapture(video)
        img = None

        dname = 'Arm-Powered Driver Distraction Detection'
        cv2.namedWindow(dname, cv2.WINDOW_NORMAL)
        cv2.resizeWindow(dname, 720, 720)
        
        # Capture all frames
        while(True):    
            (ret, frame) = capture.read()
            if frame is None:
                break
            frame_count = frame_count+1
            
            # work with square images
            width = frame.shape[1]
            height = frame.shape[0]
            if not (width == height):
                dx = int((width-height)/2)
                frame = frame[0:height, dx:dx+height]
            
            t1 = time.time()
            faces = self.fd.detect(frame)
            t2 = time.time()
            dt = dt + (t2-t1)
            
            f_count = len(faces)
            detection_num += f_count
            
            draw_points = []
            if (f_count>0):
                for (i, face) in enumerate(faces):
                    t1 = time.time()
                    (fi_rect, fi_gray, fi_resized) = self.fld.get_face_image(frame, face)
                    points = self.fld.detect(fi_resized)
                    frame_points = self.fld.get_frame_points(fi_rect, points)
                    t2 = time.time()
                    dt_l = dt_l + (t2-t1)
                    draw_points.append(frame_points)
    
            if len(faces)>0:
                Utils.draw_faces(faces, (255, 0, 0), 1, frame, True)
            if len(draw_points)>0:
                for (i, points) in enumerate(draw_points):
                    Utils.draw_points(points, (0, 0, 255), 1, frame)
            
            # Display the resulting frame
            cv2.imshow(dname,frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
            
        capture.release()
        cv2.destroyAllWindows()    
        
        fps = 0.0
        if dt>0:
            fps = frame_count/dt
            
        fps_l = 0.0
        if dt_l>0:
            fps_l = detection_num/dt_l
        
        return (detection_num, fps, fps_l)

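Here, we use our face and landmark detectors to provide the main functionality. We use the VideoCapture class from the OpenCV library to read frames from the video file and feed them to the detectors.

Now we can run the algorithm with the following code: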

w_path = '/home/pi/Desktop/PI_DD'
n_path = os.path.join(w_path, 'net')
fd_model = os.path.join(n_path, 'opencv_face_detector_uint8.pb')
fd_graph = os.path.join(n_path, 'opencv_face_detector.pbtxt')
fd = TF_FD(fd_model, fd_graph, 30, 0.5)

fld_model = os.path.join(n_path, 'face_landmarks.caffemodel')
fld_proto = os.path.join(n_path, 'face_landmarks.prototxt')
fld = CAFFE_FLD(fld_model, fld_proto)

v_path = os.path.join(w_path, 'video')
v_name = 'v_1.mp4'
v_file = os.path.join(v_path, v_name)
vfld = VideoFLD(fd, fld)

(detection_num, fps, fps_l) = vfld.process(v_file)

print("Face detections: "+str(detection_num))
print("Detection FPS: "+str(fps))
print("Landmarks FPS: "+str(fps_l))

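You can see the on-screen result in the following video.

Our facial landmark detection algorithm works well and locates the reference points with reasonable accuracy. It gives us a face detection speed of about 2 FPS and a landmark estimation speed of about 60 FPS. That's definitely usable, and quite good considering that we use only the Pi's CPU.

This speed should be enough to detect closed eyes within three seconds, which suits real-world cases of driver distraction. So it should suffice for our distraction detection task.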
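
As a rough back-of-the-envelope check of that claim, the arithmetic can be written out as follows. The values are the approximate figures quoted above, not new measurements.

```python
detection_fps = 2.0   # approximate face detection rate reported above
window_s = 3.0        # closed-eye interval we want to catch, in seconds

period_s = 1.0 / detection_fps           # time between two measurements
samples = int(detection_fps * window_s)  # measurements inside the window

print(period_s, samples)
```

At about 2 FPS, a three-second window contains roughly six measurements taken half a second apart, so a prolonged closure is observed several times before the interval ends.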

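Implementing Driver Distraction Detection

Only one step remains to complete the distracted driver detection algorithm: writing the code that evaluates the eye aspect ratio and tracks it to assess moments of potential distraction.

First, we add two simple methods to the CAFFE_FLD class: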

    def get_eye_points(self, face_points, eye_id):
        i0 = 72
        i1 = i0+12*(eye_id-1)
        i2 = i1+12
            
        eye_points = face_points[i1:i2]
        return eye_points
    
    def get_eye_ratio(self, eye):
        n = int(len(eye)/2)
        pts = np.array(eye, dtype=np.float32)
        pts = pts.reshape([n, 2])
        
        rect = cv2.minAreaRect(pts)
        (w, h) = rect[1]
        
        if (w>h):
            ratio = h/w
        else:
            ratio = w/h
        
        return ratio

The get_eye_points method extracts the points of one eye from the array of 68 facial landmarks. The get_eye_ratio method evaluates the eye's aspect ratio.
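
The index arithmetic in get_eye_points is easier to follow with numbers. The sketch below assumes that the landmark array is flat, in the form [x0, y0, x1, y1, ...], and that each eye has six points starting at point 36. This is what the constants 72 and 12 imply, but it is an inference from the code rather than something stated by the model's authors. The function name and parameters are illustrative.

```python
def eye_point_range(eye_id, first_eye_point=36, points_per_eye=6):
    """Return the (start, stop) slice bounds into the flat landmark array."""
    i0 = 2 * first_eye_point                      # 72: two values per point
    i1 = i0 + 2 * points_per_eye * (eye_id - 1)   # 12 values per eye
    return (i1, i1 + 2 * points_per_eye)

for eye_id in (1, 2):
    (start, stop) = eye_point_range(eye_id)
    # Convert the flat indices back to landmark point numbers.
    print(eye_id, (start, stop), (start // 2, stop // 2 - 1))
```

Under these assumptions, eye 1 covers flat indices 72 to 83 (points 36 to 41) and eye 2 covers flat indices 84 to 95 (points 42 to 47).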

Now we can write the code that tracks the ratio value and detects moments of potential distraction.

class DERD:    
    def __init__(self, ratio_thresh, delta_time, eyes=2):
        self.ratio_thresh = ratio_thresh
        self.delta_time = delta_time
        self.eyes = eyes
        self.eye_closed_time = 0.0
        self.last_time = 0.0
    
    def start(self, time):
        self.eye_closed_time = 0.0
        self.last_time = time
    
    def detect(self, eye1_ratio, eye2_ratio, time):
        dt = time - self.last_time
        distraction = False
        
        d1 = (eye1_ratio<self.ratio_thresh)
        d2 = (eye2_ratio<self.ratio_thresh)
        
        if self.eyes == 2:
            d = d1 and d2
        else:
            d = d1 or d2
        
        if d:
            self.eye_closed_time += dt
        else:
            self.eye_closed_time -= dt
            
        if self.eye_closed_time<0.0:
            self.eye_closed_time = 0.0
            
        print('Eye 1: '+str(eye1_ratio))
        print('Eye 2: '+str(eye2_ratio))
        print('Eye closed time = '+str(self.eye_closed_time))
            
        if self.eye_closed_time>=self.delta_time:
            distraction = True
            self.start(time)
        
        self.last_time = time
        return distraction

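The ratio_thresh parameter is the aspect ratio threshold below which we treat an eye as closed. The delta_time parameter specifies how long the eyes must stay closed before we report a distraction. The eyes parameter determines whether both eyes (eyes=2) or just one eye (any other value) must be closed for the state to count as distraction.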
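
To see how the accumulated closed-eye time behaves, here is a condensed, standalone restatement of the same accumulation logic in both-eyes mode, fed with made-up samples. The class name ClosedEyeTimer and the sample values are illustrative assumptions, not part of the article's code.

```python
class ClosedEyeTimer:
    """Condensed restatement of the DERD logic (both eyes must be closed)."""

    def __init__(self, ratio_thresh, delta_time):
        self.ratio_thresh = ratio_thresh
        self.delta_time = delta_time
        self.closed_time = 0.0
        self.last_time = 0.0

    def update(self, r1, r2, t):
        dt = t - self.last_time
        self.last_time = t
        closed = r1 < self.ratio_thresh and r2 < self.ratio_thresh
        # Add elapsed time while closed, subtract it while open, never below 0.
        self.closed_time = max(0.0, self.closed_time + (dt if closed else -dt))
        if self.closed_time >= self.delta_time:
            self.closed_time = 0.0   # restart the count after an alarm
            return True
        return False


timer = ClosedEyeTimer(ratio_thresh=0.2, delta_time=2.0)
# (time in seconds, eye ratio): a short blink, then a long closure.
samples = [(0.5, 0.35), (1.0, 0.10), (1.5, 0.35), (2.0, 0.10),
           (2.5, 0.10), (3.0, 0.10), (3.5, 0.10), (4.0, 0.35)]
alarms = [t for (t, r) in samples if timer.update(r, r, t)]
print(alarms)
```

In this run, the blink at 1.0 s is cancelled out by the next open-eye sample, while the closure that starts at 2.0 s accumulates until the two-second threshold is reached.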

Finally, we slightly modify the video detector to include this distraction detection algorithm and to raise an alarm when a detection occurs.

class VideoDDD:    
    def __init__(self, fd, fld, eye_ratio_thresh=0.2, eyes=2, delta_time=2.0):
        self.fd = fd
        self.fld = fld
        self.derd = DERD(eye_ratio_thresh, delta_time, eyes)
    
    def process(self, video):
        frame_count = 0
        detection_num = 0;
        dt = 0
        dt_l = 0
        
        capture = cv2.VideoCapture(video)
        img = None

        dname = 'Arm-Powered Driver Distraction Detection'
        cv2.namedWindow(dname, cv2.WINDOW_NORMAL)
        cv2.resizeWindow(dname, 720, 720)
        
        # just suppose FPS=25
        delta = 0.040
        
        dd_time = -1000
        
        draw_points = []
        faces = []
        
        # Capture all frames
        while(True):    
            frame_t1 = time.time()
            
            (ret, frame) = capture.read()
            if frame is None:
                break
            frame_count = frame_count+1
            frame_time = (frame_count-1)*delta
            
            if frame_count==1:
                self.derd.start(frame_time)
            
            # work with square images
            width = frame.shape[1]
            height = frame.shape[0]
            if not (width == height):
                dx = int((width-height)/2)
                frame = frame[0:height, dx:dx+height]
            
            f_count = 0
            if (frame_count % 10) == 0:
                faces = []
                draw_points = []
                t1 = time.time()
                faces = self.fd.detect(frame)
                t2 = time.time()
                dt = dt + (t2-t1)
                f_count = len(faces)
                detection_num += 1
            
            distraction = False
            
            if (f_count>0):
                # supposed one face at the camera
                face = faces[0]
                t1 = time.time()
                (fi_rect, fi_gray, fi_resized) = self.fld.get_face_image(frame, face)
                points = self.fld.detect(fi_resized)
                frame_points = self.fld.get_frame_points(fi_rect, points)
                t2 = time.time()
                dt_l = dt_l + (t2-t1)
                    
                draw_points.append(frame_points)
                    
                eye1 = self.fld.get_eye_points(frame_points, 1)
                eye2 = self.fld.get_eye_points(frame_points, 2)
                #draw_points.append(eye1)
                #draw_points.append(eye2)
                    
                r1 = self.fld.get_eye_ratio(eye1)
                r2 = self.fld.get_eye_ratio(eye2)
                    
                distraction = self.derd.detect(r1, r2, frame_time)
    
            if len(faces)>0:
                Utils.draw_faces(faces, (255, 0, 0), 1, frame, True)
            if len(draw_points)>0:
                for (i, points) in enumerate(draw_points):
                    Utils.draw_points(points, (0, 0, 255), 1, frame)
            
            # Show distraction alarm for 1 second
            if distraction:
                dd_time = frame_time
            
            if dd_time>0:
                text = "ALARM! DRIVER DISTRACTION"
                xd1 = 10
                yd1 = 50
                cv2.putText(frame, text, (xd1, yd1), \
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 1, cv2.LINE_AA)
                if (frame_time-dd_time)>1.0:
                    dd_time = -1000
                
            
            # Display the resulting frame
            cv2.imshow(dname,frame)
            
            frame_t2 = time.time()
            frame_dt = frame_t2 - frame_t1
            if frame_dt<delta:
                frame_dt = delta-frame_dt
                #print('Sleep='+str(frame_dt))
                time.sleep(frame_dt)
            
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
            
        capture.release()
        cv2.destroyAllWindows()    
        
        fps = 0.0
        if dt>0:
            fps = detection_num/dt
            
        fps_l = 0.0
        if dt_l>0:
            fps_l = detection_num/dt_l
        
        return (detection_num, fps, fps_l)

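Besides using the DERD class, we made small adjustments to the frame processing algorithm. We added a timestamp to each frame so we can estimate the duration of a potential distraction. In addition, we now process only every tenth frame to simulate near-real-time processing.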
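
The timing scheme can be illustrated without any video. The sketch below lists which frames would be analyzed and the timestamps they receive, assuming the same fixed 25 FPS and step of ten frames as the listing above. The function name is illustrative.

```python
def processed_frames(total_frames, fps=25.0, step=10):
    """Return (frame_number, timestamp_s) for the frames that get analyzed."""
    delta = 1.0 / fps   # assumed time between consecutive frames
    return [(n, (n - 1) * delta)
            for n in range(1, total_frames + 1) if n % step == 0]

frames = processed_frames(50)
for (n, t) in frames:
    print(n, round(t, 2))
```

Under these assumptions, consecutive measurements are 0.4 seconds apart, so the detector effectively samples the eye state 2.5 times per second.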

Now we can run the finished distracted driver detection algorithm with the following code:

w_path = '/home/pi/Desktop/PI_DD'
n_path = os.path.join(w_path, 'net')
fd_model = os.path.join(n_path, 'opencv_face_detector_uint8.pb')
fd_graph = os.path.join(n_path, 'opencv_face_detector.pbtxt')
fd = TF_FD(fd_model, fd_graph, 30, 0.5)

fld_model = os.path.join(n_path, 'face_landmarks.caffemodel')
fld_proto = os.path.join(n_path, 'face_landmarks.prototxt')
fld = CAFFE_FLD(fld_model, fld_proto)

v_path = os.path.join(w_path, 'video')
v_name = 'v_1.mp4'
v_file = os.path.join(v_path, v_name)
vddd = VideoDDD(fd, fld, 0.3, 1, 2.0)

(detection_num, fps, fps_l) = vddd.process(v_file)

print("Face detections: "+str(detection_num))
print("Detection FPS: "+str(fps))
print("Landmarks FPS: "+str(fps_l))

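As you can see, the algorithm correctly handles the case where the eyes stay closed for a long enough interval and raises an alarm.

This helps us catch one cause of driver distraction: when drivers look down at a phone in their lap, our drowsiness detector flags them as distracted because it sees only their eyelids. As many regions around the world ban phone use while driving, drivers try to adapt by keeping the phone out of sight. Our distraction detector catches them by detecting when their eyes appear not fully open.

Conveniently, the algorithm can also detect driver drowsiness. Whether the driver's eyes only appear closed because they are looking down at a phone, or are actually closed because the driver is tired or asleep, our device should raise an alarm.

The algorithm also correctly handles cases where the eyes close briefly (for example, when the driver blinks) or the head tilts momentarily.

Next Steps

We have implemented one driver distraction algorithm based on facial landmarks, but we could add others. For example, we could detect when the driver turns their head by measuring the angle between nose landmarks. We could also check whether the driver's mouth is opening and closing by comparing the distance between the upper and lower mouth landmarks over time. If it is, the driver may be talking or eating while driving.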
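
As a starting point for the mouth check, the sketch below computes a mouth opening ratio from a flat landmark array. The point indices assume the common iBUG 68-point ordering, where the inner lip uses points 60 to 67. The Caffe model used in this article may order its outputs differently, so treat the indices, the function name, and the sample coordinates as assumptions to verify.

```python
import math

def mouth_open_ratio(points, top=62, bottom=66, left=60, right=64):
    """points is a flat [x0, y0, x1, y1, ...] landmark array.
    The default indices assume the iBUG 68-point ordering (an assumption)."""
    def pt(i):
        return (points[2 * i], points[2 * i + 1])
    gap = math.dist(pt(top), pt(bottom))     # distance between the lips
    width = math.dist(pt(left), pt(right))   # mouth width, for normalization
    return gap / width if width > 0 else 0.0

# Made-up landmarks: only the four mouth points used above are filled in.
points = [0.0] * 136
for (idx, (x, y)) in {60: (0.4, 0.7), 64: (0.6, 0.7),
                      62: (0.5, 0.68), 66: (0.5, 0.74)}.items():
    points[2 * idx], points[2 * idx + 1] = x, y

print(mouth_open_ratio(points))
```

Tracking this ratio over time, in the same way the eye ratio is tracked, would show whether the mouth keeps opening and closing.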

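To go further, we could consider upgrading to an ML model capable of iris detection and try to determine when the driver's eyes are not on the road.

In this article, we demonstrated how simple it is to develop an AI computer vision application for a portable Arm device. We chose this solution for practical reasons, because our driver distraction detection system must work autonomously in a moving car. We showed that the application can run on an Arm device in real-time mode at a processing speed of about 2 FPS.

Even so, there are many aspects we could investigate to improve this driver distraction detection system. For example, can we increase the FPS? To answer that, we should focus on the slowest part of the application: face detection with the TensorFlow neural network. Can we improve this model's performance? Yes. We can use Arm's Arm NN library, which Arm developed specifically to accelerate DNN model processing on Arm devices.

With the Arm NN library, we can also run NN models on an attached GPU or NPU to reach near-real-time speed. That would give us more flexibility to invent more advanced driver distraction detection algorithms, or to use other face detection DNN models, such as the BlazeFace neural network model.

Other improvements to our solution could involve new distraction criteria. For example, we could infer that drivers are probably distracted if their eyes or head are directed elsewhere for longer than a predefined interval.

We hope these ideas have sparked your interest. We encourage you to build on this work or to create your own portable AI solutions on Arm devices.

History

February 7, 2022: Initial version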