Adding Object Detection with TensorFlow to a Robotics Project

Phil Hopley

2 Dec 2018

CPOL

In this article, we will add AI to an existing ROS (Robot Operating System) home robot.

Download source code - 2.4 MB

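Introduction

I am currently documenting the development of a hobby robot project on this site in a series of articles called "Rodney - A long time coming autonomous robot". On my desk is a stack of Post-it notes with ideas for future development. One of them reads "AI TensorFlow object detection". I can't remember when or why I wrote that note, but as Code Project is currently running the "AI TensorFlow Challenge", now is the time to look into the subject.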

A robot's view

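Background

My robot uses the Robot Operating System (ROS). It is the de facto standard for robot programming, and in this article we will integrate TensorFlow into a ROS package. I'll keep the details of the ROS code to a minimum, but if you want to know more, I suggest you visit the Robot Operating System site and read my articles on Rodney.

Rodney is already capable of moving his head and looking around, and of greeting family members he recognises. To do this, we use OpenCV's face detection and recognition functions. We will use TensorFlow in a similar manner to detect objects around the home, such as the family pet.

Eventually, the robot will be able to navigate around the home looking for a particular family member to deliver a message to. Likewise, imagine you are running late from work and want to check on the family dog. Using a web interface, you could instruct the robot to find the dog and show you a video feed of what it is doing. In our house, we like to say our dog is not spoiled, she is loved. That means she gets to sleep in any room, so the robot will need to navigate around the house. For this article, though, we will only look for the dog within the range of the robot's head movement.

If you are not interested in ROS or robotics, I have included a non-ROS test script that runs just the code in the object detection library.

I recently read a book which argued that in the future every software engineer will need a working knowledge of deep learning. With so many other fast-moving technologies competing for your attention, that is a bold statement. So if you don't know your "rank-2 tensors" from your "loss functions", how do you get started?

In this article, I'll show how to get a complete, working system up and running with limited time and limited TensorFlow knowledge.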

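Standing on the Shoulders of Giants

OK, so by now it is clear that we are going to use Google's TensorFlow for our machine learning. However, Google also provides some other resources that make it easier to get an object detection application up and running.

The first is the TensorFlow Object Detection API. As its GitHub page states, this API "... is an open source framework built on top of TensorFlow that makes it easy to construct, train and deploy object detection models".

But wait: it says "... makes it easy to construct, train ..." a model. Surely that is still a lot of work and requires a good understanding of neural networks?

Well, not really, because the second resource we will take advantage of is Google's TensorFlow Detection Model Zoo. It contains a number of pre-trained object detection models; we will download one that can recognise 90 different object classes and access it from our code.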

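Raspberry Pi Installation

Now I must say this was a bit tricky. Although there are plenty of videos and written instructions on the Internet for installing TensorFlow on all sorts of devices, including the Raspberry Pi, none of them fitted my setup exactly.

For my robot development, I use a free Raspberry Pi image available from the Ubiquity Robotics website. The image already includes the Kinetic version of ROS and OpenCV, and is based on Lubuntu, the lightweight version of Ubuntu.

Instructions for replicating my setup on Rodney are available on this GitHub site.

If you want to run the non-ROS version of the code, you will need the following. The versions I installed are shown in parentheses. Even without ROS, you may still want to look at the instructions on the GitHub site for setting up and compiling Protocol Buffers and for setting the Python path.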

OpenCV2 (3.3.1-dev)
Python (2.7)
TensorFlow (1.11.0 for Python 2.7)
Protocol Buffers (3.6.1)

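You will also need to download the TensorFlow models repository and a pre-trained model. As I'm running on a Raspberry Pi, I need a model that runs fast, the trade-off being lower detection accuracy. I'm therefore using the ssdlite_mobilenet_v2_coco model.

If you need higher accuracy in a Raspberry Pi based robot, you can use a slower model with better detection. Because ROS can run across a distributed network, you could use a second Pi dedicated to running the model, or even run the model on a separate workstation with more computing power.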

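The Code

I'll describe the ROS package containing the object detection code in detail; the rest of the robot code will be described using diagrams. That code is covered in detail in the "Rodney - A long time coming autonomous robot" series of articles, and all of the code is available in the download source zip file for this article.

ROS code can be written in a number of languages, and the Rodney project uses both C++ and Python. As the TensorFlow interface and Google's example code for the Object Detection API are both in Python, we will write the object detection node in Python.

Object Detection Package

Our ROS package is called tf_object_detection and is available in the tf_object_detection folder. The package contains a number of subfolders.

The config subfolder contains a configuration file, config.yaml. This configures the node by supplying the path to Google's object_detection folder and setting a confidence level. When the model runs, it reports a confidence level for each detected object along with the object's name. If that level is below the value given in the configuration file, our code ignores the object.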

object_detection:
  confidence_level: 0.60
  path: '/home/ubuntu/git/models/research/object_detection'
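The effect of confidence_level can be sketched in a few lines of plain Python. This is only an illustration: filter_by_confidence is a hypothetical helper that is not part of the package, and the scores are made up. It does follow the library's comparison, which keeps a detection only when its score is strictly greater than the configured level.

```python
# Hypothetical helper, not part of the package: keep only detections whose
# score is strictly above the configured confidence level.
def filter_by_confidence(detections, confidence_level):
    """detections is a list of (name, score) pairs with scores in [0.0, 1.0]."""
    return [name for name, score in detections if score > confidence_level]


# Made-up scores for illustration only
sample = [("dog", 0.82), ("chair", 0.60), ("person", 0.41)]
print(filter_by_confidence(sample, 0.60))
```

With the 0.60 level from config.yaml, only the dog survives; the chair at exactly 0.60 is dropped because the comparison is strict.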

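The launch subfolder contains a file, test.launch, which can be used to test the node in the robot environment with a minimum of robot code and hardware. It loads the configuration file into the ROS parameter server, launches two nodes to publish images from the Pi Camera, and launches our object detection node.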

<?xml version="1.0" ?>
<launch>
  <rosparam command="load" file="$(find tf_object_detection)/config/config.yaml" />  
  <include file="$(find raspicam_node)/launch/camerav2_1280x960.launch" />
  <node name="republish" type="republish" pkg="image_transport" 
  output="screen" args="compressed in:=/raspicam_node/image/ raw out:=/camera/image/raw" />
  <node pkg="tf_object_detection" type="tf_object_detection_node.py" 
  name="tf_object_detection_node" output="screen" />
</launch>

The msg subfolder contains a file, detection_results.msg. This file is used to create a user-defined ROS message that returns the list of objects detected in each supplied image.

string[] names_detected

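The scripts subfolder contains the file non-ros-test.py and a test image. This Python program can be used to test the object detection library outside of the robot code.

The code imports our object detection library and creates an instance of it, passing in the path containing the model and a confidence level of 0.5 (50%). It then loads the test image as an OpenCV image and calls the scan_for_objects function, which runs the object detection model. The returned list contains the detected objects that were above the confidence level. The function also modifies the supplied image by drawing boxes around the detected objects. Our test code then prints the names of the detected objects and displays the resulting image.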

#!/usr/bin/env python
import sys
import os
import cv2

sys.path.append('/home/ubuntu/git/tf_object_detection/src')

import object_detection_lib

# Create the instance of ObjectDetection
odc = object_detection_lib.ObjectDetection(
    '/home/ubuntu/git/models/research/object_detection', 0.5)

cvimg = cv2.imread("test_image.jpg")

# Detect the objects
object_names = odc.scan_for_objects(cvimg)
print(object_names)

cv2.imshow('object detection', cvimg)
cv2.waitKey(0)
cv2.destroyAllWindows()

cv2.imwrite('adjusted_test_image.jpg', cvimg)

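The src subfolder contains the main code for the node and for our object detection library. I'll describe the library first and then move on to the ROS code.

As stated above, this library can be used without the robot code and is used by the non-ros-test.py script. It is based on Google's Object Detection API tutorial, which can be found here. I have converted it into a Python class and moved the call that creates the TensorFlow session from the run_inference_for_single_image function to the __init__ function. If you recreate the session each time, the call to the session's run causes the model to be auto-tuned on every run, which adds a considerable delay. This way, we only incur the delay (around 45 seconds) the first time we supply an image; subsequent calls return within 1 second.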
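The idea of creating the session once and reusing it can be shown without TensorFlow. In this minimal sketch, FakeSession and Detector are hypothetical stand-ins that I have made up for illustration; the only point is that the expensive object is constructed in __init__ and shared by every later call.

```python
class FakeSession(object):
    """Stand-in for a TensorFlow session; counts how often it is created."""
    created = 0

    def __init__(self):
        FakeSession.created += 1

    def run(self, image):
        return "result for " + image


class Detector(object):
    def __init__(self):
        # Created once here and reused by every call to scan()
        self._sess = FakeSession()

    def scan(self, image):
        return self._sess.run(image)


detector = Detector()
for name in ["frame1", "frame2", "frame3"]:
    detector.scan(name)
print(FakeSession.created)
```

However many frames are scanned, only one session is ever created, which is why only the first real inference pays the start-up cost.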

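The code starts by importing the required modules: numpy, tensorflow, and two modules from the Object Detection API, label_map_util and visualization_utils. label_map_util is used to convert the object number returned by the model into a named object. For example, when the model returns the ID 18, this relates to a dog. visualization_utils is used to draw the boxes, object labels and percentage confidence levels onto an image.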

import numpy as np
import tensorflow as tf
from utils import label_map_util
from utils import visualization_utils as vis_util
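To make the ID-to-name conversion concrete, here is a small sketch of the kind of dictionary the label map produces. The dictionary below is hand-written for illustration and holds only three of the COCO entries; class_name is a hypothetical helper, not a function of the API.

```python
# Hand-written stand-in for the dictionary that label_map_util builds from
# mscoco_label_map.pbtxt; only three of the 90 entries are reproduced here.
category_index = {
    1: {"id": 1, "name": "person"},
    17: {"id": 17, "name": "cat"},
    18: {"id": 18, "name": "dog"},
}


def class_name(class_id, index=category_index):
    """Return the display name for a class ID, or 'unknown' if it is not listed."""
    return index.get(class_id, {}).get("name", "unknown")


print(class_name(18))
```

The library performs the same kind of lookup when it builds the list of detected names, indexing the category index by the class number the model returned.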

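The class initialisation function loads the model from the supplied path and opens the TensorFlow session. It also loads the label map file and stores the supplied confidence trigger level.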

    def __init__(self, path, confidence): # path will be to the 
                                         # models/research/object_detection directory
        # Pre-trained model name
        MODEL_NAME = 'ssdlite_mobilenet_v2_coco_2018_05_09'
        PATH_TO_FROZEN_GRAPH = path + '/' + MODEL_NAME + '/frozen_inference_graph.pb'
        PATH_TO_LABELS = path + '/data/' + 'mscoco_label_map.pbtxt'

        # Load a frozen Tensorflow model into memory
        self.__detection_graph = tf.Graph()

        with self.__detection_graph.as_default():
            od_graph_def = tf.GraphDef()
            with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
                serialized_graph = fid.read()
                od_graph_def.ParseFromString(serialized_graph)
                tf.import_graph_def(od_graph_def, name='')

            # Open a session here. The first time we run the session it will take
            # a time to run as it autotunes, after that it will run faster
            self.__sess = tf.Session(graph=self.__detection_graph)

        # Load the label map. Label maps map indices to category names
        self.__category_index = label_map_util.create_category_index_from_labelmap(
            PATH_TO_LABELS, use_display_name=True)

        # Store the confidence level
        self.__confidence = confidence

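The run_inference_for_single_image function is where most of the TensorFlow work takes place. The first part obtains a list holding all the operations in the TensorFlow graph. Using that list, we create a dictionary to hold the tensors we will be interested in when we run the graph. You can think of these as the results we want from running the graph. The possible tensors of interest are:

num_detections - the number of objects detected in the image.
detection_boxes - for each detected object, the four coordinates of the box bounding the object in the image. We don't need to worry about the coordinate system, as we will use the API to draw the boxes.
detection_scores - for each detected object, a score giving the system's confidence that it has identified the object. Multiply this value by 100 to obtain a percentage.
detection_classes - for each detected object, a number identifying the object. Again, the actual number is not important to us, as we will use the API to convert the number into an object name.
detection_masks - the model we are using does not include this operation. Basically, with a model that does, you can overlay a mask on the object as well as drawing a bounding box.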
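Although the library leaves the drawing to the API, it can help to see what the box values mean. The Object Detection API reports each box as [ymin, xmin, ymax, xmax], normalised to the range 0 to 1. The sketch below converts such a box to pixel corners; box_to_pixels is a hypothetical helper and the sample box is made up.

```python
def box_to_pixels(box, image_height, image_width):
    """Convert a normalised [ymin, xmin, ymax, xmax] box to integer pixel corners.

    Returns (left, top, right, bottom).
    """
    ymin, xmin, ymax, xmax = box
    return (int(xmin * image_width), int(ymin * image_height),
            int(xmax * image_width), int(ymax * image_height))


# Made-up box covering the centre of a 1280x960 image
print(box_to_pixels([0.25, 0.25, 0.75, 0.75], 960, 1280))
```

For the 1280x960 camera image used by the test launch file, this box runs from pixel (320, 240) to pixel (960, 720).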

    def run_inference_for_single_image(self, image):
        with self.__detection_graph.as_default():
            # Get handles to input and output tensors
            ops = tf.get_default_graph().get_operations()
            all_tensor_names = {output.name for op in ops for output in op.outputs}
            tensor_dict = {}
            for key in [
                'num_detections', 'detection_boxes', 'detection_scores',
                'detection_classes', 'detection_masks'
            ]:
                tensor_name = key + ':0'
                if tensor_name in all_tensor_names:
                    tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(tensor_name)

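The next part of the function is only applicable if the model does contain the detection_masks operation. I have left the code in my library in case we use such a model in the future.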

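if 'detection_masks' in tensor_dict:
    # Note: utils_ops is not among the imports listed earlier; this branch assumes
    # something like `from utils import ops as utils_ops` from the Object Detection API
    # The following processing is only for single image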
    detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
    detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
    # Reframe is required to translate mask from box coordinates 
    # to image coordinates and fit the image size.
    real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
    detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
    detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
    detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
        detection_masks, detection_boxes, image.shape[0], image.shape[1])
    detection_masks_reframed = tf.cast(
        tf.greater(detection_masks_reframed, 0.5), tf.uint8)
    # Follow the convention by adding back the batch dimension
    tensor_dict['detection_masks'] = tf.expand_dims(detection_masks_reframed, 0)

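After the detection mask processing, the code obtains the tensor named 'image_tensor:0'. If the other operations are thought of as the outputs, this tensor is the input to the graph, and it is where we will feed in the image we want processed. When the graph was created, this was most likely created as a TensorFlow placeholder.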

image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')

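We then finally run the graph to perform the object detection. In the call to run, we pass the dictionary containing the tensors we wish to fetch; the resulting values are returned to us in a Python dictionary holding numpy arrays. We also pass a feed dictionary, which indicates the tensor values we want to substitute. In our case, this is the image_tensor (the input, if you like), whose shape we must first change to suit what the model expects.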

    # Run inference
    output_dict = self.__sess.run(tensor_dict,feed_dict={image_tensor: np.expand_dims(image, 0)})

    # all outputs are float32 numpy arrays, so convert types as appropriate
    output_dict['num_detections'] = int(output_dict['num_detections'][0])
    output_dict['detection_classes'] = output_dict['detection_classes'][0].astype(np.uint8)
    output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
    output_dict['detection_scores'] = output_dict['detection_scores'][0]
    if 'detection_masks' in output_dict:
        output_dict['detection_masks'] = output_dict['detection_masks'][0]
return output_dict
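The conversions above simply remove the batch dimension (we only ever feed one image) and turn the float values into more convenient types. Here is a rough sketch of the same steps using plain Python lists in place of numpy arrays; strip_batch_dimension is a hypothetical helper and the values are made up.

```python
def strip_batch_dimension(raw):
    """Mimic the conversions above using plain lists in place of NumPy arrays."""
    return {
        # One-element list of floats -> plain int
        "num_detections": int(raw["num_detections"][0]),
        # Float class IDs -> ints, taking the first (only) image in the batch
        "detection_classes": [int(c) for c in raw["detection_classes"][0]],
        "detection_boxes": raw["detection_boxes"][0],
        "detection_scores": raw["detection_scores"][0],
    }


# Made-up result for a batch containing a single image
raw_output = {
    "num_detections": [2.0],
    "detection_classes": [[18.0, 1.0]],
    "detection_boxes": [[[0.1, 0.2, 0.6, 0.7], [0.0, 0.5, 0.9, 1.0]]],
    "detection_scores": [[0.91, 0.55]],
}
print(strip_batch_dimension(raw_output))
```

After this step, each value describes a single image, so the calling code can index detections directly.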

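When we wish to carry out object detection, we call the scan_for_objects function with an OpenCV image. In tensor terms, this image is a tensor of shape image pixel height by image pixel width by 3 (the red, green and blue values). We then call the run_inference_for_single_image function to run the model. The returned dictionary contains the classes of the objects detected, the coordinates of the bounding box for each object, and the confidence levels. Values from this dictionary are fed into visualize_boxes_and_labels_on_image_array, which draws boxes around the objects and adds labels for those whose confidence level exceeds the supplied value. The code then creates a list, by name, of the objects whose confidence level exceeds the threshold and returns that list.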

# This class function will be called from outside to scan the supplied img.
# if objects are detected it will adjust the supplied image by drawing boxes around the objects
# The img parameter is an OpenCV image
def scan_for_objects(self, image_np):
    # The img is already a numpy array of size height,width, 3

    # Actual detection.
    output_dict = self.run_inference_for_single_image(image_np)

    #print output_dict

    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        output_dict['detection_boxes'],
        output_dict['detection_classes'],
        output_dict['detection_scores'],
        self.__category_index,
        instance_masks=output_dict.get('detection_masks'),
        use_normalized_coordinates=True,
        line_thickness=8,
        min_score_thresh=self.__confidence)

    # Return a list of object names detected
    detected_list = []
    total_detections = output_dict['num_detections']
    if total_detections > 0:
        for detection in range(0, total_detections):
            if output_dict['detection_scores'][detection] > self.__confidence:
                category = output_dict['detection_classes'][detection]
                detected_list.insert(0,self.__category_index[category]['name'])

    return detected_list

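That is all that is required as far as TensorFlow is concerned. I'll now briefly describe the rest of the code for this package. If you are not interested in the robot code, you can skip ahead to the "Using the Code" section, which describes running the code outside of the ROS environment.

The ROS code for our object detection node is contained in the tf_object_detection_node.py file.

Each ROS node is a running process. In the main function, we register our node with ROS, create an instance of the ObjectDetectionNode class, log that the node has started, and hand control over to ROS with a call to rospy.spin.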

def main(args):
    rospy.init_node('tf_object_detection_node', anonymous=False)
    odn = ObjectDetectionNode()
    rospy.loginfo("Object detection node started")
    try:
        rospy.spin()
    except KeyboardInterrupt:
        print("Shutting down")

if __name__ == '__main__':
    main(sys.argv)

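The ObjectDetectionNode class contains the remainder of the code for the node. The class __init__ function starts by creating an instance of CvBridge, which is used to convert ROS image messages to OpenCV images and back again. We then register the ROS topics (messages) that the node will publish and subscribe to.

The tf_object_detection_node/adjusted_image topic will contain the image with bounding boxes drawn around the detected objects. The tf_object_detection_node/result topic will contain a list of the names of the detected objects.

The first subscribed topic is tf_object_detection_node/start. When a message is received on it, StartCallback is called, which kicks off object detection on the next camera image received.

The second topic we subscribe to is camera/image/raw, which will contain the image from the camera and results in a call to Imagecallback.

The remainder of __init__ reads the configuration values from the parameter server and creates an instance of the object detection library.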

    def __init__(self):
        self.__bridge = CvBridge()
        # Publisher to publish update image
        self.__image_pub = rospy.Publisher(
            "tf_object_detection_node/adjusted_image", Image, queue_size=1)
        # Publisher to publish the result
        self.__result_pub = rospy.Publisher(
            "tf_object_detection_node/result", detection_results, queue_size=1)
        # Subscribe to topic which will kick off object detection in the next image
        self.__command_sub = rospy.Subscriber(
            "tf_object_detection_node/start", Empty, self.StartCallback)
        # Subscribe to the topic which will supply the image from the camera
        self.__image_sub = rospy.Subscriber("camera/image/raw", Image, self.Imagecallback)

        # Flag to indicate that we have been requested to use the next image
        self.__scan_next = False

        # Read the path for models/research/object_detection directory 
        # from the parameter server or use this default
        object_detection_path = rospy.get_param('/object_detection/path', 
                                '/home/ubuntu/git/models/research/object_detection')

        # Read the confidence level, any object with a level below this will not be used
        confidence_level = rospy.get_param('/object_detection/confidence_level', 0.50)

        # Create the object_detection_lib class instance
        self.__odc = object_detection_lib.ObjectDetection(
            object_detection_path, confidence_level)

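When a message is received on the tf_object_detection_node/start topic, all the StartCallback function does is set a flag indicating that when the next camera image is received, we should run object detection on it.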

    # Callback for start command message
    def StartCallback(self, data):
        # Indicate to use the next image for the scan
        self.__scan_next = True

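When a message is received on the camera/image/raw topic, the Imagecallback function checks whether an object detection operation is pending. If so, we reset the flag, convert the image from a ROS image to an OpenCV image, and call the scan_for_objects function with the image. If objects are detected, the supplied image is updated to contain the bounding boxes and labels. This adjusted image is then published on the tf_object_detection_node/adjusted_image topic. The topic is not used internally by the robot, but we can examine it as part of debugging and testing. The final part of the function creates a Python list containing the names of the objects detected. This is then published on the tf_object_detection_node/result topic and will be acted upon by the node that requested the object detection scan.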

    # Callback for new image received
    def Imagecallback(self, data):
        if self.__scan_next:
            self.__scan_next = False
            # Convert the ROS image to an OpenCV image
            image = self.__bridge.imgmsg_to_cv2(data, "bgr8")

            # The supplied image will be modified if known objects are detected
            object_names_detected = self.__odc.scan_for_objects(image)

            # publish the image, it may have been modified
            try:
                self.__image_pub.publish(self.__bridge.cv2_to_imgmsg(image, "bgr8"))
            except CvBridgeError as e:
                print(e)

            # Publish names of objects detected
            result = detection_results()
            result.names_detected = object_names_detected
            self.__result_pub.publish(result)
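The interplay between the two callbacks can be tried out without ROS. The sketch below is a simplified stand-in, assuming a detector that is just a callable and a results list in place of the ROS publisher; ScanOnRequest and its method names are hypothetical and are not the node's real class.

```python
class ScanOnRequest(object):
    """Process only the first image that arrives after a start request."""

    def __init__(self, detector):
        self._detector = detector   # any callable taking an image
        self._scan_next = False
        self.results = []           # stands in for the result publisher

    def start_callback(self):
        # Request that the next image received is scanned
        self._scan_next = True

    def image_callback(self, image):
        if not self._scan_next:
            return                  # no scan requested, drop the frame
        self._scan_next = False
        self.results.append(self._detector(image))


node = ScanOnRequest(lambda image: ["dog"] if "dog" in image else [])
node.image_callback("frame with dog")   # ignored, no request pending
node.start_callback()
node.image_callback("frame with dog")   # processed
node.image_callback("frame with cat")   # ignored again
print(node.results)
```

Only one frame is processed per request, which keeps the slow inference step from running on every camera image.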

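Robot System Overview

In this section, I'll very briefly describe how the object detection node fits in with the other nodes in the system. The easiest way to do this is with a graph of the nodes in question. Nodes (processes) are shown as ovals, and the lines to and from them are topics carrying messages. A full-size image of the graph is included in the zip file.

On the far left, the keyboard node actually runs on a separate workstation connected to the same Wi-Fi network as the robot. This allows us to issue commands to the robot. For the "search for a dog" command, we press the "3" key to select mission 3. The rodney_node receives the keydown message and passes a mission_request on to the rodney_missions node. This request includes the parameter "dog", so that rodney_missions_node knows the name of the object to search for. The rodney_missions_node is a hierarchical state machine, and as it steps through the states involved in detecting the dog, it can request that the head_control_node moves the head/camera and that the tf_object_detection_node runs the TensorFlow graph.

After each object detection run, the result is returned to the state machine, where any objects detected are compared against the name of the object it is searching for. If the object is not detected, the camera/head is moved to the next position and a new scan is requested. If the target object is detected, the camera is held in that position, allowing the operator to view the camera feed. When the operator is ready to move on, they can either acknowledge and continue the search by pressing the "a" key or cancel the search with the "c" key.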
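Stripped of the state machine and the operator acknowledgement, the search itself boils down to a simple loop. The following is a simplified sketch, not the rodney_missions code: search_for, the head positions and the fake detections are all made up for illustration, and scan stands in for moving the head and requesting a detection run.

```python
def search_for(target, head_positions, scan):
    """Scan each head position in turn and return the first one showing target.

    scan(position) is assumed to return the list of object names detected
    with the head at that position. Returns None if the target is never seen.
    """
    for position in head_positions:
        if target in scan(position):
            return position
    return None


# Made-up detections keyed by (pan, tilt) head position
fake_view = {(-45, 0): ["chair"], (0, 0): ["person", "dog"], (45, 0): []}
positions = [(-45, 0), (0, 0), (45, 0)]
print(search_for("dog", positions, fake_view.get))
```

In the real robot, finding the target does not end the mission: the head is held in place until the operator acknowledges or cancels.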

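The node on the far right of the graph, serial_node, is used to communicate with the Arduino Nano that controls the servos moving the head/camera.

From the graph, it looks like some topics, such as /missions/acknowledge and tf_object_detection/result, are not connected to the rodney_missions_node. This is only because the graph is auto-generated and the node connects to these topics dynamically as it steps through the state machine.

In-depth details of the nodes can be found in the "Rodney - A long time coming autonomous robot" series of articles.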

Using the Code

Testing in a non-ROS environment

To test just the object detection library, run the following command from the tf_object_detection/scripts folder.

$ ./non-ros-test.py

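Note: As the TensorFlow session is opened each time the script is run, the TensorFlow graph takes a while to run, because the model is auto-tuned on every run.

After a short time, an image showing the detected objects with bounding boxes and labels will be displayed, and a list of the detected objects will be printed at the terminal.

Test input image and the resulting output image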

Testing on the robot hardware

If not already done, create a catkin workspace on the Raspberry Pi and initialise it with the following commands:

$ mkdir -p ~/rodney_ws/src
$ cd ~/rodney_ws/
$ catkin_make

Copy the packages from the zip file, along with the ros-keyboard package (from https://github.com/lrse/ros-keyboard), into the ~/rodney_ws/src folder.

Build the code with the following commands:

$ cd ~/rodney_ws/ 
$ catkin_make

Once the code is built, all the nodes are started on the Raspberry Pi using the rodney launch file.

$ cd ~/rodney_ws/
$ roslaunch rodney rodney.launch

We can build the code on a separate workstation and then start the keyboard node on that workstation with the following commands:

$ cd ~/rodney_ws 
$ source devel/setup.bash 
$ export ROS_MASTER_URI=http://ubiquityrobot:11311 
$ rosrun keyboard keyboard

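A small window titled "ROS keyboard input" should be running. Make sure the keyboard window has the focus, and then press the "3" key to start mission 3.

Video of Rodney running mission 3 (dog detection)

Points of Interest

In this article, we have not only looked at using the Object Detection API and pre-trained models, but have also integrated them into an existing project to put them to practical use.

History

2018/12/01: Initial release
2018/12/11: Subsection changes
2019/04/28: Fixed broken link to the Raspberry Pi image instructions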