Building an Object Detection iOS App with a YOLO Core ML Model

Jarek Szczegielniak

November 30, 2020

CPOL

In this final article of the series, we'll extend the application to use our YOLO v2 model for object detection.

Download iOS YOLO - 92 MB

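This series assumes that you are familiar with Python, Conda, and ONNX, and that you have some experience developing iOS applications in Xcode. You are welcome to download the source code for this project. We'll run the code using macOS 10.15+, Xcode 11.7+, and iOS 13+.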

Adding the Model to Our Application

The first thing we need to do is copy the yolov2-pipeline.mlmodel file (which we saved earlier) into the Models folder of our ObjectDetectionDemo iOS application and add it to the project files.

Running Object Detection on Captured Video Frames

To use our model, we need to make a few changes to the code from the previous article.

The startCapture method of the VideoCapture class needs to accept and store a Vision framework request parameter, VNRequest:

public func startCapture(_ visionRequest: VNRequest?) {
    // Store the optional Vision request so that it can be run on every captured frame.
    if let visionRequest = visionRequest {
        self.visionRequests = [visionRequest]
    } else {
        self.visionRequests = []
    }

    if !captureSession.isRunning {
        captureSession.startRunning()
    }
}

Now let's add the ObjectDetection class with the following createObjectDetectionVisionRequest method:

public func createObjectDetectionVisionRequest() -> VNRequest? {
    do {
        let model = yolov2_pipeline().model
        let visionModel = try VNCoreMLModel(for: model)
        let objectRecognition = VNCoreMLRequest(model: visionModel, completionHandler: { (request, error) in
            DispatchQueue.main.async(execute: {
                if let results = request.results {
                    self.processVisionRequestResults(results)
                }
            })
        })

        objectRecognition.imageCropAndScaleOption = .scaleFill
        return objectRecognition
    } catch let error as NSError {
        print("Model loading error: \(error)")
        return nil
    }
}

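Note that we use the .scaleFill value for imageCropAndScaleOption. This introduces a slight distortion when the captured 480 x 640 frame is scaled to the 416 x 416 size the model requires. The distortion does not have a significant impact on the results. On the other hand, it makes the subsequent scaling operations simpler.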
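
To make the distortion concrete, here is a minimal sketch of the arithmetic, assuming that a "scale to fill" resize stretches each axis independently. The Size type and the scaleFillFactors function are hypothetical names used only for this illustration; they are not part of the Vision framework or of the article's project.

```swift
// Hypothetical value type used only for this illustration.
struct Size {
    let width: Double
    let height: Double
}

// Per-axis factors of a "scale to fill" resize (the aspect ratio is not preserved).
func scaleFillFactors(from source: Size, to target: Size) -> (x: Double, y: Double) {
    return (x: target.width / source.width, y: target.height / source.height)
}

let captured = Size(width: 480, height: 640)    // frame size quoted in the article
let modelInput = Size(width: 416, height: 416)  // model input size quoted in the article

let factors = scaleFillFactors(from: captured, to: modelInput)
// The horizontal factor is about 0.867 and the vertical factor is 0.65.
let stretch = factors.x / factors.y
// stretch is about 1.33: the frame is squeezed more along its height than its width.
print(factors.x, factors.y, stretch)
```

Because the whole frame is mapped onto the model input with nothing cropped away, a box expressed as fractions of the model input should correspond to the same fractions of the original frame, which is presumably why the later scaling code stays simple.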

The code we have introduced needs to be used in the main ViewController class:

self.videoCapture = VideoCapture(self.cameraView.layer)
self.objectDetection = ObjectDetection(self.cameraView.layer, videoFrameSize: self.videoCapture.getCaptureFrameSize())
        
let visionRequest = self.objectDetection.createObjectDetectionVisionRequest()
self.videoCapture.startCapture(visionRequest)

With this framework in place, we can execute the logic defined in visionRequest each time a video frame is captured:

public func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
        return
    }
        
    let frameOrientation: CGImagePropertyOrientation = .up
    let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: frameOrientation, options: [:])
    do {
        try imageRequestHandler.perform(self.visionRequests)
    } catch {
        print(error)
    }
}

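With the above changes, the yolov2_pipeline model is run on each captured frame, and the detection results are then passed to the ObjectDetection.processVisionRequestResults method. Thanks to the model_decoder and model_nms components of the pipeline model we implemented previously, no decoding logic is required on the iOS side. We simply read the most likely label of each observation (objectObservation) and draw the corresponding box on the captured frame (by calling the createBoundingBoxLayer and addSublayer methods):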

private func processVisionRequestResults(_ results: [Any]) {
    CATransaction.begin()
    CATransaction.setValue(kCFBooleanTrue, forKey: kCATransactionDisableActions)
        
    self.objectDetectionLayer.sublayers = nil
    for observation in results {
        // Skip results that are not recognized objects or that carry no labels.
        guard let objectObservation = observation as? VNRecognizedObjectObservation,
              let topLabelObservation = objectObservation.labels.first else {
            continue
        }

        let objectBounds = VNImageRectForNormalizedRect(
            objectObservation.boundingBox,
            Int(self.objectDetectionLayer.bounds.width), Int(self.objectDetectionLayer.bounds.height))
            
        let bbLayer = self.createBoundingBoxLayer(objectBounds, identifier: topLabelObservation.identifier, confidence: topLabelObservation.confidence)
        self.objectDetectionLayer.addSublayer(bbLayer)
    }
    CATransaction.commit()
}
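
The VNImageRectForNormalizedRect call in the method above converts a normalized bounding box (values between 0 and 1) into layer coordinates. As a rough sketch of that arithmetic, and not the Vision implementation itself, the hypothetical helper imageRect(forNormalized:width:height:) below multiplies each component by the matching dimension.

```swift
import Foundation

// Hypothetical helper: scales a normalized rectangle (all values in 0...1)
// to the coordinates of an image or layer with the given size.
func imageRect(forNormalized rect: CGRect, width: Int, height: Int) -> CGRect {
    return CGRect(x: rect.origin.x * CGFloat(width),
                  y: rect.origin.y * CGFloat(height),
                  width: rect.size.width * CGFloat(width),
                  height: rect.size.height * CGFloat(height))
}

// A box covering the middle half of the width, on a 480 x 640 frame.
let normalized = CGRect(x: 0.25, y: 0.5, width: 0.5, height: 0.25)
let pixels = imageRect(forNormalized: normalized, width: 480, height: 640)
// pixels has its origin at (120, 320) and a size of 240 x 160.
```

The result keeps the vertical direction of its input, so a box reported with a lower-left origin still needs the flip handled in the drawing code.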

Drawing Bounding Boxes

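Drawing the boxes is relatively simple and is not related to the machine learning part of our application. The main difficulty here is using the proper scale and coordinate system: "0,0" means the upper-left corner for the model, but the lower-left corner for the bounding boxes returned by the Vision framework.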
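
The difference between the two origins comes down to one formula for the vertical coordinate. The sketch below shows it on a normalized rectangle; NormalizedRect and flippedVertically are hypothetical names introduced only for this example. The article's own code does not convert each box this way; it flips the whole detection layer with a negative y scale instead.

```swift
// Hypothetical value type for a normalized rectangle (all values in 0...1).
struct NormalizedRect: Equatable {
    var x: Double
    var y: Double
    var width: Double
    var height: Double
}

// Converts between a lower-left and an upper-left origin.
// The same formula works in both directions.
func flippedVertically(_ rect: NormalizedRect) -> NormalizedRect {
    return NormalizedRect(x: rect.x,
                          y: 1.0 - rect.y - rect.height,
                          width: rect.width,
                          height: rect.height)
}

// A box whose bottom edge sits halfway up the image (lower-left origin).
let lowerLeftBox = NormalizedRect(x: 0.25, y: 0.5, width: 0.5, height: 0.25)
let upperLeftBox = flippedVertically(lowerLeftBox)
// With an upper-left origin, the top edge of the same box sits at y = 0.25.
```

Applying the function twice returns the original rectangle, which is a quick sanity check when debugging misplaced boxes.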

Two methods of the ObjectDetection class handle this: setupObjectDetectionLayer and createBoundingBoxLayer. The former prepares the layer for the boxes:

private func setupObjectDetectionLayer(_ viewLayer: CALayer, _ videoFrameSize: CGSize) {
    self.objectDetectionLayer = CALayer()
    self.objectDetectionLayer.name = "ObjectDetectionLayer"
    self.objectDetectionLayer.bounds = CGRect(x: 0.0,
                                     y: 0.0,
                                     width: videoFrameSize.width,
                                     height: videoFrameSize.height)
    self.objectDetectionLayer.position = CGPoint(x: viewLayer.bounds.midX, y: viewLayer.bounds.midY)
        
    viewLayer.addSublayer(self.objectDetectionLayer)

    let bounds = viewLayer.bounds
       
    let scale = fmax(bounds.size.width  / videoFrameSize.width, bounds.size.height / videoFrameSize.height)

    CATransaction.begin()
    CATransaction.setValue(kCFBooleanTrue, forKey: kCATransactionDisableActions)
    
    self.objectDetectionLayer.setAffineTransform(CGAffineTransform(scaleX: scale, y: -scale))
    self.objectDetectionLayer.position = CGPoint(x: bounds.midX, y: bounds.midY)        
    CATransaction.commit()
}
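
The scale computed with fmax above is an aspect-fill factor: the larger of the two ratios, so that the scaled video frame covers the whole view. Here is a small worked sketch of that calculation. The 375 x 667 view size is an assumption chosen for illustration, and aspectFillScale is a hypothetical helper name.

```swift
// Aspect-fill scale: the larger of the two ratios, so the scaled frame covers the view.
func aspectFillScale(viewWidth: Double, viewHeight: Double,
                     frameWidth: Double, frameHeight: Double) -> Double {
    return max(viewWidth / frameWidth, viewHeight / frameHeight)
}

// Assumed 375 x 667 view (hypothetical) and the 480 x 640 frame from the article.
let scale = aspectFillScale(viewWidth: 375, viewHeight: 667,
                            frameWidth: 480, frameHeight: 640)
// The height ratio wins here: 667 / 640 is larger than 375 / 480.
let scaledFrameWidth = 480 * scale   // wider than the view, so the sides are clipped
let scaledFrameHeight = 640 * scale  // matches the view height
print(scale, scaledFrameWidth, scaledFrameHeight)
```

In setupObjectDetectionLayer the same factor is applied with a negative sign on the y axis, which scales the layer and flips it vertically in a single transform.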

The createBoundingBoxLayer method creates the shapes to be drawn:

private func createBoundingBoxLayer(_ bounds: CGRect, identifier: String, confidence: VNConfidence) -> CALayer {
    let path = UIBezierPath(rect: bounds)
        
    let boxLayer = CAShapeLayer()
    boxLayer.path = path.cgPath
    boxLayer.strokeColor = UIColor.red.cgColor
    boxLayer.lineWidth = 2
    boxLayer.fillColor = CGColor(colorSpace: CGColorSpaceCreateDeviceRGB(), components: [0.0, 0.0, 0.0, 0.0])
        
    boxLayer.bounds = bounds
    boxLayer.position = CGPoint(x: bounds.midX, y: bounds.midY)
    boxLayer.name = "Detected Object Box"
    boxLayer.backgroundColor = CGColor(colorSpace: CGColorSpaceCreateDeviceRGB(), components: [0.5, 0.5, 0.2, 0.3])
    boxLayer.cornerRadius = 6

    let textLayer = CATextLayer()
    textLayer.name = "Detected Object Label"
        
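    // Pass the identifier as a format argument so that a "%" in a label cannot break the format string.
    textLayer.string = String(format: "%@\n(%.2f)", identifier, confidence)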
    textLayer.fontSize = CGFloat(16.0)
        
    textLayer.bounds = CGRect(x: 0, y: 0, width: bounds.size.width - 10, height: bounds.size.height - 10)
    textLayer.position = CGPoint(x: bounds.midX, y: bounds.midY)
    textLayer.alignmentMode = .center
    textLayer.foregroundColor =  UIColor.red.cgColor
    textLayer.contentsScale = 2.0
        
    textLayer.setAffineTransform(CGAffineTransform(scaleX: 1.0, y: -1.0))
        
    boxLayer.addSublayer(textLayer)       
    return boxLayer
}

The Application in Action

Congratulations! We have a working object detection app, which we can test in real life or, as in our case, with a free clip from the Pixels portal.

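Note that the YOLO v2 model is quite sensitive to image orientation, at least for some object classes (for example, the Person class). Its detection results get worse if you rotate the frame before processing.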

This can be illustrated using any sample image from the Open Images dataset.

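Exactly the same image, and two very different results. We need to keep this in mind to ensure that the images fed to the model have the correct orientation.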

Next Steps?

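And that's it! It has been a long journey, but we have finally reached the end. We have a working iOS application for object detection in a live video stream.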

If you're trying to decide what to do next, you are limited only by your imagination. Consider the following questions:

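How could you use what you've learned to build a hazard detector for your car? If your iPhone were mounted on the dashboard, facing forward, could you make an app that detects hazards and warns the driver?
Could you build a bird detector that alerts bird enthusiasts when their favorite bird lands on their feeder?
If deer and other pests roam your garden and eat all the vegetables you grow, could you use your newly acquired iOS object detection skills to build an iOS-powered electronic scarecrow to scare them away?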

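The possibilities are endless. Why not try one of the ideas above, or come up with your own, and then write about it? The CodeProject community would love to see what you come up with.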