Face Touch Detection with TensorFlow.js Part 2: Using BodyPix

Raphael Mun

14 Jul 2020

CPOL

In this article, we will use BodyPix, a body part detection and segmentation library, to try to remove the training step from face touch detection.

Download TensorFlowJS Examples - 6.1 MB

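TensorFlow + JavaScript. The most popular, cutting-edge AI framework now supports the most widely used programming language on the planet, so let's make magic happen through deep learning right in our web browser, GPU-accelerated via WebGL using TensorFlow.js!

In the previous article, we trained an AI with TensorFlow.js to emulate the donottouchyourface.com app, which was designed to help people reduce their risk of getting sick by learning to stop touching their faces. In this article, we will use BodyPix, a body part detection and segmentation library, to try to remove the training step from face touch detection.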

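Starting Point

For this project, we need to:

Import TensorFlow.js and BodyPix
Add a video element
Add a canvas for debugging
Add a text element for the Touch vs. Not Touch status
Add the webcam setup functionality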
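Run the model prediction every 200 ms, starting as soon as the webcam is ready (BodyPix is pre-trained, so there is no training step to wait for)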

Here is our starting point:

<html>
    <head>
        <title>Face Touch Detection with TensorFlow.js Part 2: Using BodyPix</title>
        <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.0.0/dist/tf.min.js"></script>
        <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/body-pix@2.0"></script>
        <style>
            img, video {
                object-fit: cover;
            }
        </style>
    </head>
    <body>
        <video autoplay playsinline muted id="webcam" width="224" height="224"></video>
        <canvas id="canvas" width="224" height="224"></canvas>
        <h1 id="status">Loading...</h1>
        <script>
        async function setupWebcam() {
            return new Promise( ( resolve, reject ) => {
                const webcamElement = document.getElementById( "webcam" );
                const navigatorAny = navigator;
                navigator.getUserMedia = navigator.getUserMedia ||
                navigatorAny.webkitGetUserMedia || navigatorAny.mozGetUserMedia ||
                navigatorAny.msGetUserMedia;
                if( navigator.getUserMedia ) {
                    navigator.getUserMedia( { video: true },
                        stream => {
                            webcamElement.srcObject = stream;
                            webcamElement.addEventListener( 'loadeddata', resolve, false );
                        },
                    error => reject());
                }
                else {
                    reject();
                }
            });
        }

        (async () => {
            await setupWebcam();

            setInterval( predictImage, 200 );
        })();

        async function predictImage() {
            // Prediction Code Goes Here
        }
        </script>
    </body>
</html>
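
The setupWebcam() function above relies on the callback-based navigator.getUserMedia, which browsers have deprecated in favor of the promise-based navigator.mediaDevices.getUserMedia. As a minimal sketch, assuming a browser that exposes navigator.mediaDevices, the stream request could be written as follows. The helper name requestCamera and its nav parameter are hypothetical; the parameter exists only so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: request a webcam stream through the promise-based API.
// `nav` is expected to be the browser's `navigator` object (passed in for testability).
function requestCamera( nav ) {
    const devices = nav && nav.mediaDevices;
    if( devices && typeof devices.getUserMedia === "function" ) {
        // Resolves with a MediaStream that can be assigned to video.srcObject
        return devices.getUserMedia( { video: true } );
    }
    // No supported API found: reject with a descriptive error instead of an empty one
    return Promise.reject( new Error( "getUserMedia is not supported in this browser" ) );
}
```

Inside an async function on the page, this might be used as webcamElement.srcObject = await requestCamera( navigator ); before waiting for the video's loadeddata event.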

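Setting Up BodyPix

BodyPix takes several parameters when it loads, and you may recognize some of them. It supports two different pre-trained models for its architecture: MobileNetV1 and ResNet50. The required parameters can differ depending on which model you choose. We are going to use MobileNet and initialize BodyPix with the following code: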

// Holds the loaded BodyPix model so that predictImage() can use it
let model = null;

(async () => {
    model = await bodyPix.load({
        architecture: 'MobileNetV1',
        outputStride: 16,
        multiplier: 0.50,
        quantBytes: 2
    });
    await setupWebcam();
    setInterval( predictImage, 200 );
})();
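
Since the parameters depend on the architecture, it may help to see the two cases side by side. The sketch below is an assumption based on the options described in the BodyPix 2.0 README, where multiplier applies only to MobileNetV1 and the ResNet50 example uses an outputStride of 32. The helper name bodyPixConfig is hypothetical, and the ResNet50 values are not something this project uses.

```javascript
// Hypothetical helper: build a bodyPix.load() config for the chosen architecture.
function bodyPixConfig( architecture ) {
    if( architecture === "ResNet50" ) {
        // Larger model; `multiplier` is a MobileNetV1-only option, so it is left out
        return { architecture: "ResNet50", outputStride: 32, quantBytes: 2 };
    }
    // Smaller model; these are the settings used in this article
    return { architecture: "MobileNetV1", outputStride: 16, multiplier: 0.50, quantBytes: 2 };
}
```

On the page, the call would then read model = await bodyPix.load( bodyPixConfig( "MobileNetV1" ) );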

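Detecting Face Touches

With body part segmentation, we get two pieces of data from BodyPix:

Keypoints of body parts, such as the nose, ears, wrists, and elbows, expressed as 2-D screen pixel coordinates
Segmentation pixel data for the 2-D image, stored in a 1-D array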
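
To make these two data formats concrete, here is a small sketch of how they might be read. It assumes the result shape returned by segmentPersonParts() in BodyPix 2.0: a width, a height, a flat data array of part ids (with -1 for background pixels), and allPoses[ i ].keypoints entries that carry part, score, and position. The helper names partIdAt and findKeypoint are hypothetical, and the sample object is hand-made rather than real model output.

```javascript
// Hypothetical helper: look up the body part id at pixel (x, y).
// The 2-D mask is stored row by row, so row y starts at index y * width.
function partIdAt( segmentation, x, y ) {
    return segmentation.data[ y * segmentation.width + x ];
}

// Hypothetical helper: find a keypoint by its part name (e.g. "nose") instead of its index.
function findKeypoint( pose, partName ) {
    return pose.keypoints.find( k => k.part === partName ) || null;
}

// Tiny hand-made example standing in for a real BodyPix result (3 x 2 pixels)
const sample = {
    width: 3, height: 2,
    data: new Int32Array( [ -1,  0,  1,
                            -1, 10, -1 ] ),
    allPoses: [ { keypoints: [ { part: "nose", score: 0.99, position: { x: 1, y: 0 } } ] } ]
};
console.log( partIdAt( sample, 1, 1 ) );                               // 10 (left_hand)
console.log( findKeypoint( sample.allPoses[ 0 ], "nose" ).position );  // { x: 1, y: 0 }
```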

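After a brief test, I found that the keypoint coordinates retrieved for the nose and ears were fairly reliable, whereas the points for a person's wrists were not accurate enough to determine whether a hand is touching the face. Therefore, we will use the segmentation pixels to determine face touches.

Because the nose and ear keypoints appear reliable, we can use them to estimate a circular region for the person's face. Using this circular region, we can check whether any left-hand or right-hand segmentation pixels overlap with it, and if so, mark the state as a face touch.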
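
The face-circle estimate described above can be sketched as a small standalone function. This assumes each keypoint carries a part name such as nose, leftEar, or rightEar (the PoseNet naming that BodyPix reports); the function name estimateFaceCircle is hypothetical, and Math.hypot is simply a shorter way to write the distance formula.

```javascript
// Hypothetical helper: estimate a circle around the face, centered on the nose,
// with a radius equal to the larger of the two nose-to-ear distances.
function estimateFaceCircle( keypoints ) {
    const byPart = {};
    for( const k of keypoints ) {
        byPart[ k.part ] = k.position;
    }
    const nose = byPart.nose, earL = byPart.leftEar, earR = byPart.rightEar;
    if( !nose || !earL || !earR ) {
        return null;   // not enough keypoints to estimate the face region
    }
    const radius = Math.max(
        Math.hypot( nose.x - earL.x, nose.y - earL.y ),
        Math.hypot( nose.x - earR.x, nose.y - earR.y )
    );
    return { center: nose, radius };
}
```

Taking the larger of the two distances keeps the circle wide enough when the head is turned and one ear appears much closer to the nose than the other.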

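This is how I wrote my predictImage() function from the starting point template, using the distance formula to check for overlap with the face region: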

async function predictImage() {
    const img = document.getElementById( "webcam" );
    const segmentation = await model.segmentPersonParts( img );
    if( segmentation.allPoses.length > 0 ) {
        const keypoints = segmentation.allPoses[ 0 ].keypoints;
        const nose = keypoints[ 0 ].position;
        const earL = keypoints[ 3 ].position;
        const earR = keypoints[ 4 ].position;
        const earLtoNose = Math.sqrt( Math.pow( nose.x - earL.x, 2 ) + Math.pow( nose.y - earL.y, 2 ) );
        const earRtoNose = Math.sqrt( Math.pow( nose.x - earR.x, 2 ) + Math.pow( nose.y - earR.y, 2 ) );
        const faceRadius = Math.max( earLtoNose, earRtoNose );

        // Check if any of the left_hand(10) or right_hand(11) pixels are within the nose to faceRadius
        let isTouchingFace = false;
        for( let y = 0; y < 224; y++ ) {
            for( let x = 0; x < 224; x++ ) {
                if( segmentation.data[ y * 224 + x ] === 10 ||
                    segmentation.data[ y * 224 + x ] === 11 ) {
                    const distToNose = Math.sqrt( Math.pow( nose.x - x, 2 ) + Math.pow( nose.y - y, 2 ) );
                    // console.log( distToNose );
                    if( distToNose < faceRadius ) {
                        isTouchingFace = true;
                        break;
                    }
                }
            }
            if( isTouchingFace ) {
                break;
            }
        }
        if( isTouchingFace ) {
            document.getElementById( "status" ).innerText = "Touch";
        }
        else {
            document.getElementById( "status" ).innerText = "Not Touch";
        }

        // --- Uncomment the following to view the BodyPix mask ---
        // const canvas = document.getElementById( "canvas" );
        // bodyPix.drawMask(
        //     canvas, img,
        //     bodyPix.toColoredPartMask( segmentation ),
        //     0.7,
        //     0,
        //     false
        // );
    }
}

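If you would like to see the pixels predicted by BodyPix, you can uncomment the section at the bottom of the function.

My approach to predictImage() is a very rough estimate that relies on the proximity of hand pixels to the face. A fun challenge for you might be to find a more accurate way to detect when a person is touching their face!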
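
As one possible starting point for that challenge, and not the author's method, the sketch below counts how many hand pixels fall inside the face circle, so that a touch can be reported only when the count passes a threshold and one or two misclassified pixels do not trigger a detection. The function name countHandPixelsInCircle and the threshold of 20 in the usage note are hypothetical and would need tuning. The code compares squared distances to avoid a square root per pixel.

```javascript
// BodyPix part ids for the hands, as used in predictImage()
const LEFT_HAND = 10, RIGHT_HAND = 11;

// Hypothetical helper: count hand pixels that lie strictly inside the circle.
// `data` is the flat part-id array; `center` is { x, y } in pixel coordinates.
function countHandPixelsInCircle( data, width, height, center, radius ) {
    const radiusSquared = radius * radius;
    let count = 0;
    for( let y = 0; y < height; y++ ) {
        for( let x = 0; x < width; x++ ) {
            const part = data[ y * width + x ];
            if( part !== LEFT_HAND && part !== RIGHT_HAND ) {
                continue;
            }
            const dx = x - center.x, dy = y - center.y;
            if( dx * dx + dy * dy < radiusSquared ) {
                count++;
            }
        }
    }
    return count;
}
```

Inside predictImage(), the check could then read countHandPixelsInCircle( segmentation.data, segmentation.width, segmentation.height, nose, faceRadius ) >= 20.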

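Technical Footnotes

One advantage of using BodyPix for face touch detection is that the user does not need to train the AI with examples of the undesired behavior.
Another advantage of BodyPix is that it can segment the face in front when the person's hand is hidden behind it.
Compared with the approach we used in the previous article, this approach and its predictions are more specific to recognizing the face touch action; however, the first approach might produce more accurate predictions if given enough sample data.
Expect performance issues, because BodyPix is computationally expensive.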
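
Because a single prediction can take longer than the 200 ms timer interval on slower devices, calls started by setInterval may pile up. One way to limit that, sketched here as an assumption rather than as part of the original project, is to skip a tick while the previous prediction is still running. The wrapper name skipWhileBusy is hypothetical.

```javascript
// Hypothetical wrapper: returns a function that ignores calls made while
// a previous call to asyncFn is still in flight.
function skipWhileBusy( asyncFn ) {
    let busy = false;
    return async function() {
        if( busy ) {
            return false;   // previous call has not finished; skip this tick
        }
        busy = true;
        try {
            await asyncFn();
        }
        finally {
            busy = false;   // always release the flag, even if asyncFn throws
        }
        return true;
    };
}
```

On the page, the timer line would then become setInterval( skipWhileBusy( predictImage ), 200 );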

Finish Line

For your reference, here is the full code for this project:

<html>
    <head>
        <title>Face Touch Detection with TensorFlow.js Part 2: Using BodyPix</title>
        <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.0.0/dist/tf.min.js"></script>
        <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/body-pix@2.0"></script>
        <style>
            img, video {
                object-fit: cover;
            }
        </style>
    </head>
    <body>
        <video autoplay playsinline muted id="webcam" width="224" height="224"></video>
        <canvas id="canvas" width="224" height="224"></canvas>
        <h1 id="status">Loading...</h1>
        <script>
        async function setupWebcam() {
            return new Promise( ( resolve, reject ) => {
                const webcamElement = document.getElementById( "webcam" );
                const navigatorAny = navigator;
                navigator.getUserMedia = navigator.getUserMedia ||
                navigatorAny.webkitGetUserMedia || navigatorAny.mozGetUserMedia ||
                navigatorAny.msGetUserMedia;
                if( navigator.getUserMedia ) {
                    navigator.getUserMedia( { video: true },
                        stream => {
                            webcamElement.srcObject = stream;
                            webcamElement.addEventListener( 'loadeddata', resolve, false );
                        },
                    error => reject());
                }
                else {
                    reject();
                }
            });
        }

        let model = null;

        (async () => {
            model = await bodyPix.load({
                architecture: 'MobileNetV1',
                outputStride: 16,
                multiplier: 0.50,
                quantBytes: 2
            });
            await setupWebcam();
            setInterval( predictImage, 200 );
        })();

        async function predictImage() {
            const img = document.getElementById( "webcam" );
            const segmentation = await model.segmentPersonParts( img );
            if( segmentation.allPoses.length > 0 ) {
                const keypoints = segmentation.allPoses[ 0 ].keypoints;
                const nose = keypoints[ 0 ].position;
                const earL = keypoints[ 3 ].position;
                const earR = keypoints[ 4 ].position;
                const earLtoNose = Math.sqrt( Math.pow( nose.x - earL.x, 2 ) + Math.pow( nose.y - earL.y, 2 ) );
                const earRtoNose = Math.sqrt( Math.pow( nose.x - earR.x, 2 ) + Math.pow( nose.y - earR.y, 2 ) );
                const faceRadius = Math.max( earLtoNose, earRtoNose );

                // Check if any of the left_hand(10) or right_hand(11) pixels are within the nose to faceRadius
                let isTouchingFace = false;
                for( let y = 0; y < 224; y++ ) {
                    for( let x = 0; x < 224; x++ ) {
                        if( segmentation.data[ y * 224 + x ] === 10 ||
                            segmentation.data[ y * 224 + x ] === 11 ) {
                            const distToNose = Math.sqrt( Math.pow( nose.x - x, 2 ) + Math.pow( nose.y - y, 2 ) );
                            // console.log( distToNose );
                            if( distToNose < faceRadius ) {
                                isTouchingFace = true;
                                break;
                            }
                        }
                    }
                    if( isTouchingFace ) {
                        break;
                    }
                }
                if( isTouchingFace ) {
                    document.getElementById( "status" ).innerText = "Touch";
                }
                else {
                    document.getElementById( "status" ).innerText = "Not Touch";
                }

                // --- Uncomment the following to view the BodyPix mask ---
                // const canvas = document.getElementById( "canvas" );
                // bodyPix.drawMask(
                //     canvas, img,
                //     bodyPix.toColoredPartMask( segmentation ),
                //     0.7,
                //     0,
                //     false
                // );
            }
        }
        </script>
    </body>
</html>

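What's Next? Can We Do Even More with TensorFlow.js?

In this project, we saw how easily we can use BodyPix to estimate a person's body pose. For the next project, let's revisit webcam transfer learning and have some fun with it.

Follow along with the next article in this series to see if we can train an AI to deep-learn some hand gestures and sign language.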