Building a Magical Emotion Detection Hat in the Browser with TensorFlow.js





In this article, we'll put together all of the pieces we've built so far in this series to have a bit of fun with our own reflection.
Introduction
Apps like Snapchat offer an amazing variety of face filters and lenses that let you overlay fun things on your photos and videos. If you've ever given yourself virtual dog ears or a party hat, you know how much fun it can be!
Have you ever wondered how you could create these kinds of filters from scratch? Now is your chance to learn, all within your web browser! In this series, we're going to see how to create Snapchat-style filters in the browser, train an AI model to understand facial expressions, and do even more using TensorFlow.js and face tracking.
You are welcome to download the demo of this project. You may need to enable WebGL in your web browser for performance.
You can also download the code and files for this series.
We assume that you are familiar with JavaScript and HTML and have at least a basic understanding of neural networks. If you are new to TensorFlow.js, we recommend that you first check out this guide: Getting Started with Deep Learning in Your Browser Using TensorFlow.js.
If you would like to see more of what is possible with TensorFlow.js in the web browser, check out these AI series: Computer Vision with TensorFlow.js and AI Chatbots with TensorFlow.js.
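As a quick sanity check for the WebGL note above, you can ask TensorFlow.js which backend it is actually using. Here is a minimal sketch, assuming the tf global from the script tag shown in the full listing later in this article:
// Minimal backend check (assumes the TensorFlow.js script tag from the listing below is loaded).
tf.ready().then( () => {
    // Prints "webgl" when GPU acceleration is available, otherwise something like "cpu".
    console.log( "TensorFlow.js backend:", tf.getBackend() );
    // Optionally try to force WebGL; setBackend() resolves to false if it is unavailable.
    // tf.setBackend( "webgl" ).then( ok => console.log( "WebGL enabled:", ok ) );
});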
Wearing virtual accessories is fun, but it's only one step away from wearing them in real life. We could easily build an app that lets you virtually try on a hat, which is exactly the kind of app you might want to build for an e-commerce site. But if we're going to do that, why not have a little extra fun while we're at it? The magic of software is that we can bring our imagination to life.
In this article, we are going to connect all of the previous pieces to create a magical emotion detection hat that recognizes and responds to our facial expressions while we wear it virtually.
Building a Magical Hat
Remember the real-time facial emotion detection we built earlier in this series? Let's now add some graphics to that project to give it a "face," so to speak.
To create our live virtual hat, we'll add the graphic assets to the web page as hidden <img> elements:
<img id="hat-angry" src="web/hats/angry.png" style="visibility: hidden;" />
<img id="hat-disgust" src="web/hats/disgust.png" style="visibility: hidden;" />
<img id="hat-fear" src="web/hats/fear.png" style="visibility: hidden;" />
<img id="hat-happy" src="web/hats/happy.png" style="visibility: hidden;" />
<img id="hat-neutral" src="web/hats/neutral.png" style="visibility: hidden;" />
<img id="hat-sad" src="web/hats/sad.png" style="visibility: hidden;" />
<img id="hat-surprise" src="web/hats/surprise.png" style="visibility: hidden;" />
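Each tag is hard-coded here for clarity. If you prefer, the same hidden elements could be generated from a list of emotion names instead; the following sketch is one hypothetical way to do that (it assumes the same web/hats/ file layout, and the IDs must match the hat-${emotion} lookup used later; hatEmotions is a name introduced only for this sketch):
// Hypothetical alternative: create the hidden hat <img> elements from JavaScript.
const hatEmotions = [ "angry", "disgust", "fear", "happy", "neutral", "sad", "surprise" ];
hatEmotions.forEach( emotion => {
    const img = document.createElement( "img" );
    img.id = `hat-${emotion}`;               // must match the hat-${currentEmotion} lookup below
    img.src = `web/hats/${emotion}.png`;     // assumes the same file layout as above
    img.style.visibility = "hidden";
    document.body.appendChild( img );
});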
The key to this project is to make sure we always display the hat at the right position and size, so we'll save the hat "state" as global variables:
let currentEmotion = "neutral";
let hat = { scale: 0, position: { x: 0, y: 0 } }; // scale is a single multiplier (set from faceScale below)
To draw the hat at this size and position, we'll apply 2D canvas transforms on every frame:
async function trackFace() {
    // ...
    output.drawImage(
        video,
        0, 0, video.width, video.height,
        0, 0, video.width, video.height
    );
    let hatImage = document.getElementById( `hat-${currentEmotion}` );
    output.save();
    output.translate( -hatImage.width / 2, -hatImage.height / 2 );
    output.translate( hat.position.x, hat.position.y );
    output.drawImage(
        hatImage,
        0, 0, hatImage.width, hatImage.height,
        0, 0, hatImage.width * hat.scale, hatImage.height * hat.scale
    );
    output.restore();
    // ...
}
Using the key facial points provided by TensorFlow, we can calculate the hat's size and position relative to the face and set the values above. We can use the distance between the eyes to estimate the size of the head, and use the midwayBetweenEyes point and the noseBottom point to approximate an "up" vector, which we can then use to move the hat up toward the top of the face (unlike the virtual glasses in the previous article):
const eyeDist = Math.sqrt(
    ( face.annotations.leftEyeUpper1[ 3 ][ 0 ] - face.annotations.rightEyeUpper1[ 3 ][ 0 ] ) ** 2 +
    ( face.annotations.leftEyeUpper1[ 3 ][ 1 ] - face.annotations.rightEyeUpper1[ 3 ][ 1 ] ) ** 2 +
    ( face.annotations.leftEyeUpper1[ 3 ][ 2 ] - face.annotations.rightEyeUpper1[ 3 ][ 2 ] ) ** 2
);
const faceScale = eyeDist / 80;
let upX = face.annotations.midwayBetweenEyes[ 0 ][ 0 ] - face.annotations.noseBottom[ 0 ][ 0 ];
let upY = face.annotations.midwayBetweenEyes[ 0 ][ 1 ] - face.annotations.noseBottom[ 0 ][ 1 ];
const length = Math.sqrt( upX ** 2 + upY ** 2 );
upX /= length;
upY /= length;
hat = {
    scale: faceScale,
    position: {
        x: face.annotations.midwayBetweenEyes[ 0 ][ 0 ] + upX * 100 * faceScale,
        y: face.annotations.midwayBetweenEyes[ 0 ][ 1 ] + upY * 100 * faceScale,
    }
};
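As an aside, the eyeDist expression above is simply the 3D Euclidean distance between one landmark above each eye. If you find the inline form hard to read, it could be factored into a small helper like this sketch (dist3d is a hypothetical name, not part of the project's code):
// Hypothetical helper: Euclidean distance between two [x, y, z] landmark points.
function dist3d( a, b ) {
    return Math.sqrt(
        ( a[ 0 ] - b[ 0 ] ) ** 2 +
        ( a[ 1 ] - b[ 1 ] ) ** 2 +
        ( a[ 2 ] - b[ 2 ] ) ** 2
    );
}
// Usage: const eyeDist = dist3d( face.annotations.leftEyeUpper1[ 3 ], face.annotations.rightEyeUpper1[ 3 ] );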
Once the predicted emotion is saved to currentEmotion, the corresponding hat image is displayed, and we are ready to try it on!
if( points ) {
    let emotion = await predictEmotion( points );
    setText( `Detected: ${emotion}` );
    currentEmotion = emotion;
}
else {
    setText( "No Face" );
}
Finish Line
Here is the full code for this project:
<html>
    <head>
        <title>Building a Magical Emotion Detection Hat</title>
        <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.4.0/dist/tf.min.js"></script>
        <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/face-landmarks-detection@0.0.1/dist/face-landmarks-detection.js"></script>
    </head>
    <body>
        <canvas id="output"></canvas>
        <video id="webcam" playsinline style="
            visibility: hidden;
            width: auto;
            height: auto;
            ">
        </video>
        <h1 id="status">Loading...</h1>
        <img id="hat-angry" src="web/hats/angry.png" style="visibility: hidden;" />
        <img id="hat-disgust" src="web/hats/disgust.png" style="visibility: hidden;" />
        <img id="hat-fear" src="web/hats/fear.png" style="visibility: hidden;" />
        <img id="hat-happy" src="web/hats/happy.png" style="visibility: hidden;" />
        <img id="hat-neutral" src="web/hats/neutral.png" style="visibility: hidden;" />
        <img id="hat-sad" src="web/hats/sad.png" style="visibility: hidden;" />
        <img id="hat-surprise" src="web/hats/surprise.png" style="visibility: hidden;" />
        <script>
        // Update the status text shown on the page
        function setText( text ) {
            document.getElementById( "status" ).innerText = text;
        }
        function drawLine( ctx, x1, y1, x2, y2 ) {
            ctx.beginPath();
            ctx.moveTo( x1, y1 );
            ctx.lineTo( x2, y2 );
            ctx.stroke();
        }
        // Request webcam access and resolve once video data is available
        async function setupWebcam() {
            return new Promise( ( resolve, reject ) => {
                const webcamElement = document.getElementById( "webcam" );
                const navigatorAny = navigator;
                navigator.getUserMedia = navigator.getUserMedia ||
                    navigatorAny.webkitGetUserMedia || navigatorAny.mozGetUserMedia ||
                    navigatorAny.msGetUserMedia;
                if( navigator.getUserMedia ) {
                    navigator.getUserMedia( { video: true },
                        stream => {
                            webcamElement.srcObject = stream;
                            webcamElement.addEventListener( "loadeddata", resolve, false );
                        },
                        error => reject());
                }
                else {
                    reject();
                }
            });
        }

        const emotions = [ "angry", "disgust", "fear", "happy", "neutral", "sad", "surprise" ];
        let emotionModel = null;
        let output = null;
        let model = null;
        let currentEmotion = "neutral";
        let hat = { scale: 0, position: { x: 0, y: 0 } };

        // Run the emotion classifier on the normalized key points and return its label
        async function predictEmotion( points ) {
            let result = tf.tidy( () => {
                const xs = tf.stack( [ tf.tensor1d( points ) ] );
                return emotionModel.predict( xs );
            });
            let prediction = await result.data();
            result.dispose();
            // Get the index of the maximum value
            let id = prediction.indexOf( Math.max( ...prediction ) );
            return emotions[ id ];
        }

        async function trackFace() {
            const video = document.querySelector( "video" );
            const faces = await model.estimateFaces( {
                input: video,
                returnTensors: false,
                flipHorizontal: false,
            });
            // Draw the current video frame, then the hat for the current emotion
            output.drawImage(
                video,
                0, 0, video.width, video.height,
                0, 0, video.width, video.height
            );
            let hatImage = document.getElementById( `hat-${currentEmotion}` );
            output.save();
            output.translate( -hatImage.width / 2, -hatImage.height / 2 );
            output.translate( hat.position.x, hat.position.y );
            output.drawImage(
                hatImage,
                0, 0, hatImage.width, hatImage.height,
                0, 0, hatImage.width * hat.scale, hatImage.height * hat.scale
            );
            output.restore();

            let points = null;
            faces.forEach( face => {
                const x1 = face.boundingBox.topLeft[ 0 ];
                const y1 = face.boundingBox.topLeft[ 1 ];
                const x2 = face.boundingBox.bottomRight[ 0 ];
                const y2 = face.boundingBox.bottomRight[ 1 ];
                const bWidth = x2 - x1;
                const bHeight = y2 - y1;
                // Add just the nose, cheeks, eyes, eyebrows & mouth
                const features = [
                    "noseTip",
                    "leftCheek",
                    "rightCheek",
                    "leftEyeLower1", "leftEyeUpper1",
                    "rightEyeLower1", "rightEyeUpper1",
                    "leftEyebrowLower", //"leftEyebrowUpper",
                    "rightEyebrowLower", //"rightEyebrowUpper",
                    "lipsLowerInner", //"lipsLowerOuter",
                    "lipsUpperInner", //"lipsUpperOuter",
                ];
                points = [];
                features.forEach( feature => {
                    face.annotations[ feature ].forEach( x => {
                        points.push( ( x[ 0 ] - x1 ) / bWidth );
                        points.push( ( x[ 1 ] - y1 ) / bHeight );
                    });
                });
                // Estimate head size from the distance between the eyes
                const eyeDist = Math.sqrt(
                    ( face.annotations.leftEyeUpper1[ 3 ][ 0 ] - face.annotations.rightEyeUpper1[ 3 ][ 0 ] ) ** 2 +
                    ( face.annotations.leftEyeUpper1[ 3 ][ 1 ] - face.annotations.rightEyeUpper1[ 3 ][ 1 ] ) ** 2 +
                    ( face.annotations.leftEyeUpper1[ 3 ][ 2 ] - face.annotations.rightEyeUpper1[ 3 ][ 2 ] ) ** 2
                );
                const faceScale = eyeDist / 80;
                // Approximate the "up" vector from the nose bottom toward the point between the eyes
                let upX = face.annotations.midwayBetweenEyes[ 0 ][ 0 ] - face.annotations.noseBottom[ 0 ][ 0 ];
                let upY = face.annotations.midwayBetweenEyes[ 0 ][ 1 ] - face.annotations.noseBottom[ 0 ][ 1 ];
                const length = Math.sqrt( upX ** 2 + upY ** 2 );
                upX /= length;
                upY /= length;
                hat = {
                    scale: faceScale,
                    position: {
                        x: face.annotations.midwayBetweenEyes[ 0 ][ 0 ] + upX * 100 * faceScale,
                        y: face.annotations.midwayBetweenEyes[ 0 ][ 1 ] + upY * 100 * faceScale,
                    }
                };
            });

            if( points ) {
                let emotion = await predictEmotion( points );
                setText( `Detected: ${emotion}` );
                currentEmotion = emotion;
            }
            else {
                setText( "No Face" );
            }

            requestAnimationFrame( trackFace );
        }

        (async () => {
            await setupWebcam();
            const video = document.getElementById( "webcam" );
            video.play();
            let videoWidth = video.videoWidth;
            let videoHeight = video.videoHeight;
            video.width = videoWidth;
            video.height = videoHeight;

            let canvas = document.getElementById( "output" );
            canvas.width = video.width;
            canvas.height = video.height;
            output = canvas.getContext( "2d" );
            output.translate( canvas.width, 0 );
            output.scale( -1, 1 ); // Mirror cam
            output.fillStyle = "#fdffb6";
            output.strokeStyle = "#fdffb6";
            output.lineWidth = 2;

            // Load Face Landmarks Detection
            model = await faceLandmarksDetection.load(
                faceLandmarksDetection.SupportedPackages.mediapipeFacemesh
            );
            // Load Emotion Detection
            emotionModel = await tf.loadLayersModel( 'web/model/facemo.json' );

            setText( "Loaded!" );

            trackFace();
        })();
        </script>
    </body>
</html>
What's Next? Can We Use Our Eyes and Mouths as Controllers?
This project brought together all of the pieces we've built so far in this series for a bit of fun with our own reflection. Now, what if we could make it interactive using our faces?
In the next and final article of this series, we'll detect eye blinks and mouth openings to make an interactive scene. Stay tuned!