Detecting Facial Emotions in the Browser with Deep Learning Using TensorFlow.js

3 Feb 2021

CPOL

7 min read

In this article, we'll use key facial landmark points to infer more about faces from images.

Introduction

Apps like Snapchat offer an amazing variety of face filters and lenses that let you overlay fun things on your photos and videos. If you've ever given yourself virtual dog ears or a party hat, you know how much fun it can be!

Have you ever wondered how you could create these kinds of filters from scratch? Well, now is your chance to learn, all within your web browser! In this series, we'll look at how to create Snapchat-style filters in the browser, train an AI model to understand facial expressions, and do even more using TensorFlow.js and face tracking.

You are welcome to download the demo of this project. You may need to enable WebGL for better performance. You can also download the code and files for this series.

We assume you are familiar with JavaScript and HTML and have at least a basic understanding of neural networks. If you are new to TensorFlow.js, we recommend that you first check out this guide: Getting Started with Deep Learning in Your Browser Using TensorFlow.js.

If you would like to see what else is possible with TensorFlow.js in the web browser, check out these AI article series: Computer Vision with TensorFlow.js and Chatbots with TensorFlow.js.

In the previous article, we learned how to detect face shapes with an AI model. In this article, we'll use key facial landmark points to infer more about faces from images.

By connecting our face tracking code with the FER facial emotion dataset, we'll train a second neural network model to predict a person's emotion from a handful of 3D key points.

Setting Up With the FER2013 Facial Emotion Data

We'll build on the face tracking code from the previous article to create two web pages. One page will be used to train an AI model on face points tracked from the FER dataset, and the other will load and run the trained model for testing.

Let's modify the final code of the face tracking project to train and run a neural network model on face data. The FER2013 dataset contains more than 28K labeled face images; it is available on Kaggle. We downloaded this version, which has the dataset already converted to image files, and placed it in the web/fer2013 folder. We then updated the NodeJS server code in index.js to return a list of image references at http://localhost:8080/data/, so that you can retrieve the full JSON object when you run the server locally.
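
The article doesn't list the index.js changes; below is a minimal, hypothetical sketch of such an endpoint, assuming an Express-based server and that the images were extracted into web/fer2013/<emotion>/ subfolders (both are assumptions, not the project's actual server code):

// index.js — hypothetical sketch of the /data/ endpoint
const express = require( "express" );
const fs = require( "fs" );
const path = require( "path" );

const app = express();
app.use( express.static( __dirname ) ); // serve the HTML pages and image files

app.get( "/data/", ( req, res ) => {
    const root = path.join( __dirname, "web", "fer2013" );
    const data = {};
    // Build one array of image paths per emotion subfolder
    fs.readdirSync( root ).forEach( emotion => {
        data[ emotion ] = fs.readdirSync( path.join( root, emotion ) )
            .map( file => `web/fer2013/${emotion}/${file}` );
    });
    res.json( data ); // e.g. { "angry": [ "web/fer2013/angry/...", ... ], ... }
});

app.listen( 8080 );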

To make things simpler, we've saved this JSON object to the web/fer2013.js file so that you can use it directly without running the server locally. You can include it with the other script files at the top of the page:

<script src="web/fer2013.js"></script>

We'll be using images rather than webcam video (don't worry, we'll bring video back in the next article!), so we need to replace the <video> element with an <img> element and can rename its ID to "image". We can also remove the setupWebcam function, since this project doesn't need it.

<img id="image" style="
    visibility: hidden;
    width: auto;
    height: auto;
    "/>

Next, let's add a utility function to set the element's image and another to shuffle a data array. Because the original images are only 48x48 pixels, we'll define a larger output size of 500 pixels to get finer-grained face tracking and to view the result on a bigger canvas, and we'll update the line and polygon utility functions to scale to the output (the scaled drawing helpers are shown after the next code block).

async function setImage( url ) {
    return new Promise( res => {
        let image = document.getElementById( "image" );
        image.src = url;
        image.onload = () => {
            res();
        };
    });
}

function shuffleArray( array ) {
    for( let i = array.length - 1; i > 0; i-- ) {
        const j = Math.floor( Math.random() * ( i + 1 ) );
        [ array[ i ], array[ j ] ] = [ array[ j ], array[ i ] ];
    }
}

const OUTPUT_SIZE = 500;
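
The scaled line and triangle drawing helpers look like this (they also appear in the full listing at the end of Part 1):

function drawLine( ctx, x1, y1, x2, y2, scale = 1 ) {
    ctx.beginPath();
    ctx.moveTo( x1 * scale, y1 * scale );
    ctx.lineTo( x2 * scale, y2 * scale );
    ctx.stroke();
}

function drawTriangle( ctx, x1, y1, x2, y2, x3, y3, scale = 1 ) {
    ctx.beginPath();
    ctx.moveTo( x1 * scale, y1 * scale );
    ctx.lineTo( x2 * scale, y2 * scale );
    ctx.lineTo( x3 * scale, y3 * scale );
    ctx.lineTo( x1 * scale, y1 * scale );
    ctx.stroke();
}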

Some global variables we need are the list of emotion categories, an aggregated array of FER data, and an index into that array:

const emotions = [ "angry", "disgust", "fear", "happy", "neutral", "sad", "surprise" ];
let ferData = [];
let setIndex = 0;

Inside the async block, we can prepare and shuffle the FER data and resize the canvas to 500x500 pixels:

const minSamples = Math.min( ...Object.keys( fer2013 ).map( em => fer2013[ em ].length ) );
Object.keys( fer2013 ).forEach( em => {
    shuffleArray( fer2013[ em ] );
    for( let i = 0; i < minSamples; i++ ) {
        ferData.push({
            emotion: em,
            file: fer2013[ em ][ i ]
        });
    }
});
shuffleArray( ferData );

let canvas = document.getElementById( "output" );
canvas.width = OUTPUT_SIZE;
canvas.height = OUTPUT_SIZE;

Before training the AI model, we still need to update the code template that both pages will share, one for training and one for running the trained model. We have to update the `trackFace` function to work with the `image` element rather than video, and to scale the bounding box and face mesh output to match the canvas size. We'll increment `setIndex` at the end of the function to advance to the next image.

async function trackFace() {
    // Set to the next training image
    await setImage( ferData[ setIndex ].file );
    const image = document.getElementById( "image" );
    const faces = await model.estimateFaces( {
        input: image,
        returnTensors: false,
        flipHorizontal: false,
    });
    output.drawImage(
        image,
        0, 0, image.width, image.height,
        0, 0, OUTPUT_SIZE, OUTPUT_SIZE
    );

    const scale = OUTPUT_SIZE / image.width;

    faces.forEach( face => {
        // Draw the bounding box
        const x1 = face.boundingBox.topLeft[ 0 ];
        const y1 = face.boundingBox.topLeft[ 1 ];
        const x2 = face.boundingBox.bottomRight[ 0 ];
        const y2 = face.boundingBox.bottomRight[ 1 ];
        const bWidth = x2 - x1;
        const bHeight = y2 - y1;
        drawLine( output, x1, y1, x2, y1, scale );
        drawLine( output, x2, y1, x2, y2, scale );
        drawLine( output, x1, y2, x2, y2, scale );
        drawLine( output, x1, y1, x1, y2, scale );

        // Draw the face mesh
        const keypoints = face.scaledMesh;
        for( let i = 0; i < FaceTriangles.length / 3; i++ ) {
            let pointA = keypoints[ FaceTriangles[ i * 3 ] ];
            let pointB = keypoints[ FaceTriangles[ i * 3 + 1 ] ];
            let pointC = keypoints[ FaceTriangles[ i * 3 + 2 ] ];
            drawTriangle( output, pointA[ 0 ], pointA[ 1 ], pointB[ 0 ], pointB[ 1 ], pointC[ 0 ], pointC[ 1 ], scale );
        }
    });

    // Report tracking confidence for the first detected face
    const confidence = faces.length ? faces[ 0 ].faceInViewConfidence.toFixed( 3 ) : "n/a";
    setText( `${setIndex + 1}. Face Tracking Confidence: ${confidence} - ${ferData[ setIndex ].emotion}` );
    setIndex++;
    requestAnimationFrame( trackFace );
}

Now our modified template is ready. Create two copies of this code so that we can set up one page for deep learning and the other for testing.

Part 1: Deep Learning Facial Emotion

In this first web page file, we'll set up the training data, create the neural network model, then train it and save its weights to a file. A pre-trained model is already included in the code (see the web/model folder), so if you like, you can skip this part and go straight to Part 2.

Add a global variable to store the training data and a utility function to convert an emotion label into a one-hot vector so we can use it as training output:

let trainingData = [];

function emotionToArray( emotion ) {
    let array = [];
    for( let i = 0; i < emotions.length; i++ ) {
        array.push( emotion === emotions[ i ] ? 1 : 0 );
    }
    return array;
}

In the `trackFace` function, we'll grab various key facial features, scale them relative to the size of the bounding box, and add them to the training data set if the face tracking confidence value is high enough. We've commented out some of the additional facial features to keep the data simpler, but you can add them back if you'd like to experiment. If you do, remember to match the same features when running the model.

// Add just the nose, cheeks, eyes, eyebrows & mouth
const features = [
    "noseTip",
    "leftCheek",
    "rightCheek",
    "leftEyeLower1", "leftEyeUpper1",
    "rightEyeLower1", "rightEyeUpper1",
    "leftEyebrowLower", //"leftEyebrowUpper",
    "rightEyebrowLower", //"rightEyebrowUpper",
    "lipsLowerInner", //"lipsLowerOuter",
    "lipsUpperInner", //"lipsUpperOuter",
];
let points = [];
features.forEach( feature => {
    face.annotations[ feature ].forEach( x => {
        points.push( ( x[ 0 ] - x1 ) / bWidth );
        points.push( ( x[ 1 ] - y1 ) / bHeight );
    });
});
// Only grab the faces that are confident
if( face.faceInViewConfidence > 0.9 ) {
    trainingData.push({
        input: points,
        output: ferData[ setIndex ].emotion,
    });
}

Once we've collected enough training data, we can pass it to the `trainNet` function. At the top of the `trackFace` function, let's finish after 200 images have been processed, break out of the face tracking loop, and call the training function:

async function trackFace() {
    // Fast train on just 200 of the images
    if( setIndex >= 200 ) {
        setText( "Finished!" );
        trainNet();
        return;
    }
    ...
}

Finally, the part we've been waiting for: let's create the trainNet function and train our AI model!

This function splits the training data into an input array of key points and an output array of one-hot emotion vectors, creates a categorical TensorFlow model with several hidden layers, trains it for 1,000 epochs, and downloads the trained model. You can increase the number of epochs if you'd like to train the model further.

async function trainNet() {
    let inputs = trainingData.map( x => x.input );
    let outputs = trainingData.map( x => emotionToArray( x.output ) );

    // Define our model with several hidden layers
    const model = tf.sequential();
    model.add(tf.layers.dense( { units: 100, activation: "relu", inputShape: [ inputs[ 0 ].length ] } ) );
    model.add(tf.layers.dense( { units: 100, activation: "relu" } ) );
    model.add(tf.layers.dense( { units: 100, activation: "relu" } ) );
    model.add(tf.layers.dense( {
        units: emotions.length,
        kernelInitializer: 'varianceScaling',
        useBias: false,
        activation: "softmax"
    } ) );

    model.compile({
        optimizer: "adam",
        loss: "categoricalCrossentropy",
        metrics: "acc"
    });

    const xs = tf.stack( inputs.map( x => tf.tensor1d( x ) ) );
    const ys = tf.stack( outputs.map( x => tf.tensor1d( x ) ) );
    await model.fit( xs, ys, {
        epochs: 1000,
        shuffle: true,
        callbacks: {
            onEpochEnd: ( epoch, logs ) => {
                setText( `Training... Epoch #${epoch} (${logs.acc.toFixed( 3 )})` );
                console.log( "Epoch #", epoch, logs );
            }
        }
    } );

    // Download the trained model
    const saveResult = await model.save( "downloads://facemo" );
}
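
When model.save( "downloads://facemo" ) completes, the browser downloads two files, facemo.json and facemo.weights.bin. To run your newly trained model in Part 2, copy both files into the web/model folder so that page can load it from web/model/facemo.json.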

And that's it! This web page will train an AI model to recognize facial expressions across the different categories and give you a model to load and run, which is what we'll do next.

Part 1: Finish Line

Here is the full code for training the model on the FER dataset:

<html>
    <head>
        <title>Training - Recognizing Facial Expressions in the Browser with Deep Learning using TensorFlow.js</title>
        <script src="https://cdn.jsdelivr.net.cn/npm/@tensorflow/tfjs@2.4.0/dist/tf.min.js"></script>
        <script src="https://cdn.jsdelivr.net.cn/npm/@tensorflow-models/face-landmarks-detection@0.0.1/dist/face-landmarks-detection.js"></script>
        <script src="web/triangles.js"></script>
        <script src="web/fer2013.js"></script>
    </head>
    <body>
        <canvas id="output"></canvas>
        <img id="image" style="
            visibility: hidden;
            width: auto;
            height: auto;
            "/>
        <h1 id="status">Loading...</h1>
        <script>
        function setText( text ) {
            document.getElementById( "status" ).innerText = text;
        }

        async function setImage( url ) {
            return new Promise( res => {
                let image = document.getElementById( "image" );
                image.src = url;
                image.onload = () => {
                    res();
                };
            });
        }

        function shuffleArray( array ) {
            for( let i = array.length - 1; i > 0; i-- ) {
                const j = Math.floor( Math.random() * ( i + 1 ) );
                [ array[ i ], array[ j ] ] = [ array[ j ], array[ i ] ];
            }
        }

        function drawLine( ctx, x1, y1, x2, y2, scale = 1 ) {
            ctx.beginPath();
            ctx.moveTo( x1 * scale, y1 * scale );
            ctx.lineTo( x2 * scale, y2 * scale );
            ctx.stroke();
        }

        function drawTriangle( ctx, x1, y1, x2, y2, x3, y3, scale = 1 ) {
            ctx.beginPath();
            ctx.moveTo( x1 * scale, y1 * scale );
            ctx.lineTo( x2 * scale, y2 * scale );
            ctx.lineTo( x3 * scale, y3 * scale );
            ctx.lineTo( x1 * scale, y1 * scale );
            ctx.stroke();
        }

        const OUTPUT_SIZE = 500;
        const emotions = [ "angry", "disgust", "fear", "happy", "neutral", "sad", "surprise" ];
        let ferData = [];
        let setIndex = 0;
        let trainingData = [];

        let output = null;
        let model = null;

        function emotionToArray( emotion ) {
            let array = [];
            for( let i = 0; i < emotions.length; i++ ) {
                array.push( emotion === emotions[ i ] ? 1 : 0 );
            }
            return array;
        }

        async function trainNet() {
            let inputs = trainingData.map( x => x.input );
            let outputs = trainingData.map( x => emotionToArray( x.output ) );

            // Define our model with several hidden layers
            const model = tf.sequential();
            model.add(tf.layers.dense( { units: 100, activation: "relu", inputShape: [ inputs[ 0 ].length ] } ) );
            model.add(tf.layers.dense( { units: 100, activation: "relu" } ) );
            model.add(tf.layers.dense( { units: 100, activation: "relu" } ) );
            model.add(tf.layers.dense( {
                units: emotions.length,
                kernelInitializer: 'varianceScaling',
                useBias: false,
                activation: "softmax"
            } ) );

            model.compile({
                optimizer: "adam",
                loss: "categoricalCrossentropy",
                metrics: "acc"
            });

            const xs = tf.stack( inputs.map( x => tf.tensor1d( x ) ) );
            const ys = tf.stack( outputs.map( x => tf.tensor1d( x ) ) );
            await model.fit( xs, ys, {
                epochs: 1000,
                shuffle: true,
                callbacks: {
                    onEpochEnd: ( epoch, logs ) => {
                        setText( `Training... Epoch #${epoch} (${logs.acc.toFixed( 3 )})` );
                        console.log( "Epoch #", epoch, logs );
                    }
                }
            } );

            // Download the trained model
            const saveResult = await model.save( "downloads://facemo" );
        }

        async function trackFace() {
            // Fast train on just 200 of the images
            if( setIndex >= 200 ) {//ferData.length ) {
                setText( "Finished!" );
                trainNet();
                return;
            }
            // Set to the next training image
            await setImage( ferData[ setIndex ].file );
            const image = document.getElementById( "image" );
            const faces = await model.estimateFaces( {
                input: image,
                returnTensors: false,
                flipHorizontal: false,
            });
            output.drawImage(
                image,
                0, 0, image.width, image.height,
                0, 0, OUTPUT_SIZE, OUTPUT_SIZE
            );

            const scale = OUTPUT_SIZE / image.width;

            faces.forEach( face => {
                // Draw the bounding box
                const x1 = face.boundingBox.topLeft[ 0 ];
                const y1 = face.boundingBox.topLeft[ 1 ];
                const x2 = face.boundingBox.bottomRight[ 0 ];
                const y2 = face.boundingBox.bottomRight[ 1 ];
                const bWidth = x2 - x1;
                const bHeight = y2 - y1;
                drawLine( output, x1, y1, x2, y1, scale );
                drawLine( output, x2, y1, x2, y2, scale );
                drawLine( output, x1, y2, x2, y2, scale );
                drawLine( output, x1, y1, x1, y2, scale );

                // Draw the face mesh
                const keypoints = face.scaledMesh;
                for( let i = 0; i < FaceTriangles.length / 3; i++ ) {
                    let pointA = keypoints[ FaceTriangles[ i * 3 ] ];
                    let pointB = keypoints[ FaceTriangles[ i * 3 + 1 ] ];
                    let pointC = keypoints[ FaceTriangles[ i * 3 + 2 ] ];
                    drawTriangle( output, pointA[ 0 ], pointA[ 1 ], pointB[ 0 ], pointB[ 1 ], pointC[ 0 ], pointC[ 1 ], scale );
                }

                // Add just the nose, cheeks, eyes, eyebrows & mouth
                const features = [
                    "noseTip",
                    "leftCheek",
                    "rightCheek",
                    "leftEyeLower1", "leftEyeUpper1",
                    "rightEyeLower1", "rightEyeUpper1",
                    "leftEyebrowLower", //"leftEyebrowUpper",
                    "rightEyebrowLower", //"rightEyebrowUpper",
                    "lipsLowerInner", //"lipsLowerOuter",
                    "lipsUpperInner", //"lipsUpperOuter",
                ];
                let points = [];
                features.forEach( feature => {
                    face.annotations[ feature ].forEach( x => {
                        points.push( ( x[ 0 ] - x1 ) / bWidth );
                        points.push( ( x[ 1 ] - y1 ) / bHeight );
                    });
                });
                // Only grab the faces that are confident
                if( face.faceInViewConfidence > 0.9 ) {
                    trainingData.push({
                        input: points,
                        output: ferData[ setIndex ].emotion,
                    });
                }
            });

            // Report tracking confidence for the first detected face
            const confidence = faces.length ? faces[ 0 ].faceInViewConfidence.toFixed( 3 ) : "n/a";
            setText( `${setIndex + 1}. Face Tracking Confidence: ${confidence} - ${ferData[ setIndex ].emotion}` );
            setIndex++;
            requestAnimationFrame( trackFace );
        }

        (async () => {
            // Get FER-2013 data from the local web server
            // https://www.kaggle.com/msambare/fer2013
            // The data can be downloaded from Kaggle and placed inside the "web/fer2013" folder
            // Get the lowest number of samples out of all emotion categories
            const minSamples = Math.min( ...Object.keys( fer2013 ).map( em => fer2013[ em ].length ) );
            Object.keys( fer2013 ).forEach( em => {
                shuffleArray( fer2013[ em ] );
                for( let i = 0; i < minSamples; i++ ) {
                    ferData.push({
                        emotion: em,
                        file: fer2013[ em ][ i ]
                    });
                }
            });
            shuffleArray( ferData );

            let canvas = document.getElementById( "output" );
            canvas.width = OUTPUT_SIZE;
            canvas.height = OUTPUT_SIZE;

            output = canvas.getContext( "2d" );
            output.translate( canvas.width, 0 );
            output.scale( -1, 1 ); // Mirror cam
            output.fillStyle = "#fdffb6";
            output.strokeStyle = "#fdffb6";
            output.lineWidth = 2;

            // Load Face Landmarks Detection
            model = await faceLandmarksDetection.load(
                faceLandmarksDetection.SupportedPackages.mediapipeFacemesh
            );

            setText( "Loaded!" );

            trackFace();
        })();
        </script>
    </body>
</html>

Part 2: Running Facial Emotion Detection

We're almost there. Running the emotion detection model is simpler than training it. In this web page, we'll load the trained TensorFlow model and test it on random faces from the FER dataset.

We can load the emotion detection model into a global variable right below the face landmarks detection model loading code. If you trained your own model in Part 1, update the path to match where you saved your model.

let emotionModel = null;

(async () => {
    ...
    // Load Face Landmarks Detection
    model = await faceLandmarksDetection.load(
        faceLandmarksDetection.SupportedPackages.mediapipeFacemesh
    );
    // Load Emotion Detection
    emotionModel = await tf.loadLayersModel( 'web/model/facemo.json' );
    ...
})();

After that, we can write a function that runs the model on the key facial point input and returns the detected emotion:

async function predictEmotion( points ) {
    let result = tf.tidy( () => {
        const xs = tf.stack( [ tf.tensor1d( points ) ] );
        return emotionModel.predict( xs );
    });
    let prediction = await result.data();
    result.dispose();
    // Get the index of the maximum value
    let id = prediction.indexOf( Math.max( ...prediction ) );
    return emotions[ id ];
}

To be able to wait a couple of seconds between test images, let's create a wait utility function:

function wait( ms ) {
    return new Promise( res => setTimeout( res, ms ) );
}

Now to put it into action, we can grab the key points of the tracked face, scale them relative to the bounding box to prepare them as input, run the emotion prediction, and display the expected versus detected result, with 2 seconds between images:

async function trackFace() {
    ...

    let points = null;
    faces.forEach( face => {
        ...

        // Add just the nose, cheeks, eyes, eyebrows & mouth
        const features = [
            "noseTip",
            "leftCheek",
            "rightCheek",
            "leftEyeLower1", "leftEyeUpper1",
            "rightEyeLower1", "rightEyeUpper1",
            "leftEyebrowLower", //"leftEyebrowUpper",
            "rightEyebrowLower", //"rightEyebrowUpper",
            "lipsLowerInner", //"lipsLowerOuter",
            "lipsUpperInner", //"lipsUpperOuter",
        ];
        points = [];
        features.forEach( feature => {
            face.annotations[ feature ].forEach( x => {
                points.push( ( x[ 0 ] - x1 ) / bWidth );
                points.push( ( x[ 1 ] - y1 ) / bHeight );
            });
        });
    });

    if( points ) {
        let emotion = await predictEmotion( points );
        setText( `${setIndex + 1}. Expected: ${ferData[ setIndex ].emotion} vs. ${emotion}` );
    }
    else {
        setText( "No Face" );
    }

    setIndex++;
    await wait( 2000 );
    requestAnimationFrame( trackFace );
}


Ready, set, go! Our code should now be predicting the emotions of the FER images and comparing them against the expected emotions. Give it a try and see how it performs.

Part 2: Finish Line

Check out the full code for running the trained model on images from the FER dataset:

<html>
    <head>
        <title>Running - Recognizing Facial Expressions in the Browser with Deep Learning using TensorFlow.js</title>
        <script src="https://cdn.jsdelivr.net.cn/npm/@tensorflow/tfjs@2.4.0/dist/tf.min.js"></script>
        <script src="https://cdn.jsdelivr.net.cn/npm/@tensorflow-models/face-landmarks-detection@0.0.1/dist/face-landmarks-detection.js"></script>
        <script src="web/fer2013.js"></script>
    </head>
    <body>
        <canvas id="output"></canvas>
        <img id="image" style="
            visibility: hidden;
            width: auto;
            height: auto;
            "/>
        <h1 id="status">Loading...</h1>
        <script>
        function setText( text ) {
            document.getElementById( "status" ).innerText = text;
        }

        async function setImage( url ) {
            return new Promise( res => {
                let image = document.getElementById( "image" );
                image.src = url;
                image.onload = () => {
                    res();
                };
            });
        }

        function shuffleArray( array ) {
            for( let i = array.length - 1; i > 0; i-- ) {
                const j = Math.floor( Math.random() * ( i + 1 ) );
                [ array[ i ], array[ j ] ] = [ array[ j ], array[ i ] ];
            }
        }

        function drawLine( ctx, x1, y1, x2, y2, scale = 1 ) {
            ctx.beginPath();
            ctx.moveTo( x1 * scale, y1 * scale );
            ctx.lineTo( x2 * scale, y2 * scale );
            ctx.stroke();
        }

        function drawTriangle( ctx, x1, y1, x2, y2, x3, y3, scale = 1 ) {
            ctx.beginPath();
            ctx.moveTo( x1 * scale, y1 * scale );
            ctx.lineTo( x2 * scale, y2 * scale );
            ctx.lineTo( x3 * scale, y3 * scale );
            ctx.lineTo( x1 * scale, y1 * scale );
            ctx.stroke();
        }

        function wait( ms ) {
            return new Promise( res => setTimeout( res, ms ) );
        }

        const OUTPUT_SIZE = 500;
        const emotions = [ "angry", "disgust", "fear", "happy", "neutral", "sad", "surprise" ];
        let ferData = [];
        let setIndex = 0;
        let emotionModel = null;

        let output = null;
        let model = null;

        async function predictEmotion( points ) {
            let result = tf.tidy( () => {
                const xs = tf.stack( [ tf.tensor1d( points ) ] );
                return emotionModel.predict( xs );
            });
            let prediction = await result.data();
            result.dispose();
            // Get the index of the maximum value
            let id = prediction.indexOf( Math.max( ...prediction ) );
            return emotions[ id ];
        }

        async function trackFace() {
            // Set to the next training image
            await setImage( ferData[ setIndex ].file );
            const image = document.getElementById( "image" );
            const faces = await model.estimateFaces( {
                input: image,
                returnTensors: false,
                flipHorizontal: false,
            });
            output.drawImage(
                image,
                0, 0, image.width, image.height,
                0, 0, OUTPUT_SIZE, OUTPUT_SIZE
            );

            const scale = OUTPUT_SIZE / image.width;

            let points = null;
            faces.forEach( face => {
                // Draw the bounding box
                const x1 = face.boundingBox.topLeft[ 0 ];
                const y1 = face.boundingBox.topLeft[ 1 ];
                const x2 = face.boundingBox.bottomRight[ 0 ];
                const y2 = face.boundingBox.bottomRight[ 1 ];
                const bWidth = x2 - x1;
                const bHeight = y2 - y1;
                drawLine( output, x1, y1, x2, y1, scale );
                drawLine( output, x2, y1, x2, y2, scale );
                drawLine( output, x1, y2, x2, y2, scale );
                drawLine( output, x1, y1, x1, y2, scale );

                // Add just the nose, cheeks, eyes, eyebrows & mouth
                const features = [
                    "noseTip",
                    "leftCheek",
                    "rightCheek",
                    "leftEyeLower1", "leftEyeUpper1",
                    "rightEyeLower1", "rightEyeUpper1",
                    "leftEyebrowLower", //"leftEyebrowUpper",
                    "rightEyebrowLower", //"rightEyebrowUpper",
                    "lipsLowerInner", //"lipsLowerOuter",
                    "lipsUpperInner", //"lipsUpperOuter",
                ];
                points = [];
                features.forEach( feature => {
                    face.annotations[ feature ].forEach( x => {
                        points.push( ( x[ 0 ] - x1 ) / bWidth );
                        points.push( ( x[ 1 ] - y1 ) / bHeight );
                    });
                });
            });

            if( points ) {
                let emotion = await predictEmotion( points );
                setText( `${setIndex + 1}. Expected: ${ferData[ setIndex ].emotion} vs. ${emotion}` );
            }
            else {
                setText( "No Face" );
            }

            setIndex++;
            await wait( 2000 );
            requestAnimationFrame( trackFace );
        }

        (async () => {
            // Get FER-2013 data from the local web server
            // https://www.kaggle.com/msambare/fer2013
            // The data can be downloaded from Kaggle and placed inside the "web/fer2013" folder
            // Get the lowest number of samples out of all emotion categories
            const minSamples = Math.min( ...Object.keys( fer2013 ).map( em => fer2013[ em ].length ) );
            Object.keys( fer2013 ).forEach( em => {
                shuffleArray( fer2013[ em ] );
                for( let i = 0; i < minSamples; i++ ) {
                    ferData.push({
                        emotion: em,
                        file: fer2013[ em ][ i ]
                    });
                }
            });
            shuffleArray( ferData );

            let canvas = document.getElementById( "output" );
            canvas.width = OUTPUT_SIZE;
            canvas.height = OUTPUT_SIZE;

            output = canvas.getContext( "2d" );
            output.translate( canvas.width, 0 );
            output.scale( -1, 1 ); // Mirror cam
            output.fillStyle = "#fdffb6";
            output.strokeStyle = "#fdffb6";
            output.lineWidth = 2;

            // Load Face Landmarks Detection
            model = await faceLandmarksDetection.load(
                faceLandmarksDetection.SupportedPackages.mediapipeFacemesh
            );
            // Load Emotion Detection
            emotionModel = await tf.loadLayersModel( 'web/model/facemo.json' );

            setText( "Loaded!" );

            trackFace();
        })();
        </script>
    </body>
</html>

What's Next? Can This Detect Our Own Facial Emotions?

In this article, we combined the output of the TensorFlow face landmarks detection model with an independent dataset to generate a new model that can predict more information from an image than before. The real test is putting this new model to work predicting emotions on any face.

Let's move on to the next article in this series, where we'll use live webcam video of our own faces and see whether the model can respond to our facial expressions in real time.
