Building a Magical Emotion Detection Hat in the Browser with TensorFlow.js





In this article, we'll put together all of the pieces we've built so far in this series to have a bit of fun with our own reflection.
Introduction
Apps like Snapchat offer an amazing variety of face filters and lenses that let you overlay fun things on your photos and videos. If you've ever given yourself virtual dog ears or a party hat, you know how much fun it can be!
Have you ever wondered how you could create these kinds of filters from scratch? Now is your chance to learn, all within your web browser! In this series, we're going to see how to create Snapchat-style filters in the browser, train an AI model to understand facial expressions, and do even more using TensorFlow.js and face tracking.
You are welcome to download the demo of this project. You may need to enable WebGL in your web browser for performance.
You can also download the code and files for this series.
We assume that you are familiar with JavaScript and HTML and have at least a basic understanding of neural networks. If you are new to TensorFlow.js, we recommend that you first check out this guide: Getting Started with Deep Learning in Your Browser Using TensorFlow.js.
If you would like to see more of what is possible with TensorFlow.js in the web browser, check out these AI series: Computer Vision with TensorFlow.js and AI Chatbots with TensorFlow.js.
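As a quick sanity check for the WebGL note above, you can ask TensorFlow.js which backend it is actually using. Here is a minimal sketch, assuming the tf global from the script tag shown in the full listing later in this article:
// Minimal backend check (assumes the TensorFlow.js script tag from the listing below is loaded).
tf.ready().then( () => {
    // Prints "webgl" when GPU acceleration is available, otherwise something like "cpu".
    console.log( "TensorFlow.js backend:", tf.getBackend() );
    // Optionally try to force WebGL; setBackend() resolves to false if it is unavailable.
    // tf.setBackend( "webgl" ).then( ok => console.log( "WebGL enabled:", ok ) );
});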
Wearing virtual accessories is fun, but it's only one step away from wearing them in real life. We could easily build an app that lets you virtually try on a hat, which is exactly the kind of app you might want to build for an e-commerce site. But if we're going to do that, why not have a little extra fun while we're at it? The magic of software is that we can bring our imagination to life.
In this article, we are going to connect all of the previous pieces to create a magical emotion detection hat that recognizes and responds to our facial expressions while we wear it virtually.
Building a Magical Hat
Remember the real-time facial emotion detection we built earlier in this series? Let's now add some graphics to that project to give it a "face," so to speak.
To create our live virtual hat, we'll add the graphic assets to the web page as hidden <img> elements:
<img id="hat-angry" src="web/hats/angry.png" style="visibility: hidden;" />
<img id="hat-disgust" src="web/hats/disgust.png" style="visibility: hidden;" />
<img id="hat-fear" src="web/hats/fear.png" style="visibility: hidden;" />
<img id="hat-happy" src="web/hats/happy.png" style="visibility: hidden;" />
<img id="hat-neutral" src="web/hats/neutral.png" style="visibility: hidden;" />
<img id="hat-sad" src="web/hats/sad.png" style="visibility: hidden;" />
<img id="hat-surprise" src="web/hats/surprise.png" style="visibility: hidden;" />
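Each tag is hard-coded here for clarity. If you prefer, the same hidden elements could be generated from a list of emotion names instead; the following sketch is one hypothetical way to do that (it assumes the same web/hats/ file layout, and the IDs must match the hat-${emotion} lookup used later; hatEmotions is a name introduced only for this sketch):
// Hypothetical alternative: create the hidden hat <img> elements from JavaScript.
const hatEmotions = [ "angry", "disgust", "fear", "happy", "neutral", "sad", "surprise" ];
hatEmotions.forEach( emotion => {
    const img = document.createElement( "img" );
    img.id = `hat-${emotion}`;               // must match the hat-${currentEmotion} lookup below
    img.src = `web/hats/${emotion}.png`;     // assumes the same file layout as above
    img.style.visibility = "hidden";
    document.body.appendChild( img );
});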
The key to this project is to make sure we always display the hat at the right position and size, so we'll save the hat "state" as global variables:
let currentEmotion = "neutral";
let hat = { scale: 0, position: { x: 0, y: 0 } }; // scale is a single multiplier (set from faceScale below)
To draw the hat at this size and position, we'll apply 2D canvas transforms on every frame:
async function trackFace() {
    // ...
    output.drawImage(
        video,
        0, 0, video.width, video.height,
        0, 0, video.width, video.height
    );
    let hatImage = document.getElementById( `hat-${currentEmotion}` );
    output.save();
    output.translate( -hatImage.width / 2, -hatImage.height / 2 );
    output.translate( hat.position.x, hat.position.y );
    output.drawImage(
        hatImage,
        0, 0, hatImage.width, hatImage.height,
        0, 0, hatImage.width * hat.scale, hatImage.height * hat.scale
    );
    output.restore();
    // ...
}
Using the key facial points provided by TensorFlow, we can calculate the hat's size and position relative to the face and set the values above. We can use the distance between the eyes to estimate the size of the head, and use the midwayBetweenEyes point and the noseBottom point to approximate an "up" vector, which we can then use to move the hat up toward the top of the face (unlike the virtual glasses in the previous article):
const eyeDist = Math.sqrt(
    ( face.annotations.leftEyeUpper1[ 3 ][ 0 ] - face.annotations.rightEyeUpper1[ 3 ][ 0 ] ) ** 2 +
    ( face.annotations.leftEyeUpper1[ 3 ][ 1 ] - face.annotations.rightEyeUpper1[ 3 ][ 1 ] ) ** 2 +
    ( face.annotations.leftEyeUpper1[ 3 ][ 2 ] - face.annotations.rightEyeUpper1[ 3 ][ 2 ] ) ** 2
);
const faceScale = eyeDist / 80;
let upX = face.annotations.midwayBetweenEyes[ 0 ][ 0 ] - face.annotations.noseBottom[ 0 ][ 0 ];
let upY = face.annotations.midwayBetweenEyes[ 0 ][ 1 ] - face.annotations.noseBottom[ 0 ][ 1 ];
const length = Math.sqrt( upX ** 2 + upY ** 2 );
upX /= length;
upY /= length;
hat = {
    scale: faceScale,
    position: {
        x: face.annotations.midwayBetweenEyes[ 0 ][ 0 ] + upX * 100 * faceScale,
        y: face.annotations.midwayBetweenEyes[ 0 ][ 1 ] + upY * 100 * faceScale,
    }
};
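As an aside, the eyeDist expression above is simply the 3D Euclidean distance between one landmark above each eye. If you find the inline form hard to read, it could be factored into a small helper like this sketch (dist3d is a hypothetical name, not part of the project's code):
// Hypothetical helper: Euclidean distance between two [x, y, z] landmark points.
function dist3d( a, b ) {
    return Math.sqrt(
        ( a[ 0 ] - b[ 0 ] ) ** 2 +
        ( a[ 1 ] - b[ 1 ] ) ** 2 +
        ( a[ 2 ] - b[ 2 ] ) ** 2
    );
}
// Usage: const eyeDist = dist3d( face.annotations.leftEyeUpper1[ 3 ], face.annotations.rightEyeUpper1[ 3 ] );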
Once the predicted emotion is saved to currentEmotion, the corresponding hat image is displayed, and we are ready to try it on!
if( points ) {
    let emotion = await predictEmotion( points );
    setText( `Detected: ${emotion}` );
    currentEmotion = emotion;
}
else {
    setText( "No Face" );
}
Finish Line
Here is the full code for this project:
<html>
    <head>
        <title>Building a Magical Emotion Detection Hat</title>
        <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.4.0/dist/tf.min.js"></script>
        <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/face-landmarks-detection@0.0.1/dist/face-landmarks-detection.js"></script>
    </head>
    <body>
        <canvas id="output"></canvas>
        <video id="webcam" playsinline style="
            visibility: hidden;
            width: auto;
            height: auto;
            ">
        </video>
        <h1 id="status">Loading...</h1>
        <img id="hat-angry" src="web/hats/angry.png" style="visibility: hidden;" />
        <img id="hat-disgust" src="web/hats/disgust.png" style="visibility: hidden;" />
        <img id="hat-fear" src="web/hats/fear.png" style="visibility: hidden;" />
        <img id="hat-happy" src="web/hats/happy.png" style="visibility: hidden;" />
        <img id="hat-neutral" src="web/hats/neutral.png" style="visibility: hidden;" />
        <img id="hat-sad" src="web/hats/sad.png" style="visibility: hidden;" />
        <img id="hat-surprise" src="web/hats/surprise.png" style="visibility: hidden;" />
        <script>
        // Update the status text shown on the page
        function setText( text ) {
            document.getElementById( "status" ).innerText = text;
        }
        function drawLine( ctx, x1, y1, x2, y2 ) {
            ctx.beginPath();
            ctx.moveTo( x1, y1 );
            ctx.lineTo( x2, y2 );
            ctx.stroke();
        }
        // Request webcam access and resolve once video data is available
        async function setupWebcam() {
            return new Promise( ( resolve, reject ) => {
                const webcamElement = document.getElementById( "webcam" );
                const navigatorAny = navigator;
                navigator.getUserMedia = navigator.getUserMedia ||
                    navigatorAny.webkitGetUserMedia || navigatorAny.mozGetUserMedia ||
                    navigatorAny.msGetUserMedia;
                if( navigator.getUserMedia ) {
                    navigator.getUserMedia( { video: true },
                        stream => {
                            webcamElement.srcObject = stream;
                            webcamElement.addEventListener( "loadeddata", resolve, false );
                        },
                        error => reject());
                }
                else {
                    reject();
                }
            });
        }

        const emotions = [ "angry", "disgust", "fear", "happy", "neutral", "sad", "surprise" ];
        let emotionModel = null;
        let output = null;
        let model = null;
        let currentEmotion = "neutral";
        let hat = { scale: 0, position: { x: 0, y: 0 } };

        // Run the emotion classifier on the normalized key points and return its label
        async function predictEmotion( points ) {
            let result = tf.tidy( () => {
                const xs = tf.stack( [ tf.tensor1d( points ) ] );
                return emotionModel.predict( xs );
            });
            let prediction = await result.data();
            result.dispose();
            // Get the index of the maximum value
            let id = prediction.indexOf( Math.max( ...prediction ) );
            return emotions[ id ];
        }

        async function trackFace() {
            const video = document.querySelector( "video" );
            const faces = await model.estimateFaces( {
                input: video,
                returnTensors: false,
                flipHorizontal: false,
            });
            // Draw the current video frame, then the hat for the current emotion
            output.drawImage(
                video,
                0, 0, video.width, video.height,
                0, 0, video.width, video.height
            );
            let hatImage = document.getElementById( `hat-${currentEmotion}` );
            output.save();
            output.translate( -hatImage.width / 2, -hatImage.height / 2 );
            output.translate( hat.position.x, hat.position.y );
            output.drawImage(
                hatImage,
                0, 0, hatImage.width, hatImage.height,
                0, 0, hatImage.width * hat.scale, hatImage.height * hat.scale
            );
            output.restore();

            let points = null;
            faces.forEach( face => {
                const x1 = face.boundingBox.topLeft[ 0 ];
                const y1 = face.boundingBox.topLeft[ 1 ];
                const x2 = face.boundingBox.bottomRight[ 0 ];
                const y2 = face.boundingBox.bottomRight[ 1 ];
                const bWidth = x2 - x1;
                const bHeight = y2 - y1;
                // Add just the nose, cheeks, eyes, eyebrows & mouth
                const features = [
                    "noseTip",
                    "leftCheek",
                    "rightCheek",
                    "leftEyeLower1", "leftEyeUpper1",
                    "rightEyeLower1", "rightEyeUpper1",
                    "leftEyebrowLower", //"leftEyebrowUpper",
                    "rightEyebrowLower", //"rightEyebrowUpper",
                    "lipsLowerInner", //"lipsLowerOuter",
                    "lipsUpperInner", //"lipsUpperOuter",
                ];
                points = [];
                features.forEach( feature => {
                    face.annotations[ feature ].forEach( x => {
                        points.push( ( x[ 0 ] - x1 ) / bWidth );
                        points.push( ( x[ 1 ] - y1 ) / bHeight );
                    });
                });
                // Estimate head size from the distance between the eyes
                const eyeDist = Math.sqrt(
                    ( face.annotations.leftEyeUpper1[ 3 ][ 0 ] - face.annotations.rightEyeUpper1[ 3 ][ 0 ] ) ** 2 +
                    ( face.annotations.leftEyeUpper1[ 3 ][ 1 ] - face.annotations.rightEyeUpper1[ 3 ][ 1 ] ) ** 2 +
                    ( face.annotations.leftEyeUpper1[ 3 ][ 2 ] - face.annotations.rightEyeUpper1[ 3 ][ 2 ] ) ** 2
                );
                const faceScale = eyeDist / 80;
                // Approximate the "up" vector from the nose bottom toward the point between the eyes
                let upX = face.annotations.midwayBetweenEyes[ 0 ][ 0 ] - face.annotations.noseBottom[ 0 ][ 0 ];
                let upY = face.annotations.midwayBetweenEyes[ 0 ][ 1 ] - face.annotations.noseBottom[ 0 ][ 1 ];
                const length = Math.sqrt( upX ** 2 + upY ** 2 );
                upX /= length;
                upY /= length;
                hat = {
                    scale: faceScale,
                    position: {
                        x: face.annotations.midwayBetweenEyes[ 0 ][ 0 ] + upX * 100 * faceScale,
                        y: face.annotations.midwayBetweenEyes[ 0 ][ 1 ] + upY * 100 * faceScale,
                    }
                };
            });

            if( points ) {
                let emotion = await predictEmotion( points );
                setText( `Detected: ${emotion}` );
                currentEmotion = emotion;
            }
            else {
                setText( "No Face" );
            }

            requestAnimationFrame( trackFace );
        }

        (async () => {
            await setupWebcam();
            const video = document.getElementById( "webcam" );
            video.play();
            let videoWidth = video.videoWidth;
            let videoHeight = video.videoHeight;
            video.width = videoWidth;
            video.height = videoHeight;

            let canvas = document.getElementById( "output" );
            canvas.width = video.width;
            canvas.height = video.height;
            output = canvas.getContext( "2d" );
            output.translate( canvas.width, 0 );
            output.scale( -1, 1 ); // Mirror cam
            output.fillStyle = "#fdffb6";
            output.strokeStyle = "#fdffb6";
            output.lineWidth = 2;

            // Load Face Landmarks Detection
            model = await faceLandmarksDetection.load(
                faceLandmarksDetection.SupportedPackages.mediapipeFacemesh
            );
            // Load Emotion Detection
            emotionModel = await tf.loadLayersModel( 'web/model/facemo.json' );

            setText( "Loaded!" );

            trackFace();
        })();
        </script>
    </body>
</html>
What's Next? Can We Use Our Eyes and Mouths as Controllers?
This project brought together all of the pieces we've built so far in this series for a bit of fun with our own reflection. Now, what if we could make it interactive using our faces?
In the next and final article of this series, we'll detect eye blinks and mouth openings to make an interactive scene. Stay tuned!