Continuing my groundbreaking work in the field of emoji masks from last week, this week I wrote an improved mask based on an open-source library for facial expression classification called face-api.js (you can just tell when a developer named something).
Since the tutorial I took most of my code from is in plain ol’ JavaScript, I challenged myself to write the whole program without p5. Like they used to do with charcoal on the walls of caves.
👾 The code: script.js
The Teachable Machine emoji mask was trained on a limited set of ~500 images of me in the same room, so its functionality was super limited. It could barely recognize whether I was smiling or frowning, and different people in different environments would make it even less reliable. Lesson learned about the limits a dataset imposes on its model.
I still thought it was a fun idea, and figured there would probably be an API to a facial expression recognition model floating around. face-api.js is built on TensorFlow.js and is trained on various datasets even Ellen Nickles(!) couldn’t track down. This allows the program to, well… actually work, and with many more expressions!
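To give a sense of what “many more expressions” means: for every face it finds, face-api.js returns a score between 0 and 1 for each of seven expressions. This is roughly what a detection’s expressions object looks like when logged (the numbers here are made up for illustration):
// Shape of a detection's expressions property (values invented for illustration)
{
  neutral: 0.01,
  happy: 0.96,
  sad: 0.001,
  angry: 0.002,
  fearful: 0.001,
  disgusted: 0.001,
  surprised: 0.02
}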
The tutorial on how to create a basic webcam feed and connect it to the model made this all much easier. All that was left for me to do was replace the tracking box around the face with an emoji of the right dimensions, and adjust the code to the new version of the API (the original was written five years ago).
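For comparison, the canvas-and-box approach I replaced looks roughly like this (a sketch from memory of the tutorial’s pattern, not its exact code):
// Rough sketch of the canvas overlay + tracking box that the emoji replaces
const canvas = faceapi.createCanvasFromMedia(video);
document.body.append(canvas);
const displaySize = { width: video.width, height: video.height };
faceapi.matchDimensions(canvas, displaySize);

setInterval(async () => {
  const detections = await faceapi.detectAllFaces(video)
    .withFaceLandmarks()
    .withFaceExpressions();
  // Scale the detections to the displayed video size, then draw on the canvas
  const resized = faceapi.resizeResults(detections, displaySize);
  canvas.getContext('2d').clearRect(0, 0, canvas.width, canvas.height);
  faceapi.draw.drawDetections(canvas, resized);        // the tracking box
  faceapi.draw.drawFaceExpressions(canvas, resized);   // expression labels next to it
}, 100);
This is also where the displaySize variable in my script comes from; the tutorial uses it to size the canvas, which I no longer need.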
This was also a cool opportunity to practice writing a program in plain JavaScript, and the first time I tried using async functions and arrow functions. It was good practice for linking HTML elements to scripts, too.
// This saves the DOM element with the id "videoTag"
// in the JS variable video
const video = document.getElementById("videoTag");
let emojiValue = "😐";

// The four model loads start immediately and run in parallel;
// Promise.all waits until all of them have finished
Promise.all([
  faceapi.nets.ssdMobilenetv1.loadFromUri('./models'),
  faceapi.nets.faceLandmark68Net.loadFromUri('./models'),
  faceapi.nets.faceRecognitionNet.loadFromUri('./models'),
  faceapi.nets.faceExpressionNet.loadFromUri('./models'),
]).then(startVideo) // Ensures the video is started only after the models are loaded
  .catch(err => console.error('Error loading models:', err)); // Logs an error if any model fails to load
// The navigator object exposes info about the user's environment through the browser's Web APIs.
// mediaDevices is the property of navigator that deals with the user's camera and microphone.
// getUserMedia is the method that asks for permission and gives access to the camera (and audio, if requested).
function startVideo(){
  navigator.mediaDevices.getUserMedia({ video: true }) // This returns the promise
    .then(stream => video.srcObject = stream) // When webcam access is granted, assign the active media stream to the video element's srcObject
    .catch(err => console.error(err)); // Handles errors, e.g. the user denying camera access
}
// This adds an event listener to the video variable that holds
// the videoDiv DOM element. The "playing" event is built into JS.
// When the video is playing (once models are loaded) it calls
// the callback function which is the whole body of the function.
video.addEventListener('playing', ()=>{
// Create canvas variable and connect with canvas class
const emoji = document.querySelector("#emoji")
// Creates the canvas DOM element with a "id=canvas"
document.body.append(emoji)
// Display size for the canvas
const displaySize = {width: video.width, height: video.height}
// Runs an annonymous async arrow function every 100ms
setInterval(async () => {
// Once this detects all faces it stores their info in detections
const detections = await faceapi.detectAllFaces(video)
.withFaceLandmarks()
.withFaceExpressions()
setEmoji(detections[0])
const box = detections[0].detection.box;
emoji.textContent = emojiValue;
emoji.style.fontSize = (box.width + box.height) + "px";
emoji.style.left = box.x + "px";
emoji.style.top = box.y - 50 + "px";
//console.log(detections[0])
//console.log(emojiValue)
}, 100)
})
// Picks an emoji when the model is very confident (>= 0.90) about one expression
function setEmoji(face){
  if (face.expressions.angry >= 0.90){
    emojiValue = "😡"
  } else if (face.expressions.disgusted >= 0.90){
    emojiValue = "🤢"
  } else if (face.expressions.fearful >= 0.90){
    emojiValue = "😱"
  } else if (face.expressions.happy >= 0.90){
    emojiValue = "🙂"
  } else if (face.expressions.sad >= 0.90){
    emojiValue = "😭"
  } else if (face.expressions.surprised >= 0.90){
    emojiValue = "😧"
  } else {
    emojiValue = "😐"
  }
}
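As a possible tweak (not what my script does): instead of the hard 0.90 thresholds, you could pick whichever expression currently scores highest, so the mask doesn’t snap back to 😐 whenever no single expression clears the bar. A rough sketch, with a hypothetical helper name:
// Hypothetical alternative: choose the emoji of the highest-scoring expression
const EMOJI_MAP = {
  angry: "😡", disgusted: "🤢", fearful: "😱",
  happy: "🙂", sad: "😭", surprised: "😧", neutral: "😐"
};

function setEmojiByMax(face){
  // Object.entries turns { happy: 0.96, ... } into [["happy", 0.96], ...]
  const [topExpression] = Object.entries(face.expressions)
    .sort((a, b) => b[1] - a[1])[0]; // keep the name of the best-scoring entry
  emojiValue = EMOJI_MAP[topExpression] || "😐";
}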
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Super Emoji Mask</title>
  <!-- The defer ensures the scripts are executed AFTER the HTML is parsed -->
  <script defer src="face-api.min.js"></script>
  <script defer src="script.js"></script>
  <!-- This style tag removes the page's default margins so the video sits flush at the top left,
       and makes #emoji absolutely positioned so the script can place it over the detected face -->
  <style>
    body {
      margin: 0;
      padding: 0;
      display: flex;
      justify-content: left;
      align-items: center;
    }
    #emoji {
      position: absolute;
    }
  </style>
</head>
<body>
  <!-- This is where we render the webcam feed -->
  <video id="videoTag" width="720" height="560" autoplay muted></video>
  <!-- The emoji that script.js positions and resizes over the detected face -->
  <div id="emoji"></div>
</body>
</html>
// Basic syntax
parameter => expression;
// or, with multiple parameters
(param1, param2) => expression;
// Example from my code
// getUserMedia is a method that provides access to the camera and audio
navigator.mediaDevices.getUserMedia({ video: true }) // This returns the promise
  // When webcam access is granted, assign the active media stream to the video element's srcObject
  .then(stream => video.srcObject = stream)
  // Handles errors
  .catch(err => console.error(err));
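And since I also tried async functions for the first time: the same getUserMedia chain can be written with async/await instead of .then(), which is the style used inside setInterval above. A sketch of what startVideo would look like rewritten that way (hypothetical name, not the version in my script):
// startVideo rewritten as an async function (illustrative only)
async function startVideoAsync() {
  try {
    // await pauses here until the user grants (or denies) webcam access
    const stream = await navigator.mediaDevices.getUserMedia({ video: true });
    video.srcObject = stream;
  } catch (err) {
    console.error(err); // handles errors, e.g. permission denied
  }
}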
defer: Instructs the browser to defer the script’s execution until after the HTML is fully parsed. Deferred scripts still run in the order they appear, which is why face-api.min.js is listed before script.js.
<script defer src="script.js"></script>