Here we'll use deep learning on the tracked faces of the FER dataset and attempt to accurately predict a person's emotion from facial points in the browser with TensorFlow.js.
Introduction
Apps like Snapchat offer an amazing variety of face filters and lenses that let you overlay interesting things on your photos and videos. If you’ve ever given yourself virtual dog ears or a party hat, you know how much fun it can be!
Have you ever wondered how you'd create these kinds of filters from scratch? Well, now's your chance to learn, all within your web browser! In this series, we're going to see how to create Snapchat-style filters in the browser, train an AI model to understand facial expressions, and do even more using TensorFlow.js and face tracking.
You are welcome to download the demo of this project. You may need to enable WebGL in your web browser for performance. You can also download the code and files for this series.
We are assuming that you are familiar with JavaScript and HTML and have at least a basic understanding of neural networks. If you are new to TensorFlow.js, we recommend that you first check out this guide: Getting Started with Deep Learning in Your Browser Using TensorFlow.js.
If you would like to see more of what is possible in the web browser with TensorFlow.js, check out these AI series: Computer Vision with TensorFlow.js and Chatbots using TensorFlow.js.
In the previous article, we learned how to use AI models to detect the shape of faces. In this one, we’ll use the key facial landmarks to infer more information about the face from the images.
By connecting our face tracking code with the FER facial emotion dataset, we will train a second neural network model to predict a person's emotion from several tracked key facial points.
Setting Up with FER2013 Face Emotion Data
We’ll build on the face tracking code of the previous article to create two web pages. One page will be used to train the AI model with the tracked facial points on the FER dataset, and the other one will load and run the trained model on a test dataset.
Let’s modify the final code from the face tracking project to train and run a neural network model with face data. The FER2013 dataset consists of over 28K labeled face images; it is available on Kaggle. We downloaded this version, which had the dataset already converted to image files, and placed it in the web/fer2013 folder. We then updated the NodeJS server code in index.js to return a reference list of the images at http://localhost:8080/data/, so that you can get the full JSON object if you run the server locally.
To make it a bit easier, we saved this JSON object to the web/fer2013.js file for you to use directly, without the need to run the server locally. You can include it with the other script files at the top of the page:
<script src="web/fer2013.js"></script>
We are going to work with images rather than the webcam video (don't worry, we will bring video back in the next article!), so we need to replace the <video> element with an <img> element and rename its ID to "image". We can also remove the setupWebcam function because we do not need it for this project.
<img id="image" style="
visibility: hidden;
width: auto;
height: auto;
"/>
Next, let's add a utility function to set the image for the element, and another to shuffle a data array. Because the original images are only 48x48 pixels, let's define a larger output size of 500 pixels to get more granular face tracking and see the result on a bigger canvas, and let's update the line and polygon utility functions to scale to the output (the scaled versions are shown right after this snippet).
async function setImage( url ) {
    return new Promise( res => {
        let image = document.getElementById( "image" );
        image.src = url;
        image.onload = () => {
            res();
        };
    });
}

function shuffleArray( array ) {
    for( let i = array.length - 1; i > 0; i-- ) {
        const j = Math.floor( Math.random() * ( i + 1 ) );
        [ array[ i ], array[ j ] ] = [ array[ j ], array[ i ] ];
    }
}
const OUTPUT_SIZE = 500;
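For reference, here are the line and triangle drawing utilities with the scale parameter applied to every coordinate; these are the same versions that appear in the full code listing below:

function drawLine( ctx, x1, y1, x2, y2, scale = 1 ) {
    ctx.beginPath();
    ctx.moveTo( x1 * scale, y1 * scale );
    ctx.lineTo( x2 * scale, y2 * scale );
    ctx.stroke();
}

function drawTriangle( ctx, x1, y1, x2, y2, x3, y3, scale = 1 ) {
    ctx.beginPath();
    ctx.moveTo( x1 * scale, y1 * scale );
    ctx.lineTo( x2 * scale, y2 * scale );
    ctx.lineTo( x3 * scale, y3 * scale );
    ctx.lineTo( x1 * scale, y1 * scale );
    ctx.stroke();
}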
Some global variables we will need are the list of emotion categories, an aggregated array list of FER data, and an index for the array:
const emotions = [ "angry", "disgust", "fear", "happy", "neutral", "sad", "surprise" ];
let ferData = [];
let setIndex = 0;
Inside the async block, we can prepare and shuffle the FER data and resize the canvas to 500x500 pixels:
const minSamples = Math.min( ...Object.keys( fer2013 ).map( em => fer2013[ em ].length ) );
Object.keys( fer2013 ).forEach( em => {
    shuffleArray( fer2013[ em ] );
    for( let i = 0; i < minSamples; i++ ) {
        ferData.push({
            emotion: em,
            file: fer2013[ em ][ i ]
        });
    }
});
shuffleArray( ferData );

let canvas = document.getElementById( "output" );
canvas.width = OUTPUT_SIZE;
canvas.height = OUTPUT_SIZE;
There is one last update we need in the code template before training the AI model on one page and running the trained model on the other. We have to update the trackFace function to work with the image element instead of the video, and scale the bounding box and face mesh output to match the canvas size. We'll increment setIndex at the end of the function to move on to the next image.
async function trackFace() {
    await setImage( ferData[ setIndex ].file );
    const image = document.getElementById( "image" );
    const faces = await model.estimateFaces( {
        input: image,
        returnTensors: false,
        flipHorizontal: false,
    });
    // Draw the image scaled up to the output canvas size
    output.drawImage(
        image,
        0, 0, image.width, image.height,
        0, 0, OUTPUT_SIZE, OUTPUT_SIZE
    );
    const scale = OUTPUT_SIZE / image.width;

    faces.forEach( face => {
        // Draw the bounding box, scaled to the output size
        const x1 = face.boundingBox.topLeft[ 0 ];
        const y1 = face.boundingBox.topLeft[ 1 ];
        const x2 = face.boundingBox.bottomRight[ 0 ];
        const y2 = face.boundingBox.bottomRight[ 1 ];
        const bWidth = x2 - x1;
        const bHeight = y2 - y1;
        drawLine( output, x1, y1, x2, y1, scale );
        drawLine( output, x2, y1, x2, y2, scale );
        drawLine( output, x1, y2, x2, y2, scale );
        drawLine( output, x1, y1, x1, y2, scale );
        // Draw the scaled face mesh triangles
        const keypoints = face.scaledMesh;
        for( let i = 0; i < FaceTriangles.length / 3; i++ ) {
            let pointA = keypoints[ FaceTriangles[ i * 3 ] ];
            let pointB = keypoints[ FaceTriangles[ i * 3 + 1 ] ];
            let pointC = keypoints[ FaceTriangles[ i * 3 + 2 ] ];
            drawTriangle( output, pointA[ 0 ], pointA[ 1 ], pointB[ 0 ], pointB[ 1 ], pointC[ 0 ], pointC[ 1 ], scale );
        }
        // setText references `face`, so it needs to run inside this loop
        setText( `${setIndex + 1}. Face Tracking Confidence: ${face.faceInViewConfidence.toFixed( 3 )} - ${ferData[ setIndex ].emotion}` );
    });

    setIndex++;
    requestAnimationFrame( trackFace );
}
Now our modified template is ready. Create two copies of this code: one page for training the deep learning model and the other for testing it.
Part 1: Deep Learning Facial Emotion
In this first web page file, we are going to set up the training data, create the neural network model, and then train it and save the weights to a file. The pretrained model is included in the code (see the web/model folder), so you can skip this part and move ahead to Part 2 if you wish.
Add a global variable to store the training data and a utility function to convert an emotion label to a one-hot vector so we can use it for the training data:
let trainingData = [];
function emotionToArray( emotion ) {
    let array = [];
    for( let i = 0; i < emotions.length; i++ ) {
        array.push( emotion === emotions[ i ] ? 1 : 0 );
    }
    return array;
}
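For example, given the emotions array defined earlier, a hypothetical call produces a one-hot vector with a 1 in the position of the matching category:

emotionToArray( "happy" ); // returns [ 0, 0, 0, 1, 0, 0, 0 ]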
Inside the trackFace function, we'll take the various key facial features, scale them relative to the size of the bounding box, and add them to the training dataset if the face tracking confidence value is high enough. We've commented out some of the additional facial features to simplify the data, but you can add them back in if you would like to experiment. If you do so, remember to use the same set of features when running the model.
const features = [
    "noseTip",
    "leftCheek",
    "rightCheek",
    "leftEyeLower1", "leftEyeUpper1",
    "rightEyeLower1", "rightEyeUpper1",
    "leftEyebrowLower",
    "rightEyebrowLower",
    "lipsLowerInner",
    "lipsUpperInner",
];
let points = [];
features.forEach( feature => {
    face.annotations[ feature ].forEach( x => {
        points.push( ( x[ 0 ] - x1 ) / bWidth );
        points.push( ( x[ 1 ] - y1 ) / bHeight );
    });
});
if( face.faceInViewConfidence > 0.9 ) {
    trainingData.push({
        input: points,
        output: ferData[ setIndex ].emotion,
    });
}
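To make the normalization concrete, here is a tiny sketch with made-up numbers showing what a single keypoint contributes to the input vector:

// Hypothetical values: a mesh keypoint at ( 30, 25 ) inside a bounding box
// with top-left ( 10, 5 ), width 40, and height 50
const normX = ( 30 - 10 ) / 40; // 0.5
const normY = ( 25 - 5 ) / 50;  // 0.4
// these two values are pushed onto `points` for that keypoint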
Once we have compiled enough training data, we can pass it to the trainNet function. At the top of the trackFace function, let's finish and break out of the face tracking loop after 200 images and call the training function:
async function trackFace() {
    if( setIndex >= 200 ) {
        setText( "Finished!" );
        trainNet();
        return;
    }
    ...
}
Finally, the part we have been waiting for: let's create the trainNet function and train our AI model!
This function will split the training data into an input array of the key points and an output array of the emotion one-hot vectors, create a categorical TensorFlow model with multiple hidden layers, train for 1,000 epochs, and download the trained model. You can increase the number of epochs if you would like to train the model more.
async function trainNet() {
    let inputs = trainingData.map( x => x.input );
    let outputs = trainingData.map( x => emotionToArray( x.output ) );

    const model = tf.sequential();
    model.add( tf.layers.dense( { units: 100, activation: "relu", inputShape: [ inputs[ 0 ].length ] } ) );
    model.add( tf.layers.dense( { units: 100, activation: "relu" } ) );
    model.add( tf.layers.dense( { units: 100, activation: "relu" } ) );
    model.add( tf.layers.dense( {
        units: emotions.length,
        kernelInitializer: 'varianceScaling',
        useBias: false,
        activation: "softmax"
    } ) );
    model.compile({
        optimizer: "adam",
        loss: "categoricalCrossentropy",
        metrics: "acc"
    });

    const xs = tf.stack( inputs.map( x => tf.tensor1d( x ) ) );
    const ys = tf.stack( outputs.map( x => tf.tensor1d( x ) ) );
    await model.fit( xs, ys, {
        epochs: 1000,
        shuffle: true,
        callbacks: {
            onEpochEnd: ( epoch, logs ) => {
                setText( `Training... Epoch #${epoch} (${logs.acc.toFixed( 3 )})` );
                console.log( "Epoch #", epoch, logs );
            }
        }
    } );

    const saveResult = await model.save( "downloads://facemo" );
}
And that’s it! This web page will train an AI model to recognize facial expressions in the various categories and give you a model to load and run, which we will do next.
Part 1: Finish Line
Here is the full code for training the model with the FER dataset:
<html>
<head>
<title>Training - Recognizing Facial Expressions in the Browser with Deep Learning using TensorFlow.js</title>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.4.0/dist/tf.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/face-landmarks-detection@0.0.1/dist/face-landmarks-detection.js"></script>
<script src="web/triangles.js"></script>
<script src="web/fer2013.js"></script>
</head>
<body>
<canvas id="output"></canvas>
<img id="image" style="
visibility: hidden;
width: auto;
height: auto;
"/>
<h1 id="status">Loading...</h1>
<script>
function setText( text ) {
document.getElementById( "status" ).innerText = text;
}
async function setImage( url ) {
return new Promise( res => {
let image = document.getElementById( "image" );
image.src = url;
image.onload = () => {
res();
};
});
}
function shuffleArray( array ) {
for( let i = array.length - 1; i > 0; i-- ) {
const j = Math.floor( Math.random() * ( i + 1 ) );
[ array[ i ], array[ j ] ] = [ array[ j ], array[ i ] ];
}
}
function drawLine( ctx, x1, y1, x2, y2, scale = 1 ) {
ctx.beginPath();
ctx.moveTo( x1 * scale, y1 * scale );
ctx.lineTo( x2 * scale, y2 * scale );
ctx.stroke();
}
function drawTriangle( ctx, x1, y1, x2, y2, x3, y3, scale = 1 ) {
ctx.beginPath();
ctx.moveTo( x1 * scale, y1 * scale );
ctx.lineTo( x2 * scale, y2 * scale );
ctx.lineTo( x3 * scale, y3 * scale );
ctx.lineTo( x1 * scale, y1 * scale );
ctx.stroke();
}
const OUTPUT_SIZE = 500;
const emotions = [ "angry", "disgust", "fear", "happy", "neutral", "sad", "surprise" ];
let ferData = [];
let setIndex = 0;
let trainingData = [];
let output = null;
let model = null;
function emotionToArray( emotion ) {
let array = [];
for( let i = 0; i < emotions.length; i++ ) {
array.push( emotion === emotions[ i ] ? 1 : 0 );
}
return array;
}
async function trainNet() {
let inputs = trainingData.map( x => x.input );
let outputs = trainingData.map( x => emotionToArray( x.output ) );
const model = tf.sequential();
model.add(tf.layers.dense( { units: 100, activation: "relu", inputShape: [ inputs[ 0 ].length ] } ) );
model.add(tf.layers.dense( { units: 100, activation: "relu" } ) );
model.add(tf.layers.dense( { units: 100, activation: "relu" } ) );
model.add(tf.layers.dense( {
units: emotions.length,
kernelInitializer: 'varianceScaling',
useBias: false,
activation: "softmax"
} ) );
model.compile({
optimizer: "adam",
loss: "categoricalCrossentropy",
metrics: "acc"
});
const xs = tf.stack( inputs.map( x => tf.tensor1d( x ) ) );
const ys = tf.stack( outputs.map( x => tf.tensor1d( x ) ) );
await model.fit( xs, ys, {
epochs: 1000,
shuffle: true,
callbacks: {
onEpochEnd: ( epoch, logs ) => {
setText( `Training... Epoch #${epoch} (${logs.acc.toFixed( 3 )})` );
console.log( "Epoch #", epoch, logs );
}
}
} );
const saveResult = await model.save( "downloads://facemo" );
}
async function trackFace() {
if( setIndex >= 200 ) {
setText( "Finished!" );
trainNet();
return;
}
await setImage( ferData[ setIndex ].file );
const image = document.getElementById( "image" );
const faces = await model.estimateFaces( {
input: image,
returnTensors: false,
flipHorizontal: false,
});
output.drawImage(
image,
0, 0, image.width, image.height,
0, 0, OUTPUT_SIZE, OUTPUT_SIZE
);
const scale = OUTPUT_SIZE / image.width;
faces.forEach( face => {
const x1 = face.boundingBox.topLeft[ 0 ];
const y1 = face.boundingBox.topLeft[ 1 ];
const x2 = face.boundingBox.bottomRight[ 0 ];
const y2 = face.boundingBox.bottomRight[ 1 ];
const bWidth = x2 - x1;
const bHeight = y2 - y1;
drawLine( output, x1, y1, x2, y1, scale );
drawLine( output, x2, y1, x2, y2, scale );
drawLine( output, x1, y2, x2, y2, scale );
drawLine( output, x1, y1, x1, y2, scale );
const keypoints = face.scaledMesh;
for( let i = 0; i < FaceTriangles.length / 3; i++ ) {
let pointA = keypoints[ FaceTriangles[ i * 3 ] ];
let pointB = keypoints[ FaceTriangles[ i * 3 + 1 ] ];
let pointC = keypoints[ FaceTriangles[ i * 3 + 2 ] ];
drawTriangle( output, pointA[ 0 ], pointA[ 1 ], pointB[ 0 ], pointB[ 1 ], pointC[ 0 ], pointC[ 1 ], scale );
}
const features = [
"noseTip",
"leftCheek",
"rightCheek",
"leftEyeLower1", "leftEyeUpper1",
"rightEyeLower1", "rightEyeUpper1",
"leftEyebrowLower",
"rightEyebrowLower",
"lipsLowerInner",
"lipsUpperInner",
];
let points = [];
features.forEach( feature => {
face.annotations[ feature ].forEach( x => {
points.push( ( x[ 0 ] - x1 ) / bWidth );
points.push( ( x[ 1 ] - y1 ) / bHeight );
});
});
if( face.faceInViewConfidence > 0.9 ) {
trainingData.push({
input: points,
output: ferData[ setIndex ].emotion,
});
}
setText( `${setIndex + 1}. Face Tracking Confidence: ${face.faceInViewConfidence.toFixed( 3 )} - ${ferData[ setIndex ].emotion}` );
});
setIndex++;
requestAnimationFrame( trackFace );
}
(async () => {
const minSamples = Math.min( ...Object.keys( fer2013 ).map( em => fer2013[ em ].length ) );
Object.keys( fer2013 ).forEach( em => {
shuffleArray( fer2013[ em ] );
for( let i = 0; i < minSamples; i++ ) {
ferData.push({
emotion: em,
file: fer2013[ em ][ i ]
});
}
});
shuffleArray( ferData );
let canvas = document.getElementById( "output" );
canvas.width = OUTPUT_SIZE;
canvas.height = OUTPUT_SIZE;
output = canvas.getContext( "2d" );
output.translate( canvas.width, 0 );
output.scale( -1, 1 );
output.fillStyle = "#fdffb6";
output.strokeStyle = "#fdffb6";
output.lineWidth = 2;
model = await faceLandmarksDetection.load(
faceLandmarksDetection.SupportedPackages.mediapipeFacemesh
);
setText( "Loaded!" );
trackFace();
})();
</script>
</body>
</html>
Part 2: Running Facial Emotion Detection
We are almost there. Running the emotion detector model is simpler than training it. In this web page, we are going to load the trained TensorFlow model and test it on random faces from the FER dataset.
We can load the emotion detection model into a global variable, right below the code that loads the Face Landmarks Detection model. If you trained your own model in Part 1, you can update the path to match the location where you saved your model.
let emotionModel = null;

(async () => {
    ...
    model = await faceLandmarksDetection.load(
        faceLandmarksDetection.SupportedPackages.mediapipeFacemesh
    );
    emotionModel = await tf.loadLayersModel( 'web/model/facemo.json' );
    ...
})();
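If you trained your own model in Part 1, the downloads://facemo destination will have saved facemo.json and facemo.weights.bin through your browser. tf.loadLayersModel resolves the weights file relative to the URL of the JSON file, so keep both files in the same folder; a hypothetical custom location would be loaded like this:

// Hypothetical path: facemo.json and facemo.weights.bin must sit together in this folder
emotionModel = await tf.loadLayersModel( 'web/my-trained-model/facemo.json' );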
After that, we can write a function to run the model on an input of the key facial points and return the detected emotion:
async function predictEmotion( points ) {
    let result = tf.tidy( () => {
        const xs = tf.stack( [ tf.tensor1d( points ) ] );
        return emotionModel.predict( xs );
    });
    let prediction = await result.data();
    result.dispose();
    let id = prediction.indexOf( Math.max( ...prediction ) );
    return emotions[ id ];
}
Just so that we can wait a few seconds between test images, let's create a wait utility function:
function wait( ms ) {
    return new Promise( res => setTimeout( res, ms ) );
}
Now, to put it into action, we can take the key points of the tracked face, scale them relative to the bounding box to prepare the model input, run the emotion prediction, and show the expected vs. detected result, waiting two seconds between images.
async function trackFace() {
    ...
    let points = null;
    faces.forEach( face => {
        ...
        const features = [
            "noseTip",
            "leftCheek",
            "rightCheek",
            "leftEyeLower1", "leftEyeUpper1",
            "rightEyeLower1", "rightEyeUpper1",
            "leftEyebrowLower",
            "rightEyebrowLower",
            "lipsLowerInner",
            "lipsUpperInner",
        ];
        points = [];
        features.forEach( feature => {
            face.annotations[ feature ].forEach( x => {
                points.push( ( x[ 0 ] - x1 ) / bWidth );
                points.push( ( x[ 1 ] - y1 ) / bHeight );
            });
        });
    });

    if( points ) {
        let emotion = await predictEmotion( points );
        setText( `${setIndex + 1}. Expected: ${ferData[ setIndex ].emotion} vs. ${emotion}` );
    }
    else {
        setText( "No Face" );
    }

    setIndex++;
    await wait( 2000 );
    requestAnimationFrame( trackFace );
}
It's ready! Our code should start predicting the emotion of each FER image and displaying it alongside the expected one. Try it and see how it performs.
Part 2: Finish Line
Take a look at the full code to run the trained model on the FER dataset images:
<html>
<head>
<title>Running - Recognizing Facial Expressions in the Browser with Deep Learning using TensorFlow.js</title>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.4.0/dist/tf.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/face-landmarks-detection@0.0.1/dist/face-landmarks-detection.js"></script>
<script src="web/fer2013.js"></script>
</head>
<body>
<canvas id="output"></canvas>
<img id="image" style="
visibility: hidden;
width: auto;
height: auto;
"/>
<h1 id="status">Loading...</h1>
<script>
function setText( text ) {
document.getElementById( "status" ).innerText = text;
}
async function setImage( url ) {
return new Promise( res => {
let image = document.getElementById( "image" );
image.src = url;
image.onload = () => {
res();
};
});
}
function shuffleArray( array ) {
for( let i = array.length - 1; i > 0; i-- ) {
const j = Math.floor( Math.random() * ( i + 1 ) );
[ array[ i ], array[ j ] ] = [ array[ j ], array[ i ] ];
}
}
function drawLine( ctx, x1, y1, x2, y2, scale = 1 ) {
ctx.beginPath();
ctx.moveTo( x1 * scale, y1 * scale );
ctx.lineTo( x2 * scale, y2 * scale );
ctx.stroke();
}
function drawTriangle( ctx, x1, y1, x2, y2, x3, y3, scale = 1 ) {
ctx.beginPath();
ctx.moveTo( x1 * scale, y1 * scale );
ctx.lineTo( x2 * scale, y2 * scale );
ctx.lineTo( x3 * scale, y3 * scale );
ctx.lineTo( x1 * scale, y1 * scale );
ctx.stroke();
}
function wait( ms ) {
return new Promise( res => setTimeout( res, ms ) );
}
const OUTPUT_SIZE = 500;
const emotions = [ "angry", "disgust", "fear", "happy", "neutral", "sad", "surprise" ];
let ferData = [];
let setIndex = 0;
let emotionModel = null;
let output = null;
let model = null;
async function predictEmotion( points ) {
let result = tf.tidy( () => {
const xs = tf.stack( [ tf.tensor1d( points ) ] );
return emotionModel.predict( xs );
});
let prediction = await result.data();
result.dispose();
let id = prediction.indexOf( Math.max( ...prediction ) );
return emotions[ id ];
}
async function trackFace() {
await setImage( ferData[ setIndex ].file );
const image = document.getElementById( "image" );
const faces = await model.estimateFaces( {
input: image,
returnTensors: false,
flipHorizontal: false,
});
output.drawImage(
image,
0, 0, image.width, image.height,
0, 0, OUTPUT_SIZE, OUTPUT_SIZE
);
const scale = OUTPUT_SIZE / image.width;
let points = null;
faces.forEach( face => {
const x1 = face.boundingBox.topLeft[ 0 ];
const y1 = face.boundingBox.topLeft[ 1 ];
const x2 = face.boundingBox.bottomRight[ 0 ];
const y2 = face.boundingBox.bottomRight[ 1 ];
const bWidth = x2 - x1;
const bHeight = y2 - y1;
drawLine( output, x1, y1, x2, y1, scale );
drawLine( output, x2, y1, x2, y2, scale );
drawLine( output, x1, y2, x2, y2, scale );
drawLine( output, x1, y1, x1, y2, scale );
const features = [
"noseTip",
"leftCheek",
"rightCheek",
"leftEyeLower1", "leftEyeUpper1",
"rightEyeLower1", "rightEyeUpper1",
"leftEyebrowLower",
"rightEyebrowLower",
"lipsLowerInner",
"lipsUpperInner",
];
points = [];
features.forEach( feature => {
face.annotations[ feature ].forEach( x => {
points.push( ( x[ 0 ] - x1 ) / bWidth );
points.push( ( x[ 1 ] - y1 ) / bHeight );
});
});
});
if( points ) {
let emotion = await predictEmotion( points );
setText( `${setIndex + 1}. Expected: ${ferData[ setIndex ].emotion} vs. ${emotion}` );
}
else {
setText( "No Face" );
}
setIndex++;
await wait( 2000 );
requestAnimationFrame( trackFace );
}
(async () => {
const minSamples = Math.min( ...Object.keys( fer2013 ).map( em => fer2013[ em ].length ) );
Object.keys( fer2013 ).forEach( em => {
shuffleArray( fer2013[ em ] );
for( let i = 0; i < minSamples; i++ ) {
ferData.push({
emotion: em,
file: fer2013[ em ][ i ]
});
}
});
shuffleArray( ferData );
let canvas = document.getElementById( "output" );
canvas.width = OUTPUT_SIZE;
canvas.height = OUTPUT_SIZE;
output = canvas.getContext( "2d" );
output.translate( canvas.width, 0 );
output.scale( -1, 1 );
output.fillStyle = "#fdffb6";
output.strokeStyle = "#fdffb6";
output.lineWidth = 2;
model = await faceLandmarksDetection.load(
faceLandmarksDetection.SupportedPackages.mediapipeFacemesh
);
emotionModel = await tf.loadLayersModel( 'web/model/facemo.json' );
setText( "Loaded!" );
trackFace();
})();
</script>
</body>
</html>
What’s Next? Can This Detect Our Own Facial Emotions?
In this article, we combined the output of the TensorFlow Face Landmarks Detection model with an independent dataset to generate a new model, which can predict more information from an image than before. The real test would be to have this new model predict emotions on any face.
Let’s go to the next article of this series, in which we’ll use the live webcam video of our face and see if the model can react to our facial expressions in real time.
Raphael Mun is a tech entrepreneur and educator who has been developing software professionally for over 20 years. He currently runs Lemmino, Inc and teaches and entertains through his Instafluff livestreams on Twitch building open source projects with his community.