How to Develop a Face Recognition System Using FaceNet in Keras


Face recognition is a computer vision task of identifying and verifying a person based on a photograph of their face.

FaceNet is a face recognition system developed in 2015 by researchers at Google that achieved then-state-of-the-art results on a range of face recognition benchmark datasets. The FaceNet system can be used broadly thanks to multiple third-party open source implementations of the model and the availability of pre-trained models.

The FaceNet system can be used to extract high-quality features from faces, called face embeddings, that can then be used to train a face identification system.

In this tutorial, you will discover how to develop a face recognition system using FaceNet and an SVM classifier to identify people from photographs.

After completing this tutorial, you will know:

  • About the FaceNet face recognition system developed by Google and open source implementations and pre-trained models.
  • How to prepare a face recognition dataset, including first extracting faces via a face detection system and then extracting face features via face embeddings.
  • How to fit, evaluate, and demonstrate an SVM model to predict identities from face embeddings.

Discover how to build models for photo classification, object detection, face recognition, and more in my new computer vision book, with 30 step-by-step tutorials and full source code.

Let’s get started.

  • Note: This tutorial requires TensorFlow version 1.14 or higher. It currently does not work with TensorFlow 2 because some third-party libraries have not been updated at the time of writing.
How to Develop a Face Recognition System Using FaceNet in Keras and an SVM Classifier
Photo by Peter Valverde, some rights reserved.

Tutorial Overview

This tutorial is divided into five parts; they are:

  1. Face Recognition
  2. FaceNet Model
  3. How to Load a FaceNet Model in Keras
  4. How to Detect Faces for Face Recognition
  5. How to Develop a Face Classification System

Face Recognition

Face recognition is the general task of identifying and verifying people from photographs of their face.

The 2011 book on face recognition titled “Handbook of Face Recognition” describes two main modes for face recognition:

  • Face Verification. A one-to-one mapping of a given face against a known identity (e.g. is this the person?).
  • Face Identification. A one-to-many mapping for a given face against a database of known faces (e.g. who is this person?).

A face recognition system is expected to identify faces present in images and videos automatically. It can operate in either or both of two modes: (1) face verification (or authentication), and (2) face identification (or recognition).

— Page 1, Handbook of Face Recognition, 2011.

We will focus on the face identification task in this tutorial.


FaceNet Model

FaceNet is a face recognition system that was described by Florian Schroff, et al. at Google in their 2015 paper titled “FaceNet: A Unified Embedding for Face Recognition and Clustering.”

It is a system that, given a picture of a face, will extract high-quality features from the face and predict a 128-element vector representation of these features, called a face embedding.

In this paper we present a system, called FaceNet, that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity.

— FaceNet: A Unified Embedding for Face Recognition and Clustering, 2015.

The model is a deep convolutional neural network trained via a triplet loss function that encourages vectors for the same identity to become more similar (smaller distance), whereas vectors for different identities are expected to become less similar (larger distance). The focus on training a model to create embeddings directly (rather than extracting them from an intermediate layer of a model) was an important innovation in this work.

Our method uses a deep convolutional network trained to directly optimize the embedding itself, rather than an intermediate bottleneck layer as in previous deep learning approaches.

— FaceNet: A Unified Embedding for Face Recognition and Clustering, 2015.

These face embeddings were then used as the basis for training classifier systems on standard face recognition benchmark datasets, achieving then-state-of-the-art results.

Our system cuts the error rate in comparison to the best published result by 30% …

— FaceNet: A Unified Embedding for Face Recognition and Clustering, 2015.

The paper also explores other uses of the embeddings, such as clustering to group like-faces based on their extracted features.

It is a robust and effective face recognition system, and the general nature of the extracted face embeddings lends the approach to a range of applications.

How to Load a FaceNet Model in Keras

There are a number of projects that provide tools to train FaceNet-based models and make use of pre-trained models.

Perhaps the most prominent is OpenFace, which provides FaceNet models built and trained using the Torch deep learning framework. There is a port of OpenFace to Keras, called Keras OpenFace, but at the time of writing, the models appear to require Python 2, which is quite limiting.

Another prominent project is FaceNet by David Sandberg, which provides FaceNet models built and trained using TensorFlow. The project looks mature, although at the time of writing it does not provide a library-based installation or a clean API. Usefully, David’s project provides a number of high-performing pre-trained FaceNet models, and there are a number of projects that port or convert these models for use in Keras.

A notable example is Keras FaceNet by Hiroki Taniai. His project provides a script for converting the Inception ResNet v1 model from TensorFlow to Keras. He also provides a pre-trained Keras model ready for use.

We will use the pre-trained Keras FaceNet model provided by Hiroki Taniai in this tutorial. It was trained on the MS-Celeb-1M dataset and expects input images to be color, to have their pixel values whitened (standardized across all three channels), and to have a square shape of 160×160 pixels.

The model can be downloaded from here:

Download the model file and place it in your current working directory with the filename ‘facenet_keras.h5‘.

We can load the model directly in Keras using the load_model() function; for example:
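Something like the following minimal sketch, assuming the model file sits in the current working directory:

```python
# example of loading the keras facenet model
from keras.models import load_model
# load the model
model = load_model('facenet_keras.h5')
# summarize the input and output tensors
print(model.inputs)
print(model.outputs)
```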

Running the example loads the model and prints the shape of the input and output tensors.

We can see that the model indeed expects square color images as input with the shape 160×160, and will output a face embedding as a 128-element vector.

Now that we have a FaceNet model, we can explore using it.

How to Detect Faces for Face Recognition

Before we can perform face recognition, we need to detect faces.

Face detection is the process of automatically locating faces in a photograph and localizing them by drawing a bounding box around their extent.

In this tutorial, we will also use the Multi-Task Cascaded Convolutional Neural Network, or MTCNN, for face detection, e.g. finding and extracting faces from photos. This is a state-of-the-art deep learning model for face detection, described in the 2016 paper titled “Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks.”

We will use the implementation provided by Iván de Paz Centeno in the ipazc/mtcnn project. This can also be installed via pip as follows:
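For example (depending on your environment, you may need pip3 or sudo):

```
pip install mtcnn
```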

We can confirm that the library was installed correctly by importing the library and printing the version; for example:
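A minimal sketch of the check:

```python
# confirm mtcnn was installed correctly
import mtcnn
# print the version of the installed library
print(mtcnn.__version__)
```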

Running the example prints the current version of the library.

We can use the mtcnn library to create a face detector and extract faces for use with the FaceNet face recognition model in subsequent sections.

The first step is to load an image as a NumPy array, which we can achieve using the PIL library and the open() function. We will also convert the image to RGB, just in case the image has an alpha channel or is black and white.
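A sketch of this step; ‘photo.jpg’ is a placeholder for the path to your photograph:

```python
from PIL import Image
from numpy import asarray

# 'photo.jpg' is a placeholder filename for your image
filename = 'photo.jpg'
# load the image from file
image = Image.open(filename)
# convert to RGB, in case of an alpha channel or a grayscale image
image = image.convert('RGB')
# convert the image to a numpy array of pixels
pixels = asarray(image)
```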

Next, we can create an MTCNN face detector class and use it to detect all faces in the loaded photograph.
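Continuing the sketch from the previous step:

```python
from mtcnn.mtcnn import MTCNN

# create the detector, using default weights
detector = MTCNN()
# detect faces; the result is a list of dicts, one per detected face
results = detector.detect_faces(pixels)
```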

The result is a list of bounding boxes, where each bounding box defines the top-left corner of the box, as well as the width and height.

If we assume there is only one face in the photo for our experiments, we can determine the pixel coordinates of the bounding box as follows. Sometimes the library will return a negative pixel index, and I think this is a bug. We can fix this by taking the absolute value of the coordinates.
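For example:

```python
# extract the bounding box from the first detected face
x1, y1, width, height = results[0]['box']
# the library sometimes returns a negative pixel index; take the absolute value
x1, y1 = abs(x1), abs(y1)
x2, y2 = x1 + width, y1 + height
```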

We can use these coordinates to extract the face.

We can then use the PIL library to resize this small image of the face to the required size; specifically, the model expects square input faces with the shape 160×160.
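A sketch of the extraction and resizing steps:

```python
# extract the face pixels using the bounding box coordinates
face = pixels[y1:y2, x1:x2]
# resize the face to the 160x160 pixels expected by the model
image = Image.fromarray(face)
image = image.resize((160, 160))
face_array = asarray(image)
```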

Tying all of this together, the function extract_face() will load a photograph from the given filename and return the extracted face. It assumes that the photo contains one face and will return the first face detected.
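A sketch of the function, pulling the pieces above together:

```python
from PIL import Image
from numpy import asarray
from mtcnn.mtcnn import MTCNN

# extract a single face from a given photograph
def extract_face(filename, required_size=(160, 160)):
    # load the image from file and convert to RGB
    image = Image.open(filename)
    image = image.convert('RGB')
    pixels = asarray(image)
    # create the detector, using default weights, and detect faces
    detector = MTCNN()
    results = detector.detect_faces(pixels)
    # extract the bounding box from the first face
    x1, y1, width, height = results[0]['box']
    # deal with a possible negative pixel index
    x1, y1 = abs(x1), abs(y1)
    x2, y2 = x1 + width, y1 + height
    # extract the face pixels
    face = pixels[y1:y2, x1:x2]
    # resize the face pixels to the size the model expects
    image = Image.fromarray(face)
    image = image.resize(required_size)
    return asarray(image)
```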

We can use this function to extract faces as needed in the next section that can be provided as input to the FaceNet model.

How to Develop a Face Classification System

In this section, we will develop a face classification system to predict the identity of a given face.

The model will be trained and tested using the ‘5 Celebrity Faces Dataset‘ that contains many photographs of five different celebrities.

We will use an MTCNN model for face detection, the FaceNet model will be used to create a face embedding for each detected face, then we will develop a Linear Support Vector Machine (SVM) classifier model to predict the identity of a given face.

5 Celebrity Faces Dataset

The 5 Celebrity Faces Dataset is a small dataset that contains photographs of celebrities.

It includes photos of: Ben Affleck, Elton John, Jerry Seinfeld, Madonna, and Mindy Kaling.

The dataset was prepared and made available by Dan Becker and provided for free download on Kaggle. Note, a Kaggle account is required to download the dataset.

Download the dataset (this may require a Kaggle login), data.zip (2.5 megabytes), and unzip it in your local directory with the folder name ‘5-celebrity-faces-dataset‘.

You should now have a directory with the following structure (note, there are spelling mistakes in some directory names, and they were left as-is in this example):
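Roughly, the layout should look as follows (the exact celebrity subdirectory names are assumed here based on the names used elsewhere in this tutorial):

```
5-celebrity-faces-dataset/
├── train/
│   ├── ben_afflek/
│   ├── elton_john/
│   ├── jerry_seinfeld/
│   ├── madonna/
│   └── mindy_kaling/
└── val/
    ├── ben_afflek/
    ├── elton_john/
    ├── jerry_seinfeld/
    ├── madonna/
    └── mindy_kaling/
```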

We can see that there is a training dataset and a validation or test dataset.

Looking at some of the photos in the directories, we can see that the photos show faces with a range of orientations, lighting conditions, and sizes. Importantly, each photo contains one face of the person.

We will use this dataset as the basis for our classifier, trained on the ‘train‘ dataset only, and then used to classify faces in the ‘val‘ dataset. You can use this same structure to develop a classifier with your own photographs.

Detect Faces

The first step is to detect the face in each photograph and reduce the dataset to a series of faces only.

Let’s test out our face detector function defined in the previous section, specifically extract_face().

Looking in the ‘5-celebrity-faces-dataset/train/ben_afflek/‘ directory, we can see that there are 14 photographs of Ben Affleck in the training dataset. We can detect the face in each photograph, and create a plot with 14 faces, with two rows of seven images each.

The complete example is listed below.
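A sketch of the complete example; it repeats the extract_face() function from the previous section so that it can run on its own:

```python
# demonstrate face detection on the ben affleck photos in the train dataset
from os import listdir
from PIL import Image
from numpy import asarray
from matplotlib import pyplot
from mtcnn.mtcnn import MTCNN

# extract a single face from a given photograph
def extract_face(filename, required_size=(160, 160)):
    # load the image from file and convert to RGB
    image = Image.open(filename)
    image = image.convert('RGB')
    pixels = asarray(image)
    # detect faces in the image
    detector = MTCNN()
    results = detector.detect_faces(pixels)
    # extract the bounding box from the first face, fixing any negative index
    x1, y1, width, height = results[0]['box']
    x1, y1 = abs(x1), abs(y1)
    x2, y2 = x1 + width, y1 + height
    # extract and resize the face pixels
    face = pixels[y1:y2, x1:x2]
    image = Image.fromarray(face)
    image = image.resize(required_size)
    return asarray(image)

# the folder containing the ben affleck training photos
folder = '5-celebrity-faces-dataset/train/ben_afflek/'
i = 1
# enumerate the files in the folder
for filename in listdir(folder):
    # extract the face from the photo
    path = folder + filename
    face = extract_face(path)
    print(i, face.shape)
    # plot the face in a grid of two rows of seven images
    pyplot.subplot(2, 7, i)
    pyplot.axis('off')
    pyplot.imshow(face)
    i += 1
pyplot.show()
```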

Running the example takes a moment and reports the progress of each loaded photograph along the way and the shape of the NumPy array containing the face pixel data.

A figure is created containing the faces detected in the Ben Affleck directory.

We can see that each face was correctly detected and that we have a range of lighting, skin tones, and orientations in the detected faces.

Plot of 14 Faces of Ben Affleck Detected From the Training Dataset of the 5 Celebrity Faces Dataset

So far, so good.

Next, we can extend this example to step over each subdirectory for a given dataset (e.g. ‘train‘ or ‘val‘), extract the faces, and prepare a dataset with the name as the output label for each detected face.

The load_faces() function below will load all of the faces into a list for a given directory, e.g. ‘5-celebrity-faces-dataset/train/ben_afflek/‘.
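A sketch, reusing the extract_face() function defined earlier:

```python
from os import listdir

# load images and extract faces for all images in a directory
def load_faces(directory):
    faces = list()
    # enumerate the image files in the directory
    for filename in listdir(directory):
        # build the path and extract the face from the photo
        path = directory + filename
        face = extract_face(path)
        # store the face
        faces.append(face)
    return faces
```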

We can call the load_faces() function for each subdirectory in the ‘train‘ or ‘val‘ folders. Each face has one label, the name of the celebrity, which we can take from the directory name.

The load_dataset() function below takes a directory name such as ‘5-celebrity-faces-dataset/train/‘ and detects faces for each subdirectory (celebrity), assigning labels to each detected face.

It returns the X and y elements of the dataset as NumPy arrays.
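A sketch of the function, building on load_faces() above:

```python
from os import listdir
from os.path import isdir
from numpy import asarray

# load a dataset that contains one subdirectory per class, each containing images
def load_dataset(directory):
    X, y = list(), list()
    # enumerate the folders, one per class
    for subdir in listdir(directory):
        path = directory + subdir + '/'
        # skip any files that might be in the directory
        if not isdir(path):
            continue
        # load all faces in the subdirectory
        faces = load_faces(path)
        # create a label for each face from the directory name
        labels = [subdir for _ in range(len(faces))]
        # summarize progress
        print('>loaded %d examples for class: %s' % (len(faces), subdir))
        # store the faces and labels
        X.extend(faces)
        y.extend(labels)
    return asarray(X), asarray(y)
```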

We can then call this function for the ‘train’ and ‘val’ folders to load all of the data, then save the results in a single compressed NumPy array file via the savez_compressed() function.
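For example:

```python
from numpy import savez_compressed

# load the train dataset
trainX, trainy = load_dataset('5-celebrity-faces-dataset/train/')
print(trainX.shape, trainy.shape)
# load the test dataset
testX, testy = load_dataset('5-celebrity-faces-dataset/val/')
# save the arrays to one file in compressed format
savez_compressed('5-celebrity-faces-dataset.npz', trainX, trainy, testX, testy)
```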

Tying all of this together, the extract_face(), load_faces(), and load_dataset() functions, combined with the loading and saving code above, give the complete example of detecting all of the faces in the 5 Celebrity Faces Dataset.

Running the example may take a moment.

First, all of the photos in the ‘train‘ dataset are loaded, then faces are extracted, resulting in 93 samples with square face input and a class label string as output. Then the ‘val‘ dataset is loaded, providing 25 samples that can be used as a test dataset.

Both datasets are then saved to a compressed NumPy array file called ‘5-celebrity-faces-dataset.npz‘ that is about three megabytes and is stored in the current working directory.

This dataset is now ready to be provided to the FaceNet model to create face embeddings.

Create Face Embeddings

The next step is to create a face embedding.

A face embedding is a vector that represents the features extracted from the face. This can then be compared with the vectors generated for other faces. For example, another vector that is close (by some measure) may be the same person, whereas another vector that is far (by some measure) may be a different person.

The classifier model that we want to develop will take a face embedding as input and predict the identity of the face. The FaceNet model will generate this embedding for a given image of a face.

The FaceNet model can be used as part of the classifier itself, or we can use the FaceNet model to pre-process a face to create a face embedding that can be stored and used as input to our classifier model. This latter approach is preferred as the FaceNet model is both large and slow to create a face embedding.

We can, therefore, pre-compute the face embeddings for all faces in the train and test (the ‘val‘ folder) sets in our 5 Celebrity Faces Dataset.

First, we can load our detected faces dataset using the load() NumPy function.
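For example:

```python
from numpy import load

# load the dataset of extracted faces
data = load('5-celebrity-faces-dataset.npz')
trainX, trainy, testX, testy = data['arr_0'], data['arr_1'], data['arr_2'], data['arr_3']
print('Loaded: ', trainX.shape, trainy.shape, testX.shape, testy.shape)
```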

Next, we can load our FaceNet model ready for converting faces into face embeddings.

We can then enumerate each face in the train and test datasets to predict an embedding.

To predict an embedding, first the pixel values of the image need to be suitably prepared to meet the expectations of the FaceNet model. This specific implementation of the FaceNet model expects that the pixel values are standardized.

In order to make a prediction for one example in Keras, we must expand the dimensions so that the face array is one sample.

We can then use the model to make a prediction and extract the embedding vector.

The get_embedding() function defined below implements these behaviors and will return a face embedding given a single image of a face and the loaded FaceNet model.

Tying all of this together, the complete example of converting each face into a face embedding in the train and test datasets is listed below.
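A sketch of the complete script; the get_embedding() function implements the standardize, expand, and predict steps described above:

```python
# calculate a face embedding for each face in the dataset using facenet
from numpy import load
from numpy import expand_dims
from numpy import asarray
from numpy import savez_compressed
from keras.models import load_model

# get the face embedding for one face
def get_embedding(model, face_pixels):
    # scale pixel values
    face_pixels = face_pixels.astype('float32')
    # standardize pixel values across channels (global)
    mean, std = face_pixels.mean(), face_pixels.std()
    face_pixels = (face_pixels - mean) / std
    # transform the face into one sample
    samples = expand_dims(face_pixels, axis=0)
    # make a prediction to get the embedding
    yhat = model.predict(samples)
    return yhat[0]

# load the face dataset
data = load('5-celebrity-faces-dataset.npz')
trainX, trainy, testX, testy = data['arr_0'], data['arr_1'], data['arr_2'], data['arr_3']
print('Loaded: ', trainX.shape, trainy.shape, testX.shape, testy.shape)
# load the facenet model
model = load_model('facenet_keras.h5')
print('Loaded Model')
# convert each face in the train set to an embedding
newTrainX = list()
for face_pixels in trainX:
    embedding = get_embedding(model, face_pixels)
    newTrainX.append(embedding)
newTrainX = asarray(newTrainX)
print(newTrainX.shape)
# convert each face in the test set to an embedding
newTestX = list()
for face_pixels in testX:
    embedding = get_embedding(model, face_pixels)
    newTestX.append(embedding)
newTestX = asarray(newTestX)
print(newTestX.shape)
# save the arrays to one file in compressed format
savez_compressed('5-celebrity-faces-embeddings.npz', newTrainX, trainy, newTestX, testy)
```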

Running the example reports progress along the way.

We can see that the face dataset was loaded correctly, and so was the model. The train dataset was then transformed into 93 face embeddings, each a 128-element vector. The 25 examples in the test dataset were also suitably converted to face embeddings.

The resulting datasets were then saved to a compressed NumPy array that is about 50 kilobytes with the name ‘5-celebrity-faces-embeddings.npz‘ in the current working directory.

We are now ready to develop our face classifier system.

Perform Face Classification

In this section, we will develop a model to classify face embeddings as one of the known celebrities in the 5 Celebrity Faces Dataset.

First, we must load the face embeddings dataset.
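For example:

```python
from numpy import load

# load the dataset of face embeddings
data = load('5-celebrity-faces-embeddings.npz')
trainX, trainy, testX, testy = data['arr_0'], data['arr_1'], data['arr_2'], data['arr_3']
print('Dataset: train=%d, test=%d' % (trainX.shape[0], testX.shape[0]))
```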

Next, the data requires some minor preparation prior to modeling.

First, it is good practice to normalize the face embedding vectors, because the vectors are often compared to one another using a distance metric.

In this context, vector normalization means scaling the values until the length or magnitude of the vectors is 1 or unit length. This can be achieved using the Normalizer class in scikit-learn. It might even be more convenient to perform this step when the face embeddings are created in the previous step.
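A sketch using the Normalizer class (the snippets in this section build on one another):

```python
from sklearn.preprocessing import Normalizer

# scale each face embedding vector to unit length (L2 norm)
in_encoder = Normalizer(norm='l2')
trainX = in_encoder.transform(trainX)
testX = in_encoder.transform(testX)
```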

Next, the string target variables for each celebrity name need to be converted to integers.

This can be achieved via the LabelEncoder class in scikit-learn.
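For example:

```python
from sklearn.preprocessing import LabelEncoder

# encode the celebrity name strings as integers
out_encoder = LabelEncoder()
out_encoder.fit(trainy)
trainy = out_encoder.transform(trainy)
testy = out_encoder.transform(testy)
```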

Next, we can fit a model.

It is common to use a Linear Support Vector Machine (SVM) when working with normalized face embedding inputs. This is because the method is very effective at separating the face embedding vectors. We can fit a linear SVM to the training data using the SVC class in scikit-learn and setting the ‘kernel‘ attribute to ‘linear‘. We may also want probabilities later when making predictions, which can be configured by setting ‘probability‘ to ‘True‘.
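For example:

```python
from sklearn.svm import SVC

# fit a linear svm on the normalized face embeddings
model = SVC(kernel='linear', probability=True)
model.fit(trainX, trainy)
```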

Next, we can evaluate the model.

This can be achieved by using the fit model to make a prediction for each example in the train and test datasets and then calculating the classification accuracy.
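A sketch of the evaluation:

```python
from sklearn.metrics import accuracy_score

# predict on the train and test sets
yhat_train = model.predict(trainX)
yhat_test = model.predict(testX)
# score the predictions using classification accuracy
score_train = accuracy_score(trainy, yhat_train)
score_test = accuracy_score(testy, yhat_test)
print('Accuracy: train=%.3f, test=%.3f' % (score_train * 100, score_test * 100))
```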

Tying all of this together, the snippets above make up the complete example of fitting a Linear SVM on the face embeddings for the 5 Celebrity Faces Dataset.

Running the example first confirms that the number of samples in the train and test datasets is as we expect.

Next, the model is evaluated on the train and test dataset, showing perfect classification accuracy. This is not surprising given the size of the dataset and the power of the face detection and face recognition models used.

We can make it more interesting by plotting the original face and the prediction.

First, we need to load the face dataset, specifically the faces in the test dataset. We could also load the original photos to make it even more interesting.
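A sketch; ‘testX_faces’ is an illustrative variable name for the raw test face pixels:

```python
from numpy import load

# load the raw face pixels for the test set so we can plot them later
data = load('5-celebrity-faces-dataset.npz')
testX_faces = data['arr_2']
```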

The rest of the example is the same up until we fit the model.

First, we need to select a random example from the test set, then get the embedding, face pixels, expected class prediction, and the corresponding name for the class.
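For example, continuing from the snippets above:

```python
from random import choice

# select a random example from the test set
selection = choice([i for i in range(testX.shape[0])])
random_face_pixels = testX_faces[selection]
random_face_emb = testX[selection]
random_face_class = testy[selection]
# map the expected integer class back to the celebrity name
random_face_name = out_encoder.inverse_transform([random_face_class])
```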

Next, we can use the face embedding as an input to make a single prediction with the fit model.

We can predict both the class integer and the probability of the prediction.
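For example:

```python
from numpy import expand_dims

# transform the face embedding into one sample, then predict class and probability
samples = expand_dims(random_face_emb, axis=0)
yhat_class = model.predict(samples)
yhat_prob = model.predict_proba(samples)
```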

We can then get the name for the predicted class integer, and the probability for this prediction.

We can then print this information.
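A sketch of these two steps:

```python
# get the name for the predicted class integer and the probability of the prediction
class_index = yhat_class[0]
class_probability = yhat_prob[0, class_index] * 100
predict_names = out_encoder.inverse_transform(yhat_class)
# print the predicted identity against the expected identity
print('Predicted: %s (%.3f)' % (predict_names[0], class_probability))
print('Expected: %s' % random_face_name[0])
```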

We can also plot the face pixels along with the predicted name and probability.
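For example:

```python
from matplotlib import pyplot

# plot the face with the predicted name and probability as the title
pyplot.imshow(random_face_pixels)
title = '%s (%.3f)' % (predict_names[0], class_probability)
pyplot.title(title)
pyplot.show()
```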

Tying all of this together, the snippets above make up the complete example of predicting the identity for a given unseen photo in the test dataset.

A different random example from the test dataset will be selected each time the code is run.

Try running it a few times.

In this case, a photo of Jerry Seinfeld is selected and correctly predicted.

A plot of the chosen face is also created, showing the predicted name and probability in the image title.

Detected Face of Jerry Seinfeld, Correctly Identified by the SVM Classifier

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Papers

  • FaceNet: A Unified Embedding for Face Recognition and Clustering, 2015.
  • Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks, 2016.

Books

  • Handbook of Face Recognition, 2011.

Projects

  • OpenFace Project. https://cmusatyalab.github.io/openface/
  • Keras OpenFace Project. https://github.com/iwantooxxoox/Keras-OpenFace
  • FaceNet Project by David Sandberg. https://github.com/davidsandberg/facenet
  • Keras FaceNet Project by Hiroki Taniai. https://github.com/nyoki-mtl/keras-facenet
  • MTCNN Project by Iván de Paz Centeno. https://github.com/ipazc/mtcnn

APIs

  • mtcnn project on PyPI. https://pypi.org/project/mtcnn/
  • keras.models.load_model API
  • sklearn.preprocessing.Normalizer API
  • sklearn.preprocessing.LabelEncoder API
  • sklearn.svm.SVC API
  • numpy.savez_compressed API

Summary

In this tutorial, you discovered how to develop a face recognition system using FaceNet and an SVM classifier to identify people from photographs.

Specifically, you learned:

  • About the FaceNet face recognition system developed by Google and open source implementations and pre-trained models.
  • How to prepare a face recognition dataset, including first extracting faces via a face detection system and then extracting face features via face embeddings.
  • How to fit, evaluate, and demonstrate an SVM model to predict identities from face embeddings.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.


119 Responses to How to Develop a Face Recognition System Using FaceNet in Keras

  1. Abkul June 7, 2019 at 6:28 am #

    Great tutorial.

    Was looking at whether Transfer learning, Siamese network and triplet loss approaches are applicable to animal face(eg a sheep, goat etc) recognition particularly mobileNet(or otherwise) when your crystal clear blog came up.

    Kindly shed more light on its applicability and any other auxiliary hints.

  2. Shravan Kumar June 7, 2019 at 3:16 pm #

    Hi Jason,

    This is fantastic, thanks for sharing.

    What do you suggest when we have tens of thousands of classes.

    A Facenet model itself as a classifier or a specific classifier model is to be trained. In terms of scalability and performance which is the preferred method.

    Referring to:
    “The FaceNet model can be used as part of the classifier itself, or we can use the FaceNet model to pre-process a face to create a face embedding that can be stored and used as input to our classifier model. This latter approach is preferred as the FaceNet model is both large and slow to create a face embedding.”

    • Jason Brownlee June 8, 2019 at 6:34 am #

      Good question, the facenet embedding approach is a good starting point, but perhaps check the literature for more scalable approaches.

  3. Anand June 21, 2019 at 11:06 pm #

    Hi jason,
    As per my understanding The triplet loss is used so that the model can also learn the dissimilarity between the classes rather than only learning similarity between same class.
    But here we are not training our model on the actual image dataset on which we need our classification to be done. Rather we are using SVM for that.
    So, how can we make use of triplet loss in this method of face recognition?

  4. Karan Sharma June 24, 2019 at 2:36 pm #

    Hi Jason,

    I want to try this on Cat and Dog dataset. What do you thing the pre-trained networks face embeddings will work in this case?

    • Jason Brownlee June 25, 2019 at 6:08 am #

      No, I don’t think it will work for animals.

      Interesting idea though.

      • Karan Sharma June 25, 2019 at 2:40 pm #

        Thanks for your reply.

        What do you think how much effort will it take to train facenet from scratch?

        And certainly how much data?

  5. Karan Sharma June 26, 2019 at 3:19 pm #

    Thanks for the response Jason.

  6. Karan Sharma June 28, 2019 at 4:27 pm #

    Hi Jason,

    Can MTCNN detect faces of cats and dogs from image?

    • Jason Brownlee June 29, 2019 at 6:35 am #

      I don’t see why not, but the model would have to be trained on dogs and cats.

  7. Thinh July 8, 2019 at 5:35 pm #

    Hi Jason,
    Thanks for a very nice tutor. But i cant set up mtcnn in my python2? Is there a way to install mtcnn for python2?

  8. Thinh July 10, 2019 at 7:47 pm #

    Hi Jason. nice tutor!
    But i wonder that, if i want to identify who is stranger. should i make a folder for ‘stranger’ contains alot of stranger faces???(exclude your 5 Celebrity Faces Dataset ??)

    • Jason Brownlee July 11, 2019 at 9:47 am #

      Good question.

      No, if a face does not match any known faces, it would be “unknown”. E.g. has a low probability for all known faces.

      • Thinh July 11, 2019 at 12:25 pm #

        Thank you very much. Keep writting tutorials to help thoudsands of thoudsands people like me a round the world learning ML,DL. <3

  9. Thinh July 10, 2019 at 7:48 pm #

    yeah! I figured out that MTCNN only require Python3.
    Follow this bellow: https://pypi.org/project/mtcnn/

  10. Nghia July 16, 2019 at 6:57 pm #

    Thanks for your tutorial.
    But when i flollow you, i have a warning :
    embeddings.npz is not UTF-8 encoded
    UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
    Do you know how to fix it.

    • Jason Brownlee July 17, 2019 at 8:22 am #

      You can safely ignore these warnings.

      • Nghia July 17, 2019 at 6:20 pm #

        Thank you so much.
        But when I have a new image to recognize, do I need to put it to the validate folder and rerun the code ?
        And how can we use this to recognite face in a video ?

        • Jason Brownlee July 18, 2019 at 8:23 am #

          You can use the model directly, e.g. in memory.

          Perhaps you can process each frame of video as an image with the model?

  11. Karim July 21, 2019 at 8:06 pm #

    Hello Jason,
    Thanks for your wonderful tutorial, I’d like to know what is the best solution to apply recognition part if I have a very small dataset -only one face per each identity- in this case, I think SVM wouldn’t help.

    • Jason Brownlee July 22, 2019 at 8:24 am #

      I think a model that uses a face embedding as an input would be a great starting point.

  12. Esha July 27, 2019 at 10:16 pm #

    Hello Jason,
    Thank you for this amazing tutorial, I used python 3 to run this code. I would like to know why am i getting this error (No module named ‘mtcnn’) and how can I correct it?

    • Jason Brownlee July 28, 2019 at 6:46 am #

      The error suggests you must install the mtcnn library:

  13. Sabbir July 28, 2019 at 4:39 pm #

    I want to use transfer learning for masked face recognition. But i didn’t found any better masked face recognition dataset. I need a masked face dataset with proper labeling of each candidate. So is there any better masked face dataset available? where can i find this dataset?

    • Jason Brownlee July 29, 2019 at 6:10 am #

      Perhaps you can take an existing face dataset and mask the faces using opencv or similar?

      • Sabbir July 29, 2019 at 9:04 pm #

        Thanks for response. Can you refer any work or blog like your for doing mask face using opencv or similar?

        • Jason Brownlee July 30, 2019 at 6:11 am #

          Sorry, I do not have a tutorial on this topic, perhaps in the future.

  14. Al August 10, 2019 at 3:25 pm #

    Hello Jason, great tutorial.
    Im beginner in python.

    I try to understand your code, and little bit confusing when you choice random example from dataset when doing classification

    in line 28. selection = choice([i for i in range(testX.shape[0])]),
    its choose random vector value in testX.shape[0] from embeddings.npz right?

    so how if we want using spesific image from val folder?, Can you refer any work or blog to doing this

    Thanks.

  15. Al August 12, 2019 at 3:41 pm #

    Thank you so much for the respons,

    well I tried and it worked.

    But I have another question, because sometime when i run the code, all worked perfectly and when I run the code again, sometime i have this error warning in load_model part although the face recognition still work

    UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
    warnings.warn(‘No training configuration found in save file: ‘

    why did this happen?

    Thanks.

    • Jason Brownlee August 13, 2019 at 6:05 am #

      Well done!

      You can safely ignore that warning message.

  16. Jack August 15, 2019 at 7:55 pm #

    Hi Jason, while I was executing the code “load_model(‘facenet_keras.h5’)”, the exception “tuple index out of range” is thrown, can you tell me why? thanks in advance.

  17. Steve August 21, 2019 at 6:11 pm #

    Hi Jason, again wonderful article and tutorial you provided to us. I wonder how I can customize dataset for my needs such as my friends dataset and perform training on it?

  18. Hamed August 27, 2019 at 11:53 am #

    Thanks Jason, really helpful as always but I got a weird “invalid argument” error. But I fixed it by changing ‘f’ to ‘F’ in facenet_keras.h5 because I notice it couldn’t recognize character ‘f’. Maybe because it’s trained on Ubuntu but I run your code on Windows 10. I don’t know!

    • Jason Brownlee August 27, 2019 at 2:16 pm #

      Nice work!

      • Hamed August 29, 2019 at 1:39 am #

        Thank you! Dear Jason, could you please tell me how I can get access to other properties of model. I mean I don’t need model.predict. I need other properties. Is there a way to list all of them such as different convs or avgpool. I tried __dict__ and dir() but they don’t give what I want. For example, how did you know model has a property called “.predict”? Where can I find all of them? Thank you!

        • Jason Brownlee August 29, 2019 at 6:15 am #

          You can access all the weights via get_weights()

  19. Akash August 28, 2019 at 6:02 pm #

    Jason Can you Please post a tutorial on how to convert David sandberg tensorflow model in keras using Hiroki Tanai script to convert it into keras

  20. Wajeeha Jamil August 28, 2019 at 9:37 pm #

    How can I convert this script to tensorflow lite format in order to be used in an android applicaton?? Pleaseeee helpp !!

    • Jason Brownlee August 29, 2019 at 6:06 am #

      Sorry, I don’t have experience with that transform.

  21. Jahir August 29, 2019 at 1:48 am #

    This will work for many hundreds of people?

  22. Saurabh September 4, 2019 at 7:46 pm #

    Hello Jason,

    Thanks for sharing the interesting article!

    I have read your two articles on Face Verification: 1) this one and 2) https://machinelearningmastery.com/how-to-perform-face-recognition-with-vggface2-convolutional-neural-network-in-keras/

    Which one would you suggest? If I have to develop Face Verification system then there are few approaches (listing two approaches from your article):

    Approach 1: Detect face using MTCNN, train VGGFACE2 on the collected dataset which helps to predict the probability of a given face belonging to a particular class

    Approach 2: Detect face using MTCNN, get face embedding vector using facenet keras model and then apply SVM or Neural Network to predict classes

    Which approach would you recommend? Can you please explain?

    Thanks for sharing views.

    • Jason Brownlee September 5, 2019 at 6:53 am #

      Perhaps prototype a few approaches and see what works well for your specific project and requirements?

      • Saurabh September 5, 2019 at 5:49 pm #

        It means, I can try both approaches and have a look at efficiency, and select an approach with the best accuracy.

        Thank you!

      • Saurabh September 5, 2019 at 5:55 pm #

        Hi,

        I am looking for Speech recognition tutorial on Deep Learning using Keras.

        I have gone through your this URL: https://machinelearningmastery.com/category/deep-learning/ but I couldn’t find any tutorial.

        Could you please point to the tutorial link (if you have)?

        Thank you!

        • Jason Brownlee September 6, 2019 at 4:52 am #

          Sorry, I don’t have tutorials on that topic, I hope to cover it in the future.

  23. Alexander September 8, 2019 at 9:11 pm #

    Thanks for the tutorial.
    Unit length normalization isn’t for SVM. For SVM you typically use range scaling – MinMaxScaler, or standardization – StandardScaler. The goal is to make different features uniform. Actually, it’s a surprise that unit length normalization produced 100% accuracy in your case. That’s probably due to small data. It does not work for SVM in general and didn’t work for me.

    • Jason Brownlee September 9, 2019 at 5:15 am #

      Thanks for your note.

      I followed best practices when using face embeddings from the literature.

  24. Ahmad September 9, 2019 at 8:32 pm #

    Hi Jason,

    Great article. You have explained all the necessary steps to implement a face recognition system. I am working on a similar problem but in a bigger scale. I am in a belief that a classification based face identification is not a scalable solution. Please give me your opinion.

    If I want to recognise a thousand faces in real time manner then, what type of changes do I need to make to your implementation.

    I believe it would be really helpful if you create an article about large scale face recognition.

    • Jason Brownlee September 10, 2019 at 5:45 am #

      Good question, perhaps an efficient data structure like a kdtree for the pre-computed embeddings?

  25. Wajeeha Jamil September 9, 2019 at 11:22 pm #

    Can we extract eyes part out of the extracted face using mtcnn detector?? Any help..

    • Jason Brownlee September 10, 2019 at 5:49 am #

      I don’t see why not.

      It will find them with a point, you can draw a circle around that point and roughly extract the eyes.

      You might have to scale the faces/images to the same pixel size prior to modeling and extraction.

      Let me know how you go.

  26. azouz September 10, 2019 at 1:14 am #

    Good evening sir, can you tell me whether this application can work with a tkinter interface that displays the first and last name and the recognized photo?

  27. Abhijit Kumar September 16, 2019 at 5:41 pm #

    Hi Jason,
    1 . Here we are using 5 faces, what if we have thousands of faces, how to get the identity or index of those faces.

    2. If we have fingerprints or voice which pretrained model would be most suitable.

    • Jason Brownlee September 17, 2019 at 6:24 am #

      I don’t see why not.

      You may need a different model for fingerprints/voice.

      • Abhijit Kumar September 18, 2019 at 3:05 pm #

        If I have thousands of faces, SVM takes a lot of time. What do I do to get a quick result?

        • Jason Brownlee September 19, 2019 at 5:49 am #

          Perhaps try as simpler linear model?
          Perhaps try running on a large EC2 instance?
          Perhaps try running the code in parallel?

  28. arundev September 24, 2019 at 3:07 am #

    Once i have trained the model on 5 class (each class having 50 images). Now i use the model to detect images that it has not seen, it correctly guesses that the person in the image is class A for example with an accuracy ( prediction ) 65%. Is it possible to now add such image back to training and expect to get better results ?

  29. Anna September 25, 2019 at 4:59 pm #

    Awesome post, thanks for sharing.

  30. Abhinav September 25, 2019 at 7:21 pm #

    Hi Jason

    Thanks for this tutorial. Its really helpful. I wanted to know why you used train and val dataset. I mean are these two used for training purpose. What is the use of val here.?

    In the face classification, I am not able to understand where are you selecting the random photo to test against your dataset. How can I add my jpg photo to test again the dataset. Can you explain please. Thanks

    • Jason Brownlee September 26, 2019 at 6:33 am #

      Here, the validation set is a hold out or test set.

      We fit on the train set, then confirm the model has skill by evaluating it on the test set (val).

      You could add a new directory to train and val sets, to train and evaluate the model for new people.

      • Abhinav September 26, 2019 at 6:23 pm #

        Got it. Thanks
        What I have seen is that in train dataset I put my pictures more than 30 images and in val dataset I put 1 image of mine for testing. So it was recognizing me fine. But when put some other person pic in val dataset, it was still recognizing it as me

        Any idea how can this be solved

        • arun September 27, 2019 at 12:08 am #

          Yes, even i was wondering this.

          I have trained on 30 classes with
          45 images in Train folder and
          15 images in Test folder (val)

          after this upon testing with a new image which belongs to a class
          im getting good results:
          Image A – class A (99.996 %) which is correct
          Image X – class A (99.996 %) it belongs to an unkown class to the model but still it says that it belongs to class A with extremely high confidence.
          Any thoughts on why this occurs ??

          • Jason Brownlee September 27, 2019 at 8:03 am #

            You must define the problem and model the way you intend to use it.

            If you want the model to classify unknown people as unknown, you must give examples during training.

        • Jason Brownlee September 27, 2019 at 7:48 am #

          You might need to train the model on “you” vs “not-you”, or people vs unknown.

          • arun September 27, 2019 at 4:11 pm #

            Thanks for your reply.
            Could you please explain or guide to towards the direction of
            “””You might need to train the model on “you” vs “not-you”, or people vs unknown.”””

            So when we train the model, do i put a unkown folder?
            like train folder :
            class A (30 images)
            class B (30 images)
            unkown ???.

            Sorry if this doesnt make sense, its a bit hard to understand what you mean by train the model on “you” vs “not-you”.

            Help would be appriciated.

            Thanks

          • Jason Brownlee September 28, 2019 at 6:11 am #

            Yes, if your goal is to fit a model that can detect you, and when you show it photos of other people, it knows it’s not you, then you need a dataset of lots of photos of you and photos of other people (unknown group) so the model can learn the difference.

  31. Parikshit September 28, 2019 at 8:20 pm #

    Hi Jason,

    Thanks for the code. it is very helpful.

    For a few images i am getting a error as follows

    AttributeError: ‘JpegImageFile’ object has no attribute ‘getexif’ or
    AttributeError: module ‘PIL.Image’ has no attribute ‘Exif’

    This error occurs when i use the Image.open command to import the image.

    few examples for the images i am getting an error are as follows:

    httpssmediacacheakpinimgcomxfecfecaefaadfebejpg.jpg (traing data for elton john)
    httpwwwjohnpauljonesarenacomeventimagesEltonCalendarVjpg.jpg (training data fro elton john)

    I tried searching for this issue online but was not able to find any helpful solution. Do you have any idea how i may solve this issue?

    Thanks

  32. Debanik October 7, 2019 at 7:41 am #

    is it possible in the real-time after training?

    • Jason Brownlee October 7, 2019 at 8:34 am #

      I don’t see why not.

      • Debanik Roy October 7, 2019 at 5:12 pm #

        please write a programme about how it works in real-world after training?

        • Jason Brownlee October 8, 2019 at 7:54 am #

          Do you mean making a prediction after it is trained?

          If so, the tutorial above shows that.

          Otherwise, can you please elaborate what you mean exactly?

  33. Gabriel October 7, 2019 at 11:51 pm #

    Hi Jason,
    Your content is so much helpful.

    What about a system where new faces can be registered? Would I have to retrain the model, containing the new classes (new faces)?

    Thanks.

    • Jason Brownlee October 8, 2019 at 8:05 am #

      No, just create the face embeddings and fit the classification model.

  34. Abhishek Gupte October 9, 2019 at 7:24 am #

    Hey Jason,
    First of all thank you so much for putting out the effort and organizing this tutorial! You’re truly awesome! 🙂
    So I extracted facial embeddings of 3 people(6 high-quality high-resolution 5MP+ images per person as dataset )and trained them using SVM and used my built-in DELL WEBCAM(need I mention it generates a mirror image , ie my left hand appears right on the screen; also it’s a 0.3 MP 640×480 resolution feed) to identify faces.
    So my problem is that the probabilities are always different for the same trained face by sometimes a difference as great as 20% under the same lighting conditions! It’s mostly around 71% but sometimes dwindles to 51% for a trained face. For a stranger face it varies between 40% and 68% hence because of this variation, I can’t set a single probability value as a threshold and it’s really cumbersome.
    Can these differences be because of the difference in webcam quality and the dataset quality, that the algorithm has a tough time identifying the faces and varies the probability all the time, given the embeddings generated by the dataset are of much higher quality than those of the feed and also that the feed generates a mirror image of the faces in the dataset?

    Hope this isn’t too much trouble 🙂

    • Abhishek Gupte October 9, 2019 at 7:26 am #

      forgot to mention, the variations in probability happen whenever I run the program on different occasions

      • Jason Brownlee October 9, 2019 at 8:21 am #

        Could also be light level, etc.

        Try data prep that counters these issues?

    • Jason Brownlee October 9, 2019 at 8:20 am #

      Yes, it is likely the image quality.

      Perhaps downsample the quality of all images to the same baseline level before fitting/using models?

      • Abhishek Gupte October 9, 2019 at 8:30 am #

        I’ll do just that. Any idea how to downsample? A friend tried with same dataset but with Logitech C310 HD webcam and got a consistent probability score .It’s unlikely it’s the light level in my case as it shows variations in probability at the exact same light conditions.

          • Abhishek Gupte October 10, 2019 at 4:20 am #

            Thank you for your prompt replies!
            Also, can mirroring the feed be the cause as I’ve mentioned my webcam does that?

          • Jason Brownlee October 10, 2019 at 7:04 am #

            Probably not related.

          • Abhishek Gupte October 10, 2019 at 7:44 am #

            Okay. One quick thing.
            Does both the dataset and webCam feed have to be of the same quality? Cuz I trained my face, Emma Watson and Daniel Radcliffe’s faces (their images size around 5 kb) and my image quality around 70 kb and there’s still some variation in probability

          • Jason Brownlee October 10, 2019 at 2:16 pm #

            Generally yes, but you can achieve the desired result by using data augmentation either during training, during testing, or both to get the model used to data of varying quality – or to downsample all images to the same quality prior to training/using the model.

  35. Debanik Roy October 9, 2019 at 8:02 pm #

    sir, My problem is my model is not able to distinguish between a known and unknown person in the real world.
    Do you have any idea about how to identify an unknown person in the real world?

    • Jason Brownlee October 10, 2019 at 6:56 am #

      You must train the model on known and unknown people for it to learn this distinction.

  36. Abhishek Gupte October 10, 2019 at 10:42 pm #

    I guess it’s not really “data augmentation” when 5 out of the 6 images for Daniel radcliffe are 6KB, the last 67…and my face image quality are on average 120 KB, whereas for Emma Watson 2 out of the 3 images are 7 kb and the last 70. (The images generated by the webcam feed, the “test set” are 70 kb.). I guess both the dataset and feed should be same baseline image quality right?

  37. Akhil Kumar October 10, 2019 at 11:01 pm #

    Thanks for the great tutorial,

    By creating a data set of 500 people of 50 images each and train the model, can I expect good accuracy regarding detection?
    Can I try deploying the same model on a Raspberry Pi with a Pi Camera?
    Can you suggest any idea about adding a new person’s face to the model once it is deployed?

    • Jason Brownlee October 11, 2019 at 6:21 am #

      Perhaps try it and see?

      Yes, compute face embeddings for new people as needed and re-fit the model.

  38. RK October 11, 2019 at 5:38 am #

    Hi!
    My question is that you train the network every time you want to recognise a face set.

    How can we train once and run it multiple times ,say like a new face set every day.

    Is it possible to implement this in the example you have coded?

    • Jason Brownlee October 11, 2019 at 6:25 am #

      No, the network is trained once. We use face embeddings from the network as inputs to a classifier that has to be updated when new people are added.

  39. Hai October 11, 2019 at 12:40 pm #

    Hi Jason
    i run the code and add two of my own photos in both train and val dataset, SVM predict show correct class_index for my own photos in val dataset, but the SVM predict_proba show probability as below: class_index is 2(0.01181915, 0.01283217), it is the smallest value.
    [0.14074676 0.21515797 0.01181915 0.10075247 0.15657367 0.37494997]
    [0.1056123 0.20128499 0.01283217 0.1100254 0.23492927 0.33531586]

    i see documents saying that SVM predict_proba show meanlingless result on small dataset, is it caused by that? How can i detect one-face class probability?

    second question: can you show more code on how to train unknown people class?

  40. Saurabh October 21, 2019 at 6:45 am #

    Hello Jason,

    Thank you again for sharing the nice blog.

    I went through your tutorial and I got 100 train and test efficiency. Till now everything is clear.

    But the problem arises when I apply the developed model (through your tutorial) to live frames through a webcam (my train ~700 images/class and test images ~300 images/class are captured using webcam).

    The model does more misclassification when I apply the trained model to a frame from a webcam.

    I am not sure how to normalize the face embedding vector in this case?

    Could you please guide me?

    Thanking you,
    Saurabh

  41. Saurabh October 21, 2019 at 5:20 pm #

    Hello Jason,

    Thanks for the reply. It means I should normalize the input image rather than the embedding. If the input image is normalized then I don’t need to normalize the embedding.

    Please feel free to correct me!

    Thanking you,
    Saurabh

    • Jason Brownlee October 22, 2019 at 5:44 am #

      Perhaps try scaling pixels prior to using the embedding.

  42. jada October 22, 2019 at 12:08 am #

    Hello Jason, Thank you for your tutorials, is there a method i can use to implement on the raspberry pi 3 ?

    kind regards

    jada

    • Jason Brownlee October 22, 2019 at 5:51 am #

      Sorry, I don’t know about “raspberry pi 3”.

  43. HB October 22, 2019 at 12:35 am #

    hi, thanks for the tutorial! it was really helpful!!

    I have followed the tutorial and got successful result with my own data set of pictures.

    Let’s say I used person A, B, C to trained the model

    Now I’m trying to introduce a unsorted pictures of the above 3 people(A, B, C) in one folder and sort them
    based on the code from your tutorial.

    However, I can’t figure out how to introduce the new and unsorted pictures into the above code.

    please help?

    Thank you in advance!

    • Jason Brownlee October 22, 2019 at 5:53 am #

      Sounds straightforward, what is the specific problem you’re having?

      • HB October 22, 2019 at 9:21 am #

        I can’t figure out how to introduce the new unsorted pictures into the code. I tried making an npz file using the new pictures in one folder and loading them into the classifier(# load faces
        data = load(‘5-celebrity-faces-dataset.npz’), but the classification result was pretty bad so im assuming what i did is not correct.

  44. RK October 22, 2019 at 2:12 am #

    Hey Jason!
    I am trying to develop an attendance collector from video footage.
    My problem arises during the classification part,it constantly interchanges the output label names.

    Say X and Y are two people then X is always identified as Y and vice versa.
    The code is error free and there is no error in the input labels.

    How can correct this.Will this be solved if use something apart from SVM. If so what?
    Or should i do some fine tuning as specified in one of your earlier answers?
    Please guide me.

    Awaiting your reply.

    • Jason Brownlee October 22, 2019 at 5:56 am #

      Perhaps some of the labels for these people were swapped in the training data?
