Here we’ll create a dataset parser/processor and run it on the Yale Face dataset, which contains 165 grayscale images of 15 different people. This dataset is small but sufficient for our purpose – learning.
If you’ve seen the Minority Report movie, you probably remember the scene where Tom Cruise walks into a Gap store. A retinal scanner reads his eyes, and plays a customized ad for him. Well, this is 2020. We don’t need retinal scanners, because we have Artificial Intelligence (AI) and Machine Learning (ML)!
In this series, we’ll show you how to use Deep Learning to perform facial recognition, and then – based on the face that was recognized – use a Neural Network Text-to-Speech (TTS) engine to play a customized ad. You are welcome to browse the code here on CodeProject or download the .zip file to browse the code on your own machine.
We assume that you are familiar with the basic concepts of AI/ML, and that you can find your way around Python.
The series is built of five articles:
Get a Dataset
In the previous article, we described the process of detecting faces in an image. Now that we know how to obtain a cropped face image from a larger picture or a video, let’s assume that we’ve gone through this exercise and ended up with a dataset (face set) to train our CNN on. Before training, however, we need to process this dataset to categorize and normalize the data. In this article, we’ll create a dataset parser/processor and run it on the Yale Face dataset, which contains 165 grayscale images of 15 different people. This dataset is small but sufficient for our purpose – learning.
Prepare a Parser
The dataset parser will reside in two classes – an abstract and more general one, and one handling specifics of the selected dataset. Let’s look at the constructor of the parent class.
def __init__(self, path, extension_list, n_classes):
    self.path = path
    self.ext_list = extension_list
    self.n_classes = n_classes
    self.objects = []
    self.labels = []
    self.obj_validation = []
    self.labels_validation = []
    self.number_labels = 0
The constructor parameters are:
path: the path to the folder containing dataset samples (images)
extension_list: the extensions of the files to look for in the folder defined by path (one or more)
n_classes: the number of classes to categorize the dataset into; for the Yale dataset, this will be 15 because this is the number of people in the dataset
The constructor also initializes the following attributes:
objects: the images to use for CNN training
labels: the labels (subject numbers) that classify the images (objects)
obj_validation: a subset of the images used to validate the CNN after training
labels_validation: the labels for the obj_validation images
number_labels: the total number of labels in the dataset
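The split between a general parent class and a dataset-specific child can be sketched with Python's abc module. This is only an illustration under assumptions: BaseDataSet, DemoDataSet, and the "N_name.ext" file naming are hypothetical, and only the constructor arguments and the process_label hook name come from the article.

```python
from abc import ABC, abstractmethod


class BaseDataSet(ABC):
    """Hypothetical stand-in for the general parent class."""

    def __init__(self, path, extension_list, n_classes):
        self.path = path
        self.ext_list = extension_list
        self.n_classes = n_classes
        self.objects = []
        self.labels = []

    @abstractmethod
    def process_label(self, img_path):
        """Turn a file name into a numeric class label."""


class DemoDataSet(BaseDataSet):
    """Hypothetical child class holding the dataset-specific logic."""

    def process_label(self, img_path):
        # Assumed naming scheme for this sketch: "3_photo.gif" -> label 3
        return int(img_path.split("_")[0])


demo = DemoDataSet("faces/", ["gif"], 15)
print(demo.process_label("3_photo.gif"))
```

Because process_label is abstract, trying to instantiate BaseDataSet directly raises a TypeError, which forces every concrete dataset to supply its own labeling rule.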
The get_data() method is the one we'll call after instantiating the dataset class.
def get_data(self, vgg_img_processing):
    img_path_list = os.listdir(self.path)
    self.objects, self.labels = self.fetch_img_path(img_path_list, self.path, vgg_img_processing)
    self.process_data(vgg_img_processing)
The method is composed of two main calls: fetching the images from the defined path and processing them. To fetch the images, we loop through the files in the folder defined by path. We then use scikit-image to load these files as grayscale images. Each load returns a NumPy array containing the pixel values of one image.
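def fetch_img_path(self, img_path_list, path, vgg_img_processing):
    images = []
    labels = []
    for img_path in img_path_list:
        # Skip files whose extension is not in ext_list
        if self.__check_ext(img_path):
            img_abs_path = os.path.abspath(os.path.join(path, img_path))
            image = io.imread(img_abs_path, as_gray=True)
            label = self.process_label(img_path)
            images.append(image)
            labels.append(label)
    return images, labels

def __check_ext(self, file_path):
    # Body reconstructed as a plain suffix check (assumed)
    for ext in self.ext_list:
        if file_path.endswith(ext):
            return True
    return False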
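The extension filtering used while looping over the folder can be tried on its own. The following is a minimal sketch, not the article's implementation: has_allowed_extension is a hypothetical helper, and matching on "." plus the extension is an assumption made here to avoid accidental matches in the middle of a name.

```python
def has_allowed_extension(file_name, ext_list):
    """Return True when file_name ends with one of the extensions in ext_list."""
    # str.endswith accepts a tuple, so one call covers every extension
    return file_name.endswith(tuple("." + ext for ext in ext_list))


file_names = ["subject01.happy", "subject01.sad", "Readme.txt", "subject02.gif"]
ext_list = ["gif", "happy", "sad"]

kept = [name for name in file_names if has_allowed_extension(name, ext_list)]
print(kept)  # ['subject01.happy', 'subject01.sad', 'subject02.gif']
```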
process_label() is an abstract method in the FaceDataSet class; it is implemented in the YaleFaceDataSet class, where we parse the name of the image file from the dataset. The file names follow the "subjectXX.*" format. The method extracts the "XX" number from the file name and assigns it to the image as its label.
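class YaleFaceDataSet(FaceDataSet):
    def __init__(self, path, ext_list, n_classes):
        super().__init__(path, ext_list, n_classes)

    def process_label(self, img_path):
        # "subject02.happy" -> "subject02" -> "02" -> 2 -> zero-based label 1
        val = int(os.path.split(img_path)[1].split(".")[0].replace("subject", "")) - 1
        if val not in self.labels:
            self.number_labels += 1
        return val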
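The label extraction is easier to follow when broken into steps. Here is a standalone sketch of the same parsing idea; parse_subject_label and the example paths are hypothetical names chosen for illustration.

```python
import os


def parse_subject_label(img_path):
    """Map a 'subjectXX.*' file name to a zero-based integer label."""
    file_name = os.path.basename(img_path)       # drop any folder part
    stem = file_name.split(".")[0]               # e.g. 'subject02'
    return int(stem.replace("subject", "")) - 1  # '02' -> 2 -> 1


print(parse_subject_label("data/yalefaces/subject02.happy"))  # 1
print(parse_subject_label("subject15.gif"))                   # 14
```

Subtracting 1 keeps the labels in the range 0 to 14, which is what a 15-class one-hot encoding expects.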
The process_data() method looks like this:
def process_data(self, vgg_img_processing):
    # Hold out 30% of the samples for validation
    self.objects, self.img_obj_validation, self.labels, self.img_labels_validation = \
        train_test_split(self.objects, self.labels, test_size=0.3)
    self.labels = np_utils.to_categorical(self.labels, self.n_classes)
    self.labels_validation = np_utils.to_categorical(self.img_labels_validation, self.n_classes)
    self.objects = Common.reshape_transform_data(self.objects)
    self.obj_validation = Common.reshape_transform_data(self.img_obj_validation)
In this method, we split the dataset into two parts. The second part contains the images used to validate the training results. We use the train_test_split() method from Scikit-Learn, and we transform the labels into categorical variables. If an image comes from subject02, its zero-based label is 1, and its categorical variable is [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0]: a 15-dimensional vector (one component per class) with 1 in the 2nd component.
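To make the categorical transformation concrete, here is a plain-Python sketch of one-hot encoding. It only mirrors the idea behind to_categorical() and is not the Keras implementation (which returns NumPy arrays); one_hot is a hypothetical helper.

```python
def one_hot(label, n_classes):
    """Return a list of n_classes zeros with a single 1 at index `label`."""
    vector = [0] * n_classes
    vector[label] = 1
    return vector


n_classes = 15
labels = [0, 1, 14]  # zero-based labels for subject01, subject02, subject15
encoded = [one_hot(label, n_classes) for label in labels]

print(encoded[1])  # [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```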
def reshape_transform_data(data):
    data = numpy.array(data)
    result = Common.reshape_data(data)
    return Common.to_float(result)
def reshape_data(data):
    return data.reshape(data.shape[0], constant.IMG_WIDTH, constant.IMG_HEIGHT, 1)
The reshape_transform_data() method reshapes the data to fit the grayscale mode. In image processing, color images are treated as 3-channel grids, one channel per color (RGB). Grayscale images have only one channel, so the image array is reshaped with a trailing dimension of 1.
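The reshape step, together with the scaling to the 0-1 range that follows it, can be tried on synthetic data. This sketch assumes NumPy is installed; the tiny 4x3 image size and the constant-valued fake images are made up for illustration and stand in for the real constants and face images.

```python
import numpy as np

IMG_WIDTH, IMG_HEIGHT = 4, 3  # hypothetical tiny image size for the demo

# Three fake single-channel images filled with 0, 128, and 255
images = [np.full((IMG_WIDTH, IMG_HEIGHT), fill, dtype=np.uint8)
          for fill in (0, 128, 255)]

data = np.array(images)                                        # shape (3, 4, 3)
data = data.reshape(data.shape[0], IMG_WIDTH, IMG_HEIGHT, 1)   # add the channel axis
data = data.astype("float32") / 255                            # scale pixels to 0-1

print(data.shape)              # (3, 4, 3, 1)
print(data.min(), data.max())  # 0.0 1.0
```

The leading dimension is the number of samples and the trailing 1 is the single grayscale channel, which is the layout a convolutional input layer typically expects.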
The to_float() method normalizes the data by dividing each pixel value by 255 (pixel values range from 0 to 255), which maps the entire pixel matrix to the 0-1 range for better-behaved numerical input and faster convergence.
Now we can set up our dataset in the main.py file, which will serve as the entry point of our application.
ext_list = ['gif', 'centerlight', 'glasses', 'happy', 'sad', 'leftlight',
            'wink', 'noglasses', 'normal', 'sleepy', 'surprised', 'rightlight']
n_classes = 15
dataSet = YaleFaceDataSet(constant.FACE_DATA_PATH, ext_list, n_classes)
Categorize the Dataset
Now we have a processed, categorized dataset ready to be used for CNN training. In the next article, we’ll put together our CNN and train it for face recognition. Stay tuned!