Recently I’ve been thinking about classification again, this time with more of an emphasis on object detection. Even surprisingly complex and diverse objects can be detected using machine learning algorithms: trees, people, fruit…
First off, let’s draw a line between detection and recognition. For example, when it comes to people, facial detection is simply telling you whether there is a face, any face, present in the data; facial recognition is telling you whether there is a specific face present in the data, i.e. a particular person.
The two things are also done in different ways. This great diagram by Shiqiao Du summarises the Python approaches for the two objectives.
Here I’m just going to talk about detection.
Detecting faces in images is something that happens for a variety of purposes in a range of places. Facebook, for example, highlights faces to help you when it asks you to tag your friends in photos.
To do it in Python one of the simplest routes is to use the OpenCV library. The Python version is pip installable using the following:
pip install opencv-python
How Does It Work?
The OpenCV face detection algorithm is based on the method in this paper. But, in a nutshell, it uses a Haar basis like this:
to decompose your image on multiple scales and then uses the coefficients from this decomposition as machine learning features. It then takes these feature values and feeds them to a machine learning algorithm called AdaBoost to classify regions in your image as “a face” or “not a face”. For this to work, the algorithm needs to have been trained on the same kind of features using a training dataset (i.e. a large number of pictures of faces, plus pictures without faces for contrast).
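To make the idea of a Haar feature a bit more concrete, here is a minimal sketch of a single two-rectangle Haar-like feature evaluated on a tiny greyscale array. It uses an integral image (summed-area table), the usual trick for making rectangle sums cheap. The function names and the toy image are my own, made up for illustration; this is not a copy of OpenCV's internals.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row and column prepended."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left (x, y), width w, height h."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_feature(ii, x, y, w, h):
    """Left half minus right half of a (w x h) window; w is assumed to be even."""
    half = w // 2
    left = rect_sum(ii, x, y, half, h)
    right = rect_sum(ii, x + half, y, half, h)
    return left - right

# toy 4x4 image: bright left half, dark right half (a vertical edge)
toy = np.array([[9, 9, 1, 1]] * 4, dtype=np.uint8)
ii = integral_image(toy)
edge_response = two_rect_feature(ii, 0, 0, 4, 4)

# a completely flat image for comparison
flat = np.full((4, 4), 5, dtype=np.uint8)
flat_response = two_rect_feature(integral_image(flat), 0, 0, 4, 4)

print("edge:", edge_response, "flat:", flat_response)
```

The feature responds strongly where the window straddles an edge and gives zero on a flat patch. A real detector evaluates a very large number of such features at many positions and scales, and the boosting step picks out the ones that are useful for telling faces from non-faces.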
If you have a very specific application in mind then you’ll probably need to provide your own training dataset, but to get started there are a variety of pre-trained classifier parameters out there already. You can find a bunch of them on GitHub here.
Here are the libraries I’m going to use:
import cv2
import numpy as np
import matplotlib.pyplot as pl
import matplotlib.patches as patches
from PIL import Image
Pick A Scene
To give this a whirl let’s pick a busy scene. Here’s a picture of Oxford Street in London:
I’m going to use this as my input image.
And I’m going to use one of the default classifier parameter sets for OpenCV. These are provided as XML files:
which we load into the OpenCV classifier:
If you grabbed the XML file from GitHub, be careful to download the actual XML file and not the HTML page that links to it. If you do the latter by accident you will see an error like this:
(-49) Input file is empty in function cvOpenFileStorage
To download a single file from GitHub: click on the file to open it, then hit “raw” in the top right, select all, copy and paste into your text editor, then save.
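If you want to check a downloaded file before handing it to OpenCV, something like the following could help. It is only a crude heuristic of my own (the function name is hypothetical): it peeks at the first few bytes to see whether the file starts like an XML document or like a web page.

```python
def looks_like_cascade_xml(path):
    """Return True if the file starts like XML rather than like an HTML page."""
    with open(path, "rb") as f:
        head = f.read(512).lstrip().lower()
    if not head:
        return False  # empty file
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return False  # this is a web page, not the raw file
    # cascade files normally begin with an XML declaration
    # or directly with the <opencv_storage> root element
    return head.startswith(b"<?xml") or head.startswith(b"<opencv_storage")
```

You would call it with the path to whichever file you saved and re-download the file if it returns `False`.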
# read the image file into OpenCV:
image = cv2.imread(imagefile)

# OpenCV loads colour images in BGR order; convert to a greyscale image:
image_grey = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
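One thing worth knowing here is that OpenCV stores colour images with the channels in blue, green, red order, which is why the conversion flag says BGR. The greyscale value is a weighted sum of the three channels. Here is a small numpy sketch of that weighted sum using the standard luma weights (0.299, 0.587, 0.114); the function name is my own and the rounding may differ slightly from what OpenCV does internally.

```python
import numpy as np

def bgr_to_grey(image_bgr):
    """Weighted channel sum for an (H, W, 3) uint8 array in BGR order."""
    b = image_bgr[..., 0].astype(np.float64)
    g = image_bgr[..., 1].astype(np.float64)
    r = image_bgr[..., 2].astype(np.float64)
    grey = 0.299 * r + 0.587 * g + 0.114 * b
    return np.round(grey).astype(np.uint8)

# one pure-blue pixel and one pure-red pixel, written in BGR order
pixels = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)
grey = bgr_to_grey(pixels)
print(grey)
```

The blue pixel comes out much darker than the red one, so mixing up the channel order would change the greyscale image that the classifier sees.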
Now we can run the detection classifier. The classifier is run on our greyscale image:
faces = faceCascade.detectMultiScale(
    image_grey,
    scaleFactor=1.04,
    minNeighbors=5,
    minSize=(30, 30),
    #flags = cv2.cv.CV_HAAR_SCALE_IMAGE
)
There are some variable parameters in this call.
The first is scaleFactor. This controls the reduction factor between levels in the scale pyramid. If you go for a small value (like 1.01) your code will run more slowly but more thoroughly than if you chose a higher number.
The second is minNeighbors, which affects the quality of the detected faces. Think of it as reducing your false positives. If you make it big (>5) you’ll make fewer detections, but it’s more likely that they’ll be actual faces.
The last one is minSize, basically a limit on the allowed pixel dimensions of your smallest face.
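To get a feel for why a small scaleFactor is slower, here is a back-of-the-envelope estimate of how many scale levels fit between the smallest allowed face and the size of the image. This is a rough sketch under my own assumptions (the function name and the 600 pixel image size are made up), not a description of exactly what OpenCV does internally.

```python
import math

def estimate_num_scales(image_size, min_size, scale_factor):
    """Rough count of pyramid levels between min_size and image_size (in pixels)."""
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be greater than 1")
    if min_size > image_size:
        return 0
    # each level multiplies the window size by scale_factor
    levels = math.log(image_size / min_size) / math.log(scale_factor)
    return int(math.floor(levels)) + 1

# compare a few scale factors for a 600 pixel image and a 30 pixel minimum face
for sf in (1.01, 1.04, 1.3):
    print(sf, estimate_num_scales(600, 30, sf))
```

With these numbers a scaleFactor of 1.01 needs roughly four times as many levels as 1.04, and each level means another pass over the image.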
So, how many faces did we find?
print("Found", len(faces), "faces!")
We can draw a box around each detected face using the PIL Image library and matplotlib patches:
# load the image with PIL and convert it to a numpy array:
im = np.array(Image.open(imagefile), dtype=np.uint8)

# create a figure and axes:
fig, ax = pl.subplots(1)

# display the image:
ax.imshow(im)

# extract the position (x,y) and the width & height
# of each detection:
for (x, y, w, h) in faces:

    # make a rectangular patch:
    rect = patches.Rectangle((x, y), w, h, linewidth=1, edgecolor='r', facecolor='none')

    # add the patch to your axes:
    ax.add_patch(rect)

# save the annotated image:
pl.savefig("detection1.png")

# display the annotated image:
pl.show()
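Each detection is an (x, y, w, h) box, where x runs horizontally and y runs vertically from the top-left corner. If you ever index the image array directly with these values, remember that numpy arrays are indexed rows first, so y comes before x. A small sketch with stand-in data (the blank image, the boxes and the function name are all made up for illustration):

```python
import numpy as np

def crop_detections(image_array, detections):
    """Return one sub-array per (x, y, w, h) box; rows are y, columns are x."""
    return [image_array[y:y + h, x:x + w] for (x, y, w, h) in detections]

# stand-in data: a blank 100 x 200 RGB image and two made-up boxes
dummy_image = np.zeros((100, 200, 3), dtype=np.uint8)
dummy_faces = np.array([[10, 20, 30, 30], [150, 5, 40, 40]])

crops = crop_detections(dummy_image, dummy_faces)
print([c.shape for c in crops])
```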
Things to notice…
(1) We’re really just getting the front facing faces;
(2) We’re only getting white or Asian faces.
The first of these maybe isn’t surprising since we used a set of trained parameters called “frontalface”. The second is perhaps more surprising and probably represents a lack of diversity in the training data that produced the classifier parameters. This kind of bias in facial detection has been documented in the press before.
If you want to improve on the basic Haar-based face detection algorithm, a popular dataset for testing your approach is the Labeled Faces in the Wild dataset.