Oooo Face-recognition – iPhoto and all the cool kids are doing it, but how is it done? I thought I’d find out. I thought it might take a long time.
I found the OpenCV project. I stopped looking.
Turns out that many of the algorithms and techniques for working with computer vision have already been implemented in this handy project. In fact, the broad area it covers is one of its own worst enemies. With 500+ functions relating to image processing, filtering, feature detection, machine learning, object recognition and so on, it can be very hard to work out how to get started.
Luckily there are python bindings to the C functions, which make experimenting with it much easier.
I used the older SWIG-based bindings at #dev8d, which I installed using apt-get install python-opencv (I may have had to go and grab the iulib package manually from the Debian archives, however). There is a newer version out, with a much cleaner (and actually documented) API binding – http://opencv.willowgarage.com/wiki/PythonInterface – but you might find that you have to build that one yourself.
So, we have an undocumented python binding to a really powerful toolset. Fun. The key is guessing how SWIG autogenerates the binding name in the python module based on the C name, so that you can use the C documentation to work out what the hell is going on.
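One way to take some of the guesswork out of this is to ask Python itself which names a module exposes. The helper below is a minimal sketch, not part of the bindings: find_binding is a made-up name, and it only assumes what the code later in this post shows, namely that the C name is kept as-is (cvGetSize becomes opencv.cvGetSize). A stand-in namespace is used so the example runs without OpenCV installed.

```python
import difflib
from types import SimpleNamespace


def find_binding(module, c_name, limit=5):
    """Return names on `module` that match `c_name` exactly or, failing that, closely.

    `find_binding` is a hypothetical helper for poking at undocumented
    bindings; it is not part of OpenCV.
    """
    names = dir(module)
    if c_name in names:
        return [c_name]
    # Fall back to fuzzy matching for when the guess is slightly off.
    return difflib.get_close_matches(c_name, names, n=limit, cutoff=0.6)


# Stand-in for the real module, holding a few names used later in this post.
fake_opencv = SimpleNamespace(cvGetSize=None, cvCreateImage=None, CV_BGR2GRAY=None)

print(find_binding(fake_opencv, "cvGetSize"))       # exact match
print(find_binding(fake_opencv, "cvGetImageSize"))  # wrong guess, close candidates
```

With the real bindings you would pass opencv or opencv.highgui instead of the stand-in.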
By far the simplest way is to just look at some code that works – grab the code and resources from this GitHub repository:
(I’ve included the code at the end of this post for lazy people)
How Face (or Object) recognition typically works
From my research, it seems that a technique called the Haar cascade is the easiest and most widespread hobbyist method for spotting faces. It can be understood to work in a similar fashion to a spam filter, in that you require ‘spam’ and ‘ham’ pictures to train it, so that it learns what a face looks like.
It’s more complex than that, as it returns the regions in which it believes it has found a face, but the basic principle of machine learning is the same. This technique has a number of benefits and drawbacks:
- Given enough training, it can spot most things that have noticeable features: not just faces, but bodies, tables, etc.
- It can be trained for specific purposes.
- You don’t need to model or mathematically understand the object it is trying to recognise to have it learn how to do so.
- BUT it is not good at discriminating: you will need to do something extra to tell one face from another.
- A training run can take 2-6 days on a beefy machine to get back a cascade (effectively a brain) which may or may not be any good at the task.
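To make the "regions in which it believes it has found a face" part a bit more concrete, here is a toy version of the idea behind a Haar cascade: sum pixel rectangles cheaply with an integral image, compare a lighter rectangle against a darker one, and throw a window away as soon as one stage fails. It is a sketch only. The function names are invented, the single feature and threshold are hand-picked rather than trained, and OpenCV's real implementation is far more involved (many features per stage, multiple scales).

```python
def integral_image(pixels):
    """Summed-area table with an extra row and column of zeros at the top and left."""
    height, width = len(pixels), len(pixels[0])
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += pixels[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def rect_sum(table, x, y, w, h):
    """Sum of the pixels in the w x h rectangle whose top-left corner is (x, y)."""
    return table[y + h][x + w] - table[y][x + w] - table[y + h][x] + table[y][x]


def top_minus_bottom(table, x, y, w, h):
    """A two-rectangle Haar-like feature: brightness of top half minus bottom half."""
    half = h // 2
    return rect_sum(table, x, y, w, half) - rect_sum(table, x, y + half, w, half)


def passes_cascade(table, x, y, size, stages):
    """Run each (feature, threshold) stage in order; reject on the first failure."""
    for feature, threshold in stages:
        if feature(table, x, y, size, size) < threshold:
            return False
    return True


def scan_windows(pixels, size, stages):
    """Slide a size x size window over the image and return the boxes that survive."""
    table = integral_image(pixels)
    hits = []
    for y in range(len(pixels) - size + 1):
        for x in range(len(pixels[0]) - size + 1):
            if passes_cascade(table, x, y, size, stages):
                hits.append((x, y, size, size))
    return hits


# A tiny 4x4 "image" with one bright patch in the top-left corner.
image = [
    [9, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
stages = [(top_minus_bottom, 10)]  # hand-picked threshold, not a trained one
print(scan_windows(image, 2, stages))
```

Training is what replaces the hand-picked feature and threshold here with ones learned from the 'spam' and 'ham' pictures.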
Luckily, Naotoshi Seo has been good enough both to share their Haar cascades and to write up a really excellent post on how to perform the Haar training steps, and I wholeheartedly recommend reading that. But if you are impatient, you can skip along.
So, we have some cascades, how do we actually use them? I wrote a 60 line python app during Dev8D that took webcam imagery, ran the cascade detection on the images and drew rectangles around the faces. I used PyGame to display the results (as I am familiar with how that works – I’ve used it for games and random demos a fair bit) and this is the code that is in that GitHub repository.
I’ll include a quick walkthrough of the detection routine, to take you step-by-step on how that works. In the code in the repo, there is also an example of how to load any image so that you can run the detection routine against it, which should be handy as a step 1 of a homebrew iPhoto face discriminator 🙂
import opencv
# Magic convenience class from OpenCV with a method for grabbing images
from opencv import highgui

# This is how you get access to the first webcam your system finds.
camera = highgui.cvCreateCameraCapture(-1)

# This function grabs a frame from the webcam, runs 'detect' on the OpenCV image, and
# converts the result to a PIL based image, suitable for reuse/blitting/storing
def get_image():
    im = highgui.cvQueryFrame(camera)
    detect(im)
    # convert Ipl image to PIL image
    return opencv.adaptors.Ipl2PIL(im)

# The detection routine:
def detect(image):
    # Find out how large the image is, as the underlying C-based code
    # needs to allocate memory in the following steps
    image_size = opencv.cvGetSize(image)

    # create grayscale version - this is also the point where the allegation about
    # facial recognition being racist might be most true. A caucasian face would have more
    # definition on a webcam image than an African face when greyscaled.
    # I would suggest that adding in a routine to overlay edge-detection enhancements may
    # help, but you would also need to do this to the training images as well.
    grayscale = opencv.cvCreateImage(image_size, 8, 1)
    opencv.cvCvtColor(image, grayscale, opencv.CV_BGR2GRAY)

    # create storage (It is C-based so you need to do this sort of thing)
    storage = opencv.cvCreateMemStorage(0)
    opencv.cvClearMemStorage(storage)

    # equalize histogram
    opencv.cvEqualizeHist(grayscale, grayscale)

    # detect objects - Haar cascade step
    # In this case, the code uses a frontal_face cascade - trained to spot faces that look directly
    # at the camera. In reality, I found that no bearded or hairy person must have been in the training
    # set of images, as the detection routine turned out to be beardist as well as a little racist!
    cascade = opencv.cvLoadHaarClassifierCascade('haarcascade_frontalface_alt.xml', opencv.cvSize(1,1))
    faces = opencv.cvHaarDetectObjects(grayscale, cascade, storage, 1.2, 2,
                                       opencv.CV_HAAR_DO_CANNY_PRUNING, opencv.cvSize(50, 50))

    if faces:
        for face in faces:
            # Hmm should I do a min-size check?
            # Draw a Chartreuse rectangle around the face - Chartreuse rocks ;)
            opencv.cvRectangle(image,
                               opencv.cvPoint(int(face.x), int(face.y)),
                               opencv.cvPoint(int(face.x + face.width), int(face.y + face.height)),
                               opencv.CV_RGB(127, 255, 0), 2)  # RGB #7FFF00 width=2

[PyGame stuff - snip -]

# demo image preparation aka how to load any image and run the detection routine on it
cv_im = highgui.cvLoadImage("demo.jpg")
detect(cv_im)
pil_im = opencv.adaptors.Ipl2PIL(cv_im)

def read_demo_image():
    return pil_im

while True:
    # Fixed demo for when you have no Webcam
    im = read_demo_image()

    # UNCOMMENT this and comment out the demo when you wish to use webcam
    #im = get_image()

    pil_img = pygame.image.frombuffer(im.tostring(), im.size, im.mode)
    screen.blit(pil_img, (0,0))
    pygame.display.flip()
    pygame.time.delay(int(1000.0/fps))
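For anyone on the newer bindings mentioned earlier, the same detection steps look roughly like this with the cv2 module. Treat it as a sketch: it assumes the opencv-python package and a local copy of haarcascade_frontalface_alt.xml, the helper names to_corners and detect_faces are made up for illustration, and it is not the code from the repository. cv2 is imported inside the function so that the small pure-Python helper can be tried without it installed.

```python
def to_corners(rect):
    """Turn an (x, y, width, height) box into (top_left, bottom_right) points."""
    x, y, w, h = (int(v) for v in rect)
    return (x, y), (x + w, y + h)


def detect_faces(image_bgr, cascade_path="haarcascade_frontalface_alt.xml"):
    """Draw rectangles around detected faces in place and return the boxes.

    Assumes `image_bgr` is a BGR image as returned by cv2.imread, and that
    `cascade_path` points at a Haar cascade XML file.
    """
    import cv2  # imported lazily: requires the opencv-python package

    cascade = cv2.CascadeClassifier(cascade_path)
    if cascade.empty():
        raise IOError("could not load cascade: %s" % cascade_path)

    # Same preparation as the older routine: greyscale, then equalise the histogram.
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.equalizeHist(gray)

    # Scale factor 1.2, 2 neighbours and a 50x50 minimum size mirror the
    # arguments passed to cvHaarDetectObjects in the routine above.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=2, minSize=(50, 50))

    boxes = [tuple(int(v) for v in face) for face in faces]
    for box in boxes:
        top_left, bottom_right = to_corners(box)
        # cv2 colours are BGR, so chartreuse (RGB 127, 255, 0) becomes (0, 255, 127).
        cv2.rectangle(image_bgr, top_left, bottom_right, (0, 255, 127), 2)
    return boxes
```

Something like detect_faces(cv2.imread("demo.jpg")) would then stand in for the demo image step, with the memory storage bookkeeping no longer needed.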
Let me know if you have luck with this, and also if you have made progress on the harder second step of discrimination between faces 🙂
– Quick update –
Found this excellent cheatsheet for getting started with OpenCV (in C):