CIS 581: Computer Vision

Face Replacement

Aim:

Given an image, find a face, if any, and replace it with a face of your choice.

Overview of the Steps:

(1) Find a face: the Viola-Jones face detector could have been used, but I implemented my own detector.

(2) Compute the locations of the eyes, nose and mouth: the Viola-Jones model was used for this.

(3) Compute the relative scale and 2D rotation of the detected face.

(4) Scale and rotate the eyes, nose and mouth of the replacement face.

(5) Blend the transformed eyes, nose and mouth onto the target face using Poisson blending (Gauss-Seidel).
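Step (3) can be illustrated with a minimal sketch. The write-up does not say how the scale and rotation were computed, so this assumes they are taken from the vector joining the two eye centers in each face. The function name and the eye-based formulation are hypothetical, not the project's actual code.

```python
import numpy as np

def scale_and_rotation(eyes_src, eyes_dst):
    """Relative scale and 2D rotation (radians) taking the source face to the target.

    eyes_src, eyes_dst: ((x_left, y_left), (x_right, y_right)) eye centers.
    Assumption: the inter-eye vector alone defines scale and in-plane rotation.
    """
    v_src = np.subtract(eyes_src[1], eyes_src[0]).astype(float)
    v_dst = np.subtract(eyes_dst[1], eyes_dst[0]).astype(float)
    scale = np.linalg.norm(v_dst) / np.linalg.norm(v_src)
    angle = np.arctan2(v_dst[1], v_dst[0]) - np.arctan2(v_src[1], v_src[0])
    # Wrap the angle into [-pi, pi).
    angle = (angle + np.pi) % (2 * np.pi) - np.pi
    return scale, angle
```

The returned scale and angle could then be used to build the similarity transform applied in step (4).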

Detailed description of the approach:

The face detector is based on a face localization paper I wrote a few years ago. The paper can be found here. The main idea of the paper is to find the features which represent a generic human face, so that this template can later be used to find faces in any image. The paper takes a binary approach: for each pixel it decides whether that pixel is important for detecting a face or not, so at the end we are left with a binary map of the pixels needed to detect a face. This is an optimization problem: find the binary map which minimizes the l2 distance between many human faces (400 faces were used for training in this case). Note that we did not use any negative training examples, which might have affected the performance. We rely on a binary particle swarm optimizer (BPSO) to solve this problem. To make training fast, BPSO is run recursively on smaller chunks of data, in this case subject-wise or every 10 frames. At the end of training we get a binary map called gbest which represents the features needed to find a face.
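As a rough illustration of the optimizer, here is a minimal sketch of a standard binary PSO (sigmoid-thresholded velocities). The cost function, parameter values and function name are assumptions for illustration; the actual objective from the paper and the recursive, chunk-wise training are not reproduced here.

```python
import numpy as np

def bpso(cost, n_bits, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize cost(mask) over binary vectors of length n_bits.

    Returns (gbest, gbest_cost). All hyperparameters are illustrative defaults.
    """
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=(n_particles, n_bits))   # particle positions (bits)
    v = np.zeros((n_particles, n_bits))                  # particle velocities
    pbest = x.copy()
    pbest_cost = np.array([cost(p) for p in x], dtype=float)
    g = pbest_cost.argmin()
    gbest, gbest_cost = pbest[g].copy(), pbest_cost[g]
    for _ in range(iters):
        r1 = rng.random(v.shape)
        r2 = rng.random(v.shape)
        # Pull each particle toward its personal best and the global best.
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        v = np.clip(v, -4.0, 4.0)
        # A bit is set to 1 with probability sigmoid(velocity).
        x = (rng.random(v.shape) < 1.0 / (1.0 + np.exp(-v))).astype(int)
        c = np.array([cost(p) for p in x], dtype=float)
        better = c < pbest_cost
        pbest[better] = x[better]
        pbest_cost[better] = c[better]
        g = pbest_cost.argmin()
        if pbest_cost[g] < gbest_cost:
            gbest, gbest_cost = pbest[g].copy(), pbest_cost[g]
    return gbest, gbest_cost
```

In the project's setting, each bit would correspond to one pixel (or feature) of the face template, and the returned vector plays the role of gbest.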

To find a face, we use the average face from the training images as the template and compute a simple correlation score, shown below

One key thing to note is that only the pixels selected by gbest are used here. A less obvious point is that, although this can be done on raw pixel values, we used the DCT and extracted the top few features to build the template, as this gave us better results.
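The idea of scoring only the pixels selected by gbest can be sketched as follows. The exact score used in the project is not reproduced in this text, so this assumes a zero-mean normalized correlation on raw pixel values (the project used DCT features instead); the function name is hypothetical.

```python
import numpy as np

def masked_correlation(patch, template, mask):
    """Normalized correlation between patch and template over mask == True pixels.

    patch, template: 2D arrays of the same shape.
    mask: boolean array of the same shape (the role played by gbest).
    """
    p = patch[mask].astype(float)
    t = template[mask].astype(float)
    p = p - p.mean()
    t = t - t.mean()
    denom = np.linalg.norm(p) * np.linalg.norm(t)
    # A flat patch or template carries no information; score it as 0.
    return float(p @ t / denom) if denom > 0 else 0.0
```

Sliding this score over every candidate window of the test image gives a matching-score map whose peaks are candidate face locations.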

The template and matching scores are shown below

This works very well if the scale of the training faces matches the scale of the face in the test image. To make the algorithm more generic, we ran it over multiple scales (resizing gbest with interpolation and thresholding). The algorithm was fairly robust to rotation up to about 20 degrees because the training set had faces with minor rotation (the training set was cropped manually from this dataset). We should have performed non-maximal suppression across the multiple scales; however, we ended up just combining all the outputs using binary masks. Some sample face detection outputs are shown below

The extracted eyes, nose and mouth to be replaced are shown below

Outputs:

Panorama Stitching

Aim:

Given N images with at least 30% overlap, stitch the images into a panorama.

Overview of the steps:

(1) Perform Adaptive Non-Maximal Suppression (ANMS) to find feature points spread evenly across the image.

(2) Compute a feature descriptor at each of these points and match the points across 2 images at a time.

(3) Perform RANSAC-based outlier rejection on these matches.

(4) Estimate homography and warp images.

(5) Blend the images together for a good looking result.

Detailed Description of the approach:

A standard approach to ANMS is used to find evenly spaced feature points (Harris corners in this case) across the image. The input image and the corners superimposed on the image are shown below:
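For reference, ANMS can be sketched as below. This is a minimal O(N^2) version, assuming corner locations and positive corner strengths (e.g. Harris responses) are already available; the function name and the robustness factor of 0.9 are assumptions, not values taken from the project.

```python
import numpy as np

def anms(points, strengths, num_keep, robust=0.9):
    """Return indices of the num_keep points with the largest suppression radius.

    points: (N, 2) corner coordinates; strengths: (N,) positive corner responses.
    A point's suppression radius is the distance to the nearest point that is
    sufficiently stronger than it.
    """
    points = np.asarray(points, dtype=float)
    strengths = np.asarray(strengths, dtype=float)
    # Pairwise squared distances between all points.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    # stronger[i, j] is True when point j suppresses point i.
    stronger = robust * strengths[None, :] > strengths[:, None]
    np.fill_diagonal(stronger, False)
    d2 = np.where(stronger, d2, np.inf)
    radius2 = d2.min(axis=1)          # inf for points nobody suppresses
    order = np.argsort(-radius2, kind="stable")
    return order[:num_keep]
```

Keeping the points with the largest radii favors corners that are strong relative to their neighborhood, which is what spreads them evenly over the image.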

In the next step, we compute a feature descriptor around each feature point. The descriptor is a sub-sampled version of a 40x40 Gaussian-smoothed patch around the feature point, giving a vector of size 64x1. Feature correspondences are found using the ratio of the SSE (sum of squared errors) of the best match to that of the second-best match. However, some wrong matches remain at this step. To eliminate them we use RANSAC. Matching before RANSAC is shown below:
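The descriptor and ratio test described above can be sketched as follows. The Gaussian width, the sub-sampling step of 5, the zero-mean/unit-variance normalization and the ratio threshold of 0.6 are assumptions for illustration; the write-up only fixes the 40x40 patch and the 64-element output.

```python
import numpy as np

def gaussian_kernel(sigma=2.0, radius=4):
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def describe(image, row, col, patch=40, step=5):
    """64-element descriptor from a blurred, sub-sampled 40x40 patch.

    The caller must keep (row, col) at least patch // 2 pixels from the border.
    """
    half = patch // 2
    window = image[row - half:row + half, col - half:col + half].astype(float)
    k = gaussian_kernel()
    # Separable Gaussian blur: filter columns, then rows.
    window = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 0, window)
    window = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 1, window)
    vec = window[::step, ::step].ravel()      # 8 x 8 = 64 samples
    vec = vec - vec.mean()
    std = vec.std()
    return vec / std if std > 0 else vec

def match_descriptors(desc_a, desc_b, ratio=0.6):
    """Ratio test on SSE; desc_b needs at least 2 rows. Returns (i, j) pairs."""
    sse = ((desc_a[:, None, :] - desc_b[None, :, :]) ** 2).sum(-1)
    matches = []
    for i, row in enumerate(sse):
        order = np.argsort(row)
        best, second = row[order[0]], row[order[1]]
        if best < ratio * second:             # keep only unambiguous matches
            matches.append((i, int(order[0])))
    return matches
```

A lower ratio threshold rejects more ambiguous matches at the cost of fewer correspondences; whatever survives is then passed to RANSAC.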

In RANSAC, we pick 4 correspondences at random and estimate the homography from them. The inliers are computed by applying this homography to the points and measuring the reprojection error. After a number of RANSAC iterations we keep the set of points from the iteration with the largest number of inliers. Matching output after RANSAC is shown below:

The estimated homography between the 2 images is applied to transform one of the images. This is shown below:
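The RANSAC loop can be sketched as below, using the direct linear transform (DLT) to fit the homography. The iteration count, the 3-pixel threshold and the final refit on all inliers are assumptions for illustration, and the DLT here is unnormalized for brevity.

```python
import numpy as np

def fit_homography(src, dst):
    """DLT estimate of H (3x3, up to scale) such that dst ~ H @ src, from >= 4 pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)       # right singular vector of the smallest singular value

def project(H, pts):
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    out = pts_h @ H.T
    return out[:, :2] / out[:, 2:3]

def ransac_homography(src, dst, iters=500, thresh=3.0, seed=0):
    """Return (H, inlier_mask) for matched point arrays src, dst of shape (N, 2)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=4, replace=False)
        H = fit_homography(src[idx], dst[idx])
        with np.errstate(divide="ignore", invalid="ignore"):
            err = np.linalg.norm(project(H, src) - dst, axis=1)
        inliers = err < thresh        # NaN errors compare as False
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 4:
        raise ValueError("RANSAC found fewer than 4 inliers")
    # Assumed extra step: refit on every inlier of the best sample.
    H = fit_homography(src[best], dst[best])
    return H / H[2, 2], best
```

With noisy real matches, normalizing the point coordinates before the DLT generally improves numerical stability.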

Then the images are blended together. This is shown below:

This step is performed for every image that arrives in the stream, i.e., the new image is stitched with the output from the previous 2 images. The output for a 3-image panorama is shown below:

Outputs for other images are shown below:

Image Morphing

Aim:

Given images (faces in this case, for fun) and corresponding control points, morph one image into the other and make a video with a varying blending fraction to get a smooth transition from one image to the other, "like the movies". Use Delaunay triangulation and thin plate splines (TPS) to achieve the desired result.

Overview of the steps for Morphing using Triangulation:

(1) Manually choose control points between 2 images.

(2) Estimate the intermediate shape using average of control points.

(3) Use barycentric coordinates to perform inverse mapping for all the pixels.

(4) Dissolve the 2 images after warping for a good looking result.
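Steps (2) to (4) above can be sketched for a single triangle as follows. This assumes the triangulation is already available and that points are given as (x, y) rows; the function names are hypothetical, and a full implementation would loop over all triangles and sample the source images at the mapped positions.

```python
import numpy as np

def intermediate_shape(pts_a, pts_b, frac=0.5):
    """Weighted average of corresponding control points (frac=0.5 is the plain average)."""
    return (1 - frac) * np.asarray(pts_a, float) + frac * np.asarray(pts_b, float)

def barycentric(tri, pts):
    """Barycentric weights, shape (N, 3), of pts (N, 2) with respect to triangle tri (3, 2)."""
    tri = np.asarray(tri, dtype=float)
    pts = np.asarray(pts, dtype=float)
    A = np.vstack([tri.T, np.ones(3)])           # columns are [x, y, 1] of each vertex
    P = np.vstack([pts.T, np.ones(len(pts))])
    return np.linalg.solve(A, P).T

def inverse_map(tri_mid, tri_src, pts):
    """Map pixels of the intermediate triangle back into the source triangle."""
    w = barycentric(tri_mid, pts)
    inside = (w >= -1e-9).all(axis=1)            # all weights non-negative
    return w @ np.asarray(tri_src, dtype=float), inside

def cross_dissolve(warped_a, warped_b, frac=0.5):
    """Blend the two warped images with the same fraction used for the shape."""
    return (1 - frac) * warped_a + frac * warped_b
```

Because the mapping goes from the intermediate shape back to each source image, every output pixel gets a value and no holes appear in the warped result.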

Overview of the steps for Morphing using Thin Plate Splines:

(1) Manually choose control points between 2 images.

(2) Estimate the intermediate shape using average of control points.

(3) Use TPS equations to compute inverse mapping for all the pixels.

(4) Dissolve the 2 images after warping for a good looking result.
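The TPS inverse mapping in step (3) can be sketched as below, assuming the common kernel U(r) = r^2 log(r^2) and an optional regularization term; both choices and the function names are assumptions, since the write-up does not state which form was used. One spline is fitted per output coordinate, from the intermediate control points to the control points of a source image.

```python
import numpy as np

def tps_kernel(r2):
    """U(r) = r^2 log(r^2) evaluated on squared distances, with U(0) = 0."""
    out = np.zeros_like(r2)
    nz = r2 > 0
    out[nz] = r2[nz] * np.log(r2[nz])
    return out

def fit_tps(ctrl, targets, reg=0.0):
    """Solve for TPS parameters mapping ctrl (n, 2) onto targets (n, 2).

    Returns an (n + 3, 2) array: n kernel weights followed by 3 affine terms,
    one column per output coordinate.
    """
    ctrl = np.asarray(ctrl, dtype=float)
    targets = np.asarray(targets, dtype=float)
    n = len(ctrl)
    r2 = ((ctrl[:, None, :] - ctrl[None, :, :]) ** 2).sum(-1)
    K = tps_kernel(r2) + reg * np.eye(n)
    P = np.hstack([np.ones((n, 1)), ctrl])
    L = np.zeros((n + 3, n + 3))
    L[:n, :n] = K
    L[:n, n:] = P
    L[n:, :n] = P.T
    rhs = np.vstack([targets, np.zeros((3, targets.shape[1]))])
    return np.linalg.solve(L, rhs)

def apply_tps(params, ctrl, pts):
    """Evaluate the fitted spline at pts (m, 2); returns mapped points (m, 2)."""
    ctrl = np.asarray(ctrl, dtype=float)
    pts = np.asarray(pts, dtype=float)
    r2 = ((pts[:, None, :] - ctrl[None, :, :]) ** 2).sum(-1)
    P = np.hstack([np.ones((len(pts), 1)), pts])
    return tps_kernel(r2) @ params[:-3] + P @ params[-3:]
```

To warp an image, `ctrl` would be the intermediate control points, `targets` the control points of one source image, and `pts` every pixel coordinate of the output frame.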

The control points for the 2 images are shown in the first figure, and the next figure shows the Delaunay triangulation. The third figure shows the morphed result with equal weights for both images using the Delaunay triangulation method. The last figure shows the morphed result with equal weights for both images using the TPS method.