3D Face Reconstruction From 2D Images - A Survey
Rasiwasia [13] took only two images into consideration - a frontal and a profile view. Since limitations on the input image's viewpoint cause inflexibility, researchers have recently focused on reconstructing faces from a single 2D image, where the image has no limitation in pose or expression and can be taken from an arbitrary viewpoint. Guan's approach [9] provides useful groundwork in that area.

3 Steps in a regular 3D face reconstruction approach

After considering all these approaches, a set of general steps can be derived which will be included in a regular 3D face reconstruction algorithm. The following is a list of the identified steps.

• Repairing the damaged areas (caused by noise, occlusion or shadows)

The input image's condition might not always be satisfactory; it may be damaged or corrupted. Noise pixels in the image, if they exist, might lead to inaccurate reconstructions. Shadows, poor lighting conditions and occlusions prevent accurate feature extraction of the face. For these reasons, the damaged areas need to be repaired prior to reconstruction.

• Face localization

A few approaches, like Rasiwasia's method [13], involve predefined restrictions on the input images. Although these restrictions introduce inflexibility, they reduce the complexity and preclude other face localization difficulties. Since input images in non-restricted approaches may contain background elements apart from the human face, the face region should be identified and cropped. The distinctive color of human skin can be used as a guide in identifying the face region. This process is labeled face localization. In approaches where multiple images are taken as input, each input image has to be cut and resized to obtain the face regions. In addition, all the obtained image parts should be precisely aligned with each other.

• Facial component detection

After the face region is isolated, the components of the face can be easily identified. Image-based techniques, silhouettes and feature points can be used to detect these facial components. In identifying these facial components, recognizing the two corners of the eyes, the tip of the nose and the center and end points of the mouth would prove enough.

• Depth estimation

For an accurate and realistic reconstruction, both the location and the depth of the facial features of the reconstructed face should be equivalent to those of the real face. Constructing the depth map of the input image will assist in depth estimation.

• 3D face reconstruction

After the locations and depth of the face components are identified, the 3D face can be reconstructed. A default 3D model can be deformed according to the real features to obtain the final 3D face. The texture should then be mapped onto the 3D face. This is an intricate process since the texture information gained from 2D space has to be mapped onto a 3D space. Some approaches project the frontal image directly onto the 3D face, but if the approach takes multiple input images, these images can be warped into the texture space to generate a more realistic effect. The above-mentioned Microsoft approach [11] projects the frontal image directly onto the 3D face, while Birkbeck et al. [3] warp the input images to the texture space.

4 Difficulties in 3D face reconstruction from 2D images

The uncertainty which lies in facial component detection can be eliminated by using multiple images, but it might not always be possible to attain that many images. Even if multiple images are available, factors like noise, occlusion, shadows and/or a lack of features in the images might prevent the system from using them. To make matters worse, multiple images make the cost in time and effort even more obvious. The time issue is mainly caused by the preprocessing phase required.

As a result, most researchers' attention has narrowed down to single-image-based 3D face reconstruction. One image of a face does not provide sufficient information for a 3D reconstruction, even if it is a frontal image. If the implementation has limitations in viewpoint, the input image may not even contain all the facial components.

The human face belongs to a particular class of similar objects. This class can be used in making inferences about the human face to assist in generating other views of the face in the aforementioned circumstance. A database maintained within the implementation can facilitate making these inferences.

In maintaining a database, the main dilemma lies in deciding its size. Unless the input 2D image's viewing
conditions are known in advance, images of each face taken under different lighting and viewing conditions have to be stored; however, the large storage requirements, increased probability of false matching and slower reconstructions make this option rather impractical. Basri and Hassner [2] presented a novel solution which answers this problem.

'Feature points' is a popular method for facial component detection, but using a very large number of feature points can lead to inefficiencies in computational time. Therefore, approaches that involve a smaller number of feature points have gained recognition; the approach of Blanz et al. [4] is one example. In recovering 3D facial information from multiple images, the relationship between feature points in different viewpoints should be maintained.

5 Recent work

In Basri and Hassner's example-based method [2], the examples held in the database are, during reconstruction, replaced with more suitable 3D objects with better viewing conditions. As a result, only a small relevant subset of the database is accessible to a user at any given time.

In performing the depth estimation of the face, parts of the image are compared with the image parts in the database to match the intensity patterns (figure 2). The matched intensity patterns are taken as the initial guess for the face's depth, and later a global optimization scheme is applied for depth refinement. On a Pentium 4, 2.8 GHz computer with 2 GB RAM, using 12 example images at a time for a 200 x 150 pixel image, the running time of this application is around 40 minutes.

The ability to handle a large database, and applicability to a variety of objects irrespective of their viewing and lighting conditions, make this a successful approach.
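The patch-matching initialization described above can be sketched as follows. This is a minimal illustration of the idea, not Basri and Hassner's actual implementation: the function name, the patch size and the representation of examples as (intensity image, depth map) pairs are all assumptions.

```python
def initial_depth_guess(query, examples, patch=5):
    """For each interior pixel of `query` (a 2D list of intensities),
    find the example patch with the smallest intensity SSD and copy
    that example's depth as the initial depth guess."""
    h, w = len(query), len(query[0])
    r = patch // 2
    depth = [[0.0] * w for _ in range(h)]  # border pixels keep a default depth
    for y in range(r, h - r):
        for x in range(r, w - r):
            best, best_d = float("inf"), 0.0
            for img, dmap in examples:  # examples: list of (image, depth map)
                ssd = sum(
                    (query[y + dy][x + dx] - img[y + dy][x + dx]) ** 2
                    for dy in range(-r, r + 1)
                    for dx in range(-r, r + 1)
                )
                if ssd < best:
                    best, best_d = ssd, dmap[y][x]
            depth[y][x] = best_d
    return depth
```

In the actual approach this initial guess is only a starting point; a global optimization scheme then refines the depth map.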
Vertex, color and camera are the three parameters of the projected polygon representation NN [7]. The Tsai-Shah SFS algorithm processes both the input images and the NN output images in order to reconstruct the 3D face based on the depth maps. These depth maps are considered partial 3D shapes rather than images.

In the approach of Samaras et al. [14], 3D shape is extracted from multi-posed face images taken under arbitrary lighting, and the reconstruction process uses silhouette images. The accuracy of this reconstruction process depends on the number and location of the cameras used to capture the input images. A 3D face model is used as prior knowledge to assist in the reconstruction process.

The 3D face model is constructed from a set of 3D faces attained from 3D scanning technologies. The shape and pose parameters are estimated by minimizing the difference between the face model and the input images. Later, the illumination and spherical harmonic basis parameters are extracted from the recovered 3D shape.

In Rasiwasia's approach [13], the distinctive color of human skin is used in identifying the face region within the image. A pixel's (R, G, B) value is classified as skin if it satisfies the following conditions:

R > 95 and G > 40 and B > 20 and
max{R, G, B} - min{R, G, B} > 15 and
|R - G| > 15 and
R - G > 20 and R - B > 20

Figure 4. Skin Detection [13]
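The skin rule above translates directly into code; a small sketch (the function name is illustrative):

```python
def is_skin(r, g, b):
    """Return True if an (R, G, B) pixel satisfies the skin rule above [13]."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15
            and r - g > 20 and r - b > 20)
```

Applying this test per pixel yields a binary skin mask from which the face region can be cropped.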
The center of the mouth is identified by drawing a vertical histogram in this localized mouth region.

Figure 6. Rectangular Region and the Horizontal Histogram for Mouth [13]

Though all 35 features can be identified automatically, at the end of the extraction process this method offers the user the capability to make modifications if required. The feature points that are found are then used to deform the generic model. This deformation is done in two steps - globally and locally. Finally, the texturing of the face is performed using the frontal image in such a manner that the actual features in the reconstructed face overlap with the features in the frontal image. The following image (figure 7) presents some reconstructed faces of this approach.

Recently, an automatic reconstruction based on a 3D generic face and a single image (irrespective of pose and expression) was presented by Guan [9]. The only condition required of the face image is that the head rotation lie in the interval -30 to +30 degrees. This method is said to reconstruct 3D faces with standard, low-cost equipment. The features extracted from the images serve as geometric information which helps in deforming the 3D generic face. The feature points are detected by using Euclidean angles. It is assumed that the head is not rotated with respect to the X axis.

The texturing of the face (figure 8) is performed by orthogonally projecting the 2D images onto the 3D face. When the 2D image is orthogonally projected to form the texture, some vertices contain no corresponding color since they are occluded. Those vertices generate blank areas in the texture. As a result, a thin-plate relaxation method is used to interpolate those blank areas from the known colors.

Gong et al. [8] put forth a multi-view nonlinear shape model which is 2D view-dependent but has no reference to 3D structures. They use a Kernel PCA (Principal Component Analysis) based on Support Vector Machines for nonlinear shape model transformation. This method provides remedies for two main drawbacks caused by the large pose variations of the human face. The first problem, highly nonlinear shape variations across views, is addressed by nonlinear shape transformations across views using Kernel PCA based on support vector machines. The second drawback, unreliable relationships among feature points across views (based solely on local gray-levels), is addressed by improving a nonlinear 2D active shape model with a pose constraint.

Darrell et al. [5] present a method based on cubical ray projection. This algorithm uses a novel data structure named the 'linked voxel space'. A voxel space is used to maintain an intermediate representation of the final 3D model. Since the connectivity of the meshes cannot be represented in a plain voxel space, and converting a volumetric model to a mesh is difficult, a linked voxel space is used instead.

First, the 3D views obtained from stereo cameras are registered using a gradient-based registration algorithm. The result of this registration is a 3D mesh where each vertex corresponds to a valid image pixel. The location of each vertex in the mesh is calculated and mapped into a voxel. This voxel space is then reduced using a cubic ray projection merging algorithm, which merges the voxels that fall on the same projection ray.

Since this method uses stereo cameras to obtain synchronized range and intensity 3D views, texture alignment might not be a necessity.
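The voxel-merging step can be sketched as follows. This is a hypothetical illustration of the idea, not Darrell et al.'s actual data structure, and it assumes axis-aligned projection rays (one ray per (x, y) column, with the camera looking along z):

```python
class LinkedVoxel:
    """A voxel that also records mesh connectivity (the 'linked' part)."""
    def __init__(self, x, y, z):
        self.pos = (x, y, z)
        self.links = set()  # neighbouring voxels connected in the mesh

def merge_on_ray(voxels):
    """Merge voxels that fall on the same projection ray.

    A ray is identified by the (x, y) column; among voxels sharing a ray,
    the one closest to the camera (smallest z) is kept."""
    merged = {}
    for v in voxels:
        key = v.pos[:2]
        if key not in merged or v.pos[2] < merged[key].pos[2]:
            merged[key] = v
    return list(merged.values())
```

Keeping the per-voxel links is what lets the reduced voxel space still be converted into a connected mesh afterwards.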
Figure 10. Final 3D mesh viewed from different directions [5]

6 Conclusion

The 2D image of a face is very sensitive to changes in head pose and expression, so a successful reconstruction approach should be able to extract the facial details in spite of these changes. Approaches based on silhouettes and prior knowledge can be advantageous in addressing this problem. When reconstructing 3D faces from 2D images, the key sources of information are the intensity-based features and landmarks of the image. But intensity alone is not enough when low intensity, noise, occlusion, illumination variations and/or shadows are present in the input images. Anatomical landmarks are argued to be a more accurate source of information, but they are rather thin and difficult to locate.

Most traditional face reconstructions require a special setup, expensive hardware, predefined conditions and/or manual labor, which make them impractical for general applications. Though recent approaches have triumphed over some of these setbacks, quality and speed are still not up to the expected levels. More realistic 3D character modeling software could be used in reconstructing the final 3D face, or the default 3D model could be created from such software.

Strategies like supervised and unsupervised learning in neural networks can be applied to facial component identification. Fuzzy systems can be used in feature extraction processes for a more fruitful result.

Prior knowledge of a face under different viewing and lighting conditions can be stored in the database with efficient update schemes, which would eliminate the uncertainty involved in reconstruction from a single arbitrary image. The recent successful approaches should be continued and refined to adhere to the changing requirements of modern society. Limitations such as requiring the subject to have no beard and to wear no earrings or glasses should also be eliminated.

Most present reconstructions are limited to reconstructing just the frontal area of the face. These reconstructions should be extended to reconstruct a face with realistic hair and ears. When an arbitrary image is given, the system should be able to draw the necessary inferences to obtain other views of the face.

The topic of 3D face reconstruction from 2D images has retained its significance in the computing world, and with recent developments, applications like human expression analysis and video conferencing have been added to its long list of applications. Virtual hair and beauty salons are one future application where 3D reconstructed faces will prove valuable. Having the opportunity to view the aftermath of a haircut or a facial before getting it, and sometimes even to view the face of a long-gone person, is without doubt a priceless reward. 3D face reconstruction can also be extended to produce aging software with the capability to generate a younger or older version of the face in the input image.

References

[1] S. Amin and D. Gillies. Analysis of 3D face reconstruction. In Proceedings of the 14th IEEE International Conference on Image Analysis and Processing, 2007.
[2] R. Basri and T. Hassner. Example based 3D reconstruction from single 2D images.
[3] N. Birkbeck, D. Cobzas, M. Jagersand, A. Rachmielowski, and K. Yerex. Quick and easy capture of 3D object models from 2D images.
[4] V. Blanz, B. Hwang, S. Lee, and T. Vetter. Face reconstruction from a small number of feature points.
[5] T. Darrell, L. Morency, and A. Rahimi. Fast 3D model acquisition from stereo images.
[6] E. Elyan and H. Ugail. Reconstruction of 3D human facial images using partial differential equations. Journal of Computers, 2(8), 2007.
[7] M. Fanany, I. Kumazawa, and M. Ohno. Face reconstruction from shading using smooth projected polygon representation NN.
[8] S. Gong, A. Psarrou, and S. Romdhani. A multi-view nonlinear active shape model using kernel PCA. BMVC99, pages 483-492.
[9] Y. Guan. Automatic 3D face reconstruction based on single 2D image. In Proceedings of the IEEE International Conference on Multimedia and Ubiquitous Engineering, 2007.
[10] F. Han and S. Zhu. Bayesian reconstruction of 3D shapes and scenes from a single image.
[11] Y. Hu, D. Jiang, S. Yan, H. Zhang, and L. Zhang. Automatic 3D reconstruction for face recognition. Journal of Pattern Recognition.
[12] J. Lee, R. Machiraju, B. Moghaddam, and H. Pfister. Silhouette-based 3D face shape recovery. Graphics Interface, 2003.
[13] N. Rasiwasia. The Avatar: 3-D face reconstruction from two orthogonal pictures with application to facial makeover.
[14] D. Samaras, S. Wang, and L. Zhang. Face reconstruction across different poses and arbitrary illumination conditions. AVBPA, LNCS, pages 91-101, 2005.