
A Morphable Model For The Synthesis Of 3D Faces


[PDF] acm.org


  • focus on identity

  • Morphable Model — averages of shape $\bar{S}$ and texture $\bar{T}$ + eigenvectors $s_i$ and $t_i$

  • Formula — $S_{mod} = \bar{S} + \sum_{i = 1}^{m - 1} \alpha_i s_i$

  • Building

    • Input — 3D Scans
    • Method — Optic Flow + PCA ( see Sec. 5 )
  • Matching / Register — minimize difference between images / 3D scans


  • new face images / new 3D face models <-- dense one-to-one correspondence to internal face model
  • derive morphable face model by transforming shape & texture into vector space representation
  • 3D face reconstructions from single images

1. Introduction

Human knowledge is critical for computer-aided face modeling and for separating natural faces from non-faces.

3. Morphable 3D Face Model

geometry of a face:

$$
\begin{align*}
S & = (X_1, Y_1, Z_1, X_2, \dots, Y_n, Z_n)^T \in \mathfrak{R}^{3 n} \qc & & \qq{$X, Y, Z$ coordinates of its $n$ vertices} \\
T & = (R_1, G_1, B_1, R_2, \dots, G_n, B_n)^T \in \mathfrak{R}^{3 n} \qc & & \qq{$R, G, B$ color values}
\end{align*}
$$

new shapes SmodS_{mod} & new textures TmodT_{mod}:

$$
\vb{S}_{mod} = \sum_{i = 1}^m a_i \vb{S}_i \qc \vb{T}_{mod} = \sum_{i = 1}^m b_i \vb{T}_i \qc \sum_{i = 1}^m a_i = \sum_{i = 1}^m b_i = 1
$$

morphable model: set of faces $\pqty{S_{mod}\pqty{\vec{a}}, T_{mod}\pqty{\vec{b}}}$, parameterized by coefficients $\vec{a} = (a_1, a_2, \dots, a_m)^T$ & $\vec{b} = (b_1, b_2, \dots, b_m)^T$

example set of faces --> probability distribution for coefficients $a_i$ & $b_i$ --> regulates likelihood of appearance of generated faces

data compression: PCA --> orthogonal coordinate system formed by eigenvectors $s_i$ & $t_i$ of covariance matrices

  • $\overline{S}, \overline{T}$ — averages of shape & texture
  • $C_S, C_T$ — covariance matrices computed over shape & texture differences $\Delta{S_i} = S_i - \overline{S}$, $\Delta{T_i} = T_i - \overline{T}$
  • $\sigma_i^2$ — eigenvalues of the shape covariance matrix $C_S$

$$
S_{mod} = \overline{S} + \sum_{i = 1}^{m - 1} \alpha_i s_i \qc T_{mod} = \overline{T} + \sum_{i = 1}^{m - 1} \beta_i t_i \qc \vec{\alpha}, \vec{\beta} \in \mathfrak{R}^{m - 1}
$$

$$
\qq{probability for coefficients} p\pqty{\vec{\alpha}} \sim \exp\bqty{- \frac{1}{2} \sum_{i = 1}^{m - 1} \pqty{\alpha_i / \sigma_i}^2}
$$
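The PCA construction above can be sketched in a few lines of NumPy. This is a toy illustration, not the paper's pipeline: the SVD route to the eigenvectors and the $1/m$ covariance normalization are implementation choices of this sketch, and the "faces" here are random vectors standing in for registered scans.

```python
import numpy as np

def build_pca_model(faces):
    """PCA over m example shape (or texture) vectors, the rows of `faces`.

    Returns the average, the eigenvectors s_i (as columns) and the
    eigenvalues sigma_i^2 of the covariance matrix."""
    m = faces.shape[0]
    mean = faces.mean(axis=0)
    deltas = faces - mean                    # Delta S_i = S_i - S_bar
    # SVD of the centered data gives the covariance eigenvectors without
    # forming the (3n x 3n) covariance matrix explicitly.
    _, s, vt = np.linalg.svd(deltas, full_matrices=False)
    eigvecs = vt[:m - 1].T                   # at most m-1 non-trivial components
    eigvals = (s[:m - 1] ** 2) / m           # sigma_i^2 (1/m normalization assumed)
    return mean, eigvecs, eigvals

def sample_face(mean, eigvecs, eigvals, rng):
    """Draw alpha_i ~ N(0, sigma_i^2) per the prior above, synthesize S_mod."""
    alpha = rng.standard_normal(eigvals.shape) * np.sqrt(eigvals)
    return mean + eigvecs @ alpha

rng = np.random.default_rng(0)
faces = rng.normal(size=(10, 30))            # 10 toy "scans", n = 10 vertices
mean, eigvecs, eigvals = build_pca_model(faces)
s_mod = sample_face(mean, eigvecs, eigvals, rng)
```

Sampling the coefficients from the Gaussian prior is what "regulates the likelihood of appearance" of generated faces: large $\alpha_i / \sigma_i$ values, i.e. implausible faces, are drawn rarely.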

Segmented morphable model

subdivide vector space of faces into independent subspaces

3.1. Facial attributes

manually assign labels $\mu_i$ describing markedness of the attribute --> weighted sums:

$$
\Delta{S} = \sum_{i = 1}^m \mu_i \pqty{S_i - \overline{S}} \qc \Delta{T} = \sum_{i = 1}^m \mu_i \pqty{T_i - \overline{T}}
$$

Multiples of $\pqty{\Delta{S}, \Delta{T}}$ can now be added to or subtracted from any individual face. For binary attributes, such as gender, we assign constant values $\mu_A$ for all $m_A$ faces in class $A$, and $\mu_B \neq \mu_A$ for all $m_B$ faces in $B$. Affecting only the scaling of $\Delta{S}$ and $\Delta{T}$, the choice of $\mu_A, \mu_B$ is arbitrary.
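Computing and applying such an attribute direction is a one-liner each; a minimal sketch with toy data (the shapes and the $\pm 1$ labels are placeholders, not from the notes):

```python
import numpy as np

def attribute_direction(shapes, labels):
    """Delta S = sum_i mu_i (S_i - S_bar) for attribute labels mu_i."""
    mean = shapes.mean(axis=0)
    mu = np.asarray(labels, dtype=float)
    return (mu[:, None] * (shapes - mean)).sum(axis=0)

def add_attribute(face, delta, amount):
    """Add a multiple of the attribute vector to an individual face."""
    return face + amount * delta

rng = np.random.default_rng(1)
shapes = rng.normal(size=(6, 12))             # 6 toy faces, 12-dim shape vectors
labels = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]    # binary attribute: class A vs. B
delta_s = attribute_direction(shapes, labels)
exaggerated = add_attribute(shapes[0], delta_s, 0.5)
```

Rescaling `labels` only rescales `delta_s`, which is why the concrete choice of $\mu_A, \mu_B$ does not matter.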

4. Matching a morphable model to images

Model Parameters
  • $\alpha_j$ — facial shape
  • $\beta_j$ — texture
  • $\vec{\rho}$ — rendering = camera position + object scale + …

colored images:

$$
\vb{I}_{mod}\pqty{x, y} = \pqty{I_{r, mod}\pqty{x, y}, I_{g, mod}\pqty{x, y}, I_{b, mod}\pqty{x, y}}^T
$$

Euclidean distance:

$$
E_I = \sum_{x, y} \norm{\vb{I}_{input}\pqty{x, y} - \vb{I}_{mod}\pqty{x, y}}^2
$$

restrict by a tradeoff between matching quality and prior probabilities

For Gaussian noise with a standard deviation $\sigma_N$, the likelihood to observe $I_{input}$ is $p\pqty{\vb{I}_{input} \vert \vec{\alpha}, \vec{\beta}, \vec{\rho}} \sim \exp\bqty{\frac{-1}{2 \sigma_N^2} \vdot E_I}$

cost function:

$$
E = \frac{1}{\sigma_N^2} E_I + \sum_{j = 1}^{m - 1} \frac{\alpha_j^2}{\sigma_{S, j}^2} + \sum_{j = 1}^{m - 1} \frac{\beta_j^2}{\sigma_{T, j}^2} + \sum_j \frac{\pqty{\rho_j - \overline{\rho}_j}^2}{\sigma_{\rho, j}^2}
$$
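Evaluating this cost function is straightforward; a sketch with toy values (one coefficient per group, all unit variances, chosen only to make the arithmetic obvious):

```python
import numpy as np

def cost(e_i, alpha, beta, rho, rho_bar, sigma_n, sigma_s, sigma_t, sigma_rho):
    """E = E_I / sigma_N^2 + Gaussian priors on alpha, beta and rho."""
    return (e_i / sigma_n ** 2
            + np.sum(alpha ** 2 / sigma_s ** 2)
            + np.sum(beta ** 2 / sigma_t ** 2)
            + np.sum((rho - rho_bar) ** 2 / sigma_rho ** 2))

# toy values: E_I = 1, one unit-variance coefficient of each kind
e = cost(e_i=1.0,
         alpha=np.array([1.0]), beta=np.array([0.0]),
         rho=np.array([0.0]), rho_bar=np.array([0.0]),
         sigma_n=1.0,
         sigma_s=np.array([1.0]), sigma_t=np.array([1.0]),
         sigma_rho=np.array([1.0]))
```

With these numbers $E = 1 + 1 + 0 + 0$: the prior terms penalize coefficients far from the average face, which is exactly the tradeoff between matching quality and prior probability mentioned above.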

Phong illumination:

$$
I_{r, mod, k} = \pqty{i_{r, amb} + i_{r, dir} \vdot \pqty{\vb{n}_k \vb{l}}} \bar{R}_k + i_{r, dir} s \vdot \pqty{\vb{r}_k \vb{v}_k}^{\nu}
$$

  • $k$ — triangle
  • $\vb{l}$ — direction of illumination
  • $\vb{v}_k$ — normalized difference of camera position and position of triangle’s center
  • $\vb{r}_k = 2 \pqty{\vb{n} \vb{l}} \vb{n} - \vb{l}$ — direction of reflected ray
  • $s$ — surface shininess
  • $\nu$ — angular distribution of specular reflection
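The Phong terms above can be sketched for a single channel and triangle; the clamp of the specular dot product to $\geq 0$ and the concrete light/view vectors are assumptions of this sketch, not from the notes:

```python
import numpy as np

def reflect(n, l):
    """Direction of the reflected ray: r = 2 (n . l) n - l."""
    return 2.0 * n.dot(l) * n - l

def phong_channel(i_amb, i_dir, n_k, l, albedo_k, s, v_k, nu):
    """One color channel of triangle k: ambient + diffuse term scaled by
    the triangle's mean albedo (R_bar_k), plus the specular lobe.
    Clamping the specular dot product is an implementation detail."""
    r_k = reflect(n_k, l)
    diffuse = (i_amb + i_dir * n_k.dot(l)) * albedo_k
    specular = i_dir * s * max(r_k.dot(v_k), 0.0) ** nu
    return diffuse + specular

n = np.array([0.0, 0.0, 1.0])   # surface normal
l = np.array([0.0, 0.0, 1.0])   # illumination direction (head-on)
v = np.array([0.0, 0.0, 1.0])   # viewing direction
red = phong_channel(0.1, 0.8, n, l, albedo_k=0.5, s=0.3, v_k=v, nu=10.0)
```

Head-on lighting makes both dot products 1, so `red` is simply $(0.1 + 0.8) \cdot 0.5 + 0.8 \cdot 0.3$.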

$$
E_I \approx \sum_{k = 1}^{n_t} a_k \vdot \norm{\vb{I}_{input}\pqty{\bar{p}_{x, k}, \bar{p}_{y, k}} - \vb{I}_{mod, k}}^2
$$

  • $a_k$ — image area covered by triangle $k$

$$
E_{\mathcal{K}} = \sum_{k \in \mathcal{K}} \norm{\vb{I}_{input}\pqty{\bar{p}_{x, k}, \bar{p}_{y, k}} - \vb{I}_{mod, k}}^2
$$

  • $\mathcal{K} \subset \Bqty{1, \dots, n_t}$ — random subset of 40 triangles $k$

$$
\qq{probability of selecting $k$} p\pqty{k \in \mathcal{K}} \sim a_k
$$
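Area-weighted subset selection maps directly onto weighted sampling; a sketch with toy areas and colors (note that drawing without replacement only approximates $p(k \in \mathcal{K}) \sim a_k$, an implementation choice of this sketch):

```python
import numpy as np

def sample_triangles(areas, size=40, rng=None):
    """Draw a random subset K of triangle indices, weighted by area a_k."""
    rng = rng or np.random.default_rng()
    p = np.asarray(areas, dtype=float)
    return rng.choice(len(p), size=size, replace=False, p=p / p.sum())

def e_subset(input_colors, model_colors, chosen):
    """E_K: image distance evaluated on the chosen triangles only."""
    diff = input_colors[chosen] - model_colors[chosen]
    return float(np.sum(diff * diff))

rng = np.random.default_rng(2)
areas = rng.uniform(0.1, 1.0, size=500)   # toy triangle areas
chosen = sample_triangles(areas, size=40, rng=rng)
```

Because larger triangles are picked more often, $E_{\mathcal{K}}$ is an unbiased stand-in for the area-weighted sum over all $n_t$ triangles while touching only 40 of them per iteration.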

  1. First set of iterations is performed with low resolution.
  2. In subsequent iterations, more and more principal components are added.
  3. Starting with a relatively large $\sigma_N$, we later reduce $\sigma_N$ to obtain maximum matching quality.
  4. In the last iterations, the face model is broken down into segments. With parameters $\rho_j$ fixed, coefficients $\alpha_j$ and $\beta_j$ are optimized independently for each segment. This increased number of degrees of freedom significantly improves facial details.
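The coarse-to-fine plan in steps 1–3 can be written as a simple stage schedule. Everything concrete here is an assumption of this sketch: the geometric decay of $\sigma_N$, the linear growth in active components, and the stage count; the final per-segment refinement of step 4 is omitted.

```python
def schedule(n_components, sigma_start, sigma_end, n_stages):
    """Coarse-to-fine plan: each stage activates more principal
    components and shrinks sigma_N geometrically from sigma_start
    to sigma_end (both schedules are assumptions of this sketch)."""
    stages = []
    for i in range(n_stages):
        t = i / max(n_stages - 1, 1)
        n_active = max(1, round(n_components * (i + 1) / n_stages))
        sigma_n = sigma_start * (sigma_end / sigma_start) ** t
        stages.append((n_active, sigma_n))
    return stages

plan = schedule(n_components=100, sigma_start=10.0, sigma_end=0.1, n_stages=5)
```

Early stages with few components and a loose $\sigma_N$ let the rendering parameters settle; late stages with all components and a tight $\sigma_N$ recover detail.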
Multiple Images
  • a separate set of $\rho_j$ for each input image
  • EIE_I is replaced by a sum of image distances for each pair of input and model images
Illumination-Corrected Texture Extraction

Subsequent to matching, we compare the prediction $\vb{I}_{mod, i}$ for each vertex $i$ with $\vb{I}_{input}\pqty{p_{x, i}, p_{y, i}}$, and compute the change in texture $\pqty{R_i, G_i, B_i}$ that accounts for the difference.

4.1. Matching a morphable model to 3D scans

a scan can be represented as

$$
\vb{I}\pqty{h, \phi} = \pqty{R\pqty{h, \phi}, G\pqty{h, \phi}, B\pqty{h, \phi}, r\pqty{h, \phi}}^T
$$

In a face $\pqty{S, T}$, defined by shape and texture coefficients $\alpha_j$ and $\beta_j$, vertex $i$ with texture values $\pqty{R_i, G_i, B_i}$ and cylindrical coordinates $\pqty{r_i, h_i, \phi_i}$ is mapped to $\vb{I}_{mod}\pqty{h_i, \phi_i} = \pqty{R_i, G_i, B_i, r_i}^T$. The matching algorithm from the previous section now determines $\alpha_j$ and $\beta_j$ minimizing

$$
E = \sum_{h, \phi} \norm{\vb{I}_{input}\pqty{h, \phi} - \vb{I}_{mod}\pqty{h, \phi}}^2
$$

5. Building a morphable model

5.1. 3D correspondence using Optic Flow

  • The algorithm computes a flow field $\pqty{\delta{h}\pqty{h, \phi}, \delta{\phi}\pqty{h, \phi}}$ that minimizes differences of $\norm{\vb{I}_1\pqty{h, \phi} - \vb{I}_2\pqty{h + \delta{h}, \phi + \delta{\phi}}}$.
  • Given a definition of shape and texture vectors $S_{ref}$ and $T_{ref}$ for the reference face, $S$ and $T$ for each face in the database can be obtained by means of the point-to-point correspondence provided by $\pqty{\delta{h}\pqty{h, \phi}, \delta{\phi}\pqty{h, \phi}}$.
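Once the flow field is known, putting a scan into correspondence with the reference amounts to resampling it at the shifted coordinates. A toy sketch (nearest-neighbour lookup and the tiny grid are assumptions; a real implementation would interpolate):

```python
import numpy as np

def warp_to_reference(scan, dh, dphi):
    """Resample a cylindrical scan I(h, phi) at (h + dh, phi + dphi) so
    each grid point lands on the point matched by the flow field.
    Nearest-neighbour lookup; phi wraps around the cylinder."""
    H, P, _ = scan.shape
    hh, pp = np.meshgrid(np.arange(H), np.arange(P), indexing="ij")
    h2 = np.clip(np.rint(hh + dh).astype(int), 0, H - 1)
    p2 = np.rint(pp + dphi).astype(int) % P
    return scan[h2, p2]

# toy scan: 4 heights x 8 angles x (R, G, B, r) channels
scan = np.arange(4 * 8 * 4, dtype=float).reshape(4, 8, 4)
same = warp_to_reference(scan, np.zeros((4, 8)), np.zeros((4, 8)))
shifted = warp_to_reference(scan, np.zeros((4, 8)), np.ones((4, 8)))
```

A zero flow field leaves the scan unchanged; a constant $\delta\phi = 1$ rotates it by one column, which is the point-to-point correspondence in its simplest form.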

5.2. Bootstrapping the model

The basic recursive step: Suppose that an existing morphable model is not powerful enough to match a new face and thereby find correspondence with it. The idea is first to find rough correspondences to the novel face using the (inadequate) morphable model and then improve these correspondences by using an optic flow algorithm.
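The recursive step can be summarized as a short loop. `fit`, `optic_flow` and `update_model` are hypothetical callables standing in for the components described in the notes, not APIs from the paper:

```python
def bootstrap(model, novel_scan, fit, optic_flow, update_model, n_rounds=3):
    """Bootstrapping sketch: match the current (possibly too weak) model
    to a novel scan for a rough correspondence, refine that
    correspondence with optic flow, then rebuild the model from the
    improved correspondences and repeat."""
    for _ in range(n_rounds):
        approx = fit(model, novel_scan)           # best match within the model
        corr = optic_flow(approx, novel_scan)     # refine the rough correspondence
        model = update_model(model, novel_scan, corr)
    return model
```

Each round the model explains more of the novel face, so the next round's rough correspondence starts closer to the truth.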