FaceWarehouse: a 3D Facial Expression Database for Visual Computing


TL;DR

  • Building
    • Input — (depth maps + color images) $\times$ expressions $\times$ identities
    • Morphable Model --> meshes
    • Active Shape Model (ASM) — expression meshes --> individual-specific blendshapes
    • Multilinear Model

Abstract

We present FaceWarehouse, a database of 3D facial expressions for visual computing applications. We use Kinect, an off-the-shelf RGBD camera, to capture 150 individuals aged 7–80 from various ethnic backgrounds. For each person, we captured the RGBD data of her different expressions, including the neutral expression and 19 other expressions such as mouth-opening, smile, kiss, etc. For every RGBD raw data record, a set of facial feature points on the color image such as eye corners, mouth contour and the nose tip are automatically localized, and manually adjusted if better accuracy is required. We then deform a template facial mesh to fit the depth data as closely as possible while matching the feature points on the color image to their corresponding points on the mesh. Starting from these fitted face meshes, we construct a set of individual-specific expression blendshapes for each person. These meshes with consistent topology are assembled as a rank-three tensor to build a bilinear face model with two attributes, identity and expression. Compared with previous 3D facial databases, for every person in our database, there is a much richer matching collection of expressions, enabling depiction of most human facial actions. We demonstrate the potential of FaceWarehouse for visual computing with four applications: facial image manipulation, face component transfer, real-time performance-based facial image animation, and facial animation retargeting from video to image.

3. FaceWarehouse

3.1. Data capture

  • 2D images + depth maps

3.2. Expression mesh and individual-specific blendshape generation

Active Shape Model (ASM) --> feature points on the color image --> $m_i$ internal feature points + $m_c$ contour feature points

Neutral expression

  • Blanz and Vetter’s morphable model + mesh deformation algorithm

$$V = \overline{F} + \sum_{i = 1}^{l} \alpha_i F_i$$

  • $\overline{F}$ — average face
  • $F_i$ — the $i$-th PCA vector
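
A minimal numpy sketch of this linear model; the sizes and the data are illustrative, not taken from the paper:

```python
import numpy as np

# Toy sizes: n mesh vertices, l PCA basis vectors.
rng = np.random.default_rng(0)
n, l = 5000, 50
F_mean = rng.normal(size=3 * n)        # average face, flattened xyz
F_basis = rng.normal(size=(3 * n, l))  # PCA vectors as columns
alpha = rng.normal(size=l)             # shape coefficients

V = F_mean + F_basis @ alpha           # V = F_bar + sum_i alpha_i F_i
```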

energy to be minimized for feature point matching:

$$E_{fea} = \sum_{j = 1}^{m_i} \left\| \mathbf{v}_{i_j} - \mathbf{c}_j \right\|^2 + \sum_{k = 1}^{m_c} \left\| \mathbf{M}_{proj} \mathbf{v}_{c_k} - \mathbf{s}_k \right\|^2$$

  • $\mathbf{c}_j$ — 3D position of the $j$-th feature point
  • $\mathbf{v}_{i_j}$ — corresponding vertex on the mesh $V$
  • $\mathbf{s}_k$ — 2D feature point on the color image
  • $\mathbf{v}_{c_k}$ — corresponding 3D feature vertex on the mesh $V$
  • $\mathbf{M}_{proj}$ — projection matrix of the camera

energy term for matching the depth map:

$$E_{pos} = \sum_{j = 1}^{n_d} \left\| \mathbf{v}_{d_j} - \mathbf{p}_j \right\|^2$$

  • $\mathbf{v}_{d_j}$ — a mesh vertex
  • $\mathbf{p}_j$ — closest point to $\mathbf{v}_{d_j}$ in the depth map
  • $n_d$ — the number of mesh vertices
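
A sketch of this closest-point term, assuming the depth map has already been back-projected into a 3D point cloud; the kd-tree is one common way to find nearest points, not something the notes prescribe:

```python
import numpy as np
from scipy.spatial import cKDTree

def depth_energy(verts, depth_points):
    """E_pos: squared distance from each mesh vertex to its nearest depth point."""
    tree = cKDTree(depth_points)   # index the back-projected depth map
    dists, _ = tree.query(verts)   # p_j is the nearest depth point to v_{d_j}
    return np.sum(dists ** 2)
```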

Tikhonov regularization energy term:

based on the estimated probability distribution of a shape defined by the coefficients $\alpha_i$

$$p(\alpha) \sim \exp\left[ -\frac{1}{2} \sum_i \left( \alpha_i / \sigma_i \right)^2 \right]$$

  • $\sigma_i^2$ — eigenvalues of the face covariance matrix from PCA

$$E_{coef} = \frac{1}{2} \alpha^T \Lambda \alpha$$

  • $\Lambda = \operatorname{diag}(1 / \sigma_1^2, 1 / \sigma_2^2, \dots, 1 / \sigma_l^2)$

total energy:

$$E_1 = \omega_1 E_{fea} + \omega_2 E_{pos} + \omega_3 E_{coef}$$
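
Since $V$ is linear in $\alpha$, every term above is quadratic in $\alpha$, so one inner iteration (with all correspondences held fixed) reduces to regularized linear least squares. A sketch under that assumption, with the depth term omitted for brevity (it contributes rows of the same form as the feature term):

```python
import numpy as np

rng = np.random.default_rng(0)
n, l = 500, 50                         # toy sizes
F_mean = rng.normal(size=3 * n)
F_basis = rng.normal(size=(3 * n, l))
sigma2 = np.linspace(1.0, 1e-2, l)     # PCA eigenvalues sigma_i^2

# toy feature constraints: target 3D coordinates c for a few vertices
rows = np.concatenate([3 * r + np.arange(3) for r in rng.choice(n, 20, False)])
c = rng.normal(size=rows.size)

w1, w3 = 1.0, 0.1
# stack the E_fea rows with Tikhonov rows  sqrt(w3) * alpha_i / sigma_i = 0
A = np.vstack([np.sqrt(w1) * F_basis[rows],
               np.sqrt(w3) * np.diag(1.0 / np.sqrt(sigma2))])
b = np.concatenate([np.sqrt(w1) * (c - F_mean[rows]), np.zeros(l)])
alpha = np.linalg.lstsq(A, b, rcond=None)[0]
V = F_mean + F_basis @ alpha           # fitted neutral mesh
```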

Laplacian energy as the regularization term:

$$E_{lap} = \sum_{i = 1}^{n} \left\| L \mathbf{v}_i - \frac{\delta_i}{\left| L \mathbf{v}_i \right|} L \mathbf{v}_i \right\|^2$$

  • $L$ — discrete Laplacian operator
  • $\delta_i$ — magnitude of the original Laplacian coordinate before deformation
  • $n$ — the number of mesh vertices
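
Since the second term inside the norm is just the first rescaled to length $\delta_i$, each summand collapses to $(|L\mathbf{v}_i| - \delta_i)^2$. A sketch assuming a precomputed Laplacian matrix $L$ (dense or scipy.sparse, uniform or cotangent weights):

```python
import numpy as np

def laplacian_energy(L, V, delta):
    """E_lap for a vertex array V of shape (n, 3) and rest-pose magnitudes delta (n,)."""
    LV = L @ V                           # Laplacian coordinates, (n, 3)
    norms = np.linalg.norm(LV, axis=1)
    # ||Lv_i - (delta_i / |Lv_i|) Lv_i||^2 = (|Lv_i| - delta_i)^2
    return np.sum((norms - delta) ** 2)
```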

mesh deformation energy:

$$E_2 = \omega_1' E_{fea} + \omega_2' E_{pos} + \omega_3' E_{lap}$$

Other expressions

  • deformation transfer algorithm: the deformation from face mesh $S_0$ to $S_i$ mimics the deformation from the guide model $G_0$ to $G_i$
  • mesh deformation algorithm

Individual-specific expression blendshapes

example-based facial rigging algorithm

  • $\mathbf{B} = \{B_0, B_1, \dots, B_{46}\}$ — a neutral face + 46 FACS blendshapes
  • $H = B_0 + \sum_{i = 1}^{46} \alpha_i (B_i - B_0)$ — an expression
  • $\mathbf{A} = \{A_0, A_1, \dots, A_{46}\}$ — generic blendshape model

minimize

  • difference between each expression mesh $S_j$ and the linear combination of $B_i$ with the known weights for expression $j$
  • the difference between the relative deformation from $B_0$ to $B_i$ and that from $A_0$ to $A_i$ (see the sketch below)
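
A simplified sketch of this fit, posed directly on vertex displacements $D_i = B_i - B_0$ rather than the deformation representation the actual rigging algorithm works with; the least-squares structure is the same, and $\gamma$ (a made-up knob here) trades data fidelity against staying close to the generic model:

```python
import numpy as np

def fit_blendshapes(S, W, B0, A_disp, gamma=0.1):
    """S: (m, p) expression meshes (flattened), W: (m, 46) known weights,
    B0: (p,) fitted neutral face, A_disp: (46, p) generic offsets A_i - A_0."""
    # normal equations of
    #   sum_j ||B0 + sum_i W[j, i] * D_i - S_j||^2 + gamma * sum_i ||D_i - A_disp_i||^2
    lhs = W.T @ W + gamma * np.eye(W.shape[1])
    rhs = W.T @ (S - B0) + gamma * A_disp
    D = np.linalg.solve(lhs, rhs)        # (46, p) displacements
    return B0 + D                        # blendshapes B_1 .. B_46
```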

3.3. Bilinear face model

assemble the dataset into a rank-three (3-mode) data tensor $T$ (11K vertices $\times$ 150 identities $\times$ 47 expressions)

$N$-mode SVD process:

$$T \times_2 \mathbf{U}_{id}^T \times_3 \mathbf{U}_{exp}^T = C$$

  • $T$ — data tensor
  • $C$ — core tensor
  • $\mathbf{U}_{id}, \mathbf{U}_{exp}$ — orthonormal transform matrices, which contain the left singular vectors of the 2nd-mode (identity) space and 3rd-mode (expression) space

approximate:

$$T \simeq C_r \times_2 \check{\mathbf{U}}_{id} \times_3 \check{\mathbf{U}}_{exp}$$

  • $C_r$ — reduced core tensor produced by keeping the top-left corner of the original core tensor
  • $\check{\mathbf{U}}_{id}, \check{\mathbf{U}}_{exp}$ — truncated matrices from $\mathbf{U}_{id}$ and $\mathbf{U}_{exp}$, obtained by removing the trailing columns

$$V = C_r \times_2 \mathbf{w}_{id}^T \times_3 \mathbf{w}_{exp}^T$$

  • $C_r$ — bilinear face model for FaceWarehouse
  • $\mathbf{w}_{id}, \mathbf{w}_{exp}$ — column vectors of identity and expression weights (see the sketch below)
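
A numpy sketch of the whole pipeline (unfold each mode, take the left singular vectors, truncate, project down to the reduced core); the tensor sizes and truncation ranks are toy values, and modes are 0-indexed in code:

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_dot(T, M, mode):
    """Mode-n product T x_mode M: multiply the mode-th axis of T by M."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, mode)), 0, mode)

rng = np.random.default_rng(0)
T = rng.normal(size=(300, 150, 47))  # (3 * n_verts) x identities x expressions

# left singular vectors of the identity (axis 1) and expression (axis 2) modes
U_id = np.linalg.svd(unfold(T, 1), full_matrices=False)[0]
U_exp = np.linalg.svd(unfold(T, 2), full_matrices=False)[0]

# drop trailing columns, then project to get the reduced core C_r
U_id_r, U_exp_r = U_id[:, :50], U_exp[:, :25]
C_r = mode_dot(mode_dot(T, U_id_r.T, 1), U_exp_r.T, 2)   # (300, 50, 25)

# any face is bilinear in the identity and expression weight vectors
w_id, w_exp = rng.normal(size=50), rng.normal(size=25)
V = np.einsum('vik,i,k->v', C_r, w_id, w_exp)  # V = C_r x_2 w_id^T x_3 w_exp^T
```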

4. Applications

4.1. Facial image manipulation

  1. learn a linear regression model that maps a set of user-specified facial attributes to the identity attribute in the bilinear face model
  2. compute the identity and expression weights in bilinear face model for the input image
  3. reconstruct a new 3D face mesh based on these weights

Facial feature analysis

multivariate linear regression is used to map the user-specified attributes to the identity weights in the bilinear face model

  • $\mathbf{w}_{id}$ — identity weights (a $k$-D vector)
  • $\{f_1, f_2, \dots, f_l\}$ — $l$ user-specified attributes
  • $\mathbf{M}_{fea}$ — a $k \times (l + 1)$ matrix mapping user-specified attributes to the identity weights

$$\mathbf{M}_{fea} \left[ f_1, \dots, f_l, 1 \right]^T = \mathbf{w}_{id}$$

assemble the vectors

  • $\mathbf{W}_{id}$ — ($k \times 150$) matrix assembling the identity weight vectors of all 150 persons
  • $\mathbf{F}$ — ($(l + 1) \times 150$) matrix assembling the attribute vectors
  • $\mathbf{F}^+ = \mathbf{F}^T (\mathbf{F} \mathbf{F}^T)^{-1}$ — right pseudoinverse of $\mathbf{F}$

$$\mathbf{M}_{fea} = \mathbf{W}_{id} \mathbf{F}^+$$
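
In numpy the fit is a single pseudoinverse; all names and the attribute values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
k, l, m = 50, 5, 150                     # identity dims, attributes, people
W_id = rng.normal(size=(k, m))           # identity weights of the m persons
F = np.vstack([rng.normal(size=(l, m)),  # attribute values per person
               np.ones((1, m))])         # homogeneous row for the constant term
M_fea = W_id @ np.linalg.pinv(F)         # pinv(F) = F^T (F F^T)^{-1} for wide F
w_id_new = M_fea @ np.append(rng.normal(size=l), 1.0)  # predict a new identity
```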

Fitting a 3D face mesh to the image

$$\mathbf{p}_k = s \mathbf{R} \mathbf{v}_k + \mathbf{t}$$

  • $s$ — scaling factor
  • $\mathbf{R}$ — 3D rotation matrix
  • $\mathbf{t}$ — translation vector
  • $\mathbf{v}_k$ — mesh vertex position
  • $\mathbf{p}_k$ — projected point position on the image

matching error:

$$E_k = \frac{1}{2} \left\| s \mathbf{R} \left( C_r \times_2 \mathbf{w}_{id}^T \times_3 \mathbf{w}_{exp}^T \right)^{(k)} + \mathbf{t} - \mathbf{s}^{(k)} \right\|^2$$

  • $\mathbf{s}^{(k)}$ — the $k$-th feature point position on the image

4.2. Face component transfer

As the two input images represent the same person, their identity weights $\mathbf{w}_{id}$ should be the same.

  • unified identity weights $\mathbf{w}_{id}$
  • separate expression weights ($\mathbf{w}_{exp_1}$ and $\mathbf{w}_{exp_2}$)

$$E_k^{joint} = \frac{1}{2} \sum_{j = 1}^{2} \left\| s_j \mathbf{R}_j \left( C_r \times_2 \mathbf{w}_{id}^T \times_3 \mathbf{w}_{exp_j}^T \right)^{(k)} + \mathbf{t}_j - \mathbf{s}_j^{(k)} \right\|^2$$

  1. use the method described in the last section to compute an initial estimate of the identity and expression weights
  2. fix $\mathbf{w}_{exp_1}$ and $\mathbf{w}_{exp_2}$ and compute $\mathbf{w}_{id}$ by minimizing $E_k^{joint}$
  3. solve $\mathbf{w}_{exp_1}$ and $\mathbf{w}_{exp_2}$ separately with $\mathbf{w}_{id}$ fixed
  4. repeat steps 2 and 3 until the fitting results converge (see the sketch after this list)
  • 2D expression flow: warp the target face to match the desired expression
  • 2D alignment flow: warp the reference face to an appropriate size and position for transferring
  • select a crop region from the warped reference image and blend it into the warped target image
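
A sketch of the alternation in steps 2 and 3, dropping the rigid pose $(s_j, \mathbf{R}_j, \mathbf{t}_j)$ so each subproblem becomes plain linear least squares (in the full energy the pose would be re-estimated as well):

```python
import numpy as np

def joint_fit(C_r, targets, w_id, w_exps, iters=10):
    """C_r: (v, ki, ke) reduced core; targets: two stacked feature vectors (v,);
    w_id / w_exps: initial identity and per-image expression weights."""
    for _ in range(iters):
        # step 2: expressions fixed, identity shared, so stack both images
        A = np.vstack([np.einsum('vik,k->vi', C_r, w) for w in w_exps])
        b = np.concatenate(targets)
        w_id = np.linalg.lstsq(A, b, rcond=None)[0]
        # step 3: identity fixed, each image's expression solved separately
        A_id = np.einsum('vik,i->vk', C_r, w_id)
        w_exps = [np.linalg.lstsq(A_id, t, rcond=None)[0] for t in targets]
    return w_id, w_exps
```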

4.3. Real-time performance-based facial image animation

construct the expression blendshapes for the person with identity $\mathbf{w}_{id}$

$$B_i = C_r \times_2 \mathbf{w}_{id} \times_3 \left( \check{\mathbf{U}}_{exp} \mathbf{d}_i \right)$$

  • $\check{\mathbf{U}}_{exp}$ — truncated transform matrix for the expression mode
  • $\mathbf{d}_i$ — the expression weight vector with value 1 for the $i$-th element and 0 for the other elements

generate new expressions

$$V = B_0 + \sum_{i = 1}^{46} \beta_i (B_i - B_0)$$
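
A sketch of the two equations above, assuming the product with the one-hot $\mathbf{d}_i$ simply picks out the $i$-th row of the truncated expression transform (toy sizes throughout):

```python
import numpy as np

rng = np.random.default_rng(0)
C_r = rng.normal(size=(300, 50, 25))   # reduced core
U_exp_r = rng.normal(size=(47, 25))    # truncated expression transform
w_id = rng.normal(size=50)             # identity weights of the tracked user

# B_i = C_r x_2 w_id x_3 (U_exp d_i): the one-hot d_i selects row i of U_exp_r
B = np.stack([np.einsum('vik,i,k->v', C_r, w_id, U_exp_r[i])
              for i in range(47)])     # B[0] = neutral, B[1:] = 46 expressions

beta = rng.uniform(0.0, 1.0, size=46)  # tracked blendshape coefficients
V = B[0] + (B[1:] - B[0]).T @ beta     # V = B_0 + sum_i beta_i (B_i - B_0)
```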

  • real-time performance-based facial animation system to capture the dynamic expressions of an arbitrary user
  • track the rigid transformation of the user’s head and the facial expressions, expressed as blendshape coefficients $\beta_i$
  • hair, teeth

4.4. Facial animation retargeting from video to image

  • estimate face identity and expression of the image using the algorithm described in Section IV-A
  • fit a unified face identity for all frames using a simple extension of the joint fitting algorithm described in Section IV-B
  • construct expression blendshapes using the method described in Section IV-C