Face Transfer with Multilinear Models
TL;DR
- Multilinear Models
- Formula — See Sec. 4
- Building
- Face Transfer
- Input — Video Data
- Face Tracking ( See Sec. 5.1 )
- Initialization
Abstract
Face Transfer is a method for mapping video recorded performances of one individual to facial animations of another. It extracts visemes (speech-related mouth articulations), expressions, and three-dimensional (3D) pose from monocular video or film footage. These parameters are then used to generate and drive a detailed 3D textured face mesh for a target identity, which can be seamlessly rendered back into target footage. The underlying face model automatically adjusts for how the target performs facial expressions and visemes. The performance data can be easily edited to change the visemes, expressions, pose, or even the identity of the target—the attributes are separably controllable. This supports a wide variety of video rewrite and puppetry applications.
Face Transfer is based on a multilinear model of 3D face meshes that separably parameterizes the space of geometric variations due to different attributes (e.g., identity, expression, and viseme). Separability means that each of these attributes can be independently varied. A multilinear model can be estimated from a Cartesian product of examples (identities × expressions × visemes) with techniques from statistical analysis, but only after careful preprocessing of the geometric data set to secure one-to-one correspondence, to minimize cross-coupling artifacts, and to fill in any missing examples. Face Transfer offers new solutions to these problems and links the estimated model with a face-tracking algorithm to extract pose, expression, and viseme parameters.
1. Introduction
- multilinear model decouples the three attributes
- separability & consistency
3. Multilinear Algebra
Tensors
Mode Spaces
Mode-$n$ Product
- $\mathcal{T}$ — tensor
- $\mathbf{M}$ — matrix
$\mathcal{T} \times_n \mathbf{M}$ indicates a linear transformation of vectors in $\mathcal{T}$'s mode-$n$ space by the matrix $\mathbf{M}$. $\mathcal{T} \times_2 \mathbf{M}$ would replace each mode-2 vector $\mathbf{v}$ with a transformed vector $\mathbf{M}\mathbf{v}$.
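As an unofficial sketch, the mode-$n$ product can be implemented in NumPy by flattening the tensor along mode $n$, multiplying, and folding back; the function name and example shapes below are illustrative, not from the paper:

```python
import numpy as np

def mode_n_product(T, M, n):
    """Mode-n product T x_n M: replace each mode-n vector v of T with M @ v."""
    # Move mode n to the front and flatten the remaining modes,
    # so columns of Tn are the mode-n vectors of T.
    Tn = np.moveaxis(T, n, 0).reshape(T.shape[n], -1)
    out = M @ Tn  # transform every mode-n vector at once
    new_shape = (M.shape[0],) + tuple(np.delete(T.shape, n))
    return np.moveaxis(out.reshape(new_shape), 0, n)

# Example: transform the mode-1 (second) axis of a 2x3x4 tensor
T = np.arange(24.0).reshape(2, 3, 4)
M = np.arange(15.0).reshape(5, 3)
P = mode_n_product(T, M, 1)
print(P.shape)  # (2, 5, 4)
```

The result agrees with contracting the shared index directly, e.g. `np.einsum('ij,ajb->aib', M, T)`.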
Tensor Decomposition
- $\mathcal{T} = \mathcal{C} \times_1 \mathbf{U}_1 \times_2 \mathbf{U}_2 \cdots \times_N \mathbf{U}_N$
- $\mathcal{T}$ — data tensor
- $\mathcal{C}$ — core tensor
- $\mathbf{U}_n$ — orthonormal matrix whose columns contain the left singular vectors of the $n$-th mode space, and can be computed via SVD of the mode-$n$ flattening
- $\hat{\mathbf{U}}_n$ — truncated version of $\mathbf{U}_n$ with the last few columns removed
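This decomposition (an N-mode SVD, also known as HOSVD) can be sketched in NumPy as follows; function names and rank choices are illustrative:

```python
import numpy as np

def mode_n_product(T, M, n):
    """Replace each mode-n vector v of T with M @ v."""
    Tn = np.moveaxis(T, n, 0).reshape(T.shape[n], -1)
    out = (M @ Tn).reshape((M.shape[0],) + tuple(np.delete(T.shape, n)))
    return np.moveaxis(out, 0, n)

def hosvd(T, ranks):
    """N-mode SVD: U_n holds left singular vectors of the mode-n
    flattening (truncated to ranks[n] columns); the core tensor is
    C = T x_1 U1^T x_2 U2^T ... so that T ~= C x_1 U1 x_2 U2 ..."""
    Us = [np.linalg.svd(np.moveaxis(T, n, 0).reshape(T.shape[n], -1),
                        full_matrices=False)[0][:, :r]
          for n, r in enumerate(ranks)]
    C = T
    for n, U in enumerate(Us):
        C = mode_n_product(C, U.T, n)  # project onto each mode subspace
    return C, Us

# With full (untruncated) ranks, the reconstruction is exact:
rng = np.random.default_rng(0)
T = rng.standard_normal((4, 3, 5))
C, Us = hosvd(T, (4, 3, 5))
R = C
for n, U in enumerate(Us):
    R = mode_n_product(R, U, n)
print(np.allclose(R, T))  # True
```

Truncating the ranks (the $\hat{\mathbf{U}}_n$ above) yields a compact approximate model instead of an exact reconstruction.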
4. Multilinear Face Model
4.1. Face Data
Bilinear model
vertices × expressions × identities
Trilinear model
vertices × visemes × expressions × identities
4.2. Correspondence
manually specified 42 reference points
- align the template and the scan
- deform the template mesh into the scan
- at first, weighting the marked correspondences heavily
- afterward emphasizing vertex proximity
4.3. Face Model
- $\mathbf{f} = \mathcal{M} \times_2 \mathbf{w}_2^{\mathsf{T}} \times_3 \mathbf{w}_3^{\mathsf{T}} \cdots$
- $\mathcal{M}$ — multilinear model of face geometry (a core tensor whose vertex mode is kept intact)
- $\mathbf{w}_n$ — a column vector of parameters ( weights ) for the attribute corresponding to mode $n$
- $\mathbf{f}$ — a column vector of vertices describing the resulting face
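A minimal sketch of evaluating such a model in NumPy, for a hypothetical bilinear (expression × identity) case; the dimensions are made up for illustration and are not the paper's data sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical sizes: 9 vertex coordinates, 4 expression weights,
# 5 identity weights (stand-ins, not the paper's real dimensions).
M = rng.standard_normal((9, 4, 5))  # multilinear model (core tensor)
w_exp = rng.standard_normal(4)      # mode-2 attribute weights
w_id = rng.standard_normal(5)       # mode-3 attribute weights

# f = M x_2 w_exp^T x_3 w_id^T: contract each attribute mode with its
# weight vector, leaving one column vector of vertex coordinates.
f = np.einsum('ves,e,s->v', M, w_exp, w_id)
print(f.shape)  # (9,)
```

Varying one weight vector while holding the other fixed changes only that attribute, which is exactly the separability the model is built for.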
4.4. Missing Data
Description
- For each mode we assemble an incomplete matrix whose columns are the corresponding mode vectors. We then seek a subspace decomposition that best reconstructs the known values of that matrix ( PPCA ).
- The per-mode linear constraints are coupled through the missing elements, because those elements are shared across all groups of mode vectors and must be filled in with consistent values. To that end, we collect the linear equations that determine a particular missing value in all the modes, and solve them together.
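The paper imputes missing values with PPCA; as a rough stand-in for one mode's subproblem, the sketch below fills the unknown entries of an incomplete matrix by alternating least squares on a low-rank factorization (all names, sizes, and the solver choice are illustrative assumptions):

```python
import numpy as np

def complete_matrix(X, mask, rank, iters=500):
    """Fill entries of X where mask is False using a rank-r model.
    Simplified alternating-least-squares stand-in for PPCA imputation."""
    rng = np.random.default_rng(0)
    m, n = X.shape
    A = rng.standard_normal((m, rank))
    B = rng.standard_normal((n, rank))
    for _ in range(iters):
        for j in range(n):          # refit each column factor from known rows
            rows = mask[:, j]
            B[j] = np.linalg.lstsq(A[rows], X[rows, j], rcond=None)[0]
        for i in range(m):          # refit each row factor from known columns
            cols = mask[i]
            A[i] = np.linalg.lstsq(B[cols], X[i, cols], rcond=None)[0]
    Xhat = A @ B.T
    return np.where(mask, X, Xhat)  # keep known values, fill in the rest
```

On a nearly complete low-rank matrix this recovers the hidden entries; the paper's coupled solve additionally forces the same missing value to agree across all mode flattenings.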
Evaluation
Probabilistic Interpretation
- $\mathcal{T} = \mathcal{M} \times_1 \mathbf{U}_1 \times_2 \mathbf{U}_2 \cdots \times_N \mathbf{U}_N + \varepsilon$
- $\mathcal{T}$, $\mathcal{M}$ — data and model tensors
- $\mathbf{U}_i$ — the $i$-th modal subspace
- $\varepsilon$ — a Gaussian noise source
Each Gaussian is found by fixing $i$ and turning the tensor equation into matrix form: $\mathbf{T}_{(i)} = \mathbf{U}_i \mathbf{Z}_i + \varepsilon$. Here, the columns of $\mathbf{T}_{(i)}$ are the mode-$i$ vectors of $\mathcal{T}$, and the columns of $\mathbf{Z}_i$ are the mode-$i$ vectors of $\mathcal{M} \times_{j \neq i} \mathbf{U}_j$. The resulting likelihood becomes:
$$P(\mathcal{T} \mid \mathbf{U}_i, \mathbf{Z}_i) \propto \exp\!\left(-\tfrac{1}{2\sigma^2}\,\lVert \mathbf{T}_{(i)} - \mathbf{U}_i \mathbf{Z}_i \rVert_F^2\right)$$
Taking logarithms and discarding constant factors, we seek to minimize the sum-squared error
$$E = \sum_i \lVert \mathbf{T}_{(i)} - \mathbf{U}_i \mathbf{Z}_i \rVert_F^2$$
Each term of the summation presents a matrix factorization problem with missing values, where $\mathbf{U}_i$ and $\mathbf{Z}_i$ are treated as unknown factors of the incomplete matrix $\mathbf{T}_{(i)}$, and are solved for using PPCA.
5. Face Transfer
5.1. Face Tracking
optical flow + weak-perspective camera model
Using the symmetric Kanade-Lucas-Tomasi formulation, we express the frame-to-frame motion of a tracked point with a linear system: $\mathbf{Z}(\mathbf{p} - \mathbf{p}_0) = \mathbf{e}$
- $\mathbf{p} - \mathbf{p}_0$ — describes the image-space motion of the point
- $\mathbf{p}$ — the point's true location
- $\mathbf{p}_0$ — its current best guess ( the location from the previous frame if there is no better guess )
- matrix $\mathbf{Z}$, vector $\mathbf{e}$ — contain spatial and temporal intensity-gradient information from the surrounding region
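A single update step for one tracked point could look like the sketch below; the numeric values of $\mathbf{Z}$, $\mathbf{e}$, and $\mathbf{p}_0$ are invented for illustration (in practice they come from image gradients):

```python
import numpy as np

# Assumed gradient data for one tracked point (illustrative values only):
Z = np.array([[4.0, 1.0],
              [1.0, 3.0]])      # spatial intensity-gradient matrix
e = np.array([0.5, -0.2])       # temporal gradient vector
p0 = np.array([120.0, 85.0])    # current best guess of the location (pixels)

d = np.linalg.solve(Z, e)       # image-space motion p - p0
p = p0 + d                      # refined point location
```

Iterating this step (re-evaluating `Z` and `e` at the refined location) is the usual way such trackers converge on the true position.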
Using a weak-perspective imaging model, the point position can be expanded in terms of rigid head-motion parameters and non-rigid facial shape parameters, which are constrained by the multilinear model: $\mathbf{p}_k = s \mathbf{R} \mathbf{f}_k + \mathbf{t}$
- $s$ — scale factor
- $\mathbf{R}$ — 3D rotation matrix ( only its top two rows contribute to the image-plane position )
- $\mathbf{t}$ — image-space translation
- $\mathbf{f}_k$ — the $k$-th 3D vertex being tracked, produced by the multilinear model
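The weak-perspective projection itself is tiny; a sketch with invented values (the function name, rotation, and numbers are illustrative):

```python
import numpy as np

def weak_perspective(f, s, R, t):
    """Project 3D point f to the image plane: p = s * R[:2] @ f + t.
    Under weak perspective only the top two rows of R matter, so depth
    scales uniformly rather than per-point."""
    return s * (R[:2] @ f) + t

# Example with an identity rotation (assumed toy values):
R = np.eye(3)
p = weak_perspective(np.array([1.0, 2.0, 5.0]), 0.5, R, np.array([10.0, 20.0]))
```

Here `p` equals `0.5 * [1, 2] + [10, 20]`, i.e. `[10.5, 21.0]`; note the vertex's depth (5.0) drops out entirely, which is what makes weak perspective a linear model.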
Solving for the pose and all the multilinear weights from a pair of frames using the above equations is not a well-constrained problem. To simplify the computation, we use a coordinate-descent method: we let only one face attribute vary at a time by fixing all the others at their current guesses. This transforms the multilinear problem into a linear one.
- $m$ — the mode corresponding to the non-fixed attribute
- $\mathbf{w}_m$ — a vector of weights for that attribute
- $\mathbf{M}_{k,m}$ — the corresponding linear basis for tracked vertex $k$, obtained from $\mathcal{M} \times_{j \neq m} \mathbf{w}_j^{\mathsf{T}}$, so that $\mathbf{f}_k = \mathbf{M}_{k,m} \mathbf{w}_m$
If the currently tracked attribute varies from frame to frame ( e.g. expression ), we solve the set of linear systems and proceed to the next pair of neighboring frames.
If the attribute is constant across all frames ( e.g. identity ), we accumulate these linear systems from every pair of frames and solve them together as one combined system.
5.2. Initialization
- specify a small number of feature points which are then used to position the face geometry
- correspondences: user-provided or automatically detected
8. Conclusion
- estimate a highly detailed face model from an incomplete set of face scans
- multilinear model
- separability: different attributes, such as identity and expression, can be manipulated independently