DEGAS: Detailed Expressions on Full-Body Gaussian Avatars
DEGAS introduces a method for creating full-body avatars with detailed facial expressions using 3D Gaussian Splatting, integrating body motion and facial expressions through a conditional variational autoencoder and an expression latent space learned from 2D portrait images.
DEGAS (Detailed Expressions on Full-Body Gaussian Avatars) is a novel modeling method that utilizes 3D Gaussian Splatting (3DGS) to create full-body avatars capable of exhibiting rich facial expressions. Despite advancements in neural rendering for lifelike avatars, the integration of detailed expressions into full-body models has been underexplored. DEGAS addresses this gap by employing a conditional variational autoencoder trained on multiview videos, which captures both body motion and facial expressions. Unlike traditional methods that rely on 3D Morphable Models (3DMMs), DEGAS uses an expression latent space derived from 2D portrait images, effectively linking 2D talking faces with 3D avatars. The resulting avatars can be reenacted to produce photorealistic images with nuanced facial expressions. The method's effectiveness is demonstrated through experiments on existing datasets and a newly introduced DREAMS Avatar Dataset, which includes multi-view captures of six subjects performing various expressions and motions. Additionally, an audio-driven extension of the method is proposed, leveraging 2D talking faces to enhance interactivity in AI agents.
- DEGAS is the first method to model full-body avatars with detailed facial expressions using 3D Gaussian Splatting.
- The approach integrates body motion and facial expressions through a conditional variational autoencoder.
- It utilizes an expression latent space based on 2D images, bridging 2D and 3D avatar technologies.
- The DREAMS Avatar Dataset features multi-view captures of subjects performing standard and freestyle motions.
- An audio-driven extension of DEGAS opens new avenues for interactive AI applications.
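As a rough, hypothetical illustration of the pipeline described above (not the authors' released code), the sketch below shows how a conditional decoder might map body pose and a 2D-derived expression latent to per-Gaussian 3DGS parameters. The class name, dimensions, and the SMPL-style pose vector are all assumptions made for illustration.

```python
# Hypothetical sketch in the spirit of DEGAS: body pose and an expression
# latent (learned from 2D portrait images) jointly condition a decoder that
# outputs per-Gaussian parameters for 3D Gaussian Splatting rendering.
# All names and dimensions are illustrative assumptions, not the paper's API.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianAvatarDecoder(nn.Module):
    def __init__(self, num_gaussians=10_000, pose_dim=72, expr_dim=128, hidden=512):
        super().__init__()
        self.num_gaussians = num_gaussians
        # Shared MLP fusing body pose and the 2D-derived expression latent.
        self.cond = nn.Sequential(
            nn.Linear(pose_dim + expr_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One linear head predicting 14 values per Gaussian:
        # 3 position offsets + 4 rotation (quaternion) + 3 log-scales
        # + 1 opacity logit + 3 RGB values.
        self.head = nn.Linear(hidden, num_gaussians * 14)

    def forward(self, pose, expr_latent):
        h = self.cond(torch.cat([pose, expr_latent], dim=-1))
        out = self.head(h).view(-1, self.num_gaussians, 14)
        offsets, rots, log_scales, opacity, rgb = out.split([3, 4, 3, 1, 3], dim=-1)
        return {
            "offsets": offsets,                      # displacements from a canonical template
            "rotations": F.normalize(rots, dim=-1),  # unit quaternions
            "scales": log_scales.exp(),              # positive anisotropic scales
            "opacities": opacity.sigmoid(),          # in (0, 1)
            "colors": rgb.sigmoid(),                 # in (0, 1)
        }

# Usage: the expression latent would come from a pretrained 2D portrait
# encoder (as the paper suggests); the pose from a body-tracking pipeline.
decoder = GaussianAvatarDecoder()
pose = torch.zeros(1, 72)      # assumed SMPL-style body pose parameters
expr = torch.randn(1, 128)     # assumed expression latent from a 2D face model
gaussians = decoder(pose, expr)
print(gaussians["offsets"].shape)  # torch.Size([1, 10000, 3])
```

In a full system these parameters would be splatted with a differentiable 3DGS rasterizer and trained end-to-end on multiview video; the variational encoder and KL term are omitted here for brevity.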
Related
WildGaussians: 3D Gaussian Splatting in the Wild
A novel method, WildGaussians, enhances 3D scene reconstruction for in-the-wild data by combining DINO features and appearance modeling with 3D Gaussian Splatting. It outperforms NeRF- and 3DGS-based baselines in handling occlusions and appearance changes.
InstantSplat: Sparse-View SfM-Free Gaussian Splatting in Seconds
InstantSplat is a new framework for novel view synthesis from sparse images, reducing training time significantly and improving 3D scene reconstruction efficiency without relying on traditional Structure-from-Motion methods.
Real-time face swap and one-click video deepfake
Deep-Live-Cam is a GitHub project for AI-generated media, emphasizing ethical use. It requires specific software for installation and features a GUI for face swapping and webcam functionality.
Gaussian Splatting Slam [CVPR 2024]
The "Gaussian Splatting SLAM" paper presents a real-time 3D reconstruction method using 3D Gaussians, achieving high-quality results for small and transparent objects with support from Dyson Technology Ltd.
MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
MVSplat is a new model for 3D Gaussian splatting from sparse multi-view images, achieving state-of-the-art performance with faster inference, fewer parameters, and strong cross-dataset generalization capabilities.
- Many users express enthusiasm for the potential of creating realistic avatars for various applications, including video meetings and content creation.
- There are concerns about the ethical implications of deepfake technology and the ease of impersonation it may enable.
- Questions arise about the technical requirements for using the technology, such as data needs and real-time processing capabilities.
- Some commenters note the impressive realism achieved, while also pointing out existing weaknesses, particularly in cloth simulation.
- Overall, the discussion reflects a blend of fascination with the technology and caution regarding its misuse.
With that kind of technology, what are the key problems still to be solved before it is massively applied to deepfakes? More specifically:
- how much data (pictures or video) of the "target" is needed to use this? Does it require specific lighting or a lot of different poses, or is it possible to just use some online videos (found on TikTok, for example) or to record the "target" in the street with a phone? How hard is it to create a "virtual doppelganger"?
- once there is a "target" model, is it possible to use this in real time? How much power would it need? A small laptop? A big machine in the cloud? Only state-sponsored infrastructure?
It looks like this technology has real potential to "impersonate" anybody very efficiently.
Barring the obvious deepfake implications though, I'd be excited to see a new era of SFM-style content made with this
Also, if you like DEGAS, another state-of-the-art project in progress is VR-GS: A Physical Dynamics-Aware Interactive Gaussian Splatting System in Virtual Reality.