Self-Occluded Avatar Recovery from a Single Video in the Wild
The Self-Occluded Avatar Recovery (SOAR) framework reconstructs human avatars from videos with partial occlusions, outperforming existing methods in accuracy, realism, and detail while enabling novel view rendering and animation.
The paper introduces Self-Occluded Avatar Recovery (SOAR), a framework for reconstructing human avatars from videos in which parts of the body are occluded. Traditional monocular human reconstruction systems struggle in this setting because they typically require full visibility of the body. SOAR addresses the challenge with two priors: a structural normal prior, which represents the human shape with a reposable surfel model, and a generative diffusion prior, which refines the initial reconstruction through score distillation (a toy sketch of a score-distillation update follows the summary points below). Results show SOAR outperforming state-of-the-art methods across benchmarks, producing more realistic and detailed reconstructions. The framework also supports novel view rendering and animating avatars in new poses, and comparisons with methods such as GART and GaussianAvatar highlight its superior texture and structure quality.
- SOAR effectively reconstructs human avatars from partially occluded video observations.
- The framework combines structural normal and generative diffusion priors for improved accuracy.
- SOAR outperforms existing methods in benchmarks and produces high-quality visual results.
- The method allows for novel view rendering and animation of reconstructed avatars.
- Comparisons with other techniques show SOAR's advantages in realism and detail.
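As background on the generative diffusion prior, the sketch below illustrates a generic score-distillation (SDS) update in PyTorch. This is an illustration of the general technique rather than SOAR's implementation: `ToyDenoiser`, the noise schedule, and the learnable image standing in for a differentiable surfel render are all hypothetical stand-ins.

```python
# Minimal score-distillation (SDS) sketch. The denoiser is a hypothetical
# stand-in for a pretrained diffusion model; in SOAR the gradient would flow
# into a differentiable surfel renderer rather than a raw image tensor.
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """Stand-in for a pretrained noise-prediction network eps_theta(x_t, t)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1),
        )

    def forward(self, x_t, t):
        # A real model would condition on the timestep t (and often on text);
        # this toy ignores it.
        return self.net(x_t)

def sds_step(image, denoiser, alphas_cumprod, optimizer):
    """One score-distillation update on `image` (here a learnable tensor;
    in practice, the differentiable render of the avatar)."""
    t = torch.randint(0, len(alphas_cumprod), (1,))
    a_bar = alphas_cumprod[t].view(1, 1, 1, 1)
    eps = torch.randn_like(image)
    x_t = a_bar.sqrt() * image + (1 - a_bar).sqrt() * eps  # forward diffusion
    with torch.no_grad():
        eps_pred = denoiser(x_t, t)
    w = 1.0 - a_bar                         # one common weighting choice
    grad = w * (eps_pred - eps)             # the SDS gradient
    optimizer.zero_grad()
    # Surrogate loss whose gradient w.r.t. `image` equals `grad` exactly.
    (grad.detach() * image).sum().backward()
    optimizer.step()

image = torch.rand(1, 3, 64, 64, requires_grad=True)  # stand-in for a render
denoiser = ToyDenoiser()
alphas_cumprod = torch.linspace(0.999, 0.01, 1000)
opt = torch.optim.Adam([image], lr=1e-2)
for _ in range(10):
    sds_step(image, denoiser, alphas_cumprod, opt)
```

The surrogate loss at the end makes the gradient with respect to the image equal the SDS gradient `w * (eps_pred - eps)` without backpropagating through the denoiser, which is the usual way this update is implemented.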
Related
Shape of Motion: 4D Reconstruction from a Single Video
Shape of Motion reconstructs 4D scenes from monocular videos by explicitly modeling 3D motion as a weighted combination of SE(3) motion bases, with data-driven priors enforcing global consistency. Experiments show strong 3D/2D motion estimation and novel view synthesis, with comparisons against other methods; a toy sketch of blending SE(3) bases follows this list.
Gaussian Splatting SLAM [CVPR 2024]
The "Gaussian Splatting SLAM" paper presents a real-time 3D reconstruction method using 3D Gaussians, achieving high-quality results for small and transparent objects with support from Dyson Technology Ltd.
DEGAS: Detailed Expressions on Full-Body Gaussian Avatars
DEGAS introduces a method for creating full-body avatars with detailed facial expressions using 3D Gaussian Splatting, integrating body motion and facial expressions through a conditional variational autoencoder trained on 2D images.
Sapiens: Foundation for Human Vision Models
The "Sapiens" models enhance human-centric vision tasks through self-supervised pretraining on 300 million images, showing strong generalization and scalability, outperforming benchmarks in several datasets.
A New System for Temporally Consistent Stable Diffusion Video Characters
Alibaba Group's MIMO system improves full-body character generation with Stable Diffusion, addressing temporal stability by decomposing video into three separate encodings (character, scene, and occlusion) and demonstrating flexibility in video synthesis.
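On Shape of Motion's representation above: its core idea is that each point's trajectory is a weighted blend of a compact set of rigid SE(3) bases. The NumPy toy below is my own illustration of that blending, not the paper's code; representing each basis as a (unit quaternion, translation) pair and blending by weighted quaternion averaging is a simplifying assumption made for brevity.

```python
# Toy illustration of blending K rigid SE(3) motion bases into one per-point
# rigid motion. Quaternion-sum blending is an approximation used for brevity.
import numpy as np

def quat_to_rot(q):
    """Unit quaternion (w, x, y, z) -> 3x3 rotation matrix."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def blend_se3(quats, trans, weights):
    """Blend K (quaternion, translation) bases with per-point weights."""
    q = (weights[:, None] * quats).sum(axis=0)
    q /= np.linalg.norm(q)                  # renormalize blended quaternion
    t = (weights[:, None] * trans).sum(axis=0)
    return quat_to_rot(q), t

# Two bases: identity, and a 90-degree rotation about z with a shift in x.
quats = np.array([[1.0, 0.0, 0.0, 0.0],
                  [np.cos(np.pi/4), 0.0, 0.0, np.sin(np.pi/4)]])
trans = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0]])
weights = np.array([0.3, 0.7])              # this point's blending weights

R, t = blend_se3(quats, trans, weights)
p = np.array([1.0, 0.0, 0.0])
print(R @ p + t)                            # blended rigid motion of point p
```

In the paper's setting, per-point weights of this kind are learned jointly with the bases, so a small number of rigid motions can describe a dense, globally consistent motion field.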