MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
MVSplat is a new model for 3D Gaussian splatting from sparse multi-view images, achieving state-of-the-art performance with faster inference, fewer parameters, and strong cross-dataset generalization capabilities.
MVSplat is a novel model for efficient 3D Gaussian splatting from sparse multi-view images, introduced by a team of researchers from several institutions. The model constructs a cost volume representation through plane sweeping, which helps localize Gaussian centers and estimate depth from cross-view feature similarities. MVSplat uses a multi-view Transformer to extract image features and a 2D U-Net to refine the cost volumes and predict depth maps.

The model demonstrates significant advances over existing methods, achieving state-of-the-art performance on benchmarks such as RealEstate10K and ACID with a feed-forward inference speed of 22 frames per second. Notably, MVSplat uses 10 times fewer parameters than the previous leading model, pixelSplat, while running more than twice as fast and improving quality in both appearance and geometry. MVSplat also excels at cross-dataset generalization, adapting effectively to novel scenes thanks to its robust cost volume representation. The research highlights the importance of the cost volume for learning feed-forward Gaussians and showcases the model's capabilities through qualitative comparisons with other state-of-the-art techniques.
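To make the cost-volume idea concrete, here is a minimal plane-sweep sketch in PyTorch: source-view features are warped onto a set of depth planes in the reference view, and each depth hypothesis is scored by cross-view feature similarity. The function name, tensor shapes, and the plain dot-product matching cost are illustrative assumptions, not MVSplat's actual implementation.

```python
# Minimal plane-sweep cost volume sketch in PyTorch (illustrative, not MVSplat's code).
import torch
import torch.nn.functional as F

def plane_sweep_cost_volume(feat_ref, feat_src, K_ref, K_src, T_src_ref, depths):
    """Score each depth hypothesis by cross-view feature similarity.

    feat_ref, feat_src: (B, C, H, W) per-view feature maps (e.g. from a Transformer backbone)
    K_ref, K_src:       (B, 3, 3) camera intrinsics
    T_src_ref:          (B, 4, 4) reference-camera to source-camera transform
    depths:             (D,) depth hypotheses swept over
    returns:            (B, D, H, W) cost volume
    """
    B, C, H, W = feat_ref.shape

    # Homogeneous pixel grid of the reference view: (3, H*W).
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float().reshape(3, -1)
    pix = pix.to(feat_ref)

    costs = []
    for d in depths:
        # Back-project reference pixels to 3D at depth d, then project into the source view.
        cam_ref = torch.linalg.inv(K_ref) @ pix * float(d)                 # (B, 3, H*W)
        cam_ref = torch.cat([cam_ref, torch.ones_like(cam_ref[:, :1])], dim=1)
        cam_src = (T_src_ref @ cam_ref)[:, :3]                             # (B, 3, H*W)
        uv = K_src @ cam_src
        uv = uv[:, :2] / uv[:, 2:].clamp(min=1e-6)                         # (B, 2, H*W)

        # Warp source features to the reference view for this depth plane.
        u = uv[:, 0] / (W - 1) * 2 - 1                                     # normalize to [-1, 1]
        v = uv[:, 1] / (H - 1) * 2 - 1
        grid = torch.stack([u, v], dim=-1).reshape(B, H, W, 2)
        warped = F.grid_sample(feat_src, grid, align_corners=True)

        # Matching cost: channel-wise dot product between reference and warped features.
        costs.append((feat_ref * warped).sum(dim=1) / C)                   # (B, H, W)

    return torch.stack(costs, dim=1)                                       # (B, D, H, W)
```

In MVSplat the resulting volume is refined by the 2D U-Net before per-view depth maps are predicted.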
- MVSplat efficiently predicts 3D Gaussians from sparse multi-view images.
- It achieves state-of-the-art performance with faster inference and fewer parameters than pixelSplat.
- The model excels in cross-dataset generalization, adapting well to various scene types.
- Cost volume representation is crucial for accurate depth estimation and Gaussian localization (a minimal unprojection sketch follows this list).
- MVSplat demonstrates significant improvements in geometry quality compared to existing models.
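As a rough illustration of the localization step, the sketch below unprojects a predicted per-pixel depth map into one Gaussian center per pixel. The function name and shapes are assumptions for illustration, not MVSplat's actual API.

```python
# Hypothetical sketch: per-pixel depths -> pixel-aligned Gaussian centers (not MVSplat's code).
import torch

def depth_to_gaussian_centers(depth, K, cam_to_world):
    """depth:        (H, W) per-pixel depth predicted from the refined cost volume
    K:            (3, 3) camera intrinsics
    cam_to_world: (4, 4) camera-to-world extrinsics
    returns:      (H*W, 3) one Gaussian center per pixel, in world coordinates
    """
    H, W = depth.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float().reshape(3, -1)

    # Back-project: X_cam = depth * K^{-1} [u, v, 1]^T
    cam = torch.linalg.inv(K) @ pix * depth.reshape(1, -1)

    # Transform to world coordinates; these points serve as the Gaussian means.
    cam_h = torch.cat([cam, torch.ones_like(cam[:1])], dim=0)
    return (cam_to_world @ cam_h)[:3].T
```

The remaining per-Gaussian parameters (opacity, covariance, color) are predicted per pixel alongside the depth.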
Related
Mip-Splatting: Alias-Free 3D Gaussian Splatting
The paper introduces Mip-Splatting, enhancing 3D Gaussian Splatting by addressing artifacts with a 3D smoothing filter and a 2D Mip filter, achieving alias-free renderings and improved image fidelity in 3D rendering applications.
WildGaussians: 3D Gaussian Splatting in the Wild
A novel method, WildGaussians, enhances 3D scene reconstruction for in-the-wild data by combining DINO features and appearance modeling with 3D Gaussian Splatting. It outperforms NeRFs and 3DGS in handling dynamic scenes.
New Gaussian Splatting viewer that allows code modification during runtime
The GitHub project "splatviz" offers an interactive viewer for 3D Gaussian Splatting scenes. It uses the Python GUI library imgui for real-time editing and visualization, and supports scene saving and video creation. Users can clone the repository to explore it.
InstantSplat: Sparse-View SfM-Free Gaussian Splatting in Seconds
InstantSplat is a new framework for novel view synthesis from sparse images, reducing training time significantly and improving 3D scene reconstruction efficiency without relying on traditional Structure-from-Motion methods.
Gaussian Splatting SLAM [CVPR 2024]
The "Gaussian Splatting SLAM" paper presents a real-time 3D reconstruction method using 3D Gaussians, achieving high-quality results for small and transparent objects with support from Dyson Technology Ltd.
It does feel like we're getting closer and closer to being able to synthesize novel views from a small set of images at a framerate and quality high enough for use in AR, which is an interesting prospect. I'd love to be able to 'walk around' in my photo library.
As far as I can tell, I will need to offer a bunch of photographs with camera pose data added — okay, fair enough, the splat architecture exists to generate splats.
Now, what’s the best way to get camera pose data from arbitrary outdoor photos? … Cue a long wrangle through multiple papers. Maybe, as of today… FAR? (https://crockwell.github.io/far/). That claims up to 80% pose accuracy depending on source data.
I have no idea how MVSplat will deal with 80% accurate camera pose data… And I also don’t understand if I should use a pre-trained model from them or train my own or fine tune one of their models on my photos… This is sounding like a long project.
I don’t say this to complain, only to note where the edges are right now, and think about the commercialization gap. There are iPhone apps that will get (shitty) splats together for you right now, and there are higher end commercial projects like Skydio that will work with a drone to fill in a three dimensional representation of an object (or maybe some land, not sure about the outdoor support), but those are like multiple thousand-dollar per month subscriptions + hardware as far as I can tell.
Anyway, interesting. I expect that over the next few years we'll have push-button stacks based on 'good enough' open models, and those will iterate and go through cycles of being upsold / improved / etc. We are still a ways away from a trawl through an iPhone/gphoto library and a "hey, I made some environments for you!" type of feature. But not infinitely far away.
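On the camera-pose question above: besides learned estimators like FAR, the long-standing baseline for recovering poses from an unordered set of photos is COLMAP's incremental structure-from-motion pipeline. A minimal sketch driving the `colmap` command-line tool from Python, assuming COLMAP is installed and on the PATH (the paths and the choice of exhaustive matching are illustrative):

```python
# Minimal COLMAP SfM sketch: recover camera intrinsics + poses from a folder of photos.
# Assumes the `colmap` binary is installed and on PATH; paths are illustrative.
import subprocess
from pathlib import Path

def run_colmap_sfm(image_dir: str, workspace: str) -> None:
    db = f"{workspace}/database.db"
    sparse = f"{workspace}/sparse"
    Path(sparse).mkdir(parents=True, exist_ok=True)

    # 1. Detect and describe keypoints in every image.
    subprocess.run(["colmap", "feature_extractor",
                    "--database_path", db, "--image_path", image_dir], check=True)

    # 2. Match features across all image pairs (fine for small photo sets).
    subprocess.run(["colmap", "exhaustive_matcher", "--database_path", db], check=True)

    # 3. Incremental SfM: estimates intrinsics, camera poses, and a sparse point cloud.
    subprocess.run(["colmap", "mapper",
                    "--database_path", db, "--image_path", image_dir,
                    "--output_path", sparse], check=True)
```

The recovered poses still need to be converted to whatever camera convention the splatting model expects.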
None of the Gaussian splat repos I have looked at mention how to use the pre-trained models to "simply" take MY images as input and output a GS. They all talk about evaluation, but the command-line interface requires the eval datasets as input.
Is training/fine-tuning on my data the only way to get the output?