July 20th, 2024

Shape of Motion: 4D Reconstruction from a Single Video

Shape of Motion reconstructs 4D scenes from monocular videos by modeling 3D motion. It uses SE(3) motion bases and data-driven priors for global consistency. The method excels in 3D/2D motion estimation and view synthesis, as shown in experiments. Comparisons with other methods are included.

The article presents Shape of Motion, a method that reconstructs a 4D scene (3D geometry evolving over time) from a single monocular video. Monocular dynamic reconstruction is ill-posed, so the approach explicitly models 3D scene motion: it represents motion with a compact set of SE(3) motion bases and incorporates data-driven priors, such as monocular depth maps and long-range 2D tracks, to produce a globally consistent representation of the dynamic scene. Experimental results show state-of-the-art performance in long-range 3D/2D motion estimation and in novel view synthesis for dynamic scenes. The article also compares against other methods on 3D tracking, novel view synthesis, and 2D tracking, highlighting the strengths and limitations of the approach, and acknowledges the project's contributors and funding sources.
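
To make the motion representation concrete, here is a minimal sketch of the core idea: each point's motion through time is a weighted blend of a few shared rigid (SE(3)) trajectories. This is not the authors' code; the basis values are random toys, and blending rotation vectors and translations linearly is one simple scheme rather than necessarily the paper's exact formulation.

import numpy as np
from scipy.spatial.transform import Rotation as R

# B motion bases, each holding one small rigid motion per frame,
# parameterized as a rotation vector plus a translation. In the actual
# method these are learned; here they are random toy values.
num_bases, num_frames = 4, 10
rng = np.random.default_rng(0)
basis_rotvecs = 0.05 * rng.standard_normal((num_bases, num_frames, 3))
basis_trans = 0.10 * rng.standard_normal((num_bases, num_frames, 3))

def point_trajectory(x0, weights):
    # Blend the bases with this point's weights, then move the canonical
    # point x0 through every frame. Linearly blending rotation vectors is
    # an approximation that behaves well for the small rotations used here.
    weights = np.asarray(weights)
    traj = []
    for t in range(num_frames):
        rotvec = weights @ basis_rotvecs[:, t]
        trans = weights @ basis_trans[:, t]
        traj.append(R.from_rotvec(rotvec).apply(x0) + trans)
    return np.stack(traj)

# A foreground point whose motion mostly follows the first basis.
print(point_trajectory(np.array([0.0, 0.0, 1.0]), [0.7, 0.2, 0.1, 0.0]))

The compactness of the basis set is what ties points together: points with similar weights move near-rigidly as a group, which is part of what makes long-range 3D tracking from a single video tractable.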

Related

TERI, almost IRL Blade Runner movie image enhancement tool

Researchers at the University of South Florida introduce TERI, a Blade Runner-inspired image processing algorithm that reconstructs hidden objects in photos from their shadows, with potential applications in self-driving vehicles and robotics.

Wave-momentum shaping for moving objects in heterogeneous and dynamic media

A new method, wave-momentum shaping, uses sound waves to manipulate objects in dynamic environments without prior knowledge of the medium. By iteratively adjusting wavefronts based on real-time measurements, objects can be moved and rotated effectively. The approach shows promise for a range of applications.

MASt3R – Matching and Stereo 3D Reconstruction

MASt3R, a model built on the DUSt3R framework, excels in 3D reconstruction and dense feature matching for image collections, improving depth estimation and reducing matching errors for downstream spatial-understanding tasks.

Depth Anything V2

Depth Anything V2 is a monocular depth estimation model trained on synthetic and real images, offering improved details, robustness, and speed compared to previous models. It focuses on enhancing predictions using synthetic images and large-scale pseudo-labeled real images.
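
Monocular depth predictors like this are typically a few lines to run. Below is a minimal sketch using the Hugging Face transformers depth-estimation pipeline; the checkpoint name is an assumption based on the published releases, so verify it on the model hub.

from PIL import Image
from transformers import pipeline

# Checkpoint id assumed; check the hub for the exact Depth Anything V2 name.
depth_estimator = pipeline(
    "depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf"
)
result = depth_estimator(Image.open("frame.png"))
result["depth"].save("frame_depth.png")  # relative depth rendered as an image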

New framework allows robots to learn via online human demonstration videos

Researchers develop a framework for robots to learn manipulation skills from online human demonstration videos. The method includes Real2Sim, Learn@Sim, and Sim2Real components, successfully training robots in tasks like tying knots.

11 comments
By @Geee - 7 months
For 3D VR videos, this would be useful for adjusting IPD for every person, rather than using the static IPD of the camera setup. Also, allowing just a little bit of head movement would really increase the immersiveness. I don't need to travel long distances inside the video. If the video is already filmed with a static stereo setup, it would be even easier to reconstruct an accurate 4D video limited to short travel distances without glaring errors.
By @yieldcrv - 7 months
One thing I liked about Team Ico (the studio behind the Shadow of the Colossus, Ico, and The Last Guardian video games) was how the player can move the camera just a little bit during automated sequences

Getting that kind of look-around in a video scene would be really engaging. A bit different from VR or watching in The Sphere, with the engagement coming from there still being things just out of view that you have to pan the camera to see

By @InDubioProRubio - 7 months
Our children will be so weirded out by Blade Runner. Not by the zoom into the picture, but by the fact that the guy believes in hallucinated data.
By @PaulHoule - 7 months
Whenever I play a video game (Monster Hunter World comes to mind immediately) and see an establishing shot with a moving camera (like the ones demoed on their website), I think the game really wants to run in a VR headset where you can walk around and see different angles.

(Funny, there is a VR mod for Monster Hunter Rise, which makes me think just how fun Monster Hunter VR would be.)

By @smusamashah - 7 months
I was wondering how they were getting depth from a video where the camera is still.

> we utilize a comprehensive set of data-driven priors, including monocular depth maps

> Our method relies on off-the-shelf methods, e.g., mono-depth estimation, which can be incorrect.

By @tizio13 - 7 months
This reminds me of the description of the "Disneys" (future movies) in Cloud Atlas. The movie visualized them well, and this feels like that.
By @moritonal - 7 months
Out of curiosity, what is the difference between 4D and 6DoF (six degrees of freedom)? It sounds a lot like the 6DoF work that Lytro did back in 2012, although this is obviously coming at the problem from the other direction, generating it rather than capturing it.
By @latexr - 7 months
The results are impressive, but what makes this 4D? Where’s the extra dimension and how is it relevant to 3D human beings?
By @blt - 7 months
The first HyperNeRF cat video is quite interesting-looking and surreal!
By @DiscourseFan - 7 months
Looks flat to me!