The Matrix: Infinite-Horizon World Generation with Real-Time Interaction
The Matrix project aims to create an immersive digital universe with real-time interactions, featuring AAA-level graphics, frame-level precision, and open-sourced data to promote further research in world simulation technology.
The Matrix project represents a significant advance in real-time world simulation, aiming to create an immersive digital universe reminiscent of the film "The Matrix." Developed by researchers from Alibaba Group, the University of Hong Kong, and the University of Waterloo, the system achieves frame-level precision in user interactions and delivers visuals that are nearly indistinguishable from reality. It claims infinite generative capacity, allowing endless exploration of diverse environments including deserts, cities, and forests. The Matrix distinguishes itself from other generative models through AAA-level graphics, high resolution, and robust domain generalization, letting users navigate dynamic landscapes seamlessly. The project uses a dedicated GameData Platform to collect high-quality, precisely aligned action-frame pairs from various games, which will be open-sourced to foster further research and innovation. The Matrix operates at 16 frames per second and demonstrates strong adaptability from virtual to real-world settings. This work showcases the potential of AI in crafting interactive worlds and lays a foundation for future developments in immersive technology.
- The Matrix project aims to create a fully immersive digital universe with real-time interaction.
- It features frame-level precision and AAA-level visuals, allowing for seamless exploration of diverse environments.
- The project utilizes a unique data collection platform to ensure high-quality datasets for training.
- It operates at 16 FPS, showcasing adaptability from virtual to real-world scenarios (see the interaction-loop sketch after this list).
- The open-sourcing of data aims to encourage further research and innovation in world simulation technology.
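To make the interaction model concrete, here is a minimal sketch of a frame-level, action-conditioned generation loop of the kind the list above describes. This is not The Matrix's actual code: `WorldModel`, `Action`, and `run` are illustrative stand-ins, and the generation step is stubbed so the loop runs. The only details taken from the article are the 16 FPS rate and the idea that exactly one user action is consumed per generated frame.

```python
import time
from dataclasses import dataclass
from typing import Callable, List

import numpy as np

FPS = 16                  # generation rate reported by the project
FRAME_BUDGET = 1.0 / FPS  # ~62.5 ms per frame

@dataclass
class Action:
    """A single user input, captured at frame granularity."""
    move: str  # e.g. "forward", "left", "right", "stop"

class WorldModel:
    """Stand-in for an action-conditioned video model. A real system would
    run an autoregressive generation step here; this stub returns a blank
    720p frame so the loop is runnable."""
    def step(self, history: List[np.ndarray], action: Action) -> np.ndarray:
        return np.zeros((720, 1280, 3), dtype=np.uint8)

def run(model: WorldModel,
        get_action: Callable[[], Action],
        render: Callable[[np.ndarray], None],
        seconds: float = 2.0) -> None:
    """Frame-level interaction: exactly one action is consumed per frame."""
    history: List[np.ndarray] = []
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        start = time.monotonic()
        frame = model.step(history, get_action())
        history.append(frame)
        history = history[-32:]  # bounded context enables unbounded rollouts
        render(frame)
        # sleep off whatever remains of the per-frame budget
        time.sleep(max(0.0, FRAME_BUDGET - (time.monotonic() - start)))

run(WorldModel(), lambda: Action("forward"), lambda frame: None)
```

The bounded-history trick in the loop is one plausible way an "infinite-horizon" rollout could stay within fixed memory; the article does not specify the mechanism.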
Related
Diffusion Models Are Real-Time Game Engines
GameNGen, developed by Google and Tel Aviv University, simulates DOOM in real-time at over 20 frames per second using a two-phase training process, highlighting the potential of neural models in gaming.
New AI model can hallucinate a game of 1993's Doom in real time
Researchers from Google and Tel Aviv University developed GameNGen, an AI model that simulates Doom in real time, generating over 20 frames per second, but faces challenges with graphical glitches and visual consistency.
Oasis: A Universe in a Transformer
Oasis is an innovative AI model for real-time, open-world gameplay, generating interactions based on user inputs at 20 frames per second, with future enhancements planned for clarity and control.
Oasis: A Universe in a Transformer
Decart will launch Oasis, a real-time AI model for interactive gaming, on October 31, 2024. It generates gameplay based on user inputs, simulating physics and graphics with advanced techniques.
Oasis: A Universe in a Transformer
Oasis is an innovative open-world AI model by Decart and Etched, generating real-time gameplay at 20 frames per second, with plans for scaling and performance optimization, including a demo release.
Discussion
- Concerns about the definition of "world" and the need for spatial consistency in simulations.
- Skepticism regarding the project's viability, with some labeling it as potential vaporware.
- Interest in the technological advancements that could make immersive experiences more feasible.
- Suggestions for using traditional game engines for stability and physics instead of generating everything from scratch.
- Excitement about the future possibilities of immersive technology and its applications in various fields.
Wouldn't a workable approach be to create a really low-resolution 3D world, in the traditional "3D game world" sense, to get the spatial consistency, and then feed this crude map with attributes into frame generation to create the resulting world (sketched below)? It wouldn't be infinite, but then no one really needs an infinite world either. A spherical world solves the border issue pretty handily. As I understood it, there was some element of that in the new FS2024 (discussed yesterday on HN).
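For illustration, here is a minimal sketch of the hybrid this commenter describes: a persistent low-resolution 3D world provides the spatial consistency, and a generative model (stubbed here) turns each crude rasterized view into a final frame. The names `ProxyWorld` and `neural_upscale` are hypothetical, and the conditioning scheme (a ControlNet-style guide image) is one plausible reading of the suggestion, not anything from the project.

```python
import numpy as np

class ProxyWorld:
    """Crude persistent 3D world: a low-res heightmap with tile attributes.

    This is the 'traditional game world' the commenter suggests; it is the
    single source of truth for geometry, so spatial consistency is free.
    """
    def __init__(self, size: int = 256, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.height = rng.random((size, size)).astype(np.float32)
        self.attrs = rng.integers(0, 4, (size, size))  # e.g. sand/grass/rock/water

    def render_crude(self, x: int, y: int, fov: int = 64) -> np.ndarray:
        """Rasterize the tiles around the camera into a low-res guide image."""
        h = self.height[y:y + fov, x:x + fov]
        a = self.attrs[y:y + fov, x:x + fov]
        return np.stack([h, a / 3.0, np.zeros_like(h)], axis=-1)

def neural_upscale(guide: np.ndarray) -> np.ndarray:
    """Stand-in for the generative model that turns the crude guide frame
    into a photoreal one (e.g. a guide-image-conditioned generator)."""
    return (np.kron(guide, np.ones((16, 16, 1))) * 255).astype(np.uint8)

world = ProxyWorld()
for x in range(0, 128, 8):  # camera pans; geometry stays coherent
    frame = neural_upscale(world.render_crude(x, 64))
```

The key property is that geometry lives in one authoritative data structure, so revisiting a location reproduces the same scene by construction rather than by the model's memory.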
I am guessing the main thing holding this back in terms of fidelity, consistency, and generalization is just compute. But the new techniques here have dramatically lowered the compute costs and increased generalization.
Maybe something like the giant Cerebras SRAM chips will deliver the next 10x in scale that smooths this out and pushes it closer to Star Trek. Or maybe some new paradigm like memristors.
But I'm looking forward to, within just a few years, being able to put on some fairly comfortable mixed-reality glasses and simply ask for whatever or whoever I want to appear in my home (for example), according to my whim.
Or train it on lots of how-to videos, such as cooking: it just materializes an example of someone showing you exactly what you need to do, right in your kitchen.
Here's another crazy idea: train on videos of interactions with productivity applications rather than games. In the future, for small businesses, we skip having the AI generate source code and just describe how the application works. The data and program state are stored in a giant context window, and the application's functionality changes the instant you make a request (a rough sketch of that loop follows).
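Sketching that idea under stated assumptions: the entire "application" is a natural-language spec plus serialized state, and every user event is answered by one model call that returns the next state and UI. `model_step` is a hypothetical stand-in, stubbed so the example runs; a real version would be a generative-model call with the spec and state in its context window.

```python
import json

def model_step(spec: str, state: dict, event: str) -> dict:
    """Stand-in for the generative model the commenter imagines: given the
    natural-language app description, the current state, and a user event,
    it returns the next state plus a rendered UI. Stubbed so the loop runs."""
    if event.startswith("add "):
        state.setdefault("invoices", []).append(event[4:])
    return {"state": state, "ui": f"{spec}: {json.dumps(state)}"}

SPEC = "a tiny invoicing app: 'add <name>' records an invoice"
state: dict = {}
for event in ["add acme", "add globex"]:
    out = model_step(SPEC, state, event)  # whole app = spec + state in context
    state = out["state"]
    print(out["ui"])
```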
Though we were using map tiles at the time, we were developing a model that took photos and a GPS track and added information to better match environmental conditions (cloud cover, better lighting, etc.).
People still ask me to open-source the code, but the code was acquired, so that isn't possible. But I do regularly say that if I were to rebuild Ayvri today, I'd build it as interactive video rather than loading tiles.
Clicking around the page, nothing works.
Could be total vaporware for all we know.
Is it an ad, a statement of achievement in case someone else states it first, or what?
Seems like it would work better as a YouTube video; the page really doesn't offer much of use right now.