November 1st, 2024

Oasis: A Universe in a Transformer

Oasis is an open-world AI model from Decart and Etched that generates real-time gameplay at 20 frames per second; the release includes a playable demo, with model scaling and performance optimization planned next.

Oasis is an open-world AI model developed by Decart and Etched, designed to generate real-time gameplay entirely through AI, without a traditional game engine. It responds to user inputs, allowing actions like moving, jumping, and interacting with objects in a dynamic environment. The model uses a 500M-parameter architecture featuring a spatial autoencoder and a latent diffusion backbone, both based on Transformers. Oasis achieves real-time output at 20 frames per second, significantly faster than existing text-to-video models, which can take up to 20 seconds to produce a single frame. Its efficiency is enhanced by Decart's inference engine and the upcoming Sohu ASIC, which is expected to support larger models at higher resolutions. Despite its impressive capabilities, Oasis faces challenges such as maintaining temporal stability, generalizing across domains, and providing precise control over game mechanics. Future development will focus on scaling the model and optimizing performance to address these issues. The release includes the model's code, weights, and a playable demo, marking a significant step toward more complex, interactive, AI-driven worlds.

- Oasis is the first playable, real-time, open-world AI model.

- It generates gameplay based on user inputs without a traditional game engine.

- The model operates at 20 frames per second, outperforming existing text-to-video models.

- Future improvements aim to enhance model scaling and address current limitations.

- The project includes a live demo and the release of the model's code and weights.
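The autoregressive loop implied by the summary (encode past frames into latents with the spatial autoencoder, predict the next latent with the diffusion backbone conditioned on the player's input, then decode back to pixels) can be sketched roughly as follows. All function bodies, names, and dimensions here are illustrative stand-ins, not Oasis's actual implementation:

```python
# Minimal sketch of an autoregressive latent-video loop, with toy
# stand-ins for the encoder, diffusion backbone, and decoder.
import numpy as np

LATENT_DIM = 16

def encode(frame: np.ndarray) -> np.ndarray:
    """Stand-in for the spatial autoencoder's encoder (frame -> latent)."""
    return frame.reshape(-1)[:LATENT_DIM]

def denoise(history: list, action: int) -> np.ndarray:
    """Stand-in for the latent-diffusion backbone: predicts the next
    latent from the latent history plus the player's input."""
    ctx = np.mean(history, axis=0)
    return ctx + 0.1 * action  # toy "dynamics", not a real denoiser

def decode(latent: np.ndarray) -> np.ndarray:
    """Stand-in for the decoder half of the autoencoder (latent -> frame)."""
    return np.tile(latent, 4)

def play(initial_frame: np.ndarray, actions: list) -> list:
    """One iteration per displayed frame; at 20 fps the whole
    encode/denoise/decode pass must finish in under 50 ms."""
    history = [encode(initial_frame)]
    frames = []
    for a in actions:
        nxt = denoise(history, a)
        history.append(nxt)
        frames.append(decode(nxt))
    return frames
```

The key property this sketch captures is that each frame is conditioned on previously *generated* latents, which is why small errors can compound over time.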

38 comments
By @redblacktree - 6 months
"If you were dreaming in Minecraft" is the impression that I get. It feels very much like a dream with the lack of object permanence. Also interesting is light level. If you stare at something dark for a while or go "underwater" and get to the point where the screen is black, it's difficult to get back to anything but a black screen. (I didn't manage it in my one playthrough)

Very odd sensation indeed.

By @robotresearcher - 6 months
I don't see how you design and ship a game like this. You can't design a game by setting model weights directly. I do see how you might eventually clone a game, minus the missing pieces like object permanence and other long-term state. But the inference engine is probably more expensive to run than the game engine it (somewhat) emulates.

What is this tech useful for? Genuine question from a long-time AI person.

By @thrance - 6 months
> It's a video game, but entirely generated by AI

I ctrl-F'ed the webpage and saw 0 occurrences of "Minecraft". Why? This isn't a video game; it's a poor copy of a real video game you didn't even bother to name, let alone credit.

By @blixt - 6 months
Super cool, and really nice to see the continuous rapid progress of these models! I have to wonder how long-term state (building a base and coming back later) as well as potentially guided state (e.g. game rules that are enforced in traditional code, or multiplayer, or loading saved games, etc) will work.

It's probably not by just extending the context window or making the model larger, though that will of course help, because fundamentally external state and memory/simulation are two different things (right?).

Either way it seems natural that these models will soon be used for goal-oriented imagination of a task – e.g. imagine a computer agent that needs to find a particular image on a computer, it would continuously imagine the path between what it currently sees and its desired state, and unlike this model which takes user input, it would imagine that too. In some ways, to the best of my understanding, this already happens with some robot control networks, except without pixels.

By @duendefm - 6 months
It's not a video game, it's a fast Minecraft screenshot simulator where the prompt between each frame is the state of the input plus the previous frames, with some semblance of coherence.
By @jiwidi - 6 months
So basically they trained a model on Minecraft. This is not general at all. It's not like the game comes from a prompt; it probably comes from a lot of fine-tuning on enormous datasets of Minecraft gameplay.

Would love to see some work like this but with world/games coming from a prompt.

By @whism - 6 months
Allow the user to draw into the frame buffer during play and feed that back, and you could have something very interesting.
By @brap - 6 months
Waiting line is too long so I gave up. Can anyone tell me, are the pixels themselves generated by the model, or does it just generate the environment which is rendered by “classical” means?
By @xyzal - 6 months
Maybe we should train models on Mario games to make Nintendo fight for the "Good Cause".
By @gessha - 6 months
I find this extremely disappointing. A diffusion transformer trained on Minecraft frames and accelerated on an ASIC... Okay?

From the demo (which doesn't work on Firefox) you can see that it's overfit to the training set and doesn't have consistent state transitions.

If you define it as a Markov decision process, with states being images, actions being keyboard/mouse inputs, and the transition probability being the transformer model, the model is a very poor one. Turning the mouse around shouldn't result in a completely different world; it should result in the exact same point in space from a different camera orientation. You can fake it by fudging the training data, augmenting it with walking a bit, doing a 360-degree camera rotation, and continuing the exploration, but that will just overfit to that specific seed.
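The camera-consistency property this MDP framing demands can be stated as a tiny toy transition function; this is purely illustrative and unrelated to Oasis's code:

```python
# Toy deterministic MDP: states are (position, heading) pairs, actions
# are camera turns. A consistent world must satisfy "four 90-degree
# turns are the identity", the property the commenter says Oasis violates.

def turn(state, action):
    """Transition function over state = (position, heading in degrees)."""
    pos, heading = state
    if action == "turn_right":
        heading = (heading + 90) % 360
    elif action == "turn_left":
        heading = (heading - 90) % 360
    return (pos, heading)

def consistent_full_rotation(state):
    """True iff a full 360-degree rotation returns the original view."""
    s = state
    for _ in range(4):
        s = turn(s, "turn_right")
    return s == state
```

A learned transition model that generates pixels directly has no such invariant built in; it has to be learned from (or faked in) the training data.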

The page says their ASIC's model inference supports 60+ players. Where are they shown playing together? What's the point of touting multiplayer performance when, realistically, the poor state transitions mean those 60+ players are playing single-player DeepDream Minecraft?

By @jmartin2683 - 6 months
Why? Seems like a very expensive way to vaguely clone a game.
By @piperly - 6 months
From a research perspective, this approach isn’t new; David Ha and Danijar Hafner explored similar ideas years ago. However, the technique itself and the achievement of deploying it for testing by hundreds of users is commendable. It feels more like an experimental prototype than a viable replacement for mainstream gaming.
By @shanim_ - 6 months
Could you explain how the interaction between the spatial autoencoder (ViT-based) and the latent diffusion backbone (DiT-based) enables both rapid response to real-time input and maintains temporal stability across long gameplay sequences? Specifically, how does dynamic noising integrate with these components to mitigate error compounding over time in an autoregressive setup?
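One plausible reading of "dynamic noising" (an assumption on my part, based on noise-augmentation ideas used in other autoregressive diffusion work, not a description of Oasis's method) is that the conditioning frames are partially re-noised during training so the backbone learns to tolerate its own imperfect outputs at inference time:

```python
# Hypothetical sketch of noise augmentation on context frames.
# The idea: train the denoiser on corrupted context so that, when it
# later conditions on its own (imperfect) generations, errors compound
# more slowly. Function name and parameters are illustrative.
import random

def noisy_context(frames, max_sigma=0.3):
    """Re-noise conditioning frames with a randomly drawn strength."""
    sigma = random.uniform(0.0, max_sigma)
    return [f + random.gauss(0.0, sigma) for f in frames]
```

Whether this matches the paper's actual dynamic-noising schedule would need to be checked against the released code.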
By @vannevar - 6 months
If anyone has ever read Tad Williams' Otherland series, this is basically the core idea. "The dream that is dreaming us."
By @djhworld - 6 months
I think this is really cool as a sort of art piece? It's very dreamlike and unsettling, especially with the music
By @0xTJ - 6 months
Seems like a neat idea, but too bad the demo doesn't work on Firefox.
By @amiramer - 6 months
So cool! Curious to see how it evolves. Seems like a portal into fully generated content, zero applications yet. So exciting. Will it also be promptable at some point?
By @joshdavham - 6 months
Incredible work! I think once we’re able to solidly emulate these tiny universes, we can then train agents within them to make even more intelligent AI.
By @aaladdin - 6 months
How would you verify that real world physics actually hold here? Otherwise, such breaches could be maliciously and unfairly exploited.
By @mrtnl - 6 months
Very cool tech demo! Curious to see whether we continue to generate environments at this level or move more toward generating the physics.
By @GaggiX - 6 months
Kinda hyped to see how this model (or a much bigger one) will run on Etched's transformer ASIC, Sohu, if it ever comes out.
By @th0ma5 - 6 months
This feels like a nice preview at the bottom of the kinds of unsolvable issues these things will always have to some degree.
By @TalAvner - 6 months
This is next level! I can't believe it's all AI generated in real time. Can't wait to see what's next.
By @goranim - 6 months
Love it! This virtual world looks so good, and it's also changing really fast, so it seems like a very powerful model!
By @Daroar - 6 months
I can see where they are going with it and wow! Truly the proof that we are all indeed in a simulation.
By @drdeca - 6 months
This apparently currently only supports chrome. I hope it will support non-chrome browsers in the future.
By @therein - 6 months
Queue makes it untestable. It isn't running client-side? What's with the queueing?
By @pka - 6 months
Negative comments are so weird; it's like people forgot what GPT-2 was like. I know this isn't completely new, but it's a world simulation inside a goddamn LLM. Not perfect, not coherent over longer time periods, but still insane. I swear, if tomorrow magic turned out to be real and wizards started controlling the literal fabric of the universe, people would be like "meh" before the week ends :D
By @gunalx - 6 months
Really cool tech demo. What impressed me most is the inference speed. But I don't really see any use for this unless there's a way to store world state, to avoid the issue of it forgetting what it just generated.
By @petersonh - 6 months
Very cool - has a very dreamlike quality to it
By @jhonj - 6 months
tried their not-a-game and it was SICK to play knowing it's not a game engine. really sick. When did these Decart ppl start working on this? must be f genius ppl
By @duan2112 - 6 months
Love it!!!
By @keidartom - 6 months
So cool!
By @robblbobbl - 6 months
Me gusta!
By @hesyechter - 6 months
Very very cool, I love it. Good luck!