Meta Movie Gen
Meta has launched Movie Gen, an AI model for creating and editing high-definition videos from text inputs, allowing personalized content generation and sound integration while emphasizing responsible AI development.
Meta has introduced Movie Gen, an advanced AI model designed for creating immersive media content. This innovative tool allows users to generate custom videos and sounds from simple text inputs, edit existing videos, and transform personal images into unique video content. Movie Gen is capable of producing high-definition videos in various aspect ratios, marking a significant advancement in the industry. Users can input descriptive text to create scenes, such as a girl running on the beach or a sloth floating in a pool, and the AI will generate corresponding videos. Additionally, Movie Gen offers precise video editing capabilities, enabling users to modify styles, transitions, and other elements through text commands. The platform also supports the creation of personalized videos by uploading images, ensuring that human identity and motion are preserved. Furthermore, Movie Gen can generate sound effects and soundtracks to accompany the videos, enhancing the overall experience. Meta emphasizes the importance of building AI responsibly, focusing on trust and safety in its applications. The company encourages users to explore its research paper for more insights into the benchmarks set by Movie Gen in media generation.
- Meta's Movie Gen allows video creation and editing from text inputs.
- The AI can produce high-definition videos in various aspect ratios.
- Users can upload images to create personalized videos while preserving identity.
- The platform generates sound effects and soundtracks to enhance video content.
- Meta prioritizes responsible AI development focused on trust and safety.
Related
Generating audio for video
Google DeepMind introduces V2A technology for video soundtracks, enhancing silent videos with synchronized audio. The system allows users to guide sound creation, aligning audio closely with visuals for realistic outputs. Ongoing research addresses challenges like maintaining audio quality and improving lip synchronization. DeepMind prioritizes responsible AI development, incorporating diverse perspectives and planning safety assessments before wider public access.
Meta 3D Gen
Meta introduces Meta 3D Gen (3DGen), a fast text-to-3D asset tool with high prompt fidelity and PBR support. It integrates AssetGen and TextureGen components, outperforming industry baselines in speed and quality.
Instagram starts letting people create AI versions of themselves
Meta has launched AI Studio, enabling US users to create customizable AI versions of themselves for Instagram, aimed at enhancing interaction while managing content and engagement with followers.
Show HN: Infinity – Realistic AI characters that can speak
Infinity AI has developed a groundbreaking video model that generates expressive characters from audio input, trained for 11 GPU years at a cost of $500,000, addressing limitations of existing tools.
Meta confirms it trains its AI on any image you ask Ray-Ban Meta AI to analyze
Meta can use images shared with its Ray-Ban Meta AI for training, raising privacy concerns as users may unknowingly provide sensitive data. Users must opt out to prevent data usage.
- Many commenters express concerns about the quality and authenticity of AI-generated videos, noting a distinct "AI sheen" and unrealistic elements.
- There are worries about the potential misuse of the technology for misinformation and deepfakes, with calls for regulation and watermarking.
- Some users see the technology as a tool for democratizing content creation, allowing more people to produce videos without significant resources.
- Critics question the societal impact of AI-generated content, fearing it may overwhelm genuine human creativity and lead to a decline in quality.
- Overall, the comments reflect a mix of excitement for the technology's potential and apprehension about its implications for the future of media and creativity.
The videos look cool, but I can’t really enjoy reading about them if my phone freezes every two seconds.
That any pre-schooler will be able to produce anything imaginable (watch out, parents) in seconds doesn't make it better to me or give it any real value.
OK, I needed to edit this again to add: maybe this IS the value of it. We can totally forget about fantasizing stories with visuals (movies) because nobody will care anymore.
It’s already nearly impossible to find quality content on the internet if you don’t know where to look.
- Every script in Hollywood will now be submitted with a previs movie.
- Manga to anime converters.
- Online commercials for far more products.
My mind instantly assumes it's a money thing and they just want to charge millions for it, putting it out of reach for the general public. But then, given Meta's whole stance on open AI models, that doesn't seem to ring true.
Always important to bear in mind that the examples they show are likely the best examples they were able to produce.
Many times over the past few years a new AI release has "wowed" me, but none of them resulted in any sudden overnight changes to the world as we know it.
VFX artists: You can sleep well tonight, just keep an eye on things!
The problem: In my limited playing with these tools, they don't quite hit the mark, and I would easily be able to tweak something if I had all the layers used. I imagine future products could be used to tweak this to match what I think the output should be...
At least the code generation tools are providing source code. Imagine them only giving compiled bytecode.
Scale? I have access to an H100. Meta trained their cat video stuff on six thousand H100s.
They mention that these consume 700W each. Do they pay domestic rates for power? Is that really only $500 per hour of electricity?
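For a rough sanity check of that number, here is a sketch assuming a ~$0.12/kWh commercial rate (hyperscalers typically negotiate lower industrial rates) and ignoring cooling/PUE overhead:

```python
# Back-of-the-envelope GPU power cost. The $0.12/kWh rate is an
# assumption, and cooling overhead is ignored.
num_gpus = 6_000
watts_per_gpu = 700           # H100 SXM board power
usd_per_kwh = 0.12            # assumed commercial electricity rate

total_kw = num_gpus * watts_per_gpu / 1_000       # 4,200 kW = 4.2 MW
cost_per_hour = total_kw * usd_per_kwh            # kWh consumed per hour
print(f"{total_kw / 1_000:.1f} MW -> ${cost_per_hour:,.0f}/hour")
# 4.2 MW -> $504/hour
```

So ~$500/hour is plausible for the GPUs alone; the all-in figure would be higher once cooling and the rest of the cluster are counted.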
At the level of image/video synthesis: Some leading companies have suggested they put watermarks in the content they create. Nice thought, but open source will always be an option, and people will always be able to build un-watermarked tools.
At the level of law: You could attempt to pass a law banning image/video generation entirely, or those without watermarks, but same issue as before– you can't stop someone from building this tech in their garage with open-source software.
At the level of social media platforms: If you know how GANs work, you already know this isn't possible. Half of image generation AI is an AI image detector itself. The detectors will always be just about as good as the generators; that's how the generators are able to improve themselves. It is, I will not mince words, IMPOSSIBLE to build an AI detector that works long-term, because as soon as you have a great AI content classifier, it's used to make a better generator that outsmarts the classifier.
So... smash the looms..?
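The adversarial dynamic that comment describes is literally the GAN training loop: the detector (discriminator) is the generator's training signal. A minimal PyTorch-style sketch, assuming `G` and `D` are ordinary `torch.nn.Module` networks (note that modern video models like Movie Gen are diffusion/flow based rather than GANs, but the arms-race argument carries over):

```python
import torch

def gan_step(G, D, real_images, opt_G, opt_D, latent_dim=128):
    """One adversarial step: the detector D trains the generator G."""
    bce = torch.nn.functional.binary_cross_entropy_with_logits
    batch = real_images.size(0)
    z = torch.randn(batch, latent_dim)

    # 1. Train the detector to separate real from generated images.
    fake = G(z).detach()
    loss_D = bce(D(real_images), torch.ones(batch, 1)) \
           + bce(D(fake), torch.zeros(batch, 1))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # 2. Train the generator to fool that very detector: any improvement
    # to the detector is immediately recycled into a better generator.
    loss_G = bce(D(G(z)), torch.ones(batch, 1))
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```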
It being 30B gives me hope.
Seriously though. This is the company that is betting hard on VR goggles. And these are engines that can produce real time dreams, 3d, photographic quality, obedient to our commands. No 3d models needed, no physics simulations, no ray tracing, no prebuilt environments and avatars. All simply dreamed up in real time, as requested by the user in natural language. It might be one of the most addictive technologies ever invented.
Digital minimalism is looking more and more attractive.
It's going to be interesting to see how that plays out when you can make just about any kind of media you wish. (Especially when you can mix this as a form of 'embodiment' to realize relationships with virtual agents operated by LLMs.)
RIP Pika and ElevenLabs… tho I guess they always can offer convenience and top tier UX. Still, gotta imagine they’re panicking this morning!
Upload an image of yourself and transform it into a personalized video. Movie Gen’s cutting-edge model lets you create personalized videos that preserve human identity and motion.
Given how effective the still images of Trump saving people in floodwater and fixing electrical poles have been despite being identifiable as AI if you look closely (or think…), this is going to be nuts. 16 seconds is more than enough to convince people; I’m guessing the average video watch time is much less than that on social media. Also, YouTube Shorts (and whatever Meta’s version is) is about to get even worse, yet also probably more addicting! It would be hard to explain to an alien why we got so unreasonably good at optimal content to keep people scrolling. Imagine an automated YouTube channel running 24/7 A/B experiments for some set of audiences…
These are smooth and consistent: no sliding scenery (except the sloth floating in water, where the stones on the right move much faster than the approaching dock), no things appearing out of nowhere. Editing seems not as high quality (the candle-to-bubble example).
To me, the fact that these didn't induce nausea while being very high quality makes this the best among current video generators.
Before you downvote: don't take this as belittling the effort and the results, which are stunning, but as a sincere question.
I do plenty of photography, I do a lot of videography. I know my way around Premiere Pro, Lightroom and After Effects. I also know a decent amount about computer vision and cg.
If I look at the "edited" videos, they look fake. Immediately, and not just a little. They look like they were put through a washing machine full of effects: too contrasty, too much gamma, too much clarity, levels too low, like a baby playing with the effect controls. I can't exactly put my finger on it, but comparing the "original" videos to the ones that change just one element, like "add blue pom poms to his hands", the whole video changes and becomes a bit cartoony, for lack of a better word.
I am simply wondering why?!
Is that a change in general through the model that processes the video? Is that something that is easy to get rid of in future versions, or inherently baked into how the model transforms the video?
Curious if anybody has a solution or if this works for that
Anything longer than a single clip is just a bunch of these clips stitched together.
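For concreteness, the "stitching" is typically nothing fancier than concatenating clips end to end; a minimal sketch using ffmpeg's concat demuxer, with hypothetical clip filenames standing in for whatever the generator produced:

```python
import pathlib
import subprocess
import tempfile

# Hypothetical generator outputs, to be stitched in order.
clips = ["clip_000.mp4", "clip_001.mp4", "clip_002.mp4"]

# Write the playlist file that ffmpeg's concat demuxer expects.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.writelines(f"file '{pathlib.Path(c).resolve()}'\n" for c in clips)
    playlist = f.name

# Concatenate without re-encoding (assumes all clips share a codec).
subprocess.run(
    ["ffmpeg", "-f", "concat", "-safe", "0", "-i", playlist,
     "-c", "copy", "stitched.mp4"],
    check=True,
)
```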
What I hope (since I am building a storytelling front-end for AI-generated video) is that they consider B2C and sell this as a bulk service over an API.
But I'm worried about this tech being used for propaganda and disinformation.
Someone with a $1K computer and enough effort can generate a video that looks real enough. Add some effects to make it look like it was captured by a CCTV or another low-res camera.
This is what we know about, who knows what's behind NDAs or security clearances.
I will now review some of the standout clips.
That alien thing in the water is horrifying. The background fish look pretty convincing, except for the really flamboyant one in the dark.
I guess I should be impressed that the kite string seems to be rendered every frame and appears to be connected between the hand and the kite most of the time. The whole thing is really stressful though.
Drunk sloth with weirdly crisp shadow should take the top slot from girl in danger of being stolen by kite.
Man demonstrates novel chain-sword fire stick with four or five dimensions might be better off in the bin...
> The camera is behind a man. The man is shirtless, wearing a green cloth around his waist. He is barefoot. With a fiery object in each hand, he creates wide circular motions. A calm sea is in the background. The atmosphere is mesmerizing, with the fire dance.
This just reads like slightly clumsy lyrics to a lost Ween song.
https://ai.meta.com/blog/movie-gen-media-foundation-models-g...
I'd rather have those people work on climate change solutions
I can see myself paying a little too much to have a local setup for this.
Anyone able to update/inform a dinosaur?
> Upload an image of yourself and transform it
> into a personalized video. Movie Gen’s
> cutting-edge model lets you create personalized
> videos that preserve human identity and motion.
A stalker’s dream! I’m sure my ex is going to love all the videos I’m going to make of her! Jokes aside, it’s a little bizarre to me that they treat identity preservation as a feature while competitors treat that as a bug, explicitly trying not to preserve the identity of generated content to minimize deepfake reputation risk.
Any woman could have flagged this as an issue before this hit the public.
Especially based on the examples on this site, it's not a far reach to say that they will start to generate video ads of you (yes, YOU! your face! You've already uploaded hundreds of photos for them to reference!) using a specific product and showing how happy you are because you bought it. Imagine scrolling Instagram and seeing your own face smelling some laundry detergent or laughing because you took some prescription medicine.
Is it available for use now? Nope
When will it be available for use? On FB, IG and WhatsApp in 2025
Will it be open sourced? Maybe
What are they doing before releasing it? Working with filmmakers, improving video quality, reducing inference time
For a long time people have speculated about The Singularity. What happens when AI is used to improve AI in a virtuous circle of productivity? Well, that day has come. To generate videos from text you need video+text pairs to train on. They get that text from more AI. They trained a special Llama3 model that knows how to write detailed captions from images/video and used it to consistently annotate their database of approx 100M videos and 1B images. This is only one of many ways in which they deployed AI to help them train this new AI.
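A minimal sketch of that caption-the-data loop, using a small public captioner (BLIP via Hugging Face) as a stand-in; Meta's actual fine-tuned multimodal Llama3 is not publicly available in this form:

```python
from PIL import Image
from transformers import pipeline

# Public stand-in captioner for the AI-labels-data-for-AI loop.
captioner = pipeline("image-to-text",
                     model="Salesforce/blip-image-captioning-base")

def build_training_pairs(image_paths):
    """Turn raw media into (media, caption) pairs for generator training."""
    pairs = []
    for path in image_paths:
        caption = captioner(Image.open(path))[0]["generated_text"]
        pairs.append({"media": path, "caption": caption})
    return pairs
```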
They do a lot of pre-filtering on the videos to ensure training on high quality inputs only. This is a big recent trend in model training: scaling up data works but you can do even better by training on less data after dumping the noise. Things they filter out: portrait videos (landscape videos tend to be higher quality, presumably because it gets rid of most low effort phone cam vids), videos without motion, videos with too much jittery motion, videos with bars, videos with too much text, video with special motion effects like slideshows, perceptual duplicates etc. Then they work out the "concepts" in the videos and re-balance the training set to ensure there are no dominant concepts.
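A sketch of what that filter-then-rebalance pass could look like; every threshold and per-video score below is an assumption standing in for Meta's internal classifiers (motion detectors, OCR for overlay text, duplicate detection, and so on):

```python
from collections import Counter

def keep_video(v):
    """Assumed per-video metadata/scores standing in for real classifiers."""
    return (
        v["width"] > v["height"]            # landscape only
        and 0.1 < v["motion_score"] < 0.9   # drop static and jittery clips
        and v["text_area_ratio"] < 0.05     # not too much overlay text
        and not v["has_bars"]
        and not v["is_duplicate"]
    )

def rebalance(videos, max_share=0.05):
    """Cap any single concept at roughly max_share of the kept set."""
    budget = max(1, int(len(videos) * max_share))
    counts, kept = Counter(), []
    for v in videos:
        if counts[v["concept"]] < budget:
            counts[v["concept"]] += 1
            kept.append(v)
    return kept

# dataset = rebalance([v for v in raw_videos if keep_video(v)])
```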
You can control the camera because they trained a dedicated camera motion classifier and ran that over all the inputs, the outputs are then added to the text captions.
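Roughly, that amounts to appending a predicted motion label to each caption; a sketch with an assumed classifier and label set (the real model and taxonomy are internal to Meta):

```python
# Assumed camera-motion taxonomy and classifier.
CAMERA_LABELS = ["static", "pan left", "pan right", "zoom in", "zoom out",
                 "tilt up", "tilt down", "tracking shot"]

def augment_caption(pair, camera_classifier):
    """Append the motion label so the model learns camera-control phrases."""
    label = camera_classifier(pair["video"])   # -> one of CAMERA_LABELS
    pair["caption"] += f" The camera motion is: {label}."
    return pair
```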
The text embeddings they mix in are actually a concatenation of several models. There's MetaCLIP providing the usual understanding of what's in the request, but they also mix in a model trained on character-level text so you can request specific spellings of words too.
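A sketch of that concatenation, with stand-in encoders for MetaCLIP and the character-level model, both assumed to return token embeddings projected to a shared dimension:

```python
import torch

def build_text_conditioning(prompt, clip_encoder, char_encoder):
    """Concatenate semantic and character-level token embeddings.

    Both encoders are assumed stand-ins returning (seq_len, dim) tensors.
    """
    semantic = clip_encoder(prompt)     # what the scene should contain
    chars = char_encoder(prompt)        # exact spellings, for on-screen text
    return torch.cat([semantic, chars], dim=0)
```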
The AI sheen mentioned in other comments mostly isn't due to it being AI but rather because they fine-tune the model on videos selected for being "cinematic" or "aesthetic" in some way. It looks how they want it to look. For instance, they select for natural lighting, absence of too many small objects (clutter), vivid colors, interesting motion and absence of overlay text. What remains of the sheen is probably due to the AI upsampling they do, which lets them render videos at a smaller scale followed by a regular bilinear upsample + a "computer, enhance!" step.
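A sketch of that render-small-then-upsample pipeline; the `enhancer` is an assumed learned super-resolution module:

```python
import torch.nn.functional as F

def upscale_frames(frames, enhancer, scale=2):
    """frames: (batch, channels, height, width) low-res renders.

    Bilinear upsample, then a learned "computer, enhance!" pass;
    `enhancer` is an assumed super-resolution module.
    """
    big = F.interpolate(frames, scale_factor=scale,
                        mode="bilinear", align_corners=False)
    return enhancer(big)
```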
They just casually toss in some GPU cluster management improvements along the way for training.
Because Movie Gen was trained on Llama3-generated captions, it expects much more detailed and higher-effort captions than users normally provide. To bridge the gap they use a modified Llama3 to rewrite people's prompts to be higher detail and more consistent with the training set. They dedicate a few paragraphs to this step, but it nonetheless involves a ton of effort: distillation for efficiency, human evals to ensure rewrite quality, etc.
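A sketch of that rewrite bridge, using a generic chat-style LLM client; the instructions and the `llm.chat` interface are assumptions, not Meta's actual distilled rewriter:

```python
# Assumed rewrite instructions and a generic chat-completion interface.
REWRITE_INSTRUCTIONS = (
    "Rewrite the user's video request as a detailed, literal scene "
    "description: subjects, actions, setting, lighting, camera motion. "
    "Do not invent events the user did not ask for."
)

def rewrite_prompt(user_prompt, llm):
    """Expand a terse request into the caption style the model trained on."""
    return llm.chat(system=REWRITE_INSTRUCTIONS, user=user_prompt)

# "a sloth in a pool" might become: "A sloth floats on an inflatable
# ring in a sunlit swimming pool. The camera slowly zooms in. ..."
```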
I can't even begin to imagine how big of a project this must have been.
Impressive on the relative quality of the output. And of the productivity gains, sure.
But meh on the substance of it. It may be a dream for (financial) producers. For the direct customers as well (advertisement obviously, again). But for creators themselves (who are to be their own producers at some point, for some)?
On the maker side, art/work you don't sweat upon has little interest and emotional appeal. You shape it about as much as it shapes you.
On the viewer side, art that's not directed and produced by a human has little interest, connection and appeal as well. You can't be moved by something that's been produced by someone or something you can't relate to. Especially not a machine. It may have some accidental aesthetic interest, much like generative art had in the past. But uninhabited by someone's intent, it's just void of anything.
I know it's not the mainstream opinion, but generative AI sounds more and more every day like cryptocurrencies and NFTs: technologies that have not _yet_ found the defining problem to which they could be a solution.
It will not make you creative. It will not give you taste or talent. It is a technical tool that will mostly be used to produce cheap garbage unless you develop the skills to use it as a part of your creative toolkit -- which should also include many, many other things.
Yeah, we might get the bad killer robots. But it's more likely this will make it unnecessary to wonder where on this blue planet you can still live when we power the deserts with solar and go to space. Getting clean nutrition and environment will be within reach. I think that's great.
As with all technology: Yes a car is faster than you. And you can buy or rent one. But it's still great to be healthy and able to jog. So keep your brains folks and get some skills :)
#cabincrew
#scarletjohanson
#amen
As it stands, the only chance you have of depicting a consistent story across a series of shots is image-to-video, presuming you can use LoRAs or similar techniques to get the seed photos consistent in themselves.
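A sketch of that workflow with Hugging Face diffusers; the LoRA path and "<hero>" trigger token are hypothetical placeholders for a character LoRA you trained yourself, not a specific recommended stack:

```python
import torch
from diffusers import DiffusionPipeline

# Character LoRA keeps the seed stills consistent across shots.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("./my_character_lora")  # hypothetical trained LoRA

shots = [
    "the hero enters a rainy alley at night",
    "the hero looks up at a flickering neon sign",
]
# Consistent seed stills, one per shot; each would then be animated
# separately by an image-to-video model (e.g. Stable Video Diffusion).
seed_images = [pipe(f"photo of <hero>, {s}").images[0] for s in shots]
```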
Like, cool, a movie doesn’t need to cost $200 million or whatever.
Imagine if those creative types were freed up to do something different. What would we see? Better architecture and factories? Maybe better hospitals?
That's the most amenable approach to AI filmmaking I've seen available yet.
I'd have to see way more pencil-sketch conversions to see exactly what's going on...
...but that right there is the easiest way to hack making movies, with the most control... so far...
I’m here looking at users and wondering - the content pipelines are broader, but the exit points of attention and human brains are constant. How the heck are you supposed to know if your content is valid?
During a recent Apple event, someone on YT had an AI-generated video of Tim Cook announcing a crypto collaboration; it had 100k viewers before it was taken down.
Right now, all the videos of rockets falling on Israel can be faked. Heck, the responses on the communities are already populated by swathes of bots.
It’s simply cheaper to create content and overwhelm society level filters we inherited from an era of more expensive content creation.
Before anyone throws the sink at me for being a Luddite or raining on the parade - I’m coming from the side where you deal with the humans who consume content, and then decide to target your user base.
Yes, the vast majority of this is going to be used to create lovely cat memes and other great stuff.
At the same time, it takes just 1 post to act as a lightning rod and blow up things.
Edit:
From where I sit, there are 3 levels of issues.
1) Day to day arguments - this is organic normal human stuff
2) Bad actors - this is spammers, hate groups, hackers.
3) REALLY Bad actors - this is nation states conducting information warfare. This is countries seeding African user bases with faked stories, then using that as a basis for global interventions.
This is fake videos of war crimes, which incense their base and overshadow the harder won evidence of actual war crimes.
This doesn’t seem real, but political forces are about perception, not science and evidence.
It's only going to get better, faster, cheaper, easier.[a]
Sooner than anyone could have expected, we'll be able to ask the machines: "Turn this book into a two-hour movie with the likeness of [your favorite actor/actress] in the lead role."
Sooner than anyone could have expected, we'll be able to have immersive VR experiences that are crafted to each person.
Sooner than anyone could have expected, we won't be able to identify deepfakes anymore.
We sure live in interesting times!
---
[a] With apologies to Daft Punk: https://www.youtube.com/watch?v=gAjR4_CbPpQ
“I want a funny road trip movie starring Jim Carrey and Chris Farley, based in Europe, in the fall, where they have to rescue their mom, played by Lucille Ball, from making the mistake of marrying a character played by an older Steve Martin.”
10 minutes later your movie is generated.
If you like it, you save it, share it, etc.
You have a queue of movies shared by your friends that they liked.
Content will be endless and generated.
They're not really showing signs of slowing down either. Hey, Zuck, always thought you were kind of lame in the past. But maybe you weren't a one trick pony after all.
From Twitter/X:
Today we’re premiering Meta Movie Gen: the most advanced media foundation models to date.
Developed by AI research teams at Meta, Movie Gen delivers state-of-the-art results across a range of capabilities. We’re excited for the potential of this line of research to usher in entirely new possibilities for casual creators and creative professionals alike.
More details and examples of what Movie Gen can do https://go.fb.me/kx1nqm
Movie Gen models and capabilities
Movie Gen Video: A 30B parameter transformer model that can generate high-quality and high-definition images and videos from a single text prompt.
Movie Gen Audio: A 13B parameter transformer model that can take a video input along with optional text prompts for controllability to generate high-fidelity audio synced to the video. It can generate ambient sound, instrumental background music and foley sound — delivering state-of-the-art results in audio quality, video-to-audio alignment and text-to-audio alignment.
Precise video editing: Using a generated or existing video and accompanying text instructions as input, it can perform localized edits such as adding, removing or replacing elements, or global changes like background or style changes.
Personalized videos: Using an image of a person and a text prompt, the model can generate a video with state-of-the-art results on character preservation and natural movement in video.
We’re continuing to work closely with creative professionals from across the field to integrate their feedback as we work towards a potential release. We look forward to sharing more on this work and the creative possibilities it will enable in the future.