July 31st, 2024

Meta introduces Segment Anything Model 2

Meta has launched the Segment Anything Model 2 (SAM 2) for segmenting objects in images and videos, featuring real-time processing, zero-shot performance, and open-sourced resources for enhanced user interaction.

Meta has introduced the Segment Anything Model 2 (SAM 2), a unified model designed for segmenting objects in both images and videos. SAM 2 allows users to select objects using various input methods, such as clicks or masks, and can track these objects across video frames. The model demonstrates strong zero-shot performance, meaning it can effectively segment objects not seen during training, making it applicable in diverse real-world scenarios. SAM 2 is optimized for real-time processing, enabling interactive applications with minimal user input.
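
To make the prompt-based workflow concrete, here is a minimal sketch of click-prompted image segmentation using the open-sourced `sam2` package. The checkpoint and config file names, the example image, and the click coordinates are illustrative assumptions, not values taken from the announcement.

```python
# Minimal sketch: click-prompted image segmentation with SAM 2.
# Assumes the open-sourced `sam2` package is installed and a checkpoint
# has been downloaded; the file names below are assumptions and may differ.
import numpy as np
import torch
from PIL import Image

from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

checkpoint = "checkpoints/sam2_hiera_large.pt"  # assumed checkpoint path
model_cfg = "sam2_hiera_l.yaml"                 # assumed model config

predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))

image = np.array(Image.open("example.jpg").convert("RGB"))  # any RGB image

with torch.inference_mode():
    predictor.set_image(image)
    # A single positive click (label 1) on the object of interest.
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[500, 375]]),
        point_labels=np.array([1]),
    )

print(masks.shape, scores)  # candidate masks with confidence scores
```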

The architecture of SAM 2 includes a per-session memory module that retains information about selected objects, allowing for continuous tracking even if the object is temporarily obscured. The model was trained on a large dataset, the Segment Anything Video Dataset (SA-V), which includes over 600,000 masklets (spatio-temporal object masks) across approximately 51,000 videos, ensuring geographic and contextual diversity.
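
The memory design is what enables the interactive video workflow described above: prompt on one frame, then propagate the mask through the rest of the clip. Below is a hedged sketch of that loop with the released video predictor; the function and argument names follow the open-sourced repository but should be treated as assumptions that may differ between releases.

```python
# Sketch: prompt once on a video frame, then let SAM 2's per-session
# memory propagate the object mask through the clip (tracking across
# temporary occlusions). Names are assumptions based on the released repo.
import numpy as np
import torch

from sam2.build_sam import build_sam2_video_predictor

checkpoint = "checkpoints/sam2_hiera_large.pt"  # assumed checkpoint path
model_cfg = "sam2_hiera_l.yaml"                 # assumed model config

predictor = build_sam2_video_predictor(model_cfg, checkpoint)

with torch.inference_mode():
    # Per-session state over a directory of extracted video frames.
    state = predictor.init_state(video_path="video_frames/")

    # One positive click on frame 0 selects the object to track.
    predictor.add_new_points_or_box(
        state,
        frame_idx=0,
        obj_id=1,
        points=np.array([[210, 350]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32),
    )

    # Propagate through the video; the memory module keeps following the
    # object even across frames where it is briefly occluded.
    segments = {}
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        segments[frame_idx] = {
            obj_id: (mask_logits[i] > 0.0).cpu().numpy()
            for i, obj_id in enumerate(obj_ids)
        }
```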

Meta is open-sourcing the pretrained SAM 2 model and the SA-V dataset, and is providing a demo for public use. The model's outputs can be integrated with other AI systems for advanced video editing capabilities. SAM 2 aims to enhance user interaction with video content and supports future extensions for creative real-time applications. The initiative reflects Meta's commitment to transparency and collaboration within the research community.

11 comments
By @null_investor - 9 months
Meta is killing it. Google seems to be lagging behind them in AI research and in the useful things that are shared with the community.

I'm sure this, Llama, and their other projects will help drive new creations, companies, and progress.

I'm also sure that openly sharing code and research like this will drive business value for them. It may be hard to say right now exactly how, but I'm sure it will.

That's the difference between a founder-led company and a market-led one. Google is mostly concerned with short-term goals, so they never report a single bad quarter or carry too much CAPEX on a project without profit in sight (like VR).

Once Meta finds the killer app for VR, all the other companies will be so many years behind that they will need to buy software from Meta or give up market share in this new space. Similar to what happened with AI chips and Nvidia: nobody else was investing enough in them.

By @gnabgib - 9 months
Discussion (720 points, 1 day ago, 130 comments) https://news.ycombinator.com/item?id=41104523
By @SXX - 9 months
If someone had told me just a decade ago that Facebook would become one of the most open, innovative companies and Mark Zuckerberg one of the sane billionaires, I would have laughed in their face. But now...

Regardless of how successful their attempts at VR and AI actually end up, they're already going to have a place in history for this.

By @npiano - 9 months
I always think of the holographic orbit-mapping UI from The Expanse when I see something like this. It feels like the kind of paper from the future that will end up hooked into everything we interact with. Such a powerful tool for exploring the world.
By @switchstance - 9 months
I would have killed for this back in my editing and motion graphics days.

The Roto Brush in After Effects is similar, but its quality has always been lacking and it takes forever to process.

By @d3m0t3p - 9 months
It says they released the code, but I couldn't find any except some example code. Did they release the training code?
By @asah - 9 months
Impressive results! Here's a test video from inside Mercer Labs: https://youtu.be/W7kM0ISXkpQ?feature=shared
By @jdhzzz - 9 months
No love for Firefox.
By @jonatron - 9 months
Thanks to the thousands of African workers doing the boring, repetitive dataset work.
By @lakfha - 9 months
So you can track "objects" like people in a shopping mall and they are releasing it as a commoditized complement.

What are possible scenarios?

1) Advertisers build tracking systems in supermarkets and malls and Facebook sells them the profiles of the victims.

2) Facebook is getting people hooked on the "free" models but will later release a better and fully closed model with tracking software so advertisers have to purchase both profiles and the software.

3) Facebook releases the model in order to create goodwill among (a subset of) the developer population and normalize tracking. The subset of developers will also act as useful idiots in the EU and promote tracking and corporate power.