Meta introduces Segment Anything Model 2
Meta has launched the Segment Anything Model 2 (SAM 2) for segmenting objects in images and videos, featuring real-time processing, zero-shot generalization, and an open-sourced model, dataset, and public demo.
Meta has introduced the Segment Anything Model 2 (SAM 2), a unified model designed for segmenting objects in both images and videos. SAM 2 allows users to select objects using various input methods, such as clicks or masks, and can track these objects across video frames. The model demonstrates strong zero-shot performance, meaning it can effectively segment objects not seen during training, making it applicable in diverse real-world scenarios. SAM 2 is optimized for real-time processing, enabling interactive applications with minimal user input.
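As a hedged illustration of that prompt-based workflow, here is a minimal sketch of clicking an object in a single image, following the usage published in the facebookresearch/sam2 repository; the checkpoint and config paths, the example image, and the click coordinates are illustrative placeholders, not values from the announcement.

```python
import numpy as np
import torch
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Illustrative paths: substitute the checkpoint and config you downloaded.
checkpoint = "checkpoints/sam2_hiera_large.pt"
model_cfg = "sam2_hiera_l.yaml"

predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))
image = np.array(Image.open("example.jpg").convert("RGB"))  # illustrative image

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(image)
    # One positive click at (x, y) selects the object under that point.
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[500, 375]]),
        point_labels=np.array([1]),  # 1 = foreground click, 0 = background
    )
```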
The architecture of SAM 2 includes a per-session memory module that retains information about selected objects, allowing for continuous tracking even if the object is temporarily obscured. The model was trained on a large dataset, the Segment Anything Video Dataset (SA-V), which includes more than 600,000 masklets (spatio-temporal object masks) across approximately 51,000 videos, ensuring geographic and contextual diversity.
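The repository exposes this session memory through a streaming video API: an inference state is created once per session and then carried across frames. A minimal sketch under the same assumptions (illustrative paths, coordinates, and frame directory):

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

checkpoint = "checkpoints/sam2_hiera_large.pt"  # illustrative path
model_cfg = "sam2_hiera_l.yaml"
predictor = build_sam2_video_predictor(model_cfg, checkpoint)

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    # The inference state is the per-session memory: it retains features and
    # mask history for each tracked object across frames.
    state = predictor.init_state("videos/example_frames")  # dir of JPEG frames

    # Prompt object 1 with a single click on frame 0; boxes are also accepted.
    frame_idx, object_ids, masks = predictor.add_new_points_or_box(
        state,
        frame_idx=0,
        obj_id=1,
        points=np.array([[500, 375]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32),
    )

    # Propagate the prompt through the video; the memory is what lets tracking
    # recover an object after a brief occlusion.
    for frame_idx, object_ids, masks in predictor.propagate_in_video(state):
        pass  # consume the per-frame masks here (overlay, export, etc.)
```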
Meta is open-sourcing the pretrained SAM 2 model and the SA-V dataset, and is providing a demo for public use. The model's outputs can be integrated with other AI systems for advanced video editing capabilities. SAM 2 aims to enhance user interaction with video content and supports future extensions for creative real-time applications. The initiative reflects Meta's commitment to transparency and collaboration within the research community.
Related
Video annotator: a framework for efficiently building video classifiers
The Netflix Technology Blog presents the Video Annotator (VA) framework for efficient video classifier creation. VA integrates vision-language models, active learning, and user validation, outperforming baseline methods with an 8.3 point Average Precision improvement.
Big tech wants to make AI cost nothing
Meta has open-sourced its Llama 3.1 language model for organizations with fewer than 700 million users, aiming to enhance its public image and increase product demand amid rising AI infrastructure costs.
TreeSeg: Hierarchical Topic Segmentation of Large Transcripts
Augmend is creating a platform to automate tribal knowledge for development teams, featuring the TreeSeg algorithm, which segments session data into chapters by analyzing audio transcriptions and semantic actions.
SAM 2: Segment Anything in Images and Videos
The GitHub repository for Segment Anything Model 2 (SAM 2) by Facebook Research enhances visual segmentation with real-time video processing, a large dataset, and APIs for image and video predictions.
SAM 2: The next generation of Meta Segment Anything Model
Meta has launched SAM 2, a real-time object segmentation model for images and videos, enhancing accuracy and reducing interaction time. It supports diverse applications and is available under an Apache 2.0 license.
I'm sure this, Llama, and their other projects will help drive new creations, companies, and progress.
I'm also sure that this kind of open sharing of code and research will drive business value for them. It may be hard to say right now exactly how, but I'm sure it will.
That's the difference between a founder-led company and a market-led one. Google is mostly concerned with short-term goals, making sure it never reports a single bad quarter or carries too much CAPEX on a project with no profit in sight (like VR).
Once Meta finds the killer app for VR, all the other companies will be so many years behind that they will have to buy software from Meta or give up market share in this new space. Similar to what happened with AI chips and Nvidia: nobody else was investing enough in them.
Regardless of how successful their attempts at VR and AI actually end up, they're already going to have a place in history for that.
The Roto Brush in After Effects is similar, but its quality has always been lacking and it takes forever to process.
What are possible scenarios?
1) Advertisers build tracking systems in supermarkets and malls and Facebook sells them the profiles of the victims.
2) Facebook is getting people hooked on the "free" models but will later release a better and fully closed model with tracking software so advertisers have to purchase both profiles and the software.
3) Facebook releases the model in order to create goodwill among (a subset of) the developer population and normalize tracking. The subset of developers will also act as useful idiots in the EU and promote tracking and corporate power.