Meta introduces Segment Anything Model 2
Meta has launched the Segment Anything Model 2 (SAM 2) for segmenting objects in images and videos, featuring real-time processing, zero-shot generalization, and an open-sourced model, dataset, and public demo.
Meta has introduced the Segment Anything Model 2 (SAM 2), a unified model designed for segmenting objects in both images and videos. SAM 2 allows users to select objects using various input methods, such as clicks or masks, and can track these objects across video frames. The model demonstrates strong zero-shot performance, meaning it can effectively segment objects not seen during training, making it applicable in diverse real-world scenarios. SAM 2 is optimized for real-time processing, enabling interactive applications with minimal user input.
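As a hedged illustration of that prompt-based workflow, here is a minimal sketch of clicking an object in a single image, following the usage published in the facebookresearch/sam2 repository; the checkpoint and config paths, the example image, and the click coordinates are illustrative placeholders, not values from the announcement.

```python
import numpy as np
import torch
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Illustrative paths: substitute the checkpoint and config you downloaded.
checkpoint = "checkpoints/sam2_hiera_large.pt"
model_cfg = "sam2_hiera_l.yaml"

predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))
image = np.array(Image.open("example.jpg").convert("RGB"))  # illustrative image

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(image)
    # One positive click at (x, y) selects the object under that point.
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[500, 375]]),
        point_labels=np.array([1]),  # 1 = foreground click, 0 = background
    )
```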
The architecture of SAM 2 includes a per-session memory module that retains information about selected objects, allowing for continuous tracking even if the object is temporarily obscured. The model was trained on a large dataset, the Segment Anything Video Dataset (SA-V), which includes more than 600,000 masklets (spatio-temporal object masks) across approximately 51,000 videos, ensuring geographic and contextual diversity.
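The repository exposes this session memory through a streaming video API: an inference state is created once per session and then carried across frames. A minimal sketch under the same assumptions (illustrative paths, coordinates, and frame directory):

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

checkpoint = "checkpoints/sam2_hiera_large.pt"  # illustrative path
model_cfg = "sam2_hiera_l.yaml"
predictor = build_sam2_video_predictor(model_cfg, checkpoint)

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    # The inference state is the per-session memory: it retains features and
    # mask history for each tracked object across frames.
    state = predictor.init_state("videos/example_frames")  # dir of JPEG frames

    # Prompt object 1 with a single click on frame 0; boxes are also accepted.
    frame_idx, object_ids, masks = predictor.add_new_points_or_box(
        state,
        frame_idx=0,
        obj_id=1,
        points=np.array([[500, 375]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32),
    )

    # Propagate the prompt through the video; the memory is what lets tracking
    # recover an object after a brief occlusion.
    for frame_idx, object_ids, masks in predictor.propagate_in_video(state):
        pass  # consume the per-frame masks here (overlay, export, etc.)
```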
Meta is open-sourcing the pretrained SAM 2 model and the SA-V dataset, and is providing a demo for public use. The model's outputs can be integrated with other AI systems for advanced video editing capabilities. SAM 2 aims to enhance user interaction with video content and supports future extensions for creative real-time applications. The initiative reflects Meta's commitment to transparency and collaboration within the research community.
Related
Video annotator: a framework for efficiently building video classifiers
The Netflix Technology Blog presents the Video Annotator (VA) framework for efficient video classifier creation. VA integrates vision-language models, active learning, and user validation, outperforming baseline methods with an 8.3 point Average Precision improvement.
Big tech wants to make AI cost nothing
Meta has open-sourced its Llama 3.1 language model for organizations with fewer than 700 million users, aiming to enhance its public image and increase product demand amid rising AI infrastructure costs.
TreeSeg: Hierarchical Topic Segmentation of Large Transcripts
Augmend is creating a platform to automate tribal knowledge for development teams, featuring the TreeSeg algorithm, which segments session data into chapters by analyzing audio transcriptions and semantic actions.
SAM 2: Segment Anything in Images and Videos
The GitHub repository for Segment Anything Model 2 (SAM 2) by Facebook Research enhances visual segmentation with real-time video processing, a large dataset, and APIs for image and video predictions.
SAM 2: The next generation of Meta Segment Anything Model
Meta has launched SAM 2, a real-time object segmentation model for images and videos, enhancing accuracy and reducing interaction time. It supports diverse applications and is available under an Apache 2.0 license.
I'm sure this, Llama, and their other projects will help drive new creations, companies, and progress.
I'm also sure that this kind of open sharing of code and research will drive business value for them. It may be hard to say right now exactly how, but I'm sure it will.
That's the difference between a founder-led company and a market-led one. Google is mostly concerned with short-term goals, making sure it never reports a single bad quarter or carries too much CAPEX on a project with no profit in sight (like VR).
Once Meta finds the killer app for VR, all the other companies will be so many years behind that they will have to buy software from Meta or give up market share in this new space. Similar to what happened with AI chips and Nvidia: nobody else was investing enough in them.
Regardless of how successful their attempts at VR and AI actually end up, they're already going to have a place in history for that.
The Roto Brush in After Effects is similar, but its quality has always been lacking and it takes forever to process.
What are possible scenarios?
1) Advertisers build tracking systems in supermarkets and malls and Facebook sells them the profiles of the victims.
2) Facebook is getting people hooked on the "free" models but will later release a better and fully closed model with tracking software so advertisers have to purchase both profiles and the software.
3) Facebook releases the model in order to create goodwill among (a subset of) the developer population and normalize tracking. The subset of developers will also act as useful idiots in the EU and promote tracking and corporate power.