July 6th, 2024

Image Self Supervised Learning on a Shoestring

A new cost-effective machine learning approach, IJEPA, trains image encoders by predicting the missing parts of an image and scoring the reconstruction internally. Released on GitHub, it yields high-quality image embeddings while reducing computational demands for researchers.

In machine learning research, where high computational costs often limit accessibility, Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture (IJEPA) offers a cost-effective alternative for training image encoders. By combining techniques such as random resolution sampling and tailored masking strategies, IJEPA aims to deliver higher-quality image embeddings without complex augmentations or text captions. The model is trained to predict the missing parts of an image from the visible context, scoring its reconstructions internally in embedding space rather than in pixel space. The author has released code and weights on GitHub, demonstrating an implementation that runs on a single-GPU machine. Innovations such as token merging and efficient masking and packing strategies further speed up training by shortening sequence lengths and eliminating noisy tokens. Together, these choices make it feasible to train image models with limited computational resources, a promising avenue for researchers exploring low-resource settings in machine learning.
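
To make the objective concrete, here is a minimal sketch of an IJEPA-style training step in PyTorch. The module and argument names are illustrative assumptions, not the released repo's API, and real implementations drop masked tokens from the sequence (which is what shortens sequence lengths) rather than zeroing them as done here for brevity.

```python
# Illustrative IJEPA-style loss: predict representations of hidden patches
# from the visible context, scored in embedding space (names are hypothetical).
import torch
import torch.nn.functional as F

def ijepa_loss(context_encoder, target_encoder, predictor, patches, mask):
    """patches: (B, N, D) patch embeddings; mask: (B, N) bool, True = hidden."""
    with torch.no_grad():                        # targets come from a frozen/EMA encoder
        targets = target_encoder(patches)        # (B, N, D), no gradients
    visible = patches * (~mask).unsqueeze(-1).float()  # real code drops these tokens
    preds = predictor(context_encoder(visible))  # (B, N, D)
    return F.mse_loss(preds[mask], targets[mask])  # score only the hidden positions

# Toy usage with linear stand-ins for the encoders and predictor.
enc, tgt, pred = (torch.nn.Linear(64, 64) for _ in range(3))
x = torch.randn(2, 16, 64)
m = torch.rand(2, 16) < 0.5
print(ijepa_loss(enc, tgt, pred, x, m))
```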

Related

Video annotator: a framework for efficiently building video classifiers

The Netflix Technology Blog presents the Video Annotator (VA) framework for efficient video classifier creation. VA integrates vision-language models, active learning, and user validation, outperforming baseline methods with an 8.3 point Average Precision improvement.

JEPA (Joint Embedding Predictive Architecture)

Yann LeCun's Joint Embedding Predictive Architecture (JEPA) enhances AI by emphasizing world models, self-supervised learning, and abstract representations. JEPA predicts future states by transforming inputs into abstract representations, handling uncertainty, and enabling complex predictions through multistep or hierarchical structures. Several models like I-JEPA, MC-JEPA, and V-JEPA have been developed to process visual data and improve AI's understanding of images and videos, moving towards human-like interaction with the world.
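
A common ingredient across these JEPA variants is keeping the prediction targets stable: the target encoder is typically an exponential moving average (EMA) of the online encoder rather than a second trained network. A minimal sketch, with the momentum value and parameter names assumed for illustration:

```python
import torch

@torch.no_grad()
def ema_update(target_encoder, online_encoder, momentum=0.996):
    """Blend target weights toward the online encoder's weights, the usual
    way JEPA-style models stabilize their prediction targets."""
    for t, o in zip(target_encoder.parameters(), online_encoder.parameters()):
        t.mul_(momentum).add_(o, alpha=1.0 - momentum)
```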

The super effectiveness of Pokémon embeddings using only raw JSON and images

Embeddings are vital in AI; here, Pokémon data was encoded so that Pokémon could be compared numerically. JSON data from a Pokémon API was flattened and optimized, generating embeddings for over 1,000 Pokémon. Similarity scores revealed relationships based on type and generation, showcasing the effectiveness of embeddings in data analysis.
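
The "similarity" in analyses like this is typically plain cosine similarity between embedding vectors. A self-contained toy example (the vectors below are invented for illustration, not the article's actual embeddings):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal ones."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings: two fire-types should score closer to each other than to a water-type.
charmander, vulpix, squirtle = [0.9, 0.1, 0.3], [0.8, 0.2, 0.25], [0.1, 0.9, 0.4]
print(cosine_similarity(charmander, vulpix))    # ~0.99
print(cosine_similarity(charmander, squirtle))  # ~0.32
```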

Show HN: AI assisted image editing with audio instructions

The GitHub repository hosts "AAIELA: AI Assisted Image Editing with Language and Audio," a project enabling image editing via audio commands and AI models. It integrates various technologies for object detection, language processing, and image inpainting. Future plans involve model enhancements and feature integrations.

New AI Training Technique Is Drastically Faster, Says Google

Google DeepMind introduces JEST, a new AI training technique that speeds up training by 13 times and boosts efficiency by 10 times. JEST optimizes data selection, reducing energy consumption and improving model effectiveness.
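
The core of JEST's data selection is a "learnability" score: prefer examples the current learner still finds hard but a pretrained reference model finds easy. A rough per-example sketch (the actual method selects whole sub-batches jointly; the names and numbers here are illustrative):

```python
import torch

def learnability(learner_loss, reference_loss):
    """High score = the learner struggles but the reference does not,
    so the example is likely informative rather than noise."""
    return learner_loss - reference_loss

# Toy usage: keep the two most "learnable" examples from a candidate pool.
learner_losses = torch.tensor([2.1, 0.3, 1.7, 0.9])
reference_losses = torch.tensor([0.4, 0.2, 1.6, 0.1])
keep = torch.topk(learnability(learner_losses, reference_losses), k=2).indices
print(keep)  # indices of the selected examples
```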

1 comment
By @topwalktown - 3 months
I'm trying to train a variable-resolution ViT using IJEPA. I'm currently topping out at about 30% on ImageNet-1k after training for 20 epochs (6 hours).

It'd be cool to have some help and feedback. I think I'm on the right track to a really killer setup that's super fast to train, but it needs more evaluations and more tuning. Anyone interested?