July 6th, 2024

New AI Training Technique Is Drastically Faster, Says Google

Google DeepMind introduces JEST, a new AI training technique that speeds up training by up to 13 times and boosts power efficiency by 10 times. JEST optimizes data selection for training, reducing energy consumption and improving model effectiveness.

Researchers at Google DeepMind have introduced a new AI training technique called JEST, which they report accelerates training by up to 13 times and improves efficiency by 10 times. The approach aims to cut the computational resources and time required for AI training, and with them its energy demands. The AI industry is known for its high energy consumption, with large-scale systems like ChatGPT requiring significant processing power and water for cooling.

JEST optimizes data selection for training: rather than treating examples independently, it selects complementary batches of data that maximize the model's learnability, reducing the number of iterations and the computational power needed. Built on multimodal contrastive learning, the technique identifies dependencies between data points and steers training toward high-quality data, making it more effective than traditional example-by-example selection. If adopted at scale, JEST could let AI trainers build more capable models with fewer resources, potentially mitigating the environmental impact of AI development.
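
A minimal sketch of the "learnability" scoring idea (the names, shapes, and per-example simplification are illustrative assumptions; the published method scores whole batches jointly rather than individual examples):

```python
import torch

def learnability(learner_loss: torch.Tensor, ref_loss: torch.Tensor) -> torch.Tensor:
    # "Learnable" data: examples the in-training model still finds hard
    # (high learner loss) but a reference model trained on curated,
    # high-quality data finds easy (low reference loss).
    return learner_loss - ref_loss

def select_sub_batch(learner_loss: torch.Tensor, ref_loss: torch.Tensor, keep: int) -> torch.Tensor:
    # From a large candidate "super-batch", keep only the indices of the
    # `keep` most learnable examples for the actual training step.
    return torch.topk(learnability(learner_loss, ref_loss), keep).indices

# Toy usage: filter a super-batch of 4096 candidates down to 1024 examples.
learner_loss = torch.rand(4096)
ref_loss = torch.rand(4096)
kept = select_sub_batch(learner_loss, ref_loss, keep=1024)
```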

Related

AI Is Already Wreaking Havoc on Global Power Systems

AI's rapid growth strains global power grids as data centers expand to meet energy demands. Major tech firms aim for green energy, but challenges persist in balancing AI's energy hunger with sustainability goals.

Taking a closer look at AI's supposed energy apocalypse

Artificial intelligence's impact on energy consumption in data centers is debated. Current data shows AI's energy use is a fraction of overall consumption, with potential growth by 2027. Efforts to enhance efficiency are crucial.

Taking a closer look at AI's supposed energy apocalypse

Artificial intelligence's energy impact, particularly in data centers, is debated. AI's energy demands are significant but only a fraction of overall data center consumption. Efforts to enhance AI cost efficiency could mitigate energy use concerns.

JEPA (Joint Embedding Predictive Architecture)

Yann LeCun's Joint Embedding Predictive Architecture (JEPA) enhances AI by emphasizing world models, self-supervised learning, and abstract representations. JEPA predicts future states by transforming inputs into abstract representations, handling uncertainty, and enabling complex predictions through multistep or hierarchical structures. Several models like I-JEPA, MC-JEPA, and V-JEPA have been developed to process visual data and improve AI's understanding of images and videos, moving towards human-like interaction with the world.
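
As a rough illustration of the joint-embedding idea (the modules, sizes, and views below are toy assumptions, not the actual I-JEPA architecture), the prediction and loss live in the abstract representation space rather than in pixel space:

```python
import torch
from torch import nn

embed_dim = 128
# Encoders map raw inputs (here, flattened 28x28 "images") to abstract representations.
context_encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, embed_dim))
target_encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, embed_dim))
# The predictor maps the context embedding to a prediction of the target embedding.
predictor = nn.Sequential(nn.Linear(embed_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, embed_dim))

context_view = torch.randn(32, 784)   # e.g. the visible region of an image
target_view = torch.randn(32, 784)    # e.g. the masked region to be predicted

pred = predictor(context_encoder(context_view))
with torch.no_grad():                 # target embeddings are treated as fixed targets here
    target = target_encoder(target_view)
loss = nn.functional.mse_loss(pred, target)   # compare representations, not pixels
```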

Can the climate survive the insatiable energy demands of the AI arms race?

Google's emissions spike 50% in 5 years due to AI energy needs, posing climate challenges. Tech firms invest in renewables, but face infrastructure hurdles. AI advancements may paradoxically drive energy consumption.

6 comments
By @vessenes - 3 months
So the paper itself is pretty significant, I think, from looking at it. The general methodology seems to be: train a small model as a discriminative scoring model on very high-quality data (JEST is mostly concerned with multi-modal tasks, it seems, so think image/text caption pairs), have that model score 'maximally learnable' batches from a larger, lower-quality dataset, then train the big model using that scoring.
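
A rough sketch of that two-stage flow (the models, losses, and sizes below are toy placeholders, not the paper's actual setup):

```python
import torch
from torch import nn, optim

def per_example_loss(model, batch):
    # Placeholder standing in for a per-pair multimodal contrastive loss.
    return model(batch).pow(2).mean(dim=1)

# Stage 1: a small scoring/reference model, assumed already trained on a
# small, curated, high-quality dataset.
ref_model = nn.Linear(64, 16)

# Stage 2: train the big model on a larger, noisier dataset, stepping only
# on the sub-batches the scoring model marks as most learnable.
big_model = nn.Linear(64, 16)
opt = optim.SGD(big_model.parameters(), lr=0.1)

for _ in range(10):                                # toy training loop
    super_batch = torch.randn(512, 64)             # large candidate batch
    with torch.no_grad():
        learner_loss = per_example_loss(big_model, super_batch)
        ref_loss = per_example_loss(ref_model, super_batch)
        keep = torch.topk(learner_loss - ref_loss, 128).indices
    loss = per_example_loss(big_model, super_batch[keep]).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```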

This turns out to be a significant FLOPs and quality win, even accounting for the initial model training and scoring: they claim roughly a 10x improvement in the quality/FLOP tradeoff, and they show numbers that significantly beat SOTA on some tasks at their model size.

The bad part, to me, is that this is significant engineering: it requires known high-quality datasets, training the scoring model, and selecting and scoring the data for the big training run. This is not a bold new leap that's going to be easy for hobbyists to implement; it's a practitioner's excellent engineering showing the way forward for certain training needs.

As always, I appreciate the publishing from DeepMind; this looks like great work. It would be nice to see a company like together.ai or others turn it into a usable pipeline; it might be a while, though. It looks relatively gnarly in the details on the data and scoring side.

By @morbicer - 3 months
Nice. Google scientists come up with a groundbreaking idea, then Google's PMs bungle the chance to bring it to market and productize it, and someone like OpenAI or Anthropic swoops in to reap the rewards. And the cycle repeats.

DeepMind people invent transformers, and then they watch people laugh at Bard, or whatever it's called nowadays, because product and engineering lost the plot. Kodak is paging you a message from the grave; read it, Google.

By @eutropia - 3 months
By @kelseyfrog - 3 months
Great, improvements in efficiency will lead to greater resource consumption due to Jevons Paradox[1].

1. https://en.wikipedia.org/wiki/Jevons_paradox

By @ricopags - 3 months
Pretty similar to Cappy: https://arxiv.org/abs/2311.06720

By @swax - 3 months
AI advancement is coming at us both ways: orders of magnitude more compute, and orders of magnitude more efficiency. Hyper-exponential.