December 12th, 2024

AI Scaling Laws

The article examines AI scaling laws, emphasizing ongoing investments by major labs, the importance of new paradigms for model performance, and the need for better evaluations amid existing challenges.

The article discusses the current state and future of AI scaling laws, particularly in relation to large language models (LLMs) and the challenges faced in pre-training and inference. Despite skepticism surrounding the effectiveness of scaling laws, major AI labs and hyperscalers continue to invest heavily in infrastructure, indicating a belief in the ongoing relevance of these laws. The authors highlight the importance of new paradigms in scaling, such as reasoning models and synthetic data generation, which are proving to be effective in enhancing model performance. They also address the limitations of existing benchmarks and the need for more rigorous evaluations to measure progress accurately. The report emphasizes that while challenges exist, including data scarcity and the need for improved training methodologies, the overall trajectory of AI development remains positive. The authors argue that scaling will continue to evolve, drawing parallels to the historical advancements in computing, and suggest that the industry is on the brink of significant breakthroughs in AI capabilities.
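
For reference, the quantitative form these discussions usually rest on is a Chinchilla-style power law relating loss to parameter count and training tokens. A minimal sketch in Python, with illustrative coefficients (roughly the Hoffmann et al. 2022 fit, not numbers taken from the article):

```python
# Minimal sketch of a Chinchilla-style scaling law: loss as a power law in
# parameter count N and training tokens D. Coefficients are illustrative
# (approximately the Hoffmann et al. 2022 fit), not taken from the article.

def predicted_loss(n_params: float, n_tokens: float,
                   e: float = 1.69, a: float = 406.4, b: float = 410.7,
                   alpha: float = 0.34, beta: float = 0.28) -> float:
    """L(N, D) = E + A / N**alpha + B / D**beta."""
    return e + a / n_params**alpha + b / n_tokens**beta

# Example: a 70B-parameter model trained on 1.4T tokens (a Chinchilla-scale run).
print(predicted_loss(70e9, 1.4e12))
```

Because the parameter and data terms are additive, whichever is smaller eventually dominates the loss, which is why data scarcity and synthetic data generation come up repeatedly below.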

- Major AI labs are investing heavily in infrastructure, indicating confidence in scaling laws.

- New paradigms like reasoning models and synthetic data generation are crucial for improving model performance.

- Existing benchmarks are inadequate, necessitating the development of more rigorous evaluations.

- Challenges such as data scarcity and training methodologies need to be addressed for continued progress.

- The evolution of scaling laws parallels historical advancements in computing, suggesting future breakthroughs in AI.

1 comment
By @tikkun - 2 months
SemiAnalysis consistently does deep technical posts like this. Worth subscribing.

My notes:

Scaling is continuing: Amazon's 400k Trainium2 chips, Meta's 2 GW datacenter, OpenAI's multi-datacenter training.

Opus 3.5 training succeeded, but it's more profitable to use it to train Sonnet 3.5 and serve that instead. Large models are now teachers, not necessarily end products: too expensive to serve to end users relative to what they'll pay, but great for improving smaller models that are cheaper and faster to serve.
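
[Illustration, not from the article: one standard way a large "teacher" improves a smaller model is distillation. A minimal PyTorch sketch of a KL-based distillation loss, assuming teacher and student logits are already available; whatever Anthropic actually does is not public.]

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student
    next-token distributions, averaged over the batch."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # Scale by t**2 so gradient magnitude stays comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * (t * t)

# Example with random logits over a 32k-token vocabulary.
student = torch.randn(4, 128, 32000)
teacher = torch.randn(4, 128, 32000)
print(distillation_loss(student, teacher).item())
```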

Orion (GPT-5) is being used for training data generation and in verifier/reward models. They say it's not economical to serve to end users until Blackwell chips (B200).
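
[Illustration, not from the article: a verifier/reward model is typically a pretrained LM backbone with a scalar head that scores candidate completions. A minimal sketch of such a head and of ranking candidates by score; the shapes and sizes here are made up.]

```python
import torch
import torch.nn as nn

class RewardHead(nn.Module):
    """Maps the final hidden state of a sequence to a scalar reward.
    A reward model is usually a pretrained LM backbone plus a head like
    this, trained on preference or correctness labels."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, last_hidden_state: torch.Tensor) -> torch.Tensor:
        # Use the representation of the final token as the sequence summary.
        return self.score(last_hidden_state[:, -1, :]).squeeze(-1)

# Rank candidate answers by reward: higher score = preferred answer.
hidden = torch.randn(8, 256, 4096)   # 8 candidates, 256 tokens, hidden size 4096
rewards = RewardHead(4096)(hidden)
best = rewards.argmax().item()
print(f"best candidate: {best}, reward: {rewards[best].item():.3f}")
```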

Models that can explore reasoning chains get smarter on certain kinds of problems. [My note, not from article: Math, science, law, programming. R&D, law and programming are perhaps the industries that are willing to pay more for higher reliability.]

Scaling with "berry training" - monte carlo tree search generating thousands of different answer trajectories, then uses functional verifiers to get rid of the ones that didn't arrive at the correct answer.

Big focus is on making inference cheaper and faster. [My note: If you want to work in AI, I imagine any research on LLM inference cost and speed will be highly valuable.]
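
[Illustration, not from the article: a back-of-envelope reason inference work matters. Autoregressive decoding is usually memory-bandwidth bound, so a rough upper bound on single-stream tokens/second is bandwidth divided by model bytes; the hardware numbers below are approximate public H100 figures, used only for illustration.]

```python
# Back-of-envelope decode speed: generating one token requires reading every
# (active) weight once, so a rough upper bound on single-stream tokens/second
# is memory bandwidth / model bytes. Illustrative numbers only.

hbm_bandwidth_bytes_s = 3.35e12      # ~3.35 TB/s, approximate H100 SXM HBM3 bandwidth
n_params = 70e9                      # a 70B-parameter dense model
bytes_per_param = 2                  # FP16/BF16 weights

weight_bytes = n_params * bytes_per_param
max_tokens_per_s = hbm_bandwidth_bytes_s / weight_bytes
print(f"~{max_tokens_per_s:.0f} tokens/s upper bound per decode stream")
# Batching amortizes the weight reads across users, which is one reason
# providers push hard on larger batches and cheaper, faster serving.
```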