August 29th, 2024

100M Token Context Windows

Magic has developed LTM models that can process 100 million tokens of context, aimed at software development. The post covers its efficient LTM-2-mini model, the new HashHop evaluation, and a partnership with Google Cloud.


Magic has made significant advances in ultra-long-context models with its LTM (Long-Term Memory) models, which can attend to up to 100 million tokens of context at inference time. The work targets software development, where a model could draw on an entire codebase, its documentation, and its libraries at once. To measure this capability, Magic introduced HashHop, an evaluation that tests whether a model can store and retrieve information without relying on semantic hints. The company has trained its first 100M-token context model, LTM-2-mini, which handles such contexts far more cheaply than attention-based models like Llama 3.1. Magic has also partnered with Google Cloud to build supercomputers for training and deploying its models, has raised $465 million to date (including a recent $320 million round from notable investors), and is focused on improving inference-time compute while actively hiring engineers and researchers.
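
To make the HashHop idea concrete, here is a minimal sketch of what such an evaluation prompt could look like, based only on the description above (random hash pairs that the model must chain together, with no semantic cues to help retrieval). The function names and parameters are illustrative, not Magic's actual benchmark code.

```python
# Hypothetical HashHop-style prompt generator (illustrative only).
# The model sees shuffled "hashA = hashB" pairs and must follow the chain
# for a fixed number of hops; the random hashes carry no semantic hints.
import random
import secrets

def make_hashhop_prompt(num_pairs: int = 1000, hops: int = 3):
    # Build one chain h0 -> h1 -> ... of random hex strings.
    hashes = [secrets.token_hex(8) for _ in range(num_pairs + 1)]
    pairs = list(zip(hashes[:-1], hashes[1:]))
    random.shuffle(pairs)  # shuffled so position gives nothing away

    context = "\n".join(f"{a} = {b}" for a, b in pairs)
    question = (
        f"Starting from {hashes[0]}, follow the assignments for {hops} hops "
        f"and output only the final hash."
    )
    expected = hashes[hops]  # ground-truth answer for scoring
    return context + "\n\n" + question, expected

prompt, expected = make_hashhop_prompt(num_pairs=100, hops=2)
```

Scoring a completion can then be as simple as an exact string match against `expected`.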

- Magic's LTM models can process up to 100 million tokens of context, enhancing software development applications (a rough scale check follows this list).

- The new HashHop evaluation measures a model's ability to store and retrieve information without relying on semantic hints.

- LTM-2-mini handles long contexts far more cheaply than existing attention-based models like Llama 3.1, requiring much less compute.

- Magic has partnered with Google Cloud to build advanced supercomputers for AI model training.

- The company has raised $465 million in funding and is hiring to accelerate its AI development efforts.
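
For a sense of scale, a 100M-token window is large enough to hold a sizeable codebase in its entirety. A rough way to check this for a given repository is sketched below; it uses OpenAI's tiktoken tokenizer as a stand-in (an assumption, since Magic's tokenizer is not public), so the count is only an order-of-magnitude estimate.

```python
# Rough check of how much of a 100M-token context budget a repository uses.
# tiktoken's cl100k_base encoding is a stand-in tokenizer (assumption), so
# treat the result as approximate.
import os
import tiktoken

BUDGET = 100_000_000  # 100M tokens
enc = tiktoken.get_encoding("cl100k_base")

def count_repo_tokens(root=".", exts=(".py", ".ts", ".js", ".go", ".md")):
    total = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(exts):
                path = os.path.join(dirpath, name)
                try:
                    with open(path, encoding="utf-8", errors="ignore") as f:
                        total += len(enc.encode(f.read()))
                except OSError:
                    continue
    return total

used = count_repo_tokens()
print(f"{used:,} tokens used; {BUDGET - used:,} of the 100M budget remaining")
```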

8 comments
By @shazami - 6 months
FYI wouldn't interview here. Got rejected after a 30 minute behavioral screen after spending 8 hours on an unpaid take-home.
By @dinobones - 6 months
Long context windows are, IMO, “AGI enough.”

100M context window means it can probably store everything you’ve ever told it for years.

Couple this with multimodal capabilities, like a robot encoding vision and audio into tokens, and you can get autonomous assistants that learn your house/habits/chores really quickly.

By @smusamashah - 6 months
It should be benchmarked against something like RULER[1]

1: https://github.com/hsiehjackson/RULER (RULER: What’s the Real Context Size of Your Long-Context Language Models)

By @fsndz - 6 months
Context windows are becoming larger and larger, and I anticipate more research focusing on this trend. Could this signal the eventual demise of RAG? Only time will tell. I recently experimented with RAG and the limitations are often surprising (https://www.lycee.ai/blog/rag-fastapi-postgresql-pgvector). I wonder if we will see some of the same limitations with long-context LLMs. In-context learning is probably a form of arithmetic over semantic/lexical cues.
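
For readers unfamiliar with that kind of setup, the retrieval step in a PostgreSQL + pgvector RAG pipeline typically boils down to a single nearest-neighbour query; the sketch below is a generic illustration (table and column names are made up, not taken from the linked post).

```python
# Generic pgvector retrieval step for a RAG pipeline (illustrative names).
# Assumes a documents(id, content, embedding vector) table already populated
# with embeddings from whatever embedding model you use.
import psycopg2

def retrieve_top_k(query_embedding, k=5, dsn="dbname=rag"):
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT id, content
            FROM documents
            ORDER BY embedding <=> %s::vector  -- cosine distance in pgvector
            LIMIT %s
            """,
            (str(query_embedding), k),
        )
        rows = cur.fetchall()
    conn.close()
    return rows
```

The long-context alternative the comment alludes to would skip this step entirely and place the documents directly in the prompt.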
By @Sakos - 6 months
I was wondering how they could afford 8,000 H100s, but I guess I accidentally skipped over this part:

> We’ve raised a total of $465M, including a recent investment of $320 million from new investors Eric Schmidt, Jane Street, Sequoia, Atlassian, among others, and existing investors Nat Friedman & Daniel Gross, Elad Gil, and CapitalG.

Yeah, I guess that'd do it. Who are these people and how'd they convince them to invest that much?

By @anonzzzies - 6 months
What is the state of the art on context length for open models? Magic won't be open, I guess, after getting $500M in VC money.
By @samber - 6 months
Based on Mamba?
By @htrp - 6 months
Does anyone have a detailed tech breakdown of these guys? Not quite sure how their LTM architecture works.