January 13th, 2025

Titans: Learning to Memorize at Test Time

The "Titans" paper presents a neural memory module that enhances attention mechanisms, outperforming Transformers and linear models in tasks requiring large context windows, achieving higher accuracy in various applications.


The paper titled "Titans: Learning to Memorize at Test Time" introduces a neural long-term memory module designed to enhance attention-based models. Traditional recurrent models compress data into a fixed-size memory, while attention mechanisms capture dependencies across the entire context window but incur quadratic cost, which restricts context length. The proposed Titans architecture combines short-term attention with a long-term memory that keeps learning during inference: the memory is updated at test time, driven by how surprising the incoming data is, so historical context can be used without re-attending to it. The authors present three variants of the Titans architecture, differing in how the memory is incorporated, and evaluate them on language modeling, common-sense reasoning, genomics, and time series tasks. Experimental results indicate that Titans outperform both Transformers and modern linear recurrent models, and remain accurate on tasks whose context windows exceed 2 million tokens. This suggests a meaningful improvement in handling long-range dependencies at scale.

- The Titans architecture integrates short-term attention and long-term memory for improved performance.

- It allows for effective utilization of historical context during inference.

- Experimental results show Titans outperform Transformers and linear recurrent models.

- The architecture can handle context windows larger than 2 million tokens.

- Titans demonstrate higher accuracy in complex tasks requiring extensive context.
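
To make the "memorize at test time" idea more concrete, here is a minimal sketch of a surprise-driven memory update. It assumes a toy MLP memory, illustrative key/value projections, and made-up hyperparameters (theta, eta, alpha); it shows the general pattern described above, not the paper's exact update rule.

```python
# Minimal sketch: a small memory network updated online at test time, driven by
# a "surprise" signal (the gradient of an associative loss), with momentum and
# a decay (forgetting) term. All names and hyperparameters are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)

d_model = 32
memory = nn.Sequential(nn.Linear(d_model, d_model), nn.SiLU(), nn.Linear(d_model, d_model))
params = list(memory.parameters())

# Fixed projections mapping an incoming token representation to a key/value pair.
W_k = nn.Linear(d_model, d_model, bias=False)
W_v = nn.Linear(d_model, d_model, bias=False)

theta, eta, alpha = 0.1, 0.9, 0.01                 # inner-loop lr, surprise momentum, forget rate
momentum = [torch.zeros_like(p) for p in params]

def memorize(x_t):
    """One test-time step: push the memory toward mapping key(x_t) -> value(x_t)."""
    k, v = W_k(x_t), W_v(x_t)
    loss = ((memory(k) - v) ** 2).mean()           # "surprise": how badly the memory predicts v from k
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():
        for p, m, g in zip(params, momentum, grads):
            m.mul_(eta).add_(g, alpha=-theta)      # accumulate surprise with momentum
            p.mul_(1.0 - alpha).add_(m)            # decayed (forgetting) parameter update
    return loss.item()

# Stream a few tokens through the memory at "test time".
for step in range(5):
    x_t = torch.randn(d_model)
    print(step, round(memorize(x_t), 4))
```

The point of the sketch is only that the memory's parameters change during inference, with the size of each update tied to how poorly the memory already predicts the incoming data.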

2 comments
By @gwern - 3 months
By @cs702 - 3 months
Interesting. I like the idea of a meta-mechanism that learns to update an associative memory based on how surprising the data is. The other stuff, reading memory via keys and values and selectively erasing it with gating, looks pretty conventional at first glance. Thank you for sharing this on HN. I've added it to my reading list.

EDIT: I'm reminded of this other type of associative memory: https://github.com/glassroom/heinsen_routing. The idea there is to compute a mixture of memories that best predicts the given input sequence. Quite frankly, I don't remember how the whole thing works, but I do remember that it works. It's been a while since I used it, so YMMV. In any case, it may be of interest to you.
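
The key/value reads and gated erasure mentioned in the comment above can be pictured with a toy matrix-valued associative memory. The sketch below is my own illustration and is not specific to Titans or to heinsen_routing.

```python
# Toy associative memory: a matrix M stores key -> value associations.
# Writing partially erases whatever is stored under a key (scaled by a gate),
# then adds the new association; reading projects a query onto the memory.
import torch

torch.manual_seed(0)
d = 16
M = torch.zeros(d, d)                      # matrix memory: value = M @ key

def write(M, k, v, gate):
    """Gated write: selectively erase what is stored under key k, then store v."""
    k = k / k.norm()
    old_v = M @ k                          # value currently associated with k
    M = M - gate * torch.outer(old_v, k)   # selective erasure, scaled by the gate
    return M + torch.outer(v, k)           # write the new association

def read(M, q):
    """Read by projecting a (normalized) query onto the stored associations."""
    return M @ (q / q.norm())

k, v = torch.randn(d), torch.randn(d)
M = write(M, k, v, gate=1.0)
print(torch.allclose(read(M, k), v, atol=1e-5))   # recovers v for the matching key
```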