July 29th, 2024

Trillion-Parameter Sequential Transducers for Generative Recommendations

A new paper introduces HSTU, a trillion-parameter architecture for generative recommendations that significantly outperforms existing models in efficiency and effectiveness, with implications for large-scale deployment and a reduced carbon footprint.

The paper "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" presents a new approach to large-scale recommendation systems, which often struggle with high-cardinality, heterogeneous features. The authors, led by Jiaqi Zhai, propose an architecture called HSTU (Hierarchical Sequential Transduction Units), which reformulates recommendation tasks as sequential transduction problems within a generative modeling framework and is designed to handle non-stationary streaming recommendation data efficiently. The results indicate that HSTU significantly outperforms existing models, achieving up to 65.8% improvement in NDCG and running 5.3x to 15.2x faster than FlashAttention2-based Transformers on long sequences. With 1.5 trillion parameters, HSTU-based Generative Recommenders delivered a 12.4% improvement in online A/B tests and have been deployed on platforms serving billions of users. The study also shows that model quality scales as a power-law of training compute, which could reduce the carbon footprint of future model development. This research paves the way for foundation models in recommendation, potentially transforming how recommendations are generated in large-scale applications. The paper spans 26 pages with 13 figures, and the code has been released for further exploration.
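To make the sequential-transduction framing concrete, here is a minimal, hypothetical sketch (not the authors' released code): a user's chronological action sequence is modeled autoregressively, and the training target is simply the next item the user interacts with. A vanilla causal Transformer encoder stands in for the HSTU blocks, and names such as GenerativeRecommender and num_items are illustrative.

```python
# Illustrative sketch of recommendation as next-item (sequential transduction)
# prediction. A vanilla causal Transformer stands in for the paper's HSTU blocks.
import torch
import torch.nn as nn

class GenerativeRecommender(nn.Module):  # hypothetical name
    def __init__(self, num_items: int, d_model: int = 64, n_layers: int = 2):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, num_items)  # scores over the item catalog

    def forward(self, item_ids: torch.Tensor) -> torch.Tensor:
        # item_ids: (batch, seq_len) chronological user actions
        seq_len = item_ids.size(1)
        causal_mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
        h = self.encoder(self.item_emb(item_ids), mask=causal_mask)
        return self.head(h)  # (batch, seq_len, num_items) next-item logits

# Toy training step: predict action t+1 from actions 1..t.
model = GenerativeRecommender(num_items=1000)
seq = torch.randint(0, 1000, (8, 20))       # 8 users, 20 actions each
logits = model(seq[:, :-1])                  # condition on the prefix
loss = nn.functional.cross_entropy(
    logits.reshape(-1, 1000), seq[:, 1:].reshape(-1)
)
loss.backward()
```

The actual paper replaces the standard Transformer block with its HSTU unit and adds the systems work needed to scale this framing to 1.5 trillion parameters.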

Related

Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs

The study presents a method to boost Large Language Models' retrieval and reasoning abilities on long-context inputs by fine-tuning them on a synthetic dataset, showing significant improvements in both skills.

The Illustrated Transformer

Jay Alammar's blog explores the Transformer model, highlighting the attention mechanism that enables faster, more parallelizable training. The Transformer outperforms Google's Neural Machine Translation model on some tasks, and the blog breaks down components like self-attention and multi-headed attention for easier understanding.
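For readers who want the core mechanism in runnable form, below is a minimal single-head scaled dot-product self-attention sketch in the spirit of the blog's diagrams; it is illustrative only, with randomly initialized projection matrices rather than trained weights.

```python
# Minimal single-head scaled dot-product self-attention. Illustrative sketch,
# not code from The Illustrated Transformer.
import math
import torch

def self_attention(x: torch.Tensor, wq, wk, wv) -> torch.Tensor:
    # x: (seq_len, d_model); wq/wk/wv: (d_model, d_k) projection matrices
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / math.sqrt(k.size(-1))   # pairwise compatibility
    weights = torch.softmax(scores, dim=-1)    # each row sums to 1
    return weights @ v                         # weighted sum of values

d_model, d_k, seq_len = 16, 8, 5
x = torch.randn(seq_len, d_model)
wq, wk, wv = (torch.randn(d_model, d_k) for _ in range(3))
out = self_attention(x, wq, wk, wv)            # (seq_len, d_k)
```

Multi-headed attention repeats this computation with several independent projection sets and concatenates the results, letting the model attend to different relationships in parallel.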

Math Behind Transformers and LLMs

This post introduces transformers and large language models in the context of OpenGPT-X. It explains how language models work, their training processes and computational demands, the role of GPUs, and why the transformer architecture dominates modern NLP.

Learning to (Learn at Test Time): RNNs with Expressive Hidden States

The paper introduces Test-Time Training (TTT) layers for sequence modeling: the hidden state is itself a small model that is updated by self-supervised learning on the test sequence, giving linear complexity in sequence length. TTT-Linear outperforms the Transformer baseline, while TTT-MLP shows potential for long contexts.
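As a rough intuition for how a TTT layer differs from attention, here is a toy sketch under simplifying assumptions: the hidden state is a single weight matrix W, and each incoming token triggers one gradient step on a plain reconstruction loss. The real TTT layers use learned projections, mini-batched updates, and other refinements, so treat this purely as an illustration.

```python
# Toy test-time-training (TTT) style scan: the "hidden state" is a weight
# matrix W, and processing each token takes one gradient step on a
# self-supervised reconstruction loss. Simplified; not the paper's exact rule.
import torch

def ttt_linear_scan(tokens: torch.Tensor, lr: float = 0.1) -> torch.Tensor:
    # tokens: (seq_len, d); returns outputs of shape (seq_len, d)
    seq_len, d = tokens.shape
    W = torch.zeros(d, d)                 # hidden state = parameters of f(x) = x @ W
    outputs = []
    for x in tokens:
        outputs.append(x @ W)             # "read" with the current hidden state
        # self-supervised step: nudge f toward reconstructing the token just seen
        grad = torch.outer(x, x @ W - x)  # d/dW of 0.5 * ||x @ W - x||^2
        W = W - lr * grad                 # "write": update the hidden state
    return torch.stack(outputs)

out = ttt_linear_scan(torch.randn(12, 4))  # cost grows linearly with length
```

Because the state is a fixed-size set of parameters updated in place, the cost per token is constant and total cost grows linearly with sequence length, unlike the quadratic cost of full self-attention.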

XLSTMTime: Long-Term Time Series Forecasting with xLSTM

The paper introduces xLSTMTime, an xLSTM-based architecture for long-term time series forecasting. It addresses challenges faced by transformer models and shows superior performance on real-world datasets. Authors: Musleh Alharthi and Ausif Mahmood.
