November 28th, 2024

LASER: Attention with Exponential Transformation

The paper "LASER: Attention with Exponential Transformation" presents a new attention mechanism that enhances gradient signals in Transformers, improving performance in various tasks and is under review for ICLR 2025.

The paper "LASER: Attention with Exponential Transformation," by Sai Surya Duvvuri and Inderjit S. Dhillon, introduces a new attention mechanism designed to improve how Transformers learn on sequence tasks. The authors analyze a limitation of traditional softmax dot-product attention: it tends to produce small gradient signals during backpropagation, which can hinder effective learning. To address this, they propose LASER attention, which they show admits a larger gradient signal and can be integrated into existing attention implementations with minor modifications. Experimentally, LASER improves autoregressive large language models (LLMs) with up to 2.2 billion parameters, yielding an average improvement of roughly 1%, and up to 3.38%, on downstream evaluations. Other reported gains include a 4.67% accuracy increase for Vision Transformers on ImageNet, a 2.25% reduction in error rate for the Conformer on LibriSpeech speech-to-text, and a 0.93% decrease in the fraction of incorrect predictions for BERT. The paper is currently under review for ICLR 2025.
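
As the title suggests, the core idea is to run standard softmax attention in an exponentially transformed value space and map the result back through a logarithm, which leaves the forward pass close to ordinary attention while changing how gradients flow. The sketch below is a minimal, unofficial PyTorch rendering of that idea, not the authors' reference implementation; it assumes the output is computed as log(softmax(QKᵀ/√d) · exp(V)) and omits masking, dropout, and multi-head plumbing. The max-subtraction step keeps exp() from overflowing and is exact because each attention row sums to one.

```python
import torch
import torch.nn.functional as F

def laser_attention(q, k, v):
    """Minimal LASER attention sketch (shapes: batch, heads, seq, dim).

    Standard scaled dot-product attention weights are applied to
    exp(V) instead of V, and the output is mapped back with a log.
    Masking and dropout are omitted for clarity.
    """
    d = q.size(-1)
    # Ordinary scaled dot-product attention weights.
    attn = F.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
    # Stabilize exp(): subtract the per-channel max of V over the
    # sequence dimension. Since each attention row sums to 1, the
    # shift can be added back after the log (log-weighted-sum-exp).
    m = v.amax(dim=-2, keepdim=True)
    return m + torch.log(attn @ torch.exp(v - m))
```

Used as a drop-in replacement for the usual `attn @ v` step, this touches only a few lines of an existing attention implementation, which is consistent with the paper's claim that LASER requires minor modifications to integrate.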

- LASER is a new attention mechanism that improves gradient signal strength in Transformers.

- The mechanism can be implemented with minor modifications to existing attention systems.

- Experimental results show significant performance improvements across various tasks, including vision and speech.

- The paper is under review for ICLR 2025.
