Sorbet: A neuromorphic hardware-compatible transformer-based spiking model
The paper presents Sorbet, a transformer-based spiking language model aimed at energy-efficient inference in resource-constrained environments; it replaces costly softmax and layer-normalization operations with the shift-based PTsoftmax and BSPN to cut energy consumption while preserving competitive performance.
The paper titled "Sorbet: A Neuromorphic Hardware-Compatible Transformer-Based Spiking Language Model" introduces a language model designed for deployment in resource-constrained environments with a focus on energy efficiency. The authors, Kaiwen Tang, Zhanglu Yan, and Weng-Fai Wong, highlight the difficulty of implementing softmax and layer normalization, two operations essential to transformer-based models, in spiking neural networks (SNNs). To overcome this, Sorbet employs a novel shifting-based softmax called PTsoftmax and a bit-shifting power normalization technique (BSPN), both intended to replace their energy-intensive counterparts. The model also uses knowledge distillation and model quantization to produce a highly compressed binary-weight model that retains competitive performance while significantly reducing energy consumption. Sorbet's effectiveness is validated through extensive testing on the GLUE benchmark and ablation studies, showcasing its potential as an energy-efficient solution for language model inference.
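The paper's exact PTsoftmax formulation is not reproduced here, but the general idea of a shift-friendly softmax can be illustrated: replace e^x with base-2 exponentiation of integer-rounded inputs (realizable as bit shifts) and round the normalizing sum up to a power of two so the division also becomes a shift. The code below is a minimal sketch under those assumptions, not the authors' implementation; a similar sketch for BSPN follows the summary bullets below.

```python
import numpy as np

def pt_softmax_sketch(x: np.ndarray) -> np.ndarray:
    """Illustrative shift-friendly softmax (a sketch, not the paper's exact PTsoftmax).

    Replaces e^x with 2**floor(x) so the numerator is a bit shift for integer inputs,
    and rounds the denominator up to a power of two so the division is a right shift.
    """
    x = x - x.max(axis=-1, keepdims=True)          # standard max-subtraction for stability
    x_int = np.floor(x).astype(np.int64)           # integer exponents: 2**x_int is a shift
    numer = np.power(2.0, x_int)                   # hardware analogue: 1 << x_int (or >> for negatives)
    denom = numer.sum(axis=-1, keepdims=True)
    denom_pow2 = 2.0 ** np.ceil(np.log2(denom))    # snap the denominator up to a power of two
    return numer / denom_pow2                      # hardware analogue: a right shift

print(pt_softmax_sketch(np.array([1.0, 2.0, 3.0])))
```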
- Sorbet is designed for resource-constrained devices, emphasizing energy efficiency.
- It introduces PTsoftmax and BSPN to address challenges in implementing softmax and layer normalization on SNNs.
- The model achieves a highly compressed binary weight format through knowledge distillation and quantization.
- Extensive testing on the GLUE benchmark demonstrates Sorbet's competitive performance.
- The research highlights the potential of neuromorphic hardware for language model applications.
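BSPN is described as a power-normalization variant built on bit shifts. Again only as a rough sketch (not the authors' exact method), one can normalize by a root-mean-square statistic, as in PowerNorm-style schemes, and snap that divisor to the nearest power of two so the per-element division reduces to an arithmetic shift:

```python
import numpy as np

def bspn_sketch(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Illustrative shift-based power normalization (a sketch, not the paper's exact BSPN).

    Normalizes by an RMS statistic over the feature dimension and rounds the divisor
    to the nearest power of two so the division can be realized as a shift.
    """
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)  # quadratic-mean statistic
    rms_pow2 = 2.0 ** np.round(np.log2(rms))                     # snap the divisor to a power of two
    return x / rms_pow2                                          # hardware analogue: an arithmetic shift

print(bspn_sketch(np.array([0.5, -1.0, 2.0])))
```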
Related
Researchers run high-performing LLM on the energy needed to power a lightbulb
Researchers at UC Santa Cruz developed an energy-efficient method for large language models. By using custom hardware and ternary numbers, they achieved high performance with minimal power consumption, potentially revolutionizing model power efficiency.
Efficient Execution of Structured Language Model Programs
SGLang is a new system for executing complex language model programs, featuring a frontend language and runtime optimizations. It offers significant throughput improvements and is publicly available for further exploration.
Launch HN: Deepsilicon (YC S24) – Software and hardware for ternary transformers
Abhi and Alex from deepsilicon are developing custom silicon for ternary transformer models to enhance performance, reduce hardware demands, and improve efficiency, while seeking feedback on their approach and deployment interest.
Fine-Tuning LLMs to 1.58bit
BitNet introduces extreme quantization for large language models, achieving 1.58 bits per parameter and improving efficiency, particularly when fine-tuning Llama3 8B models within existing frameworks; a rough sketch of this style of ternary quantization follows after this list.
MIT Researchers Unveil New Method to Improve LLM Inference Performance
A new algorithm, L-Mul, approximates floating point multiplication using integer addition, reducing energy costs by up to 95% while maintaining precision, potentially enhancing the sustainability of language models.
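The 1.58-bit figure in the BitNet item (and the ternary weights in the deepsilicon item) corresponds to constraining each weight to {-1, 0, +1}, i.e. log2(3) ≈ 1.58 bits per parameter. Below is a minimal sketch of absmean ternary quantization in that spirit; it is illustrative only, and the linked projects' exact recipes may differ.

```python
import numpy as np

def ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Absmean ternary quantization sketch (BitNet-b1.58-style; details vary by project).

    Maps each weight to {-1, 0, +1} (log2(3) ~= 1.58 bits) with one per-tensor scale,
    so matrix multiplies reduce to additions and subtractions.
    """
    scale = np.mean(np.abs(w)) + eps              # per-tensor absmean scale
    w_q = np.clip(np.round(w / scale), -1, 1)     # ternary codes in {-1, 0, +1}
    return w_q.astype(np.int8), scale             # dequantize as w_q * scale

w = np.random.randn(4, 4).astype(np.float32)
w_q, s = ternary_quantize(w)
w_hat = w_q * s                                   # coarse reconstruction of w
```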
In particular, the connection between the usual weighted-sum-plus-activation neuron and a simplistic spiking model in which the output is read simply as the spiking rate was illuminating (section 3).
[1]: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9313413/ Spiking Neural Networks and Their Applications: A Review
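That rate-coding correspondence can be made concrete with a toy example (a sketch under simple assumptions: constant input current and an integrate-and-fire neuron with reset-by-subtraction; the function names are chosen here for illustration). Over many timesteps the neuron's spike rate approximates ReLU(w·x) scaled by the firing threshold.

```python
import numpy as np

def relu_neuron(x: np.ndarray, w: np.ndarray) -> float:
    """Conventional artificial neuron: weighted sum followed by ReLU."""
    return max(0.0, float(np.dot(w, x)))

def if_spike_rate(x: np.ndarray, w: np.ndarray, timesteps: int = 1000, threshold: float = 1.0) -> float:
    """Integrate-and-fire neuron driven by the same weighted input at every timestep.

    With rate coding, the fraction of timesteps on which it spikes approximates
    ReLU(w.x) / threshold (saturating at one spike per step).
    """
    v, spikes = 0.0, 0
    drive = float(np.dot(w, x))        # constant input current per timestep
    for _ in range(timesteps):
        v += drive                     # integrate
        if v >= threshold:             # fire, then reset by subtraction
            spikes += 1
            v -= threshold
    return spikes / timesteps

x = np.array([0.2, 0.1, 0.4])
w = np.array([0.5, -0.3, 0.8])
print(relu_neuron(x, w), if_spike_rate(x, w))   # the spike rate tracks the ReLU output
```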