Self-Compressing Neural Networks
The paper "Self-Compressing Neural Networks" presents a method to reduce neural network size, maintaining accuracy while using only 3% of bits and 18% of weights, accepted for the 2023 DL-Hardware conference.
The paper titled "Self-Compressing Neural Networks" by Szabolcs Cséfalvay and James Imber addresses the challenge of reducing the size of neural networks, which significantly impacts execution time, power consumption, bandwidth, and memory usage. The authors introduce a method called Self-Compression, which aims to achieve two main objectives: eliminating redundant weights and minimizing the number of bits needed to represent the remaining weights. This is accomplished through a generalized loss function designed to reduce the overall size of the network. Experimental results indicate that it is possible to maintain floating-point accuracy while retaining only 3% of the bits and 18% of the weights in the network. The work has been accepted for presentation at the 2023 DL-Hardware Co-Design for AI Acceleration conference, highlighting its relevance in the field of machine learning and artificial intelligence. The proposed method offers a straightforward approach to optimizing neural networks without requiring specialized hardware, making it accessible for efficient training and inference.
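To make the approach concrete, here is a minimal PyTorch sketch of the idea: a linear layer with a learnable per-channel bit depth and exponent, weights fake-quantized through a straight-through estimator, and a size term added to the task loss. This is my own illustration, not the authors' code; the class name, the simplified symmetric quantization range, and the `gamma` value are assumptions.

```python
import torch
import torch.nn as nn

def ste_round(x):
    # Straight-through estimator: round in the forward pass,
    # identity gradient in the backward pass.
    return x + (torch.round(x) - x).detach()

class SelfCompressingLinear(nn.Module):
    """Hypothetical sketch of the paper's idea, not the authors' code.

    Each output channel gets a learnable bit depth and exponent. Weights are
    fake-quantized during training, and a size term (average bits per weight)
    is added to the loss. Channels whose bit depth is pushed to zero quantize
    to all zeros and could be pruned after training.
    """

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.bits = nn.Parameter(torch.full((out_features, 1), 8.0))  # learnable bit depth
        self.exp = nn.Parameter(torch.full((out_features, 1), -6.0))  # learnable exponent

    def quantized_weight(self):
        b = torch.relu(self.bits)                 # bit depth >= 0
        scale = torch.exp2(self.exp)
        qmax = torch.exp2(b) - 1                  # simplified symmetric range; b = 0 -> all zeros
        q = ste_round(self.weight / scale)
        # Clipping with differentiable bounds lets the task loss push `bits`
        # back up when a channel is being quantized too aggressively.
        q = torch.maximum(torch.minimum(q, qmax), -qmax)
        return q * scale

    def forward(self, x):
        return x @ self.quantized_weight().t()

    def size_loss(self):
        # Average bits per weight in this layer (the quantity being minimized).
        return torch.relu(self.bits).mean()

# Training-loop fragment: task loss plus a weighted network-size term.
layer = SelfCompressingLinear(32, 16)
opt = torch.optim.Adam(layer.parameters(), lr=1e-3)
x, target = torch.randn(8, 32), torch.randn(8, 16)
gamma = 0.05                                      # size/accuracy trade-off (made-up value)
for _ in range(100):
    loss = nn.functional.mse_loss(layer(x), target) + gamma * layer.size_loss()
    opt.zero_grad()
    loss.backward()
    opt.step()
```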
Related
Researchers run high-performing LLM on the energy needed to power a lightbulb
Researchers at UC Santa Cruz developed an energy-efficient method for large language models. By using custom hardware and ternary numbers, they achieved high performance with minimal power consumption, potentially revolutionizing model power efficiency.
Researchers upend AI status quo by eliminating matrix multiplication in LLMs
Researchers innovate AI language models by eliminating matrix multiplication, enhancing efficiency. A MatMul-free method reduces power consumption, costs, and challenges the necessity of matrix multiplication in high-performing models.
Training of Physical Neural Networks
Physical Neural Networks (PNNs) leverage physical systems for computation, offering potential in AI. Research explores training larger models for local inference on edge devices. Various training methods are investigated, aiming to revolutionize AI systems by considering hardware physics constraints.
A beginner's guide to LLM quantization and testing
Quantization in machine learning involves reducing model parameters to lower precision for efficiency. Methods like GGUF are explored, impacting model size and performance. Extreme quantization to 1-bit values is discussed, along with practical steps using tools like Llama.cpp for optimizing deployment on various hardware.
Accuracy is Not All You Need
The study "Accuracy is Not All You Need" highlights the limitations of solely relying on accuracy to evaluate compressed Large Language Models (LLMs). It suggests incorporating metrics like KL-Divergence and flips for a more thorough assessment.
- Several commenters highlight the importance of prior work in neural network sparsity and optimization that the paper may not have referenced.
- There is a shared interest in the potential for self-organizing models that adapt their size and structure based on the task at hand.
- Some express excitement about the implications of this research for mimicking biological processes, such as neuroplasticity.
- Questions arise regarding the advantages of this method compared to traditional post-training compression techniques.
- Commenters speculate on the future of neural networks and their potential impact on AGI development.
I'd want a model that scales up and down depending on the task given at inference, and a model that doesn't have a fixed size when training starts. Shouldn't it specialize as training progresses and it sees more tokens, growing larger where needed, without a human fixing the size beforehand?
Self-organization is a fascinating topic to me. This last year I've been working on Self-Organizing Gaussian Splats [0]. With a lot of squinting, this lives in a similar space as the Self-Compressing Neural Networks from the link above. The idea of the Gaussians was to build on Self-Organizing Maps (a lovely 90s concept; look for some GIFs if you don't know it) and use them to represent 3D scenes in a memory-efficient way, by mapping attributes onto a locally smooth 2D grid (a minimal SOM sketch follows after the link below). It's quite a simple algorithm, but it works really well, better than many far more complicated coding schemes. So this has me excited that we'll (re-)discover great methods in this space in the near future.
[0]: https://fraunhoferhhi.github.io/Self-Organizing-Gaussians/
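For readers unfamiliar with Self-Organizing Maps, here is a toy NumPy sketch of the core update (my own illustration, unrelated to the Self-Organizing Gaussians code): each sample pulls its best-matching grid cell and that cell's 2D neighbors toward it, which is what produces the locally smooth layout.

```python
import numpy as np

# Minimal Self-Organizing Map sketch: map attribute vectors onto a 2D grid
# so that neighboring grid cells end up holding similar values.
rng = np.random.default_rng(0)
grid_h, grid_w, dim = 16, 16, 3
codebook = rng.random((grid_h, grid_w, dim))   # grid of codebook vectors
data = rng.random((1000, dim))                 # attribute vectors to organize

ys, xs = np.mgrid[0:grid_h, 0:grid_w]          # grid coordinates for neighborhoods

for step, x in enumerate(data):
    # Best-matching unit: the grid cell whose vector is closest to the sample.
    d = np.linalg.norm(codebook - x, axis=-1)
    by, bx = np.unravel_index(np.argmin(d), d.shape)

    # Decaying learning rate and neighborhood radius.
    frac = 1.0 - step / len(data)
    lr = 0.5 * frac
    sigma = 1.0 + 4.0 * frac

    # Pull the winner and its grid neighbors toward the sample; pulling
    # neighbors is what makes the 2D layout locally smooth.
    influence = np.exp(-((ys - by) ** 2 + (xs - bx) ** 2) / (2 * sigma ** 2))
    codebook += lr * influence[..., None] * (x - codebook)
```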
This paper is a long way from implementing synaptic pruning/strengthening/weakening, neurogenesis, or synaptogenesis, but it's the first one I've seen where the network is self-optimizing.
https://arxiv.org/abs/1712.01312
Abstract: "We propose a practical method for L0 norm regularization for neural networks: pruning the network during training by encouraging weights to become exactly zero. Such regularization is interesting since (1) it can greatly speed up training and inference, and (2) it can improve generalization. [...]"
My take on it: I find it difficult to generalize the notion of layer removal when the bit depth of that layer goes to zero. It wouldn't be straightforward, even though the authors provide equation 5. It feels like a lot of information is missing from this work to even reproduce it, and the authors present only one case study.
I believe an implementation is required to understand the authors completely, for example, how the optimizer is modified for a layer when it is removed during training.
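As a concrete example of the bookkeeping that comment is pointing at, here is a hypothetical PyTorch sketch of removing a pruned parameter from an optimizer so stale momentum is no longer applied to it. The paper does not describe this step; the function and usage below are purely illustrative.

```python
import torch

def drop_param_from_optimizer(optimizer, dead_param):
    """Remove a pruned parameter and its state (e.g. Adam moments) from an
    optimizer. Hypothetical sketch, not something the paper specifies.
    """
    for group in optimizer.param_groups:
        group["params"] = [p for p in group["params"] if p is not dead_param]
    optimizer.state.pop(dead_param, None)

# Usage: once a layer's learned bit depth hits zero, stop the optimizer from
# updating its weight or carrying stale momentum for it.
layer = torch.nn.Linear(16, 16)
opt = torch.optim.Adam(layer.parameters(), lr=1e-3)
drop_param_from_optimizer(opt, layer.weight)
layer.weight.requires_grad_(False)
```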
If I can sort the rows of the matrix, which is probably determined by how the token embedding is set up, could I zero out weights that do not contribute at all and end up with regions of zeros I could mark and skip?
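Roughly what that could look like, as a hypothetical NumPy sketch: group the all-zero rows, run the product only on the live block, and scatter the results back.

```python
import numpy as np

# Sketch of the idea above: skip rows of W that are entirely zero.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 256))
W[rng.random(512) < 0.4] = 0.0             # pretend 40% of rows were pruned to zero

nonzero = np.any(W != 0, axis=1)           # mark rows that still contribute
order = np.argsort(~nonzero)               # non-zero rows first, zero rows last
n_live = nonzero.sum()

x = rng.standard_normal(256)
y = np.zeros(512)
y[order[:n_live]] = W[order[:n_live]] @ x  # compute only the live block, skip the zeros

assert np.allclose(y, W @ x)               # same result as the dense product
```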