Accuracy is Not All You Need
The study "Accuracy is Not All You Need" highlights the limitations of relying solely on accuracy to evaluate compressed Large Language Models (LLMs). It suggests incorporating metrics such as KL-Divergence and flips for a more thorough assessment.
The paper "Accuracy is Not All You Need" examines the limitations of using accuracy alone to evaluate compressed Large Language Models (LLMs). While current evaluation methods focus on accuracy, the study shows that even when accuracy is similar between a baseline model and its compressed counterpart, the compressed model's behavior can change substantially: individual answers flip between correct and incorrect even as aggregate accuracy stays flat. The authors therefore propose additional metrics, KL-Divergence between the models' output distributions and the rate of answer flips, to assess the quality of compressed models more effectively. Through a detailed analysis across various compression techniques, models, and datasets, they demonstrate that compressed models can diverge significantly from their baselines, especially on free-form generative tasks. The study emphasizes that evaluations of model compression should account for user-visible behavior changes, not just accuracy, and calls for a more comprehensive evaluation approach.
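To make the two proposed metrics concrete, here is a minimal sketch of how they might be computed. This is not the paper's own code: the function names, array shapes, and the toy inputs are illustrative assumptions, with flips measured over per-example correctness labels and KL-Divergence averaged over next-token probability distributions.

```python
import numpy as np

def flip_rate(baseline_correct, compressed_correct):
    """Fraction of examples whose correctness changed between models.

    Counts both directions (correct -> incorrect and incorrect -> correct),
    which is why two models can share the same accuracy yet differ a lot.
    """
    b = np.asarray(baseline_correct, dtype=bool)
    c = np.asarray(compressed_correct, dtype=bool)
    return float(np.mean(b != c))

def mean_kl_divergence(p_baseline, q_compressed, eps=1e-12):
    """Average KL(P || Q) over next-token distributions.

    p_baseline, q_compressed: arrays of shape (n_tokens, vocab_size)
    whose rows are probability distributions. eps guards against log(0).
    """
    p = np.asarray(p_baseline, dtype=float) + eps
    q = np.asarray(q_compressed, dtype=float) + eps
    p /= p.sum(axis=1, keepdims=True)
    q /= q.sum(axis=1, keepdims=True)
    return float(np.mean(np.sum(p * np.log(p / q), axis=1)))

# Toy illustration: both models score 2/4 (identical accuracy),
# yet half of the individual answers flipped.
base = [1, 1, 0, 0]
comp = [1, 0, 1, 0]
print(flip_rate(base, comp))  # 0.5
```

The toy example captures the paper's core observation: accuracy alone hides the flips, while the flip rate and the distributional KL-Divergence expose how much the compressed model's behavior has drifted from the baseline.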