Trying Kolmogorov-Arnold Networks in Practice
Recent interest in Kolmogorov-Arnold networks (KANs) stems from claims of improved accuracy and faster training. In practical testing, however, KANs could at best match neural networks at the same parameter count, and getting there required complex implementation and extensive tuning. Despite efforts to optimize the KANs, including trying alternative activation functions, simpler neural networks consistently performed as well or better with far less effort. KANs may still excel in niche cases, but neural networks remain the stronger default choice; the exercise mainly underscores the value of exploring alternatives to established architectures.
Read original article

There has been recent interest in Kolmogorov-Arnold networks (KANs) due to claims of improved accuracy and faster training compared to traditional neural networks. A practitioner decided to test KANs and found that while they can match neural networks' performance at the same parameter count, they require significant tuning and complex implementation. KANs focus on learning activation functions rather than weights, with B-splines being a common choice. Implementing KANs involved challenges like tuning, bias addition, and using learnable weight vectors. Despite efforts to optimize KANs with techniques like base functions and spline weights, simpler neural networks consistently outperformed them. The practitioner experimented with alternative activation functions and concluded that neural networks were more effective with less effort. While KANs may excel in specific niche cases, the simplicity and performance of neural networks make them a stronger default choice. Investigating alternatives to traditional neural networks remains valuable for potential advancements in AI capabilities.
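For readers unfamiliar with the construction, here is a minimal, illustrative NumPy sketch of a single KAN-style layer. This is my own simplification, not the author's Tinygrad code: every edge gets its own learned univariate function, built from a fixed SiLU base function plus a weighted sum of spline basis functions, and degree-1 "hat" functions stand in for the higher-order B-splines the article discusses.

```python
import numpy as np

def hat_basis(x, grid):
    """Degree-1 B-spline ("hat") basis functions on a uniform grid.
    x: (n,) inputs, grid: (g,) knots -> (n, g) basis activations."""
    h = grid[1] - grid[0]
    return np.clip(1.0 - np.abs(x[:, None] - grid[None, :]) / h, 0.0, 1.0)

class KANLayer:
    """One KAN-style layer: a learned univariate function on every edge.

    Edge (i -> j) computes w_base[i, j] * silu(x_i)
    + sum_k spline_w[i, j, k] * B_k(x_i); output unit j sums over i."""
    def __init__(self, in_dim, out_dim, grid_size=8, seed=0):
        rng = np.random.default_rng(seed)
        self.grid = np.linspace(-2.0, 2.0, grid_size)
        self.w_base = rng.normal(0.0, 0.1, (in_dim, out_dim))
        self.spline_w = rng.normal(0.0, 0.1, (in_dim, out_dim, grid_size))

    def __call__(self, x):                            # x: (batch, in_dim)
        silu = x / (1.0 + np.exp(-x))                 # fixed base function
        basis = hat_basis(x.reshape(-1), self.grid).reshape(*x.shape, -1)
        return silu @ self.w_base + np.einsum("big,iog->bo", basis, self.spline_w)

layer = KANLayer(in_dim=4, out_dim=3)
print(layer(np.random.default_rng(1).normal(size=(5, 4))).shape)  # (5, 3)
```

An ordinary dense layer would learn only something like `w_base` and apply one shared, fixed activation afterwards; in this sketch the bulk of the parameters sit in `spline_w`, i.e. in the per-edge activation functions.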
Related
Why We're Deeply Invested in Making AI Better at Math Tutoring
Khan Academy is advancing AI for math tutoring with Khanmigo, aiming to mimic human tutors. Despite some errors, efforts continue to improve tutoring with tools like calculators, GPT-4 Turbo, and GPT-4o models. They prioritize enhancing AI's tutoring capabilities and sharing insights with the education community.
What's better: Neural nets wider with fewer layers or thinner with more layers
Experiments compared Transformer models with varying layer depths and widths. Optimal performance was achieved with a model featuring four layers and an embedding dimension of 1024. Balancing layer depth and width is crucial for efficiency and performance improvement.
My finetuned models beat OpenAI's GPT-4
Alex Strick van Linschoten discusses his finetuned models Mistral, Llama3, and Solar LLMs outperforming OpenAI's GPT-4 in accuracy. He emphasizes challenges in evaluation, model complexities, and tailored prompts' importance.
Analysing 16,625 papers to figure out where AI is headed next (2019)
MIT Technology Review analyzed 16,625 AI papers, noting deep learning's potential decline. Trends include shifts to machine learning, neural networks' rise, and reinforcement learning growth. AI techniques cycle, with future dominance uncertain.
My Python code is a neural network
Neural networks are explored for identifying program code in engineering messages. Manual rules and a Python classifier are discussed, with a suggestion to use a recurrent neural network for automated detection.
[1] https://arxiv.org/pdf/2404.19756 - "Both MLPs and KANs are trained with LBFGS for 1800 steps in total."
[2] https://en.wikipedia.org/wiki/Limited-memory_BFGS
(Quasi-)Newton methods approximate the learning rate using local curvature, which gradient-based methods do not do. The post relies on Tinygrad because it is familiar to the author, and the author tinkers with batch size and learning rate, but not with the optimizer itself.
I think that even line search for minimum on the direction of the batch gradient can provide most of the benefits of LBFGS. It is easy to implement.
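As a minimal sketch of that suggestion (mine, not code from the thread or the post): a backtracking line search that picks the step size along the negative batch gradient using the Armijo sufficient-decrease condition.

```python
import numpy as np

def backtracking_line_search(loss_fn, params, grad, t0=1.0, beta=0.5, c=1e-4, max_steps=20):
    """Pick a step size along -grad by shrinking t until the Armijo
    (sufficient-decrease) condition holds."""
    f0 = loss_fn(params)
    g_norm_sq = float(np.dot(grad, grad))
    t = t0
    for _ in range(max_steps):
        if loss_fn(params - t * grad) <= f0 - c * t * g_norm_sq:
            return t          # enough decrease along this direction: accept
        t *= beta             # otherwise shrink the step and try again
    return t

# Toy usage on a quadratic bowl: loss(w) = ||w||^2, gradient = 2w.
w = np.array([3.0, -2.0])
for _ in range(5):
    g = 2.0 * w
    t = backtracking_line_search(lambda p: float(np.dot(p, p)), w, g)
    w = w - t * g
print(w)  # converges to the minimum at the origin
```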
I think it’s probably worth clarifying a little here that a Bspline is essentially a little MLP, where, at least for uniform Bsplines, the depth is equal to the polynomial degree of the spline. (That’s also the width of the first layer.)
So those two network diagrams are only superficially similar, but the KAN is actually a much bigger network if degree > 1 for the splines. I wonder if that contributed to the difficulty of training it. It is possible some of the “code smell” you noticed and got rid of is relatively important for achieving good results. I’d guess the processes for normalizing inputs and layers of a KAN need to be a bit different than for standard nets.
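To make the "little MLP" analogy concrete, here is an illustrative Cox-de Boor evaluation of B-spline basis functions (my sketch, assuming a uniform knot vector, not code from the article). The degree-0 level is a row of interval indicators, and each higher degree applies one more input-dependent linear blend of the level below, which is where the depth-equals-degree comparison comes from.

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """All B-spline basis functions of the given degree at scalar x,
    via the Cox-de Boor recursion (strictly increasing, e.g. uniform, knots)."""
    # Degree-0 "layer": indicators of which knot interval contains x.
    B = np.where((knots[:-1] <= x) & (x < knots[1:]), 1.0, 0.0)
    # Each pass below is one more "layer": an input-dependent linear blend
    # of adjacent basis functions from the level underneath.
    for k in range(1, degree + 1):
        left = (x - knots[:-k - 1]) / (knots[k:-1] - knots[:-k - 1])
        right = (knots[k + 1:] - x) / (knots[k + 1:] - knots[1:-k])
        B = left * B[:-1] + right * B[1:]
    return B

knots = np.linspace(-2.0, 2.0, 12)          # uniform knot vector
print(bspline_basis(0.3, knots, degree=3))  # 8 cubic basis values; they sum to 1 on the interior span
```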
And you can choose which ones to invert automatically using the free+Free https://invertornot.com/ API - IoN will correctly return that eg https://i.ameo.link/caa.png (and the other two) should be inverted.
""" ... the most significant factor controlling performance is just parameter count. """
""" No matter what I did, the most simple neural network was still outperforming the fanciest KAN-based model I tried. """
I suspected this was the case when I first heard about KANs. It's nice to see someone diving into it a bit more, even if it is just anecdotal.