What's Going on in Machine Learning? Some Minimal Models
Stephen Wolfram highlights gaps in our understanding of machine learning's foundations, proposes minimal models to aid comprehension, draws parallels with biological evolution, and challenges traditional neural network methodologies.
The exploration of machine learning reveals a significant gap in understanding its foundational principles, despite advances in engineering and neural network design. Stephen Wolfram discusses the complexity of neural networks, emphasizing that while they can perform impressive tasks, the underlying mechanisms remain largely unexplained. He proposes minimal models that simplify the structure of neural networks, allowing essential phenomena to be visualized and understood. These models suggest that machine learning does not construct structured mechanisms but rather samples the complexity of the computational universe, achieving results through a process shaped by computational irreducibility. This implies that while pockets of computational reducibility may exist, a comprehensive narrative explanation of machine learning is unlikely. Wolfram draws parallels between machine learning and biological evolution, noting that both optimize performance through adaptive processes, and suggests that understanding these core principles could lead to more efficient machine learning practices. The discussion covers traditional neural networks, mesh neural networks, and the possibility of using fully discrete systems for machine learning, challenging the notion that real-valued parameters are essential for successful adaptive processes.
- Machine learning's foundational principles remain poorly understood despite engineering advancements.
- Minimal models can simplify neural networks, aiding in visualization and comprehension.
- Computational irreducibility plays a crucial role in the functioning of machine learning systems.
- Machine learning shares similarities with biological evolution in terms of adaptive optimization.
- Discrete systems may effectively perform machine learning tasks, challenging traditional methodologies.
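The last point is easy to demonstrate on a toy problem. The sketch below (my own minimal example, not code from the article) applies Wolfram-style adaptive evolution, i.e. single-point mutations where any change that does not increase the loss is kept, to a purely discrete system: a 0/1 truth table that learns 3-bit parity.

```python
import random

random.seed(0)

# Target concept: 3-bit parity, as a truth table over the 8 possible inputs.
target = [bin(i).count("1") % 2 for i in range(8)]

def loss(table):
    """Number of inputs the candidate table gets wrong."""
    return sum(t != y for t, y in zip(table, target))

# Start from a random discrete "rule" and apply single-point mutations,
# keeping any mutation that does not increase the loss.
table = [random.randint(0, 1) for _ in range(8)]
for step in range(500):
    i = random.randrange(8)
    candidate = table[:]
    candidate[i] ^= 1  # flip one discrete entry
    if loss(candidate) <= loss(table):
        table = candidate
    if loss(table) == 0:
        break

print(loss(table))  # reaches 0: the discrete system has "learned" parity
```

No real-valued parameters and no gradients anywhere; the adaptive process is pure hill-climbing over a discrete state space.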
Related
Machine Learning Systems with TinyML
"Machine Learning Systems with TinyML" simplifies AI system development by covering ML pipelines, data collection, model design, optimization, security, and integration. It emphasizes TinyML for accessibility, addressing model architectures, training, inference, and critical considerations. The open-source book encourages collaboration and innovation in AI technology.
Guide to Machine Learning with Geometric, Topological, and Algebraic Structures
The paper discusses the shift in machine learning towards handling non-Euclidean data with complex structures, emphasizing the need to adapt classical methods and proposing a graphical taxonomy to unify recent advancements.
Darwin Machines
The Darwin Machine theory proposes the brain uses evolution to efficiently solve problems. It involves minicolumns competing through firing patterns, leading to enhanced artificial intelligence and creativity through recombination in cortical columns.
The Elegant Math of Machine Learning
Anil Ananthaswamy discusses his book on machine learning, emphasizing the importance of data for neural networks, their ability to approximate functions, and the mathematical elegance behind AI's potential applications.
How to get from high school math to cutting-edge ML/AI
The roadmap for learning deep learning includes four stages: foundational math, classical machine learning, deep learning, and advanced techniques like transformers and large language models, ensuring comprehensive understanding and skills development.
https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-...
The part I find most interesting is his proposal that neural networks largely work by “hitching a ride” on fundamental computational complexity, in practice sort of searching around the space of functions representable by an architecture for something that works. And, to the extent this is true, that puts explainability at fundamental odds with the highest value / most dense / best deep learning outputs — if they are easily “explainable” by inspection, then they are likely not using all of the complexity available to them.
I think this is a pretty profound idea, and it sounds right to me. It seems like a rich theoretical area for next-gen information theory: essentially, are there (soft/hard) bounds on certain kinds of explainability/inspectability?
FWIW, there’s a reasonably long history of mathematicians constructing their own ontologies and concepts and then people taking like 50 or 100 years to unpack and understand them and figure out what they add. I think of Wolfram’s cellular automata like this, possibly really profound, time will tell, and unusual in that he has the wealth and platform and interest in boosting the idea while he’s alive.
That being said, I’m enjoying this. I often experiment with neural networks in a similar fashion and like to see people’s work like this.
Is this similar to the lottery ticket hypothesis?
Also, the visualizations are beautiful and a nice way to demonstrate the "universal approximation theorem".
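The universal approximation theorem has a very concrete flavor for ReLU networks: three ReLU units make a triangular "tent" bump, and a weighted sum of tents gives piecewise-linear interpolation of any continuous function. A small sketch of my own, using sin(x) as the target:

```python
import math

def relu(z):
    return max(0.0, z)

def tent(x, c, w):
    """Triangular bump centered at c with half-width w, built from 3 ReLUs."""
    return (relu(x - (c - w)) - 2 * relu(x - c) + relu(x - (c + w))) / w

# tent(x, c_i, w) equals 1 at c_i and 0 at the neighboring centers, so
# using f(c_i) as the output weights gives piecewise-linear interpolation.
n = 20
centers = [math.pi * i / n for i in range(n + 1)]
w = math.pi / n

def net(x):
    return sum(math.sin(c) * tent(x, c, w) for c in centers)

max_err = max(abs(net(x) - math.sin(x))
              for x in [math.pi * k / 200 for k in range(201)])
print(max_err)  # well under 0.01 with 21 tents
```

Shrinking the tent width drives the error to zero, which is the one-dimensional essence of the theorem; the hard part of the real result is making this work in many dimensions.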
It feels like a religious talk.
The presentation consists of chunks of hard-to-digest, profound-sounding text followed by a supposedly informative picture with lots of blobs, then the whole pattern is repeated over and over.
But it never gets to the point. There is never an outcome, never a summary. It is always some sort of patterns and blobs that supposedly explain everything, except nothing useful is ever communicated. You are supposed to "see" how the blobs are "everything": a new kind of Science.
He cannot predict anything; he cannot forecast anything. All he does is use Mathematica to generate multiplots of symmetric little blobs and then suggest that those blobs somehow explain something that currently exists.
I find these Wolfram blogs a massive waste of time.
They are boring to the extreme.
I think this is novel. (I've seen BNNs, https://arxiv.org/pdf/1601.06071, which actually keep things continuous for training; but if inference is sufficiently fast and you have an effective mechanism for permutation, training could be faster using the discrete approach.)
I am curious what other folks (especially researchers) think. The takes on Wolfram are not always uniformly positive but this is interesting (I think!)
https://en.wikipedia.org/wiki/Tsetlin_machine
They are discrete, individually interpretable, and can be configured into complicated architectures.
> There’s no overarching theory to it in itself; it’s just a reflection of the resources that were out there. Or, in the case of machine learning, one can expect that what one sees will be to a large extent a reflection of the raw characteristics of computational irreducibility
Strikes me as a very reductive and defeatist take that flies in the face of the grand agenda Wolfram sets forth.
It would have been much more productive to chisel away at it to figure out something rather than expecting the Theory to be unveiled in full at once.
For instance, what I learn from the kind of playing around that Wolfram does in the article is: neural nets are just one way to achieve learning and intellectual performance, and even within neural nets there are myriad ways to do it. Most importantly, there is a breadth-vs-depth trade-off: neural nets, being very broad and versatile, are not quite the best at going deep and specialised; you need a different solution for that (e.g. even a good old instruction set architecture might be the right thing in many cases). This is essentially why ChatGPT ended up needing Python tooling to reliably calculate 2+2.
"tasks—like writing essays—that we humans could do, but we didn’t think computers could do, are actually in some sense computationally easier than we thought."
It hurts one's pride to realize that the specialized thing they do isn't quite as special as was previously thought.
> a standard result from calculus gives us a vastly more efficient procedure that in effect “maximally reuses” parts of the computation that have already been done.
This partly explains why gradient descent became mainstream.
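Concretely, the "maximal reuse" is reverse-mode differentiation (backpropagation): store the forward intermediates and sweep backward once, so the gradient with respect to every parameter costs about one extra pass, whereas finite differences pay a full forward pass per parameter. A minimal hand-rolled sketch for the chain L = (w2 · sin(w1 · x))²:

```python
import math

def forward(w1, w2, x):
    a = w1 * x
    b = math.sin(a)
    c = w2 * b
    return (a, b, c), c ** 2  # keep intermediates for the backward pass

def grad(w1, w2, x):
    (a, b, c), _ = forward(w1, w2, x)
    # One backward sweep: each local derivative reuses the stored
    # forward values instead of recomputing the whole chain.
    dc = 2 * c
    dw2 = dc * b
    db = dc * w2
    da = db * math.cos(a)
    dw1 = da * x
    return dw1, dw2

w1, w2, x, eps = 0.7, -1.3, 2.0, 1e-6
dw1, dw2 = grad(w1, w2, x)

# Finite-difference check: two extra full forward passes *per parameter*,
# which is exactly the cost reverse mode avoids.
fd1 = (forward(w1 + eps, w2, x)[1] - forward(w1 - eps, w2, x)[1]) / (2 * eps)
fd2 = (forward(w1, w2 + eps, x)[1] - forward(w1, w2 - eps, x)[1]) / (2 * eps)
print(abs(dw1 - fd1) < 1e-6, abs(dw2 - fd2) < 1e-6)  # prints: True True
```

With a million parameters, reverse mode still needs only one forward and one backward pass, while finite differences would need millions of forward passes; that asymmetry is what made gradient-based training of large networks feasible.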
The acrobatics that Wolfram can do with the code and his analysis are awesome; doing the same without the homoiconicity and metaprogramming makes my poor brain shudder.
Do note, Wolfram Language is homoiconic, and I think I remember reading that it supports Fexprs. It has some really neat properties, and it's a real shame that it's not Open Source and more widely used.