Reasoning in Large Language Models: A Geometric Perspective
The paper examines how large language models reason from a geometric perspective, linking the density of their self-attention graphs to expressive power. A higher intrinsic dimension increases an LLM's capacity, a claim supported by theoretical analysis, toy examples, and empirical evidence.
The paper titled "Reasoning in Large Language Models: A Geometric Perspective" by Romain Cosentino and Sarath Shekkizhar explores the reasoning abilities of large language models (LLMs) through a geometric lens. The study establishes a connection between the expressive power of LLMs and the density of their self-attention graphs, showing that this density determines the intrinsic dimension of the inputs to the MLP blocks, and that a higher intrinsic dimension yields greater expressive capacity. The authors support their findings with theoretical analysis and toy examples, and offer empirical evidence linking this geometric framework to recent methods aimed at improving the reasoning capabilities of LLMs. The work is filed under Artificial Intelligence and Computation and Language.
Related
Francois Chollet – LLMs won't lead to AGI – $1M Prize to find solution [video]
The video discusses limitations of large language models in AI, emphasizing genuine understanding and problem-solving skills. A prize incentivizes AI systems showcasing these abilities. Adaptability and knowledge acquisition are highlighted as crucial for true intelligence.
LLMs now write lots of science. Good
Large language models (LLMs) are significantly shaping scientific papers, with up to 20% of computer-science abstracts, and as many as a third in China, showing signs of their influence. Debates persist on the impact of LLMs on research quality and progress.
Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs
The study presents a method to boost Large Language Models' retrieval and reasoning abilities for long-context inputs by fine-tuning on a synthetic dataset. Results show significant improvements in information retrieval and reasoning skills.
Large language models have developed a higher-order theory of mind
Large language models like GPT-4 and Flan-PaLM perform comparably to adults on theory of mind tasks. Study shows GPT-4 excels in 6th order inferences. Model size and fine-tuning influence ToM abilities in LLMs, impacting user-facing applications.
Mental Modeling of Reinforcement Learning Agents by Language Models
The study examines large language models' (LLMs) ability to understand reinforcement learning (RL) agents through agent mental modeling. LLMs currently struggle to fully model agents, highlighting the importance of enhancing their capacity.
In the middle... AI doesn't work very well.
If an AI writes a multi-step plan, where the pieces have to fit together, I've found it goes off the rails. Parts 1 and 3 of a 4-part plan are fine. So is part 2. However, they don't fit together! AI has no concept of "these four parts have to be closely connected, building a whole". It just builds from A to B in four steps... but takes two different paths and stitches the pieces together poorly.
LLMs are like Mad Libs with a "contextual predictor" - they produce syntactically correct output, and the "contextual predictor" limits the amount of nonsense because statistical correlations can generate meaningful output most of the time. But there is no "reasoning" occurring here - just syntactic templating and statistical auto-complete.
Am I missing something?
Ask a computer scientist, continental philosopher, and anthropologist what "reason" is and they will give you extremely different answers.
If by reason we mean deductive reasoning as practiced in mathematics and inductive reasoning as practiced in the sciences, there is no evidence that LLMs do anything of the sort. There is no reason (ha) to believe that linguistic pattern matching is enough to emulate all that we call thinking in man. To claim so is to adopt a drastically narrow definition of "thinking" and to ignore the fact that we are embodied intellects, capable of knowing ourselves in a transparent, possibly prelinguistic way. Unless an AI becomes embodied and can do the same, I have no faith that it will ever "think" or "reason" as humans do. It remains a really good statistical parlor trick.
The multilayer perceptron (MLP)[1] layers used in modern neural networks, such as LLMs, essentially partition the input space into multiple regions. The authors show that the number of regions a single MLP layer can partition the input into depends exponentially on the intrinsic dimension[2] of that input. A larger number of regions/partitions means greater approximation power for the MLP layer.
Thus you can significantly increase the approximation power of an MLP layer without increasing the number of neurons, by essentially "distilling" the input to it.
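As a rough illustration of why the region count grows so quickly with intrinsic dimension, the classical hyperplane-arrangement bound for a single ReLU layer can be computed directly. This is a minimal sketch of that standard bound, not the exact result derived in the paper; the neuron count and dimensions below are made-up values.

```python
from math import comb

def max_linear_regions(n_neurons: int, intrinsic_dim: int) -> int:
    """Upper bound on the number of linear regions a single ReLU layer
    with n_neurons units can carve out of an input that lives on an
    intrinsic_dim-dimensional subspace (hyperplane-arrangement bound)."""
    return sum(comb(n_neurons, i) for i in range(intrinsic_dim + 1))

# Same layer width, increasing intrinsic dimension of the input:
for d in (2, 4, 8, 16):
    print(f"d={d:2d}  regions <= {max_linear_regions(64, d)}")
```

With the width held fixed at 64 neurons, the bound grows from a few thousand regions at d=2 to astronomically many at d=16, which is the commenter's point about "distilling" the input rather than widening the layer.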
In the transformer architecture, the inputs to the MLP blocks are the outputs of the self-attention layers[3]. The authors then show that the graph density of a self-attention layer correlates strongly with the intrinsic dimension of its output. Thus a denser self-attention graph means the MLP can do a better job.
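For intuition, here is a toy sketch of one way "attention graph density" could be measured: threshold the row-stochastic attention matrix and count the surviving edges. The threshold `eps` and the random scores are illustrative assumptions, not the paper's exact definition.

```python
import numpy as np

def attention_graph_density(attn: np.ndarray, eps: float = 1e-2) -> float:
    """Fraction of possible edges whose attention weight exceeds eps.
    attn is a (seq_len, seq_len) row-stochastic attention matrix."""
    edges = int((attn > eps).sum())
    possible = attn.shape[0] * attn.shape[1]
    return edges / possible

# Toy example: softmax attention over random scores for 8 tokens.
rng = np.random.default_rng(0)
scores = rng.normal(size=(8, 8))
attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
print(attention_graph_density(attn, eps=0.05))
```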
One way of increasing the density of the attention layers is to add more context. (edited, see comment) They show that prepending tokens as context to a question, in a way that increases the intrinsic dimension of the final layer, makes the LLM perform better.
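To make "intrinsic dimension of the final layer" concrete, a generic PCA-based proxy is sketched below: count how many principal components are needed to explain most of the variance of the token representations feeding an MLP block. The paper's notion of intrinsic dimension is tied to the attention graph, so this is only an assumed stand-in for the kind of measurement involved, and the hidden states here are random placeholders.

```python
import numpy as np

def pca_intrinsic_dimension(hidden: np.ndarray, var_threshold: float = 0.99) -> int:
    """Rough proxy: number of principal components needed to explain
    var_threshold of the variance of the hidden states
    (rows = tokens, columns = model dimensions)."""
    centered = hidden - hidden.mean(axis=0, keepdims=True)
    svals = np.linalg.svd(centered, compute_uv=False)
    var = svals ** 2
    cumulative = np.cumsum(var) / var.sum()
    return int(np.searchsorted(cumulative, var_threshold) + 1)

# One would compare this value for the final-layer hidden states of a
# bare question vs. the same question with extra context prepended.
rng = np.random.default_rng(1)
print(pca_intrinsic_dimension(rng.normal(size=(32, 128))))
```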
They also note that the transformer architecture is susceptible to compounding approximation errors, and that the much more precise partitioning provided by the MLP layers, when fed high-intrinsic-dimensional input, can help with this. However, the impact of this on generalization remains to be explored further.
If the results hold up it does seem like this paper provides nice insight into how to better optimize LLMs and similar neural networks.
[1]: https://en.wikipedia.org/wiki/Multilayer_perceptron
[2]: https://en.wikipedia.org/wiki/Intrinsic_dimension
[3]: https://en.wikipedia.org/wiki/Transformer_(deep_learning_arc...
And?
From a cursory glance, it looks like we're once again on the verge of realizing that we're dealing with complex-valued weights.
Even Anthropic will be publishing that before the year is out.
This is a fact. No graph will change this.
If you want “reasoning,” then you need to invent a new technology to iterate, validate, experiment, validate, query external expertise, and validate again. When we get that technology, AI will become resilient in solving complex problems.
We can observe LLM-like behaviour in humans: all those reactionaries who just parrot whatever catchphrases mass media programmed into them. LLMs are just the computer version of that uncle who thinks Fox News is true and is the reason your nieces have to wear long pants at family gatherings.
He doesn't understand the catchphrases he parrots any more than the chatbots do.
Actual AI will require a kind of modelling that as yet does not exist.