July 7th, 2024

Reasoning in Large Language Models: A Geometric Perspective

The paper examines how large language models reason, viewed from a geometric perspective, linking the density of their self-attention graphs to their expressive power. A higher intrinsic dimension gives an LLM greater expressive capacity, a claim supported by theoretical analysis, toy examples, and empirical evidence.

The paper titled "Reasoning in Large Language Models: A Geometric Perspective" by Romain Cosentino and Sarath Shekkizhar explores the reasoning abilities of large language models (LLMs) through a geometric lens. The study establishes a connection between the expressive power of LLMs and the density of their self-attention graphs, showing that the density of these graphs defines the intrinsic dimension of the inputs to the MLP blocks. The research indicates that a higher intrinsic dimension leads to a greater expressive capacity of the LLM. The authors provide theoretical analysis and toy examples to support their findings and offer empirical evidence linking this geometric framework to recent advancements in methods aimed at improving the reasoning capabilities of LLMs. The work falls under the subjects of Artificial Intelligence and Computation and Language.
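
The two quantities the paper relates are easy to prototype. Below is a rough, self-contained sketch (not the authors' exact definitions): it builds a toy self-attention matrix, measures graph density as the fraction of attention weights above a threshold, and uses a PCA-style proxy for the intrinsic dimension of the MLP-block input. The threshold and the 95% variance cutoff are illustrative choices only.

```python
import numpy as np

def attention_graph_density(attn, eps=0.01):
    """Fraction of token pairs whose attention weight exceeds eps --
    a simplified stand-in for the paper's self-attention graph density."""
    n = attn.shape[0]
    return float((attn > eps).sum()) / (n * n)

def intrinsic_dim_proxy(X, var_threshold=0.95):
    """PCA-style proxy: number of principal components needed to capture
    var_threshold of the variance of the representations X."""
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(explained, var_threshold) + 1)

# Toy data: 64 tokens with 128-dimensional embeddings and random attention scores.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 128))
scores = rng.normal(size=(64, 64))
attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # row-wise softmax

print("attention graph density:", attention_graph_density(attn))
print("intrinsic dim of MLP-block input:", intrinsic_dim_proxy(attn @ X))
```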

Related

Francois Chollet – LLMs won't lead to AGI – $1M Prize to find solution [video]

The video discusses the limitations of large language models, arguing that they lack genuine understanding and problem-solving skill. A prize incentivizes AI systems that demonstrate these abilities, with adaptability and knowledge acquisition highlighted as crucial for true intelligence.

LLMs now write lots of science. Good

Large language models (LLMs) are significantly shaping scientific papers, with up to 20% of computer science abstracts, and roughly a third of those from China, influenced by them. Debates persist over the impact of LLMs on research quality and progress.

Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs

The study presents a method to boost Large Language Models' retrieval and reasoning abilities for long-context inputs by fine-tuning on a synthetic dataset. Results show significant improvements in information retrieval and reasoning skills.

Large language models have developed a higher-order theory of mind

Large language models like GPT-4 and Flan-PaLM perform comparably to adults on theory-of-mind tasks, with GPT-4 excelling at 6th-order inferences. Model size and fine-tuning influence ToM abilities in LLMs, which matters for user-facing applications.

Mental Modeling of Reinforcement Learning Agents by Language Models

The study examines large language models' (LLMs) ability to understand reinforcement learning (RL) agents by building mental models of them. LLMs currently struggle to model agents fully, highlighting the need to improve this capability.

16 comments
By @john-tells-all - 7 months
AI has a "bathtub curve" of value. At the low level, it's a super-autocomplete, able to write 1-3 lines of code that work well enough. At the high level, it's great for explaining high-level concepts relevant to the task at hand.

In the middle... AI doesn't work very well.

If an AI writes a multi-step plan, where the pieces have to fit together, I've found it goes off the rails. Parts 1 and 3 of a 4-part plan are fine. So is part 2. However, they don't fit together! AI has no concept of "these four parts have to be closely connected, building a whole". It just builds from A to B in four steps... but it takes two different paths and stitches the pieces together poorly.

By @EncomLab - 7 months
Does anyone remember the "Mad Libs" games - you fill out a form with blanks for "verb", "noun", "adjective", etc - then on the next page you fill in the words from the form to create a silly story. The results are funny because the words you provided initially were without context - they were syntactically correct, but were nonsense in context.

LLMs are like Mad Libs with a "contextual predictor" - they produce syntactically correct output, and the "contextual predictor" limits the amount of nonsense because statistical correlations can generate meaningful output most of the time. But there is no "reasoning" occurring here - just syntactic templating and statistical auto-complete.
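
For what it's worth, the "statistical auto-complete" half of that analogy can be sketched in a few lines with a toy bigram model that samples each next word from co-occurrence counts in a tiny corpus. This is only an illustration of the analogy, not of how transformer LLMs actually work; the corpus and sampling scheme are made up.

```python
import random
from collections import defaultdict

# Tiny corpus; the "model" is just bigram co-occurrence counts.
corpus = "the cat sat on the mat and the dog sat on the rug".split()
nexts = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    nexts[prev].append(nxt)

def autocomplete(word, length=6):
    """Statistical auto-complete: sample each next word in proportion
    to how often it followed the current word in the corpus."""
    out = [word]
    for _ in range(length):
        word = random.choice(nexts.get(word, corpus))  # fall back to any word
        out.append(word)
    return " ".join(out)

print(autocomplete("the"))   # e.g. "the dog sat on the mat and"
```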

By @lifeisstillgood - 7 months
But I understand there are two sides to the discussion: either, by ingesting huge amounts of text, these models have somehow built reasoning capabilities (language first, then reasoning), or the reasoning was done by humans and then written down, so as long as you ask something like “should Romeo find another love after Juliet”, there is a body of reasoning reflected in a billion English literature essays and the model just reflects those answers.

Am I missing something?

By @dr_dshiv - 7 months
What does reasoning have to do with geometry? Is this like the idea that different concepts have inherent geometrical forms? A Platonic or noetic take on the geometries of reason? (I struggled to understand much of this paper…)
By @justkk - 7 months
What are regions in this context? Are more regions better? How does one delimit the regions? Can one region represent the same concept as several related regions?
By @voidhorse - 6 months
As with many philosophical discussions, there is no point in claiming LLMs can "reason" because "reason" is not a well-defined term and you will not get everyone to agree on a singular definition.

Ask a computer scientist, continental philosopher, and anthropologist what "reason" is and they will give you extremely different answers.

If by reason we mean deductive reasoning as practiced in mathematics and inductive reasoning as practiced in the sciences, there is no evidence that LLMs do anything of the sort. There is no reason (ha) to believe that linguistic pattern matching is enough to emulate all that we call thinking in man. To claim so is to adopt a drastically narrow definition of "thinking" and to ignore the fact that we are embodied intellects, capable of knowing ourselves in a transparent, possibly prelinguistic way. Unless an AI becomes embodied and can do the same, I have no faith that it will ever "think" or "reason" as humans do. It remains a really good statistical parlor trick.

By @DrMiaow - 7 months
"just add more dimensions, bro!"
By @magicalhippo - 7 months
I'm not into AI, but I like to watch from the sidelines. Here's my non-expert summary of the paper after skimming it (corrections appreciated):

The multilayer perceptron[1] (MLP) layers used in modern neural networks, like LLMs, essentially partition their input space into multiple regions. The authors show that the number of regions a single MLP layer can partition its input into depends exponentially on the intrinsic dimension[2] of that input, and more regions/partitions means greater approximation power for the MLP layer.
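
A quick way to see that exponential dependence (my own toy sketch, not an experiment from the paper) is to count distinct ReLU activation patterns, which lower-bounds the number of linear regions, while confining the input to random subspaces of different intrinsic dimension. The layer size and sample counts below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 64))   # one MLP layer: 256 ReLU units, 64-dim ambient input

def sampled_regions(intrinsic_dim, n_samples=20000):
    """Count distinct ReLU activation patterns over inputs confined to a
    random subspace of the given intrinsic dimension (a lower bound on
    the number of linear regions the layer carves that subspace into)."""
    basis = rng.normal(size=(intrinsic_dim, 64))           # random subspace of R^64
    X = rng.normal(size=(n_samples, intrinsic_dim)) @ basis
    patterns = X @ W.T > 0                                  # which units are active
    return len({row.tobytes() for row in patterns})

for d in (2, 4, 8, 16):
    print(f"intrinsic dimension {d:2d} -> {sampled_regions(d):6d} distinct regions sampled")
```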

Thus you can significantly increase the approximation power of an MLP layer without increasing the number of neurons, by essentially "distilling" the input to it.

In the transformer architecture, the inputs to the MLP layers are the outputs of the self-attention layers[3]. The authors then show that the graph density of the self-attention layers correlates strongly with the intrinsic dimension of their outputs. Thus a denser self-attention graph means the MLP can do a better job.
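
That correlation has a simple geometric reading, which the toy below tries to capture (again a sketch of my own, not the paper's argument verbatim): a token's MLP input is a weighted combination of the value vectors it attends to, so if its attention row is supported on only k tokens the reachable inputs span at most a k-dimensional set, and densifying the attention graph raises that bound.

```python
import numpy as np

rng = np.random.default_rng(1)
n_tokens, d_model = 64, 128
V = rng.normal(size=(n_tokens, d_model))   # toy value vectors, one per token

def mlp_input_dim(k, n_samples=500):
    """Dimension (matrix rank) of the MLP inputs one token can produce
    when its attention row is supported on k tokens."""
    support = rng.choice(n_tokens, size=k, replace=False)
    inputs = []
    for _ in range(n_samples):
        w = rng.random(k)
        w /= w.sum()                       # random attention weights on the support
        inputs.append(w @ V[support])      # convex combination of attended values
    return int(np.linalg.matrix_rank(np.array(inputs)))

for k in (2, 8, 32, 64):                   # increasingly dense attention "graph"
    print(f"attending to {k:2d} tokens -> MLP-input dimension ~ {mlp_input_dim(k)}")
```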

One way of increasing the density of the attention graph is to add more context. They show that prepending tokens as context to a question, such that the intrinsic dimension of the final layer increases, makes the LLM perform better.

They also note that the transformer architecture is susceptible to compounding approximation errors, and that the much more precise partitioning provided by the MLP layers, when fed with input of high intrinsic dimension, can help with this. However, the impact of this on generalization remains to be explored further.

If the results hold up it does seem like this paper provides nice insight into how to better optimize LLMs and similar neural networks.

[1]: https://en.wikipedia.org/wiki/Multilayer_perceptron

[2]: https://en.wikipedia.org/wiki/Intrinsic_dimension

[3]: https://en.wikipedia.org/wiki/Transformer_(deep_learning_arc...

By @slashdave - 7 months
Okay. So more weights = more parameter space for expression.

And?

By @benreesman - 7 months
I read too much of this stuff to dive deep unless someone on HN resoundingly endorses it.

From a cursory glance it looks like we’re once again on the verge of realizing that we’re dealing with complex-valued weights.

Even Anthropic will be publishing that before the year is out.

By @omerhac - 7 months
Each time research about LLMs and reasoning comes out, Yann LeCun gets an itch
By @ChicagoDave - 7 months
LLMs do not have the technology to iteratively solve a complex problem.

This is a fact. No graph will change this.

If you want “reasoning,” then you need to invent a new technology that can iterate, validate, experiment, validate, query external expertise, and validate again. When we get that technology, AI will become resilient at solving complex problems.

By @bastien2 - 7 months
You can't "enhance" from zero. LLMs by design are not capable of reason.

We can observe LLM-like behaviour in humans: all those reactionaries who just parrot whatever catchphrases mass media programmed into them. LLMs are just the computer version of that uncle who thinks Fox News is true and is the reason your nieces have to wear long pants at family gatherings.

He doesn't understand the catchphrases he parrots any more than the chatbots do.

Actual AI will require a kind of modelling that as yet does not exist.