Inductive or Deductive? Rethinking the Fundamental Reasoning Abilities of LLMs
The paper examines reasoning abilities of Large Language Models, distinguishing inductive from deductive reasoning. It introduces SolverLearner, showing LLMs excel in inductive reasoning but struggle with deductive tasks, particularly counterfactuals.
The paper titled "Inductive or Deductive? Rethinking the Fundamental Reasoning Abilities of LLMs" by Kewei Cheng and colleagues explores the reasoning capabilities of Large Language Models (LLMs), specifically distinguishing between inductive and deductive reasoning. The authors argue that previous research has not adequately differentiated these two types of reasoning, leading to a conflation of their respective challenges. They introduce a new framework called SolverLearner, which allows LLMs to learn functions that map input data to output values from in-context examples, thereby isolating inductive reasoning from deductive reasoning. Their findings indicate that LLMs exhibit strong inductive reasoning capabilities, achieving near-perfect performance in many cases. However, the study also reveals that LLMs struggle with deductive reasoning, particularly in tasks that require counterfactual reasoning. This research highlights the need for a clearer understanding of the reasoning abilities of LLMs and suggests that while they excel in inductive reasoning, their deductive reasoning skills are comparatively limited.
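As a rough illustration of that separation (my own sketch, not the authors' code): the model is shown input-output examples and asked to propose a rule as a Python function (the inductive step), and an ordinary interpreter, rather than the model, then applies that function to test inputs (the deductive step). The `ask_llm` helper and the prompt wording below are hypothetical stand-ins.

```python
# Minimal sketch of the SolverLearner idea. Assumptions: `ask_llm` is a
# hypothetical stand-in for an LLM completion call, and the prompt wording
# is invented for illustration.

def ask_llm(prompt: str) -> str:
    """Hypothetical LLM call; returns the model's text completion."""
    raise NotImplementedError("plug in a real completion API here")

def learn_function(examples: list[tuple[str, str]]) -> str:
    """Inductive step: ask the model to propose a Python function f
    consistent with the in-context examples."""
    shown = "\n".join(f"f({inp!r}) == {out!r}" for inp, out in examples)
    prompt = (
        "Write a Python function f(x) consistent with all of these cases:\n"
        f"{shown}\nReturn only the code."
    )
    return ask_llm(prompt)

def apply_function(code: str, test_inputs: list[str]) -> list[str]:
    """Deductive step: a plain Python interpreter, not the model,
    applies the proposed rule to unseen inputs."""
    namespace: dict = {}
    exec(code, namespace)  # sketch only; run untrusted code in a sandbox
    return [namespace["f"](x) for x in test_inputs]
```

The point of the split is that applying the learned rule is delegated to code, so any remaining errors can be attributed to the induction step rather than to the model's execution of the rule.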
- The paper distinguishes between inductive and deductive reasoning in LLMs.
- A new framework, SolverLearner, is proposed to isolate and evaluate inductive reasoning.
- LLMs demonstrate strong inductive reasoning capabilities but struggle with deductive reasoning.
- The research emphasizes the importance of understanding the reasoning abilities of LLMs.
- Findings suggest a need for further exploration of LLMs' deductive reasoning, especially in counterfactual scenarios.
- Many commenters express skepticism about the ability of LLMs to genuinely reason, arguing that they primarily rely on memorization and pattern matching rather than true logical inference.
- There is a notable absence of consideration for abductive reasoning in the analysis, with some suggesting it is a significant oversight.
- Critics highlight the limitations of the experiments conducted, questioning the validity of the results due to potential biases in the training data.
- Some commenters propose that LLMs may exhibit a hybrid form of reasoning that combines statistical calculations with rudimentary reasoning processes.
- Overall, there is a consensus that more rigorous definitions and methodologies are needed to assess LLM reasoning capabilities accurately.
You cannot test reasoning when you don't know what's in the training set. You have to be able to differentiate reasoning from memorization, and that's not trivial.
What's more, the results look like they confirm that at least some memorization is going on. Do we really not think GPT has been extensively trained on arithmetic in base 10, 8, and 16? That seems like a terrible prior. Even if not explicitly, how much code has it read that performs these operations? How many web pages, tutorials, and Reddit posts cover octal and hex? They also haven't defined zero-shot correctly: arithmetic in these bases isn't zero-shot, it's explicitly in distribution...
I'm unsure about base 9 and 11. It's pretty interesting to see that GPT-4 is much better at these. Anyone know why? Did they train on these? More bases? Doesn't seem unreasonable, but I don't know.
The experimentation is also extremely lacking. The arithmetic questions comprise only 1,000 tests of two-digit addition, which is certainly in the training data. I'm also unconvinced by the syntax-reasoning tasks, since the transformer (attention) architecture seems designed for exactly this, and I'm unconvinced those tasks aren't in the training data either. Caesar ciphers are also certainly in the training data.
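For a sense of what those tests involve, a two-digit addition problem in an arbitrary base can be generated and checked mechanically; the snippet below is my own illustration, not the paper's harness.

```python
import random
import string

DIGITS = string.digits + string.ascii_lowercase  # 0-9, then a-z for bases > 10

def to_base(n: int, base: int) -> str:
    """Render a non-negative integer in the given base (2-36)."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, base)
        out.append(DIGITS[r])
    return "".join(reversed(out))

def make_problem(base: int) -> tuple[str, str, str]:
    """Two two-digit operands in `base`, plus the ground-truth sum."""
    lo, hi = base, base * base - 1  # the two-digit range in that base
    a, b = random.randint(lo, hi), random.randint(lo, hi)
    return to_base(a, base), to_base(b, base), to_base(a + b, base)

# e.g. in base 11: '7a' + '95' = '164'  (87 + 104 = 191 in decimal)
a, b, answer = make_problem(11)
print(f"In base 11, {a} + {b} = {answer}")
```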
The prompts are also odd, and I guess that's why they're in the appendix. For example, getting GPT to do better at math and many other tasks by having it write Python code is not novel.
There's some stuff here, but this really doesn't seem like a lot of work for 12 people from a top university and a trillion-dollar company. It's odd to see that many authors when the experiments can be run in a fairly short time.
Abductive reasoning is common in day-to-day life. It seeks the best explanation for some (often incomplete) observations, and reaches conclusions without certainty. I would have thought it would be important to assess for LLMs.
They are statistical text generators, whose results are defined by their training data set. This is why the paper cited reads thusly:
Despite extensive research into the reasoning capabilities of Large Language Models (LLMs), most studies have failed to rigorously differentiate between inductive and deductive reasoning ...
There is no differentiation because what was sought is the existence of something that does not exist. The authors then postulate:
This raises an essential question: In LLM reasoning, which poses a greater challenge - deductive or inductive reasoning?
There is no such thing as "LLM reasoning." Therefore, the greatest challenge is accepting this fact and/or that anthropomorphism is a real thing.

But any mention of LLM reasoning ability ought to address the obvious confound: the LLM is trained on examples of deductive reasoning, inductive reasoning, abductive reasoning, SAT-solver reasoning, geniuses' musings, etc. If it replicates one of those examples, should that be called "reasoning" of any sort or not? Regurgitating those examples may even involve some generalization, if the original topics of an example are swapped out (perhaps for a nearby topic in latent space).
Given that it appears they're training and testing on synthetic problems, this objection probably does not apply to their actual results. But given the fuzziness it creates for the definition of "reasoning" of any sort, I would have expected some working definition of reasoning in the paper's abstract.
Training on Moby Dick and thus being able to regurgitate text from Moby Dick does not mean the LLM is capable of writing a new Moby Dick-like book. (Thankfully; one is more than enough!)
I love seeing Victor Taelin experimenting with parallelizing these programs (with HVM and other experiments with proof languages), but it's sometimes a bit sad how much time researchers spend making papers about existing things instead of trying to improve the state of the art in something that's most probably missing from the current models.
I don't know about "typical", but every source that classifies reasoning (or, more appropriately, logical inference) as deductive and inductive also includes the abductive category. This categorisation scheme goes all the way back to Charles Sanders Peirce:
'[Abduction] is logical inference (...) having a perfectly definite logical form. (...) Namely, the hypothesis cannot be admitted, even as a hypothesis, unless it be supposed that it would account for the facts or some of them. The form of inference, therefore, is this:
The surprising fact, C, is observed;
But if A were true, C would be a matter of course,
Hence, there is reason to suspect that A is true.' (Collected Papers of Charles Sanders Peirce. Peirce, 1958)
(Quote copied from Abduction and Induction: Essays on their Relation and Integration, Peter Flach and Antonis Kakas, eds., 2000)
Consider a logical theory, formed of rules in the form of implications like A -> B (premise A implies conclusion B). Abduction is the inference of the premises after observation of the conclusions, i.e. if A -> B AND B is observed, then A may be inferred.
That's a different inference mode from both deduction: inferring a conclusion from a premise, e.g. if A -> B AND A, then B may be inferred; and induction: inferring a rule from an observation, e.g. inferring A -> B after observing A and B. Note that this is a simplification: induction assumes a background theory of more rules A1 -> A2, ..., An -> A that can be applied to the observation A and B to infer A -> B.
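A toy encoding of that contrast (my own sketch of the distinction, with a single made-up rule rain -> wet_grass):

```python
# Toy illustration of the three inference modes over rules "premise -> conclusion".
# This is my own sketch of the distinction described above, not a real reasoner.

rules = {("rain", "wet_grass")}  # rain -> wet_grass

def deduce(facts: set[str]) -> set[str]:
    """Deduction: from A and A -> B, conclude B."""
    return facts | {concl for prem, concl in rules if prem in facts}

def abduce(observation: str) -> set[str]:
    """Abduction: from B and A -> B, hypothesise A as an explanation."""
    return {prem for prem, concl in rules if concl == observation}

def induce(observations: list[tuple[str, str]]) -> set[tuple[str, str]]:
    """Induction: from observing A together with B, propose the rule A -> B."""
    return set(observations)

print(deduce({"rain"}))                 # {'rain', 'wet_grass'}
print(abduce("wet_grass"))              # {'rain'}  (plausible, not certain)
print(induce([("rain", "wet_grass")]))  # {('rain', 'wet_grass')}
```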
Anyway, abduction is generally associated with probabilistic reasoning, albeit informally so. That probably means that we should categorise LLM inference as abductive, since it guesses the next token according to a model of probabilities of token sequences. But that's just a, er, guess.
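The "model of probabilities of token sequences" part can be made literal with toy numbers (illustration only, nothing like a real model):

```python
import math
import random

# Toy next-token step: the model scores candidate tokens (logits),
# softmax turns the scores into probabilities, and one token is sampled.
logits = {"the": 2.1, "a": 1.3, "reasoning": 0.2}
z = sum(math.exp(v) for v in logits.values())
probs = {tok: math.exp(v) / z for tok, v in logits.items()}
next_token = random.choices(list(probs), weights=list(probs.values()))[0]
print(probs, "->", next_token)
```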
But they clearly struggle with generalization and rule-following. This failure to generalize (extrapolate, deduce, compute) is why we still can't fire all of our DBAs.
Has anyone encountered an LLM-based text-to-SQL engine that actually gets the job done? I think that's your best canary. I stopped caring somewhere around "transpose these 2 letters of the alphabet" not working consistently.