September 17th, 2024

Chain of Thought Empowers Transformers to Solve Inherently Serial Problems

The study by Zhiyuan Li and colleagues demonstrates that the Chain of Thought approach enhances large language models' performance on arithmetic and symbolic reasoning tasks by enabling inherently serial computation.

The paper "Chain of Thought Empowers Transformers to Solve Inherently Serial Problems" by Zhiyuan Li and colleagues explores the effectiveness of the Chain of Thought (CoT) approach in enhancing the performance of large language models (LLMs) on tasks requiring arithmetic and symbolic reasoning. The authors provide a theoretical framework that explains how CoT enables decoder-only transformers to perform inherently serial computations, a capability that low-depth transformers otherwise lack. Previous research indicated that constant-depth transformers with finite precision could only address problems in the complexity class TC^0 without CoT. This study tightens the expressiveness bounds for such transformers, showing that with constant-bit precision they can only solve problems in AC^0. However, by incorporating T steps of CoT, these transformers can tackle any problem solvable by Boolean circuits of size T, significantly improving accuracy on challenging tasks like permutation group composition and circuit value problems (a concrete sketch of the permutation task follows the summary points below). The findings suggest that CoT is a crucial mechanism for enhancing the computational capabilities of transformers, particularly in scenarios where parallel computation is insufficient.

- The Chain of Thought (CoT) method improves the accuracy of large language models on complex tasks.

- CoT enables transformers to perform serial computations, which are typically challenging for low-depth models.

- The study establishes tighter expressiveness bounds for constant-depth transformers.

- Incorporating CoT allows transformers to solve problems beyond the limitations of traditional approaches.

- Empirical results demonstrate significant performance improvements in tasks that are difficult for parallel computation.
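
To make "inherently serial" concrete: one of the paper's benchmark tasks is permutation group composition, where the natural algorithm composes one permutation at a time and each CoT step can record one intermediate result. Below is a minimal Python sketch of what such a step-by-step transcript computes; it is illustrative only, not the authors' code.

    # The permutation composition task, computed serially: each loop
    # iteration mirrors one chain-of-thought step that records the
    # composition of the prefix seen so far.
    def compose(p, q):
        # (p o q)(i) = p(q(i)); permutations as tuples of indices
        return tuple(p[q[i]] for i in range(len(p)))

    perms = [(1, 0, 2, 3, 4), (0, 2, 1, 3, 4), (0, 1, 3, 2, 4)]
    state = (0, 1, 2, 3, 4)  # identity permutation in S_5
    for step, p in enumerate(perms, 1):
        state = compose(p, state)
        print(f"step {step}: {state}")  # one intermediate "reasoning token"
    print("answer:", state)
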

AI: What people are saying
The comments on the article reflect a mix of skepticism and curiosity regarding the Chain of Thought (CoT) approach in large language models.
  • Several commenters question the practical implications of the findings, suggesting that while theoretically interesting, they may not lead to real-world applications.
  • There is a discussion about the relationship between transformers and established computational theories, such as the Universal Approximation Theorem.
  • Some users express doubts about the novelty of the claims, arguing that the ability to solve problems with sufficient resources is not groundbreaking.
  • Concerns are raised about the nature of reasoning in AI, with calls for models to not just generate language but to exhibit genuine understanding and logic.
  • Comments highlight the need for further benchmarking and exploration of the CoT approach's effectiveness in practical scenarios.
25 comments
By @lsy - 7 months
Note that for the purposes of this paper a “problem” just means a formally decidable problem or a formal language, and the proof is that by creatively arranging transformers you can make individual transformer runs behave like individual Boolean circuits. However, this is a long way from any practical application of transformers: for one thing, most problems we care about are not stated as formal languages, and we already have far more efficient ways to implement Boolean circuits.
By @lgessler - 7 months
I liked Yoav Goldberg's snarky quote tweet:

> next paper: transformers can solve any problem but on some of them they may compute indefinitely and never provide an answer

> (and you cannot tell in advance which is which!!)

https://twitter.com/yoavgo/status/1835802380589203802

By @sigmoid10 - 7 months
>Remarkably, constant depth is sufficient.

How would that be remarkable, when it is exactly what the Universal Approximation Theorem already states? Since transformers also use fully connected layers, none of this should really come as a surprise. But from glancing at the paper, they don't even mention it.

By @larodi - 7 months
I'm waiting for the people of AI to discover syllogism and inference in the original PROLOG sense, which this CoT abomination basically tries to achieve. Interestingly, if all logical content were translated to rules, and only those rules were fed into the LLM training set, what would the result be? Could the probabilistic magic be made to actually follow reason, without all the dice?
By @wodenokoto - 7 months
But didn't we already know that NNs can solve any computable problem? The interesting question is whether they can be trained to solve any (computable) problem.
By @nopinsight - 7 months
In the words of an author:

"What is the performance limit when scaling LLM inference? Sky's the limit.

We have mathematically proven that transformers can solve any problem, provided they are allowed to generate as many intermediate reasoning tokens as needed. Remarkably, constant depth is sufficient.

http://arxiv.org/abs/2402.12875 (ICLR 2024)"

https://x.com/denny_zhou/status/1835761801453306089

By @JSDevOps - 7 months
So given infinite time and resources it can solve any problem? Hardly groundbreaking, is it?
By @mg - 7 months
Has it been publicly benchmarked yet whether this approach:

    Hello LLM, please solve this task: <task>
can be improved by performing this afterwards?

    for iteration in range(10):
        Hello LLM, please solve this task: <task>
        Here is a possible solution: <last_reply>
        Please look at it and see if you can improve it.
        Then tell me your improved solution.
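
Spelled out as runnable code, the loop would look something like this. call_llm is a hypothetical wrapper for whatever completion API is used, not tied to any provider:

    # Sketch of the proposed self-refinement loop. call_llm is a
    # hypothetical function: prompt string in, model reply out.
    def refine(task, call_llm, iterations=10):
        reply = call_llm(f"Hello LLM, please solve this task: {task}")
        for _ in range(iterations):
            reply = call_llm(
                f"Hello LLM, please solve this task: {task}\n"
                f"Here is a possible solution: {reply}\n"
                "Please look at it and see if you can improve it. "
                "Then tell me your improved solution."
            )
        return reply
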
By @HarHarVeryFunny - 7 months
Sure, in the same sense that an infinitely long tape lets a Turing machine solve arbitrary problems. In theory at least, if one had the right program.
By @tossandthrow - 7 months
> We have mathematically proven that transformers can solve any problem, provided they are allowed to generate as many intermediate reasoning tokens as needed.

This is also the case for plain, regular RNNs.

By @smusamashah - 7 months
https://x.com/ctjlewis/status/1786948443472339247

"Running cellular automata and other programs on Claude 3 Opus."

It's one of the replies to this tweet.

By @seydor - 7 months
'can'

But will they? I believe the frontier has moved to making them make sense instead of just making infinite language.

The infinite monkey problem is not solved yet.

By @scotty79 - 7 months
A chain-of-thought GPT is sort of a Turing machine with a tape that it's allowed to write to for purposes other than outputting the answer.
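
A toy decode loop makes the analogy concrete: every generated token goes onto the "tape", but only the part after an answer marker is read out. next_token here is a hypothetical stand-in for a trained model:

    # Toy decode loop: tokens written before "ANSWER:" are scratch
    # tape, not output. next_token is a hypothetical function that
    # maps a token sequence to the next token.
    def run(next_token, prompt_tokens, max_steps=256):
        tape = list(prompt_tokens)
        for _ in range(max_steps):
            tok = next_token(tape)  # model may write scratch work here
            tape.append(tok)
            if tok == "<eos>":
                break
        text = " ".join(tape)
        return text.split("ANSWER:")[-1].replace("<eos>", "").strip()
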
By @smusamashah - 7 months
A reply in this twitter thread links to a detailed blog post titled "Universal computation by attention: Running cellular automata and other programs on Claude 3 Opus." https://x.com/ctjlewis/status/1786948443472339247
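
For context, demos like that one prompt the model to apply an elementary cellular automaton's update rule one row at a time. Here is a plain-Python reference implementation to compare such transcripts against; Rule 110 is chosen here because it is Turing-complete, though the linked post may use other rules as well:

    # Elementary cellular automaton, one row per step, with
    # wrap-around boundaries. Rule 110 is Turing-complete, which is
    # what makes running it inside an LLM interesting.
    def step(cells, rule=110):
        n = len(cells)
        return [(rule >> ((cells[i - 1] << 2) | (cells[i] << 1)
                          | cells[(i + 1) % n])) & 1 for i in range(n)]

    row = [0] * 30 + [1]
    for _ in range(12):
        print("".join(".#"[c] for c in row))
        row = step(row)
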
By @floppiplopp - 7 months
They have also mathematically proven that transformers are great randomness generators.
By @cpldcpu - 7 months
Can any of these tools do anything that GitHub Copilot cannot do (apart from using other models)? I tried Continue.dev and cursor.ai, but it was not immediately obvious to me. Maybe I am missing something workflow-specific?
By @empath75 - 7 months
Is this more general than LLMs? Is it possible to do something Chain-of-Thought-like in a transformer model that _isn't_ trained on language?
By @glial - 7 months
Apologies if this is a dumb question, but aren't all computations inherently serial, in the sense that a Turing machine performs its operations serially?
By @tonii141 - 7 months
A random token generator can also solve any problem if you give it enough time and memory.
By @qmatch - 7 months
Is this similar to the Universal Approximation Theorem?
By @CarRamrod - 7 months
Damn, we just used our entire Round A acquiring an infinite amount of bananas and typewriter ink. The boss is not going to like this.
By @theshrike79 - 7 months
Are we getting to a point where the LLM will just answer "42" and we need to figure out the question? =)
By @bottlepalm - 7 months
Forget UBI, we're going to need Universal Basic Compute.