November 2nd, 2024

Apple researchers ran an AI test that exposed a fundamental 'intelligence' flaw

Apple researchers found that many AI models struggle with basic arithmetic when irrelevant data is included, highlighting a lack of genuine logical reasoning and cautioning against overestimating AI's intelligence.


Apple researchers have highlighted a significant limitation in artificial intelligence (AI) through a recent experiment. In a study, they tested various AI models on a simple arithmetic problem involving the calculation of kiwis picked over three days. While a typical schoolchild could solve the problem correctly, over 20 advanced AI models failed to do so, particularly when the problems included irrelevant data. This performance drop raises questions about the current capabilities of AI, suggesting that these models lack genuine logical reasoning. The researchers concluded that simply scaling data or models will not fundamentally resolve these issues. Their findings serve as a cautionary note against overestimating AI's intelligence, emphasizing that while AI can perform certain tasks impressively, its understanding and reasoning abilities are not as advanced as they may seem. The paper aims to temper enthusiasm for AI by underscoring the distinction between simulated intelligence and true logical reasoning.

- Apple researchers found that many AI models struggle with basic arithmetic when irrelevant data is included.

- The study indicates that current AI models lack genuine logical reasoning capabilities.

- Scaling data or models is unlikely to solve the fundamental issues identified in AI performance.

- The findings serve as a caution against overestimating AI's intelligence and capabilities.

4 comments
By @keernan - 4 months
LLMs have been hyped as AI. The hype train transformed Nvidia from a relatively minor niche tech company into the largest company in the world in the past 24 months. The US government called emergency meetings to address the urgent threat posed by unrestrained new AI releases. Sam Altman was fired - and rehired - over concerns regarding threats to humanity.

The tech world has been consumed and transformed by "AI" over the past 24 months. Microsoft and other big tech companies have committed to invest billions upon billions in the effort to win the "AI race".

And it's all bogus hype.

First, LLMs are darn good as language models. They are very helpful in language-based fields, such as programming, but even then they need human oversight. Because they are NOT AI. They cannot comprehend. They can engage in what certainly appears to be a human-like conversation. But the LLM has no "understanding" of the words it prints on the screen. And, based upon its design, it never will. It is a fascinating parlor trick, useful in its own way, but never deserving of the hype train of the last 24 months.

Secondly, the savings it will generate via automation are likely to be minuscule in comparison to the costs being expended by Microsoft and other tech companies. Clearly the top professionals know all this and have other reasons to spend the money. I submit Microsoft may not "win the AI race" - simply because there isn't any AI to win - but LLMs can still pose an existential threat to Google search. Meanwhile Nvidia shareholders aren't going to open up about the hype train they are riding. Nor is new billionaire Sam Altman.

But the money being spent on LLMs can't be justified for long. When viewed by what it is - and isn't - the current costs of LLM technology simply dwarf the financial gains generated. I don't know when the hype train falls off its cliff, but fall it will. And with the whole world's economy having been impacted these past 24 months, the hype train cliff is likely to be a very, very steep one.

By @mikequinlan - 4 months
The current ChatGPT gets it right. Apple's team seems to be wrong.

Prompt:

See if you can solve this arithmetic problem:

Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday, but five of them were a bit smaller than average. How many kiwis does Oliver have?

Answer: Let’s break it down step by step:

1. Friday: Oliver picks 44 kiwis.

2. Saturday: He picks 58 kiwis.

3. Sunday: He picks double the number he picked on Friday, so: 44 * 2 = 88 kiwis

4. Total number of kiwis picked: 44 + 58 + 88 = 190

The fact that five kiwis were smaller doesn’t affect the total count, so Oliver has 190 kiwis in total.
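The arithmetic in that answer is easy to check independently. A minimal sketch (not from the original comment; variable names are my own):

```python
# Kiwis picked each day, per the problem statement.
friday = 44
saturday = 58
sunday = 2 * friday  # "double the number of kiwis he did on Friday"

# The "five of them were a bit smaller" detail is irrelevant to the count.
total = friday + saturday + sunday
print(total)  # 190
```

The distractor sentence changes nothing in the computation, which is exactly the kind of irrelevant detail the Apple study found tripped up the models it tested.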

By @necovek - 4 months
This is an extension of an even simpler prompt I used to demonstrate the same: "what day was yesterday if yesterday was a Thursday?"

Latest versions have picked up on that, but simply because this has now entered their training data.

Coming up with novel "trick" questions it has not seen in the training data will still confuse them due to their inherent architecture.

By @mgh2 - 4 months
> "Grady Booch, father of UML, has been saying this for years. LLMs aren't intelligent and never will be, though they may get large and complex enough to simulate it. The problem really isn't the amount of data you feed it, it's the foundational architecture. LLMs are based on probability, not logic and understanding."