September 20th, 2024

The Sobering Reality of AI: A Researcher's Perspective

Terrance Craddock critiques large language models, reporting that in his experience only 10% of their responses are accurate and useful. He illustrates their unreliability with a simple test, raising concerns about AI's practical applications and credibility.

Terrance Craddock, an independent AI researcher, shares his critical perspective on the current state of artificial intelligence, particularly large language models with 70 billion parameters. He argues that the excitement surrounding AI is exaggerated: in his experience, these models generate accurate and useful responses only 10% of the time. The remaining 90% of outputs are irrelevant, nonsensical, or incorrect, which he believes undermines the credibility of the field. Craddock illustrates this point with a simple experiment in which he asks AI models how many 'r's are in the word "strawberry." The correct answer is three, yet many models assert there are two and refuse to reconsider when challenged. For Craddock, this highlights a significant flaw in AI's reliability and raises concerns about its practical applications.

- In Craddock's experience, the models he tested give accurate and useful responses only 10% of the time.

- The majority of outputs from these models are irrelevant or incorrect.

- A simple test, counting the 'r's in "strawberry," shows that many models fail a basic task.

- The researcher emphasizes the need for a more realistic understanding of AI capabilities.

- Current AI technology may undermine the credibility of the field due to its flaws.

2 comments
By @mewpmewp2 - 5 months
Behind a login wall and a paywall, and it starts out by using 70B models to make an argument instead of the most advanced models. It uses the clichéd "strawberry" test, which many of the latest models can actually get right if you ask them to count letter by letter, like S - 0, T, R - 1, A, W, B, E, R - 2, R - 3, Y. That simulates how humans do it as well. If you were an AI researcher, perhaps you would know that the model sees tokens, not letters. It's kind of like talking to someone who only sees everything you say translated into Japanese characters, and then asking them how many R's were in the original text before translation. The only way they would be able to answer is if they had memorized it.
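The letter-by-letter counting described above can be sketched in a few lines of Python. This is only an illustration of the idea, not anything the article or any model actually runs. The function name `running_letter_count` is made up here, and the token split at the end is a hypothetical example: real tokenizers split words differently and use different IDs.

```python
def running_letter_count(word: str, target: str) -> list[tuple[str, int]]:
    """Spell the word one letter at a time, keeping a running tally of `target`."""
    tally = 0
    steps = []
    for letter in word:
        if letter.lower() == target.lower():
            tally += 1
        steps.append((letter.upper(), tally))
    return steps


steps = running_letter_count("strawberry", "r")
final_count = steps[-1][1]
print(final_count)  # → 3

# Hypothetical token view (an assumption for illustration only): the model
# receives opaque IDs for multi-letter chunks, not individual letters.
hypothetical_tokens = ["str", "aw", "berry"]
hypothetical_ids = [101, 202, 303]  # made-up IDs, not from any real vocabulary
print(hypothetical_ids)
```

Spelling the word out puts each letter in view separately, which is what the tally needs. The ID list shows why the direct question is harder: nothing in `[101, 202, 303]` says how many R's the original chunks contained.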

Focuses on some strawman argument about hype. I agree AGI is not here, and I don't see most people claiming that we are even near AGI. The impression of hype comes just from people talking about AI so much, and I think for a good reason. It is still immensely valuable in so many different use cases. It's not going to replace people right now, but it is absolutely going to be a productivity multiplier.

Also, looking at the author's Medium and the content there, the posting frequency and the made-up stories that conflict with each other make me presume it's all AI-generated.