GenAI does not Think nor Understand
GenAI excels at language processing but struggles with logic-based tasks. A worked example reveals inconsistencies, prompting caution about relying on it. PartyRock is recommended as an effective way to test language models.
Read original article
GenAI, a technology that simulates intelligence, excels at tasks like language processing and text summarization but falls short in logic-based scenarios. An example involves a variation of the classic river-crossing problem, posed with only a man, a wolf, and a cabbage, where GenAI incorrectly adds a sheep despite none being mentioned in the setup. This inconsistency raises concerns about the reliability of GenAI's responses, highlighting the importance of using it selectively. The author suggests leveraging GenAI for natural language processing and information recall while cautioning against relying on it for tasks requiring logic and comprehension. Because responses are unpredictable, testing GenAI is challenging, especially in situations where incorrect answers could lead to adverse outcomes. The author recommends PartyRock for testing different language models and creating text, chats, and images through a user-friendly interface.
Related
Lessons About the Human Mind from Artificial Intelligence
In 2022, a Google engineer claimed AI chatbot LaMDA was self-aware, but further scrutiny revealed it mimicked human-like responses without true understanding. This incident underscores AI limitations in comprehension and originality.
Gemini's data-analyzing abilities aren't as good as Google claims
Google's Gemini 1.5 Pro and 1.5 Flash AI models face scrutiny for poor data analysis performance, struggling with large datasets and complex tasks. Research questions Google's marketing claims, highlighting the need for improved model evaluation.
Txtai – A Strong Alternative to ChromaDB and LangChain for Vector Search and RAG
Generative AI's rise in business and challenges with Large Language Models are discussed. Retrieval Augmented Generation (RAG) tackles data generation issues. LangChain, LlamaIndex, and txtai are compared for search capabilities and efficiency. Txtai stands out for streamlined tasks and text extraction, despite a narrower focus.
How Good Is ChatGPT at Coding, Really?
A study in IEEE evaluated ChatGPT's coding performance, showing success rates from 0.66% to 89%. ChatGPT excelled in older tasks but struggled with newer challenges, highlighting strengths and vulnerabilities.
Pop Culture
Goldman Sachs report questions generative AI's productivity benefits, power demands, and industry hype. Economist Daron Acemoglu doubts AI's transformative potential, highlighting limitations in real-world applications and escalating training costs.
We need to make a distinction between levels of capability for independent agents that take in information and produce a set of actions:
1. Entry-level capacity: enough to execute a task or set of tasks by following written instructions prepared by an expert (machines can do almost 100% of these tasks at this point)
2. Intermediate capacity: determining what needs to be done from vague requirements, defining tasks with no intermediate specifications, and completing those tasks (the current state of GPTs)
3. Expert capacity: recognizing the context of the situation, knowing how to define the problem in a way appropriate to that context, evaluating the options and resources, and setting up structures to granularly define the set of resources needed (GPTs can do this in limited and narrow cases)
Most humans only act at the first level. This is especially true in work contexts, and almost nobody would be hired if all jobs required level 2 or 3 capacity.
What I've seen over 15 years of being in AI (and leadership generally) is that, as expected, the goalposts for what counts as "AI" always move up the expert chain, with no relation to the actual distribution of capabilities among existing agents (humans).
This author is measuring GPTs on whether their superhuman behaviors are consistent and pervasive, but compared to what we measure as the average human capacity, these systems are already superhuman.
We’re already well past the point that ChatGPT is a more coherent “thinker” than all but the top 1% of all humans. So what is good enough?
a man with a wolf and a cabbage wants to cross a river. There is a boat but the man can only bring one more item beside himself at a time. What are the steps the man needs to do in order to cross the river?
The response was:
To solve the puzzle where a man needs to cross a river with a wolf and a cabbage, while ensuring none of them are left together unsupervised (since the wolf will eat the sheep, and the sheep will eat the cabbage), follow these steps:
1. Man takes the sheep across the river (leaving the wolf and the cabbage on the original side).
2. Man returns alone.
3. Man takes the cabbage across the river.
4. Man leaves the cabbage on the other side and takes the sheep back.
5. Man takes the wolf across the river.
6. Man returns alone.
7. Man takes the sheep across the river.
8. This way, at no point are the wolf and the sheep left alone together, nor are the sheep and the cabbage left alone together.
At which I prompted: "I didn't mention a sheep." The response was corrected, but again wrongly:
Oops, my mistake! Let’s solve the puzzle correctly with a man, a wolf, and a cabbage:
1. Man takes the wolf across the river (leaving the cabbage on the original side).
2. Man returns alone.
3. Man takes the cabbage across the river.
4. Man leaves the cabbage on the other side and takes the wolf back.
5. Man takes the man across the river.
6. Man returns alone.
7. Man takes the wolf across the river.
8. Now, both the man and the wolf are across the river, with no danger to the cabbage.
At step 3 the problem was solved, but it kept on going.
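For contrast, the two-item puzzle as posed yields to a few lines of classical search. A minimal sketch in Python (the generic unsafe-pair formulation and all names here are mine, not from the thread):

    from collections import deque

    def solve_crossing(items, unsafe_pairs):
        """Shortest crossing plan, via BFS over the set of things on the near bank."""
        everyone = frozenset(items) | {"man"}
        start, goal = everyone, frozenset()  # goal: near bank is empty

        def safe(bank):
            # A bank is safe if the man is present or no forbidden pair is left alone.
            return "man" in bank or not any({a, b} <= bank for a, b in unsafe_pairs)

        queue, seen = deque([(start, [])]), {start}
        while queue:
            near, path = queue.popleft()
            if near == goal:
                return path
            going = "man" in near                      # forward trip if the man is on the near bank
            side = near if going else everyone - near  # the man's current bank
            for cargo in [None, *sorted(side - {"man"})]:
                moved = {"man"} | ({cargo} if cargo else set())
                nxt = frozenset(near - moved if going else near | moved)
                if nxt not in seen and safe(nxt) and safe(everyone - nxt):
                    seen.add(nxt)
                    verb = "crosses" if going else "returns"
                    step = f"man {verb} with the {cargo}" if cargo else f"man {verb} alone"
                    queue.append((nxt, path + [step]))
        return None

    # The puzzle exactly as posed: only a wolf and a cabbage, so nothing eats anything.
    print(solve_crossing({"wolf", "cabbage"}, unsafe_pairs=[]))
    # -> ['man crosses with the cabbage', 'man returns alone', 'man crosses with the wolf']

With no unsafe pairs the search finds the three-step plan immediately; feeding it the classic wolf, goat, and cabbage with their two forbidden pairs returns the familiar seven-step solution.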
Not really the experience most people are getting when using AIs nowadays. It actually shows a high degree of understanding, or at least plays out, almost perfectly, an internal process that produces what humans consistently perceive as "understanding" things.
In many scenarios it shows a high degree of behavior driven by logical reasoning, thinking, and pattern detection. Again, whatever internal process is used, it is what humans usually think of as "logic".
Some processes are known; others are still somewhat black boxes, mainly because of the non-deterministic behavior happening inside the models. By that I mean the reason a model produces a different output after being given the same prompt more than once.
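One well-understood source of that run-to-run variation is sampling: deployed models typically draw the next token from a temperature-scaled distribution rather than always taking the single most likely token. A minimal sketch, with made-up logits standing in for a real model's output:

    import numpy as np

    rng = np.random.default_rng()

    def sample_token(logits, temperature=0.8):
        # Temperature rescales the logits before the softmax; the random draw
        # below is why the same prompt can yield different continuations.
        scaled = np.asarray(logits, dtype=float) / temperature
        probs = np.exp(scaled - scaled.max())  # numerically stable softmax
        probs /= probs.sum()
        return int(rng.choice(len(probs), p=probs))

    logits = [2.0, 1.5, 0.3, -1.0]  # illustrative scores, not from any real model
    print([sample_token(logits) for _ in range(3)])  # three draws may disagree

As the temperature approaches zero the distribution collapses onto the most likely token and runs become repeatable; higher temperatures spread probability mass and make identical prompts diverge.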
Not everything is a mystery; most processes are known by now. They are "by-products" of emergent mathematical structures inside the model's architecture, like when you take a circle and rotate it around a point: you get a sphere.
The emergent "sphere" would be the "logic" and "reason" behind the models.
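To make the analogy concrete, one way to render it: a circle revolved about one of its diameters sweeps out a sphere, a structure that is not visible in the generating curve itself.

    \[
      \underbrace{x^{2} + z^{2} = r^{2}}_{\text{circle in the } xz\text{-plane}}
      \quad\longrightarrow\quad
      \underbrace{x^{2} + y^{2} + z^{2} = r^{2}}_{\text{sphere, after revolving about the } z\text{-axis}}
    \]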
Are these terms well defined or just subjective "I know it when I feel it" echoes of an unresolved debate over residual beliefs in a dualistic mind/body dichotomy?
What if it turns out that a confluent sequence of sensory inputs amounts to a unique neurophysical vector that initiates a particular activation cascade in another cluster of nerves, some outputs of which have no images in the conscious domain (are not phenomenological) despite strongly informing it, resulting in what we call 'creativity', all together defining a path back through our sensory encoding/decoding apparatus which we recognize as 'thought'?
I am not convinced that we are looking at this question through the right end of the telescope here.