Large language models have developed a higher-order theory of mind
Large language models like GPT-4 and Flan-PaLM perform comparably to adults on theory of mind tasks; one study found that GPT-4 even exceeds adult performance on 6th-order inferences. Model size and fine-tuning appear to influence ToM abilities in LLMs, with implications for user-facing applications.
Large language models (LLMs) like GPT-4 and Flan-PaLM have shown performance comparable to adult humans on higher-order theory of mind (ToM) tasks, which involve reasoning recursively about multiple mental and emotional states. A study introduced a test suite called Multi-Order Theory of Mind Q&A to compare LLMs against adult human benchmarks. Results indicate that GPT-4 and Flan-PaLM achieved adult-level or near-adult-level performance on ToM tasks, with GPT-4 even surpassing adult performance on 6th-order inferences. The study suggests an association between model size, fine-tuning, and the development of ToM abilities in LLMs. The best-performing LLMs appear to have acquired a generalized capacity for ToM, which underpins many human behaviors involving cooperation and competition. These findings have significant implications for practical, user-facing applications of LLMs.
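To make "higher-order" concrete: an nth-order ToM inference tracks n nested mental states ("Anna thinks that Ben believes that ..."). The Python sketch below builds such statements from a hypothetical template; the names, verbs, and phrasing are illustrative assumptions, not the benchmark's actual Multi-Order Theory of Mind Q&A items.

```python
# Hypothetical illustration of nth-order theory-of-mind statements.
# Agents, verbs, and the template are assumptions for illustration;
# they are not taken from the actual benchmark.

AGENTS = ["Anna", "Ben", "Carla", "Dev", "Elena", "Farid"]
VERBS = ["thinks", "believes", "knows", "suspects", "hopes", "fears"]

def nth_order_statement(order: int, fact: str) -> str:
    """Wrap a fact in `order` levels of nested mental-state attribution."""
    statement = fact
    # Build from the innermost clause outward, one agent per level.
    for level in reversed(range(order)):
        agent = AGENTS[level % len(AGENTS)]
        verb = VERBS[level % len(VERBS)]
        statement = f"{agent} {verb} that {statement}"
    return statement

if __name__ == "__main__":
    # A 6th-order statement of the kind GPT-4 reportedly handled well:
    print(nth_order_statement(6, "the meeting was moved to Friday"))
    # -> Anna thinks that Ben believes that Carla knows that Dev suspects
    #    that Elena hopes that Farid fears that the meeting was moved to Friday
```

Answering a question about such a statement requires keeping all six attributions straight at once, which is what distinguishes 6th-order inference from simple first-order belief tracking.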
Related
Testing Generative AI for Circuit Board Design
A study tested large language models (LLMs) such as GPT-4o, Claude 3 Opus, and Gemini 1.5 on circuit board design tasks. Performance varied: Claude 3 Opus excelled on specific questions while others struggled as complexity grew, and Gemini 1.5 showed promise in accurately parsing datasheet information. The study emphasized both the potential and the limitations of using AI models for circuit board design.
Researchers describe how to tell if ChatGPT is confabulating
Researchers at the University of Oxford devised a method to detect confabulation in large language models like ChatGPT. By assessing the semantic equivalence of sampled answers, they aim to reduce false answers and improve reliability; a rough sketch of the idea appears after this list.
Large Language Models are not a search engine
Large language models (LLMs) from Google and Meta generate content algorithmically, which can produce nonsensical "hallucinations." Companies struggle to manage errors after generation because of factors like training data and temperature settings. LLMs aim to improve user interactions but raise skepticism about their ability to deliver factual information.
Meta Large Language Model Compiler
Large Language Models (LLMs) are utilized in software engineering but underused in code optimization. Meta introduces the Meta Large Language Model Compiler (LLM Compiler) for code optimization tasks. Trained on LLVM-IR and assembly code tokens, it aims to enhance compiler understanding and optimize code effectively.
Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs
The study presents a method to boost Large Language Models' retrieval and reasoning abilities for long-context inputs by fine-tuning on a synthetic dataset. Results show significant improvements in information retrieval and reasoning skills.
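On the confabulation-detection item above: the reported approach samples several answers to the same question and groups them by semantic equivalence, treating high disagreement across groups as a warning sign. Below is a minimal Python sketch of that general idea, not the researchers' implementation; the `equivalent` predicate is a placeholder for a real semantic-equivalence check (for instance, bidirectional entailment judged by an NLI model), and the string-match usage is purely a toy.

```python
import math

def semantic_entropy(answers, equivalent):
    """Group sampled answers into semantic clusters and return the
    entropy over clusters; high entropy suggests confabulation."""
    clusters = []  # each cluster holds mutually equivalent answers
    for ans in answers:
        for cluster in clusters:
            if equivalent(ans, cluster[0]):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])
    probs = [len(c) / len(answers) for c in clusters]
    return -sum(p * math.log(p) for p in probs)

# Toy usage: exact string match stands in for a real equivalence check.
samples = ["Paris", "Paris", "Paris", "Lyon", "Paris"]
print(semantic_entropy(samples, lambda a, b: a == b))  # ~0.50: mostly consistent
```

In practice, a threshold on this entropy would flag questions where the model's sampled answers scatter across many meanings, which is exactly when a single confident-sounding answer is least trustworthy.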
Theory of mind is obviously coupled with the recognition that we can be outsmarted, so this type of research is crucial for gauging what stage we are at.