Large language models have developed a higher-order theory of mind
Large language models like GPT-4 and Flan-PaLM perform comparably to adults on theory of mind tasks; one study found that GPT-4 even exceeds adult performance on 6th-order inferences. Model size and fine-tuning appear to influence ToM abilities in LLMs, with implications for user-facing applications.
Large language models (LLMs) like GPT-4 and Flan-PaLM have shown performance comparable to adult humans on higher-order theory of mind (ToM) tasks, which involve reasoning recursively about multiple mental and emotional states. A study introduced a test suite called Multi-Order Theory of Mind Q&A to compare LLMs against adult human benchmarks. Results indicate that GPT-4 and Flan-PaLM achieved adult-level or near-adult-level performance on ToM tasks, with GPT-4 even surpassing adult performance on 6th-order inferences. The study suggests an association between model size, fine-tuning, and the development of ToM abilities in LLMs. The best-performing LLMs appear to have acquired a generalized capacity for ToM, which underpins many human behaviors involving cooperation and competition. These findings have significant implications for practical, user-facing applications of LLMs.
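To make "higher-order" concrete: an nth-order ToM inference tracks n nested mental states ("Anna thinks that Ben believes that ..."). The Python sketch below builds such statements from a hypothetical template; the names, verbs, and phrasing are illustrative assumptions, not the benchmark's actual Multi-Order Theory of Mind Q&A items.

```python
# Hypothetical illustration of nth-order theory-of-mind statements.
# Agents, verbs, and the template are assumptions for illustration;
# they are not taken from the actual benchmark.

AGENTS = ["Anna", "Ben", "Carla", "Dev", "Elena", "Farid"]
VERBS = ["thinks", "believes", "knows", "suspects", "hopes", "fears"]

def nth_order_statement(order: int, fact: str) -> str:
    """Wrap a fact in `order` levels of nested mental-state attribution."""
    statement = fact
    # Build from the innermost clause outward, one agent per level.
    for level in reversed(range(order)):
        agent = AGENTS[level % len(AGENTS)]
        verb = VERBS[level % len(VERBS)]
        statement = f"{agent} {verb} that {statement}"
    return statement

if __name__ == "__main__":
    # A 6th-order statement of the kind GPT-4 reportedly handled well:
    print(nth_order_statement(6, "the meeting was moved to Friday"))
    # -> Anna thinks that Ben believes that Carla knows that Dev suspects
    #    that Elena hopes that Farid fears that the meeting was moved to Friday
```

Answering a question about such a statement requires keeping all six attributions straight at once, which is what distinguishes 6th-order inference from simple first-order belief tracking.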
Related
Testing Generative AI for Circuit Board Design
A study tested large language models (LLMs) such as GPT-4o, Claude 3 Opus, and Gemini 1.5 on circuit board design tasks. Performance varied: Claude 3 Opus excelled on specific questions while others struggled as complexity grew, and Gemini 1.5 showed promise in accurately parsing datasheet information. The study emphasized both the potential and the limitations of using AI models for circuit board design.
Researchers describe how to tell if ChatGPT is confabulating
Researchers at the University of Oxford devised a method to detect confabulation in large language models like ChatGPT. By assessing the semantic equivalence of sampled answers, they aim to reduce false answers and improve reliability; a rough sketch of the idea appears after this list.
Large Language Models are not a search engine
Large language models (LLMs) from Google and Meta generate content algorithmically, which can produce nonsensical "hallucinations." Companies struggle to manage errors after generation because of factors like training data and temperature settings. LLMs aim to improve user interactions but raise skepticism about their ability to deliver factual information.
Meta Large Language Model Compiler
Large Language Models (LLMs) are utilized in software engineering but underused in code optimization. Meta introduces the Meta Large Language Model Compiler (LLM Compiler) for code optimization tasks. Trained on LLVM-IR and assembly code tokens, it aims to enhance compiler understanding and optimize code effectively.
Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs
The study presents a method to boost Large Language Models' retrieval and reasoning abilities for long-context inputs by fine-tuning on a synthetic dataset. Results show significant improvements in information retrieval and reasoning skills.
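On the confabulation-detection item above: the reported approach samples several answers to the same question and groups them by semantic equivalence, treating high disagreement across groups as a warning sign. Below is a minimal Python sketch of that general idea, not the researchers' implementation; the `equivalent` predicate is a placeholder for a real semantic-equivalence check (for instance, bidirectional entailment judged by an NLI model), and the string-match usage is purely a toy.

```python
import math

def semantic_entropy(answers, equivalent):
    """Group sampled answers into semantic clusters and return the
    entropy over clusters; high entropy suggests confabulation."""
    clusters = []  # each cluster holds mutually equivalent answers
    for ans in answers:
        for cluster in clusters:
            if equivalent(ans, cluster[0]):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])
    probs = [len(c) / len(answers) for c in clusters]
    return -sum(p * math.log(p) for p in probs)

# Toy usage: exact string match stands in for a real equivalence check.
samples = ["Paris", "Paris", "Paris", "Lyon", "Paris"]
print(semantic_entropy(samples, lambda a, b: a == b))  # ~0.50: mostly consistent
```

In practice, a threshold on this entropy would flag questions where the model's sampled answers scatter across many meanings, which is exactly when a single confident-sounding answer is least trustworthy.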
Theory of mind is obviously coupled with the recognition that we can be outsmarted, so this type of research is crucial for gauging what stage we are at.