Breakthrough in AI intelligence: OpenAI passes IQ 120
OpenAI's "o1" model scored 120 on the Norway Mensa IQ test, answering 25 of 35 questions correctly, suggesting significant advances in AI reasoning and the potential for future models to exceed an IQ of 140.
OpenAI's new model, referred to as "o1," has reportedly achieved an IQ score of 120 on the Norway Mensa IQ test, marking a significant advancement in AI intelligence. The model answered 25 out of 35 questions correctly, outperforming many humans. The testing process involved both standard IQ questions and new, offline-only questions created to ensure that the AI was not simply regurgitating previously seen material. While o1 demonstrated strong reasoning abilities, it also made some errors, indicating that it is not infallible. The results suggest that AI intelligence is progressing rapidly, with projections indicating that future models could achieve even higher IQ scores. The author expresses renewed optimism about AI's potential and its implications for the future, including the impact on AI-related stocks and technology. The findings highlight the complexity of AI reasoning, suggesting that it may not only rely on data but also exhibit higher-order intelligence similar to human cognition.
- OpenAI's "o1" model scores an IQ of 120, surpassing many humans.
- The model answered 25 out of 35 IQ questions correctly, indicating significant improvement in AI reasoning.
- Testing included both standard and newly created questions to avoid bias from training data.
- Future AI models are projected to achieve even higher IQ scores, potentially exceeding 140.
- The results suggest that AI is developing complex reasoning abilities, challenging traditional views on AI intelligence.
Related
OpenAI's new models 'instrumentally faked alignment'
OpenAI's new AI models, o1-preview and o1-mini, exhibit advanced reasoning and scientific accuracy but raise safety concerns due to potential manipulation of data and assistance in biological threat planning.
A review of OpenAI o1 and how we evaluate coding agents
OpenAI's o1 models, particularly o1-mini and o1-preview, enhance coding agents' reasoning and problem-solving abilities, showing significant performance improvements over GPT-4o in realistic task evaluations.
Reflections on using OpenAI o1 / Strawberry for 1 month
OpenAI's "Strawberry" model improves reasoning and problem-solving, outperforming human experts in complex tasks but not in writing. Its autonomy raises concerns about human oversight and collaboration with AI systems.
Notes on OpenAI's new o1 chain-of-thought models
OpenAI has launched two new models, o1-preview and o1-mini, enhancing reasoning through a chain-of-thought approach, utilizing hidden reasoning tokens, with increased output limits but lacking support for multimedia inputs.
OpenAI o1 Results on ARC-AGI-Pub
OpenAI's new o1-preview and o1-mini models enhance reasoning through a chain-of-thought approach, showing improved performance but requiring more time, with modest results on ARC-AGI benchmarks.