July 25th, 2024

AI solves IMO problems at silver medal level

Google DeepMind's AI systems, AlphaProof and AlphaGeometry 2, solved four out of six International Mathematical Olympiad problems, achieving a silver medalist level, marking a significant milestone in AI mathematical reasoning.

Read original articleLink Icon
CuriositySkepticismExcitement
AI solves IMO problems at silver medal level

Google DeepMind has announced that its AI systems, AlphaProof and AlphaGeometry 2, have achieved a significant milestone by solving four out of six problems from the International Mathematical Olympiad (IMO), reaching a performance level equivalent to a silver medalist. This accomplishment marks the first time AI has matched this standard in the prestigious competition, which has been held annually since 1959 and is recognized as a benchmark for advanced mathematical reasoning capabilities in AI.

AlphaProof utilizes a reinforcement-learning approach to formal mathematical reasoning, while AlphaGeometry 2 is an enhanced version of a geometry-solving system. The AI systems were evaluated by prominent mathematicians, including Prof. Sir Timothy Gowers, who noted the impressive nature of the solutions generated. The problems were translated into formal mathematical language, allowing the AI to process and solve them effectively. AlphaProof successfully tackled two algebra problems and one number theory problem, including the most challenging problem of the competition, while AlphaGeometry 2 solved the geometry problem.

The systems scored a total of 28 points out of a possible 42, just shy of the gold medal threshold. This achievement highlights the potential of AI in mathematical reasoning and its ability to assist mathematicians in exploring complex problems and proofs. DeepMind plans to continue developing these systems and exploring various AI approaches to enhance mathematical reasoning further.

Related

Francois Chollet – LLMs won't lead to AGI – $1M Prize to find solution [video]

Francois Chollet – LLMs won't lead to AGI – $1M Prize to find solution [video]

The video discusses limitations of large language models in AI, emphasizing genuine understanding and problem-solving skills. A prize incentivizes AI systems showcasing these abilities. Adaptability and knowledge acquisition are highlighted as crucial for true intelligence.

AI can beat real university students in exams, study suggests

AI can beat real university students in exams, study suggests

A study from the University of Reading reveals AI outperforms real students in exams. AI-generated answers scored higher, raising concerns about cheating. Researchers urge educators to address AI's impact on assessments.

AI Revolutionized Protein Science, but Didn't End It

AI Revolutionized Protein Science, but Didn't End It

Artificial intelligence, exemplified by AlphaFold2 and its successor AlphaFold3, revolutionized protein science by predicting structures accurately. AI complements but doesn't replace traditional methods, emphasizing collaboration for deeper insights.

"Superhuman" Go AIs still have trouble defending against these simple exploits

"Superhuman" Go AIs still have trouble defending against these simple exploits

Researchers at MIT and FAR AI found vulnerabilities in top AI Go algorithms, allowing humans to defeat AI with unorthodox strategies. Efforts to improve defenses show limited success, highlighting challenges in creating robust AI systems.

Can the New Mathstral LLM Accurately Compare 9.11 and 9.9?

Can the New Mathstral LLM Accurately Compare 9.11 and 9.9?

Mathstral is a new 7B model by Mistral AI for math reasoning, with a 32k context window and Apache 2.0 license. It aims to improve common sense in math problem-solving, deployable locally with LlamaEdge and shareable via GaiaNet for customization and integration.

AI: What people are saying
The comments on the article about Google DeepMind's AI systems achieving silver medalist level in the International Mathematical Olympiad reveal a mix of excitement and skepticism regarding the implications of this achievement.
  • Many commenters express skepticism about the AI's performance, noting that the problems were manually translated into formal language, which significantly alters the nature of the challenge.
  • There is a discussion about the time taken by the AI to solve problems compared to human competitors, with some arguing that if humans had the same time, they could perform better.
  • Several users highlight the need for more transparency regarding the AI's reasoning process and the nature of its solutions, questioning whether it truly understands the problems or is merely brute-forcing answers.
  • Some commenters emphasize the potential of AI in mathematics, suggesting that it could revolutionize the field, while others caution against overhyping its current capabilities.
  • Concerns are raised about the environmental impact of training such large models and whether the resources spent justify the results achieved.
Link Icon 98 comments
By @Smaug123 - 6 months
So I am extremely hyped about this, but it's not clear to me how much heavy lifting this sentence is doing:

> First, the problems were manually translated into formal mathematical language for our systems to understand.

The non-geometry problems which were solved were all of the form "Determine all X such that…", and the resulting theorem statements are all of the form "We show that the set of all X is {foo}". The downloadable solutions from https://storage.googleapis.com/deepmind-media/DeepMind.com/B... don't make it clear whether the set {foo} was decided by a human during this translation step, or whether the computer found it. I want to believe that the computer found it, but I can't find anything to confirm. Anyone know?

By @necovek - 6 months
This is certainly impressive, but whenever IMO is brought up, a caveat should be put out: medals are awarded to 50% of the participants (high school students), with 1:2:3 ratio between gold, silver and bronze. That puts all gold and silver medalists among the top 25% of the participants.

That means that "AI solves IMO problems better than 75% of the students", which is probably even more impressive.

But, "minutes for one problem and up to 3 days for each remaining problem" means that this is unfortunately not a true representation either. If these students were given up to 15 days (5 problems at "up to 3 days each") instead of 9h, there would probably be more of them that match or beat this score too.

It really sounds like AI solved only a single problem in the 9h students get, so it certainly would not be even close to the medals. What's the need to taint the impressive result with apples-to-oranges comparison?

Why not be more objective and report that it took longer but was able to solve X% of problems (or scored X out of N points)?

By @golol - 6 months
This is the real deal. AlphaGeometry solved a very limited set of problems with a lot of brute force search. This is a much broader method that I believe will have a great impact on the way we do mathematics. They are really implementing a self-feeding pipeling from natural language mathematics to formalized mathematics where they can train both formalization and proving. In principle this pipeline can also learn basic theory building like creating auxilliary definitions and Lemmas. I really think this is the holy grail of proof-assistance and will allow us to formalize most mathematics that we create very naturally. Humans will work podt-rigorously and let the machine asisst with filling in the details.
By @Ericson2314 - 6 months
The lede is a bit buried: they're using Lean!

This is important for more than Math problems. Making ML models wrestle with proof systems is a good way to avoid bullshit in general.

Hopefully more humans write types in Lean and similar systems as a much way of writing prompts.

By @michael_nielsen - 6 months
A good brief overview here from Tim Gowers (a Fields Medallist, who participated in the effort), explaining and contextualizing some of the main caveats: https://x.com/wtgowers/status/1816509803407040909
By @signa11 - 6 months
> ... but whenever IMO is brought up, a caveat should be put out: medals are awarded to 50% of the participants (high school students), with 1:2:3 ratio between gold, silver and bronze. That puts all gold and silver medalists among the top 25% of the participants.

yes, it is true, but getting to the country specific team is itself an arduous journey, and involves brutal winnowing every step of the way f.e. regional math-olympiad, and then national math-olympiad etc.

this is then followed by further trainings specifically meant for this elite bunch, and maybe further eliminations etc.

suffice it to say, that qualifying to be in a country specific team is imho a big deal. getting a gold/silver from amongst them is just plain awesome !

By @fancyfredbot - 6 months
I'm seriously jealous of the people getting paid to work on this. Sounds great fun and must be incredibly satisfying to move the state of the art forward like that.
By @cynicalpeace - 6 months
Machines have been better than humans at chess for decades.

Yet no one cares. Everyone's busy watching Magnus Carlsen.

We are human. This means we care about what other humans do. We only care about machines insofar as it serves us.

This principle is broadly extensible to work and art. Humans will always have a place in these realms as long as humans are around.

By @thrance - 6 months
Theorem proving is a single-player game with an insanely big search space, I always thouht it would be solved long before AGI.

IMHO, the largest contributors to AlphaProof were the people behind Lean and Mathlib, who took the daunting task of formalizing the entirety of mathematics to themselves.

This lack of formalizing in math papers was what killed any attempt at automation, because AI researcher had to wrestle with the human element of figuring out the author's own notations, implicit knowledge, skipped proof steps...

By @zone411 - 6 months
By @Jun8 - 6 months
Tangentially: I found it fascinating to follow along the solution to Problem 6: https://youtu.be/7h3gJfWnDoc (aquaesulian is a node to ancient name of Bath). There’s no advanced math and each step is quite simple, I’d guess on a medium 8th grader level.

Note that the 6th question is generally the hardest (“final boss”) and many top performers couldn’t solve it.

I don’t know what Lean is or how see AI’s proofs but an AI system that can explain such a question on par with the YouTuber above would be fantastic!

By @nopinsight - 6 months
Once Gemini, the LLM, integrates with AlphaProof and AlphaGeometry 2, it might be able to reliably perform logical reasoning. If that's the case, software development might be revolutionized.

"... We'll be bringing all the goodness of AlphaProof and AlphaGeometry 2 to our mainstream #Gemini models very soon. Watch this space!" -- Demis Hassabis, CEO of Google DeepMind. https://x.com/demishassabis/status/1816499055880437909

By @adverbly - 6 months
> First, the problems were manually translated into formal mathematical language for our systems to understand. In the official competition, students submit answers in two sessions of 4.5 hours each. Our systems solved one problem within minutes and took up to three days to solve the others.

Three days is interesting... Not technically silver medal performance I guess, but let's be real I'd be okay waiting a month for the cure to cancer.

By @riku_iki - 6 months
By @SJC_Hacker - 6 months
The kicker with some of those math competition problems, there will be problems that reduce to finding all natural numbers for which some statement is true. These are almost always small numbers, less than 100 in most circumstances.

Which means these problems are trivial to solve if you have a computer - you can simply check all possibilities. And is precisely the reason why calculators aren't allowed.

But exhaustive searches are not feasible by hand in the time span the problems are supposed to be solved - roughly 30 minutes per problem. You are not supposed to use brute force, but recognize a key insight which simplifies the problem. And I believe even if you did do an exhaustive search, simply giving the answer is not enough for full points. You would have to give adequate justification.

By @robinhouston - 6 months
Some more context is provided by Tim Gowers on Twitter [1].

Since I think you need an account to read threads now, here's a transcript:

Google DeepMind have produced a program that in a certain sense has achieved a silver-medal peformance at this year's International Mathematical Olympiad.

It did this by solving four of the six problems completely, which got it 28 points out of a possible total of 42. I'm not quite sure, but I think that put it ahead of all but around 60 competitors.

However, that statement needs a bit of qualifying.

The main qualification is that the program needed a lot longer than the human competitors -- for some of the problems over 60 hours -- and of course much faster processing speed than the poor old human brain.

If the human competitors had been allowed that sort of time per problem they would undoubtedly have scored higher.

Nevertheless, (i) this is well beyond what automatic theorem provers could do before, and (ii) these times are likely to come down as efficiency gains are made.

Another qualification is that the problems were manually translated into the proof assistant Lean, and only then did the program get to work. But the essential mathematics was done by the program: just the autoformalization part was done by humans.

As with AlphaGo, the program learnt to do what it did by teaching itself. But for that it needed a big collection of problems to work on. They achieved that in an interesting way: they took a huge database of IMO-type problems and got a large language model to formalize them.

However, LLMs are not able to autoformalize reliably, so they got them to autoformalize each problem many times. Some of the formalizations were correct, but even the incorrect ones were useful as training data, as often they were easier problems.

It's not clear what the implications of this are for mathematical research. Since the method used was very general, there would seem to be no obvious obstacle to adapting it to other mathematical domains, apart perhaps from insufficient data.

So we might be close to having a program that would enable mathematicians to get answers to a wide range of questions, provided those questions weren't too difficult -- the kind of thing one can do in a couple of hours.

That would be massively useful as a research tool, even if it wasn't itself capable of solving open problems.

Are we close to the point where mathematicians are redundant? It's hard to say. I would guess that we're still a breakthrough or two short of that.

It will be interesting to see how the time the program takes scales as the difficulty of the problems it solves increases. If it scales with a similar ratio to that of a human mathematician, then we might have to get worried.

But if the function human time taken --> computer time taken grows a lot faster than linearly, then more AI work will be needed.

The fact that the program takes as long as it does suggests that it hasn't "solved mathematics".

However, what it does is way beyond what a pure brute-force search would be capable of, so there is clearly something interesting going on when it operates. We'll all have to watch this space.

1. https://x.com/wtgowers/status/1816509803407040909?s=46

By @amarant - 6 months
This is quite cool! I've found logical reasoning to be one of the biggest weak points of LLMs, nice to see that an alternative approach works better! I've tried to enlist gpt to help me play a android game called 4=10, where you solve simple math problems, and gpt was hilariously terrible at it. It would both break the rules I described, and make math mistakes, such as claiming 6*5-5+8=10

I wonder if this new model could be integrated with an LLM somehow? I get the feeling that combining those two powers would result in a fairly capable programmer.

Also perhaps a LLM could do the translation step that is currently manual?

By @petters - 6 months
The problems were first converted into a formal language. So they were partly solved by the AI
By @nitrobeast - 6 months
Reading into the details, the system is more impressive than the title. 100% of the algebra and geometry problems were solved. The remaining problems are of combinatorial types, which ironically more closely resembles software engineering work.
By @gallerdude - 6 months
Sometimes I wonder if in 100 years, it's going to be surprising to people that computers had a use before AI...
By @StefanBatory - 6 months
Wow, that's absolutely impressive to hear!

Also it's making me think that in 5-10 years almost all tasks involving computer scientists or mathematicians will be done in AI. Perhaps people going into trades had a point.

By @_heimdall - 6 months
I'm still unclear whether the system used here is actually reasoning through the process of solving the problem, or brute forcing solutions with reasoning coming in during the mathematical proof of each potential proof.

Is it clear whether the algorithm is actually learning from why previously attempted solutions failed to prove out, or is it statistically generating potential answers similar to an LLM and then trying to apply reasoning to prove out the potential solution?

By @HL33tibCe7 - 6 months
This is kind of an ideal use-case for AI, because we can say with absolute certainty whether their solution is correct, completely eliminating the problem of hallucination.
By @majikaja - 6 months
It would be nice if on the page they included detailed descriptions of the proofs it came up with, more information about the capabilities of the system and insights into the training process...

If the data is synthetic and covers a limited class of problems I would imagine what it's doing mostly reduces to some basic search pattern heuristics which would be of more value to understand than just being told it can solve a few problems in three days.

By @seydor - 6 months
We need to up the ante: Getting human-like performance on any task is not impressive in itself, what matters is superhuman, orders of magnitude above. These comparisons with humans in order create impressive sounding titles are disguising the fact that we are still at the stone age of intelligence.
By @zhiQ - 6 months
Coincidentally, I just posted about how well LLMs handle adding long strings of numbers: https://userfriendly.substack.com/p/discover-how-mistral-lar...
By @dan_mctree - 6 months
I'm curious if we'll see a world where computers could solve math problems so easily, that we'll be overwhelmed by all the results and stop caring. The role of humans might change to asking the computer interesting questions that we care about.
By @0xd1r - 6 months
> As part of our IMO work, we also experimented with a natural language reasoning system, built upon Gemini and our latest research to enable advanced problem-solving skills. This system doesn’t require the problems to be translated into a formal language and could be combined with other AI systems. We also tested this approach on this year’s IMO problems and the results showed great promise.

Wonder what "great promise" entails. Because it's hard to imagine Gemini and other transformer-based models solving these problems with reasonable accuracy, as there is no elimination of hallucination. At least in the generally available products.

By @skywhopper - 6 months
Except it didn’t. The problem statements were hand-encoded into a formal language by human experts, and even then only one problem was actually solved within the time limit. So, claiming the work was “silver medal” quality is outright fraudulent.
By @1024core - 6 months
> The system was allowed unlimited time; for some problems it took up to three days. The students were allotted only 4.5 hours per exam.

I know speed is just a matter of engineering, but looks like we still have a ways to go. Hold the gong...

By @stonethrowaway - 6 months
It’s like bringing a rocket launcher to a fist fight but I’d like to use these math language models to find gaps in logic when people are making online arguments. It would be an excellent way to verify who has done their homework.
By @PaulHoule - 6 months
See https://en.wikipedia.org/wiki/Automated_Mathematician for an early system that seems similar in some way.
By @nybsjytm - 6 months
To what extent is the training and structure of AlphaProof tailored specifically to IMO-type problems, which typically have short solutions using combinations of a small handful of specific techniques?

(It's not my main point, but it's always worth remembering - even aside from any AI context - that many top mathematicians can't do IMO-type problems, and many top IMO medalists turn out to be unable to solve actual problems in research mathematics. IMO problems are generally regarded as somewhat niche.)

By @osti - 6 months
So they weren't able to solve the combinatorics problem. I'm not super well versed in competition math, but combinatorics always seem to be the most interesting problems to me.
By @lo_fye - 6 months
Remember when people thought computers would never be able to beat a human Grand Master at chess? Ohhh, pre-2000 life, how I miss thee.
By @quirino - 6 months
I honestly expected the IOI (International Olympiad of Informatics) to be "beaten" much earlier than the IMO. There's AlphaCode, of course, but on the latest update I don't think it was quite on "silver medal" level. And available LLM's are probably not even on "honourable mention" level.

I wonder if some class of problems will emerge that human competitors are able to solve but are particularly tricky for machines. And which characteristics these problems will have (e.g. they'll require some sort of intuition or visualization that is not easily formalized).

Given how much of a dent LLM's are already making on beginner competitions (AtCoder recently banned using them on ABC rounds [1]), I can't help but think that soon these competitions will be very different.

[1] https://info.atcoder.jp/entry/llm-abc-rules-en

By @ckcheng - 6 months
There doesn’t seem to be much information on how they attempted and failed to solve the combinatorial type problems.

Anyone know any details?

By @pnjunction - 6 months
Brilliant and so encouraging!

>because of limitations in reasoning skills and training data

One would assume that mathematical literature and training data would be abundant. Is there a simple example that could help appreciate the Gemini bridge layer mentioned in the blog which produces the input for RL in Lean?

By @jerb - 6 months
Is the score of 28 comparable to the score of 29 here? https://www.kaggle.com/competitions/ai-mathematical-olympiad...
By @djaouen - 6 months
Is it really such a smart thing to train a non-human "entity" to beat humans at math?
By @brap - 6 months
Are all of these specialized models available for use? Like, does it have an API?

I wonder because on one hand they seem very impressive and groundbreaking, on the other it’s hard to imagine why more than a handful of researchers would use them

By @11101010001100 - 6 months
Can anyone comment on how different the AI generated proofs are when compared to those of humans? Recent chess engines have had some 'different' ideas.
By @imranhou - 6 months
If the system took 3 days to solve a problem, how different is this approach than a bruteforce attempt at the problem with educated guesses? Thats not reasoning in my mind.
By @gowld - 6 months
Why is it so hard to make an AI that can translate an informally specified math problem (and Geometry isn't even so informal) into a formal representation?
By @sssummer - 6 months
Why frontier models can both achieve silver medal in Math Olympiad but also fail to answer "which number is bigger, 9.11 or 9.9"?
By @quantum_state - 6 months
It’s as impressive as if not more than AI beating a chess master. But are we or should we be really impressed?
By @rowanG077 - 6 months
Is this just google blowing up their own asses or is this actually useable with some sane license?
By @c0l0 - 6 months
That's great, but does that particular model also know if/when/that it does not know?
By @atum47 - 6 months
Oh, the title was changed to international math Olympiad. I was reading IMO as in my opinion, haha
By @fovc - 6 months
6 months ago I predicted Algebra would be next after geometry. Nice to see that was right. I thought number theory would come before combinatorics, but this seems to have solved one of those. Excited to dig into how it was done

https://news.ycombinator.com/item?id=39037512

By @lumb63 - 6 months
Can someone explain why proving and math problem solving is not a far easier problem for computers? Why does it require any “artificial intelligence” at all?

For example, suppose a computer is asked to prove the sum of two even numbers is an even number. It could pull up its list of “things it knows about even numbers”, namely that an even number modulo 2 is 0. Assuming the first number is “a” and the second is “b”, then it knows a=2x and b=2y for some x and y. It then knows via the distributive property that the sum is 2(x+y), which satisfies the definition of an even number.

What am I missing that makes this problem so much harder than applying a finite and known set of axioms and manipulations?

By @mathinaly - 6 months
How do they know their formalization of the informal problems into formal ones was correct?
By @__0x01 - 6 months
Please could someone explain, very simply, what the training data was composed of?
By @m3kw9 - 6 months
Is it one of those slowly slowly then suddenly things? I hope so
By @mupuff1234 - 6 months
Can it / did it solve problems that weren't solved yet?
By @amelius - 6 months
How long until this tech is integrated into compilers?
By @dmitrygr - 6 months
> First, the problems were manually translated into formal mathematical language

That is more than half the work of solving them. Headline should read "AI solves the simple part of each IMA problem at silver medal level"

By @refulgentis - 6 months
Goalposts at the moon, FUD at "but what if its obviously fake?".

Real, exact, quotes from the top comments at 1 PM EST.

"I want to believe that the computer found it, but I can't find anything to confirm."

"Curing cancer will require new ideas"

"Maybe they used 10% of all of GCP [Google compute]"

By @gerdesj - 6 months
Why on earth did the "beastie" need the questions translating?

So it failed at the first step (comprehension) and hence I think we can request a better effort next time.

By @badrunaway - 6 months
This will in a few months change everything forever. Exponential growth incoming soon from Deepmind systems.
By @szundi - 6 months
Like it understands any of it
By @thoiwer23423 - 6 months
And yet it thinks 3.11 is greater than 3.9

(probably confused by version numbers)

By @mik09 - 6 months
how long before it solves the last two problems?
By @lolinder - 6 months
This is a fun result for AI, but a very disingenuous way to market it.

IMO contestants aren't allowed to bring in paper tables, much less a whole theorem prover. They're given two 4.5 hour sessions (9 hours total) to solve all the problems with nothing but pencils, rulers, and compasses [0].

This model, meanwhile, was wired up to a theorem proover and took three solid days to solve the problems. The article is extremely light on details, but I'm assuming that most of that time was guess-and-check: feed the theorem prover a possible answer, get feedback, adjust accordingly.

If the IMO contestants were given a theorem prover and three days (even counting breaks for sleeping and eating!), how would AlphaProof have ranked?

Don't get me wrong, this is a fun project and an exciting result, but their comparison to silver medalists at the IMO is just feeding into the excessive hype around AI, not accurately representing its current state relative to humanity.

[0] 5.1 and 5.4 in the regulations: https://www.imo-official.org/documents/RegulationsIMO.pdf

By @gyudin - 6 months
Haha, what a dumb tincan (c) somebody on Twitter right now :D
By @hendler - 6 months
By @machiaweliczny - 6 months
Good, now use DiffusER on algebra somehow please
By @pmcf - 6 months
I read this as “In My Opinion” and really thought this about AI dealing with opinionated people. Nope. HN is still safe. For now…
By @Sparkyte - 6 months
In other news today calculator solves math problem.
By @cs702 - 6 months
In 1997, machines defeated a World Chess Champion for the first time, using brute-force "dumb search." Critics noted that while "dumb search" worked for chess, it might not necessarily be a general strategy applicable to other cognitive tasks.[a]

In 2016, machines defeated a World Go Champion for the first time, using a clever form of "dumb search" that leverages compute, DNNs, reinforcement learning (RL), and self-play. Critics noted that while this fancy form of "dumb search" worked for Go, it might not necessarily be a general strategy applicable to other cognitive tasks.[a]

In 2024, machines solved insanely hard math problems at the Silver Medal level in an International Math Olympiad for the first time, using a clever form of "dumb search" that leverages compute, DNNs, RL, and a formal language. Perhaps "dumb search" over cleverly pruned spaces isn't as dumb as the critics would like it to be?

---

[a] http://www.incompleteideas.net/IncIdeas/BitterLesson.html

By @hulitu - 6 months
> AI solves International Math Olympiad problems at silver medal level

> In the official competition, students submit answers in two sessions of 4.5 hours each. Our systems solved one problem within minutes and took up to three days to solve the others.

Why not compare with students who are given 3 days to submit an answer ? /s

By @Davidzheng - 6 months
HOLY SHIT. It's amazing
By @data_maan - 6 months
It's bullshit. AlphaGeometry can't even solve Pythagoras theorem.

Not opensourcing anything.

This is a dead end on which no further research can be built.

It violates pretty much every principle of incremental improvement on which science is based. It's here just for hype, and the 300+ comments prove it.

By @NoblePublius - 6 months
A lot of words to say second place
By @iamronaldo - 6 months
No benchmarks of any kind?
By @myspeed - 6 months
This means we may need to remove or replace the Olympiad..It has no practical significance..Winners never contributed to any major scientific breakthroughs.
By @xyst - 6 months
Billions of dollars spent building this, gW of energy used to train it. And the best it could do is “silver”?

Got to be kidding me. We are fucked

By @AyyEye - 6 months
Parlor tricks. Wake me up AI can reliably identify which number is circled at the level of my two year old.
By @dinobones - 6 months
I see DeepMind is still playing around with RL + search algorithms, except now it looks like they're using an LLM to generate state candidates.

I don't really find that this impressive. With enough compute you could just do n-of-10,000 LLM generations to "brute force" a difficult problem and you'll get there eventually.

By @piombisallow - 6 months
IMO problems aren't fundamentally different from chess or other games, in that the answer is already known.
By @rich_sasha - 6 months
I'm actually not that surprised. Maths Olympiads IME have always been 80% preparation, 20% skill - if not more heavily tuned to preparation. It was all about solving as many problems as possible ahead of the papers, and having a good short term memory. Since Olympiads are for kids, the amount of actual fundamental mathematical theorems required is actually not that great.

Sounds perfect for a GPT model, with lots of input training data (problem books and solutions).

By @gowld - 6 months
This shows a major gap in AI.

The proofs of these problems aren't interesting. They were already known before the AI started work.

What's interesting is how the AI found the proof. The only answer we have is "slurped data into a neural network, matched patterns, and did some brute search".

What were the ideas it brainstormed? What were the dead-end paths? What were the "activations" where the problem seemed similar to a certain piece of input, which led to a guess of a step in the solution?

By @balls187 - 6 months
What was the total energy consumption required to acheive this result (both training and running)

And, how much CO2 was released into earths atmosphere?