Google's AI solves 56-year-old math problems autonomously but DeepMind CEO says this is still not AGI

Google DeepMind said AlphaProof Nexus solved nine open Erdős problems using Lean-verified proofs. The claim has sharpened debate over hallucinations in AI maths and what counts as real progress towards AGI.

Advertisement
Google's AI reportedly solved decades-old mathematical problems autonomously. (Photo: AI generated)

Just days after OpenAI said one of its AI models had cracked the famous “planar unit distance problem” first posed by legendary mathematician Paul Erdos in 1946, researchers at Google DeepMind have made an even bigger claim.

According to Google researchers, DeepMind’s AI system called AlphaProof Nexus has autonomously solved nine open Erdos problems, some of which had remained unsolved for as long as 56 years. Researchers also claimed the cost of solving each problem was only a few hundred dollars in computing resources.

advertisement

Google researchers said AlphaProof Nexus also proved 44 open OEIS conjectures, resolved a 15-year-old question in algebraic geometry, and even discovered a new algorithmic parameter in optimisation theory that humans had not previously found.

The work stands out because the system reportedly operated autonomously, generating proofs and checking them with computer-level verification tools instead of relying entirely on human mathematicians.

But the announcement is not just about solving hard math. It is also becoming part of a growing debate in the AI world: how do you actually prove that an AI-generated mathematical proof is correct? That is where Google’s approach differs from many earlier AI proof systems.

Why Google is talking about “hallucinations” in AI math

When OpenAI earlier claimed its AI had solved an Erdos problem, the company said external mathematicians later reviewed and verified the proof.

Google, without directly naming OpenAI, pointed to a major issue in AI-generated mathematics: hallucinations.

Researchers explained that AI systems can sometimes produce proofs that sound convincing but contain serious logical mistakes. In some cases, the AI invents mathematical statements, known as lemmas, and falsely presents them as already established results.

At other times, the AI may avoid solving the hardest part of a problem by simply renaming it as a “helper lemma,” making the overall proof appear complete even though the core difficulty remains unsolved.

These kinds of errors can slip through during informal proof reviews because the arguments may sound technically convincing to humans.

Google used Lean to verify every mathematical step

To avoid those issues, DeepMind combined large language model reasoning , reportedly powered by Gemini 3.1 Pro, with a formal verification system called Lean.

Here, the AI generates proof attempts, while Lean checks every single logical step automatically using strict mathematical rules.

That means unsupported claims, fake lemmas, and missing logic are rejected instantly by the system itself, without requiring humans to manually validate the proof.

Google researchers said this combination of AI reasoning and formal verification could change how mathematicians work in the future.

“Our results support this vision,” the researchers said. “Our mathematician collaborators found that proof attempts by our agents enhanced their understanding of a problem, even when an agent could not prove the claim at hand.”

advertisement

They added that because the proof sketches were formally verified, human experts could focus directly on unresolved parts instead of spending time re-checking the entire argument.

Does this mean AI is reaching AGI?

The announcement naturally sparked another big question: if AI can solve decades-old mathematical problems, is it approaching artificial general intelligence, or AGI?

According to Google DeepMind CEO Demis Hassabis, the answer is still no.

While speaking on the Big Technology Podcast, Hassabis said today’s AI systems remain “nowhere near” true AGI despite these impressive mathematical achievements.

“It doesn't matter how many Erdos problems you solve,” Hassabis said, adding that current systems are still far from the kind of deep originality shown by legendary thinkers such as Srinivasa Ramanujan.

He argued that true AGI would require systems capable of genuine invention and broad intelligence across many domains, not just solving specialised mathematical tasks.

For now, AI may be becoming an extremely powerful mathematical assistant — but even DeepMind’s own CEO believes it is still far from matching human genius.

- Ends
Published By:
OM Gupta
Published On:
May 25, 2026 09:35 IST