Google's Aletheia system solved 6 out of 10 novel mathematical problems in the FirstProof challenge. No human guidance. No hints. Just an AI agent working through research-level problems that arose in professional mathematicians' own research.
This isn't incremental progress. This is autonomous discovery.
What Aletheia Actually Did
The system uses Google's Gemini 3 Deep Think model - a reasoning engine built specifically for complex problem-solving. It tackled problems from the FirstProof challenge, a set of ten research-level questions whose solutions had not been published when the challenge was posed.
On top of that, Aletheia scored roughly 92% on IMO-ProofBench, a benchmark based on International Mathematical Olympiad problems. These are the questions that make teenage maths prodigies sweat. Aletheia solved the large majority of them.
The difference between solving known problems and discovering new proofs is enormous. One is pattern-matching against existing solutions. The other requires genuine mathematical creativity - forming conjectures, testing approaches, abandoning dead ends, and constructing novel arguments from first principles.
Aletheia did the second thing.
Why This Feels Different
We've had AI that assists mathematicians for years. Tools that suggest lemmas, check proofs for errors, search databases for relevant theorems. Those are powerful, but they're co-pilots. The human is still flying the plane.
Aletheia doesn't need the human. It formulates its own approach, explores its own proof strategies, and produces complete solutions without intervention. The mathematician shows up at the end to verify the work, not to guide it.
This is what fully autonomous agentic research looks like. The system has goals (solve the problem), constraints (formal mathematical rules), and the capacity to iterate independently until it succeeds or exhausts its options.
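To make that loop concrete, here is a minimal sketch of the propose-verify-iterate pattern in Python. It is not Aletheia's architecture, which this article doesn't describe in detail. The names `run_agent`, `Outcome`, `propose`, `verify` and `budget` are all hypothetical, and the "problem" is a toy one (find a non-trivial divisor of 91) chosen only because a candidate answer is trivial to check.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Outcome:
    solved: bool
    answer: Optional[int]
    attempts: int


def run_agent(propose: Iterable[int],
              verify: Callable[[int], bool],
              budget: int) -> Outcome:
    """Try candidates until one passes the verifier or the budget runs out.

    Hypothetical illustration only: a real system would generate proof
    attempts, not integers, and its verifier would check mathematical rigour.
    """
    attempts = 0
    for candidate in propose:
        if attempts >= budget:
            break  # options exhausted: stop without a solution
        attempts += 1
        if verify(candidate):
            return Outcome(True, candidate, attempts)  # goal reached
    return Outcome(False, None, attempts)


# Toy goal: find a non-trivial divisor of 91.
# Constraint: the candidate must divide 91 exactly.
def divides_91(d: int) -> bool:
    return 91 % d == 0


found = run_agent(range(2, 91), divides_91, budget=50)
capped = run_agent(range(2, 91), divides_91, budget=3)

print(found)
print(capped)
```

The point of the sketch is the shape of the loop: the goal and the constraint live in the verifier, the search lives in the proposer, and the budget decides when to give up. Nobody steps in between attempts.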
For researchers, this changes the game. Instead of spending months on a single proof attempt, you could set Aletheia loose on ten different problems overnight. Some will fail. Some might succeed. Either way, you wake up to results, not just more open questions.
The Practical Implications
Mathematics underpins everything from cryptography to physics to machine learning itself. Breakthroughs in pure maths often take decades to find practical applications - and then they become foundational.
If AI systems can generate novel proofs, the pace of mathematical discovery could accelerate dramatically. Problems that have been open for years might get solved in sprints. New conjectures could be tested faster than human researchers can propose them.
But there's a subtler shift here. Mathematics has always been a deeply human discipline - intuition, creativity, aesthetic judgement about which approaches feel promising. Aletheia doesn't have intuition in the human sense. It has computation and search at a scale no human can match.
That creates a strange dynamic. The proofs it produces might be correct but utterly unintuitive to human mathematicians. We might end up with a growing body of proven theorems that we struggle to understand conceptually, even though we can verify them mechanically.
What Comes Next
The next step is obvious: point Aletheia at the big unsolved problems. The Riemann Hypothesis. The P vs NP question. The Birch and Swinnerton-Dyer conjecture. These are the kinds of problems that define careers and sometimes entire fields.
Will Aletheia crack them? Maybe not tomorrow. But the fact that we're even asking the question tells you how far this has come.
For now, the key result is this: AI systems can do original mathematical research. Not assist with it. Not speed it up. Actually do it, from scratch, without a human in the loop.
That's a different world from the one we were in six months ago.