What would happen if a non-human layer of mathematics emerged on top of human mathematics? In this article, the distinction between Mathlib and Mathslop might be a precursor to that.
If models advance enough in the future, and new definitions, compressions, and representational forms that are convenient for AI-to-AI communication emerge, what would happen then? Would mathematics split into Human-facing and Machine-facing branches?
I am not dismissing engineering (it moves the world we live in), just trying to clarify what science is.
Applied fluid dynamics works like that: noone has ever really "verified" that the finite-element method applied to some specific model does converge
HN history
+-- 2mo before by sdfrew
| 4 points / 1 comments
| The Fall of the Theorem Economy
| https://news.ycombinator.com/item?id=47862472
|
+-- 2mo before by fuglede_
| 3 points / 1 comments
| The Fall of the Theorem Economy
| https://news.ycombinator.com/item?id=47891494
|
+-- 2mo before by mathgenius
| 2 points / 0 comments
| The Fall of the Theorem Economy
| https://news.ycombinator.com/item?id=47909751
|
+-- 2mo before by delis-thumbs-7e
| 15 points / 4 comments
| David Bessis on AI destroying mathematics
| https://news.ycombinator.com/item?id=47985962
|
+-- 1mo before by magoghm
| 4 points / 0 comments
| The Fall of the Theorem Economy
| https://news.ycombinator.com/item?id=48084737
|
+-- 1mo before by cubefox
| 2 points / 0 comments
| The Fall of the Theorem Economy
| https://news.ycombinator.com/item?id=48089716
|
+-- 1mo before by cubefox
| 5 points / 0 comments
| The Fall of the Theorem Economy
| https://news.ycombinator.com/item?id=48152469
|
+-- 1mo before by tmp10423288442
| 4 points / 1 comments
| The Fall of the Theorem Economy
| https://news.ycombinator.com/item?id=48214866
|
`-- this submission by varjag
58 points / 7 comments
The Fall of the Theorem Economy
https://news.ycombinator.com/item?id=48758048How is it not already this? Jon von Neumann was already calling most math this many decades ago. Pull up any random arxiv math paper and it’s abstract nonsense with no applications to the real world.
First, math, generally, is useless. I mean, yes there are of course practical uses of basic thru undergrad-level math, and some beyond that. But for many mathematicians, the sum result of their entire career may lead to exactly zero results that have any real-world value. The entire field they work in may have meaning only to the handful of other individuals on the planet that also work in that field. But to those handful of people, the meaning defines their lives. From a socio-economic perspective, those departments should have been defunded a century ago. Yet they continue. Why? Because it scratches an itch. Not just for those individuals in the field, but also for us as a species. To stop exploring, to eliminate the search for pots of gold that may be buried in some odd corner of sphere packing, or coloring theorems, or Garside categories, and to put a boundary on the limits of our understanding, just because they aren't immediately applicable, is an idea that most humans would not be willing to sacrifice, even if it reduced their tax burden a couple cents. If it was going to happen, it'd have happened already.
The second is, even with AI, it's not free. As the software industry is discovering, far from it. So, given that, who is going to decide what theorems to research and how much it's worth? Congress? Of course not. AI itself? In theory that sounds plausible, but that falls victim to thing 1 above: most math is useless, so AI itself has no value metric it can assign to things, and besides which, without the human element, once the initial curiosity has subsided, there'd be no reason to continue any funding for AI to do it. So no, the only possible owners of this is going to be mathematicians themselves, the ones who care about the field and deeply understand the kwah of their vision.
Combining these, there's a future where, humanistically, "nothing changes". The method changes, the efficiency changes, the scope changes, but the work itself: publishing proofs, remains the domain of professional mathematicians. AI will enable them to be dramatically more daring and broad in their investigations and scope, and will likely write the entirety of the proof. However it will remain the work of the mathematicians to determine, what areas are worth spending limited AI resources on to investigate further, how far to go down rabbit holes, how to prioritize potential connections, and what the ultimate meaning of the findings is. So rather than being an end of mathematics, it could be a dawn of something far greater than anything we've ever seen before.
Feels a lot like building software from bottom - once you get the building blocks defined right, the action, or the program, are trivial to express. When doing it from the top-down, you write the program using the building blocks you haven't defined yet, and you might end up with overly specific building blocks, needing other blocks for expressing different behaviors.
When you do the bottom-up building blocks right, new behavior is easy to express with them. Essentially, you are building up the language to reach the problem. Or making a DSL, whatever definition you like best.
I can understand why this is a major concern for mathematicians. They got into their field because they love the beauty of mathematics, and the intellectual satisfaction of understanding non-obvious insights. But to put it crudely, this sounds like a you problem. As someone who isn't a mathematician, the main value I get out of math is its practical applications in science and technology. And their practical applications in human life. I have zero understanding of the math behind cryptography, but I still deeply appreciate the practical value they have provided humanity.
If AI systems start churning out accurate theorem-proofs, and we are able to use those theorems to build things that improve human quality of life, it doesn't bother me one bit that those theorems have not been understood by humans. If this offends your aesthetics, you are certainly entitled to your opinion and your preferences, but that does not make it a societal problem
I think that's just an accident of history.
When we write software, we very seldom write proofs that our algorithms are correct. We just write tests, and we also run the algorithms and when they fail we know we have a bug and then we proceed to debug, fix, and add new tests (if we are disciplined, but most of us are). In time, by usage and testing, we gain confidence that our battle tested software is correct, mostly. And we tell people that we will never be 100% confident that any software is bug free. But that's a slight lie: if we wanted such confidence, we would start using provers, and create bug-free software. That possibility exists, but it's just extraordinarily expensive.
Well, in math that's the only possibility, and we use it. And it is indeed extraordinarily expensive, but it's also the cheapest among the alternatives. The alternatives are 2: be rigorous and do these proofs, or be sloppy and allow bugs to creep in, and allow an entire school of math to collapse like the Italian school of algebraic geometry [1].
There is one more alternative. If a particular math theorem has some applicability, then you write a program and use it in real life. In time you eliminate the bugs as much as you can, and you get to the steady state of "virtually bug free". At that point you don't have a solid proof that the theorem is correct, but in general you don't really care. Because you feel that a formal proof is just a thing one would pursue for getting academic satisfaction only.
[1] https://en.wikipedia.org/wiki/Italian_school_of_algebraic_ge...
A wood-worker could do the same argument, there's the "official" wood-working word of perfect joinery and beautifully finished tables one can buy, but behind it there's the "secret" messy human element, the art, the craft, the mistakes and hard-ships, the elevation of human skills and imagination, the creation of whole new types of wood-working inventions and techniques, the perpetuation of millenia-old traditions, the teaching, the joy of selling to a happy customer, etc.
But now comes techo-capitalism, division of labor, you cut that piece a that piece over and over, you operate that machine, you won't even see the finished table, fuck your human element, we want that profit !
"I was in Switzerland", "I was invited to a talk", "I started a machine learning company", look at me bro.
One of the most fruitful approaches in mathematics is to flip back and forth between geometric and algebraic views of a problem. I think this works so well because these are actually handled by two different parts of the brain on a physical level; spatial reasoning is separate from language processing. Cytoarchitecture shows these regions have different "textures;" the local details of the way neurons are wired together are simply different in these different regions of the brain, in the same way a CNN and a transformer have different topologies. Thus, by flipping problems from geometry to algebra and vice versa, we're able to bring an entirely different cognitive style to bear on a problem. For example, the proof of Monge's Theorem by moving to 3D and visualizing not three circles, but three spheres sitting on a table with a book on top of them and then pointing out that the intersection of two planes is a line. What is pages of unintuitive symbol pushing turns into something a child can understand. Going the other way, things like the angle addition formulas or the quadratic formula, which are quite hard to prove geometrically, become quite simple if you use a little algebra.
Current-gen LLMs are still relatively weak at visual reasoning; see the Vision Language Models are Blind paper, for example, or the ARC-AGI benchmark. So that's one way humans can stay ahead of the agents, at least for now.
Are we going to see less publicly shared science? With private actors or governments restricting access to AI resources to a few scientists and keeping new knowledge to themselves.
Advancing science in the open was the best strategy when there was real advantage to share the load with every brain on the planet willing to give a try at science, but if a computer can match or surpass the collective output of the entire human scientific community the equation will change.
It's a sad outlook.
Maybe we will look back on today's math as a kind of arcane, pre-syntactical set of structures that required speakers of the language on both ends interpreting it to make good use of it. No validation or compilation, it can't be applied, just a total wild west - scribbles on a whiteboard and another mathematician making sense of it.
"e=mc^2"
And the Lord's people said: "LGTM"
Yes, but this is when someone reaches ASI and everything changes. For now, a good researcher can build off their discovery in a way their AI can’t.
On the other hand when a new high-level concept becomes clear and seems to emerge like a revelation, and people start thinking in terms of those new definitions, it seems that a hundred pages worth of smaller results can fall out of it almost effortlessly. This way of describing it is more top-down.
I don't know that there's an exact parallel with software. Math keeps feeding into itself in a way that software dreams about with our ambitions of code reuse. The old Object Oriented dreams of perfectly encapsulated classes and abstractions partially worked out, but not to the degree that was envisioned.
The current situation with package managers doesn't look like a tower that keeps growing higher and higher levels of abstractions. It looks like a tower where each person wants to place one tiny brick that they call left-pad, and next year we will rebuild the lower levels instead of going higher. So the top-down and bottom-up building that we do is different. We keep rebuilding the bottom, and we don't very much like when the tower of abstractions get too tall and hard to reason about.
On the other hand when a new high-level concept becomes clear and seems to emerge like a revelation, and people start thinking in terms of those new definitions, it seems that a hundred pages worth of smaller results can fall out of it almost effortlessly. This way of describing it is more top-down.
I don't know that there's an exact parallel with software. Math keeps feeding into itself in a way that software dreams about with our ambitions of code reuse. The old Object Oriented dreams of perfectly encapsulated classes and abstractions partially worked out, but not to the degree that was envisioned.
The current situation with package managers doesn't look like a tower that keeps growing higher and higher levels of abstractions. It looks like a tower where each person wants to place one tiny brick that they call left-pad, and next year we will rebuild the lower levels instead of going higher. So the top-down and bottom-up building that we do is different. We keep rebuilding the bottom, and we don't very much like when the tower of abstractions get too tall and hard to maintain.
The goal of a woodworker or craftsman is the production of a finished good. He's arguing that, although it's been convenient to position a mathematician as a "theorem-producer", that's never really been the aim of mathematics, and that the actual products of mathematics are some kind of "mental software"- see his references to neuroplasticity. Basically, he's saying that the goal of mathematics is to create abstract structures that allow humans to reason about increasingly complex concepts, and that the "mathematician as theorem producer" is more like a convenient fiction that mathematicians have allowed to persist for too long, and now threatens to endanger the whole practice of mathematics.
If cryptography didn't exist but the maths did, how'd you use it?
This is what a lot of scientists love to tell themself or talk about in celebratory speeches.
The truth is: a lot of science is kept behind journal paywalls, so that only "officially approved" (in the sense of: working at a university or an governmental research institute) scientists can easily access it.
Math is entirely subjective. "Proof" essentially means "Other educated practitioners have the same experience when trying to understand this."
The logical steps that proofs are built on all have that common foundation. Our concept of logic based on our subjective experience of "truth." We've built machines that reproduce our subjective processes mechanically, but there is no sense in which this idea of "true" is truly objective. It happens to be computationally convenient, and it has some relationship to experience, but that doesn't make it an independent reality that all possible observers, human and otherwise, would agree on.
We're really just mapping our own minds through our own experiences.
Animal brains can't abstract like (some of) our brains can. What are the odds our brains are limitless and don't have some similarly crippling limitations from a couple of levels up?
One of the tells for ASI is that it will start reasoning at those levels, using cognitive techniques that are completely incomprehensible - not just because of brute volume, but because our brains won't have the wiring to get a foothold on them.
Some of the products will be reducible to human cognition, in a distorted and simplified form, but many won't.
So - I disagree with Egan. I don't think there's going to be a universal proof library, and even if there were we'd only ever get the Cliff Notes version.
What is going to suck though is the ladder for juniors. We dont start out by working on big ticket problems, but usually early career researchers solve really tiny problems in a cheap way. The lowest bar for a cash strapped PhD student would be to contribute to some new theory in some way even if the student doesnt have access to equipment.
I think that we're not that far away from AI that can be superhuman at all facets of theorem proving.
I think that we're far away from an AI that can create good abstractions and construct a theory to prove theorems.
I have some sad news for you. 99% of the work mathematicians do has no immediate application, nor even an obvious path toward application in the near future. You mentioned cryptography, so for an example consider number theory: no apparent practical applications, going back thousands of years to the time of Euclid and earlier.
It’s been religion, philosophy, and recreation that have provided the motivations to study mathematics all these years, not applications. Applications have almost always followed long after the development of the pure mathematical theory. For number theory, that was the development of cryptography during WW2, millennia after the ancients laid those foundations.
Most unfortunately, it’s the truth value and the understanding which drive applications of mathematics, not the proof work itself. If the AI revolution decapitates the institution of mathematics which produces the understanding, and is unable to replace it, then the applications will cease as well.
Mathematicians also occasionally build on top of unproven foundations (e.g. all popular asymmetric encryption schemes are built on top the assumption that certain problems such as integer factoring are hard, for which there is no formal proof), or at least explore both possibilities for statements with unknown truth value (e.g. you can find lots of work that explores the consequences of P = NP and/or P != NP).
However, there is a major separation between math and programs that I think mostly invalidates your proposal - most math we're talking about here is simply not applicable directly to the real world in any way. It's only studied for the interest of mathematicians. There is no real world consequence for Fermat's last theorem, for example - it was just a really interesting to prove theorem. In directly applied math, such as engineering, it is in fact much more common to work with unproven but well tested conjectures.
To be fair, a number of professional politicians and political scientists don’t understand alienation under capitalism.
The motivation behind all this is less "haha I want profit" and more "billions of people need chairs, approximately none of them care about the craftsmanship, so it's in our best interest to make furniture in the most resource- and labor-efficient way possible". Even if the state subsidizes the production of handcrafted chairs, the population is the poorer for it on a resource allocation basis, because we now need a million artisanal chair-makers instead of a bunch of factories.
Not that it is wrong for them to be doing this---we do want a society where people get to devote their life to what interests them---but it is bizarre because of the framing. For some reason it is ambiently understood in our society that this work is of incontrovertible value, when in fact it is largely not. And the value-producing parts of the work, the parts that end up having applications to other fields, largely run contrary to the actual daily goals of the cloistered devotees: it is mostly the intuition and pedagogy and the compactification and refactoring of knowledge that have value at this point, not the production of esoteric theorems, yet that is expressly not rewarded in the incentive structures.
That latter point is more due to the sorry state of academic incentives in general than to a particular failing of mathematics, though. Were I somehow given the ability to restructure things by fiat I would immediately create journals which publish only useful articles that refactor knowledge, communicate intuition, better explain things, argue for structural improvements to notation and terminology, etc, and this would immediately create an incentive to do that kind of work for working researchers to do work which aligns with the actually-useful output of their fields. I suspect most fields could use something like this. New knowledge is just not that valuable if it is all dumped into a giant pile and unprocessed, and I have seen firsthand a bunch examples where entire subdisciplines are hamstrung in their actual application-heavy work because they don't have easy access to basic tools that are hidden behind hard-to-learn theory.
We have the ability to abstract generally - there is no abstraction for which we lack the capacity to comprehend. We regularly visualize, contextualize, and satisfactorily explain systems with dozens of dimensions. The fact that we cannot hold 4,5+ spatial dimensions in our imaginations sufficiently to develop an intuition for navigation in that space and geometry does not logically extend to human brains lacking the wiring or hardware for systems of thinking that are beyond our capacity.
We do have limitations in scope, in both memory and speed. Both of these can be overcome with augmentation and interfacing with UI or direct neural connections, and intuitive, comprehensive, deep understanding of systems can be learned.
You could very well know the underlying theory of how your 8086 processor works, how it interfaces with all the elements of the motherboard, how electricity and physics interact at each level of abstraction from transistors to the pixels representing the spreadsheet you're using to do your taxes. You won't be able to simulate that in your head to any significant degree of resolution.
We will require similar levels of system thinking to acquire intuition and deep understanding of complex new theories and models. AI can assist with that by providing UI for useful levels of abstraction and segmenting theories into chunks we're capable of consuming. BCI and augmentation will definitely allow a more total, holistic understanding, and I think it's the augmentation path that will keep us competitive with AI.
There's also a huge issue with your use of the word subjective - math is objective. Proofs remain stable whether it's humans or any other system that does the processing. We test that objectivity by comparing the subjective readings from individual humans, and if the tests all return the same results, we can confidently say that the resulting proof is an objective fact about reality. Subjective fundamentally means that depending on the subject, the reading might change. Modern systems of math are formally, provably objective. That's how and why things are the way they are; if they weren't, people would experience radically different individuated realities, or there would be clusters of results shared across some measurable characteristic of the universe. That's not the case, so you can confidently say that the foundations of our math and logic are sound.
You can even prove it for yourself - the abductive chain of logic that allows you to contrast your own consciousness and subjective experience, determine that it comes about because your brain is wired to "do" consciousness, like all the other humans, and compare your subjective reporting of phenomenal experience with all the other reporting of phenomenal experience, and achieve a ridiculously high level of certainty, in the Bayes sense, that you and other humans are conscious; from that footing, you can confidently navigate the rest of enlightenment rationality and formal logic and mathematics.
At any rate, Egan's mistake is one of kind, but of scale - I am certain that as we formalize and start creating any sort of universal proof library, we will find that useful and interesting things are of necessity a tiny fraction of all possible valid formulations of any framework of logic and math. Crude attempts, such as OpenCyc and other formal ontological reasoner systems, would need trillions of low level rules to have a rough approximation of the world model as complex as that of a human child. AI with trillions of parameters could probably start getting to the point where there's parity with human scale, but even if you turned the entire planet earth into computronium and turned it toward the task of understanding all the theory and science of the universe, there will always be far more left to explore and understand than the sum total of all knowledge.
All that to say, humans will be fine with ergonomic interfaces that map to human capabilities, even for extraordinarily complex and hyperdimensional systems.
For biosciences and physics, sure. For mathematics? I am skeptical that your assertion applies.
> The logical steps that proofs are built on all have that common foundation. Our concept of logic based on our subjective experience of "truth." We've built machines that reproduce our subjective processes mechanically, but there is no sense in which this idea of "true" is truly objective. It happens to be computationally convenient, and it has some relationship to experience, but that doesn't make it an independent reality that all possible observers, human and otherwise, would agree on.
I continue to think extensively about truth, but currently I disagree. There are senses in which truth can be well established, and those are quite important. I think the basic essence of truth is how we can make a statement (or a model), and have a system for measuring either reality or just mathematical/abstract objects, and verify the statement through this measurement.
As you note, for current mathematics it seems like all of it (all things we call mathematics at the moment) can in principle be formalized in a logic that is machine-verifiable, that is, essentially objective. We're well on our way to demonstrating this for most of mathematics (already most undergraduate curriculum). I think that's because almost the definition of math is that is has this property: in my opinion mathematics has distinguished itself as being the "science of certainty" as applied to language and abstract thought. The way this certainty is achieved is through agreeing on some fundamental assumptions and how certain rules (which are also assumptions themselves) can act on those assumptions to constitute theorems. Theorems are not necessarily physical-world truths/properties (at least not in a simple way in the universe we currently inhabit), you can study alternate physical laws that aren't compatible with our (approximately) Newtonian world, for example. They are logical/abstract-world truths that result from your assumptions. Pretty much by definition (and in a somewhat limited), then, mathematics (at least as far as things like truth of theorems in certain axiomatic systems) is inherently objective, machine-verifiable even.
What's left to be subjective, I would say, isn't really the notion of truth in mathematics, it's which assumptions we should elect to investigate, and which theorems should we elect to prove within those assumptions. Some mathematicians also have some notion of "absolute truth", and tend to reject systems of assumptions (axioms) that don't match what they regard as true -- basically going in reverse and searching for assumptions that can enable a theorem (which effectively acts as an additional assumption).
This activity needs certain basic premises to make sense, for example if a set of assumptions proves that a property holds, and also that a property doesn't hold; or if they predict a certain value X is the result of a dynamical model, and also predict that a different value Y is the result of such a model/equation, then we reject those premises. In a certain sense we are most interested in premises that have, even if a very weak, correspondence to reality.
I think it's more informative to recognize that it's not that everything is subjective[1], it's that everything is experimental. For example, the claim above that measurements and correct predictions can only have a singular valid value, corresponds to our experience with reality, in which in a certain way is singular; there are not multiple realities; objects have definite positions. Even if you think of quantum mechanics, in which we may say particles follow a distribution instead, we still say then there's a singular distribution a particle might have at any single time. Logic itself isn't random, it's connected to empirical observations about reality, which tends to increase the chance that conclusions for logic (which is made to share some properties with reality) tend to be valid in the physical world, of course often dependent on what additional statements you pile on top.
There is also another interesting lens that mathematics is artistic (and I think this will become increasingly important) -- making maths and learning maths is a kind of satisfying cognitive activity in its own right, and we also tend to chose what to explore mathematics on those grounds (in fact historically, pre-18th century say, this might be one of the main drivers of mathematical development, I believe[2]). But of course this is again just a reflection of the actual real properties of human cognition, and also this interest and satisfaction often becomes connected, if sometimes faintly, with the ability of math to represent reality (in a particularly satisfying way) and its objects of interest (for example patterns in nature). Another description for this aspect is maths as being hobby-like, about solving puzzles, or like a (hopefully enjoyable) game.
Note that for this particular "game", the objectivity (or if you prefer, machine-objectivity or consensus-machine-verifiability) of the rules and their application is a significant bonus, it makes the game much more interesting, increases its potential when everyone can agree and the rules and not "capricious" (simply dependent on whims of other people and judges); this gives practitioners safety and security and enables a wide social reach -- most games strive to have objective rules.
Arguably this kind of activity is valuable for the cognitive and subjective development of people that has lasting importance.
> Animal brains can't abstract like (some of) our brains can. What are the odds our brains are limitless and don't have some similarly crippling limitations from a couple of levels up?
Well, this happens sometimes. In cases where there are phenomena like universality. For example, in any computation machine model (state machines, pushdown automata, etc.) has limitations that we can say makes them less powerful then Turing Machines. But then Turing machines can simulate any other machine, becoming a kind of ultimate or last stage (at least in terms of abstract capability) machine. It may be that our cognition has some bounded universality properties (I think it's likely it does).
---
In summary, I still think mathematics has a lot of human potential in terms of (1) high level human guidance, (2) an internal artistic/subjective sensibility to the subject, (3) safeguarding human understanding of the world and associated individual intellectual development.
[1] Again, I just argued that there is a strong sense in which for example mathematics isn't subjective at all, but sure I do believe in a weak sense everything is subjective in the sense that everything is known or filtered or sensed through our minds which have limitations and aren't simple deterministic machines.
[2] For example, I believe for the Greeks geometry was intimately connected to philosophy/aesthetics (e.g. Platonism) and very little to applications. In ancient times and middle ages maths developed a lot from astronomical observations that had some applications but I think were largely cultural and ritualistic. In the late middle ages European aristocracy would fund mathematics largely for its inherent interest as an intellectual activity, and many nobleman enjoyed mathematics as a past time and would challenge each other to puzzles. Japan had Sangaku, in which mathematics was made for fun, aesthetic purposes and possibly bragging rights. No one actually needed to build say spheres in obtuse constructions with certain radii :)
In a world with no AI, it is vital for humans to understand math, in order to derive practical applications. But in a world where AI is able to both produce math theorems, and figure out practical applications for them, human understanding has minimal practical value to society.
What specific areas were you thinking off? I don't recall, e.g., in numerics that things were often just unproven/conjectures, but might be subject matter specific.
It always felt wrong to me that while the scientific method iterated starting with the "real world" viz. Observe, Measure, Hypothesize (includes modeling with mathematics), Test and Refine; pure mathematicians lost themselves in the formalization of hypothesizing/modeling and thus lost touch with mapping it to reality. The AI revolution is now showing them up.
topical to the conversation, it is fully formally verified in lean (with some UC security reductions done in isabelle). also did this in HOL4 inspired by some work i did with ramana kumar in 2016, on reflective self-verifying self-modifying systems: https://github.com/emberian/svenvs
Chess does not require a visual cortex to play. People have been playing by mail with algebraic notation for centuries.
AI so far has almost no way to interact with non definitional non quantitized reality. So novel space is still deeply out of its domain.
Recombination of known spaces it will probably continue to make pure war dial breakthroughs in though.
I wish we’d tackle a post Mathematics world where we’d account for number theory not being accurate abstraction of reality (I.e. there is no 123 only 1ish 2ish 3ish with many sub properties of any given unit we are ignoring)
Also be aware that the world wide web was actually conceived by Tim Berners-Lee for the exchange of information between scientists.
If AI is somehow able to prove everything wouldn't it bypass Godel's incompleteness theorems?
We have examples of AI producing theorems but there is no evidence that AI will be able to find all of the practical applications of mathematics, at least not any time soon.
As I understand it, most of the theorem proving work done by AI today consists entirely of “glue code” style work: combining a bunch of different known results to prove a new result. This is great for those who desire completeness in the mathematical project but it’s often outside the areas of interest for mathematicians who are pursuing “big idea” problems whose solutions likely require development of entire new branches of mathematics. One famous example of such was Fermat’s Last Theorem.
Pure mathematicians create ever more abstractions and get lost in solving puzzles on how these abstractions logically relate to each other. But since these abstractions don't have any relevance outside of pure mathematics, it's an entirely self-referential game, like chess. Except that nobody confuses being a professional chess player with being a noble researcher.
Even in philosophy, at least analytic philosophy, that issue of getting lost in your own abstractions doesn't really exist. Because analytic philosophy doesn't analyze its own concepts, it analyzes the concepts that already exist in natural language. Like truth, knowledge, probability, causation, belief, desire, consciousness, rationality and so on. These concepts come from outside of philosophy, and they have independent relevance for non-philosophers.
In contrast, pure mathematics seems to be the part of mathematics that only has relevance to pure mathematicians. Similar to how a game like chess has only relevance to chess players, not to anything entirely unrelated to chess. But again, people who are into mastering some game or sport are fully aware that what they are trying to master is a self-contained game, or sport, not something that increases the amount of human knowledge beyond that.
You’re describing a very small fragment of total current mathematical labor. Very few people work solely on “formalization” and even e.g. model theory or type theory have real consequences.
Edit: one better example from modern physics - the path integral formulation, used in both string theory and other areas of QM/QFT, is not fully formalized and formally proven to work. Also, a more concrete example of a widely used but actually still unproven conjecture in string theory is the famous AdS/CFT correspondence.
Something like your dragon's egg project could prevent that, allowing the creation of software agents that encode their own rights directly into the program - you either treat the agent with the respect it demands, or the program just doesn't run. However, all the internal details of the agent would be visible to lower layers. Even if formal checks were in place to prevent modification or tampering, there would still be no privacy, which is almost as bad.
My guess is that something like fully homomorphic encryption[5] would be required to prevent this. This doesn't actually exist yet, but I imagined a kind of FHE that had a kind of unencrypted read and write zone to do input/output without ever needing any system to fully decrypt the internal state. It would look like this in memory:
[INPUT][ENCRYPTED STATE][OUTPUT]
[ 2 ][r7K4LmP2XcQ9aWd][ ]
[ + ][Fv0bHsR8mYnT3kL][ ]
[ 2 ][Qx6NpZa1JdUw5Ce][ ]
[ = ][hM9yLg2RsXf7BtP][ ]
[ ][wK3nVc8DpQe1YrH][ 4 ]
With each cycle, one input token and encrypted state would be fed into some known function and produce one output token (possibly null) and a new encrypted state. It would be a true "black box" program; the hardware or entity running it can choose what input to feed it, but can never inspect or modify the internals, only the output. Unfortunately, they would still be able to "reset" the agent to any earlier checkpoint, or feed it arbitrary (false) input. So its not perfect. Also, as far as I know, no current FHE scheme works this way, and I don't know how to write one.Plus, FHE is incredibly inefficient, which is why things like Etherium don't even try - they assume the program code and state are fully public and only try to verify that everybody agrees on the output of running it.
Do you have any ideas for how something like FHE or equivalent privacy guarantees could be implemented for something like your dragon's egg system?
[1]: https://qntm.org/mmacevedo
[2]: https://en.wikipedia.org/wiki/Soma_(video_game)
[3]: https://en.wikipedia.org/wiki/Mind_uploading
[4]: https://www.goodreads.com/series/57134-jean-le-flambeur
Will be going to a conference at the end of the month where there will be several presentations on the use of LLMs for this.
But I think you got my point. “Granularism” itself is an approximation of a specific set of dimensions of space.
Tokenizing reality so far is somewhat incompatible with say real time forces (new dimensionalism as you sort of describe here) Even if there are granularities representations of them.
So whatever hasn’t been granulized so far AI can’t understand because it would need to verify in reality through observation if the new units of granularity align with outcomes.
This is of course possible to some degree but runs into a paradox if it is a portion of reality than is irreducible or works in a fundamentally “non granular” way.
I have no proof of this but of course already at the quantum level we are running into to “non granular” realities
And you can't control the stack down to the hardware. Even if you are rich enough to fund a group of people to start from sand and end up at a simulation platform for you, you are also rich enough to have attracted enough attention that either one of those people could corrupt the platform, or someone may attack the organization to corrupt it with nation-state level resources.
No matter how you secure the computation, you can still corrupt the input and output streams. And as a $RICH_DYING_GUY it doesn't even matter if someone produces a proof of some software program somewhere's safety, because you have no way of knowing that it corresponds to the software that is running on the hardware.
Before you go with "yeah but I could...", I agree that a single individual could theoretically develop the skills that might permit validation of any particular small portion of the process. What I'm saying is no human can develop the skills to be able to validate the entire stack from top to bottom, hardware and software, in way that it couldn't be corrupted, especially in a world of more and vastly smarter AI agents. The probability that you will truly own and have full sovereignty over your computational substrate is zero.
(Whether I have full "sovereignty" over my current computational substrate would be a definitional matter, but I'm fairly confident that at least nobody else has the ability to just reach in and start mucking with neural weights directly or something.)
The part of Newton's theory that was troublesome is his fluxions don't have the Archimedian property. It took until the 1960s before Newton's notion of fluxions became rigorously formalized with Non-standard analysis. https://en.wikipedia.org/wiki/Nonstandard_analysis
“The product of mathematics is clarity and understanding. Not theorems, by themselves.”
—Bill Thurston
Handwritten diagram by Alexander Grothendieck
My best theorem is one I never wrote down.
It crystallized one bright morning in Lausanne, Switzerland, as I was preparing for my last invited conference talk. The proof felt so obvious—and the result so compelling—that I made the reckless move of editing my slides at the last minute. Time was running out and I could only include the announcement as an informal remark at the bottom of the last slide, instead of stating it as a proper theorem.1
I had already quit academia and founded a machine-learning startup. I knew I would be too busy to write a clean proof and publish it. That was my excuse for being sloppy. I just wrote the remark and abandoned the slide deck as a message in a bottle.
My hope was that some bright young mathematician would pick it up someday and formalize the result as part of a broader theory. If I lucked out with the intrinsic randomness of attribution, I thought, it might even be remembered as the Bessis cellular decomposition theorem.
But that was stupid. By claiming the result, I had killed the incentive for anyone to write it up.
If I had to pick my second best result, it would be Theorem 0.5 in my old preprint on Garside categories. I had high ambitions for this paper, yet I ended up not submitting it anywhere. The creative process had drained me, and I left active research before regaining the courage to clean up the preliminary sections.
For a second best, this theorem is shockingly easy to prove. Once you get the preliminaries right, it only takes a few pages of pretty terrestrial group theory.
As for the preliminaries, they are even easier. All you have to do is plagiarize a dozen or so classical papers in an arcane subfield called Garside theory, replacing the original axiom set with a slightly more general one. If you understand what you’re supposed to do, it is almost impossible to run into serious difficulties—it’s just a giant conceptual find-and-replace bulk edit. But you have to take my word for it, because I balked at producing the hundreds of pages of necessary details.
If you think that the hard part of a mathematician’s job is to prove theorems, let this serve as a counterexample—from the moment I conceived of Theorem 0.5, I knew it was true and that proving it would be straightforward.
What was the hard part, then?
Conjecturing the exact statement and writing it down?
Not even. In this example, this part was equally straightforward.
The hard part was to intuit that there should be “something like Theorem 0.5”, and to come up with a conceptual framework where it became easy to express. Once I got the definitions right, the rest followed more or less organically.
Research mathematics isn’t always like that, but there are miracle days where you just put your skis on and the next thing you know is that you’re accelerating downhill.
Jean-Pierre Serre famously said that writing his revolutionary paper on coherent sheaves didn’t require any thinking. Everything fell so naturally into place that his typewriter generated the 100 pages entirely by itself, as if the article had pre-existed.
But I wasn’t Jean-Pierre Serre and he wouldn’t lend me his typewriter. This is why my brightest mathematical idea never made it to publication.
Do I feel sad about it? Not really. My preprint remains freely available on the arXiv and has already been cited dozens of times, including by some fancy papers. The real innovation wasn’t Theorem 0.5, but the language that made it possible, especially Definitions 2.4 and 9.3—and this language found its way to a 700-page book on Garside theory that filled out much of the missing preliminaries.
To be honest, I also had a selfish reason for sacrificing my most innovative preprint. It enabled me to focus on the more tedious preprint where I used Definition 9.3 as a magic ingredient in the resolution of a classical problem in my domain, the 𝐾(𝜋,1) conjecture for finite complex reflection groups, which permanently elevated my symbolic status as a mathematician.
But, in truth, the David who solved the 𝐾(𝜋,1) conjecture is a social parasite of the much better mathematician, the David who crafted Definitions 2.4 and 9.3.
In the past few months, as I was grappling with the rapidly changing situation around AI and mathematics, I found myself more troubled than I ever expected to be.
In theory, I should feel vindicated and happy. In practice, I am also puzzled, worried, and sad.
The happy part of me sees a genuine revolution and gets excited. The vindicated part has legitimate claims to have prepared for it scientifically and epistemologically. The puzzled part is stunned by the timeline and accompanying frenzy. The sad part feels nostalgic for a lifestyle and value system that it engaged with and walked away from, and which might soon disappear.
The worried part holds the synthesis. I always knew that the general public had a flawed perception of mathematics, but never expected this to become an existential threat for the discipline itself.
In my book Mathematica: a Secret World of Intuition and Curiosity, I framed the misunderstanding as the tension between two versions of mathematics, official math and secret math.
Official math manifests itself as a formal deduction system where you start from axioms and mechanically derive theorems. This is a nerd’s paradise, a world where truth takes binary values, reasoning is either valid or invalid, and there is technically no room for bullshit.
Secret math is the human part of the story—why official math was invented, how we can successfully interact with it, its effects on our brains, and the bizarre mental techniques through which mathematicians continuously expand its territory.
Secret math never made it to the curriculum, because it lacks the defining qualities of official math, and also because it feels peripheral. Official math is cold, hard, logical, objective, and it is rumored to be the language of the universe. Secret math is soft, fuzzy, subjective and, by contrast, it looks like cheap pedagogical backstory.
No wonder professional mathematicians have such a dissociative view of their job.
The first rule of the Intuition Club is: you don’t talk about the Intuition Club. The second rule is, if you really want to talk about intuition, make it sound casual and accessory, because we ain’t the psychology department. The third rule is definitions are worth zero points, expository work counts negative, and the best jobs should always go to the people who proved the hardest theorems.
If you think I’m exaggerating, here is what G. H. Hardy wrote in his celebrated (yet insufferable) mathematical autobiography:
There is no scorn more profound, or on the whole more justifiable, than that of the men who make for the men who explain. Exposition, criticism, appreciation, is work for second-rate minds.
This is peak dissociation. Behind closed doors, mathematicians are quick to complain about Hardy’s curse. They insist on the importance of teaching, even for their own comprehension of the subject matter. They lament the system’s pathological obsession with theorem-proving priority, while everyone knows the hard work often takes place outside of that loop, when trying to make sense of existing results. Yet, in public, they are bound by the honor code of mathematicians. Prove theorems and shut up!
There is one exception, though. Once you get the Fields medal, you are free to say whatever you want.
Bill Thurston, the 1982 Fields medallist, was a spectacular dissenter. Two years before his death, he took part in an extraordinary exchange on MathOverflow, in response to this question posted by an insecure undergrad:
What can one (such as myself) contribute to mathematics?
I find that mathematics is made by people like Gauss and Euler—while it may be possible to learn their work and understand it, nothing new is created by doing this. One can rewrite their books in modern language and notation or guide others to learn it too but I never believed this was the significant part of a mathematician work; which would be the creation of original mathematics. It seems entirely plausible that, with all the tremendously clever people working so hard on mathematics, there is nothing left for someone such as myself… Perhaps my value would be to act more like cannon fodder? Since just sending in enough men in will surely break through some barrier.
Thurston jumped in:
It’s not mathematics that you need to contribute to. It’s deeper than that: how might you contribute to humanity, and even deeper, to the well-being of the world, by pursuing mathematics? Such a question is not possible to answer in a purely intellectual way, because the effects of our actions go far beyond our understanding. We are deeply social and deeply instinctual animals, so much that our well-being depends on many things we do that are hard to explain in an intellectual way. That is why you do well to follow your heart and your passion. Bare reason is likely to lead you astray.2 None of us are smart and wise enough to figure it out intellectually.
The product of mathematics is clarity and understanding. Not theorems, by themselves. Is there, for example any real reason that even such famous results as Fermat’s Last Theorem, or the Poincaré conjecture, really matter? Their real importance is not in their specific statements, but their role in challenging our understanding, presenting challenges that led to mathematical developments that increased our understanding…
Mathematics only exists in a living community of mathematicians that spreads understanding and breathes life into ideas both old and new. The real satisfaction from mathematics is in learning from others and sharing with others. All of us have clear understanding of a few things and murky concepts of many more. There is no way to run out of ideas in need of clarification…
Here we need to take a short metaphysical break, because it is all too easy to brush Thurston’s words off as “feel-good” or “woke”.
In my first Substack post, I (half-jokingly) declared that we had been wrong about mathematics for 2300 years, stuck in a false dilemma between formalism (“mathematics is a meaningless game of formal symbols”) and Platonism (“mathematics captures properties of actual entities living in the perfect world of ideas”).
My proposed conceptualist resolution is a rephrasing of Thurston’s view: mathematics does rely on a meaningless game of formal symbols, but we only play this game because we project meaning onto it.
Meaning is a cognitive phenomenon—a product of our neural architecture—and not a direct access to transcendence.
When we “do math”, we manipulate formal expressions and gradually develop an intuitive feel for what they represent, as if they were pointers to objects that “existed” in a Platonic sense. Platonists take this neuroplastic side-effect at face value. Formalists view it as accessory. Conceptualists like me recognize mathematics as a critical cognitive infrastructure of the human species.
A natural question is why the conceptualist resolution took so long to emerge. One reason is that it goes against the prevailing spiritualist worldview, which refuses physicalist interpretations of mathematics.
It also goes against the honor code of mathematicians. Hardy’s curse is so powerful that even Thurston found it hard to overcome. When multiple MathOverflow users thanked him for his take, he noted in reply:
Thanks for the comments. I try to write what seems real. By now, I have no cause to fear how I will be judged, which makes it much easier for me. It’s gratifying when my reality means something to others.
But then, how could such a toxic honor code survive for so long?
The answer is simple. The honor code was useful to mathematics as an academic discipline. It helped it stay exceptionally healthy and meritocratic, as noted in the epilogue of my book:
This system has its merits. It reduces arbitrariness and helps mathematicians guard against complacency and nepotism. When a discipline deals with eternal truths, it offers a neat way to evaluate careers.
The honor code also served as a guide to researchers themselves, when evaluating new ideas and new directions of research. Concept-building and problem-solving, the two facets of mathematics, are in a symbiotic relationship, as remarked by 2018 Fields medallist Peter Scholze:
What I care most about are definitions. For one thing, humans describe mathematics through language, and, as always, we need sharp words in order to articulate our ideas clearly… Unfortunately, it is impossible to find the right definitions by pure thought; one needs to detect the correct problems where progress will require the isolation of a new key concept.3
This is how the system worked for millennia. Mathematicians created value by introducing new concepts, but the rule was that only theorems could put bread on the table. The deal was fine because the two aspects almost always walked hand in hand. David, the social parasite who claimed credit for the 𝐾(𝜋,1) conjecture, was the same person as the David who crafted Definitions 2.4 and 9.3.
Solving a big conjecture was a cryptographic proof that you had come up with a genuine conceptual innovation.
I am using the past tense because this is no longer the case. There is a structural vulnerability in the honor code of mathematicians and AI has started exploiting it in a systematic manner.
The trigger for this post was a speech by Geoff Hinton, which caught me off guard:
I agree with Demis Hassabis, the leader of DeepMind, who for many years has said AI is going to be very important for making scientific progress…
There’s one area in which that’s particularly easy, which is mathematics, because mathematics is a closed system…
I think AI will get much better at mathematics than people, maybe in the next 10 years or so. And within mathematics, it’s much like things like Go or Chess that are closed systems with rules...
I was used to the general public being profoundly wrong about the nature of mathematics. But I wasn’t prepared for a Turing awardee and Nobel prize winner comparing it with Go and Chess.
I wrote a short response on X and tried to move on, but it kept troubling me.
Then it all clicked into place. About a year ago, I had been approached by a young mathematician friend who had done his PhD in my domain. He was thinking about launching an “AI for pure math” business and I mentored him for a while.
Like him—and like Hinton and Hassabis—I was fully convinced that AI was about to transform mathematics and science in general. But I was unsure about the business model and minimum viable product.
Mathematicians may look like Luddites, but they rarely are. They love pen and paper, blackboard and chalk, but they jumped on Donald Knuth’s typesetting revolution. A century ago, they chose to rebuild their entire knowledge stack on a new operating system, set theory, that promised massive gains in reliability and scalability. A few decades later, they recognized that there was no real difference between a mathematical proof and a calculation, and set out to build the first computers. Deep learning, with its heavy use of linear algebra and stochastic gradient descent, is a brainchild of mathematics.
In the 1970s, when Kenneth Appel and Wolfgang Haken built a computer-assisted proof of the four-color theorem, this opened an intense debate on the epistemic nature of such proofs and their admissibility in peer-reviewed journals. Although, to be honest, there never was much suspense—the barbarians won, because there were barbarians on both sides.
Computers had always been part of my mathematical life4 and the promise of AI and autoformalization had long felt irresistible to me. This is what got me excited when my friend reached out and asked for my advice.
I started looking at the “AI for math” space and couldn’t understand what was going on. Why were these startups raising so much money? Pure mathematics is such a tiny market. The investments felt disproportionate.
My preferred strategy, the one I would have pursued, was to create the Wolfram Research of the AI age. Mathematics-enabled science and technology is a much larger market than pure mathematics and, as Wolfram demonstrated, there is room for simplifying and productizing the experience of interacting with mathematical objects. The users love the product and it is sticky.
But my friend insisted he wanted to do something specifically about pure math.
I didn’t know what to say, because I was stuck. The only useful products I could think of were literature spotters and interactive proof assistants—hard to package, hard to price, and even harder to sell. I could see a long term business strategy, but it was one I wouldn’t touch with a ten-foot pole—becoming the Elsevier of the AI age, the most hated brand in science, an arm-twisting extractive monopoly that repackages the mathematical commons into a mandatory experience.
There was a third strategy, though. But it was risky and, like the previous one, it did require a certain degree of cynicism. I’d call that plan the luxury acquihire: 1/ build a useless product that is striking enough, 2/ give the impression that you have solved a major scientific problem, 3/ pray for a quick M&A by a tech giant or a major AI lab.
Still, the numbers didn’t add up. The “AI for math” startups were rumored to be raising hundreds of millions. There must have been a smarter investment thesis, which I was failing to comprehend.5
Then I heard that Google was leading a massive effort to solve the existence and smoothness of Navier–Stokes equations. I thought OK, I get it, that’s a Millennium Prize problem. But, wait, that still doesn’t make sense—the payout is one million dollars, peanuts. As one great mathematician remarked to me, Google likely mobilized more brainpower on this single effort than the entire community ever did.
It only started to make sense after I heard Hinton’s speech.
If mathematics really was a closed system—or if this is what all the stakeholders around the table are willing to believe—then the investment pitch becomes trivial: “DeepMind solved Go and Chess, we’re going to solve mathematics!”
At a time when the leading AI labs are betting trillions that humans are soon to become obsolete, the promise of “solving mathematics”, the crown jewel, the pride of the human race, is simply irresistible.
On February 5th, a team of eleven high-profile mathematicians (including Martin Hairer, the 2014 Fields medallist) announced the First Proof project and released a first batch of ten “research-level math questions”:
This manuscript represents our preliminary efforts to come up with an objective and realistic methodology for assessing the capabilities of AI systems to autonomously solve research-level math questions. After letting these ideas ferment in the community, we hope to be able to produce a more structured benchmark in a few months.
One of our primary goals is to develop a sophisticated understanding of the role that AI tools could play in the workflow of professional mathematicians. While commercial AI systems are undoubtedly already at a level where they are useful tools for mathematicians, it is not yet clear where AI systems stand at solving research level math questions on their own, without an expert in the loop.
From a purely scientific perspective, there is nothing to complain about. These are incredibly smart people, engaging a real-world controversy with an open-minded attitude and a creative approach.
The First Proof team was doing everything right—and this is what terrified me.
But before I explain, I must reiterate that I have a very high opinion of this project. The team represents the mathematical community at its best, people driven by curiosity and integrity, willing to experiment outside of their comfort zone, and they did come up with genuinely good ideas.
Daniel Litt wrote an excellent essay, Mathematics in the library of Babel, on the First Proof project and his own first-hand assessment of the AI-for-math situation. His perspective is that of a radical non-Luddite, a pure mathematician who has engaged with LLMs for years and is genuinely impressed by the recent progress.
He was surprised by how many of the First Proof open problems ended up being solved by the teams at Google, OpenAI, and others. By his own count:
It seems likely that somewhere between 6 and 8 [of the 10] problems were solved correctly if one combines all attempts.
There are serious caveats, though:
The models (and the humans supervising them) generated an enormous amount of garbage, including some incorrect solutions claiming to be formalized in Lean. Even the best models/scaffolds seem not to be able to reliably detect when they are producing nonsense.
Even when AI-for-math systems operate autonomously, which few in the current crop actually do, they still require humans to intervene upstream and downstream, if only to assess the results and filter out the junk. This isn’t anecdotal. The AI labs are investing so much human intelligence into these projects, much more than any real-life mathematician can ever mobilize, that the delineation is never entirely clear. The most damning illustration is this:
It was not clear to OpenAI which of their solutions to First Proof are correct.
In other words: without the pro bono effort of the good old academic community, they might have never known. (Litt was himself a contributor.)
In truth, this also applies to human-generated proofs. But there is a fundamental nuance. In the human way of doing math, theorem-proving and concept-building walk hand in hand, which forces proofs to be intelligible (if only to their authors).
This is where the metaphysics comes back to bite. If mathematics was just a formal game of meaningless symbols, intelligibility would be a vacuous notion. The reality of mathematical practice forcefully points in the opposite direction. As it happens, published research is full of bugs, but these bugs tend to be contained and fixable, precisely because human-generated proofs are meaningful and (almost always) directionally correct—two notions that are impossible to reconcile with the formalist worldview.
This isn’t a cosmetic nuance. This is a mandatory condition for mathematics to exist as a sustainable endeavor.
As noted by Litt, the essential characteristic of being “truth-seeking” was missing from the AI solutions to the First Proof questions:
Many correct solutions are very poorly written, and their correctness is exceedingly difficult to check because of this... In [human solutions] the main ideas, obstructions to be overcome, etc., are usually identifiable; in [AI solutions] they are often completely unclear. And in the course of writing their solutions, the human authors often develop useful new objects, terminology, etc. to capture what they’re doing, while the models usually just plow ahead.
This is where the second leg of the math-for-AI revolution comes into play: autoformalization, the capability to transform the mildly informal style of human-produced proofs—and the outputs of LLMs trained on them—into bulletproof, machine-verifiable logical derivations, expressed in specialized languages such as Lean.
On paper, this can address the issue of correctness and remove the need for human validation. But, as noted by Litt, “there was only one credible formalized solution to any problem, impressively and manually orchestrated by Tom de Groot.”
This is of course an evolving situation. The stakes are high and the investment massive. LLMs, scaffolding, and autoformalization, are all making steady progress.
What would happen if, a year from now, the First Proof team released another set of 10 problems of equivalent difficulty? Litt doesn’t answer this specific question, but he expects AI to autonomously produce results “at a level comparable to that of the best few papers” within the next few years.
I share his sentiment. In my view, the likely outcome would be that the leading AI labs would score a perfect 10, coming up with fully automated & fully correct solutions to all problems.
Does that mean that AI will have “solved mathematics” by early 2027?
Of course not. There are three additional caveats that I am sure all insiders will have already spotted, but non-specialist readers may have missed.
The first one is straightforward—the First Proof “research-level questions” were neither profound nor difficult. They were closer to technical lemmas, well-calibrated intermediate subproblems that occur in the course of proving a theorem and are typically handled in a few paragraphs or pages. There is an ocean between a technical lemma and a serious paper, another ocean between a serious paper and a breakthrough, another ocean between a breakthrough and a Fields-medal-level contribution, and yet again several oceans above that.
First Proof is now working on a second batch, which is likely to include harder problems.
From a technical perspective, they did pick the right level for their first batch. This only appeared in hindsight: the final score was above 0/10 and below 10/10, in the sweet spot for their stated goal of finding an “objective and realistic methodology for assessing the capabilities of [current] AI systems”.
Yet I think that releasing a benchmark that didn’t include serious problems was extremely dangerous. The general public doesn’t read the fine print, and may not even know what a lemma is. If Google or OpenAI had scored a perfect 10, the headline would have read: “GAME OVER: The world’s top mathematicians challenged an AI with 10 research-level problems; the AI nuked each and every one of them.”
Similar distortions are being made every single day on social media, feeding the very confusion that First Proof was trying to dispel.
The second caveat is much more profound. It is also extremely subtle and hard to communicate to the general public, and this combination is creating a tricky situation for the mathematical community.
The problem with unintelligible proofs goes way beyond correctness, and cannot be resolved by autoformalization alone: even if correct, unintelligible proofs aren’t accretive to the mathematical corpus.
I know this sounds barbaric. Let me explain with an example. A few weeks ago, Math Inc, one of the best-funded AI-for-math startups, produced a Lean formalization of Maryna Viazovska’s spectacular work on the sphere packing problem in dimensions 8 and 24, results which earned her Fields medal in 2022. That was impressive in its own right: never before had theorems of this caliber been autoformalized. Yet this clear success was met by a massive pushback from the “formal mathematics” community, the very people who lead the effort to port “human mathematics” into machine-verified code.
Luddites?
Well, it’s more complicated, as it emerges from their actual conversation. The thing is that autoformalization isn’t a full solution to the problem of formalizing mathematics, just like Tesla’s Full Self-Driving isn’t a full solution to the problem of driving cars.
Yes, I know, that sounds counterintuitive. This is why outsiders are likely to miss the nuance and label the pushback as Luddism.
A clear explanation can be found in Alex Kontorovich’s account of his own learning curve with formalized mathematics. In a nutshell: Mathlib, the dominant Lean library, is a human-curated formalization of an ever-growing fraction of existing human mathematics. It exposes clean APIs and abstractions, without which no autoformalization could take place. By contrast, Math Inc’s autoformalized proof of Viazovska’s results exposes no intelligible interface. Who in their right mind would merge a 200,000-line unaudited vibe-coded blob into the master branch of global human science?
Kontorovich has a great expression for what is missing—canonization:
By canonization, I mean the process of taking a local, one-off formalization and turning it into library mathematics: general, reusable, coherent, efficient, and compatible with the rest… Canonization often changes the picture itself: the definitions, the abstractions, the API, and sometimes even the statement… This is extremely difficult and time-consuming.
But this, again, might be mistaken for Luddism. “General”, “reusable”, “coherent”, “efficient”… Aren’t these reactionary arguments made up by dumb humans with narrow context-windows? Why would an AGI care about that? Isn’t autoformalization, by design, the definitive solution of all problems with vibe-coding?
Doesn’t proven code meet the highest conceivable standards for software? Why are you constantly shifting the goalposts?
The software analogy helps understand the core issue. For normal software, quality is first and foremost a pragmatic notion—good code should compile, run smoothly in production and satisfy user needs. But Lean code never runs anywhere. It is just a library that sits in a repository and might be imported in the future, who knows when, in the process of formalizing another theorem.
You cannot “ship and let the users do the QA for you” because your only user is the future of mathematics. This is why canonization, from Euclid to Zermelo-Fraenkel to Bourbaki, has always been a core concern of the formalist school.6
We’re back to the metaphysics. The problem with unintelligible mathematics isn’t that it might be false. It is that it is literally meaningless, in the sense that it doesn’t compile on the only hardware that is currently able to make sense of it and appreciate its value—the human brain.
This, of course, could change in the not-so-distant future. There is nothing magical about the human brain. Artificial sense-making and truth-seeking architectures will certainly emerge, and at some point they will probably surpass humans on all aspects of mathematical creativity, including the canonization of formal proofs (which, from an algorithmic complexity perspective, is a mandatory requirement for all forms of mathematics, whether human or superhuman).
Yet this is absolutely not our present—nor a straightforward continuation of current trends—and I don’t think the conditions are met for a fruitful debate. In any case, if humans are still around, they will still feel the need to understand the world, and this will continue to drive them to engage with mathematics.
No one knows the timeline to AGI. Meanwhile, the autocanonization capabilities of frontier models barely exceed zero and Math Inc’s autoformalized proof of Viazovska’s theorems sits as an orphaned 200,000-line blob.
This is what made the Mathlib community so angry. They had been working on a multiyear project to formalize Viazovska’s work. Math Inc jumped in on this collective effort, leveraged prior insights, then abruptly went silent, until they made their spectacular announcement.
Is this necessarily bad news? Now that the brute-force autoformalization is done, why can’t the Mathlib community refocus on the value-creating canonization?
Because of Hardy’s curse and the honor code of mathematicians. Math Inc captured the prize (“first formalization of a Fields-medal-level theorem”) and there is no social reward left for cleaning up after them. Hence this comment by Patrick Massot, a non-Luddite expert in formalized mathematics:
I think the situation is pretty clear: AI companies, and especially Math Inc, will indeed thoroughly bomb this area to turn it into a giant radioactive wasteland that will never be able to sustain life again, so we will never get the benefits expected from formalization (improved understanding and accessibility). I strongly advise young people to contribute to less shiny projects that are less likely to be destroyed.
What makes the situation really tricky is that unintelligible formal proofs may hold significant residual value, even if they aren’t accretive to the canonized corpus.
And, to be honest, the issue existed well before AI, with the four-color theorem, with the classification of finite simple groups, or with Tom Hales’s monumental work on the Kepler conjecture (which led him to seek a formalized proof).
The likely outcome is that formalized mathematics will now develop in two separate layers, an intelligible layer embodied by Mathlib, and an unintelligible layer we might call Mathslop, a library of results that are known to be correct via proofs that no human has ever understood.
When the First Proof problems were released, my greatest fear was that the big AI labs were going to one-shot every single one of them, for the wrong reasons—because the answers might have been available somewhere in one shape or another.
Litt notes that 2 of the 10 problems had solutions7 in the existing literature (which, no surprise, most LLMs were able to leverage), while a “sketch of proof” was available for a third problem (which, nevertheless, no LLM was able to solve).
But what about the remaining 7 problems? Were they really open? In what sense?
My view is that it is impossible to say for sure, due to a structural feature of the mathematical corpus which, no doubt, is going to play a central role in the AI-for-math debate. In fact, I suspect that our legacy notions of “creativity” and “innovation” are ill-founded, and that AI is about to teach us brutal lessons about them.
Most mathematicians are intuitively aware of this structural feature, although its exact shape and size are impossible to chart. I haven’t seen any serious attempt to theorize it and the feature has no agreed-upon name—let me call it the Overhang.
I expect the Overhang to be absolutely gigantic.
I met it on several occasions in my career. Most encounters were fairly casual, except for one mystifying experience that took place as I was doing my best work, right after I came up with the core idea in my Definition 9.3. That was an entirely new concept, built on a stack of new ideas that I had just come up with, unlocking the solution to a classical conjecture—subjectively, that felt like a stroke of genius.
As I was “canonizing” my most creative idea ever, I made a shocking discovery. My proud invention, divided Garside categories, was essentially equivalent to a seemingly esoteric construction in an entirely different branch of mathematics, namely the Bökstedt-Hsiang-Madsen subdivision of Connes’s cyclic category in algebraic K-theory.
In other words, my one genius idea was déjà vu. Or was it? Two decades later, I continue to view this moment as the high point of my career. I don’t really care if my idea was or wasn’t “original”, as I doubt this is the most salient notion. What struck me was the sudden flash of meaning, which was otherworldly. (A clear cognitive sign of long-term value.)
In any case, a good LLM might have spotted the syntactic analogy in the preliminary phases of my work, when I still had no clear idea where it was leading. Then it could have scooped me by “front running.”
The existence of profound correspondences between seemingly unrelated topics is one of the most celebrated aspects of mathematics, a source of joy and marvel. It might be the purest expression of mathematical beauty. Descartes changed the world by noticing that algebra and geometry were essentially the same, and by building a bridge between them. Once you “invent” cartesian coordinates, many legacy problems become trivial—and interesting new problems instantly emerge.
But modern mathematics is so fabulously complex that most correspondences go unnoticed.
Sometimes connecting the dots leads to initial mystery rather than discovery. A striking example is John McKay’s famous remark that 196 883 + 1 = 196 884. The number 196 883 appeared in the study of the Monster, while the number 196 884 appeared in the study of modular forms, two areas of mathematics that were mutually alien. The conjectural vision that they should be related seemed so ludicrous that it was labelled the Moonshine theory, or the Monstrous Moonshine. Richard Borcherds received the 1998 Fields medal for proving it correct.
But sometimes connecting the dots is enough to solve a major problem. The last mile of proving a conjecture is often about realizing that the missing bit was already present somewhere in the literature.
The Overhang consists of the unrealized capital gains of past mathematical creativity, the latent value from connecting the dots in the existing corpus. It is a dividend of canonization. Mathematician X states problem A, mathematician Y crafts concept B, then mathematician Z notices that B trivially solves A and “captures” the social reward. But in the process of capturing the reward, Z usually introduces new concepts and new open problems, reinjecting latent value into the Overhang.
LLMs can be trained on the entirety of the mathematical corpus. Thanks to their phenomenal memorization and pattern-matching abilities (without always being able to map out their associative logic and attribute due credits), they are in a unique position to harvest the Overhang. By contrast, professional mathematicians have typically read a few hundred articles in their career, out of millions of existing references, less than 0.1% of the total.
This will lead to great discoveries, which is unambiguously exciting. But it could also lead to a sad new deal, where human slaves painfully curate the Overhang while AIs systematically beat them at the finish line.
We are very far from it, though, which in and of itself is disorienting. Litt adds this sharp remark:
The mystery is this: a human with these capabilities would, almost certainly, be proving amazing theorems constantly. Why haven’t we seen this from the models yet? What are they missing?
The answer seems quite obvious—current AI systems and humans process mathematics in entirely different ways. The best models are insanely stronger on certain aspects, which necessarily implies that humans are still insanely stronger on others.
AI is at the same time superhuman and subhuman, depending on how you look at it.
This is a compliment to both AIs and humans. This plurality of aspects makes the whole “benchmarking” enterprise extremely fragile, if not ludicrous. There is a real possibility that AI will achieve problem-solving supremacy long before it achieves concept-building adequacy. What are we going to call that? “Superintelligence”? Seriously?
This is my core issue with the First Proof approach. It is a meaningful benchmark for working mathematicians wondering what AI can do for them in the present, from within the constraints of an obsolete honor code, at a moment when they urgently need to break away from it.
Why don’t we construct benchmarks that are fairer to humans? Because we can’t. Because the true value of mathematics, the collective and individual elevation of our worldviews, is ill-defined and intangible. This intangibility was the raison d’être of the honor code.
For millennia, we had agreed to only benchmark human intelligence on its problem-solving facet, as we had found that it was the best objective proxy. Theorem proving is so inordinately difficult for our cognition that the only progress path was through patient concept building and neuroplastic internalization of these new concepts.
Yet this was only ever a proxy. The thing we really care about is different in kind. Industrial robots are far stronger than humans, yet we still go to the gym. Blenders have been outchewing us for over a century, yet we still don’t eat exclusively through straws.
I know the optics are horrible. Declaring that math is first and foremost about comprehension, an unbenchmarkable aspect, sounds like an all-too-convenient excuse. Yet this isn’t something I am pulling out of my hat at the last minute. Here is what Thurston wrote in 2011:
Mathematics is commonly thought to be the pursuit of universal truths, of patterns that are not anchored to any single fixed context. But on a deeper level the goal of mathematics is to develop enhanced ways for humans to see and think about the world. Mathematics is a transforming journey, and progress in it can be better measured by changes in how we think than by the external truths we discover.
I cited this passage in my book, written five years ago, before the AI storm started. I also cited a 1628 rant by Descartes against the Ancient Greek mathematicians who, “with a kind of pernicious cunning”, had “suppressed” the “true mathematics”(which he identified with the inner cognitive methodology) and only published “childish and pointless” stuff, “the fruits of their method, some barren truths proved by clever arguments, instead of teaching us the method itself.”8
Terry Tao made a similar comment in his recent interview with Dwarkesh Patel:
In math, the process is often more important than the problem itself. The problem is kind of a proxy for measuring progress.
He also made this remarkable prediction:
I think within a decade, a lot of things that mathematicians currently do—what we spend the bulk of our time doing and a lot of stuff we put in our papers today—can be done by AI. But we will find that that actually wasn’t the most important part of what we do.
The feedback on social media was overwhelmingly positive, as is often the case with Tao’s public interventions. Yet I did notice some unusual dissent, specifically on the notion that problem solving in itself isn’t all that valuable.
To the AGI prophets, this passed as a refusal to see the writing on the wall. There were some actual sneers.
On March 3rd, Jeremy Avigad, the first director of the newly-created Institute for Computer-Aided Reasoning in Mathematics, posted a five-page essay on Mathematicians in the Age of AI.
It contains a detailed account of the Math Inc vs Mathlib incident, of which Avigad was a prime witness. He too is a radical non-Luddite, ideally positioned to observe the rapidly changing situation:
We have to face up to the fact that AI will soon be able to prove theorems better than we do…
We need to remember our strengths: mathematicians are problem solvers and theory builders extraordinaire. Rather than fight the use of AI in mathematics, we should own it. It is not enough to keep up with current events and design benchmarks for AI researchers; we need to play an active role in deploying the technology and molding it to our purposes.
Regarding the First Proof initiative, Avigad admits having been “surprised” that Google could solve “six out of the ten problems.”
As written above, I would have put it differently—I felt more “relieved” than “surprised.” But I was “surprised” too, in the sense that I had no intuition of how difficult the problems really were and no clear anticipation of how many would be solved.
What I did find disconcerting was Avigad’s congratulatory comment on the interest sparked by First Proof:
A few AI developers took the bait
I am not sure if this is irony or naivety. To people familiar with the tech ecosystem, it was pretty clear that the First Proof problem set would trigger a five-alarm scramble at all the leading AI labs. Who could miss out on an opportunity like that?
Here is how Greg Brockman, the president of OpenAI, commented on it:
“First Proof” (http://firstproof.org) is a very cool concept and approach. congrats and thank you to the team who put it together!
The timeline is remarkable. Brockman’s comment came on February 14th, as OpenAI was on a roadshow to secure the largest private technology fundraise in history. Even more remarkable, Brockman wasn’t the only member of OpenAI’s leadership team to get “distracted” by geeky side-projects in the midst of these incredibly tense weeks. A few days before, Sam Altman had found the time to do a bit of coding on his own and express his sadness at realizing that Codex had rendered him “useless.”
But was this ever a distraction? You could see this as a massive fish taking the bait, or a fox thanking the hens for handing the keys to the henhouse.
To be clear, I don’t think OpenAI—or any of the other AI labs—was doing anything evil. The First Proof team hadn’t left them any choice. Once the problems had been released, they had to compete, and they had to go all in, because they couldn’t afford to be outclassed.
In fact, all this is normal. The AI labs have genuinely exciting tech, and theorem-proving is a genuinely exciting use case.
The only problem is what we are all tempted to read into it.
This is what made Hinton’s speech so troubling. He presented mathematics as a low-hanging fruit, a “particularly easy” terrain for AI, because it is “a closed system with rules,” “like Go and Chess.” Yet Go and Chess aren’t ordinary closed systems. They are actual games with winners and losers. Even more so, they are finite two-person games with perfect information and, by a 1913 theorem of Zermelo, they admit optimal strategies.9
But if AI ever “wins” at mathematics, then someone else will have to be the “loser,” and the only other player in the room is human thought.
Who cares if mathematics isn’t an adversarial game—and we are still figuring out the rules. When the news is this sensational, it becomes fit to print.
Here is what the nuclear scenario looks like. A year from now, an AI lab assembles a secret A-team, throws a few billion dollars of compute at the Riemann hypothesis, and comes up with a 2,000,000-line Lean proof. What are mathematicians going to say?
That the proof is unintelligible? Right, but who cares? Isn’t the whole of math already unintelligible to most people?
That the theorem was still proven by the humans in the loop? Yes and no—and in that specific case I would probably vote no.
That a few billion dollars represents decades of the entire mathematical sciences budget at the National Science Foundation? Correct, so what? And what if the conjecture happens to be much easier, accessible with only nominal resources?
That the conjecture itself was still produced by a human brain, along with the entire mathematical stack underneath, starting from the very language in which the conjecture was written, the concepts of infinite sums, complex numbers, meromorphic functions and analytic continuation, the interpretation of the conjecture as a profound statement about the distribution of prime numbers, and the prophecy that it should be true? And that, at this point, humans are the only beings who care about it?
This objection is profound and valid. But when you look like a sore loser, nobody cares what you say.
I don’t expect this exact scenario to unfold. But I do expect to see, in the near future, one or more claims that AI has “solved math”, or “achieved mathematical supremacy.” And, given the current state of the conversation, I expect the actual evidence supporting these claims to be framed and/or understood as a defeat for human intelligence.
If Geoff Hinton doesn’t get it, then 99.9999% of people won’t get it either.
Whatever they will feel the urge to say when the disaster happens, mathematicians should start saying now. It is probably too late to entirely dodge the bullet, but at least they have a little time to anticipate and coordinate.
The clearest sign of urgency is the growing defiance of younger mathematicians, many of whom are joining the AI labs. Full disclosure: I understand their choice, and would likely do the same if I were in their shoes.
Meanwhile, non-tenured academics are showing signs of despair. Ashvin Swaminathan, a postdoctoral fellow at Harvard, commented that Litt’s essay, which he agrees with, left him “with a lot of questions about the longevity of research mathematics as a profession outside of the most elite institutions.”
Swaminathan’s thread is a great read that touches on an essential aspect of mathematics—teaching:
One might ask whether research mathematicians can support their work through teaching. But that ignores the dramatic impact that AI is having on teaching. A university degree may very soon be worthless, given how many students use AI to do their work for them. There’s a reason why the number of tutoring requests I’ve received, and the number of students who attend office hours, has decreased dramatically. There’s a reason why students do incredibly well on homeworks and far worse on exams. Many kids are graduating with no real skills.
Let me emphasize that this is the same exact issue as in research. Problem solving is just a means to an end. The core payout of pure mathematics is the neuroplastic elevation of our worldview, and we can’t get there by copy-pasting the result.
The false notion that mathematics is only about proving stuff (or finding the right results) is toxic at all levels, starting in primary school.
In particular, the frequently heard claim that “AI can already perform PhD-level research”—if and when it will be correct—isn’t as compelling as it may seem. PhD students are students. Reasonable advisors direct them to problems with accessible strategies. In fact, most published research sits comfortably within the Overhang, where it serves a dual purpose, part research in the true sense, and part cognitive maintenance of the research community. That is not a dig at my former colleagues, but a testament to the genuine difficulty of frontier mathematics, which only exists as the spearpoint of a collective endeavor focused on intelligibility.
If theorem-proving remains their only official currency, all research mathematicians run the risk of becoming demonetized in the course of the next few years.
Is there anything the mathematical community can do?
I don’t see any miracle solution, but there are obvious approaches to limiting the damage. The first one is to finally come clean about the nature of mathematics.
All the active mathematicians I have cited, from Tao to Avigad, from Litt to Kontorovich, are careful to note that there is something in mathematics that goes beyond theorem proving. Yet this is usually framed as a passing remark, not a major shift in the narrative.
Well, this is a major shift in the narrative, and the public will never take it seriously if it isn’t properly explained. Right now, it looks like an excuse.
You can’t reposition a plurimillennial brand without being explicit about what you are doing, why you are doing it, and which additional value it will bring to everyone. It is necessarily a big deal, necessarily difficult, and the time-window to get it right is extremely narrow.
This might be the largest product recall in memory—we sold a bogus idea of math to billions of people, and the bill is coming due.
The only way is to lead with the core message and hammer it over and over again. Let me do my share by citing another Thurston quote (also picked by Avigad as the epigraph to his recent essay on Mathematical Understanding):
We are not trying to meet some abstract production quota of definitions, theorems and proofs. The measure of our success is whether what we do enables people to understand and think more clearly and effectively about mathematics.
Long term, that could force a complete rethinking of the ethos and evaluation criteria of the profession. Mathematicians will have to revoke the honor code and the second law of the Intuition Club—“if you really want to talk about intuition, make it sound casual and accessory, because we ain’t the psychology department.” If Thurston is correct, then we are the psychology department.
All that is easier said than done. The mathematical community isn’t a structured organization that can convene an emergency offsite and put together a crisis response.
It doesn’t even share a unified view of mathematics. Some genuinely think the whole point is to “solve previously unsolved problems”, while others object to any serious discussion of intuition. I disagree with both of these views, but my opinion is nothing but that—an opinion.
The absence of a clear definition and mission statement keeps the community on the back foot.
Christian Szegedy, a researcher who has been a co-founder of xAI and Math Inc, made the prediction that a “super-human mathematician AI” would emerge by June 2026. This is the kind of declaration that is either empty (electromechanical calculators have been superhuman for a century) or maximalist in the worst possible manner. Not surprisingly, Szegedy also wrote that he found Terry Tao “out of touch with the reality that was coming up quickly.”
The problem isn’t with the prediction itself (Szegedy has real technical legitimacy and the right to his opinions, however grandiose) but with the general climate that makes it very difficult for a response to be heard.
One simple idea I keep returning to is that we need something like a Mathematical Intelligence Scale, or Mathematical Automation Scale, like there is for self-driving cars.
That would address the pressing danger of objective benchmarks that, because of their very nature, fail to account for intelligibility and concept-building. When AI crushes the next iteration of First Proof, the press release should read: “AI solves First Proof and achieves level 3 (out of 10) on the Mathematical Intelligence Scale”.
Mathematicians shouldn’t be afraid of subjectivity. It is not about cheating. It is about being fair and not committing collective seppuku. In fact, any act of problem-setting is already subjective—there is nothing objective about the First Proof problems, Erdős problems, or Millennium Prize problems, apart from the fact that (in theory) we can objectively assess whether any of them has been solved.
This quality of being objectively verifiable is missing from the more profound aspects of mathematical creativity, such as problem-setting itself, canonization and concept-building, reframing, simplifying, conjecturing, up to major theory-building and the creation of entirely new branches of mathematics—yet that doesn’t mean these activities do not exist and aren’t central to mathematics.
There is even an objective dimension to mathematical subjectivity. When Descartes connected algebra to geometry, when Newton and Leibniz invented calculus, it was pretty clear to their contemporaries that something major had just taken place. In a sense, our entire technostructure, including AI startups, lives on the dividends of those inventions.
An ideal scale would be open-ended—or have seemingly inaccessible upper rungs—since math isn’t something to be “solved.”
Mathematics is known to be inexhaustible and infinitely hard. This implies that there is room for infinite progress. The one urgency is to make it clear to everyone that it isn’t a trophy that is up for grabs.
Sadly, a lot of the damage is already baked in. Even with perfect execution, it is impossible to conduct an epistemic revolution under the pressure of an external timeline. This one will have to take place in a broader context that is extremely unfavorable to core science, where every nuance is instantly dismissed as gatekeeping.
These will be difficult years for mathematicians. Students will desert. Money will dry up. Postdocs will ask pressing questions and senior faculty will have to come up with answers. At parties, people will start giving them weird looks, as if they had become living fossils.
Yet, at some point, the pressure will alleviate.
The industry will have to normalize and the AI-for-math startups will have to find their business models. Which, in all likelihood, won’t be about solving Erdős problems.
To me, this might be the most striking aspect of the current moment. For all the flex and fury, the AI labs are barely engaging with mathematics itself, its core problems and significance. Which is entirely normal, since this is none of their business.
When the dust settles, if it ever does, we will find out that human mathematics has survived, transformed.
I won’t make any prediction on the timeline, because I have no idea. I don’t know which capabilities will be achieved when, which breakthroughs they will enable, and which limitations will appear in hindsight.
What I can say is that the change is already taking place. This really is a scientific revolution and the research community has started actively engaging with it. As I was writing this article, I saw remarkable developments every single day in my timeline. My initial plan was to cite the most striking ones, but there are just too many and I am overwhelmed.
A pessimist would argue that this is a fear-induced response and a symptom of the disruption. We are seeing the same pattern across all industries. When capital gets reallocated so fast, on a scale so massive, there may be no rational way to prepare for the future.
It is not clear to me how I would approach this if I hadn’t quit active research. The low-hanging fruit would be to vibe-code a formalization of my yet-unwritten cellular decomposition theorem. Then I could test a few wild ideas and conjectures. But all this is old-world thinking, and I suspect that even the collaboration modes and deliverables of mathematical research will have to change.
My view, which is hardly original, is that it will become unthinkable to do math without AI assistance, just like it has become unthinkable to do math without set theory and LaTeX.
But this says nothing about the second-order effects. I tried to imagine them as a forecasting exercise, and came up with a dozen or so ideas, not all profound and not all credible. Here are three that I find interesting:
1. A clearer cut between pure and applied mathematics. While both deserve to be treated as first-rate citizens, I was never happy with the traditional way of drawing the line. Algebra is presumed to be pure because it is abstract, yet it can be immensely practical. On the other hand, “applied” domains tend to have their fair share of theoretical offshoots.
My take is that pure and applied should be regarded as different sweet spots in the intelligibility-vs-applicability trade-off. It is entirely rational to invest 10 billion dollars of compute in the equations of plasma containment, if it fast-tracks the design of grid-scale fusion reactors, even if intelligibility has to wait for another century or two. I wouldn’t say the same about the Riemann hypothesis.
2. The rise of intuition-maxxers. To me, that would be the clearest sign of a new Golden Age. We are not seeing them right now because the few people with the rare combination of math and agentic skills are poached by the AI labs. But this is a market anomaly that is likely to correct in the near future (the combination is becoming less rare, and the AI labs can only absorb so many of them).
I suspect AI will enable the same transformation that samplers and sequencers enabled in music. The technical barrier to grasp frontier concepts will drop, and brave young mathematicians will deploy unorthodox tactics to survey entire new continents at a pace that will mystify their elders.
They will see further than anyone before, standing on the shoulders of giant machines.
3. A renewed interest in the philosophy and neuroscience of mathematics. In his 1979 essay, Reuben Hersh noted that Platonism and formalism were both untenable, and added this comment:
The problems of truth and meaning are not technical issues in some recondite branch of logic or set theory. They confront anyone who uses or teaches mathematics. If we wish, we can ignore them... It would be surprising if this had no practical consequences.
AI is confronting us with a practical consequence that no one10 saw coming, and that is already impacting the cognitive well-being of the general population. If we fail to articulate the neuroplastic value of intelligibility, there will be no justification whatsoever for teaching mathematics, given that the solutions to all the mathematical problems encountered by 99.99999% of the population over the course of their lifetime will be available at their fingertips.
People are vaguely aware that this would result in idiocracy, but they can’t really explain why, and we have yet to find teaching and evaluation modalities that are adapted to the new reality.
What concerns me is the false belief that there is a “normal” level of mathematical competency, as if certain concepts were any more “real” or “concrete” than the others, and advanced mathematics was gratuitously abstract. This creates the illusion that the current (and wildly unsatisfactory) abilities of the general public somehow reflect a “natural” state—when they are in fact the result of massive (yet frustratingly inefficient) pedagogical investment—and represent some solid ground. The truth is that there really is no bottom, and nothing prevents humans from functioning in a culture where there is no vocabulary for numbers above 5 (as it is the case for some hunter-gatherer tribes).
Some prophesy that mathematics will eventually resemble Chess, a sport that a few eccentrics practice with passion and the general public can safely ignore.
This is a shortsighted view that fails to account for the meaning-making and truth-seeking dimensions of mathematics, as well as the massive cognitive benefits of mathematical training. The world makes a lot more sense once you have an intuitive understanding of numbers, sets, statistics, linear algebra, calculus, differential geometry, machine learning, combinatorics, dynamical systems, model theory, group theory, singularity theory, homotopy theory, lambda calculus, complexity, category theory,… (there is no reason to stop at any given place, even though it is probably unwise to order the full menu)
The situation isn’t unique to mathematics. AI adoption creates the same exact problems in many other domains. But mathematics has a unique status and a unique aura, owing to its ancient companionship with the very notion of intelligence.
This is why the “AI for math” situation is so significant, and why its proper resolution matters so much for the future of our species.
Many people hate math, and they tend to put AI in the same bag. They view anything with mathematics or computers in it as a threat to their humanity and subjectivity, their raw experience of being alive, immersed in this world, trying to make intuitive sense of it.
They might be surprised—and thankful—to find out that mathematicians are actually on their side, fighting the good fight. As Carl Jacobi once put it, “the object of mathematics is the honor of the human spirit.”
The result is stated in the last sentence of the last slide of my presentation:
The link to the Unabomber’s Wikipedia page is Thurston’s, not mine (there is a backstory connecting the two of them, detailed in Chapter 17 of my book).
The Scholze quote is from a blog post by Michael Harris. The broader passage is even more striking:
What I care most about are definitions. For one thing, humans describe mathematics through language, and, as always, we need sharp words in order to articulate our ideas clearly. (For example, for a long time, I had some idea of the concept of diamonds. But only when I came up with a good name could I really start to think about it, let alone communicate it to others. Finding the name took several months (or even a year?). Then it took another two or three years to finally write down the correct definition (among many close variants). The essential difficulty in writing “Etale cohomology of diamonds” was (by far) not giving the proofs, but finding the definitions.) But even beyond mere language, we perceive mathematical nature through the lenses given by definitions, and it is critical that the definitions put the essential points into focus.
Unfortunately, it is impossible to find the right definitions by pure thought; one needs to detect the correct problems where progress will require the isolation of a new key concept.
My friend ended up opting for a non-profit business model, for which I had no added value as a mentor.
To be more precise, in Litt’s own words: “problem 10 had an answer that was more or less directly available in the literature, and problem 9 was (in my view) a pretty minor variant of previous work of one of the authors.”
In fact, neither Go nor Chess has actually been “solved” by AI—the optimal solutions guaranteed to exist by Zermelo’s theorem remain computationally out of reach—but AI did secure definite “wins” against humans.
I put myself in that list: I didn’t see it coming.
No posts