- Even for billion-parameter theories, a small number of vectors might dominate the behaviour. A coordinate-shift approach (PCA) might surface new concepts that enable us to model that phenomenon; a rough sketch follows below. "A change in perspective is worth 80 IQ points", as Alan Kay said.
- There is an analogue in how we come up with cognitive metaphors of the mind ("our models of the mind resemble our latest technology: abacus, mechanisms, computer, neural network"), to be applied to other complicated areas of reality.
GPT nano vs. GPT-5, for example.
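A minimal sketch of the dominant-directions idea from the first bullet above, assuming you have already pulled a matrix of activation vectors out of some model (the array here is random stand-in data and the shapes are arbitrary):

```python
# Sketch: do a handful of directions dominate the behaviour of a big model?
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical (n_samples, n_dims) matrix of activation vectors; random
# stand-in data here, in practice pulled from a trained model.
vectors = np.random.randn(10_000, 512)

pca = PCA(n_components=10)
pca.fit(vectors)

# If the first few components explain most of the variance, the coordinate
# shift really has surfaced a small set of dominant "concept" directions.
print(pca.explained_variance_ratio_.cumsum())
```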
What we can do is approximate. Newton had a good approximation of gravitation some time ago (force equals a constant times the two masses divided by the distance squared; super readable indeed). But nowadays there's a better one that doesn't look like Newton's theory (Einstein's field equations, which are compact but nothing like Newton's). So, what if in a thousand years we have a still better approximation to gravity, but it's encoded in millions of variables? (Perhaps in the form of a neural network inside some futuristic AI model?)
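For concreteness, the two approximations the comment contrasts, in standard notation:

```latex
% Newton's law of universal gravitation vs. Einstein's field equations
F = G\,\frac{m_1 m_2}{r^2}
\qquad\text{vs.}\qquad
G_{\mu\nu} + \Lambda g_{\mu\nu} = \frac{8\pi G}{c^4}\,T_{\mu\nu}
```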
My point is: whatever we know about the universe now doesn't necessarily mean that we have "captured" the underlying essence of the universe. We approximate. Approximations are useful and handy and will move humanity forward, but let's not forget that "approximation != truth".
If we ever discover the underlying "truth" of the universe, we would look back and confidently say "Newton was wrong". But I don't think we will ever discover such a thing; so sure, approximations are our "truth", but sometimes people forget that.
They did not.
They showed that for certain problems one could not do much more than work out some invariants and scaling laws. Showing what is impossible is not failure.
As for the rest: modern gene-network models and a lot of biological modelling are based on their work, as are quite a few other things. That's also not failure.
I agree that modern AI is alchemy.
The admiration for "remarkable" things puts humanity on a dangerous path that is disconnected from the real goals of human progress as a species. You don't need any of this compression of knowledge or truths. Folklore tales about celestial bodies are fine and good enough. The vulgar pursuit of knowledge is paving the way for the extinction of humans as biological creatures.
There's a parallel in linguistics. Chomsky showed that all human languages share deep recursive structure. True, and essentially irrelevant to the language modeling that actually learned to do something with language.
...this is so absurdly and blatantly wrong that it's hard to move past. Has the author ever heard of programming languages?? Simplicity brings us closer to truth: Occam's razor has underpinned the development of our species for centuries. It's enterprise, empire, and capital that feed off of complexity.
We're entering a period of human history where engineers and businesspeople drive academic discourse, rather than scientists or philosophers. The result is intellectual chicken scratch like this article.
It strikes me that many of these complex systems have indeterminate boundaries, and a fair amount of distortion might be baked into the choice of training data. Poverty (to take an example from this post) probably has causes at economic, psychological, ecological, physiological, historical, and political levels of description (commenters please note I didn't think too hard about this list). What data we feed into our models, and how those data are understood as operationalizations of the qualitative phenomena we care about, might matter.
For example, global warming. It's nice to have AOGCMs that include everything down to the carbon sinks. But if you want to understand, a two-layer model of the atmosphere with CO2 and water-vapor feedback will do a decent job, and gives similar first-order predictions.
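Something like the textbook grey-atmosphere toy model, sketched below with standard constants; this is illustrative physics only, not any particular climate model, and it omits the feedbacks mentioned above:

```python
# Toy grey-atmosphere energy balance: each layer absorbs all longwave
# radiation and re-emits it up and down, so T_surface = T_eff * (n + 1)**0.25.
SIGMA = 5.67e-8   # Stefan-Boltzmann constant, W m^-2 K^-4
S0 = 1361.0       # solar constant, W m^-2
ALBEDO = 0.30     # planetary albedo

absorbed = S0 * (1 - ALBEDO) / 4      # globally averaged absorbed solar flux
T_eff = (absorbed / SIGMA) ** 0.25    # ~255 K emission temperature

for n_layers in (0, 1, 2):
    T_surface = T_eff * (n_layers + 1) ** 0.25
    print(f"{n_layers} absorbing layer(s): surface ~ {T_surface:.0f} K")
# ~255 K, ~303 K, ~335 K: the right first-order magnitude for a greenhouse
# effect, from a model that fits on a napkin.
```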
I also don't think poverty is a complex problem, but that's a minor point.
“No need to study the world around you and wonder about its rules, peasant - it’s far beyond your understanding! Only ~the gods~ computers can ever know the truth!”
I shudder to think about a future where people give up on working to understand complex systems because it’s hard and a machine can do it better, so why bother.
I've never understood why the idea of linguistic nativism is so upsetting to people.
I'm not sure it's a minor point. I don't think poverty is a "complex" problem either, as that term is used in the article, but that doesn't mean I think it fits into one of the other two categories in the article. I think it is in a fourth category that the article doesn't even consider.
For lack of a better term, I'll call that category "political". The key thing with this category of problems is that they are about fundamental conflicts of interest and values, and that's a different kind of problem from the kind the article talks about. We don't have poverty in the world because we lack accurate enough knowledge of how to create the wealth that brings people out of poverty. We have poverty in the world because there are people in positions of power all over the world who literally don't care about ending poverty, and who subvert attempts to do so--who make a living by stealing wealth instead of creating it, and don't care that that means making lots of other people poor.
Or a dinosaur that looks like it might:
I can write a program (call it a simulation of some artificial phenomenon) whose internal logic is arbitrarily complex. The result is irreducible: the entire byzantine program, with all of its convoluted logic, is the smallest possible theory describing the phenomenon, and yet it is not small by any reasonable definition.
When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
Also see Minsky's "Perceptrons"
The problem with almost all such proofs is that people (even those who know better) read them as "this can't be done" when in fact they tell you "it can't be done unless you break one of the following assumptions."
I agree that it's unfair to say they failed, but it's likewise unfair to say that their success was in telling us our limits rather than exploring what we need to do to get around the roadblocks.
Though I think it's fair to say that the torch was picked up and carried by others with a different set of strategies.
" There are 2 types of people using AI: Those who use it so they can know everything, and those who use it so they don't have to know anything. " :-
IMHO, a lot of the more specifically anti-nativist sentiments of today are based in linguistics itself rather than philosophy, CS, or CogSci, where again it is part of a broader (and much dumber) debate: whether linguistics is the empirical study of languages or the theoretical study of language itself. People get really nasty when they're told that they work in an offshoot field for some reason, which is why I blame them for the ever-too-common misunderstandings of Chomsky -- the most common being "Universal Grammar has been disproven because babies don't speak English in the womb".
If Chomsky weren't so obviously right, this would be a worrying development! Luckily I expect it to be little more than a footnote in history, so it's merely infuriating rather than depressing.
[1] Minsky, 1991: https://ojs.aaai.org/aimagazine/index.php/aimagazine/article...
For most of human history, the things we couldn't explain, we called mystical. The movement of stars, the trajectories of projectiles, the behavior of gases. Then, over the course of a few centuries, we pulled these phenomena into the domain of human inquiry. We called it science.
What's remarkable, in retrospect, is how terse those explanations turned out to be. F=ma. E=mc². PV=nRT.
The universe, or at least vast swaths of it, submitted to compression ratios that seem almost unreasonable. You could capture the behavior of every falling object on Earth in three variables and describe the relationship between matter and energy in five characters.
The deepest truths fit on a napkin.
They had to. When your tools are pencils, chalkboards, and human working memory, a theory has to be small or you can't use it. The decompression happens in a human brain in real time. So theories needed to be not just correct, but operable at human scale. A physicist scribbling equations on paper needs to be able to hold the model in her head while she works through implications.
And so we developed an implicit belief that good theories are small. If a theory was elegant, we learned to trust it. If you couldn't express it concisely, you probably didn't understand it well enough.
This worked extraordinarily well for a certain class of problems. Call them the complicated.
A complicated system is one with many parts that interact in structured ways, but that ultimately yields to decomposition. A jet engine is complicated, and so are orbital mechanics and the circuit board in your laptop. You can break these systems into components, study each one, and reassemble your understanding into a coherent picture. The picture might be intricate, but it is, in principle, completable.
The Enlightenment and its intellectual descendants gave us a powerful toolkit for taming the complicated. And then we made the natural mistake of assuming that toolkit would scale to everything.
Poverty is not complicated. It is complex.
So is climate change. So are drug addiction, mental health, immune response, urban decay, ecosystem collapse, and the behavior of financial markets.
These are systems where the interactions between dimensions are themselves dynamic. Feedback loops create emergent behavior that isn't derivable from studying the components in isolation. Interventions in one area produce non-obvious cascading effects in others. And in many cases, like markets or public health, studying the system can cause changes to the system itself through reflexivity.
We've known about this distinction for decades. The Santa Fe Institute, founded in 1984 by scientists who realized their own disciplines couldn't speak to each other about the problems that actually mattered, was built around precisely this insight.
Researchers there, working across physics, biology, economics, and computer science, identified recurring features of complex systems, from power law distributions and self-organized criticality to sensitivity to initial conditions and phase transitions. They created a vocabulary and a set of concepts that advanced our understanding.
But they also ran into a wall.
The concepts they developed were descriptive rather than prescriptive. Knowing that a system exhibits power law behavior tells you the shape of what will happen without telling you the specifics. You couldn't pick these principles up and use them to intervene in the world with precision.
There's a parallel in linguistics. Chomsky showed that all human languages share deep recursive structure. True, and essentially irrelevant to the language modeling that actually learned to do something with language. The universal principles were correct, but too general to be operable.
Complex systems remained resistant to science. But we tried anyway. Economics attempted to become the physics of human markets. We built elegant mathematical models with perfectly rational agents and perpetual equilibrium. The models were so mathematically pristine that physicists who encountered them marveled at the technique while questioning whether any of it described the actual world.
Pharmacology tried to treat the body as a complicated machine, targeting individual pathways with individual molecules. Sometimes it works brilliantly. Sometimes it works partially. And often it doesn't work at all, because the body is a web of interactions that doesn't respect the boundaries we draw around individual mechanisms.
The pattern repeated everywhere we applied Enlightenment tools to complex problems. Partial success, persistent failure, and the lingering sense that we were missing something fundamental.
There's an old pattern in science. Practice comes first.
Blacksmiths worked metal for millennia before metallurgy existed as a discipline. Medieval architects built Gothic cathedrals that still stand today without any formal understanding of structural engineering. Farmers selectively bred crops for thousands of years before anyone had heard of genetics.
In each case, practitioners developed reliable and useful capabilities without any theoretical understanding of the underlying mechanisms. And then, when theory finally caught up, it didn't just explain what practitioners were already doing. It blew the doors open. Metallurgy didn't just explain blacksmithing, it gave us titanium alloys and semiconductors. Structural engineering didn't just explain cathedrals, it gave us skyscrapers.
I think we're in an analogous moment with complexity.
The tools of modern AI, from deep neural networks to transformer architectures, let us build compressed models of complex systems that actually work. We can do things with them. But we are, in a meaningful sense, the blacksmiths. We make improvements through intuition and experiment. We know what works without fully understanding why.
The Santa Fe Institute spent the late 1980s building early prototypes of exactly these tools. Researchers there created artificial stock markets with adaptive agents that spontaneously produced bubbles and crashes. They built self-organizing networks and genetic algorithms. But the models remained too small to be operable, and the elegant law of self-organization they hoped to discover never materialized.
So why do today's models work when SFI's didn't?
Not because we found better equations. Because the theory these problems require is simply very large, and we finally have tools that can hold it.
Elegant equations might not exist for complex systems. The most compressed possible representation of how a complex system behaves might still be billions of parameters large. Larger than anything a human brain can hold in working memory. For as long as our only tool for operationalizing theories was the human mind armed with pencil and paper, these problems were simply beyond our reach.
They aren't anymore.
Take large language models. Fundamentally, a large language model is a compressed model of an extraordinarily complex system, the totality of human language use, which itself reflects human thought, culture, social dynamics, and reasoning. The compression ratio is enormous. The model is unimaginably smaller than the system it represents. That makes it a theory of that system in every sense that matters: a lossy but useful representation that lets you make predictions and run counterfactuals.
It's just not a theory that fits on a t-shirt.
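Roughly how enormous? A back-of-envelope sketch with made-up but plausible round numbers; the corpus size, bytes per token, and parameter count below are illustrative assumptions, not figures for any real model:

```python
# Back-of-envelope compression ratio, with deliberately round, assumed numbers.
corpus_tokens = 10e12      # assume a ~10-trillion-token training corpus
bytes_per_token = 4        # rough average for raw text
corpus_bytes = corpus_tokens * bytes_per_token     # ~40 TB of text

parameters = 100e9         # assume a ~100B-parameter model
bytes_per_param = 2        # 16-bit weights
model_bytes = parameters * bytes_per_param         # ~200 GB of weights

print(f"corpus ~{corpus_bytes / 1e12:.0f} TB, model ~{model_bytes / 1e9:.0f} GB, "
      f"ratio ~{corpus_bytes / model_bytes:.0f}x")
```

And the training corpus is itself only a sample of the system being modelled, so the real ratio is larger still.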
There's a reasonable objection to everything I've argued so far, and it comes from the physicist and philosopher David Deutsch. Deutsch holds that good explanations are compact and general, hard to vary without breaking. The more caveats and carve-outs a theory requires, the worse it smells. E=mc² has reach because it applies universally and you can't tinker with it. A lookup table of experimental results does not.
By this standard, a billion-parameter neural network doesn't look like a theory. It might give you useful predictions about a particular complex system, but it offers no portable understanding. You can't pick it up and carry it to a new problem.
Deutsch would look at "the model is the theory" and see capitulation.
This objection has force. But it rests on a conflation.
When we talk about a trained model, we're talking about the weights. Billions of numerical parameters encoding what the model learned from a specific dataset. Those weights are large and parochial.
But the architecture of the model, the structure that made learning possible in the first place, is something else entirely.
The architecture of a transformer can be described on a few sheets of paper. Attention mechanisms, feed-forward layers, residual connections, layer normalization. And this same compact structure, when trained on language, learns language. Trained on protein structures, it learns protein folding. Trained on weather patterns, it learns weather.
That's reach.
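To make "a few sheets of paper" concrete, here is a minimal sketch of one such block in PyTorch, loosely in the spirit of nanoGPT. The dimensions are placeholders, and the causal mask, embeddings, and output head are omitted for brevity:

```python
# Minimal transformer block: attention, feed-forward, residuals, layer norm.
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        # Pre-norm residual attention, then a pre-norm residual feed-forward.
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        x = x + self.mlp(self.ln2(x))
        return x
```

The whole "meta-layer" is a stack of blocks like this plus embeddings; everything domain-specific lives in the learned weights, not in this structure.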
So perhaps there are two layers of theory here. The system-specific layer, the trained weights, is large and particular to its domain. This will likely always be true. The theory of this economy or this climate will always be vast.
But the meta-layer, the minimal architecture that can learn to represent arbitrary complex systems, might be compact and universal. It might be exactly the kind of good explanation Deutsch would champion.
If that's right, the physics of complexity would look different from what anyone at the Santa Fe Institute expected. It would not be a law about how complex systems behave. It would be a description of what structure can learn them.
Andrej Karpathy's work on nanoGPT is, in a practical sense, a search for exactly this. The smallest possible implementation that can still be trained to model complex phenomena. Strip away everything that isn't load-bearing. What's left?
We haven't found it yet. The transformer might not be the final answer. But for the first time, we have candidate architectures that demonstrably work across wildly different domains of complexity.
The architecture might be compact, but the trained models remain vast and opaque. And there's a tempting conclusion to draw from this. We've built useful oracles, but oracles aren't science.
The emerging field of mechanistic interpretability suggests otherwise. Researchers are developing tools to understand how neural networks do what they do, from network ablation and selective activation to feature visualization and circuit tracing. These techniques let you study a trained model the way a biologist studies an organism, through careful experimentation and observation.
By studying how these models internally represent complex phenomena, we may extract more compressible truths about the phenomena themselves. If a neural network trained on climate data develops internal representations that cluster certain variables together in unexpected ways, that's a clue about the structure of the underlying system.
The model becomes not just a tool for prediction, but a specimen for study.
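As a sketch of the simplest of those techniques, ablation: assuming you already have a trained PyTorch model and some evaluation metric (both hypothetical names below), you silence one component at a time and measure how much the behaviour degrades.

```python
# Ablation sketch: zero out one module's output and measure the damage.
# `model`, `evaluate`, and the module name are hypothetical stand-ins for
# whatever trained network and probe metric you actually have.
import torch

def ablate_and_score(model, module_name, evaluate):
    module = dict(model.named_modules())[module_name]

    def zero_output(mod, inputs, output):
        # Assumes the module returns a single tensor.
        return torch.zeros_like(output)

    handle = module.register_forward_hook(zero_output)
    try:
        score = evaluate(model)   # e.g. loss or accuracy on a probe dataset
    finally:
        handle.remove()
    return score
```

Components whose ablation barely moves the score are unlikely to be load-bearing; components that move it a lot are where circuit tracing starts looking.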
In this light, mechanistic interpretability might be the actual emerging science of complexity. The method is different from anything in the Enlightenment toolkit. You don't start with first principles and derive equations. You train a model that captures the behavior of a complex system, and then you study the model to discover what structure it found.
The theory is extracted from the compression, rather than the compression being derived from the theory.
It's early, but the direction is promising.
If this framing is right, many of the hardest problems facing humanity, from chronic disease and addiction to poverty and climate, were never fundamentally intractable. They were just too complex for the only medium of theory we had.
And now we have a new medium.
The problems remain hard. Building a sufficiently rich model of a complex system is an enormous undertaking. And the epistemology shifts in ways that might be uncomfortable. Instead of "I understand the causal mechanism and can predict what happens if I change X," you get something more like "I have a sufficiently rich model that I can simulate what happens if I change X, with probabilistic confidence." The answers are distributions, not deterministic outputs. That's a different kind of knowing.
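As a sketch of what that kind of answer looks like, assuming some trained stochastic surrogate of the system (the model interface below is hypothetical): intervene on X, run many rollouts, and report a distribution rather than a point.

```python
# Counterfactual-by-simulation sketch. The surrogate model, its `simulate`
# method, and the intervention dict are hypothetical placeholders.
import numpy as np

def intervene(model, baseline_state, intervention, n_rollouts=1000):
    outcomes = []
    for seed in range(n_rollouts):
        state = {**baseline_state, **intervention}   # "change X" directly
        outcomes.append(model.simulate(state, seed=seed))
    outcomes = np.array(outcomes)
    # The answer is a distribution, not a single deterministic number.
    return {
        "mean": outcomes.mean(),
        "p05": np.percentile(outcomes, 5),
        "p95": np.percentile(outcomes, 95),
    }
```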
But it might be the kind of knowing these problems actually admit.
We spent centuries wishing complex systems would yield to terse, elegant theories. The models that capture any particular complex system will probably always be large. But the structure that can learn them all might yet prove to be small.
It's remarkable how much of reality turned out to be modelable by theories that fit in a few symbols. Perhaps it shouldn't be remarkable at all that not everything can be.