Meta and xAI just spent enormous sums proving a point Gary Marcus has been making for years: scaling alone isn't enough. Both companies ran large-scale experiments designed to push the boundaries of what's possible with massive compute and data. Both came up short.
In his latest piece, Marcus documents the specifics. These weren't small pilots. These were costly, high-profile bets on the scaling hypothesis, the idea that bigger models trained on more data will eventually solve AI's fundamental limitations.
The hypothesis didn't hold.
What Actually Happened
Marcus is careful with the details here. The failures aren't about models performing badly in absolute terms. They're about models failing to achieve the expected improvements despite massive resource investment. Performance curves that were supposed to climb steeply... plateaued.
This is the expensive evidence that matters. When you're operating at Meta or xAI's scale, plateaus aren't just disappointing; they're existential. If scaling doesn't deliver the promised gains, the entire strategic direction needs reconsideration.
For context, Marcus has long advocated for neurosymbolic approaches, systems that combine neural networks with symbolic reasoning. The argument: pure scaling hits diminishing returns because current architectures lack fundamental capabilities like logical reasoning, causal understanding, and robust generalisation.
The Meta and xAI results suggest he might be right.
Why This Matters Beyond AI Labs
If you're building with AI, this has practical implications. The scaling hypothesis influenced everything from investment decisions to product roadmaps. The assumption was: models will keep getting better, capabilities will emerge naturally, just wait for the next generation.
That assumption is now questionable, which means builders need different strategies. Instead of waiting for the next GPT-N to solve your problem, you might need hybrid approaches. Combine language models with structured reasoning. Use retrieval systems. Build in explicit verification steps.
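To make "explicit verification steps" concrete, here's a minimal Python sketch of the pattern: generate an answer, run it through a cheap deterministic check, and only accept it if the check passes. The `generate` callable, the `check_arithmetic` helper, and the retry loop are illustrative placeholders, not any particular vendor's API.

```python
import random
import re


def check_arithmetic(text: str) -> bool:
    """Return True only if every 'a + b = c' claim in the text is actually correct."""
    for a, b, c in re.findall(r"(\d+)\s*\+\s*(\d+)\s*=\s*(\d+)", text):
        if int(a) + int(b) != int(c):
            return False
    return True


def answer_with_verification(question: str, generate, max_attempts: int = 3) -> str:
    """Draft an answer with the model, then gate it behind an explicit check."""
    for _ in range(max_attempts):
        draft = generate(question)
        if check_arithmetic(draft):
            return draft  # the deterministic check passed
    return "No verified answer produced."  # fail closed instead of guessing


# Toy usage: a stubbed "model" that is occasionally off by one.
def fake_model(prompt: str) -> str:
    return f"17 + 25 = {42 + random.choice([0, 1])}"


print(answer_with_verification("What is 17 + 25?", fake_model))
```

The arithmetic is beside the point; the shape is what matters. The language model proposes, and a small deterministic component decides whether to accept.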
This is actually good news for practical builders. It means the advantage shifts from whoever has the most compute to whoever combines tools most effectively. You don't need Google's infrastructure to build something useful.
The Neurosymbolic Alternative
Marcus has been talking about neurosymbolic AI for years, often to scepticism from the scaling-first camp. The idea is straightforward: neural networks excel at pattern recognition, symbolic systems excel at logical reasoning. Combine them.
In practice, this means language models that can interface with knowledge graphs, reasoning engines, and verification systems. It means architectures that don't just predict the next token but can actually check their work against logical rules.
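As a toy illustration of "checking work against logical rules", here's a small Python sketch in that spirit: the neural side proposes a fact, and a symbolic side accepts it only if it can be derived from a tiny knowledge base. The facts, the located_in transitivity rule, and the `check_claim` helper are all invented for this example; a real system would sit on top of a proper knowledge graph and reasoning engine.

```python
# Toy symbolic side: a tiny knowledge base of triples plus one inference rule.
FACTS = {
    ("Paris", "located_in", "France"),
    ("France", "located_in", "Europe"),
}


def entails(subject: str, predicate: str, obj: str) -> bool:
    """Check whether the knowledge base supports a claim, using transitivity of located_in."""
    if (subject, predicate, obj) in FACTS:
        return True
    if predicate == "located_in":
        # Follow located_in edges transitively: Paris -> France -> Europe.
        return any(
            entails(o, "located_in", obj)
            for s, p, o in FACTS
            if s == subject and p == "located_in"
        )
    return False


def check_claim(claim: tuple) -> str:
    """Accept the neural side's proposed triple only if the symbolic side can derive it."""
    return "verified" if entails(*claim) else "rejected: not entailed by the knowledge base"


print(check_claim(("Paris", "located_in", "Europe")))  # verified via transitivity
print(check_claim(("Paris", "located_in", "Japan")))   # rejected
```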
This isn't a radical new idea. It's AI's version of "use the right tool for the job." But it was overlooked during the scaling boom because scaling kept delivering results. Now that scaling is hitting limits, hybrid approaches look more attractive.
What Comes Next
The Meta and xAI experiments don't invalidate scaling entirely. Larger models are still more capable than smaller ones, all else being equal. But "all else being equal" is doing a lot of work in that sentence.
What these results suggest is that architecture matters as much as scale. How you train matters as much as how much you train. And sometimes, adding more compute doesn't solve the underlying problem; it just makes the limitations more expensive.
For researchers, this is a call to explore alternative architectures. For builders, it's validation that clever engineering beats brute force. For business owners evaluating AI investments, it's a reminder that bigger models don't automatically mean better results for your specific use case.
Marcus frames this as vindication, and on the technical merits, he's earned it. But the more interesting story is what happens now. If scaling isn't the path forward, what is? The answer is probably messier, more diverse, and more interesting than anyone expected.