Zhipu AI - one of China's major language model providers - processes 5.5 trillion tokens per day. That's not an estimate. That's reported usage. And it changes the conversation about AI adoption entirely.
Azeem Azhar spent time in China recently and came back with numbers that don't fit the Western narrative. While US tech discourse focuses on model capabilities and API pricing, China is running AI at a scale that suggests something fundamentally different is happening. This isn't experimentation. This is production infrastructure.
The Compute Constraint Nobody's Talking About
5.5 trillion tokens per day means Zhipu is processing roughly 64 million tokens per second. For context, an English translation of "War and Peace" runs to roughly 750,000 tokens, so that's the equivalent of generating the entire novel around 80 times every second, continuously, all day. And that's just one provider in one country.
The compute required for this is staggering. Even with aggressive optimisation - batch inference, model quantisation, edge deployment - you're looking at tens of thousands of GPUs running flat out. And Zhipu isn't the only player: Alibaba, Baidu, and ByteDance are all running operations at a similar scale.
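The arithmetic behind those figures can be sketched in a few lines. The per-GPU throughput below is a hypothetical round number chosen for illustration, not a reported figure; real throughput varies widely with model size, batching, and hardware.

```python
# Back-of-envelope sketch of the compute implied by 5.5 trillion
# tokens/day. The per-GPU throughput is an assumption, not data.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

tokens_per_day = 5.5e12
tokens_per_second = tokens_per_day / SECONDS_PER_DAY  # ~64 million

# Assumed sustained throughput for one inference GPU serving a large
# model with heavy batching: 2,000 tokens/s (hypothetical).
assumed_tokens_per_gpu_per_second = 2_000

gpus_needed = tokens_per_second / assumed_tokens_per_gpu_per_second

print(f"{tokens_per_second:,.0f} tokens/s")  # ~63,657,407 tokens/s
print(f"~{gpus_needed:,.0f} GPUs at the assumed throughput")  # ~31,829
```

Even if the assumed throughput is off by a factor of two in either direction, the answer stays in the tens of thousands of GPUs, which is the point: this is data-centre-scale infrastructure, not a research cluster.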
The constraint isn't model quality anymore. It's compute availability. The bottleneck has shifted from "can we build a model that works?" to "can we get enough GPUs to serve the demand?" And that shift has downstream effects on everything from chip supply chains to data centre energy consumption.
The Paradox of AI Engineers
Azhar highlights something uncomfortable: many AI engineers privately believe their work will displace significant portions of the workforce, but won't say it publicly. It's not malice. It's cognitive dissonance. You can't build tools designed to automate human tasks while simultaneously denying those tools will automate human tasks.
The problem isn't the technology. The problem is the gap between what builders know and what they're willing to say. If the engineers building the systems think displacement is coming, but the public conversation remains focused on augmentation and productivity, there's a credibility issue. And that issue makes it harder to prepare for the actual impact.
This isn't about fear-mongering. It's about honest assessment. If AI tools are genuinely capable of replacing entire categories of work - and the usage numbers from China suggest they are - then the social, economic, and policy responses need to match that reality. Pretending it's just a productivity boost doesn't help anyone.
What 5.5 Trillion Tokens Actually Means
Token consumption at this scale reveals what people are actually using AI for. It's not just chatbots. It's not just creative writing. It's customer service automation, code generation, document processing, real-time translation, and content moderation. The usage patterns show AI embedded into operational infrastructure, not sitting on top of it as a novelty.
For businesses watching from outside China, the implication is clear: AI adoption isn't a future trend. It's happening now, at scale, in production environments. The gap between experimental use and operational dependence is closing faster than most organisations realise.
The second implication is about compute costs. If token consumption continues growing at this rate, inference costs become a major line item. Serving billions of requests per day isn't cheap, even with optimised models. The companies that figure out how to run inference efficiently - locally, on-device, with smaller models - will have a cost advantage that compounds over time.
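To make the "major line item" claim concrete, here is a hedged sketch of what 5.5 trillion tokens per day costs at a few assumed per-token prices. All three price points are illustrative scenarios, not Zhipu's actual costs or anyone's published pricing.

```python
# Hedged cost sketch: daily inference spend at assumed prices.
# Prices (USD per million tokens) are hypothetical scenarios.

tokens_per_day = 5.5e12
million_tokens_per_day = tokens_per_day / 1e6  # 5.5 million

assumed_cost_per_million_usd = {
    "large hosted model":          1.00,
    "optimised / distilled model": 0.10,
    "small on-device model":       0.01,
}

for scenario, price in assumed_cost_per_million_usd.items():
    daily_cost = million_tokens_per_day * price
    print(f"{scenario}: ${daily_cost:,.0f}/day")
# large hosted model: $5,500,000/day
# optimised / distilled model: $550,000/day
# small on-device model: $55,000/day
```

The spread illustrates the compounding advantage mentioned above: at this volume, a 10x efficiency gain is worth millions of dollars per day, which is why smaller and on-device models matter at scale.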
The Moral Loophole
Azhar's piece touches on what he calls "AI's moral loopholes" - the ways builders justify potentially harmful outcomes by focusing on immediate benefits. It's the classic trolley problem, but distributed across millions of deployment decisions. Each individual choice seems reasonable. The aggregate effect is less clear.
The challenge is that nobody has a good framework for evaluating these trade-offs at scale. When does productivity enhancement become workforce displacement? When does automation become deskilling? When does efficiency become dependency? These aren't rhetorical questions. They're design decisions baked into every AI deployment.
The usage data from China suggests we're past the point of theoretical debate. AI is operational infrastructure now. The question isn't whether it will displace work - it already is. The question is what we do about it.
Read Azeem Azhar's full analysis at Exponential View.