Voices & Thought Leaders Sunday, 3 May 2026

5.5 Trillion Tokens a Day: What China's AI Usage Reveals


Zhipu AI - one of China's major language model providers - processes 5.5 trillion tokens per day. That's not an estimate. That's reported usage. And it changes the conversation about AI adoption entirely.

Azeem Azhar spent time in China recently and came back with numbers that don't fit the Western narrative. While US tech discourse focuses on model capabilities and API pricing, China is running AI at a scale that suggests something fundamentally different is happening. This isn't experimentation. This is production infrastructure.

The Compute Constraint Nobody's Talking About

5.5 trillion tokens per day means Zhipu is processing roughly 64 million tokens per second. For context, that's enough to generate dozens of copies of the entire text of "War and Peace" every second, continuously, all day. And that's just one provider in one country.
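The arithmetic behind those figures is easy to check. A quick sketch; the ~0.8 million token count for "War and Peace" is a rough assumption, not a measured value:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

tokens_per_day = 5.5e12
tokens_per_second = tokens_per_day / SECONDS_PER_DAY
print(f"{tokens_per_second / 1e6:.1f}M tokens/sec")  # ~63.7M

# Rough assumption: the full English text of "War and Peace"
# runs on the order of 0.8 million tokens.
war_and_peace_tokens = 0.8e6
copies_per_second = tokens_per_second / war_and_peace_tokens
print(f"~{copies_per_second:.0f} copies per second")  # ~80
```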

The compute required for this is staggering. Even with aggressive optimisation - batch inference, model quantisation, edge deployment - you're looking at tens of thousands of GPUs running flat out. And Zhipu isn't the only player. Alibaba, Baidu, and ByteDance are all running operations at a similar scale.
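As a back-of-envelope check on the "tens of thousands of GPUs" claim. The per-GPU throughput here is a hypothetical mid-range figure; real numbers vary enormously with model size, quantisation, and batching:

```python
tokens_per_second_total = 5.5e12 / 86_400  # ~63.7M tokens/sec

# Hypothetical: a well-batched inference GPU serving a mid-sized
# model at a few thousand tokens/sec. Not a measured figure.
tokens_per_second_per_gpu = 2_500

gpus_needed = tokens_per_second_total / tokens_per_second_per_gpu
print(f"~{gpus_needed:,.0f} GPUs running flat out")  # ~25,000
```

Even if the per-GPU figure is off by a factor of two in either direction, the answer stays in the tens of thousands.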

The constraint isn't model quality anymore. It's compute availability. The bottleneck has shifted from "can we build a model that works?" to "can we get enough GPUs to serve the demand?" And that shift has downstream effects on everything from chip supply chains to data centre energy consumption.

The Paradox of AI Engineers

Azhar highlights something uncomfortable: many AI engineers privately believe their work will displace significant portions of the workforce, but won't say it publicly. It's not malice. It's cognitive dissonance. You can't build tools designed to automate human tasks while simultaneously denying those tools will automate human tasks.

The problem isn't the technology. The problem is the gap between what builders know and what they're willing to say. If the engineers building the systems think displacement is coming, but the public conversation remains focused on augmentation and productivity, there's a credibility issue. And that issue makes it harder to prepare for the actual impact.

This isn't about fear-mongering. It's about honest assessment. If AI tools are genuinely capable of replacing entire categories of work - and the usage numbers from China suggest they are - then the social, economic, and policy responses need to match that reality. Pretending it's just a productivity boost doesn't help anyone.

What 5.5 Trillion Tokens Actually Means

Token consumption at this scale reveals what people are actually using AI for. It's not just chatbots. It's not just creative writing. It's customer service automation, code generation, document processing, real-time translation, and content moderation. The usage patterns show AI embedded into operational infrastructure, not sitting on top of it as a novelty.

For businesses watching from outside China, the implication is clear: AI adoption isn't a future trend. It's happening now, at scale, in production environments. The gap between experimental use and operational dependence is closing faster than most organisations realise.

The second implication is about compute costs. If token consumption continues growing at this rate, inference costs become a major line item. Serving billions of requests per day isn't cheap, even with optimised models. The companies that figure out how to run inference efficiently - locally, on-device, with smaller models - will have a cost advantage that compounds over time.
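To see why inference becomes a major line item at this volume, here is the same sum with a hypothetical blended price. The $0.50 per million tokens is an illustrative assumption, not a quoted rate from any provider:

```python
tokens_per_day = 5.5e12

# Illustrative assumption: a blended serving cost of $0.50 per
# million tokens. Actual rates vary by model, provider, hardware.
usd_per_million_tokens = 0.50

daily_cost = (tokens_per_day / 1e6) * usd_per_million_tokens
annual_cost = daily_cost * 365
print(f"${daily_cost:,.0f} per day")          # $2,750,000 per day
print(f"${annual_cost / 1e9:.2f}B per year")  # ~$1.00B per year
```

Halving the per-token cost through smaller models or on-device serving is worth hundreds of millions a year at this scale, which is the compounding advantage the paragraph above describes.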

The Moral Loophole

Azhar's piece touches on what he calls "AI's moral loopholes" - the ways builders justify potentially harmful outcomes by focusing on immediate benefits. It's the classic trolley problem, but distributed across millions of deployment decisions. Each individual choice seems reasonable. The aggregate effect is less clear.

The challenge is that nobody has a good framework for evaluating these trade-offs at scale. When does productivity enhancement become workforce displacement? When does automation become deskilling? When does efficiency become dependency? These aren't rhetorical questions. They're design decisions baked into every AI deployment.

The usage data from China suggests we're past the point of theoretical debate. AI is operational infrastructure now. The question isn't whether it will displace work - it already is. The question is what we do about it.

Read Azeem Azhar's full analysis at Exponential View.

About the Curator

Richard Bland, Founder, Marbl Codes
27+ years in software development, curating the tech news that matters.
