Token explosion: why AI bills rise as model costs fall

Today's Overview

The math of AI is inverting. As model costs collapse (tokens now cost a fraction of what they did two years ago), total AI bills are climbing faster than ever. The culprit isn't the models themselves. It's what we do with them.

The hidden cost of agents

When you run an AI agent, you're not paying for a single query. You're paying for dozens of invisible steps: context re-reads on every turn, tool calls to check and validate work, retries when the agent gets it wrong. A coding agent operating over ten turns might consume 55 times as many tokens as a single user query for the same task. Actual reasoning accounts for only 15-20% of total token consumption. The rest is hidden infrastructure.
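To make the context re-read effect concrete, here is a minimal sketch of how the token total grows when every turn re-sends the whole conversation so far. The function name, the 2,000-token system prompt, and the 1,000 tokens per step are illustrative assumptions, not figures from the reporting above. The model also leaves out tool calls and retries, so it will not reproduce the 55x figure.

```python
def agent_token_total(turns, system_tokens, step_tokens):
    """Total tokens processed when each turn re-reads the full history.

    Hypothetical model: every turn sends the accumulated history as input
    and produces `step_tokens` of new output, which joins the history.
    """
    total = 0
    history = system_tokens
    for _ in range(turns):
        total += history + step_tokens  # re-read input plus this turn's new output
        history += step_tokens          # the new output is re-read on later turns
    return total


single_query = agent_token_total(turns=1, system_tokens=2000, step_tokens=1000)
ten_turn_agent = agent_token_total(turns=10, system_tokens=2000, step_tokens=1000)

# Tokens that are genuinely new: the system prompt once, plus each turn's output.
fresh_tokens = 2000 + 10 * 1000

print(ten_turn_agent / single_query)                    # 25.0
print((ten_turn_agent - fresh_tokens) / ten_turn_agent)  # 0.84
```

Under these assumed numbers, ten turns cost 25 times as much as one query, and 84% of the total is history being read again rather than new work. The growth is roughly quadratic in the number of turns, which is why long agent sessions get expensive quickly.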

This is the paradox Azeem Azhar calls the "ghost token." Token prices have collapsed due to competitive pressure and efficiency gains. Yet demand is so elastic that cheaper compute has made agents economically viable, and agents burn tokens at rates orders of magnitude higher than chatbots. The result: token consumption has grown 17,000x in four years while prices fell.
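The arithmetic behind the paradox is simple: total spend is price per token times tokens consumed, so the bill rises whenever volume grows faster than price falls. The sketch below uses the 17,000x consumption figure cited above. The 100x price drop is a purely hypothetical placeholder, since no price figure is given here, and the function name is made up for illustration.

```python
def bill_multiplier(volume_growth, price_drop_factor):
    """How much the total bill changes when volume and price both move.

    volume_growth: how many times more tokens are consumed (e.g. 17000).
    price_drop_factor: how many times cheaper each token became (e.g. 100).
    """
    return volume_growth / price_drop_factor


# Hypothetical: tokens became 100x cheaper while consumption grew 17,000x.
print(bill_multiplier(volume_growth=17000, price_drop_factor=100))    # 170.0

# Break-even: prices would need to fall 17,000x just to keep the bill flat.
print(bill_multiplier(volume_growth=17000, price_drop_factor=17000))  # 1.0
```

Any price decline smaller than the growth in consumption leaves the total bill higher than before, however steep that decline looks on a per-token basis.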

Scaling brings orchestration tax

Starting multiple agents is trivial now. Coordinating them isn't. Your attention doesn't parallelize. Whether you run one agent or ten, every judgment call, every decision about which agent should handle what, every merge of agent-written code back into your codebase still routes through you: one serial processor, exactly one. Addy Osmani calls this the "orchestration tax": the cognitive overhead you pay when your tools multiply faster than your time does. This is the real bottleneck in agent-first workflows, not the cost of compute.

In robotics, alignment can't be an afterthought

While AI agents struggle with token economics and attention bottlenecks, robots face a different kind of scaling problem: decisions. Marussa Metocharaki's QERRA system, an explainable ethical evaluation engine for autonomous systems, highlights why. Physical AI that acts in the real world needs guardrails baked in before the first motor turns. Sortera's AI-powered sorting facilities can process 240 million pounds of recycled material annually, but that scaling only works if the system makes consistent, auditable decisions about what gets sorted where. As physical AI scales, the cost of an error isn't a wasted token; it's a misrouted shipment, a damaged product, or a safety violation.

This week's robotics momentum (FANUC's collaboration with Google on physical AI, ROS 2's latest release keeping Fast DDS as the default middleware, Sortera's second facility reaching full production in under a week) shows the technology is maturing. But maturity at scale requires not just smarter models but systems that can justify their decisions in real time.