Uber and ServiceNow exhausted their 2026 token budgets in four months. Not exceeded them. Exhausted them. A full year's allocation was gone a third of the way into the year.
That's not an outlier. According to Azeem Azhar's latest data, 71% of companies exceeded their AI budgets in 2025. Enterprise monthly AI spending hit $85,000 on average - a 36% increase from the previous year. And for half of finance leaders, cost management is now their primary AI concern. Not capability. Not integration. Cost.
The Token Economics Problem
Here's what's happening: companies adopt AI tools expecting incremental costs. A bit of GPT here, some Claude there. The budget feels manageable. Then usage explodes. Every team wants access. Every workflow gets an AI layer. Every customer interaction becomes an API call. The per-token cost is tiny, but tokens add up faster than anyone anticipated.
The term "tokenmaxxing" captures it perfectly - organisations are optimising for maximum token usage without understanding the cumulative cost. It's the cloud computing bill problem all over again, but faster. At least with cloud computing, you could see the server costs mounting. With AI, you're burning through tokens in the background of every Slack message, every document summary, every code completion.
For finance teams, this is a nightmare. You can't forecast something that scales this unpredictably. You can't budget for a tool where usage is determined by how many people in your organisation discover they can ask it questions. Traditional software had per-seat pricing. AI has per-thought pricing. That's a fundamentally different cost model.
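To see why forecasting breaks, compare the two models side by side. This is a sketch with made-up inputs: the $30 seat price, the $5-per-million-token blended rate, the 3,000 tokens per request and the 22 working days are all hypothetical, not any vendor's real pricing. What matters is the shape. Seat cost moves only with headcount, while token cost moves with how often each person actually uses the tool.

```python
# Hypothetical prices for illustration only; not any vendor's real rates.
SEAT_PRICE_PER_MONTH = 30.0        # USD per user per month (assumed)
PRICE_PER_MILLION_TOKENS = 5.0     # USD, blended input/output rate (assumed)


def per_seat_cost(users: int) -> float:
    """Monthly cost under per-seat pricing: depends only on headcount."""
    return users * SEAT_PRICE_PER_MONTH


def per_token_cost(users: int, requests_per_user_per_day: float,
                   tokens_per_request: int, working_days: int = 22) -> float:
    """Monthly cost under usage pricing: depends on how much each user asks."""
    tokens = users * requests_per_user_per_day * tokens_per_request * working_days
    return tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS


if __name__ == "__main__":
    users = 1_000
    for daily_requests in (5, 20, 80):  # light, moderate and heavy adoption
        seat = per_seat_cost(users)
        usage = per_token_cost(users, daily_requests, tokens_per_request=3_000)
        print(f"{daily_requests:>3} req/user/day  seats ${seat:>9,.0f}  tokens ${usage:>9,.0f}")
```

Going from 5 to 80 requests per user per day multiplies the token bill by sixteen while the seat bill stays exactly where it was. Nothing about headcount tells you which of those months you are going to get.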
Who This Hurts Most
Large enterprises can absorb the overrun. Annoying, but manageable. For smaller businesses and startups, this is existential. If your product relies on AI and your costs are scaling faster than your revenue, you don't have a product - you have a subsidy waiting to run out.
Developers building on top of AI APIs are caught in the same trap. You can ship a feature in a weekend using GPT-4, but if every user interaction costs you money and you haven't figured out monetisation, you're just funding OpenAI's growth with your own runway. The economics only work if you're charging enough to cover the token costs plus margin. Most aren't.
The companies that ARE making this work are the ones who saw the cost curve early and adapted. They're caching aggressively. They're using smaller models for simple tasks and reserving the expensive ones for where they actually add value. They're prompt-engineering for efficiency, not just capability. They're treating tokens like a finite resource, because in any given budget cycle, they are.
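The first two tactics, caching and routing by task difficulty, fit in a few lines. Treat this as a minimal sketch under stated assumptions: `call_model` is a hypothetical stand-in for whatever API client you actually use, `small-model` and `large-model` are placeholder names, and the length-based routing rule is deliberately crude. A real router would more likely use task metadata or a lightweight classifier.

```python
import hashlib
from typing import Callable, Dict

# Placeholder model names (hypothetical); substitute your provider's models.
SMALL_MODEL = "small-model"
LARGE_MODEL = "large-model"


class CachedRouter:
    """Routes prompts to a cheap or expensive model and caches responses."""

    def __init__(self, call_model: Callable[[str, str], str],
                 long_prompt_chars: int = 2_000) -> None:
        # call_model(model_name, prompt) -> response text (hypothetical client)
        self.call_model = call_model
        self.long_prompt_chars = long_prompt_chars
        self.cache: Dict[str, str] = {}
        self.calls_by_model: Dict[str, int] = {SMALL_MODEL: 0, LARGE_MODEL: 0}

    def choose_model(self, prompt: str, needs_reasoning: bool) -> str:
        # Assumed heuristic: only long or explicitly hard tasks get the
        # expensive model; everything else goes to the cheap one.
        if needs_reasoning or len(prompt) > self.long_prompt_chars:
            return LARGE_MODEL
        return SMALL_MODEL

    def ask(self, prompt: str, needs_reasoning: bool = False) -> str:
        model = self.choose_model(prompt, needs_reasoning)
        # Key on model + exact prompt text, so only identical requests hit.
        key = hashlib.sha256(f"{model}\n{prompt}".encode("utf-8")).hexdigest()
        if key in self.cache:
            return self.cache[key]  # cache hit: no API call, no tokens spent
        self.calls_by_model[model] += 1
        response = self.call_model(model, prompt)
        self.cache[key] = response
        return response


if __name__ == "__main__":
    def fake_call(model: str, prompt: str) -> str:
        return f"[{model}] answer to: {prompt[:20]}"

    router = CachedRouter(fake_call)
    router.ask("Summarise this ticket")
    router.ask("Summarise this ticket")  # served from cache
    router.ask("Draft a migration plan", needs_reasoning=True)
    print(router.calls_by_model)
```

Exact-match caching only pays off when identical prompts recur, such as FAQ answers or repeated summaries of the same document. The in-memory dict also grows without bound, so anything beyond a sketch would need an eviction policy such as LRU.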
The Self-Hosting Conversation
This is why the conversation around local models and self-hosting is intensifying. If you're Uber-scale, burning through 2026's budget in four months, at some point you do the maths on running your own infrastructure. The upfront cost is higher, but once the hardware is paid for, each extra token costs you mostly power and operations rather than a metered API fee.
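The maths in question is a break-even calculation: how many months of avoided API spend does it take to pay back the hardware? The sketch below uses entirely hypothetical inputs. Hardware prices, utilisation and running costs vary enormously, and self-hosted tokens are never literally free once power, staff and depreciation are counted.

```python
import math


def break_even_months(upfront_cost: float,
                      monthly_ops_cost: float,
                      monthly_tokens: float,
                      api_price_per_million: float) -> float:
    """Months until cumulative API spend overtakes self-hosting spend.

    Self-hosting total: upfront_cost + monthly_ops_cost * months
    API total:          (monthly_tokens / 1e6) * api_price_per_million * months
    Returns math.inf if the API is cheaper every single month.
    """
    monthly_api_cost = monthly_tokens / 1_000_000 * api_price_per_million
    monthly_saving = monthly_api_cost - monthly_ops_cost
    if monthly_saving <= 0:
        return math.inf  # running your own hardware never pays back
    return upfront_cost / monthly_saving


if __name__ == "__main__":
    # Hypothetical: $500k of hardware, $40k/month to run it, $5 per million API tokens.
    for tokens in (2e9, 20e9):  # 2 billion vs 20 billion tokens per month
        months = break_even_months(500_000, 40_000, tokens, 5.0)
        print(f"{tokens / 1e9:.0f}B tokens/month -> break-even in {months:.1f} months")
```

With these inputs, the heavy user breaks even in a little over eight months while the lighter user never does. Volume decides the answer. The model deliberately ignores hardware refresh cycles, differences in model quality, and the engineering effort of running inference reliably, all of which push the real break-even point further out.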
Meta releasing Llama. Mistral pushing open weights. Anthropic's focus on efficiency. None of this is altruism. It's a land grab for the moment when enterprises realise API costs don't scale and start looking for alternatives. The companies that make self-hosting easy and economically viable will capture the next wave.
For now, though, most businesses are still in the "figure out the bill later" phase. They're prioritising capability over cost because the capability is too compelling to ignore. But finance teams are starting to push back. When half of them name cost management as their top concern, that's not background noise. That's a forcing function.
What Happens Next
Three things can happen: prices drop, usage gets controlled, or companies move to self-hosted models. Most likely all three happen in parallel. OpenAI and Anthropic know this. They're already cutting prices to stay competitive. But they can only drop prices so far before their own economics stop working.
The businesses that survive this phase will be the ones who treat AI costs the way cloud costs came to be treated five years ago: as something that requires active management, not passive acceptance. That means instrumentation, monitoring, and governance. Boring infrastructure work. But necessary.
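At its simplest, instrumentation means attributing every call's tokens to an owner and checking spend against a budget before the month is over, not after. This is a minimal sketch assuming your provider reports input and output token counts per response. The team names, prices and the 80% alert threshold are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TokenMeter:
    """Tracks token spend per team against monthly budgets (illustrative)."""
    input_price_per_million: float     # assumed USD rate for input tokens
    output_price_per_million: float    # assumed USD rate for output tokens
    budgets_usd: Dict[str, float]      # monthly budget per team
    alert_fraction: float = 0.8        # warn once spend passes 80% of budget
    spend_usd: Dict[str, float] = field(default_factory=lambda: defaultdict(float))
    alerts: List[str] = field(default_factory=list)

    def record(self, team: str, input_tokens: int, output_tokens: int) -> None:
        """Monitoring: attribute one call's cost to a team, raising alerts on thresholds."""
        cost = (input_tokens * self.input_price_per_million
                + output_tokens * self.output_price_per_million) / 1_000_000
        before = self.spend_usd[team]
        after = before + cost
        self.spend_usd[team] = after
        budget = self.budgets_usd.get(team, 0.0)
        threshold = budget * self.alert_fraction
        if before < threshold <= after:
            self.alerts.append(f"{team}: passed {self.alert_fraction:.0%} of ${budget:,.0f}")
        if before < budget <= after:
            self.alerts.append(f"{team}: budget exhausted")

    def allowed(self, team: str) -> bool:
        """Governance: refuse new calls once spent, and for teams with no budget."""
        return self.spend_usd[team] < self.budgets_usd.get(team, 0.0)


if __name__ == "__main__":
    meter = TokenMeter(3.0, 15.0, budgets_usd={"support": 100.0})
    calls = 0
    while meter.allowed("support"):
        meter.record("support", input_tokens=1_000_000, output_tokens=100_000)
        calls += 1
    print(calls, meter.alerts)
```

Checking `allowed()` before each call is what turns monitoring into governance: a team hits a hard stop instead of discovering the overrun at the budget review. Whether a hard stop or a soft warning is right is a policy decision, not a technical one.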
Uber and ServiceNow burning through 2026's budget isn't a failure. It's a signal. The AI adoption curve is steeper than anyone priced for. And the companies still pretending token costs don't matter are about to have a very uncomfortable budget review.