Voices & Thought Leaders Monday, 18 May 2026

Companies Are Burning Through 2026 AI Budgets Already

Uber and ServiceNow exhausted their 2026 token budgets in four months. Not exceeded them. Exhausted them. The tokens they'd allocated for the whole year are already gone, and we're not even halfway through 2026.

That's not an outlier. According to Azeem Azhar's latest data, 71% of companies exceeded their AI budgets in 2025. Enterprise monthly AI spending hit $85,000 on average - a 36% increase from the previous year. And for half of finance leaders, cost management is now their primary AI concern. Not capability. Not integration. Cost.

The Token Economics Problem

Here's what's happening: companies adopt AI tools expecting incremental costs. A bit of GPT here, some Claude there. The budget feels manageable. Then usage explodes. Every team wants access. Every workflow gets an AI layer. Every customer interaction becomes an API call. The per-token cost is tiny, but tokens add up faster than anyone anticipated.
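To make the "tiny per-token cost, large bill" arithmetic concrete, here is a minimal sketch. The function name, the workload numbers and the per-million-token prices are all assumptions chosen for illustration, not real vendor pricing.

```python
def monthly_token_cost(requests_per_day, input_tokens, output_tokens,
                       input_price_per_million, output_price_per_million,
                       days=30):
    """Estimate one workload's monthly spend in dollars."""
    cost_per_request = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000
    return cost_per_request * requests_per_day * days


# Hypothetical workload: 2,000 requests a day, 1,500 tokens in, 500 out,
# at assumed prices of $3 / $15 per million input / output tokens.
one_team = monthly_token_cost(2_000, 1_500, 500, 3.0, 15.0)
# The same usage pattern rolled out to 40 teams.
forty_teams = 40 * one_team

print(f"one team:    ${one_team:,.0f} per month")
print(f"forty teams: ${forty_teams:,.0f} per month")
```

Under these assumptions each request costs about 1.2 cents, which looks negligible, yet one team lands at roughly $720 a month and forty teams at roughly $28,800.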

The term "tokenmaxxing" captures it perfectly - organisations are optimising for maximum token usage without understanding the cumulative cost. It's the cloud computing bill problem all over again, but faster. At least with cloud computing, you could see the server costs mounting. With AI, you're burning through tokens in the background of every Slack message, every document summary, every code completion.

For finance teams, this is a nightmare. You can't forecast something that scales this unpredictably. You can't budget for a tool where usage is determined by how many people in your organisation discover they can ask it questions. Traditional software had per-seat pricing. AI has per-thought pricing. That's a fundamentally different cost model.

Who This Hurts Most

Large enterprises can absorb the overrun. Annoying, but manageable. For smaller businesses and startups, this is existential. If your product relies on AI and your costs are scaling faster than your revenue, you don't have a product - you have a subsidy waiting to run out.

Developers building on top of AI APIs are caught in the same trap. You can ship a feature in a weekend using GPT-4, but if every user interaction costs you money and you haven't figured out monetisation, you're just funding OpenAI's growth with your own runway. The economics only work if you're charging enough to cover the token costs plus margin. Most aren't.
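The "token costs plus margin" check can be sketched in a few lines. The subscription price, interaction count and cost per interaction below are hypothetical figures, and both function names are made up for this example.

```python
def gross_margin_per_user(price_per_month, interactions_per_month,
                          cost_per_interaction):
    """Return (margin in dollars, margin as a fraction of price) for one user."""
    token_cost = interactions_per_month * cost_per_interaction
    margin = price_per_month - token_cost
    return margin, margin / price_per_month


def break_even_price(interactions_per_month, cost_per_interaction,
                     target_margin):
    """Price at which token cost leaves target_margin (0 to <1) of revenue."""
    if not 0 <= target_margin < 1:
        raise ValueError("target_margin must be in [0, 1)")
    token_cost = interactions_per_month * cost_per_interaction
    return token_cost / (1 - target_margin)


# Hypothetical product: $10 a month, 1,500 AI interactions per user,
# each costing an assumed 1.2 cents in tokens.
margin, margin_ratio = gross_margin_per_user(10.0, 1_500, 0.012)
needed_price = break_even_price(1_500, 0.012, target_margin=0.6)

print(f"margin per user: ${margin:.2f} ({margin_ratio:.0%})")
print(f"price needed for a 60% margin: ${needed_price:.2f}")
```

With these numbers every user loses about $8 a month, and the price would need to be around $45 to reach a 60% gross margin. The sketch ignores every cost except tokens, so real margins would be lower still.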

The companies that ARE making this work are the ones who saw the cost curve early and adapted. They're caching aggressively. They're using smaller models for simple tasks and reserving the expensive ones for where they actually add value. They're prompt-engineering for efficiency, not just capability. They're treating tokens like a finite resource, because in any given budget cycle, they are.
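Two of those tactics, caching and routing simple tasks to a smaller model, can be sketched as below. This is an illustration under assumptions rather than any company's actual setup: the class name, the model labels "small" and "large", the 50-word routing threshold and the stand-in `fake_model` function are all hypothetical.

```python
import hashlib


class CostAwareClient:
    """Wraps a model-calling function with a response cache and a router."""

    def __init__(self, call_model):
        self.call_model = call_model  # function (model_name, prompt) -> str
        self.cache = {}
        self.calls = {"small": 0, "large": 0}

    def route(self, prompt):
        # Crude heuristic (an assumption): short prompts go to the cheap model.
        return "small" if len(prompt.split()) < 50 else "large"

    def complete(self, prompt):
        # Normalise before hashing so trivially different prompts share a key.
        # Lower-casing is only safe when case does not change the answer.
        normalised = prompt.strip().lower()
        key = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
        if key in self.cache:
            return self.cache[key]  # cache hit: no tokens spent
        model = self.route(prompt)
        self.calls[model] += 1
        answer = self.call_model(model, prompt)
        self.cache[key] = answer
        return answer


def fake_model(model, prompt):
    """Stand-in for a real API call so the sketch runs offline."""
    return f"[{model}] reply to {len(prompt.split())} words"


client = CostAwareClient(fake_model)
first = client.complete("Summarise this ticket")
second = client.complete("summarise this ticket ")  # served from the cache
third = client.complete("word " * 200)              # long prompt, large model
print(client.calls)
```

Three requests result in only two paid calls, one per model. A production version would also need cache expiry and a routing rule based on task type rather than prompt length.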

The Self-Hosting Conversation

This is why the conversation around local models and self-hosting is intensifying. If you're Uber-scale, burning through 2026's budget in four months, at some point you do the maths on running your own infrastructure. The upfront cost is higher, but once it's built the marginal per-token cost can drop sharply, leaving mostly power, hardware depreciation and the people who run it.
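"Doing the maths" here is a break-even calculation. A minimal sketch follows; the upfront and running costs are invented for illustration, and only the $85,000 monthly API bill comes from the average quoted earlier in this piece.

```python
def breakeven_months(upfront_cost, monthly_running_cost, monthly_api_bill):
    """Months until self-hosting has paid back its upfront cost.

    Returns None when running your own stack costs at least as much per
    month as the API bill it replaces, so it never pays back.
    """
    monthly_saving = monthly_api_bill - monthly_running_cost
    if monthly_saving <= 0:
        return None
    return upfront_cost / monthly_saving


# Assumed figures: $600,000 of hardware and setup, $35,000 a month for
# power, hosting and staff, replacing an $85,000 monthly API bill.
months = breakeven_months(600_000, 35_000, 85_000)
print(f"break-even after {months:.1f} months")

# If running costs exceed the API bill, there is no break-even point.
print(breakeven_months(600_000, 90_000, 85_000))
```

Under these assumptions the investment pays back in 12 months. The sketch assumes the self-hosted model can handle the same workload at the same quality, which is the hardest part to verify in practice.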

Meta releasing Llama. Mistral pushing open weights. Anthropic's focus on efficiency. None of this is altruism. It's a land grab for the moment when enterprises realise API costs don't scale and start looking for alternatives. The companies that make self-hosting easy and economically viable will capture the next wave.

For now, though, most businesses are still in the "figure out the bill later" phase. They're prioritising capability over cost because the capability is too compelling to ignore. But finance teams are starting to push back. When half of them name cost management as their top concern, that's not background noise. That's a forcing function.

What Happens Next

One of three things: prices drop, usage gets controlled, or companies move to self-hosted models. Probably all three in parallel. OpenAI and Anthropic know this. They're already cutting prices to stay competitive. But they can only drop prices so far before their own economics stop working.

The businesses that survive this phase will be the ones who treat AI costs like cloud costs were treated five years ago - as something that requires active management, not passive acceptance. That means instrumentation, monitoring, and governance. Boring infrastructure work. But necessary.
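As a sketch of what that instrumentation could look like at its simplest, the tracker below records spend per team and projects when the annual budget runs out. The class, its method names and the dollar figures are hypothetical; the four-month scenario mirrors the Uber and ServiceNow example.

```python
from collections import defaultdict


class TokenBudget:
    """Tracks AI spend per team against an annual budget."""

    def __init__(self, annual_budget_usd):
        self.annual_budget = annual_budget_usd
        self.spend_by_team = defaultdict(float)

    def record(self, team, cost_usd):
        self.spend_by_team[team] += cost_usd

    @property
    def total_spend(self):
        return sum(self.spend_by_team.values())

    def projected_annual_spend(self, months_elapsed):
        """Extrapolate the current monthly run rate to twelve months."""
        if months_elapsed <= 0:
            raise ValueError("months_elapsed must be positive")
        return self.total_spend / months_elapsed * 12

    def is_over_pace(self, months_elapsed):
        return self.projected_annual_spend(months_elapsed) > self.annual_budget


# Hypothetical company with a $1.2M annual budget, four months in.
budget = TokenBudget(1_200_000)
budget.record("engineering", 700_000)
budget.record("support", 500_000)

projection = budget.projected_annual_spend(months_elapsed=4)
print(f"spent so far: ${budget.total_spend:,.0f}")
print(f"projected for the year: ${projection:,.0f}")
print(f"over pace: {budget.is_over_pace(4)}")
```

Spending a full year's budget in four months is a run rate of three times plan, which is what the projection shows. The useful part is running this check monthly per team, so the overrun is visible long before the budget is gone.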

Uber and ServiceNow burning through 2026's budget isn't a failure. It's a signal. The AI adoption curve is steeper than anyone priced for. And the companies still pretending token costs don't matter are about to have a very uncomfortable budget review.

Today's Sources

PyImageSearch
LLM Observability with Self-Hosted Langfuse and vLLM
Towards Data Science
Why Your AI Demo Will Die in Production
Towards Data Science
The Next AI Bottleneck Isn't the Model: It's the Inference System
Robohub
Table tennis robot defeats some of world's best players - why this has major implications for robotics
The Robot Report
Fraunhofer IPA offers new test benchmark for humanoids
The Robot Report
Mind Robotics raises $400M to scale AI-powered robots in manufacturing
Azeem Azhar
📈 Data to start your week: The cost of tokenmaxxing
Jack Clark Import AI
Import AI 457: AI stuxnet; cursed Muon optimizer; and positive alignment
Ben Thompson Stratechery
Data Center Discontent, Understanding the Opposition, Fixing the Problem

About the Curator

Richard Bland
Founder, Marbl Codes

27+ years in software development, curating the tech news that matters.

MEM Digital Ltd t/a Marbl Codes
© 2026 MEM Digital Ltd