Voices & Thought Leaders - Monday, 25 May 2026

Token Prices Fell 90% But Your AI Bill Went Up - The Ghost Token Problem

OpenAI's API prices have dropped 90% in two years. So why are AI bills climbing?

The answer, according to Azeem Azhar's analysis, is ghost tokens - the massive context re-reads and tool calls that agents consume behind the scenes, invisible to the developer until the invoice arrives.

Agents Are Not Chatbots

A chatbot exchange might use 500 tokens. An agent handling a comparable request can consume 50,000 tokens - or more.

The difference is in how agents work. They don't just respond to a prompt. They read context, call tools, process the results, re-read the context to decide what to do next, and loop through that cycle until the task completes.

Every time an agent goes back to its context to make a decision, the whole context window is sent to the model again and billed again as input tokens: another few thousand tokens consumed. Every tool call response is appended to that context, so each later call costs more than the last. A single task can trigger dozens of these cycles.
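To see why this overhead compounds, here is a minimal Python sketch of the billing in a simple agent loop. It assumes (hypothetically) that every step sends the whole accumulated context as input, produces a fixed-size reply, and makes one tool call whose result is appended to the context before the next step, with no prompt caching. The token sizes are illustrative assumptions, not measured figures.

```python
def agent_task_tokens(base_context: int, tool_result: int,
                      output_per_step: int, steps: int) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) billed across a simple agent loop.

    Assumption: each step re-sends the full accumulated context as input,
    then appends its own output and one tool result to that context.
    No prompt caching is modelled.
    """
    context = base_context
    total_input = 0
    total_output = 0
    for _ in range(steps):
        total_input += context                    # the whole context is billed again on every call
        total_output += output_per_step
        context += output_per_step + tool_result  # history grows before the next call
    return total_input, total_output


if __name__ == "__main__":
    # Illustrative sizes (assumed): 2,000-token prompt, 1,500-token tool results, 200-token replies
    for steps in (1, 10, 20):
        print(steps, agent_task_tokens(2_000, 1_500, 200, steps))
```

Under these assumptions a single call bills 2,000 input tokens, ten steps bill 96,500 and twenty steps bill 363,000. Doubling the number of steps nearly quadruples the input tokens, because every step pays again for everything that came before it.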

That's where ghost tokens live - in the hidden overhead of agentic workflows that developers don't see until they check usage metrics.

The Token Economics Nobody Warned You About

Here's the maths that breaks budgets: if you're building a chatbot, you can estimate costs pretty accurately. User asks question, model responds, done. Predictable token usage, predictable costs.

But if you're building an agent that researches a topic, calls APIs, synthesises results, and produces a report, token usage becomes unpredictable. One query might trigger five tool calls. Another might trigger fifty. The user sees the same interface. The bill reflects the difference.

Azhar's piece highlights the asymmetry: token prices dropped because cloud providers got more efficient. But token consumption per task increased by orders of magnitude because agents changed how we use AI.

Lower unit cost, higher volume, net result: higher bills.
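Putting the article's own figures together makes the arithmetic concrete. Let p be the price per token and T the tokens a task consumes. Taking the 90% price drop and the 500-versus-50,000-token comparison as if they applied to the same task (an illustrative assumption, not a measurement):

```latex
\begin{aligned}
\text{bill} &= p\,T \\
p' &= 0.1\,p, \qquad T' = \frac{50{,}000}{500}\,T = 100\,T \\
\text{bill}' &= p'\,T' = (0.1 \times 100)\,p\,T = 10 \times \text{bill}
\end{aligned}
```

A token that is ten times cheaper, consumed a hundred times more, produces a bill ten times larger.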

Why This Matters for Builders

If you're building on AI APIs, this shift in the economics changes your pricing model. You can't charge a flat price per query anymore - not when one query might cost 10x more than another depending on how many tool calls the agent triggers.

Usage-based pricing becomes unpredictable for your customers. Fixed pricing becomes unprofitable for you. That's a real problem for any business trying to build sustainable margins on agentic AI.
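A minimal sketch of the fixed-pricing problem, assuming a hypothetical flat price of $0.50 per task, 5,000 tokens per tool call and a blended rate of $3 per million tokens. All three numbers are illustrative, not any provider's actual pricing, and cost is deliberately simplified to be linear in tool calls, ignoring context growth.

```python
def task_cost_usd(tool_calls: int, tokens_per_call: int,
                  usd_per_million_tokens: float) -> float:
    # Simplification: each tool call costs a fixed number of tokens (ignores context growth)
    return tool_calls * tokens_per_call * usd_per_million_tokens / 1_000_000


def task_margin_usd(flat_price: float, tool_calls: int, tokens_per_call: int,
                    usd_per_million_tokens: float) -> float:
    # Flat price charged to the customer minus what the API provider bills you for the task
    return flat_price - task_cost_usd(tool_calls, tokens_per_call, usd_per_million_tokens)


if __name__ == "__main__":
    # Same product, same price; only the number of tool calls the agent chose differs
    for calls in (5, 50):
        margin = task_margin_usd(0.50, calls, 5_000, 3.0)
        print(f"{calls:>2} tool calls -> margin ${margin:+.3f}")
```

With five tool calls the task keeps roughly $0.43 of its $0.50 price; with fifty it loses about $0.25. Same interface, same price, opposite sign on the margin.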

The builders solving this are the ones instrumenting everything. They're tracking not just total tokens, but tokens per tool call, tokens per context re-read, tokens per reasoning step. That visibility is the only way to understand where costs are actually going.
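One way to get that visibility is to attribute every model call to the kind of step that triggered it. Below is a hypothetical ledger in plain Python. The category names are illustrative, and in practice the counts would come from the usage figures your API provider reports with each response. Input and output tokens are kept separate because providers usually price them differently.

```python
from collections import Counter


class TokenLedger:
    """Hypothetical per-task ledger attributing tokens to the kind of step that spent them."""

    def __init__(self) -> None:
        self.input_tokens: Counter[str] = Counter()
        self.output_tokens: Counter[str] = Counter()

    def record(self, category: str, input_tokens: int, output_tokens: int = 0) -> None:
        # Call once per model call, using the usage figures the provider returns
        self.input_tokens[category] += input_tokens
        self.output_tokens[category] += output_tokens

    def breakdown(self) -> dict[str, int]:
        # Combined input + output tokens per category
        return dict(self.input_tokens + self.output_tokens)

    def total(self) -> int:
        return sum(self.input_tokens.values()) + sum(self.output_tokens.values())


if __name__ == "__main__":
    ledger = TokenLedger()
    ledger.record("context_reread", 12_000)       # illustrative numbers only
    ledger.record("tool_call", 3_500, 150)
    ledger.record("reasoning_step", 2_000, 800)
    print(ledger.breakdown(), ledger.total())
```

Even with made-up numbers, the breakdown shows the point of the exercise: the largest line item can be context re-reads rather than anything the user asked for directly.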

The Control Mechanisms That Work

Some teams are setting hard token limits per task and failing gracefully when agents hit them. Others are caching context aggressively to reduce re-reads. The most sophisticated are building routing layers that send simple queries to cheap models and complex tasks to expensive ones.
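A minimal sketch of two of these controls: a hard per-task token budget that stops the loop gracefully instead of letting it run on, and a crude router that sends short, tool-free requests to a cheaper model. The model names, the 200-character threshold and the simulated per-step costs are all hypothetical; a real router would need a better signal of task complexity than length. Caching is not shown.

```python
class TokenBudgetExceeded(Exception):
    """Raised when a task would spend more tokens than its hard limit allows."""


class TokenBudget:
    """Hypothetical hard per-task token limit."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.spent = 0

    def charge(self, tokens: int) -> None:
        # Check before spending, so the limit is never crossed
        if self.spent + tokens > self.limit:
            raise TokenBudgetExceeded(
                f"would spend {self.spent + tokens} of a {self.limit}-token limit"
            )
        self.spent += tokens


def route(request: str, needs_tools: bool) -> str:
    # Crude, hypothetical heuristic: short requests that need no tools go to the cheap model
    if not needs_tools and len(request) < 200:
        return "small-model"
    return "large-model"


def run_agent(step_costs: list[int], limit: int) -> tuple[str, int]:
    """Simulate an agent loop under a hard budget; returns (status, tokens_spent).

    step_costs stands in for the token cost of each step an agent would take.
    """
    budget = TokenBudget(limit)
    for cost in step_costs:
        try:
            budget.charge(cost)
        except TokenBudgetExceeded:
            # Fail gracefully: stop looping and report how far the task got
            return "stopped_at_budget", budget.spent
        # ... the model call / tool call for this step would happen here ...
    return "completed", budget.spent
```

For example, `run_agent([4_000, 4_000, 4_000], 10_000)` stops after two steps with 8,000 tokens spent rather than overrunning the limit, and the caller gets a clear status it can surface to the user. In a real system a step's cost is only partly known before the call, so the check would have to rely on the input size plus a cap on output tokens.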

But all of these approaches require infrastructure that most teams haven't built yet. The assumption was that cheaper tokens would make AI costs a non-issue. The reality is that agentic patterns made token consumption the new bottleneck.

The companies that figure out ghost token management early will have a significant cost advantage. The ones that don't will find their margins disappearing into context re-reads they didn't know were happening.

Read Azeem Azhar's full analysis: "Why AI bills rise as costs fall"

Today's Sources

  • DEV.to AI - The AI Triforce of seed4j: Power, Wisdom, and Courage for Your Dev Agent
  • DEV.to AI - The Voice-to-Material Magic: How AI Turns On-Site Dictation into Precise Parts Lists
  • Hacker News Best - Jira Is Turing-Complete
  • The Robot Report - Sortera uses physical AI to double capacity in a Tennessee sorting facility
  • The Robot Report - FANUC partners with Google to advance physical AI in its robots
  • ROS Discourse - ROS 2 Lyrical Luth and 11 years of Fast DDS as ROS 2 default middleware
  • ROS Discourse - QERRA-v2 Classical: Practical integration as a Behavior Tree Condition node
  • Azeem Azhar - Why AI bills rise as costs fall
  • Benedict Evans - Predicting AI job exposure
  • Addy Osmani - The Orchestration Tax is You
  • Gary Marcus - Checking the math behind OpenAI and Anthropic's latest headlines

About the Curator

Richard Bland
Founder, Marbl Codes

27+ years in software development, curating the tech news that matters.
