Token Prices Fell 90% But Your AI Bill Went Up - The Ghost Token Problem

OpenAI's API prices have dropped 90% in two years. So why are AI bills climbing?

The answer, according to Azeem Azhar's analysis, is ghost tokens - the tokens consumed by the context re-reads and tool calls that agents perform behind the scenes, invisible to the developer until the invoice arrives.

Agents Are Not Chatbots

A chatbot conversation might use 500 tokens per exchange. An agent performing the same task can consume 50,000 tokens - or more.

The difference is in how agents work. They don't just respond to a prompt. They read context repeatedly, call tools, process results, re-read context again to decide what to do next, then loop through that cycle until the task completes.

Every time an agent re-reads your context window to make a decision, that's another few thousand tokens consumed. Every tool call response gets fed back into the model, adding more tokens. A single task can trigger dozens of these cycles.
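To make that accumulation concrete, here is a minimal sketch of how tokens add up across agent cycles. The sizes (a 2,000-token starting context, 800 tokens per tool result, 150 tokens of model output per cycle) are illustrative assumptions, not measurements, and `simulate_agent_tokens` is a hypothetical helper. It assumes the full history is re-sent on every cycle with no caching.

```python
def simulate_agent_tokens(cycles, base_context=2_000, tool_result=800, output_per_cycle=150):
    """Estimate total tokens for an agent that re-sends its whole history each cycle.

    All default sizes are illustrative assumptions, not measured values.
    """
    context = base_context  # tokens the model must read on the next call
    total_input = 0
    total_output = 0
    for _ in range(cycles):
        total_input += context             # the model re-reads everything so far
        total_output += output_per_cycle   # the model decides the next action
        # The model's own output and the tool result both join the history.
        context += output_per_cycle + tool_result
    return total_input, total_output


if __name__ == "__main__":
    for n in (1, 5, 20):
        inp, out = simulate_agent_tokens(n)
        print(f"{n:>2} cycles: {inp:>7,} input tokens, {out:>5,} output tokens")
```

Under these assumptions, input tokens grow roughly with the square of the number of cycles, because each cycle re-reads a longer history than the one before it.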

That's where ghost tokens live - in the hidden overhead of agentic workflows that developers don't see until they check usage metrics.

The Token Economics Nobody Warned You About

Here's the maths that breaks budgets: if you're building a chatbot, you can estimate costs pretty accurately. User asks question, model responds, done. Predictable token usage, predictable costs.

But if you're building an agent that researches a topic, calls APIs, synthesises results, and produces a report, token usage becomes unpredictable. One query might trigger five tool calls. Another might trigger fifty. The user sees the same interface. The bill reflects the difference.

Azhar's piece highlights the asymmetry: token prices dropped because cloud providers got more efficient. But token consumption per task increased by orders of magnitude because agents changed how we use AI.

Lower unit cost, higher volume, net result: higher bills.
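A back-of-the-envelope version of that arithmetic, using the figures quoted in this piece (a 90% price drop, 500 tokens per chatbot exchange, 50,000 tokens per agent task). The starting price of $10 per million tokens is a made-up placeholder; only the ratio matters.

```python
def cost_usd(tokens, price_per_million_usd):
    """Cost of a request given its token count and a per-million-token price."""
    return tokens / 1_000_000 * price_per_million_usd


OLD_PRICE = 10.0              # hypothetical starting price, USD per 1M tokens
NEW_PRICE = OLD_PRICE * 0.10  # the same price after a 90% drop

chatbot_then = cost_usd(500, OLD_PRICE)    # chatbot exchange at the old price
agent_now = cost_usd(50_000, NEW_PRICE)    # agent task at the new price

print(f"chatbot exchange, old price: ${chatbot_then:.4f}")
print(f"agent task, new price:       ${agent_now:.4f}")
print(f"cost multiple:               {agent_now / chatbot_then:.0f}x")
```

The price fell by a factor of 10 while consumption rose by a factor of 100, so in this example the cost per task ends up about 10 times higher.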

Why This Matters for Builders

If you're building on AI APIs, this shift in economics changes your pricing model. You can't charge a flat rate per query anymore - not when one query might cost 10x more than another depending on how many tool calls the agent triggers.

Usage-based pricing becomes unpredictable for your customers. Fixed pricing becomes unprofitable for you. That's a real problem for any business trying to build sustainable margins on agentic AI.

The builders solving this are the ones instrumenting everything. They're tracking not just total tokens, but tokens per tool call, tokens per context re-read, tokens per reasoning step. That visibility is the only way to understand where costs are actually going.
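One way that kind of instrumentation might look is a small per-task ledger that attributes tokens to categories. This is a sketch, not a description of any particular team's tooling: `TokenLedger` and the category labels are hypothetical, and in practice the counts would typically come from the usage data your provider returns with each response.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TokenLedger:
    """Accumulates token usage per category for a single agent task."""
    by_category: dict = field(default_factory=lambda: defaultdict(int))
    events: int = 0

    def record(self, category, input_tokens, output_tokens=0):
        # Category names such as "tool_call" are free-form labels chosen by the caller.
        self.by_category[category] += input_tokens + output_tokens
        self.events += 1

    @property
    def total(self):
        return sum(self.by_category.values())

    def breakdown(self):
        """Return (category, tokens, share_of_total), largest consumer first."""
        total = self.total or 1  # avoid division by zero on an empty ledger
        rows = sorted(self.by_category.items(), key=lambda kv: kv[1], reverse=True)
        return [(name, tokens, tokens / total) for name, tokens in rows]


if __name__ == "__main__":
    ledger = TokenLedger()
    ledger.record("context_reread", input_tokens=12_000)
    ledger.record("tool_call", input_tokens=3_000, output_tokens=200)
    ledger.record("reasoning_step", input_tokens=1_500, output_tokens=300)
    ledger.record("context_reread", input_tokens=14_000)
    for name, tokens, share in ledger.breakdown():
        print(f"{name:<16} {tokens:>7,} {share:6.1%}")
```

Sorting by share makes the largest consumer obvious at a glance, which is the point of the exercise: you want to know which category to attack first.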

The Control Mechanisms That Work

Some teams are setting hard token limits per task and failing gracefully when agents hit them. Others are caching context aggressively to reduce re-reads. The most sophisticated are building routing layers that send simple queries to cheap models and complex tasks to expensive ones.
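Two of those mechanisms, a hard per-task budget and a simple routing rule, can be sketched in a few lines. Everything here is hypothetical: the class and function names, the placeholder model names "small-model" and "large-model", and the 2,000-token threshold are assumptions for illustration. Context caching is left out because it depends on provider-specific features.

```python
class TokenBudgetExceeded(Exception):
    """Raised when a step would push a task past its token allowance."""


class TokenBudget:
    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def charge(self, tokens):
        # Refuse the step before spending, so the limit is never overshot.
        if self.used + tokens > self.limit:
            raise TokenBudgetExceeded(f"{self.used + tokens} > {self.limit}")
        self.used += tokens


def choose_model(estimated_tokens, needs_tools,
                 cheap="small-model", expensive="large-model", threshold=2_000):
    """Toy routing rule; the model names and threshold are placeholders."""
    if needs_tools or estimated_tokens > threshold:
        return expensive
    return cheap


def run_task(step_costs, limit):
    """Charge each step against the budget; return a partial result on overrun."""
    budget = TokenBudget(limit)
    completed = 0
    try:
        for cost in step_costs:
            budget.charge(cost)
            completed += 1
    except TokenBudgetExceeded:
        # Fail gracefully: report what was done instead of crashing the caller.
        return {"status": "budget_exceeded", "steps_completed": completed,
                "tokens_used": budget.used}
    return {"status": "ok", "steps_completed": completed, "tokens_used": budget.used}


if __name__ == "__main__":
    print(run_task([1_000, 2_000, 3_000], limit=10_000))
    print(run_task([4_000, 4_000, 4_000], limit=10_000))
    print(choose_model(500, needs_tools=False))
```

Checking the budget before each step, rather than after, is a deliberate choice in this sketch: the task stops with a usable partial result and the limit is a true ceiling rather than a target that gets overshot by one step.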

But all of these approaches require infrastructure that most teams haven't built yet. The assumption was that cheaper tokens would make AI costs a non-issue. The reality is that agentic patterns made token consumption, not token price, the main driver of cost.

The companies that figure out ghost token management early will have a significant cost advantage. The ones that don't will find their margins disappearing into context re-reads they didn't know were happening.

Read Azeem Azhar's full analysis