Token prices have fallen dramatically. OpenAI, Anthropic, Google - they've all slashed pricing over the past year. By some measures, tokens are now 10x cheaper than they were 18 months ago.
So why are AI bills going up?
This is one of those situations where the obvious answer turns out to be completely wrong. Lower prices should mean lower costs. But in AI, task complexity is increasing faster than prices are falling. And that gap is swallowing every efficiency gain.
The LLMflation Problem
There's a term for this: LLMflation. As models get more capable, developers use them for more complex tasks. Those tasks consume more tokens. Even at lower per-token prices, the total cost climbs.
Here's what the numbers look like in practice. A simple Q&A chatbot might use 50-500 tokens per interaction. That's cheap, even at old pricing. But a multi-agent system - where several AI models collaborate, plan, and iterate on a task - can burn through 1 million tokens or more for a single workflow.
Think of it like mobile data pricing. When data got cheaper, people didn't use the same amount and pay less. They started streaming video. Usage exploded. Total bills stayed the same or went up, even though the cost per gigabyte dropped.
AI is following the exact same pattern. Cheaper tokens don't reduce usage. They enable new use cases that consume tokens at a completely different scale.
Token Consumption Scales: From Simple to Staggering
To understand why costs are rising, you need to see how token consumption scales with task complexity:
Simple Q&A: 50-500 tokens per interaction. This is what most people think of when they imagine using an AI. Quick question, quick answer. Cheap.
Document analysis: 5,000-50,000 tokens. Now you're feeding context - contracts, reports, technical documentation. The model needs to read, understand, and synthesise. Token count jumps.
Multi-step workflows: 100,000-500,000 tokens. An agent that researches a topic, drafts content, revises based on feedback, and formats the output makes multiple passes. Each step consumes tokens. Each iteration adds cost.
Multi-agent collaboration: 1,000,000+ tokens. Multiple AI models working together, each with their own context, planning and reasoning in parallel. This is where costs spiral. And this is where the industry is heading.
The pattern is clear. As AI capabilities improve, developers build more ambitious systems. Those systems consume orders of magnitude more tokens. Even at 10x lower prices, the total bill climbs.
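To make that concrete, here's a back-of-the-envelope calculation. The prices and token counts are illustrative, not any provider's actual rates - the point is the shape of the numbers, not the exact figures.

```python
# Back-of-the-envelope: why a 10x price cut doesn't shrink the bill.
# All prices and token counts are illustrative, not real provider rates.

OLD_PRICE_PER_M = 30.00  # hypothetical $ per 1M tokens, 18 months ago
NEW_PRICE_PER_M = 3.00   # hypothetical $ per 1M tokens, today (10x cheaper)

workloads = {
    "Simple Q&A": 500,                      # tokens per run
    "Document analysis": 50_000,
    "Multi-step workflow": 500_000,
    "Multi-agent collaboration": 2_000_000,
}

print(f"{'Workload':<28}{'Old cost':>10}{'New cost':>10}")
for name, tokens in workloads.items():
    old = tokens / 1_000_000 * OLD_PRICE_PER_M
    new = tokens / 1_000_000 * NEW_PRICE_PER_M
    print(f"{name:<28}  ${old:>8.3f}  ${new:>8.3f}")
```

Run it and the punchline falls out: the Q&A bot really did get 10x cheaper - roughly $0.015 down to $0.0015 per interaction - but a single multi-agent run at the new price ($6.00) costs 400x what the old Q&A interaction did.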
Margin Compression at Scale
For API providers - OpenAI, Anthropic, Google - this creates a brutal dynamic. They're cutting prices to stay competitive. But their infrastructure costs aren't falling nearly as fast.
Compute, energy, and network costs drop slowly. Token prices drop fast. The gap between what they charge and what it costs to deliver is narrowing. That's margin compression. And at scale, it becomes unsustainable.
Some providers are betting they can offset lower margins with higher volume. Others are banking on efficiency improvements in model inference. But there's a limit to how far you can compress margins before the economics stop working.
This is why you're seeing API providers start to differentiate on features, not just price. Faster response times. Better quality outputs. Specialised models for specific tasks. They're trying to shift the conversation away from cost-per-token and toward value-per-task.
AI FinOps: The Discipline Nobody Wanted
All of this has given birth to something new: AI FinOps. Financial operations for AI spending. It's like cloud FinOps, but messier because token consumption is harder to predict and optimise.
If you're running AI workloads at any meaningful scale, you need to start thinking like a FinOps engineer:
Track consumption by task type. Not all prompts are created equal. Know which workflows are burning through tokens and why.
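A minimal sketch of what that tracking can look like, assuming you tag each call with a task type and read token counts from the API response's usage metadata (most providers return them):

```python
from collections import defaultdict

class TokenLedger:
    """Aggregate token usage per task type so you can see where spend goes."""

    def __init__(self):
        self.usage = defaultdict(lambda: {"calls": 0, "tokens": 0})

    def record(self, task_type: str, tokens: int):
        # Call this after each API response; take the token count from
        # the response's usage metadata.
        self.usage[task_type]["calls"] += 1
        self.usage[task_type]["tokens"] += tokens

    def report(self):
        for task, s in sorted(self.usage.items(), key=lambda kv: -kv[1]["tokens"]):
            print(f"{task:<20}{s['calls']:>6} calls{s['tokens']:>12,} tokens")

ledger = TokenLedger()
ledger.record("qa_chat", 420)
ledger.record("doc_analysis", 38_000)
ledger.record("agent_workflow", 410_000)
ledger.report()  # agent_workflow dominates - that's where to optimise first
```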
Optimise prompt design. Verbose prompts cost more. Every unnecessary word wastes tokens. Engineers are learning to write tight, efficient prompts the same way they used to optimise database queries.
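One way to make "tight" measurable is to count tokens before a prompt ships. This sketch assumes OpenAI's open-source tiktoken tokenizer (pip install tiktoken); other providers ship their own counters:

```python
# Compare a verbose prompt with a tight one by counting tokens.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

verbose = ("I would like you to please take a careful look at the following "
           "text and, if at all possible, give me a nice short summary of it.")
tight = "Summarise the following text in two sentences."

for label, prompt in [("verbose", verbose), ("tight", tight)]:
    print(f"{label}: {len(enc.encode(prompt))} tokens")

# The saving looks trivial per call, but system prompts ride along on
# every single request - trim 100 tokens there and you save 100 tokens
# multiplied by your entire call volume.
```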
Cache aggressively. If you're asking the same question repeatedly, don't reprocess it every time. Cache results. Reuse outputs. Reduce redundant API calls.
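The simplest version is exact-match caching keyed on the prompt. The call_model function here is a hypothetical stand-in for your real API client:

```python
import hashlib

_cache: dict[str, str] = {}

def cached_completion(prompt: str, call_model) -> str:
    """Return a cached answer for exact-repeat prompts; hit the API only on a miss."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(prompt)  # tokens are only spent here
    return _cache[key]

# Demo with a stand-in for a real API client (hypothetical):
calls = 0
def fake_model(prompt: str) -> str:
    global calls
    calls += 1
    return f"answer to: {prompt}"

for _ in range(3):
    cached_completion("What is our refund policy?", fake_model)
print(calls)  # 1 - two of the three identical requests cost nothing
```

Exact-match caching only helps with repeated identical prompts. Several providers also offer prefix-level prompt caching at a discount, which is worth checking in your provider's docs.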
Choose the right model for the task. Don't use a frontier model for tasks a smaller, cheaper model can handle. Match capability to need. Paying for intelligence you don't need is pure waste.
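In code, that can be as simple as a router. The model names are hypothetical and the length heuristic is a placeholder - in practice you'd classify by task type, not prompt size:

```python
# A crude model router: easy tasks go to a cheap model, hard ones to a
# frontier model. Names and heuristic are illustrative assumptions.

CHEAP_MODEL = "small-fast-model"
FRONTIER_MODEL = "big-expensive-model"

def pick_model(prompt: str, needs_reasoning: bool = False) -> str:
    # Route to the frontier model only when the task actually demands it.
    if needs_reasoning or len(prompt) > 4_000:
        return FRONTIER_MODEL
    return CHEAP_MODEL

print(pick_model("What's our refund policy?"))
# -> small-fast-model
print(pick_model("Plan a phased migration of our billing data.",
                 needs_reasoning=True))
# -> big-expensive-model
```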
Set budget alerts. Token consumption can spiral fast. Know when you're approaching limits before the bill arrives.
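A minimal spend guard, with an illustrative budget, thresholds, and a hypothetical blended token price:

```python
# Track cumulative estimated spend and warn before the bill does.
# The price constant, budget, and thresholds are illustrative - wire
# record_usage into wherever you log API responses.

PRICE_PER_M_TOKENS = 3.00        # hypothetical blended $ per 1M tokens
MONTHLY_BUDGET = 500.00          # dollars
ALERT_THRESHOLDS = (0.5, 0.8, 1.0)

spent = 0.0
fired = set()

def record_usage(tokens: int):
    global spent
    spent += tokens / 1_000_000 * PRICE_PER_M_TOKENS
    for t in ALERT_THRESHOLDS:
        if spent >= MONTHLY_BUDGET * t and t not in fired:
            fired.add(t)
            print(f"ALERT: {t:.0%} of monthly AI budget used (${spent:.2f})")

record_usage(50_000_000)   # a heavy agent day: $150, no alert yet
record_usage(40_000_000)   # cumulative $270 - crosses the 50% threshold
```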
What This Means for Builders
If you're building with AI, the lesson is simple: falling token prices are not a free pass to ignore costs. In fact, they're the opposite. They're an invitation to build more complex systems that will cost more, even at lower unit prices.
The developers who will succeed in this environment are the ones who treat token consumption as a first-class concern - not an afterthought. Design for efficiency. Measure ruthlessly. Optimise continuously.
Because the cost of AI isn't going down. It's just changing shape. And if you're not paying attention, it'll catch you off guard.