Voices & Thought Leaders Monday, 13 April 2026

Why GPU Scarcity Broke the AI Pricing Model


The economics of AI just flipped. OpenAI, Google, and Anthropic aren't competing on marginal cost anymore - they're competing on opportunity cost. There's a fixed amount of compute. Every query you run is a query someone else can't. That changes everything.

Ben Thompson's latest piece on Stratechery explores how compute constraints are reshaping the entire AI industry. Frontier labs now face a problem that's fundamentally different from the one they had 18 months ago. It's not "how cheap can we make this?" - it's "what's the best use of the GPUs we have?"

Marginal Cost vs Opportunity Cost

Most tech businesses run on marginal cost economics. Serving one more user costs you a bit more in server capacity, bandwidth, storage. The cost scales with usage, but it's predictable. You can lower prices to gain market share because your costs drop as you optimise.

AI labs are hitting a different constraint. They have a fixed pool of GPUs. Every query uses up capacity that could have been used for something else. That's opportunity cost. If you're running a cheap API call on GPT-4, you're using compute that could have powered a high-margin enterprise contract or trained the next model.

This matters because it breaks the usual competitive playbook. You can't just undercut competitors on price to gain volume - volume might actually hurt you if it fills your GPUs with low-margin work. The question becomes: what's the highest-value use of each GPU-hour?
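That prioritisation logic is easy to make concrete. Here's a rough sketch in Python of allocating a fixed pool of GPU-hours by value per GPU-hour - all workload names and numbers are hypothetical, chosen only to illustrate the trade-off, not drawn from any lab's actual figures:

```python
# Illustrative sketch: with a fixed pool of GPU-hours, the rational
# allocation fills the highest value-per-GPU-hour work first.
# Every workload name and number below is hypothetical.

GPU_HOURS_AVAILABLE = 1_000

# (workload, value per GPU-hour, GPU-hours demanded)
workloads = [
    ("enterprise contracts", 12.0, 400),
    ("next-model training", 8.0, 600),   # value here proxies future payoff
    ("cheap API calls", 1.5, 500),
]

def allocate(workloads, budget):
    """Greedy allocation: serve the highest value-per-GPU-hour work first."""
    plan = {}
    remaining = budget
    for name, value, demand in sorted(workloads, key=lambda w: -w[1]):
        granted = min(demand, remaining)
        plan[name] = granted
        remaining -= granted
    return plan

plan = allocate(workloads, GPU_HOURS_AVAILABLE)
print(plan)
# Low-margin work gets whatever capacity is left over - possibly nothing.
```

With these toy numbers, enterprise and training consume the entire pool and the cheap API work gets zero GPU-hours - exactly the "volume might hurt you" dynamic described above.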

The Compute Allocation Problem

Thompson points to the decisions labs are making right now. OpenAI could allocate more GPUs to ChatGPT API calls, but that pulls capacity away from training GPT-5. Google could prioritise Gemini inference, but that slows down research on the next breakthrough. Anthropic could open Claude to more developers, but that limits capacity for high-paying enterprise customers.

Every choice has a trade-off. And unlike software where you can spin up more servers, you can't just add more GPUs. Supply is constrained. NVIDIA's production is spoken for. Custom chips take years to develop. The compute you have today is the compute you're stuck with for the next 12-18 months.

This shifts competitive dynamics in unexpected ways. A lab with slightly worse models but better compute allocation could outcompete a lab with better models but poor prioritisation. It's not just about building the best AI - it's about deploying it in the smartest way.

What This Means for Pricing

The API price cuts we saw in 2024 don't make sense in an opportunity cost world. If compute is your constraint, dropping prices just fills your capacity with lower-margin work. The rational move is to raise prices or ration access, not compete on cost.

But labs are still dropping prices. Why? Thompson's argument: they're buying market position now, betting that compute constraints will ease later. Get developers building on your API today, even at a loss, so they're locked in when supply improves.

That's a gamble. If compute stays tight - if training the next generation of models keeps eating all available GPUs - then today's low-margin API customers become tomorrow's problem. You've filled your capacity with work that doesn't cover the opportunity cost.
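The shift in pricing floors can be put in one line of arithmetic. A minimal sketch, with assumed numbers throughout: in a marginal-cost world the price floor for an API call is the serving cost; in an opportunity-cost world it's the value of the best alternative use of that compute.

```python
# Hypothetical numbers: when GPUs are the binding constraint, the floor
# price for API work is the opportunity cost of the compute it consumes,
# not the marginal cost of serving it.

marginal_cost_per_gpu_hour = 2.0        # electricity, depreciation, ops (assumed)
best_alternative_value = 9.0            # e.g. enterprise revenue per GPU-hour (assumed)
tokens_per_gpu_hour = 2_000_000         # assumed inference throughput

def floor_price_per_million_tokens(compute_constrained: bool) -> float:
    """Price floor per million tokens under each cost regime."""
    cost_basis = best_alternative_value if compute_constrained else marginal_cost_per_gpu_hour
    return cost_basis / (tokens_per_gpu_hour / 1_000_000)

print(floor_price_per_million_tokens(False))  # marginal-cost world
print(floor_price_per_million_tokens(True))   # opportunity-cost world
```

Same hardware, same throughput, but the rational floor price more than quadruples once the constraint binds. Pricing below that floor is exactly the market-position gamble described above.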

The Longer Game

Thompson also flags the pressure this puts on model efficiency. If GPUs are scarce, the winning move is to get more out of each one. Smaller models that perform like larger ones. Faster inference. Better quantisation. The labs that solve this problem can serve more customers with the same hardware.

This is where we're seeing real innovation. Distillation techniques that compress GPT-4 performance into a model half the size. On-device inference that shifts compute off the cloud entirely. Edge deployment that reduces the load on centralised GPUs.

The scarcity isn't just a constraint - it's forcing better engineering. When compute was cheap and abundant, there was no pressure to optimise. Now there is. The models that win in 2025 won't just be the most capable - they'll be the most efficient.
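The leverage from efficiency work is simple arithmetic: with a fixed fleet, halving the compute per query doubles the queries you can serve. A toy calculation, with an assumed fleet size and per-query costs:

```python
# Sketch of why efficiency is leverage under fixed supply. The fleet
# size and per-query compute figures are assumptions for illustration.

fleet_gpu_hours_per_day = 24_000

def queries_per_day(gpu_seconds_per_query: float) -> float:
    """How many queries a fixed fleet can serve at a given per-query cost."""
    return fleet_gpu_hours_per_day * 3600 / gpu_seconds_per_query

baseline = queries_per_day(2.0)    # large model: 2 GPU-seconds per query
distilled = queries_per_day(1.0)   # distilled model at half the compute

# Halving per-query compute doubles effective capacity - no new GPUs needed.
assert distilled == 2 * baseline
```

That doubling is worth the same as doubling the fleet, which is exactly why distillation and quantisation matter when you can't buy more GPUs.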

What Changes for Builders

If you're building on AI APIs, this shift matters. Pricing might not keep dropping. Access might get rationed during peak periods. The labs with the best models might not be the ones with the most reliable availability.

The smart move: build on multiple providers, optimise for efficiency, and assume compute will stay expensive. The era of "throw more GPUs at it" is over. The era of "use the GPUs you have intelligently" is just starting.

Read the full analysis at Stratechery.

More Featured Insights

Builders & Makers
Building a Shell by Ignoring the Shell Entirely
Robotics & Automation
Robotic Guide Dogs That Talk Back

Video Sources

Theo (t3.gg)
How does Claude Code *actually* work?
Google DeepMind
What's new in Gemma 4?
AI Revolution
China's New Self Improving Open AI Beats OpenAI

Today's Sources

DEV.to AI
I wanted to build a shell. I built a PTY proxy instead.
Towards Data Science
Your ReAct Agent Is Wasting 90% of Its Retries - Here's How to Stop It
DEV.to AI
Memory Hierarchy
Towards Data Science
Stop Treating AI Memory Like a Search Problem
Towards Data Science
Write Pandas Like a Pro With Method Chaining Pipelines
The Robot Report
Binghamton researchers create robotic guide dogs that walk - and talk
ROS Discourse
Fast Lossless Image Compression: interested?
ROS Discourse
New "ROS Adopters" page is live - ADD YOUR PROJECT
Ben Thompson Stratechery
Mythos, Muse, and the Opportunity Cost of Compute
Jack Clark Import AI
Import AI 453: Breaking AI agents; MirrorCode; and ten views on gradual disempowerment
Gary Marcus
Even more good news for the future of neurosymbolic AI

About the Curator

Richard Bland
Founder, Marbl Codes

27+ years in software development, curating the tech news that matters.


© 2026 MEM Digital Ltd t/a Marbl Codes