Voices & Thought Leaders Monday, 13 April 2026

Why GPU Scarcity Broke the AI Pricing Model


The economics of AI just flipped. OpenAI, Google, and Anthropic aren't competing on marginal cost anymore - they're competing on opportunity cost. There's a fixed amount of compute. Every query you run is a query someone else can't. That changes everything.

Ben Thompson's latest piece on Stratechery explores how compute constraints are reshaping the entire AI industry. Frontier labs now face a problem that's fundamentally different from the one they had 18 months ago. It's not "how cheap can we make this?" - it's "what's the best use of the GPUs we have?"

Marginal Cost vs Opportunity Cost

Most tech businesses run on marginal cost economics. Serving one more user costs you a bit more in server capacity, bandwidth, storage. The cost scales with usage, but it's predictable. You can lower prices to gain market share because your costs drop as you optimise.

AI labs are hitting a different constraint. They have a fixed pool of GPUs. Every query uses up capacity that could have been used for something else. That's opportunity cost. If you're running a cheap API call on GPT-4, you're using compute that could have powered a high-margin enterprise contract or trained the next model.

This matters because it breaks the usual competitive playbook. You can't just undercut competitors on price to gain volume - volume might actually hurt you if it fills your GPUs with low-margin work. The question becomes: what's the highest-value use of each GPU-hour?
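That prioritisation logic is easy to make concrete. Here's a rough sketch in Python of allocating a fixed pool of GPU-hours by value per GPU-hour - all workload names and numbers are hypothetical, chosen only to illustrate the trade-off, not drawn from any lab's actual figures:

```python
# Illustrative sketch: with a fixed pool of GPU-hours, the rational
# allocation fills the highest value-per-GPU-hour work first.
# Every workload name and number below is hypothetical.

GPU_HOURS_AVAILABLE = 1_000

# (workload, value per GPU-hour, GPU-hours demanded)
workloads = [
    ("enterprise contracts", 12.0, 400),
    ("next-model training", 8.0, 600),   # value here proxies future payoff
    ("cheap API calls", 1.5, 500),
]

def allocate(workloads, budget):
    """Greedy allocation: serve the highest value-per-GPU-hour work first."""
    plan = {}
    remaining = budget
    for name, value, demand in sorted(workloads, key=lambda w: -w[1]):
        granted = min(demand, remaining)
        plan[name] = granted
        remaining -= granted
    return plan

plan = allocate(workloads, GPU_HOURS_AVAILABLE)
print(plan)
# Low-margin work gets whatever capacity is left over - possibly nothing.
```

With these toy numbers, enterprise and training consume the entire pool and the cheap API work gets zero GPU-hours - exactly the "volume might hurt you" dynamic described above.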

The Compute Allocation Problem

Thompson points to the decisions labs are making right now. OpenAI could allocate more GPUs to ChatGPT API calls, but that pulls capacity away from training GPT-5. Google could prioritise Gemini inference, but that slows down research on the next breakthrough. Anthropic could open Claude to more developers, but that limits capacity for high-paying enterprise customers.

Every choice has a trade-off. And unlike software where you can spin up more servers, you can't just add more GPUs. Supply is constrained. NVIDIA's production is spoken for. Custom chips take years to develop. The compute you have today is the compute you're stuck with for the next 12-18 months.

This shifts competitive dynamics in unexpected ways. A lab with slightly worse models but better compute allocation could outcompete a lab with better models but poor prioritisation. It's not just about building the best AI - it's about deploying it in the smartest way.

What This Means for Pricing

The API price cuts we saw in 2024 don't make sense in an opportunity cost world. If compute is your constraint, dropping prices just fills your capacity with lower-margin work. The rational move is to raise prices or ration access, not compete on cost.

But labs are still dropping prices. Why? Thompson's argument: they're buying market position now, betting that compute constraints will ease later. Get developers building on your API today, even at a loss, so they're locked in when supply improves.

That's a gamble. If compute stays tight - if training the next generation of models keeps eating all available GPUs - then today's low-margin API customers become tomorrow's problem. You've filled your capacity with work that doesn't cover the opportunity cost.
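The shift in pricing floors can be put in one line of arithmetic. A minimal sketch, with assumed numbers throughout: in a marginal-cost world the price floor for an API call is the serving cost; in an opportunity-cost world it's the value of the best alternative use of that compute.

```python
# Hypothetical numbers: when GPUs are the binding constraint, the floor
# price for API work is the opportunity cost of the compute it consumes,
# not the marginal cost of serving it.

marginal_cost_per_gpu_hour = 2.0        # electricity, depreciation, ops (assumed)
best_alternative_value = 9.0            # e.g. enterprise revenue per GPU-hour (assumed)
tokens_per_gpu_hour = 2_000_000         # assumed inference throughput

def floor_price_per_million_tokens(compute_constrained: bool) -> float:
    """Price floor per million tokens under each cost regime."""
    cost_basis = best_alternative_value if compute_constrained else marginal_cost_per_gpu_hour
    return cost_basis / (tokens_per_gpu_hour / 1_000_000)

print(floor_price_per_million_tokens(False))  # marginal-cost world
print(floor_price_per_million_tokens(True))   # opportunity-cost world
```

Same hardware, same throughput, but the rational floor price more than quadruples once the constraint binds. Pricing below that floor is exactly the market-position gamble described above.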

The Longer Game

Thompson also flags the pressure this puts on model efficiency. If GPUs are scarce, the winning move is to get more out of each one. Smaller models that perform like larger ones. Faster inference. Better quantisation. The labs that solve this problem can serve more customers with the same hardware.

This is where we're seeing real innovation. Distillation techniques that compress GPT-4 performance into a model half the size. On-device inference that shifts compute off the cloud entirely. Edge deployment that reduces the load on centralised GPUs.

The scarcity isn't just a constraint - it's forcing better engineering. When compute was cheap and abundant, there was no pressure to optimise. Now there is. The models that win in 2025 won't just be the most capable - they'll be the most efficient.
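The leverage from efficiency work is simple arithmetic: with a fixed fleet, halving the compute per query doubles the queries you can serve. A toy calculation, with an assumed fleet size and per-query costs:

```python
# Sketch of why efficiency is leverage under fixed supply. The fleet
# size and per-query compute figures are assumptions for illustration.

fleet_gpu_hours_per_day = 24_000

def queries_per_day(gpu_seconds_per_query: float) -> float:
    """How many queries a fixed fleet can serve at a given per-query cost."""
    return fleet_gpu_hours_per_day * 3600 / gpu_seconds_per_query

baseline = queries_per_day(2.0)    # large model: 2 GPU-seconds per query
distilled = queries_per_day(1.0)   # distilled model at half the compute

# Halving per-query compute doubles effective capacity - no new GPUs needed.
assert distilled == 2 * baseline
```

That doubling is worth the same as doubling the fleet, which is exactly why distillation and quantisation matter when you can't buy more GPUs.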

What Changes for Builders

If you're building on AI APIs, this shift matters. Pricing might not keep dropping. Access might get rationed during peak periods. The labs with the best models might not be the ones with the most reliable availability.

The smart move: build on multiple providers, optimise for efficiency, and assume compute will stay expensive. The era of "throw more GPUs at it" is over. The era of "use the GPUs you have intelligently" is just starting.

Read the full analysis at Stratechery.

More Featured Insights

Builders & Makers
Building a Shell by Ignoring the Shell Entirely
Robotics & Automation
Robotic Guide Dogs That Talk Back

Video Sources

Theo (t3.gg)
How does Claude Code *actually* work?
Google DeepMind
What's new in Gemma 4?
AI Revolution
China's New Self Improving Open AI Beats OpenAI

Today's Sources

DEV.to AI
I wanted to build a shell. I built a PTY proxy instead.
Towards Data Science
Your ReAct Agent Is Wasting 90% of Its Retries - Here's How to Stop It
DEV.to AI
Memory Hierarchy
Towards Data Science
Stop Treating AI Memory Like a Search Problem
Towards Data Science
Write Pandas Like a Pro With Method Chaining Pipelines
The Robot Report
Binghamton researchers create robotic guide dogs that walk - and talk
ROS Discourse
Fast Lossless Image Compression: interested?
ROS Discourse
New "ROS Adopters" page is live - ADD YOUR PROJECT
Ben Thompson Stratechery
Mythos, Muse, and the Opportunity Cost of Compute
Jack Clark Import AI
Import AI 453: Breaking AI agents; MirrorCode; and ten views on gradual disempowerment
Gary Marcus
Even more good news for the future of neurosymbolic AI

About the Curator

Richard Bland
Founder, Marbl Codes

27+ years in software development, curating the tech news that matters.


© 2026 MEM Digital Ltd t/a Marbl Codes