Voices & Thought Leaders Sunday, 5 April 2026

The AI Labs Are Running Out of Computers


OpenAI is turning away customers. So is Anthropic. Not because demand dropped - because they physically cannot serve more requests. The AI boom just hit a hard constraint nobody was quite ready for.

Azeem Azhar reports that AI labs are rationing compute, passing on business opportunities they would have taken six months ago. OpenAI's CFO said it publicly. Anthropic tightened API limits. H100 GPU prices hit 18-month highs.

This isn't a temporary blip. It's a fundamental shift from hype to scarcity.

What Rationing Actually Looks Like

The signs have been visible for months if you knew where to look. Anthropic quietly reduced rate limits. OpenAI's enterprise tier started having "capacity conversations" with new customers. Smaller labs stopped accepting new API keys entirely.

Now it's official. The labs have more demand than they can serve with existing infrastructure. Building new datacentres takes years. Securing GPU allocations from NVIDIA takes months and significant capital. The gap between what customers want and what labs can deliver is widening.

For developers building on these platforms, this changes the maths entirely. Your application might work perfectly in testing, then hit capacity limits in production. Your users might get rate-limited during peak hours. Your costs might spike as labs prioritise enterprise customers willing to pay premium rates for guaranteed capacity.
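For developers, the practical defence against capacity-driven 429s is retrying with exponential backoff and jitter. A minimal sketch, assuming a hypothetical `RateLimitError` standing in for whatever exception your provider's client raises on a rate-limit response:

```python
import random
import time

class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's HTTP 429 'too many requests' error."""

def call_with_backoff(request_fn, max_retries=5, base_delay=1.0):
    """Retry a rate-limited API call with exponential backoff and jitter.

    request_fn is any zero-argument callable that raises RateLimitError
    when the provider is out of capacity.
    """
    for attempt in range(max_retries):
        try:
            return request_fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the capacity error to the caller
            # Delays grow 1s, 2s, 4s, ... with random jitter added so that
            # thousands of clients don't all retry at the same instant.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
```

Backoff doesn't create capacity, of course - it just stops your application from making a provider-side crunch worse and gives peak-hour spikes a chance to pass.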

The GPU Market Has Flipped

H100 prices tell the story. After falling steadily through late 2024, they've reversed course and hit 18-month highs. Not because the chips got better - because labs are competing for finite supply.

NVIDIA can only manufacture so many chips. Datacentres can only install them so quickly. Power infrastructure can only support so much compute density. These are physical constraints, not software problems you can patch.

The cloud providers - AWS, Azure, Google Cloud - are all facing the same crunch. They're prioritising existing enterprise customers over new workloads. Spot instance availability for GPU compute has dropped. Reserved instances now require longer commitments and higher minimums.

This affects every layer of the AI stack. If the foundation model providers can't get compute, the application developers can't get API access. If the cloud providers can't get GPUs, the companies trying to run their own models can't get infrastructure.

The Efficiency Opportunity

Constraints drive innovation. When compute is abundant, nobody optimises. When it's scarce and expensive, efficiency becomes valuable again.

We're already seeing it. Smaller, faster models that run on less hardware. Techniques like quantisation and distillation that preserve capability while cutting compute requirements. Architectures designed for inference efficiency, not just training performance.
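To make the quantisation idea concrete, here is a toy sketch of symmetric int8 post-training quantisation - the simplest version of the technique. It maps float weights onto integers in [-127, 127] with one shared scale, trading a little precision for roughly 4x less memory than float32 (real implementations work per-channel on tensors, not Python lists):

```python
def quantize_int8(weights):
    """Symmetric int8 quantisation of a list of float weights.

    Stores one float scale for the whole list; each weight becomes an
    integer in [-127, 127]. Larger models use the same idea per tensor
    or per channel to cut inference memory and bandwidth.
    """
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid divide-by-zero
    scale = max_abs / 127
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize_int8(quantized, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in quantized]
```

The error introduced is bounded by half the scale per weight, which is why quantisation preserves most model capability while cutting the compute and memory bill.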

The labs rationing compute are also the ones investing heavily in efficiency research. Not out of altruism - out of necessity. If you can serve 10x more customers on the same hardware, rationing becomes less of a problem.

For builders, this creates opportunities. The teams that figure out how to deliver value with less compute have a structural advantage. The applications that run locally instead of calling APIs avoid the capacity problem entirely. The startups that optimise for efficiency from day one won't hit the same scaling walls.
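One of the cheapest efficiency wins is simply not repeating work: identical prompts can be answered once and served from a cache afterwards, so repeated requests never touch the rationed API. A minimal in-memory sketch, where `model_call` is a hypothetical stand-in for your provider's client function:

```python
import hashlib

class PromptCache:
    """Serve repeated prompts from memory instead of the rationed API.

    model_call is any function mapping a prompt string to a response;
    responses are keyed on a hash of the prompt so identical requests
    cost compute only once. Production systems would add eviction,
    persistence, and semantic (near-duplicate) matching on top.
    """
    def __init__(self, model_call):
        self.model_call = model_call
        self.store = {}

    def complete(self, prompt):
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key not in self.store:
            self.store[key] = self.model_call(prompt)  # only cache misses hit the API
        return self.store[key]
```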

What This Means for the Hype Cycle

AI development is about to slow down. Not because the technology stopped improving - because the infrastructure can't keep pace with demand.

The companies that raised funding based on aggressive growth projections are going to struggle. You can't 10x your user base if your API provider is rationing access. You can't prove product-market fit if capacity constraints prevent you from onboarding customers.

This is different from previous tech infrastructure challenges. When web traffic exceeded server capacity, you could rent more servers. When mobile app usage exploded, you could spin up more cloud instances. With AI compute, you're competing for a finite resource that takes years to expand.

The winners in this environment won't be the ones with the most ambitious roadmaps. They'll be the ones with the most efficient implementations, the strongest relationships with compute providers, or the ability to deliver value without relying on external APIs.

The AI boom isn't over. But the phase where anyone could access unlimited compute at declining prices has ended. What comes next depends on who adapts fastest to the new constraints.


Today's Sources

Hacker News Best
AWS engineer reports PostgreSQL perf halved by Linux 7.0, fix may not be easy
DEV.to AI
Automating Your Playtest Triage with AI
Towards Data Science
Building a Python Workflow That Catches Bugs Before Production
The Robot Report
Learn to build warehouse robots people enjoy working with at the Robotics Summit
Azeem Azhar
🔮 Exponential View #568: The labs are rationing. Did you notice?
Sebastian Raschka
Components of A Coding Agent
Jack Clark Import AI
Import AI 451: Political superintelligence; Google's society of minds, and a robot drummer

About the Curator

Richard Bland
Founder, Marbl Codes

27+ years in software development, curating the tech news that matters.


© 2026 MEM Digital Ltd t/a Marbl Codes