Cerebras went public at a $60 billion valuation. For a company that's spent years building chips most people have never heard of, that's a statement.
But the valuation isn't the story. It's what Cerebras is actually doing with those chips - and what their CFO accidentally confirmed in the process. They're serving OpenAI's trillion-parameter models. The ones that haven't been announced yet. Models 5.4 and 5.5, running on Cerebras infrastructure, handling inference at scale.
The Training Era is Over
For years, the AI hardware race was about training. Who could build the biggest cluster, train the largest model, drive the loss curve lowest. Nvidia won that game so decisively that it became boring to watch. The interesting question shifted: once you've trained a frontier model, how do you serve it to millions of users without bankrupting yourself on compute costs?
That's the inference problem. And it's where Cerebras has been quietly positioning itself while everyone else fought over training budgets.
Their wafer-scale chips - single silicon wafers the size of a dinner plate - were always overkill for training. But for inference? For serving a trillion-parameter model to users who expect sub-second responses? That's where wafer-scale architecture starts making sense. You get the entire model in one place, no inter-chip communication latency, no network bottlenecks. Just raw throughput.
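To see why the hops matter, here's a rough per-token latency sketch. Every number below is an illustrative assumption, not a Cerebras or Nvidia spec, but the shape of the argument holds:

```python
# Back-of-envelope latency model for one decode step of autoregressive
# inference. All figures are illustrative assumptions, not hardware specs.

COMPUTE_US = 200  # assumed on-chip compute time per token, in microseconds
HOP_US = 10       # assumed latency added by each inter-chip hop

def per_token_latency_us(num_chips: int) -> float:
    """Latency for one token when the model is sharded across chips.

    With pipeline-style sharding, every token's activations cross every
    chip boundary, so hop count scales with the number of chips.
    """
    hops = max(num_chips - 1, 0)
    return COMPUTE_US + hops * HOP_US

print(per_token_latency_us(32))  # 510 us: model sharded across 32 GPUs
print(per_token_latency_us(1))   # 200 us: whole model on one chip, no hops

# Over a 500-token response, the hops alone add ~155 ms of pure latency.
```

The exact figures don't matter. The point is that hop latency grows with chip count, while the single-wafer path has no hop term at all.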
What OpenAI Sees in Cerebras
OpenAI doesn't outsource inference lightly. The fact that they're running unreleased models on Cerebras hardware tells you something about the economics. Either Cerebras is significantly cheaper than their internal infrastructure, or it's significantly faster, or both.
The Latent Space breakdown suggests this is about serving cost per token. As models get larger, the traditional approach - scattering inference across a cluster of GPUs - gets expensive fast. Every hop between chips costs time and power. Cerebras eliminates those hops.
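A similar sketch works for the power side of the claim. Treating each chip boundary as a fixed energy tax per token (the figures below are assumptions, not measurements), the overhead grows quickly with cluster size:

```python
# Fraction of per-token energy spent moving activations between chips.
# Both energy figures are illustrative assumptions, not vendor numbers.

COMPUTE_J = 0.5  # assumed compute energy per token, in joules
HOP_J = 0.02     # assumed energy per inter-chip hop per token

def interconnect_overhead(num_chips: int) -> float:
    """Share of total per-token energy that goes to inter-chip traffic."""
    hops = max(num_chips - 1, 0)
    hop_energy = hops * HOP_J
    return hop_energy / (COMPUTE_J + hop_energy)

for n in (1, 8, 32, 128):
    print(f"{n:>3} chips: {interconnect_overhead(n):.0%} of energy spent on hops")
# Under these assumptions: 0%, 22%, 55%, 84% -- the tax grows with the cluster.
```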
This matters beyond OpenAI. Every AI company faces the same problem: training is a one-time cost per model, but inference costs scale with every user and every query. If you're serving millions of queries per day, shaving milliseconds and microdollars off each request changes the entire business model.
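The arithmetic is blunt but worth doing. With hypothetical traffic numbers:

```python
# Aggregate impact of a per-request saving at scale. Both inputs are
# hypothetical; plug in your own traffic and cost figures.

queries_per_day = 50_000_000   # assumed daily query volume
saving_per_query = 0.0002      # shave 200 microdollars per request, in USD

daily = queries_per_day * saving_per_query
print(f"${daily:,.0f} saved per day")         # $10,000 per day
print(f"${daily * 365:,.0f} saved per year")  # $3,650,000 per year
```

Two hundred microdollars sounds like nothing until you multiply it by your traffic.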
The Contrarian Bet That Paid Off
Cerebras has been building wafer-scale chips since 2016. For most of that time, it looked like an expensive science experiment. Why would you build a chip the size of a plate when you could just use more GPUs? Why bet on a completely different architecture when Nvidia's ecosystem was already mature?
The answer is starting to show up in the numbers. As the industry shifts from "can we train this model?" to "can we afford to serve this model?", the economics flip. What looked like over-engineering for training becomes essential infrastructure for inference.
The $60 billion valuation assumes Cerebras captures a meaningful slice of the inference market. That's a big assumption. But if they do - if wafer-scale becomes the standard for serving frontier models - then every AI lab and every enterprise deploying large models becomes a potential customer.
What This Means for Builders
If you're building on top of large language models, inference cost is probably your biggest variable expense. The models keep getting larger, the user expectations keep getting higher, and the bill keeps growing. Cerebras entering the public markets at this valuation signals that the big labs believe inference infrastructure is worth competing for.
That's good news for builders. Competition in inference infrastructure means downward price pressure, which means more ambitious products become economically viable. What costs too much to serve today might be feasible in six months.
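A toy unit-economics check makes the point. With hypothetical numbers for a token-hungry feature behind a $20/month subscription:

```python
# Break-even check for a feature as serving prices fall. All numbers are
# hypothetical; substitute your own usage and pricing.

tokens_per_user_month = 2_000_000  # assumed monthly tokens per active user
revenue_per_user = 20.0            # assumed subscription revenue, USD/month

def monthly_serving_cost(price_per_m_tokens: float) -> float:
    return tokens_per_user_month / 1e6 * price_per_m_tokens

for price in (15.0, 10.0, 5.0, 2.0):
    cost = monthly_serving_cost(price)
    print(f"${price:>5.2f}/M tokens -> cost ${cost:.0f}, margin ${revenue_per_user - cost:.0f}")
# At $15/M tokens the feature loses $10 per user; at $5/M it clears 50% margin.
```

Same feature, same users: the only thing that changed is the price per million tokens, and it flipped the product from unshippable to profitable.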
The other signal: OpenAI trusting Cerebras with unreleased models suggests the technology is production-ready, not a research curiosity. If you're evaluating infrastructure for serving models at scale, wafer-scale is now a serious option - not just a future bet.
Cerebras spent years building hardware for a problem most people didn't realise they had yet. Now the problem is obvious, the hardware is proven, and the $60 billion valuation reflects how much the market thinks that infrastructure is worth. Whether they can defend that valuation depends on one thing: can they make inference cheap enough that trillion-parameter models become practical for everyone, not just OpenAI?