Jensen Huang walked on stage at GTC 2026 and said the quiet part out loud: the era of training models is ending. The era of running them - inference - is beginning.
This isn't just a shift in priorities. It's a complete reordering of economics, infrastructure, and who wins. Training was about building the engine. Inference is about running it billions of times a day, profitably, at scale. And NVIDIA just announced the stack to make that possible.
Vera Rubin and Feynman: The Inference Architecture
Two new platforms anchored the keynote. Vera Rubin is NVIDIA's inference-optimised chip architecture - designed specifically for running models in production, not training them. Where Hopper and Blackwell were built for training workloads, Vera Rubin is built for low-latency, energy-efficient serving in deployment, from the datacentre out to the edge.
Feynman is the orchestration layer - the software that routes inference requests across hardware, optimises for latency and cost, and scales elastically. Think of it as AWS Lambda, but for AI agents. You don't provision servers. You don't manage capacity. You send a request, the model runs, you pay per token.
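NVIDIA hasn't published Feynman's API, so treat this as a sketch of what "routes across hardware, optimises for latency and cost" means in practice. All the backend names, latencies, and prices below are made up for illustration: a router that picks the cheapest tier still inside the caller's latency budget.

```python
from dataclasses import dataclass

@dataclass
class Backend:
    name: str
    latency_ms: float      # typical time to first token on this tier
    cost_per_mtok: float   # dollars per million output tokens

def route(backends, latency_budget_ms):
    """Pick the cheapest backend that still meets the latency budget."""
    eligible = [b for b in backends if b.latency_ms <= latency_budget_ms]
    if not eligible:
        raise RuntimeError("no backend meets the latency budget")
    return min(eligible, key=lambda b: b.cost_per_mtok)

# Hypothetical fleet: fast-and-pricey edge silicon down to a cheap batch queue.
fleet = [
    Backend("edge-rubin", latency_ms=20, cost_per_mtok=1.20),
    Backend("dc-blackwell", latency_ms=80, cost_per_mtok=0.40),
    Backend("batch-queue", latency_ms=400, cost_per_mtok=0.10),
]

interactive = route(fleet, latency_budget_ms=50)   # tight budget -> edge tier
background = route(fleet, latency_budget_ms=500)   # relaxed budget -> batch tier
```

The point of the sketch: "serverless inference" is just this decision made for you, per request, across hardware you never see.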
This is the infrastructure that makes token-powered economies viable. If training was about who could build the biggest model, inference is about who can run models cheapest, fastest, and most reliably. NVIDIA just made a serious claim to own that layer.
OpenClaw: The Standard That Matters
But the most important announcement wasn't hardware. It was OpenClaw - an open standard for agentic AI workflows. This is the missing piece. We have models. We have APIs. What we don't have is a common language for agents to coordinate, share context, and act autonomously across systems.
OpenClaw proposes a specification for how agents declare capabilities, request actions, and return results. It's not a product. It's a protocol. And if it gains adoption, it could do for AI agents what HTTP did for the web - create a lingua franca that lets everything talk to everything else.
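The spec itself isn't summarised in detail here, so the following is a guess at the shape such a protocol implies, not OpenClaw's actual schema: agents declare named capabilities, a registry routes action requests to whichever agent declared a match, and results come back in a common envelope. Every class and field name is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Capability:
    name: str          # e.g. "search.web" - namespaced capability identifier
    description: str

@dataclass
class Agent:
    agent_id: str
    capabilities: list = field(default_factory=list)

class Registry:
    """Routes an action request to an agent that declared the capability."""

    def __init__(self):
        self._agents = {}

    def register(self, agent):
        for cap in agent.capabilities:
            self._agents.setdefault(cap.name, agent)

    def dispatch(self, capability_name, payload):
        agent = self._agents.get(capability_name)
        if agent is None:
            return {"status": "error", "reason": "no agent for " + capability_name}
        # A real protocol would have the agent execute the action; this sketch
        # just returns a structured result envelope.
        return {"status": "ok", "agent": agent.agent_id, "result": payload}

registry = Registry()
registry.register(Agent("searcher-01",
                        [Capability("search.web", "Query the public web")]))
resp = registry.dispatch("search.web", {"query": "vera rubin specs"})
```

The interoperability claim lives entirely in that envelope: if every system speaks "declare, request, result" the same way, any agent can call any other without bespoke glue.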
The thesis here is simple: agentic AI only works at scale if agents can interoperate. Right now, every AI system is a walled garden. OpenClaw is the proposal to tear down the walls. Whether it succeeds depends on adoption beyond NVIDIA's ecosystem - but the fact that it's open is a strong signal.
The Inference Transition Changes the Players
Here's what this means practically. In the training era, the winners were the companies with the most compute, the most data, the biggest parameter counts. OpenAI, Google, Anthropic - the model builders.
In the inference era, the winners will be the companies that can run models efficiently, cheaply, and at massive scale. That's a different game. It favours infrastructure players like NVIDIA, cloud providers like AWS and Azure, and startups that can optimise inference costs down to fractions of a cent per request.
For developers, this is excellent news. Inference costs have been the hidden tax on AI products. Every chatbot interaction, every code completion, every image generation - it all costs tokens. As those costs drop, the economics of AI-first products flip. What was too expensive to build six months ago is suddenly viable.
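The arithmetic behind that flip is worth making concrete. The traffic numbers and prices below are illustrative, not anyone's published rates: the same product at a 20x lower per-token price goes from a real line item to a rounding error.

```python
def monthly_cost(interactions_per_day, tokens_per_interaction, price_per_mtok):
    """Dollars per month for a token-billed feature (30-day month)."""
    tokens = interactions_per_day * tokens_per_interaction * 30
    return tokens / 1_000_000 * price_per_mtok

# Hypothetical chatbot: 10,000 chats/day at ~1,500 tokens each.
before = monthly_cost(10_000, 1_500, price_per_mtok=10.00)  # $4,500/month
after = monthly_cost(10_000, 1_500, price_per_mtok=0.50)    # $225/month
```

Same product, same usage - the only variable that moved is the inference price, and it took the feature from "needs a business case" to "ship it".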
For businesses, the question shifts from "can we afford to train a model?" to "can we afford NOT to run inference at scale?" The barrier to entry just dropped. The companies that move fastest on inference infrastructure will have a cost advantage that compounds over time.
The Robotics Angle Nobody's Talking About
There's a reason this keynote happened at a robotics-focused event. Inference is what makes robots viable. Training a model once is expensive but manageable. Running inference on every robot, in real time, with sub-100ms latency? That's a different engineering challenge entirely.
Vera Rubin and Feynman are designed for exactly this use case. Low latency, high throughput, optimised for edge deployment. If you're building humanoid robots, autonomous vehicles, or warehouse automation, this is the stack you've been waiting for.
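To see why sub-100ms is hard, work the budget backwards. Inference doesn't get the whole 100ms - sensing, planning, and actuation all take their cut first. The overhead figures below are invented for illustration, but the structure of the calculation is the real constraint.

```python
def max_inference_ms(total_budget_ms, sensor_ms, planning_ms, actuation_ms):
    """Time left for the model after the rest of the control loop."""
    remaining = total_budget_ms - (sensor_ms + planning_ms + actuation_ms)
    if remaining <= 0:
        raise ValueError("control loop overhead already blows the budget")
    return remaining

# Hypothetical 100 ms loop: 15 ms sensing, 10 ms planning, 20 ms actuation
# leaves 55 ms for the model - and that has to hold on every single cycle,
# which is why this runs on the robot, not over a network round-trip.
budget = max_inference_ms(100, 15, 10, 20)
```

That per-cycle hard deadline, with no batching to amortise it, is the case the article says this stack is built for.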
And yes, Luma - another robotics story. I know. I'm fine. The robots are fine. Everything's fine.
Jensen Huang just declared the rules of the game have changed. The training era made the models possible. The inference era makes them useful. And NVIDIA just built the infrastructure to win it.