When Ben Thompson sits down with NVIDIA's CEO, you know the conversation is going somewhere deeper than product announcements. This interview is about where the entire AI infrastructure stack is heading - and what constraints actually matter.
The headline most people will focus on is NVIDIA's acquisition of Groq and what that means for disaggregated inference. But buried in the conversation is something more fundamental: Huang believes energy, not silicon, is becoming the limiting factor in AI.
The Full-Stack Pivot Nobody Saw Coming
NVIDIA built its empire on GPUs - chips that excel at parallel processing, the kind of computation AI models need for training and inference. But Huang is making a move that surprised even seasoned industry watchers: shifting NVIDIA toward full-stack AI infrastructure.
What does that actually mean? Instead of just selling the chips and letting others figure out how to build systems around them, NVIDIA is increasingly providing the entire architecture - hardware, software, networking, and orchestration - as a unified offering.
The Groq acquisition accelerates this. Groq's technology is designed for disaggregated inference - splitting the work of running AI models across many smaller, distributed processors rather than doing everything on a single massive chip. Think of it like the difference between one giant data centre and a distributed network of smaller nodes. The latter is often more efficient, more flexible, and easier to scale.
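To make that concrete, here's a minimal sketch of the pattern in Python. The classes and names are hypothetical - this is not Groq's or NVIDIA's API - but it shows the architectural shift: the model's layers are sharded across several small workers, and a request is pipelined through them instead of living on one big chip.

```python
# Minimal sketch of disaggregated inference: hypothetical names, toy "layers".
from dataclasses import dataclass
from typing import Callable

@dataclass
class LayerShard:
    """One small processor holding a contiguous slice of the model's layers."""
    node: str
    layers: list[Callable[[float], float]]

    def forward(self, activation: float) -> float:
        for layer in self.layers:
            activation = layer(activation)
        return activation

def run_inference(shards: list[LayerShard], tokens: float) -> float:
    """Pipeline one request across shards, as if each were a separate node."""
    activation = tokens
    for shard in shards:
        # In a real deployment this hop crosses the network fabric;
        # here it's just a function call.
        activation = shard.forward(activation)
    return activation

# Toy model: each "layer" adds 1, split unevenly across two nodes.
shards = [
    LayerShard("node-0", [lambda x: x + 1, lambda x: x + 1]),
    LayerShard("node-1", [lambda x: x + 1]),
]
print(run_inference(shards, 0.0))  # 3.0
```

The appeal of the pattern: capacity scales by adding nodes, and each stage can be sized to its own bottleneck rather than over-provisioning one monolithic chip.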
For businesses running AI workloads, this could mean lower costs and more predictable performance. For NVIDIA, it means controlling more of the value chain. Instead of just being the chip supplier, they're positioning themselves as the infrastructure platform.
Why CPUs Matter Again (and Why That's Weird)
Here's something unexpected: Huang talked about NVIDIA's CPU strategy. For a company that made its name on GPUs, this feels like a plot twist.
The reason is agent workloads. AI agents - systems that can reason, plan, and act autonomously - don't just need raw parallel processing power. They need the kind of sequential, branching logic that CPUs handle better than GPUs. Agents are making decisions, not just crunching matrices.
In simpler terms - training a model is like lifting heavy weights. You need brute strength (GPUs). Running an agent is like playing chess. You need strategic thinking (CPUs). NVIDIA recognises that the future isn't just about training bigger models. It's about running smarter agents. And that requires a different kind of chip.
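Here's a rough sketch of the contrast in Python. Neither function is a real workload - the names and toy tools are made up for illustration - but the shapes are what matter: one is a single uniform matrix multiply, the other is a chain of data-dependent branches.

```python
# Illustrative workload shapes only - not a benchmark, and not real agent code.
import numpy as np

def training_step(weights, batch):
    """GPU-shaped: one big uniform matrix multiply over the whole batch."""
    return batch @ weights  # thousands of identical multiply-adds, in parallel

def agent_step(observation, tools):
    """CPU-shaped: sequential, data-dependent branching."""
    if observation.get("needs_info"):
        return tools["search"](observation["query"])
    if observation.get("needs_math"):
        return tools["calculator"](observation["expression"])
    return "done"  # each branch decides what happens next; little parallelism

weights = np.ones((4, 2))
batch = np.ones((8, 4))
print(training_step(weights, batch).shape)  # (8, 2)

tools = {
    "search": lambda q: f"results for {q!r}",
    "calculator": lambda e: sum(map(float, e.split("+"))),  # toy calculator
}
print(agent_step({"needs_info": True, "query": "grid capacity"}, tools))
```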
The Energy Problem Nobody Wants to Talk About
This is where the conversation gets uncomfortable. Huang made a point that should be front-page news but probably won't be: energy is now the constraint, not chips.
We've spent years worrying about chip shortages and fabrication capacity. But even if you could manufacture unlimited GPUs tomorrow, you'd still hit a wall. Data centres running AI workloads consume staggering amounts of power. And power grids have physical limits.
Building a new power plant takes years. Upgrading grid infrastructure takes even longer. Meanwhile, AI demand is growing exponentially. The maths doesn't work.
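To see why, run a back-of-envelope calculation. Every number below is an illustrative assumption - none of it comes from the interview - but the order of magnitude is the point.

```python
# Back-of-envelope: all inputs are assumptions for illustration.
gpus = 100_000           # one large training cluster
watts_per_gpu = 700      # roughly an H100-class accelerator at full load
pue = 1.3                # datacentre overhead: cooling, power delivery

cluster_mw = gpus * watts_per_gpu * pue / 1e6
print(f"One cluster: ~{cluster_mw:.0f} MW of continuous draw")  # ~91 MW

# A large power plant is on the order of 1,000 MW and takes years to
# permit and build. A handful of clusters consumes one outright.
print(f"Clusters per ~1 GW plant: ~{1000 / cluster_mw:.0f}")    # ~11
```

And that's one cluster. Multiply by every hyperscaler building at once and the gap between demand and grid capacity is obvious.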
Huang's answer? Make everything more efficient. Squeeze more computation out of every watt. Design chips that do more with less. Optimise software to reduce waste. It's not glamorous, but it's the only path forward that doesn't involve choosing which industries get power and which don't.
Geopolitics and the Tech Arms Race
Thompson pushed Huang on geopolitical competition - specifically, the growing tension around AI development between the US and China. Huang was careful, but his concern was clear. He believes restricting access to technology might slow others down temporarily, but it also accelerates their motivation to build domestic alternatives.
In other words - chip export controls might buy time, but they don't stop the race. They just change the shape of it. And in the long run, technology tends to flow around barriers, not stop at them.
The question nobody has a good answer to yet: how do you balance open research and collaboration (which accelerates progress for everyone) with national security concerns (which demand control and restriction)? Huang doesn't claim to have the solution. But he clearly thinks the problem is more complicated than the current approach admits.
What This Means for Builders
If you're building with AI, the takeaway is this: the infrastructure beneath your applications is shifting fast. NVIDIA isn't just selling you chips anymore. They're offering an entire stack. That could make things easier - or it could lock you into a single vendor's ecosystem.
And the energy constraint? That's real. If you're planning AI workloads at scale, power consumption isn't just an environmental concern. It's a business constraint. Efficiency isn't optional anymore.
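If you want to feel that as a number, price it. The inputs below are assumptions, not figures from the interview, but the shape of the result holds.

```python
# Electricity as a line item - all inputs are illustrative assumptions.
load_mw = 10          # steady draw of a modest AI deployment
price_per_kwh = 0.08  # assumed industrial electricity price, USD

hours_per_year = 24 * 365
annual_cost = load_mw * 1_000 * hours_per_year * price_per_kwh
print(f"~${annual_cost / 1e6:.1f}M per year for electricity alone")  # ~$7.0M

# Halve the energy per inference and you halve this figure: efficiency
# lands on the P&L, not just the sustainability report.
```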
The age of "just throw more compute at it" is ending. What comes next is optimisation, architecture, and making every watt count.