The Inference Era Begins: Agents, Tiny Teams, Real Work

Today's Overview

Last week, Jensen Huang stood at GTC and told the world something the numbers already knew: the era of training large models is ending. The era of running them (constantly, at scale, across millions of users and billions of agents) is beginning. This shift from training economics to inference economics changes everything: the hardware we build, the companies that survive, and how you should be thinking about deploying AI today.

The Token Economy Is Real Now

Azeem Azhar caught the thing everyone's circling around but not saying directly: two years ago, he used 150,000 tokens per day. Last week, he used 870 million tokens in a single day, with agents handling the work while he slept. That's not a typo. That's a 5,800x increase in two years. NVIDIA's inference chips (Groq, Vera Rubin), arriving at 35x better throughput per watt than Blackwell, aren't speculative. They're a bet on a world where token budgets become productive inputs, like electricity or office space. Most large companies are still treating token budgets like IT cost-center line items. That gap is the opportunity.
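
For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The two daily token counts are the figures quoted above. The function names are illustrative, and the annualized figure assumes smooth compounding over exactly two years, which is a simplification rather than anything Azhar claimed.

```python
# Figures quoted in the text above.
TOKENS_PER_DAY_THEN = 150_000        # two years ago
TOKENS_PER_DAY_NOW = 870_000_000     # a single day last week
YEARS_ELAPSED = 2                    # assumption: treated as exactly two years


def growth_multiple(before: float, after: float) -> float:
    """Return how many times larger `after` is than `before`."""
    return after / before


def annualized_multiple(before: float, after: float, years: float) -> float:
    """Return the per-year multiple, assuming smooth compounding (a simplification)."""
    return (after / before) ** (1 / years)


multiple = growth_multiple(TOKENS_PER_DAY_THEN, TOKENS_PER_DAY_NOW)
per_year = annualized_multiple(TOKENS_PER_DAY_THEN, TOKENS_PER_DAY_NOW, YEARS_ELAPSED)

print(f"Total growth: {multiple:,.0f}x")
print(f"Implied growth per year: about {per_year:.0f}x")
```

The total multiple matches the 5,800x quoted above. Under the smooth-compounding assumption, that works out to roughly a 76x increase per year.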

Building Gets Smaller, Gets Faster

The practical story this week came from Dreamer, David Singleton's new platform built with Hugo Barra and Nicholas Jitkoff. Six people built the entire system you just watched; the team is seventeen now. Singleton came from Stripe (payments) and Android (ecosystems). He's building agent tooling the way mobile app stores were built: discover, build, share, get paid. What matters: he wrote a conference app in 25 minutes of actual work, a thing that would have cost agencies six figures five years ago. And this is the pattern: tiny teams, agents doing the scaffolding, friction collapsing. If you're hiring engineers today, note the hiring loop Singleton described in his interview: watch how candidates prompt agents, how they chain multiple agents together, how they think about the work. The era of "can you write code on a whiteboard" is ending. The era of "can you make agents work for you" is starting.

Agents Are Hitting Real Work

RoboForce raised $52 million to commercialize Titan, a dual-armed outdoor robot for solar, mining, and shipping work. Bedrock Robotics is moving construction equipment from supervised autonomy to fully autonomous operation. Ottobot is delivering meals to remote mining villages. These aren't pilot projects or hype cycles anymore. They're operational robots doing the dangerous work humans shouldn't. The constraint now isn't whether AI can do the work. It's how to collect high-quality outdoor data, how to handle real-world variability, how to make the safety story bulletproof. That's a different problem entirely, and a problem with a moat.

The pattern underneath all of this is the same: inference at scale requires different hardware, different software patterns, different team structures, and different economics. Companies that organise around that shift, treating agents as a native primitive rather than a feature bolted onto existing systems, will be the ones that move fast. The rest are still optimising for a world that's already ending.