Zoox Goes National. Agents Get Harnesses. APIs Get Cheap.

Today's Overview

Three distinct threads emerged this week that tell us something about where AI is actually heading: away from hype, toward infrastructure.

First, the physical world is getting serious. Zoox announced a multi-city expansion of its autonomous robotaxi service to Austin, Miami, Las Vegas, and San Francisco. The company has logged nearly 2 million autonomous miles and transported over 350,000 passengers in purpose-built vehicles. What's interesting isn't the expansion itself; it's the validation. After a decade of development, Zoox is the only company operating a fully autonomous ride-hailing service in a vehicle designed specifically for robotaxis rather than retrofitted consumer cars. That design choice, once questioned, now looks prescient. Meanwhile, Agile Robots partnered with Google DeepMind to integrate Gemini Robotics foundation models into its industrial humanoid. The collaboration creates a data flywheel: real-world deployments improve the models, and better models enable wider deployment. This is how physical AI scales.

The Harness, Not the Model

The second thread is about infrastructure. The latest Latent Space report notes a surge in CLIs for agents: Stripe launched Projects.dev, Ramp released a CLI, ElevenLabs shipped a voice CLI, and the list goes on. CLIs matter because agents need to *do* things, not just chat, and doing things requires protocols that let models call services, provision infrastructure, and manage state. What's quietly becoming clear is that the model is no longer the bottleneck; the harness is. Middleware, memory, task orchestration, tool interfaces: these are where real differentiation happens. Claude is a great model. But Claude wrapped in proper architecture, with persistent memory and constrained behavior, is a different product entirely. That's why so many builders are focusing on the layers around the model rather than the model itself.
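To make "the harness, not the model" concrete, here is a minimal sketch of what such a layer does: a tool registry, persistent memory, and a loop around the model call. Everything here is illustrative; `call_model` is a stub standing in for any LLM API, and the action format is an assumption, not any vendor's real interface.

```python
from typing import Callable

class Harness:
    """Toy agent harness: memory + constrained tool access around a model."""

    def __init__(self, call_model: Callable[[str], dict]):
        self.call_model = call_model
        self.tools: dict[str, Callable[..., str]] = {}
        self.memory: list[str] = []  # persists across steps and turns

    def register_tool(self, name: str, fn: Callable[..., str]) -> None:
        self.tools[name] = fn

    def run(self, task: str, max_steps: int = 5) -> str:
        self.memory.append(f"task: {task}")
        for _ in range(max_steps):
            # The model sees accumulated memory, not just the last message.
            action = self.call_model("\n".join(self.memory))
            if action["type"] == "final":
                self.memory.append(f"answer: {action['text']}")
                return action["text"]
            # Constrained behavior: only registered tools may execute.
            result = self.tools[action["tool"]](**action["args"])
            self.memory.append(f"{action['tool']} -> {result}")
        return "step budget exhausted"
```

The point of the sketch is that the differentiation lives in this wrapper, not in `call_model`: swap in a better model and the memory, tool constraints, and orchestration logic stay exactly where they are.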

Costs and Architecture

The third thread hits the economics of building. One builder cut their LLM API costs by 60% by routing simple tasks to cheaper models and complex ones to premium models. The insight is simple: most AI apps send every request to the most expensive model. TokenRouter sits between your code and the API, classifying requests by complexity and routing accordingly. Email summaries don't need GPT-4; ticket classification doesn't need Sonnet. The cost difference is 30x, and that gap matters more as usage grows: at volume, model API costs can dominate your budget.

Another piece articulated a deeper design problem: most AI products treat the model as a request-response engine rather than a persistent interaction system. Users expect continuity, memory, and consistent tone, but stateless architectures can't deliver that. The fix requires rethinking interaction loops, memory-backed state, and behavioral constraints. It's not a model problem. It's an architecture problem.
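The routing idea above can be sketched in a few lines. This is not TokenRouter's actual API; the model names, keyword list, and length threshold are all hypothetical stand-ins for whatever classifier you'd use in practice.

```python
# Hypothetical complexity-based router: cheap model for routine tasks,
# premium model for everything else. All names here are illustrative.
CHEAP_MODEL = "small-model"
PREMIUM_MODEL = "large-model"

# Crude signal: short prompts built around routine verbs are "simple".
SIMPLE_KEYWORDS = {"summarize", "classify", "extract", "translate"}

def classify_complexity(prompt: str) -> str:
    words = prompt.lower().split()
    if len(words) < 50 and any(w.strip(".,:;") in SIMPLE_KEYWORDS for w in words):
        return "simple"
    return "complex"

def route(prompt: str) -> str:
    """Return the model a request should be sent to."""
    if classify_complexity(prompt) == "simple":
        return CHEAP_MODEL
    return PREMIUM_MODEL
```

A real router would classify with a small model or learned heuristics rather than keywords, but the architecture is the same: a thin layer between your code and the API that decides where each request goes.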
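The "persistent interaction system" point can also be made concrete. A minimal sketch, under the assumption that state lives in a session object your app owns: every request carries a behavioral constraint (the system prompt) plus the conversation so far, so continuity and tone survive across stateless API calls. The class and prompt format are illustrative, not any framework's real interface.

```python
class Session:
    """Memory-backed session: rebuilds full context for each model call."""

    def __init__(self, system_prompt: str):
        self.system_prompt = system_prompt   # behavioral constraint
        self.history: list[tuple[str, str]] = []

    def prompt(self, user_msg: str) -> str:
        # The underlying API is stateless; continuity comes from
        # replaying the constraint and history on every request.
        lines = [self.system_prompt]
        for role, text in self.history:
            lines.append(f"{role}: {text}")
        lines.append(f"user: {user_msg}")
        return "\n".join(lines)

    def record(self, user_msg: str, reply: str) -> None:
        self.history.append(("user", user_msg))
        self.history.append(("assistant", reply))
```

The architectural shift is that memory and constraints live in your system, not in the model: the same model behaves like a consistent product because the harness feeds it consistent state.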

The pattern across all three: the low-hanging fruit in AI isn't models anymore. It's everything around them: robotics platforms that work at scale, harnesses that let agents actually function, and systems built for interaction, not just inference.