When Reasoning Becomes Physical, Custom Agents Ship

Today's Overview

Boston Dynamics announced this week that Spot will now run on Gemini Robotics-ER 1.6, Google DeepMind's reasoning-first model. The upgrade moves beyond object recognition into higher-order visual reasoning: reading gauges, checking safety compliance, even detecting puddles. This matters because it's not about what the robot sees anymore. It's about what the robot understands about what it's seeing, and what that means in an industrial context. The robot also improves automatically through cloud-side updates, with no downtime required.

Meanwhile, Notion quietly became a full AI agent platform this week after four or five complete rebuilds since 2022. The journey teaches something uncomfortable: wrapping a model is not the same as building a system that actually works. Early attempts at agents failed because tool-calling didn't exist, context windows were too short, and the models weren't reasoning well enough. What changed wasn't just better models. It was the realization that the harness matters as much as the intelligence inside it. Notion's team discovered this by shipping early, iterating constantly, and being willing to delete months of work when the architecture no longer fit the moment.

The Hidden Cost Isn't Tokens

Notion's pricing strategy reveals something the industry hasn't fully admitted: not all tokens are equal, and not every task needs the most capable model. They introduced "credits" as an abstraction over raw token costs because infrastructure, model type, and serving tier all factor into the true expense. A simple email-filtering task shouldn't cost the same as complex reasoning. Yet most platforms charge uniformly or hide the complexity. Notion's team also discovered that users don't want speed when tasks run in the background; they want the right answer at the right cost. That alignment between capability, price, and latency is where the real product design happens.
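To make the credits idea concrete, here is a minimal sketch of how a credit could fold model type, serving tier, and infrastructure overhead into one number. Everything in it is an assumption for illustration: the weights, the overhead factor, the tokens-per-credit ratio, and the names `Task` and `credits_for` are hypothetical and do not describe Notion's actual pricing.

```python
from dataclasses import dataclass

# All values below are hypothetical, chosen only to illustrate the idea.
MODEL_WEIGHT = {"small": 1.0, "large": 8.0}                     # relative cost per token
TIER_WEIGHT = {"batch": 0.5, "standard": 1.0, "priority": 2.0}  # serving tier
INFRA_OVERHEAD = 1.2      # assumed flat factor for hosting, retrieval, storage
TOKENS_PER_CREDIT = 1000  # assumed conversion rate


@dataclass(frozen=True)
class Task:
    tokens: int
    model: str  # key into MODEL_WEIGHT
    tier: str   # key into TIER_WEIGHT


def credits_for(task: Task) -> float:
    """Convert raw tokens into credits, weighted by model and serving tier."""
    weighted_tokens = task.tokens * MODEL_WEIGHT[task.model] * TIER_WEIGHT[task.tier]
    return round(weighted_tokens * INFRA_OVERHEAD / TOKENS_PER_CREDIT, 2)


# Same token count, very different cost once model and tier are priced in.
email_filter = Task(tokens=2000, model="small", tier="batch")
deep_reasoning = Task(tokens=2000, model="large", tier="priority")

if __name__ == "__main__":
    print("email filter:", credits_for(email_filter), "credits")
    print("deep reasoning:", credits_for(deep_reasoning), "credits")
```

Under these made-up weights, a background email filter and a priority reasoning job with identical token counts land more than an order of magnitude apart, which is the gap a flat per-token price would hide.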

The Builders Are Looking Past Foundation Models

What struck us most: neither Boston Dynamics nor Notion is betting heavily on training its own foundation models. Boston Dynamics builds the harness: the tools, the robot body, the reasoning layer. Notion builds the system of record and the evaluation framework. Both are investing deeply in retrieval, ranking, search optimization, and tool design instead. The frontier labs (OpenAI, Anthropic, Google) move so fast that the ROI on training your own model can flip with each new release. The real moat isn't in foundation models anymore. It's in knowing which model to use for which job, and in building the infrastructure around that decision.
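"Knowing which model to use for which job" can be sketched as a small routing function: pick the cheapest model that clears a task's capability bar and latency budget. This is an illustrative sketch only. The catalog entries, capability scores, costs, and the names `ModelOption` and `route` are hypothetical, and neither company has described its routing this way.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelOption:
    name: str
    capability: int       # hypothetical 1-10 score from an internal eval suite
    cost_per_call: float  # hypothetical relative cost
    latency_s: float      # hypothetical typical latency in seconds


# Made-up catalog; real entries would come from your own evaluations.
CATALOG = [
    ModelOption("small-fast", capability=3, cost_per_call=0.1, latency_s=0.5),
    ModelOption("mid", capability=6, cost_per_call=1.0, latency_s=2.0),
    ModelOption("frontier", capability=9, cost_per_call=10.0, latency_s=8.0),
]


def route(min_capability: int, max_latency_s: Optional[float] = None) -> ModelOption:
    """Return the cheapest model meeting the capability and latency needs."""
    candidates = [
        m for m in CATALOG
        if m.capability >= min_capability
        and (max_latency_s is None or m.latency_s <= max_latency_s)
    ]
    if not candidates:
        raise ValueError("no model satisfies the requested constraints")
    return min(candidates, key=lambda m: m.cost_per_call)


if __name__ == "__main__":
    # Background email filtering: low bar, no latency budget.
    print(route(min_capability=2).name)
    # Complex reasoning: high bar, latency does not matter in the background.
    print(route(min_capability=8).name)
```

The design choice worth noting is that the capability scores carry all the weight here, which is why an evaluation framework matters as much as the router itself.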

This week marked a quiet shift: the age of "one model to rule them all" ended. Now it's about orchestration, evaluation, and cost-aware routing. Builders who understand this will own the next layer.