Physical AI Heats Up; Agents Rewrite Software
Today's Overview

It's a week where the physical and digital worlds are colliding hard. On one side, robotics companies are raising serious money to solve real-world problems in factories and warehouses. On the other, AI agents, particularly coding agents, are fundamentally changing how software gets built. And underneath both, there's an emerging question about what it means to understand the systems we've created.

Physical AI Finds Its Footing

RLWRLD, a Seoul-based startup building robot foundation models for industrial environments, just raised $26 million in its second seed round, bringing its total to $42 million. What makes this noteworthy isn't just the funding; it's the approach. Instead of training robots in clean labs under perfect conditions, they're training directly in real factories, warehouses, and service environments: crowded spaces, variable lighting, human interaction, unpredictable conditions. That messiness is the training data. They're working with logistics giants like CJ Logistics and Lotte on actual warehouse deployments, not theoretical pilots. This is the difference between published research and operational robotics, and it's why investors are betting on it.

Meanwhile, Teledyne FLIR just released the Lepton XDS, a compact thermal-and-visible camera module aimed at everything from fire detection to EV battery monitoring to robotic navigation, starting at $109. It's the kind of enabling technology that makes building robots cheaper and faster. And at the University of Waterloo, researchers got a swarm of robot painters to translate music into light trails: robots collaborating with humans to create art that responds to the emotion in sound. Not immediately practical, but it tells you something about where the thinking is heading: robots as creative partners, not just industrial workhorses.

Coding Agents Cross a Line

The real story this week isn't about bigger models. It's about agents that actually work. Andrej Karpathy, one of the sharpest engineers in deep learning, recently described what he called a "phase change" in coding agents since December. He handed off an entire end-to-end deployment task (SSH keys, downloading a model, setting up a server, writing a UI, systemd configuration, reporting) with minimal intervention. It worked. The agent didn't just generate code snippets. It understood the task, maintained coherence across dozens of steps, and adapted when things went wrong.

This matters because it's moving coding agents from impressive demo territory into actually useful territory. Perplexity launched "Computer," explicitly building multi-agent orchestration into their product-routing tasks to specialist models (research agents, coding agents, media agents) rather than one monolithic system. OpenAI released GPT-5.3-Codex in the API. Cline announced ~25% speed improvements and higher token efficiency. Claude Code marked its first birthday. And GitHub Copilot CLI went GA with a new /research feature that can do repo-wide analysis.

The pattern is clear: coding isn't a single task anymore. It's a workflow. And agents are the interface that lets humans orchestrate that workflow at speed. The question everyone's asking now isn't "can AI write code?" It's "what does a software team look like when agents run the factory that builds your software?"
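The orchestration pattern described above, routing each task to a specialist agent rather than one monolithic model, can be sketched in a few lines. This is an illustrative sketch only; the agent names and routing function here are hypothetical, not Perplexity's (or any vendor's) actual API.

```python
# Minimal sketch of multi-agent task routing: an orchestrator
# dispatches each task to a specialist agent by task kind.
# All names below are hypothetical, for illustration only.
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Task:
    kind: str      # e.g. "research", "coding", "media"
    payload: str   # the user's request


def research_agent(task: Task) -> str:
    # Stand-in for a model specialized in search and synthesis.
    return f"[research] summarized: {task.payload}"


def coding_agent(task: Task) -> str:
    # Stand-in for a model specialized in writing and fixing code.
    return f"[coding] patched: {task.payload}"


def media_agent(task: Task) -> str:
    # Stand-in for a model specialized in image/video generation.
    return f"[media] rendered: {task.payload}"


# The orchestrator itself is just a routing table plus error handling.
AGENTS: Dict[str, Callable[[Task], str]] = {
    "research": research_agent,
    "coding": coding_agent,
    "media": media_agent,
}


def orchestrate(task: Task) -> str:
    handler = AGENTS.get(task.kind)
    if handler is None:
        raise ValueError(f"no specialist agent for task kind: {task.kind!r}")
    return handler(task)


print(orchestrate(Task("coding", "fix flaky test")))
# -> [coding] patched: fix flaky test
```

In a real system the routing step is itself often a model call (a classifier deciding which specialist fits), but the shape is the same: a dispatcher in front of narrow experts, rather than one model doing everything.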

The Unspoken Cost

But here's the uncomfortable bit. In an Exponential View conversation, neuroscientist Nita Farahany raised something worth sitting with: when AI handles more of your work, you stop developing intuition about whether it's right. Doctors using AI diagnosis tools get worse at diagnosing when the AI is turned off. Younger gastroenterologists never learn to spot polyps without AI assistance. The skill degrades. Not because AI is bad, but because competence requires practice, and offloading removes the practice.

Meanwhile, Gary Marcus is scared. He's watching the Trump administration pressure companies like Anthropic to give unrestricted AI access for military and surveillance use. He's read the data: in simulated nuclear crises, AI models chose nuclear escalation in 95% of cases. The systems we're building are not reliable enough for the stakes we're putting them in. And yet the pressure to deploy them everywhere, immediately, keeps accelerating.

Physical AI is getting real. Coding agents are reshaping how work happens. Open-weight models are proliferating at pace (Qwen3.5, GLM-5, MiniMax M2.5 all released this month alone). But the thread connecting all of this is a question we're not asking loudly enough: as these systems get more capable and more embedded in our workflows, who's responsible for understanding whether they're right? And what happens to human judgment when the machines handle so much of the thinking?
