Today's Overview
This week, the conversation shifted from what robots could do to what they can actually do, and how fast. RLWRLD released RLDX-1, a foundation model built from the ground up for five-finger manipulation. Unlike earlier vision-language-action models (VLAs) that treat dexterity as an afterthought, RLDX-1 bakes in motion tracking, force sensing, and memory. The numbers matter less than what they mean: a robot can now grasp a coffee pot and pour without spilling as the weight shifts, pick a moving object off a conveyor, or retrain itself mid-deployment from a handful of corrections. This is the difference between a model that can push something and a model that understands contact.
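To make "bakes in" concrete, here is a minimal sketch of what a contact-aware policy interface could look like: torque, wrench, and contact signals as first-class observation fields, plus a short history buffer standing in for memory. Every name here (ManipulationObs, ContactAwarePolicy, the field layout) is hypothetical, not RLDX-1's actual API.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ManipulationObs:
    """One observation for a contact-aware policy (all names hypothetical)."""
    rgb: np.ndarray            # camera frame, e.g. (H, W, 3)
    joint_pos: np.ndarray      # finger/arm joint angles, rad
    joint_torques: np.ndarray  # measured torques, N*m -- force sensing
    wrench: np.ndarray         # 6D force/torque at the wrist, N / N*m
    contact_state: np.ndarray  # per-fingertip binary contact flags

class ContactAwarePolicy:
    """Sketch of a dexterous policy that keeps a short observation memory."""
    def __init__(self, horizon: int = 16):
        self.horizon = horizon
        self.history: list[ManipulationObs] = []  # "memory" over recent steps

    def step(self, obs: ManipulationObs) -> np.ndarray:
        self.history = (self.history + [obs])[-self.horizon:]
        # A real model would fuse vision + force + memory in one network;
        # here we only show the interface shape: obs history -> joint targets.
        return self._infer(self.history)

    def _infer(self, history: list[ManipulationObs]) -> np.ndarray:
        # Placeholder: hold current joints. A trained model would predict
        # deltas conditioned on contact and the shifting load (the coffee pot).
        return history[-1].joint_pos
```

The structural point is that torque and contact arrive as inputs on every step rather than as post-hoc corrections, and the few-shot mid-deployment retraining claim amounts to fine-tuning on a short list of corrected observation/action pairs.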
Interaction Models Cross a Threshold
Thinking Machines released their interaction models this week: AI trained from scratch for continuous, full-duplex conversation rather than bolted onto a turn-based LLM. TML-Interaction-Small processes audio, video, and text simultaneously, responding at 200ms intervals without the "thinking" pause. The demos feel different: the model counts your pushups in real time, interrupts you mid-sentence naturally, and speaks while listening. It's not a speed improvement on the old architecture; it's a different interface assumption entirely. Benchmarks matter (it beats GPT-4o Realtime and Gemini 3.1), but the real shift is that models trained this way now feel like they're in the room with you, not reading a transcript.
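The scheduling difference is easier to see in code. Below is a toy illustration of the full-duplex assumption, not TML's implementation: input ingestion and output generation run concurrently, and the model gets a chance to speak on every ~200ms tick instead of waiting for end-of-turn. The I/O and model functions (capture_frame, model_step, play_audio) are invented stubs.

```python
import asyncio
import itertools

TICK_MS = 200  # the ~200 ms response cadence cited above

# --- stand-ins for real I/O and the model (all hypothetical) ---------------
_frames = itertools.count()

async def capture_frame() -> str:
    await asyncio.sleep(0.05)  # pretend an audio/video frame lands every 50 ms
    return f"frame-{next(_frames)}"

def model_step(context: list[str]) -> str | None:
    # Placeholder incremental decode; a real model attends over all modalities.
    return f"reply after {len(context)} frames" if context else None

def play_audio(chunk: str) -> None:
    print(chunk)

# --- the full-duplex part ---------------------------------------------------
async def listen(inbox: asyncio.Queue) -> None:
    """Ingest input continuously; never blocked by the act of speaking."""
    while True:
        await inbox.put(await capture_frame())

async def speak(inbox: asyncio.Queue) -> None:
    """Every tick, respond to whatever has arrived so far, even mid-sentence."""
    context: list[str] = []
    while True:
        await asyncio.sleep(TICK_MS / 1000)
        while not inbox.empty():  # drain frames that arrived since last tick
            context.append(inbox.get_nowait())
        if (chunk := model_step(context)):  # may speak, interrupt, or stay silent
            play_audio(chunk)

async def main() -> None:
    inbox: asyncio.Queue = asyncio.Queue()
    # listen() and speak() run concurrently: full duplex by construction,
    # unlike a turn-based loop that alternates listen -> think -> speak.
    await asyncio.gather(listen(inbox), speak(inbox))

# asyncio.run(main())  # runs forever; Ctrl-C to stop
```

A turn-based system collapses both coroutines into one sequential loop, which is exactly where the "thinking" pause comes from.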
Deployment Infrastructure Hardens
OpenAI announced the Deployment Company, a new unit staffed with 150 forward-deployed engineers (via the Tomoro acquisition). This is Palantir's playbook applied to AI: embedding technical staff inside customer operations to translate frontier models into real workflows. It signals that the hard part of AI adoption is no longer the model; it's the last mile. At the same time, agent orchestration is maturing fast. Claude Code agents can now persist across sessions, manage multiple parallel tasks, and integrate directly into Slack or your development CLI. The infrastructure is no longer duct tape and prompts. It's becoming boring, reliable, and productionised.
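For a feel of what session persistence means in practice, here is a generic sketch of the pattern, not Claude Code's actual SDK: agent state is serialized between runs so a task resumes where it stopped. All names (AgentSession, SESSIONS_DIR) are illustrative.

```python
import json
from dataclasses import dataclass, asdict, field
from pathlib import Path

SESSIONS_DIR = Path(".agent_sessions")  # illustrative storage location

@dataclass
class AgentSession:
    """Minimal persistent agent state (illustrative, not any real SDK)."""
    session_id: str
    messages: list[dict] = field(default_factory=list)    # conversation so far
    pending_tasks: list[str] = field(default_factory=list)

    @property
    def path(self) -> Path:
        return SESSIONS_DIR / f"{self.session_id}.json"

    def save(self) -> None:
        SESSIONS_DIR.mkdir(exist_ok=True)
        self.path.write_text(json.dumps(asdict(self)))

    @classmethod
    def resume(cls, session_id: str) -> "AgentSession":
        path = SESSIONS_DIR / f"{session_id}.json"
        if path.exists():  # pick up exactly where the last run stopped
            return cls(**json.loads(path.read_text()))
        return cls(session_id=session_id)

# Usage: each run resumes prior context instead of starting cold.
session = AgentSession.resume("refactor-auth")
session.messages.append({"role": "user", "content": "continue the refactor"})
session.pending_tasks = ["update tests", "post summary to Slack"]
session.save()
```

Parallel tasks fall out of the same pattern: one session per task, with a coordinator fanning results back in. That, more than any model upgrade, is what "boring and reliable" looks like.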
For builders, the pattern is clear: dexterity requires task-specific modalities (torque, contact state, motion), interaction models need to be trained end-to-end for their interface (not retrofitted), and deployment demands embedded expertise. The robots and AI agents that work at scale are the ones designed with their constraints in mind from the start.