Robotics & Automation · Saturday, 9 May 2026

Robots Learn From YouTube Now - No Sensors Required

A robot learning to fold laundry doesn't watch another robot fold laundry. It watches you.

That's the insight driving a fundamental shift in how robots learn tasks - one that removes the most expensive bottleneck in robotics development. Eric Chan at Rhoda AI has been working on what he calls "direct video action models" - systems that train robots by watching internet video of humans doing things, not by collecting terabytes of sensor data from other robots.

The traditional approach was ruinously expensive. You needed to physically demonstrate a task hundreds or thousands of times, recording every motor position, every sensor reading, every joint angle. Want to teach a robot to open different types of doors? That's weeks of manual demonstration across dozens of door types, all carefully logged. The data collection cost more than building the robot.

How Video Action Models Work

Chan's approach sidesteps this entirely. The model watches video of humans performing a task - opening doors, folding clothes, pouring liquid - and learns the underlying action pattern. Not the specific motor commands for that specific robot, but the concept of the action itself. When a robot needs to replicate the task, the model translates that concept into motor commands for its particular hardware.
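The two-stage idea described above can be sketched in code. Everything here is hypothetical (the names, phases, and interfaces are illustrative, not Rhoda AI's actual architecture): a video encoder extracts a hardware-agnostic action concept, and a per-robot decoder translates each abstract phase into that hardware's motor commands.

```python
from dataclasses import dataclass


@dataclass
class ActionConcept:
    """Hardware-agnostic description of an action learned from video."""
    name: str
    phases: list[str]  # ordered sub-steps, e.g. approach -> grasp -> rotate -> pull


def extract_action_concept(video_frames: list) -> ActionConcept:
    # In a real system this would be a learned video encoder;
    # here it is stubbed to illustrate the interface.
    return ActionConcept(
        name="open_door",
        phases=["approach_handle", "grasp", "rotate", "pull"],
    )


def decode_to_motor_commands(concept: ActionConcept, robot: str) -> list[str]:
    # A per-robot decoder maps each abstract phase onto that
    # particular hardware's command vocabulary.
    phase_to_command = {
        "approach_handle": f"{robot}: move_arm_to(handle_pose)",
        "grasp": f"{robot}: close_gripper()",
        "rotate": f"{robot}: rotate_wrist(-90)",
        "pull": f"{robot}: move_base(backward)",
    }
    return [phase_to_command[p] for p in concept.phases]


concept = extract_action_concept(video_frames=[])
commands = decode_to_motor_commands(concept, robot="arm_v1")
```

The point of the split is that the expensive part (the concept extractor) is trained once on internet video, while only the cheap decoder is specific to each robot.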

This works because the internet already contains millions of hours of humans doing things. Recipe videos, how-to guides, manufacturing footage - all of it becomes training data. The model learns that "opening a door" involves approaching a handle, grasping it, applying rotational force, then pulling or pushing. The specific mechanics vary, but the action pattern is consistent.

The practical impact is substantial. A task that previously required 500 manual demonstrations can now be trained with 10-20 videos scraped from the internet. Training time drops from weeks to hours. More importantly, the model generalises better - it's seen hundreds of different people opening thousands of different doors, not just your specific training setup.

What This Means For Development Costs

The cost reduction is dramatic enough to change what's economically viable. Chan notes that data collection used to represent 60-70% of a robotics project's budget. With video action models, that drops to single digits. The expensive part becomes hardware and deployment, not training.
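As a back-of-envelope illustration of that shift (the 60-70% and "single digits" figures come from the article; the £1m project budget and the 5% assumption are round numbers chosen for the example):

```python
# Hypothetical robotics project budget, in pounds.
total_budget = 1_000_000

# Data collection share: traditionally 60-70% (take the midpoint),
# dropping to single digits (assume 5%) with video action models.
traditional_share = 0.65
video_model_share = 0.05

traditional_cost = total_budget * traditional_share
video_model_cost = total_budget * video_model_share
freed_up = traditional_cost - video_model_cost

print(f"Traditional data collection: £{traditional_cost:,.0f}")
print(f"With video action models:    £{video_model_cost:,.0f}")
print(f"Budget freed for hardware/deployment: £{freed_up:,.0f}")
```

On these assumptions, roughly £600,000 of a £1m project moves from data collection to hardware and deployment.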

This shifts who can afford to build robots. Small manufacturers, logistics companies, even individual developers can now train systems for specific tasks without needing a robotics lab and a team of PhD students collecting data for months. The barrier to entry just collapsed.

There's a catch, of course. Video action models work well for tasks humans do regularly and film frequently. Opening doors, picking up objects, basic manipulation - all well-covered on YouTube. Highly specialised industrial tasks with no public video record still need traditional data collection. But that's a much smaller set of use cases than most people assume.

The Robotics Data Problem Is Solved

The broader implication is that robots can now learn complex behaviours faster than humans can demonstrate them. A warehouse robot learning to handle packages doesn't need you to demonstrate every possible box size and weight combination. It watches a few thousand delivery videos and extrapolates the rest.

This is the pattern we've seen in other AI domains - foundation models trained on broad datasets outperforming narrow models trained on hand-curated data. The difference is that in robotics, the cost savings are immediate and measurable. Every manual demonstration you don't need to perform saves hours of labour and equipment time.

Chan's work suggests that the expensive part of robotics is shifting from software to hardware. Training is becoming cheap and fast. Manufacturing, deployment, and maintenance remain expensive. That's a different economics entirely - and one that favours production scale over research depth.

For business owners watching these developments, the question is no longer whether robots can learn a task, but whether deploying them makes economic sense. The training cost just stopped being the blocker.

About the Curator

Richard Bland
Founder, Marbl Codes

27+ years in software development, curating the tech news that matters.
