Voices & Thought Leaders Tuesday, 12 May 2026

TML-Interaction-Small: Audio, Video, Text at 200ms Intervals


Thinking Machines just released TML-Interaction-Small, a 276-billion-parameter mixture-of-experts model that processes audio, video, and text simultaneously at 200-millisecond intervals. It does not wait for you to finish speaking. It does not separate inputs into discrete turns. It runs continuously, processing everything at once, and responds when it has something to say.

This is different from existing multimodal models. GPT-4o Realtime and Gemini 3.1 process audio and video, but they operate in turns - you speak, they respond, boundaries are clear. TML-Interaction-Small removes those boundaries. It watches, listens, and thinks in parallel, updating its understanding continuously.

What Continuous Processing Enables

The 200ms interval means the model samples the world five times per second. Fast enough to catch interruptions, facial expressions, and gesture changes in real time. Fast enough to respond mid-sentence if context shifts.
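To make the sampling cadence concrete, here is a minimal sketch of a 5 Hz polling loop. Everything here is an assumption for illustration: `get_frame`, `get_audio_chunk`, and `model_step` are hypothetical stand-ins for real capture and inference calls, not Thinking Machines' API.

```python
import time

SAMPLE_INTERVAL_S = 0.2  # 200 ms -> five samples per second

def continuous_loop(get_frame, get_audio_chunk, model_step, duration_s=1.0):
    """Poll every input stream each 200 ms tick and feed all of it to the model.

    The model sees every modality on every tick and may return None when it
    has nothing to say - there are no explicit turns.
    """
    responses = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        tick_start = time.monotonic()
        frame = get_frame()
        audio = get_audio_chunk()
        reply = model_step(frame, audio)
        if reply is not None:
            responses.append(reply)
        # Sleep off whatever is left of the 200 ms budget before the next tick.
        elapsed = time.monotonic() - tick_start
        time.sleep(max(0.0, SAMPLE_INTERVAL_S - elapsed))
    return responses
```

The key design point is that the loop never blocks on "end of user input": sampling happens on a fixed clock, and responding is just one possible output of a tick.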

This unlocks three capabilities existing models struggle with: visual proactivity, continuous awareness, and background tool use.

Visual proactivity means the model can notice something in the video feed and comment without being prompted. If you are assembling furniture and reach for the wrong screw, the model can interject. If someone walks into frame during a video call, the model knows before you say anything.

Continuous awareness means the model maintains context across overlapping inputs. It does not reset between turns. If you start a sentence, get interrupted, then finish it thirty seconds later, the model remembers the first half. If you gesture at something while speaking, the model connects the gesture to the words without explicit linking.
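One way to picture "no reset between turns" is a sliding window of timestamped events shared across modalities. This is a hypothetical sketch of the idea, not the model's internal mechanism; the window length and event shape are assumptions.

```python
from collections import deque

class RollingContext:
    """A sliding window of timestamped events from all modalities.

    Because speech fragments and gestures live in one ordered buffer, a
    sentence interrupted and resumed thirty seconds later is still in scope.
    """

    def __init__(self, window_s=60.0):
        self.window_s = window_s
        self.events = deque()  # (timestamp, modality, payload), oldest first

    def add(self, timestamp, modality, payload):
        self.events.append((timestamp, modality, payload))
        # Evict anything that has aged out of the window.
        while self.events and timestamp - self.events[0][0] > self.window_s:
            self.events.popleft()

    def transcript(self):
        """All speech fragments still in the window, in arrival order."""
        return [p for _, m, p in self.events if m == "speech"]
```

The point of the shared buffer is that a gesture lands between two speech fragments in the same timeline, so connecting "this one" to a pointing gesture needs no explicit linking step.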

Background tool use means the model can trigger actions without waiting for conversation to pause. According to Thinking Machines, the model can search for information, generate code, or query databases while still processing audio and video. The tool execution happens in parallel with interaction, not sequentially.
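The parallel-tool pattern described above can be sketched with a worker pool: launch a tool call without blocking, then harvest finished results on a later tick. This illustrates the pattern only; the class and method names are invented for this example and say nothing about Thinking Machines' implementation.

```python
from concurrent.futures import ThreadPoolExecutor

class BackgroundTools:
    """Fire tool calls without pausing the interaction loop."""

    def __init__(self, max_workers=4):
        self.pool = ThreadPoolExecutor(max_workers=max_workers)
        self.pending = []

    def launch(self, tool_fn, *args):
        # Returns immediately; the 200 ms input loop keeps ticking
        # while the tool (search, code generation, DB query) runs.
        self.pending.append(self.pool.submit(tool_fn, *args))

    def collect_ready(self):
        """Harvest finished results on a tick; leave the rest running."""
        done = [f for f in self.pending if f.done()]
        self.pending = [f for f in self.pending if not f.done()]
        return [f.result() for f in done]
```

Because `launch` never blocks, tool execution overlaps with audio and video processing instead of serialising behind it.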

Benchmark Performance

TML-Interaction-Small outperforms GPT-4o Realtime and Gemini 3.1 on interaction benchmarks. The specific benchmarks measure interruption handling, multi-input synthesis, and response latency under continuous input conditions. These are not standard language model benchmarks - they test how well models handle messy, overlapping real-world interaction.

The model's mixture-of-experts architecture is key here. Different expert networks handle audio processing, video analysis, and language generation. Because they operate in parallel, the model can process all three input streams without bottlenecking on any single modality. The experts share learned representations but specialise in their domain.
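The routing idea, modality-specialist experts running in parallel and merging into a shared representation, can be sketched as follows. This is a toy illustration of the dispatch pattern under that assumption, not the actual TML architecture.

```python
from concurrent.futures import ThreadPoolExecutor

def fuse_streams(streams, experts):
    """Route each modality to its specialist expert, run them in parallel,
    and merge the outputs into one shared representation.

    `streams` maps modality name -> raw payload; `experts` maps modality
    name -> a callable specialising in that input.
    """
    with ThreadPoolExecutor(max_workers=len(streams)) as pool:
        futures = {
            modality: pool.submit(experts[modality], payload)
            for modality, payload in streams.items()
        }
        # No modality blocks another: each expert runs on its own worker,
        # so a slow video frame cannot stall audio processing.
        return {modality: f.result() for modality, f in futures.items()}
```

The bottleneck the article describes disappears precisely because the merge happens after all experts have run concurrently, not between them.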

Latency matters more in continuous interaction than in turn-based systems. A 200ms response delay is imperceptible in conversation. A 2-second delay breaks flow. TML-Interaction-Small is optimised for the former - fast enough to feel immediate, slow enough to process complex inputs properly.

What This Means for Builders

If you are building voice interfaces, video assistants, or collaborative tools, continuous processing changes the design space. You no longer need to manage turn-taking logic or explicit input boundaries. The model handles interruptions, overlapping speech, and multi-person conversations natively.

For customer service applications, this means agents can monitor calls in real-time and surface information proactively. For accessibility tools, this means interfaces that respond to gesture, speech, and context simultaneously. For collaborative software, this means assistants that watch your screen, listen to your explanations, and offer suggestions without being asked.

The challenge is interaction design. When the model can interject at any moment, how do you prevent it from being intrusive? When it processes everything continuously, how do you signal when it should stay quiet? These are not technical problems - they are human factors problems. The model is fast enough to interrupt naturally. Whether it should is a different question.

The Bigger Shift

We have spent years teaching models to wait their turn. Polite, structured, turn-based interaction. TML-Interaction-Small does the opposite - it processes everything, all the time, and jumps in when it has something useful to contribute.

That shift from reactive to proactive is significant. It moves AI interfaces from tools you invoke to collaborators that observe and assist. The model does not wait to be asked. It watches what you are doing, understands context, and offers help when relevant.

Whether people want that is unclear. Some tasks benefit from proactive assistance - technical support, tutoring, real-time collaboration. Others require tools that stay silent until summoned. The interaction model that works for air traffic control does not work for creative writing.

For developers, TML-Interaction-Small is worth testing in scenarios where continuous awareness adds value. Where interruption is not rude but helpful. Where processing multiple input streams simultaneously solves a real problem. The model is fast, capable, and built for exactly that use case. Whether your application needs it is the question to answer first.
