Artificial Intelligence Monday, 25 May 2026

Most AI Frameworks Treat Human Approval Like a Console Prompt

A developer just audited twelve AI agent frameworks to see how they handle human approval. Only three of them got it right.

The audit covered LangGraph, Pydantic AI, Mastra, and nine others - frameworks people actually use to build AI agents in production. The researcher scored each one out of 30 across six criteria, including durability (what happens if the system crashes mid-approval?), idempotency (can you safely retry?), typed input/output, channel abstraction, and whether the framework forces you to block the entire agent while waiting for a human.

Most of them failed spectacularly. Eight frameworks scored below 10 out of 30. The worst offenders reduce human approval to a literal input() call, which in Python stops the entire application to wait for someone to type something into a terminal. If the process dies, the approval request vanishes. If the user refreshes the page, the prompt is gone. If two requests come in at once, the second one overwrites the first.

This isn't theoretical. If you're building an AI agent that needs approval before spending money, modifying data, or taking any action with consequences, you need durability. The agent should be able to pause, store the approval request somewhere persistent, and resume when the human responds - even if that takes three days. Most of these frameworks can't do that without you rebuilding the entire approval system yourself.
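A minimal sketch of that pause, persist, and resume pattern, using only Python's standard library. The table layout and the function names (open_store, request_approval, record_decision, get_status) are assumptions made for illustration and are not taken from the audit or from any of the frameworks. Closing and reopening the database connection stands in for a process restart.

```python
import os
import sqlite3
import tempfile
import uuid


def open_store(path):
    """Open (or create) the approvals database at `path`."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS approvals ("
        " id TEXT PRIMARY KEY,"
        " action TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    conn.commit()
    return conn


def request_approval(conn, action):
    """Persist a pending request and return its id. Nothing blocks here."""
    request_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO approvals (id, action) VALUES (?, ?)", (request_id, action)
    )
    conn.commit()
    return request_id


def record_decision(conn, request_id, approved):
    """Store the human's answer whenever it arrives, even days later."""
    status = "approved" if approved else "rejected"
    conn.execute(
        "UPDATE approvals SET status = ? WHERE id = ? AND status = 'pending'",
        (status, request_id),
    )
    conn.commit()


def get_status(conn, request_id):
    """Return the stored status, or None if the request is unknown."""
    row = conn.execute(
        "SELECT status FROM approvals WHERE id = ?", (request_id,)
    ).fetchone()
    return row[0] if row else None


def demo():
    path = os.path.join(tempfile.mkdtemp(), "approvals.db")
    conn = open_store(path)
    request_id = request_approval(conn, "refund order 1234")
    conn.close()  # stands in for the process dying

    conn = open_store(path)  # stands in for a restart
    before = get_status(conn, request_id)
    record_decision(conn, request_id, approved=True)
    after = get_status(conn, request_id)
    conn.close()
    return before, after
```

Because the request lives in a file rather than in process memory, the agent that resumes does not need to be the same process that asked.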

The Three That Actually Work

Three frameworks scored 15 or higher: LangGraph (18/30), Mastra (16/30), and Pydantic AI (15/30). What they have in common: they treat human approval as a first-class async operation, not a blocking input call.

LangGraph uses persistent checkpoints. If your agent needs approval, it saves its state, pauses, and waits for a signal. The process can die. The server can restart. When the human clicks "approve", the agent picks up exactly where it left off. That's what production-ready looks like.
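The checkpoint idea can be sketched in plain Python. This is not LangGraph's API: the step names, the JSON file format, and the run function are all hypothetical, chosen only to show state being saved before a pause and reloaded on resume.

```python
import json
import os
import tempfile

# Hypothetical three-step agent; the middle step needs a human.
STEPS = ["draft_email", "await_approval", "send_email"]


def save_checkpoint(path, state):
    # Write to a temp file, then rename, so a crash cannot leave a
    # half-written checkpoint behind.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    if not os.path.exists(path):
        return {"next_step": 0, "approved": None, "log": []}
    with open(path) as f:
        return json.load(f)


def run(path, approval=None):
    """Run until finished or until approval is needed.

    Returns "paused" or "done". Each call may be a different process.
    """
    state = load_checkpoint(path)
    if approval is not None:
        state["approved"] = approval
    while state["next_step"] < len(STEPS):
        step = STEPS[state["next_step"]]
        if step == "await_approval":
            if state["approved"] is None:
                save_checkpoint(path, state)
                return "paused"  # safe to exit; state is on disk
            if not state["approved"]:
                state["log"].append("rejected")
                state["next_step"] = len(STEPS)
                break
        else:
            state["log"].append(step)
        state["next_step"] += 1
        save_checkpoint(path, state)
    save_checkpoint(path, state)
    return "done"


def demo():
    path = os.path.join(tempfile.mkdtemp(), "agent.json")
    first = run(path)                 # stops at the approval step
    second = run(path, approval=True)  # later call picks up from disk
    return first, second, load_checkpoint(path)["log"]
```

In this sketch the second call skips the already completed draft step, because the checkpoint records which step comes next.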

Mastra separates the approval request from the execution flow. The agent doesn't block - it hands the request off to a channel and keeps running other tasks. When approval comes back, it resumes. This is how you build systems that handle hundreds of approval requests without grinding to a halt.
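One way to picture the hand-off, sketched with Python's asyncio rather than Mastra's actual API. The approval_channel coroutine, the request id "req-1", and the task names are assumptions for illustration. The agent places the request on a queue, carries on with other work, and only awaits the answer when it needs it.

```python
import asyncio


async def approval_channel(queue, decisions):
    """Hypothetical channel: takes requests and resolves each one's future."""
    while True:
        request_id, future = await queue.get()
        await asyncio.sleep(0.01)  # stands in for a slow human
        future.set_result(decisions[request_id])


async def agent(queue, log):
    loop = asyncio.get_running_loop()
    future = loop.create_future()
    await queue.put(("req-1", future))  # hand off the request, do not wait yet

    # Other work proceeds while the approval is outstanding.
    for task in ("summarise inbox", "update calendar"):
        log.append(f"did: {task}")
        await asyncio.sleep(0)  # yield to the event loop

    approved = await future  # resume once the decision arrives
    log.append("approved" if approved else "rejected")


async def main():
    queue = asyncio.Queue()
    log = []
    channel = asyncio.create_task(approval_channel(queue, {"req-1": True}))
    await agent(queue, log)
    channel.cancel()
    return log
```

This sketch keeps everything in memory, so it illustrates the non-blocking part only; durability would still need a persistent store behind the channel.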

Pydantic AI uses typed input and output schemas. The agent knows exactly what it's asking for, and the human knows exactly what they're approving. No ambiguous strings, no parsing errors, no "did they mean yes or Yes or y?"
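The same idea can be approximated with standard-library dataclasses and an Enum. This is a sketch, not Pydantic AI's API: the field names (amount_cents, reviewer) and the parse_response helper are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    REJECT = "reject"


@dataclass(frozen=True)
class ApprovalRequest:
    """What the agent is asking for."""
    request_id: str
    action: str
    amount_cents: int


@dataclass(frozen=True)
class ApprovalResponse:
    """What the human decided."""
    request_id: str
    decision: Decision
    reviewer: str


def parse_response(payload: dict) -> ApprovalResponse:
    """Validate an untrusted payload.

    Raises KeyError for a missing field and ValueError for a decision
    that is not exactly "approve" or "reject".
    """
    return ApprovalResponse(
        request_id=str(payload["request_id"]),
        decision=Decision(payload["decision"]),
        reviewer=str(payload["reviewer"]),
    )
```

A payload with "decision": "Yes" is rejected with a ValueError instead of being guessed at, which is the point of having a closed set of decisions.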

Why This Matters Beyond Agents

The broader point here isn't just about AI frameworks. It's about how we're building tools for systems that need human oversight. If the default approach is a blocking input call, we're designing for demos, not production.

Real systems fail. Networks drop. Browsers crash. Users walk away from their screens. If your approval mechanism can't survive any of that, you're not building something reliable - you're building something that works until it doesn't, and then loses the approval request entirely.

The audit also exposes a gap in how these frameworks think about concurrency. Most of them assume one agent, one approval, one human, all in a single synchronous flow. But production systems run multiple agents. They handle multiple users. They need queues, retries, and state persistence. The frameworks that score well are the ones that thought about this from the start.
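A small sketch of how idempotency keys keep retries and simultaneous requests from clobbering each other. The schema, the key format ("order-1:refund"), and the submit and decide functions are assumptions for illustration; an in-memory SQLite database keeps the example self-contained.

```python
import sqlite3


def make_store():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE approvals ("
        " idem_key TEXT PRIMARY KEY,"
        " action TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    return conn


def submit(conn, idem_key, action):
    """Create the request once. A retry with the same key is a no-op.

    Returns True if a new row was created, False if it already existed.
    """
    cur = conn.execute(
        "INSERT OR IGNORE INTO approvals (idem_key, action) VALUES (?, ?)",
        (idem_key, action),
    )
    conn.commit()
    return cur.rowcount == 1


def decide(conn, idem_key, status):
    """Apply a decision only while the request is still pending.

    Returns True if the decision was applied, False if one was already
    recorded, so a duplicate click cannot flip the outcome.
    """
    cur = conn.execute(
        "UPDATE approvals SET status = ? WHERE idem_key = ? AND status = 'pending'",
        (status, idem_key),
    )
    conn.commit()
    return cur.rowcount == 1
```

Each request gets its own row, so two requests arriving together sit side by side, and a repeated submit or a second decision on the same key changes nothing.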

For developers building on these frameworks, the lesson is clear: test your approval flow under failure conditions before you ship. Kill the process mid-approval. Restart the server. Send two approval requests at once. If your system can't handle that, you're one crash away from a lost request and a very confused user.
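A rough sketch of the "kill the process mid-approval" check, assuming the approval request is stored in a SQLite file. The child script, the table layout, and the kill_mid_approval function are all hypothetical; a real test would launch your own agent process instead of the inline script.

```python
import os
import sqlite3
import subprocess
import sys
import tempfile

# Child process: persist a pending approval, then hang as if waiting
# for a human who never answers.
CHILD = """
import sqlite3, sys, time
conn = sqlite3.connect(sys.argv[1])
conn.execute("CREATE TABLE approvals (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO approvals VALUES ('req-1', 'pending')")
conn.commit()
print("ready", flush=True)
time.sleep(60)
"""


def kill_mid_approval():
    """Start the child, kill it while the approval is outstanding, and
    return the row that survived on disk (or None if it was lost)."""
    path = os.path.join(tempfile.mkdtemp(), "approvals.db")
    with subprocess.Popen(
        [sys.executable, "-c", CHILD, path],
        stdout=subprocess.PIPE,
        text=True,
    ) as proc:
        # Block until the child reports that the request is committed.
        ready = proc.stdout.readline().strip()
        proc.kill()  # simulate a crash
    if ready != "ready":
        raise RuntimeError("child process did not start correctly")

    conn = sqlite3.connect(path)
    row = conn.execute(
        "SELECT status FROM approvals WHERE id = 'req-1'"
    ).fetchone()
    conn.close()
    return row
```

If the function returns a pending row after the kill, the request survived the crash; a None result would mean the approval was lost with the process.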

Read the full audit on Dev.to for the detailed scoring breakdown and code examples from each framework.

Today's Sources

Dev.to
How 12 AI Agent Frameworks Handle Human Approval (Most Badly)
arXiv cs.AI
BOHM: Zero-Cost Hierarchical Attribution for Compound AI Systems
arXiv cs.AI
NeuroNL2LTL: Neurosymbolic Framework for Natural Language to Formal Logic
arXiv cs.AI
RMA: An Agentic System for Research-Level Mathematical Problems
TechRadar
Can You Tell a Bot From a Human Online? 47% of People Cannot
TechCrunch
Everyone Is Navigating AI Security in Real Time - Even Google
arXiv – Quantum Physics
Sample-Efficient Benchmarking of Shallow All-to-All Random Quantum Circuits
arXiv – Quantum Physics
Unified Resonant-Manifold Framework for Dynamical Quantum Phase Transitions
arXiv – Quantum Physics
Quantum Fisher Information Under Decoherence With Explicit Wavefunctions
InfoQ
Google Introduces Middleware Architecture for Genkit Applications
The Pragmatic Engineer
Forward Deployed Engineering Heats Up Again
Dev.to
Insults & Cutlasses: Local LLM Sword Fighting on Melee Island
Dev.to
Production Lab: ECS Fargate + Prometheus + Grafana + Loki + Alloy + Node Exporter
arXiv cs.LG
Latent Cache Flow: Model-to-Model Communication Without Text

About the Curator

Richard Bland
Founder, Marbl Codes

27+ years in software development, curating the tech news that matters.

MEM Digital Ltd t/a Marbl Codes
Co. 13753194 (England & Wales)
VAT: 400325657
24-25 High Street, Wellingborough, NN8 4JZ
© 2026 MEM Digital Ltd