Web Development Monday, 25 May 2026

Three Frameworks Pass the Human Approval Test. Nine Don't.

If your AI agent needs permission before it acts, most frameworks aren't ready for production. A developer just scored twelve of them.

The audit tested LangGraph, Pydantic AI, Mastra, and nine others across six criteria. Can the approval survive a crash? Can you retry safely? Does the framework know what data it's asking for? Can the agent do other work while waiting? The answers expose a pattern: most frameworks treat human approval as an afterthought.

Eight frameworks scored below 10 out of 30. Some of them - tools people are using to build real systems - reduce human approval to a blocking input() call. If the process dies, the approval vanishes. If the user doesn't respond immediately, the entire agent freezes. If two approvals overlap, the system can't handle it.

What Production-Ready Looks Like

Three frameworks scored 15 or higher: LangGraph (18/30), Mastra (16/30), and Pydantic AI (15/30). They share a design philosophy: treat approval as an async operation with persistence.

LangGraph uses checkpoints. When the agent needs approval, it saves its state, pauses, and waits for a signal. The server can restart. The approval can take three days. When the human responds, the agent resumes from exactly where it stopped. No lost state, no brittle input loops.
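
The checkpoint idea can be sketched without any framework. What follows is a minimal illustration using only Python's standard library, not LangGraph's actual API: CheckpointStore, request_approval, and resume are hypothetical names, and SQLite stands in for whatever store a real deployment would use.

```python
import json
import sqlite3
from typing import Optional


class CheckpointStore:
    """Persists paused agent state so it outlives the process (hypothetical helper)."""

    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            "thread_id TEXT PRIMARY KEY, state TEXT NOT NULL)"
        )
        self.conn.commit()

    def save(self, thread_id: str, state: dict) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO checkpoints VALUES (?, ?)",
            (thread_id, json.dumps(state)),
        )
        self.conn.commit()

    def load(self, thread_id: str) -> Optional[dict]:
        row = self.conn.execute(
            "SELECT state FROM checkpoints WHERE thread_id = ?", (thread_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None


def request_approval(store: CheckpointStore, thread_id: str, action: str) -> None:
    """Save state and return immediately instead of blocking on input()."""
    store.save(thread_id, {"step": "awaiting_approval", "action": action})


def resume(store: CheckpointStore, thread_id: str, approved: bool) -> dict:
    """Reload the paused state, possibly in a brand-new process, and continue."""
    state = store.load(thread_id)
    if state is None or state["step"] != "awaiting_approval":
        raise LookupError(f"no pending approval for {thread_id}")
    state["step"] = "executed" if approved else "rejected"
    store.save(thread_id, state)
    return state
```

Because the pending request lives in the database rather than in a local variable, a second CheckpointStore opened on the same file after a restart can still find it and resume.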

Mastra separates approval requests from execution. The agent doesn't block - it hands the request to a channel and keeps running other tasks. When approval comes back, it picks up that thread. This is how you handle hundreds of concurrent approvals without grinding your system to a halt.
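
Mastra itself is a TypeScript framework, so the following is only a language-neutral illustration of the non-blocking idea, written in Python with asyncio. The task names, the approvals dictionary, and the simulated reviewer are all assumptions made for the sketch.

```python
import asyncio


async def risky_task(name: str, approvals: dict, log: list) -> None:
    """Parks on its own Future; only this task waits, not the event loop."""
    approvals[name] = asyncio.get_running_loop().create_future()
    log.append(f"{name}: approval requested")
    approved = await approvals[name]
    log.append(f"{name}: {'executed' if approved else 'cancelled'}")


async def routine_task(log: list) -> None:
    """Unrelated work that should not be held up by a pending approval."""
    log.append("routine: done")


async def reviewer(approvals: dict) -> None:
    """Stands in for a human who answers later through some channel."""
    await asyncio.sleep(0.05)
    approvals["deploy"].set_result(True)


async def main() -> list:
    approvals: dict = {}
    log: list = []
    await asyncio.gather(
        risky_task("deploy", approvals, log),
        routine_task(log),
        reviewer(approvals),
    )
    return log
```

Running asyncio.run(main()) shows the routine task finishing while the deploy request is still waiting, which is the behaviour a blocking input() call cannot give you.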

Pydantic AI enforces typed schemas. The agent specifies exactly what it's asking for. The human sees a structured approval request, not a vague string. The response comes back typed and validated. No parsing errors, no ambiguity about what "yes" means in context.
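
To make the typed-approval idea concrete, here is a small sketch that uses standard-library dataclasses as a stand-in for a schema library. It does not use Pydantic AI's API; RefundRequest, RefundResponse, Decision, and parse_response are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    REJECT = "reject"


@dataclass(frozen=True)
class RefundRequest:
    """What the agent is asking for, as fields rather than a free-text prompt."""

    order_id: str
    amount_cents: int
    reason: str

    def __post_init__(self) -> None:
        if self.amount_cents <= 0:
            raise ValueError("amount_cents must be positive")


@dataclass(frozen=True)
class RefundResponse:
    decision: Decision
    reviewer: str


def parse_response(raw: dict) -> RefundResponse:
    """Validate an untrusted reply; anything outside the schema raises ValueError."""
    reviewer = raw.get("reviewer")
    if not isinstance(reviewer, str) or not reviewer:
        raise ValueError("reviewer is required")
    # Decision(...) rejects values such as "yes" or None with a ValueError.
    return RefundResponse(decision=Decision(raw.get("decision")), reviewer=reviewer)
```

A reply of {"decision": "yes"} is rejected outright instead of being interpreted, so the agent never has to guess what the reviewer meant.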

The Scoring Breakdown

The audit measured six things:

Durability - Can the approval survive a crash? Most frameworks: no. They rely on in-memory state. If the process dies mid-approval, the request is gone. LangGraph and Mastra persist approval state to disk or a database.

Idempotency - Can you safely retry? If the human clicks "approve" twice, does the agent execute twice? If the network drops and the approval message gets resent, does the system handle it? Most frameworks don't. LangGraph does.
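
One common way to get this property is to key every decision by a request ID and record the outcome the first time it is handled. The sketch below assumes that approach; ApprovalLedger is a hypothetical class, not something taken from LangGraph, and it keeps its records in memory only to stay short.

```python
import threading
from typing import Callable


class ApprovalLedger:
    """Runs the approved action at most once per request ID (hypothetical helper)."""

    def __init__(self) -> None:
        self._outcomes: dict = {}
        self._lock = threading.Lock()
        self.executions = 0

    def handle(self, request_id: str, approved: bool, action: Callable[[], str]) -> str:
        # The lock makes check-and-record atomic within this process.
        with self._lock:
            if request_id in self._outcomes:
                # Duplicate click or resent message: return the recorded outcome.
                return self._outcomes[request_id]
            if approved:
                outcome = action()
                self.executions += 1
            else:
                outcome = "rejected"
            self._outcomes[request_id] = outcome
            return outcome
```

In a real system the ledger would need to live in durable storage, with the same check-and-record step done atomically there, so that a retry after a crash is also recognised as a duplicate.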

Typed Input/Output - Does the framework know what data it's requesting? Can the human see a structured form instead of a text box? Pydantic AI and Mastra enforce schemas. The rest treat approvals as untyped strings.

Channel Abstraction - Can the approval request go somewhere other than a blocking terminal input? Can you send it to a web UI, a Slack channel, an email? Mastra and LangGraph decouple approval from execution. The others lock you into synchronous flows.
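
Decoupling of this kind usually comes down to a small interface that the agent calls without knowing where the request ends up. The sketch below assumes that design; ApprovalChannel, ConsoleChannel, OutboxChannel, and ask_for_approval are hypothetical names, and the outbox is a placeholder for a real web, Slack, or email adapter.

```python
from typing import Protocol


class ApprovalChannel(Protocol):
    """Anything that can deliver an approval request to a human."""

    def send(self, request_id: str, summary: str) -> None: ...


class ConsoleChannel:
    """Prints the request; useful for local development only."""

    def send(self, request_id: str, summary: str) -> None:
        print(f"[approval {request_id}] {summary}")


class OutboxChannel:
    """Collects requests for later delivery; a stand-in for a real adapter."""

    def __init__(self) -> None:
        self.outbox: list = []

    def send(self, request_id: str, summary: str) -> None:
        self.outbox.append((request_id, summary))


def ask_for_approval(channel: ApprovalChannel, request_id: str, summary: str) -> None:
    """The agent depends on the interface only, never on a terminal."""
    channel.send(request_id, summary)
```

Swapping ConsoleChannel for another implementation changes where the request goes without touching the agent code that calls ask_for_approval.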

Non-Blocking - Can the agent do other work while waiting for approval? Most frameworks: no. The entire agent pauses. With Mastra and LangGraph, a pending approval does not hold up the rest of the system.

Multi-User Support - Can the system handle approvals from multiple users at once? Can it route requests to the right person? Most frameworks assume one agent, one user, one approval at a time. That doesn't scale.
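
Routing can be pictured as a per-reviewer inbox plus a check that only the assigned reviewer may decide. The following is a minimal sketch under that assumption; ApprovalRouter and its owners mapping (action type to reviewer) are hypothetical and not drawn from any of the audited frameworks.

```python
from collections import defaultdict


class ApprovalRouter:
    """Routes each request to the reviewer who owns that action type."""

    def __init__(self, owners: dict) -> None:
        self._owners = owners  # action type -> reviewer name
        self._inbox: dict = defaultdict(dict)  # reviewer -> {request_id: action_type}

    def submit(self, request_id: str, action_type: str) -> str:
        reviewer = self._owners[action_type]
        self._inbox[reviewer][request_id] = action_type
        return reviewer

    def pending_for(self, reviewer: str) -> list:
        return sorted(self._inbox.get(reviewer, {}))

    def decide(self, reviewer: str, request_id: str, approved: bool) -> str:
        # Only the assigned reviewer may decide a request.
        if request_id not in self._inbox.get(reviewer, {}):
            raise PermissionError(f"{reviewer} is not assigned to {request_id}")
        action_type = self._inbox[reviewer].pop(request_id)
        return f"{action_type}: {'approved' if approved else 'rejected'}"
```

Several requests can be open at once, each sitting in a different reviewer's inbox, and a decision from the wrong person is refused rather than silently applied.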

Why This Matters

The gap between demo and production is durability. Demos work when everything goes right. Production systems work when things fail. If your approval mechanism can't survive a crash, a network drop, or a user walking away from their screen, you're not building something reliable.

The frameworks that score well treat approval as infrastructure, not a feature. They assume processes will die. They assume approvals will take hours or days. They assume multiple agents and multiple users. The ones that score badly assume none of that - they assume a happy path where the human is sitting at a terminal, ready to respond immediately, and nothing ever crashes.

For developers, the practical advice is simple: test your approval flow under failure. Kill the process mid-approval. Send two approval requests at once. Restart the server. If your framework can't handle it, you're one crash away from a lost approval and a very confused stakeholder.
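
A kill-the-process drill of this kind can be automated. The sketch below is one possible harness, not taken from the audit: it starts a toy agent in a child process, kills it while the approval is pending, and reports what survived on disk. AGENT_SCRIPT, crash_drill, and the JSON file layout are all assumptions; in practice the child would be your real agent and the check would query your framework's own store.

```python
import json
import os
import subprocess
import sys
import tempfile
import time

# Toy agent: records a pending approval on disk, then waits on a "human".
AGENT_SCRIPT = """
import json, os, sys, time
path = sys.argv[1]
tmp = path + ".tmp"
with open(tmp, "w") as f:
    json.dump({"request_id": "r1", "status": "pending"}, f)
os.replace(tmp, path)  # atomic rename, so readers never see a half-written file
time.sleep(60)
"""


def crash_drill(timeout: float = 10.0) -> dict:
    """Start the agent, kill it mid-approval, and return what survived on disk."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "approval.json")
        proc = subprocess.Popen([sys.executable, "-c", AGENT_SCRIPT, path])
        try:
            deadline = time.monotonic() + timeout
            while not os.path.exists(path):
                if time.monotonic() > deadline:
                    raise TimeoutError("agent never recorded its approval request")
                time.sleep(0.05)
        finally:
            proc.kill()  # simulate the crash
            proc.wait()
        with open(path) as f:
            return json.load(f)
```

If the agent kept the request only in memory, there would be nothing left to read after the kill, which is exactly the failure the drill is meant to surface.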

Read the full audit on Dev.to for code examples and detailed scoring for each framework.

Today's Sources

Dev.to
How 12 AI Agent Frameworks Handle Human Approval (Most Badly)
arXiv cs.AI
BOHM: Zero-Cost Hierarchical Attribution for Compound AI Systems
arXiv cs.AI
NeuroNL2LTL: Neurosymbolic Framework for Natural Language to Formal Logic
arXiv cs.AI
RMA: An Agentic System for Research-Level Mathematical Problems
TechRadar
Can You Tell a Bot From a Human Online? 47% of People Cannot
TechCrunch
Everyone Is Navigating AI Security in Real Time - Even Google
arXiv – Quantum Physics
Sample-Efficient Benchmarking of Shallow All-to-All Random Quantum Circuits
arXiv – Quantum Physics
Unified Resonant-Manifold Framework for Dynamical Quantum Phase Transitions
arXiv – Quantum Physics
Quantum Fisher Information Under Decoherence With Explicit Wavefunctions
InfoQ
Google Introduces Middleware Architecture for Genkit Applications
The Pragmatic Engineer
Forward Deployed Engineering Heats Up Again
Dev.to
Insults & Cutlasses: Local LLM Sword Fighting on Melee Island
Dev.to
Production Lab: ECS Fargate + Prometheus + Grafana + Loki + Alloy + Node Exporter
arXiv cs.LG
Latent Cache Flow: Model-to-Model Communication Without Text

About the Curator

Richard Bland
Founder, Marbl Codes

27+ years in software development, curating the tech news that matters.

MEM Digital Ltd t/a Marbl Codes
Co. 13753194 (England & Wales)
VAT: 400325657
24-25 High Street, Wellingborough, NN8 4JZ
© 2026 MEM Digital Ltd