Three Frameworks Pass the Human Approval Test. Nine Don't.

If your AI agent needs permission before it acts, most frameworks aren't ready for production. A developer just scored twelve of them.

The audit tested LangGraph, Pydantic AI, Mastra, and nine others across six criteria. Can the approval survive a crash? Can you retry safely? Does the framework know what data it's asking for? Can the agent do other work while waiting? The answers expose a pattern: most frameworks treat human approval as an afterthought.

Eight frameworks scored below 10 out of 30. Some of them, tools people are using to build real systems, reduce human approval to a blocking input() call. If the process dies, the approval vanishes. If the user doesn't respond immediately, the entire agent freezes. If two approvals overlap, the system can't handle it.

What Production-Ready Looks Like

Three frameworks scored 15 or higher: LangGraph (18/30), Mastra (16/30), and Pydantic AI (15/30). They share a design philosophy: treat approval as an async operation with persistence.

LangGraph uses checkpoints. When the agent needs approval, it saves its state, pauses, and waits for a signal. The server can restart. The approval can take three days. When the human responds, the agent resumes from exactly where it stopped. No lost state, no brittle input loops.
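To make the checkpoint idea concrete, here is a minimal sketch in plain Python. It is not LangGraph's API: the JSON file store, the checkpoint fields, and the function names pause_for_approval and resume are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def pause_for_approval(store: Path, run_id: str, step: int, state: dict) -> None:
    """Persist everything needed to resume, then return (the process may exit)."""
    checkpoint = {
        "run_id": run_id,
        "step": step,
        "state": state,
        "status": "awaiting_approval",
    }
    tmp = store / f"{run_id}.json.tmp"
    tmp.write_text(json.dumps(checkpoint))
    # Rename into place so a reader never sees a half-written checkpoint.
    tmp.replace(store / f"{run_id}.json")


def resume(store: Path, run_id: str, approved: bool) -> dict:
    """Load the checkpoint (possibly in a new process) and record the decision."""
    path = store / f"{run_id}.json"
    checkpoint = json.loads(path.read_text())
    checkpoint["status"] = "approved" if approved else "rejected"
    if approved:
        checkpoint["step"] += 1  # move past the approval gate
    path.write_text(json.dumps(checkpoint))
    return checkpoint


def demo() -> dict:
    with tempfile.TemporaryDirectory() as d:
        store = Path(d)
        pause_for_approval(store, "run-1", step=3, state={"draft": "refund $40"})
        # The process could restart here; nothing lives only in memory.
        return resume(store, "run-1", approved=True)
```

The point of the sketch is that the waiting period holds no live process state, so it does not matter whether the answer arrives in three seconds or three days.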

Mastra separates approval requests from execution. The agent doesn't block - it hands the request to a channel and keeps running other tasks. When approval comes back, it picks up that thread. This is how you handle hundreds of concurrent approvals without grinding your system to a halt.
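The non-blocking pattern can be sketched with Python's asyncio. This is an illustration of the idea only, not Mastra's API (Mastra is a TypeScript framework); the task names and the approvals dictionary are hypothetical.

```python
import asyncio


async def approval_gated_task(name: str, approvals: dict, log: list) -> None:
    # Register a future for this request, then suspend only this task.
    fut = asyncio.get_running_loop().create_future()
    approvals[name] = fut
    decision = await fut
    log.append(f"{name}:{decision}")


async def other_work(log: list) -> None:
    for i in range(3):
        await asyncio.sleep(0)  # yield to the event loop
        log.append(f"work:{i}")


async def main() -> list:
    approvals: dict = {}
    log: list = []
    gated = asyncio.create_task(approval_gated_task("deploy", approvals, log))
    await other_work(log)  # runs while "deploy" is still waiting
    approvals["deploy"].set_result("approved")  # the human finally answers
    await gated
    return log
```

Running asyncio.run(main()) shows the unrelated work finishing before the approval arrives, which is the behavior a blocking input() call cannot provide.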

Pydantic AI enforces typed schemas. The agent specifies exactly what it's asking for. The human sees a structured approval request, not a vague string. The response comes back typed and validated. No parsing errors, no ambiguity about what "yes" means in context.
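A rough sketch of what a typed approval exchange looks like, using only standard-library dataclasses. It does not use Pydantic or Pydantic AI, and the refund scenario, the class names, and the parse_response helper are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(str, Enum):
    APPROVE = "approve"
    REJECT = "reject"


@dataclass(frozen=True)
class RefundApprovalRequest:
    order_id: str
    amount_cents: int
    reason: str


@dataclass(frozen=True)
class RefundApprovalResponse:
    decision: Decision
    approver: str
    max_amount_cents: int


def parse_response(raw: dict, request: RefundApprovalRequest) -> RefundApprovalResponse:
    """Validate a raw payload (for example a submitted form) against the expected shape."""
    decision = Decision(raw["decision"])  # ValueError on anything but approve/reject
    approver = raw["approver"]
    if not isinstance(approver, str) or not approver:
        raise ValueError("approver must be a non-empty string")
    limit = raw.get("max_amount_cents", request.amount_cents)
    if isinstance(limit, bool) or not isinstance(limit, int):
        raise ValueError("max_amount_cents must be an integer")
    if not 0 <= limit <= request.amount_cents:
        raise ValueError("max_amount_cents must be between 0 and the requested amount")
    return RefundApprovalResponse(decision, approver, limit)
```

A free-text "yes" fails validation here instead of being guessed at, which is the ambiguity a schema removes.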

The Scoring Breakdown

The audit measured six things:

Durability - Can the approval survive a crash? Most frameworks: no. They rely on in-memory state. If the process dies mid-approval, the request is gone. LangGraph and Mastra persist approval state to disk or a database.

Idempotency - Can you safely retry? If the human clicks "approve" twice, does the agent execute twice? If the network drops and the approval message gets resent, does the system handle it? Most frameworks don't. LangGraph does.
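One common way to get this property is an idempotency key: each approval request gets a unique ID, and a decision is only acted on the first time that ID is seen. The sketch below uses SQLite from the standard library; the table layout and function names are hypothetical and not taken from any of the audited frameworks.

```python
import sqlite3
from typing import Callable


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS decisions ("
        "approval_id TEXT PRIMARY KEY, decision TEXT NOT NULL)"
    )
    return conn


def record_decision(conn: sqlite3.Connection, approval_id: str, decision: str) -> bool:
    """Return True only for the first delivery of this approval_id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO decisions (approval_id, decision) VALUES (?, ?)",
            (approval_id, decision),
        )
    return cur.rowcount == 1


def handle_approval(
    conn: sqlite3.Connection,
    approval_id: str,
    decision: str,
    execute: Callable[[], None],
) -> str:
    if not record_decision(conn, approval_id, decision):
        return "duplicate_ignored"  # double click or resent message
    if decision == "approve":
        execute()
    return "processed"
```

Recording the decision before executing gives at-most-once behavior. A system that must guarantee the action eventually runs would also need to track completion, which this sketch leaves out.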

Typed Input/Output - Does the framework know what data it's requesting? Can the human see a structured form instead of a text box? Pydantic AI and Mastra enforce schemas. The rest treat approvals as untyped strings.

Channel Abstraction - Can the approval request go somewhere other than a blocking terminal input? Can you send it to a web UI, a Slack channel, an email? Mastra and LangGraph decouple approval from execution. The others lock you into synchronous flows.
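As a sketch of what channel abstraction means in code: the agent talks to an interface, and the web UI, Slack, or email delivery lives behind it. The ApprovalChannel protocol and InMemoryChannel class below are hypothetical names, not part of any audited framework.

```python
from typing import Protocol


class ApprovalChannel(Protocol):
    def send(self, approval_id: str, summary: str) -> None: ...


class InMemoryChannel:
    """Stand-in for a web UI, Slack, or email adapter."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.outbox: list[tuple[str, str]] = []

    def send(self, approval_id: str, summary: str) -> None:
        self.outbox.append((approval_id, summary))


def request_approval(channel: ApprovalChannel, approval_id: str, summary: str) -> dict:
    # The agent only knows the interface; it returns immediately with a pending
    # handle instead of blocking on terminal input.
    channel.send(approval_id, summary)
    return {"approval_id": approval_id, "status": "pending"}
```

Swapping the delivery mechanism then means writing another class with a send method, without touching the agent logic.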

Non-Blocking - Can the agent do other work while waiting for approval? Most frameworks: no. The entire agent pauses. Mastra and LangGraph keep running.

Multi-User Support - Can the system handle approvals from multiple users at once? Can it route requests to the right person? Most frameworks assume one agent, one user, one approval at a time. That doesn't scale.
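A minimal sketch of routing, assuming each approval is assigned to exactly one named approver. The ApprovalRouter class and its methods are hypothetical; real systems would add roles, delegation, and persistence.

```python
from dataclasses import dataclass, field


@dataclass
class ApprovalRouter:
    # approval_id -> user assigned to decide it
    pending: dict = field(default_factory=dict)

    def open(self, approval_id: str, approver: str) -> None:
        self.pending[approval_id] = approver

    def inbox(self, approver: str) -> list:
        """All approvals currently waiting on this user."""
        return sorted(a for a, user in self.pending.items() if user == approver)

    def decide(self, approval_id: str, user: str, decision: str) -> str:
        if self.pending.get(approval_id) != user:
            raise PermissionError(f"{user} is not the approver for {approval_id}")
        del self.pending[approval_id]
        return decision
```

Several requests can be open at once, each user sees only their own queue, and a decision from the wrong person is rejected rather than silently applied.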

Why This Matters

The gap between demo and production is durability. Demos work when everything goes right. Production systems work when things fail. If your approval mechanism can't survive a crash, a network drop, or a user walking away from their screen, you're not building something reliable.

The frameworks that score well treat approval as infrastructure, not a feature. They assume processes will die. They assume approvals will take hours or days. They assume multiple agents and multiple users. The ones that score badly assume none of that. They assume a happy path where the human is sitting at a terminal, ready to respond immediately, and nothing ever crashes.

For developers, the practical advice is simple: test your approval flow under failure. Kill the process mid-approval. Send two approval requests at once. Restart the server. If your framework can't handle it, you're one crash away from a lost approval and a very confused stakeholder.
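The "kill the process mid-approval" check can be automated. The sketch below is framework-agnostic and makes several assumptions: the child script stands in for your agent, it stores pending approvals in a SQLite file, and the table name and the "waiting" handshake line are invented for the example.

```python
import sqlite3
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for an agent: persist a pending approval, then wait.
CHILD = """
import sqlite3, sys, time
conn = sqlite3.connect(sys.argv[1])
conn.execute("CREATE TABLE approvals (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO approvals VALUES ('apr-1', 'pending')")
conn.commit()
print("waiting", flush=True)
time.sleep(60)  # pretend to wait for a human
"""


def pending_after_kill() -> list:
    """Start the agent, kill it mid-approval, and report what survived on disk."""
    with tempfile.TemporaryDirectory() as d:
        db = str(Path(d) / "approvals.db")
        proc = subprocess.Popen(
            [sys.executable, "-c", CHILD, db],
            stdout=subprocess.PIPE,
            text=True,
        )
        try:
            # Wait until the child says the request has been committed.
            assert proc.stdout.readline().strip() == "waiting"
        finally:
            proc.kill()  # simulate the crash
            proc.wait()
            proc.stdout.close()
        conn = sqlite3.connect(db)
        try:
            return conn.execute("SELECT id, status FROM approvals").fetchall()
        finally:
            conn.close()
```

If the function returns the pending request after the kill, the approval survived the crash. With a framework that keeps approval state only in memory, the equivalent check would find nothing to resume.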

Read the full audit on Dev.to for code examples and detailed scoring from each framework.