AI Agents That Debug Running Code - Not Just Read It

Most AI coding assistants work like brilliant proofreaders - they read your code, suggest improvements, spot patterns. But they're looking at a snapshot, not the living thing. They can't see what your program is actually doing when it runs.

A developer has built girb-mcp, a tool that changes this. It connects AI agents directly to running Ruby processes - not the source code, but the actual execution. The agent can inspect runtime state, set breakpoints, and watch what's happening inside a live program.

This is a different kind of debugging. Instead of analysing static code and guessing what might go wrong, the AI observes the real behaviour. It's the difference between reading a recipe and watching someone cook.

Why Runtime Access Matters

Static analysis has limits. An AI reading your code can spot a method call that might land on nil, but it can't tell you why that value is nil in production. It can suggest a fix, but it's working from theory.

With runtime access, the AI sees the actual state of the program when things go wrong. It can inspect variables at the moment of failure, trace execution paths that were actually taken, and understand context that only exists when the code is running.

For Ruby developers, this means an AI agent could connect to a running Rails application, set a breakpoint in a problematic controller action, and examine the exact state of objects when a bug occurs. Not a simulation - the real thing.

The Model Context Protocol Connection

This tool uses the Model Context Protocol (MCP), an open standard introduced by Anthropic for giving AI agents controlled access to external systems. MCP defines how an agent discovers the tools a server exposes, calls them, and gets results back from systems it doesn't control.
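On the wire, MCP messages are JSON-RPC 2.0 objects, and an agent invokes one of a server's tools with a tools/call request. The sketch below builds such a request with Ruby's standard json library. The tool name inspect_variable and its name argument are hypothetical placeholders, not girb-mcp's actual tool list.

```ruby
require "json"

# Build an MCP "tools/call" request as a JSON-RPC 2.0 message.
# The tool name and arguments used below are hypothetical, not girb-mcp's real API.
def tool_call_request(id, tool_name, arguments)
  {
    jsonrpc: "2.0",
    id: id,                 # lets the client match the server's response to this request
    method: "tools/call",   # MCP method for invoking a server-side tool
    params: { name: tool_name, arguments: arguments }
  }.to_json
end

request = tool_call_request(1, "inspect_variable", { name: "order" })
puts request
```

A real client would send this over whichever transport the server uses (stdio, for example) and read back a JSON-RPC response carrying the tool's result.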

In this case, the MCP server acts as a bridge between the AI agent and Ruby's interactive debugger. The agent sends requests - "show me the value of this variable", "set a breakpoint here", "step through this method" - and the server translates those into debugger commands.
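Here is a minimal sketch of that translation step. It assumes the server drives Ruby's debug gem (rdbg), whose console accepts commands such as break FILE:LINE, step, bt and p EXPR. The DebuggerBridge class, its tool names and the example file path are hypothetical. This is an illustration of the idea, not necessarily how girb-mcp is implemented.

```ruby
# Hypothetical bridge: turns an agent's tool request into one console command
# for Ruby's debug gem. Tool names are illustrative, not girb-mcp's real ones.
class DebuggerBridge
  # A bare local or instance variable name, e.g. "order" or "@user".
  IDENTIFIER = /\A@?[a-z_]\w*\z/

  def translate(tool_name, args = {})
    case tool_name
    when "inspect_variable"
      name = args.fetch("name")
      # Only accept a variable name, so this tool can't smuggle in arbitrary Ruby.
      raise ArgumentError, "not a variable name: #{name}" unless name.match?(IDENTIFIER)
      "p #{name}"
    when "set_breakpoint"
      # Integer() rejects non-numeric line values instead of passing them through.
      "break #{args.fetch('file')}:#{Integer(args.fetch('line'))}"
    when "step"
      "step"
    when "backtrace"
      "bt"
    else
      raise ArgumentError, "unknown tool: #{tool_name}"
    end
  end
end

bridge = DebuggerBridge.new
puts bridge.translate("set_breakpoint", { "file" => "app/controllers/orders_controller.rb", "line" => 42 })
puts bridge.translate("inspect_variable", { "name" => "order" })
```

Keeping the mapping explicit gives each tool one well-defined effect on the debugger. An unrecognised request fails loudly instead of being forwarded.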

This matters because it's supervised. The agent works through the debugging session rather than having free rein over the machine, and the intent is to keep it to inspection and observation - powerful, but controlled.
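One way to make supervision concrete is a gate that every request passes through before it reaches the debugger. The gate combines an allowlist of observation tools with an audit trail of every request. The SupervisedSession class and the tool names below are hypothetical. The sketch shows the idea and is not girb-mcp's actual mechanism.

```ruby
require "set"

# Hypothetical supervision layer: only observation tools are permitted, and
# every request (allowed or refused) is logged so a human can audit the session.
class SupervisedSession
  OBSERVATION_TOOLS = Set["inspect_variable", "list_locals", "backtrace", "set_breakpoint", "step"].freeze

  attr_reader :audit_log

  def initialize
    @audit_log = []
  end

  # Returns true if the tool may run; records the decision either way.
  def authorize(tool_name)
    allowed = OBSERVATION_TOOLS.include?(tool_name)
    @audit_log << { tool: tool_name, allowed: allowed, at: Time.now }
    allowed
  end
end

session = SupervisedSession.new
session.authorize("backtrace") # => true
session.authorize("eval_ruby") # => false, not an observation tool
```

Logging refused requests as well as permitted ones matters: a pattern of refusals tells the supervising developer the agent is reaching beyond what the task needs.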

Autonomous Debugging, Not Just Assistance

The real shift here is autonomy. Current AI coding tools are assistants - they make suggestions, you evaluate them. This approach moves towards autonomous debugging - the AI investigates the problem itself, following its own reasoning through the live system.

In principle, an agent could connect to a failing process, identify which method is raising the exception, inspect the state at that point, trace back through the call stack to find where bad data entered the system, and propose a fix based on actual runtime evidence.
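Much of the evidence in the first few steps is available from Ruby itself. The sketch below swaps the debugger for plain exception introspection, using NameError#name, NameError#receiver and Exception#backtrace_locations. It shows what an agent would collect at the failure point: the method that was called, the object it was called on, and the chain of callers. The order_total, checkout and investigate methods are hypothetical stand-ins for failing application code and an agent's probe.

```ruby
# Hypothetical failing code: an order whose items list was never populated.
def order_total(order)
  order[:items].sum { |item| item[:price] }
end

def checkout(order)
  order_total(order)
end

# Run a block and, if it fails with NoMethodError, gather the runtime evidence.
def investigate
  yield
  nil
rescue NoMethodError => e
  {
    missing_method: e.name,   # the method that was called...
    receiver: e.receiver,     # ...and the object it was called on
    # Innermost frame first: where it failed, then each caller in turn.
    call_stack: e.backtrace_locations.map { |loc| "#{loc.label}:#{loc.lineno}" }
  }
end

report = investigate { checkout({ items: nil }) }
p report[:missing_method], report[:receiver]
puts report[:call_stack].first(3)
```

The report shows that sum was called on nil inside order_total, which was reached via checkout. Frame labels may include the class name on newer Rubies. Working out where the nil first entered still means walking back up those frames and inspecting their state, which is exactly what a live debugger session adds.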

For developers, this could mean pointing an AI at a production issue and saying "work out what's wrong" - then returning to a report that shows exactly what it found and why.

The Practical Question

This is experimental. Ruby's debugging tools are mature, but connecting them to autonomous agents is new territory. The obvious questions: How reliable is the agent's reasoning when observing live state? How do you prevent it from disrupting the running process? What happens when it misinterprets what it sees?

There's also the trust question. Developers are used to being in the debugger, watching their own reasoning unfold. Handing that control to an AI - even a supervised one - requires confidence that it's drawing the right conclusions from what it observes.

But the potential is clear. Debugging is pattern recognition and hypothesis testing - exactly what modern AI agents are increasingly good at. Giving them access to runtime state, not just source code, could make them significantly more effective.

For Ruby developers working on complex systems, this could be the difference between spending hours tracing an elusive bug and having an agent narrow it down to the exact line where things went wrong. Not magic - just observation at a scale and speed humans can't match.

The code is experimental, but the principle is sound: an AI agent that debugs by watching your program run works from evidence rather than inference, which makes it fundamentally more capable than one that only reads your code.