A team gave AI agents write access to their cloud infrastructure. Ninety days later, they revoked it. Not because the AI made obvious mistakes, but because it couldn't see what humans see.
The problem wasn't technical capability. The agents handled local operations beautifully: spinning up instances, adjusting configurations, managing logs. What broke them were the invisible dependencies - cross-account relationships, disaster recovery requirements, concurrent deployments happening elsewhere. Context that existed in Slack threads, tribal knowledge, and the collective memory of the ops team.
What Went Wrong
Three operations caused the rethink. First: an agent resized a database instance during what looked like low usage. It had local metrics showing idle capacity. What it couldn't see was a scheduled data migration happening in another account that needed that headroom. The migration failed halfway through.
Second: an agent cleaned up what appeared to be orphaned snapshots. They were part of a disaster recovery strategy that lived in documentation the agent hadn't ingested. Recovery time objectives doubled overnight.
Third: concurrent deployments. Two agents, working independently, both decided to update the same shared infrastructure component. Neither agent knew the other was working on it. The collision took the service down for three hours.
The pattern was clear: AI agents excel at local optimisation but fail at global coordination. They see the trees, not the forest. And in cloud operations, the forest matters more.
Why Read-Only Didn't Work Either
The obvious fix seemed simple - revoke write access, make the agents read-only observers who suggest changes to humans. But read-only agents are barely useful. They can't format changes, can't validate configurations, can't build the exact Terraform diff or API call sequence. They become idea generators, not execution partners. The human still does 80% of the grunt work.
The team found a middle path: read-and-propose. The agent still does 95% of the work - researching context, building the change, formatting the exact commands, validating syntax. But instead of executing, it presents a fully-formed proposal. A human reviews the formatted change in under a minute and approves or rejects. If approved, the agent executes.
This isn't just read-only with extra steps. The agent is doing real work - just stopping one click before execution. It handles all the tedious parts: looking up resource IDs, checking current state, building valid API calls, running dry-runs. The human reviews the plan, not the process.
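In code terms, the shape of that workflow might look something like the sketch below, using boto3 against EC2 as a stand-in for whatever cloud the team was actually on. The function and field names are illustrative, not their tooling: the agent reads current state, dry-runs the change, and emits a proposal instead of executing.

```python
import boto3
from botocore.exceptions import ClientError

ec2 = boto3.client("ec2")

def build_resize_proposal(instance_id: str, target_type: str) -> dict:
    """Do the research and validation, then stop one click before execution."""
    # Read access: look up the instance's current type.
    described = ec2.describe_instances(InstanceIds=[instance_id])
    current_type = described["Reservations"][0]["Instances"][0]["InstanceType"]

    # Pre-flight check: dry-run the write call without executing it.
    # EC2 signals a passing dry run by raising DryRunOperation.
    dry_run_passed = False
    try:
        ec2.modify_instance_attribute(
            InstanceId=instance_id,
            InstanceType={"Value": target_type},
            DryRun=True,
        )
    except ClientError as err:
        dry_run_passed = err.response["Error"]["Code"] == "DryRunOperation"

    # The output is a proposal, not an action: everything a reviewer needs,
    # including the revert path, formatted as an executable change.
    return {
        "action": "resize_instance",
        "resource": instance_id,
        "change": {"from": current_type, "to": target_type},
        "dry_run_passed": dry_run_passed,
        "revert": {"action": "resize_instance", "to": current_type},
    }
```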
The Implementation
The architecture is simpler than it sounds. Agents use the Model Context Protocol (MCP) with read-write separation: full read access to cloud APIs, logs, and documentation, while write operations go through a proposal queue. A human operator reviews the queue, sees each formatted change with its context, and approves or rejects with one click.
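A minimal sketch of that write gate might look like this, assuming a simple in-memory queue (the article doesn't describe the team's actual queue or MCP wiring, so the class and field names are assumptions):

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Proposal:
    description: str              # human-readable summary shown to the reviewer
    execute: Callable[[], None]   # the fully-built write operation
    revert: Callable[[], None]    # pre-built rollback for the recovery plan
    id: str = field(default_factory=lambda: uuid.uuid4().hex[:8])

class ProposalQueue:
    """Write operations wait here; only an approval triggers execution."""

    def __init__(self) -> None:
        self.pending: dict[str, Proposal] = {}

    def submit(self, proposal: Proposal) -> str:
        # The agent's last step: enqueue the change instead of running it.
        self.pending[proposal.id] = proposal
        return proposal.id

    def approve(self, proposal_id: str) -> None:
        # Human clicked approve: the agent-built change runs unmodified.
        self.pending.pop(proposal_id).execute()

    def reject(self, proposal_id: str) -> None:
        # Human rejected: the change is dropped, nothing is executed.
        self.pending.pop(proposal_id)
```

The design choice that matters is that approval runs the agent's change exactly as proposed; the human never has to rebuild or retype anything.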
What makes this work is the proposal format. Not "I think we should resize this instance" - that's useless. Instead: "Resize instance i-abc123 from t3.large to t3.xlarge. Current CPU: 85% sustained. Cost increase: £42/month. Dependencies checked: none. Recovery plan: revert command included. Approve?"
Everything needed for a decision, formatted as an executable change. The human isn't reconstructing the work - they're validating it.
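That one-line summary could be rendered from a structured payload along these lines (the field names are illustrative; the article doesn't specify the team's schema):

```python
# Hypothetical structured form of the example proposal above.
proposal = {
    "action": "resize_instance",
    "resource": "i-abc123",
    "change": {"from": "t3.large", "to": "t3.xlarge"},
    "evidence": {"cpu_sustained": "85%"},
    "cost_delta_gbp_per_month": 42,
    "dependencies_checked": [],   # none found
    "revert": {"action": "resize_instance", "to": "t3.large"},
}
```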
What Changed
Operations that took 45 minutes now take 3 minutes of human time. The agent does the research, builds the change, formats the commands. The human brings the global context: "No, there's a migration tonight" or "Yes, but wait until after the deployment freeze". Decisions the agent can't make, but prep work it absolutely can.
The mistake rate dropped too. Not because agents stopped making errors - they still do - but because the review catches them. Last week an agent suggested deleting unused EBS volumes. The proposal looked perfect: volumes unattached for 90 days, no snapshots, no dependencies. The human reviewing it remembered that those volumes are part of a cold standby system that only mounts during failover. The agent couldn't know that. The human could.
The team now runs 60-70 agent-proposed changes daily. Approval rate is around 85%. The 15% rejected aren't failures - they're the agent finding optimisation opportunities that don't fit current priorities, or missing context that isn't in any system. That's exactly what you want: automation doing the grunt work, humans providing judgment.
The Principle
This isn't about trusting or not trusting AI. It's about recognising what each side is good at. AI agents are exceptional at: gathering data, checking states, formatting changes, validating syntax, running pre-flight checks. They're terrible at: understanding implicit constraints, weighing competing priorities, knowing what's happening in the next room.
Read-and-propose puts each where they're strongest. The agent does 95% of the execution work. The human makes the 5% of decisions that require global context. Neither could do the other's job well, but together they're faster than either alone.
The real lesson from those 90 days: autonomy isn't binary. It's not "full control" or "no control". The sweet spot is somewhere in the middle - where the AI handles everything it can see, and the human handles everything it can't.