Artificial Intelligence · Friday, 15 May 2026

The Cloud Operations AI Can't Run Alone - And Why Read-Only Wasn't the Answer


A team gave AI agents write access to their cloud infrastructure. Ninety days later, they revoked it. Not because the AI made obvious mistakes - because it couldn't see what humans see.

The problem wasn't technical capability. The agents handled local operations beautifully: spinning up instances, adjusting configurations, managing logs. What broke them were the invisible dependencies - cross-account relationships, disaster recovery requirements, concurrent deployments happening elsewhere. Context that existed in Slack threads, tribal knowledge, and the collective memory of the ops team.

What Went Wrong

Three operations caused the rethink. First: an agent resized a database instance during what looked like low usage. It had local metrics showing idle capacity. What it couldn't see was a scheduled data migration happening in another account that needed that headroom. The migration failed halfway through.

Second: an agent cleaned up what appeared to be orphaned snapshots. They were part of a disaster recovery strategy that lived in documentation the agent hadn't ingested. Recovery time objectives doubled overnight.

Third: concurrent deployments. Two agents, working independently, both decided to update the same shared infrastructure component. Neither knew the other was working. The collision took the service down for three hours.

The pattern was clear: AI agents excel at local optimisation but fail at global coordination. They see the tree, not the forest. And in cloud operations, the forest matters more.

Why Read-Only Didn't Work Either

The obvious fix seemed simple - revoke write access, make the agents read-only observers who suggest changes to humans. But read-only agents are barely useful. They can't format changes, can't validate configurations, can't build the exact Terraform diff or API call sequence. They become idea generators, not execution partners. The human still does 80% of the grunt work.

The team found a middle path: read-and-propose. The agent still does 95% of the work - researching context, building the change, formatting the exact commands, validating syntax. But instead of executing, it presents a fully-formed proposal. A human reviews the formatted change in under a minute and approves or rejects. If approved, the agent executes.

This isn't just read-only with extra steps. The agent is doing real work - just stopping one click before execution. It handles all the tedious parts: looking up resource IDs, checking current state, building valid API calls, running dry-runs. The human reviews the plan, not the process.
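The workflow described above can be sketched in a few lines. This is a minimal illustration, not the team's actual implementation: the helper callables (`lookup`, `build`, `dry_run`, `approve`, `execute`) are hypothetical stand-ins for real cloud-API plumbing.

```python
from dataclasses import dataclass


@dataclass
class Change:
    resource_id: str
    commands: list[str]  # the exact, validated commands to run on approval
    dry_run_ok: bool     # did the change pass a dry-run before proposing?


def prepare(resource_id, lookup, build, dry_run):
    """Agent-side work: everything up to, but not including, execution."""
    state = lookup(resource_id)      # check current state
    commands = build(state)          # construct the exact command sequence
    ok = dry_run(commands)           # validate without applying anything
    return Change(resource_id, commands, ok)


def submit_if_approved(change, approve, execute):
    """One human click sits between a prepared change and execution."""
    if change.dry_run_ok and approve(change):
        execute(change.commands)     # agent runs the exact commands reviewed
        return True
    return False
```

The key property: `prepare` does all the tedious work, and `submit_if_approved` is the single gate where a human sees the finished change before anything touches production.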

The Implementation

The architecture is simpler than it sounds. Agents use Model Context Protocol with read-write separation: full read access to cloud APIs, logs, and documentation. Write operations go through a proposal queue. A human operator reviews the queue, sees formatted changes with context, approves or rejects with one click.
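The read-write separation can be sketched as a proposal queue. A hypothetical shape, assuming agents submit structured proposals and a human callback decides each one:

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    summary: str         # one line: what changes and why
    commands: list[str]  # the exact commands that will run on approval
    context: dict        # metrics, cost delta, dependency checks, revert plan


@dataclass
class ProposalQueue:
    """Write path for agents: they can enqueue, never execute directly."""
    pending: list[Proposal] = field(default_factory=list)
    executed: list[str] = field(default_factory=list)

    def submit(self, proposal):
        self.pending.append(proposal)

    def review(self, decide):
        """Human side: one decision per proposal; approved commands run."""
        for p in self.pending:
            if decide(p):  # True = approve, False = reject
                self.executed.extend(p.commands)
        self.pending.clear()
```

Reads go straight to the cloud APIs; only writes pass through this queue, which is what makes the one-click review possible.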

What makes this work is the proposal format. Not "I think we should resize this instance" - that's useless. Instead: "Resize instance i-abc123 from t3.large to t3.xlarge. Current CPU: 85% sustained. Cost increase: £42/month. Dependencies checked: none. Recovery plan: revert command included. Approve?"

Everything needed for a decision, formatted as an executable change. The human isn't reconstructing the work - they're validating it.
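A proposal like the one above could be rendered from structured fields. A small sketch, with hypothetical field names, of turning an action plus its supporting facts into that decision-ready format:

```python
def render_proposal(action, facts):
    """Render a decision-ready proposal: the action, every fact a
    reviewer needs (one clause each), then the approval prompt."""
    body = " ".join(f"{key}: {value}." for key, value in facts.items())
    return f"{action}. {body} Approve?"


message = render_proposal(
    "Resize instance i-abc123 from t3.large to t3.xlarge",
    {
        "Current CPU": "85% sustained",
        "Cost increase": "£42/month",
        "Dependencies checked": "none",
        "Recovery plan": "revert command included",
    },
)
```

Keeping the proposal structured rather than free-form is what makes the sub-minute review realistic: every field is a check the agent already performed, not a question the human has to go answer.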

What Changed

Operations that took 45 minutes now take 3 minutes of human time. The agent does the research, builds the change, formats the commands. The human brings the global context: "No, there's a migration tonight" or "Yes, but wait until after the deployment freeze". Decisions the agent can't make, but prep work it absolutely can.

Mistake rate dropped too. Not because agents stopped making errors - they still do - but because the review catches them. An agent suggested deleting unused EBS volumes last week. The proposal looked perfect: volumes unattached for 90 days, no snapshots, no dependencies. The human reviewing it remembered: those volumes are part of a cold standby system that only mounts during failover. The agent couldn't know that. The human could.

The team now runs 60-70 agent-proposed changes daily. Approval rate is around 85%. The 15% rejected aren't failures - they're the agent finding optimisation opportunities that don't fit current priorities, or missing context that isn't in any system. That's exactly what you want: automation doing the grunt work, humans providing judgment.

The Principle

This isn't about trusting or not trusting AI. It's about recognising what each side is good at. AI agents are exceptional at: gathering data, checking states, formatting changes, validating syntax, running pre-flight checks. They're terrible at: understanding implicit constraints, weighing competing priorities, knowing what's happening in the next room.

Read-and-propose puts each where they're strongest. The agent does 95% of the execution work. The human makes the 5% of decisions that require global context. Neither could do the other's job well, but together they're faster than either alone.

The real lesson from those 90 days: autonomy isn't binary. It's not "full control" or "no control". The sweet spot is somewhere in the middle - where the AI handles everything it can see, and the human handles everything it can't.



About the Curator

Richard Bland
Founder, Marbl Codes

27+ years in software development, curating the tech news that matters.