Builders & Makers Friday, 22 May 2026

Testing Google's Agent API for $0.37 - Every Bug You'll Hit


Stephen Sebastian ran Google's Antigravity managed agents against 14 services and spent $0.37 doing it. Then he wrote up every bug, every cost surprise, and every production readiness gap he found.

This is what good builder content looks like. Not "here's why agents are significant". More like "here's what broke when I tried to use them for dependency audits".

What Antigravity Actually Does

Antigravity is Google's managed runtime for AI agents. You define a task - in this case, auditing dependencies across multiple services - and the agent handles the execution. It's meant to abstract away the infrastructure complexity of running agents at scale.

The promise: you focus on what the agent should do, Google handles how it runs. The reality, as Sebastian found, is messier than the marketing.

He tested it on 14 different services, checking for outdated dependencies, security vulnerabilities, and configuration drift. Most runs cost between $0.02 and $0.04 in tokens, depending on the size of the codebase and the depth of the audit.

Total spend: $0.37. That's the interesting bit. We're at the point where you can test an AI agent system across a realistic workload for less than a coffee. The barrier to experimentation is essentially zero.

The Token Economics Breakdown

Across the 14 runs, that averages out to roughly $0.026 per run ($0.37 / 14). That's not compute cost - that's token cost. The agent is making multiple LLM calls per run: parsing the dependency file, cross-referencing versions, checking for known vulnerabilities, generating a report.

Sebastian breaks down the token usage per run. Most of the cost is in the analysis phase, not the output. The agent reads more than it writes. For a typical Node.js service with 50-70 dependencies, the agent used around 8,000 input tokens and 1,200 output tokens per audit.

At current pricing, that works out to about $0.04 per service. For a company with 100 microservices, running a weekly dependency audit would cost about $4, or roughly $16 to $17 a month. That's cheaper than paying someone to do it manually, and the agent runs the same checks every time.

But here's the catch Sebastian found: the cost scales non-linearly with complexity. Small services with well-structured dependency files cost $0.02. Large monorepos with nested dependencies and multiple package managers cost $0.08. The difference is in how many LLM calls the agent needs to make to understand the structure.
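To make the per-run figures concrete, here's a minimal sketch of the arithmetic in Python. The per-million-token prices are assumptions, picked only so that an 8,000-in / 1,200-out audit lands near the roughly $0.04 Sebastian reports; they are not Google's published rates, so swap in the real numbers for whichever model the agent runs on.

```python
# Assumed per-million-token prices, for illustration only.
# Replace with the published rates for the model the agent actually uses.
INPUT_PRICE_PER_M = 2.50
OUTPUT_PRICE_PER_M = 15.00


def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single agent run, given its token counts."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def fleet_cost(per_run: float, services: int, runs: float) -> float:
    """Cost of auditing every service `runs` times."""
    return per_run * services * runs


print(f"typical audit: ${run_cost(8_000, 1_200):.3f}")                  # $0.038
print(f"100 services, weekly: ${fleet_cost(0.04, 100, 1):.2f}")         # $4.00
print(f"100 services, monthly: ${fleet_cost(0.04, 100, 52 / 12):.2f}")  # $17.33
```

The monthly line uses 52/12 weeks per month; counting a month as exactly four weeks gives the rounder $16.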

The Bugs He Hit

Sebastian's writeup is valuable because he documents the failures, not just the successes. Antigravity isn't production-ready for every use case. Here's what broke:

Rate limiting: Running 14 audits in parallel hit Google's API rate limits immediately. The agent doesn't handle backoff gracefully - it just fails and returns an error. You need to implement your own retry logic.
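Until the runtime handles backoff itself, the retry has to live in your code. Here's a minimal sketch of exponential backoff with jitter in Python. `RateLimitError` is a hypothetical placeholder: map it to whatever exception your client library actually raises on a rate-limit response. The `sleep` parameter exists so the function can be tested without really waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the error your agent client raises on HTTP 429."""


def with_backoff(call: Callable[[], T], max_attempts: int = 5,
                 base_delay: float = 1.0, max_delay: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying on RateLimitError with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of retries: let the caller see the failure
            # The upper bound doubles each attempt (1s, 2s, 4s, ...), capped at max_delay.
            # Sleeping a random fraction of it stops parallel jobs retrying in lockstep.
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise RuntimeError("unreachable")
```

Wrap each audit call, e.g. `with_backoff(lambda: run_audit("billing-service"))`, where `run_audit` is a hypothetical function that starts one agent run. Capping how many audits run at once also means the backoff path gets hit less often in the first place.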

Timeout handling: Large codebases take longer to analyze than the default timeout allows. The agent gets cut off mid-analysis and returns incomplete results. There's no way to extend the timeout or resume from where it stopped.

Dependency resolution: The agent struggled with monorepos that use multiple package managers. It could handle npm or Yarn, but not both in the same run. It also missed workspace dependencies in Yarn 2+ configurations.
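One workaround is to split the repository by package manager before the agent sees it, so each run only has to deal with one. A minimal sketch, assuming the conventional lockfile names (`package-lock.json` for npm, `yarn.lock` for Yarn, `pnpm-lock.yaml` for pnpm); `package_managers` is a hypothetical helper, not part of Antigravity.

```python
from pathlib import Path

# Conventional lockfile names. yarn.lock is written by both Yarn Classic and Yarn 2+.
LOCKFILES = {
    "package-lock.json": "npm",
    "yarn.lock": "yarn",
    "pnpm-lock.yaml": "pnpm",
}


def package_managers(repo: Path) -> dict[str, list[Path]]:
    """Map each package manager found under `repo` to the directories that use it."""
    found: dict[str, list[Path]] = {}
    for lockfile, manager in LOCKFILES.items():
        for lock in sorted(repo.rglob(lockfile)):
            if "node_modules" in lock.relative_to(repo).parts:
                continue  # ignore lockfiles shipped inside installed packages
            found.setdefault(manager, []).append(lock.parent)
    return found
```

Each (manager, directory) pair can then go to the agent as its own run. This only locates lockfiles; Yarn 2+ workspace dependencies, declared in the root `package.json` `workspaces` field, still need checking separately.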

Output formatting: The agent returns results as unstructured text, not JSON. If you want to pipe the results into another system, you need to parse the text output yourself. That's fine for human review, less useful for automation.

Version comparison logic: The agent occasionally flagged dependencies as outdated when they weren't. It compared semantic versions incorrectly - treating 1.10.0 as older than 1.9.0 because it compared the version strings character by character instead of comparing each component as a number.
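The fix is to split the version into its numeric components and compare those. A minimal Python sketch of the difference, handling plain `MAJOR.MINOR.PATCH` strings only; pre-release tags such as `-rc.1` need a full semver implementation. The helper names are illustrative.

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Split 'MAJOR.MINOR.PATCH' into integers so components compare numerically.
    Pre-release and build suffixes (e.g. '-rc.1', '+build') are not handled."""
    return tuple(int(part) for part in v.split("."))


def is_outdated(installed: str, latest: str) -> bool:
    return parse_version(installed) < parse_version(latest)


# The bug: string comparison goes character by character, and '1' < '9'.
print("1.10.0" < "1.9.0")                                # True  (wrong)
print(parse_version("1.10.0") < parse_version("1.9.0"))  # False (correct)
print(is_outdated("1.9.0", "1.10.0"))                    # True
```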

The Production Readiness Checklist

Sebastian's conclusion: Antigravity works for one-off audits and manual reviews. It's not ready for automated, production-scale workflows. His checklist for what's missing:

Structured output: Agents need to return JSON, not prose. If the output is meant for another system, unstructured text doesn't cut it.
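On the consuming side, structured output means the report can be validated before anything acts on it. A minimal sketch, assuming a hypothetical report shape (a `findings` list whose entries have `package`, `installed`, `latest`, and `severity` fields). Per Sebastian's testing Antigravity doesn't return anything like this today, so the field names and severity levels are illustrative only.

```python
import json
from dataclasses import dataclass

# Illustrative severity levels, not taken from any Antigravity schema.
SEVERITIES = {"low", "medium", "high", "critical"}
FIELDS = ("package", "installed", "latest", "severity")


@dataclass(frozen=True)
class Finding:
    package: str
    installed: str
    latest: str
    severity: str


def parse_report(raw: str) -> list[Finding]:
    """Parse an agent's JSON report, raising ValueError if it doesn't match the schema."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or not isinstance(data.get("findings"), list):
        raise ValueError("expected an object with a 'findings' list")
    findings = []
    for item in data["findings"]:
        if not isinstance(item, dict):
            raise ValueError(f"finding is not an object: {item!r}")
        try:
            finding = Finding(**{key: item[key] for key in FIELDS})
        except KeyError as exc:
            raise ValueError(f"finding missing field {exc}") from exc
        if not all(isinstance(value, str) for value in vars(finding).values()):
            raise ValueError(f"non-string field in {item!r}")
        if finding.severity not in SEVERITIES:
            raise ValueError(f"unknown severity {finding.severity!r}")
        findings.append(finding)
    return findings
```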

Resumability: Long-running tasks need checkpoints. If an agent times out, it should resume from where it stopped, not restart from scratch.
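Resuming inside a single run needs support from the runtime, but the caller can checkpoint between runs today. A minimal sketch, assuming each service is audited by a caller-supplied `audit` function (a hypothetical stand-in for one agent run) and that progress is saved to a local JSON file after every service.

```python
import json
from pathlib import Path
from typing import Callable


def audit_all(services: list[str], audit: Callable[[str], str],
              checkpoint: Path) -> dict[str, str]:
    """Audit each service, saving results after every one so a rerun skips finished work."""
    done: dict[str, str] = (
        json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    )
    for name in services:
        if name in done:
            continue  # finished in an earlier run
        done[name] = audit(name)  # a timeout here leaves earlier results on disk
        checkpoint.write_text(json.dumps(done))  # persist progress before moving on
    return done
```

If the batch dies partway through, calling `audit_all` again with the same checkpoint file picks up at the first unfinished service rather than starting from scratch.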

Error handling: Rate limits, timeouts, and API failures need graceful degradation. Right now, the agent just fails. It should retry with backoff or return partial results.
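Returning partial results is something the caller can approximate: run the whole batch, keep what succeeded, and record what failed instead of letting one error sink everything. A minimal sketch; `audit` is a hypothetical stand-in for a single agent run.

```python
from typing import Callable


def audit_batch(services: list[str],
                audit: Callable[[str], str]) -> tuple[dict[str, str], dict[str, str]]:
    """Run every audit; return (reports, errors) instead of aborting on the first failure."""
    reports: dict[str, str] = {}
    errors: dict[str, str] = {}
    for name in services:
        try:
            reports[name] = audit(name)
        except Exception as exc:  # record the failure and keep going
            errors[name] = f"{type(exc).__name__}: {exc}"
    return reports, errors
```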

Cost visibility: You don't know the token cost until after the run completes. For production workflows, you need cost estimation upfront so you can set budgets and alert on overruns.
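Until the runtime offers estimates, a rough pre-flight check can run on the caller's side. A minimal sketch, assuming about four characters per token (a common rule of thumb; a real tokenizer will give different counts), a fixed 1,200 output tokens per audit based on Sebastian's typical run, and made-up per-million-token prices that are not Google's rates.

```python
# Assumed prices per million tokens, for illustration only.
INPUT_PRICE_PER_M = 2.50
OUTPUT_PRICE_PER_M = 15.00
CHARS_PER_TOKEN = 4  # rough heuristic; a real tokenizer will differ


def estimate_cost(input_text: str, output_tokens: int = 1_200) -> float:
    """Rough dollar estimate for one audit, before it runs."""
    input_tokens = len(input_text) // CHARS_PER_TOKEN
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def plan_within_budget(inputs: dict[str, str], budget: float) -> tuple[list[str], float]:
    """Pick services in order until the next one would push the estimate over budget."""
    planned: list[str] = []
    total = 0.0
    for name, text in inputs.items():
        cost = estimate_cost(text)
        if total + cost > budget:
            print(f"budget alert: {name} (~${cost:.3f}) would exceed ${budget:.2f}")
            break
        planned.append(name)
        total += cost
    return planned, total
```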

Observability: There's no way to see what the agent is doing while it runs. You get the final result, but not the intermediate steps. For debugging, that's a problem.
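For agents you run yourself, which is the route Sebastian recommends for production-critical work, step-level visibility can start as simply as logging around each step. A minimal sketch using Python's standard `logging` module; the `traced` decorator is illustrative, not part of any agent framework.

```python
import functools
import logging
import time
from typing import Any, Callable

log = logging.getLogger("agent")


def traced(step: Callable[..., Any]) -> Callable[..., Any]:
    """Log when each agent step starts, how long it took, and whether it failed."""
    @functools.wraps(step)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        log.info("start %s", step.__name__)
        try:
            result = step(*args, **kwargs)
        except Exception:
            log.exception("failed %s after %.2fs", step.__name__, time.perf_counter() - start)
            raise
        log.info("done %s in %.2fs", step.__name__, time.perf_counter() - start)
        return result
    return wrapper
```

Decorate each step function (for instance a hypothetical `check_vulnerabilities`) and call `logging.basicConfig(level=logging.INFO)` to see the trace while the agent runs.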

Why Managed Runtimes Change Workflows

The bigger point Sebastian makes is about managed runtimes in general. When you give an agent to Google to run, you lose control over execution. You can't inspect the runtime, debug the process, or optimize the infrastructure. You trade control for convenience.

For some use cases, that trade-off works. Running a dependency audit once a week? Fine. The cost is low, the task is simple, and you don't need custom infrastructure. But for complex, business-critical workflows, managed runtimes introduce risks. You're dependent on the provider's reliability, their pricing, and their feature roadmap.

Sebastian's advice: use managed agents for low-stakes automation and one-off tasks. For anything production-critical, run your own infrastructure. The cost might be higher upfront, but you control the failure modes.

What This Tells Us About Agent Maturity

The fact that you can test an agent system for $0.37 is remarkable. The fact that it still has these bugs is expected. We're early. The infrastructure is improving, but it's not polished yet.

What's useful about Sebastian's writeup is the specificity. He doesn't say "agents aren't ready" - he says "here's exactly what breaks and under what conditions". That's the kind of feedback that moves the ecosystem forward.

If you're building with agents, read his full breakdown. It's the best $0.37 someone else has spent for you this week.


Today's Sources

DEV.to AI
I Spent $0.37 Testing Google's Agent API on 14 Services
Towards Data Science
Lost in Translation: How AI Exposes the Rift Between Law and Logic
ML Mastery
Building Context-Aware Search in Python with LLM Embeddings + Metadata
The Robot Report
Brain Corp Partners with UC San Diego on Semantic Mapping for Robot Autonomy
ROS Discourse
MuJoCo + ROS2 Robotic Arm Workflow for Embodied AI
Hackaday Robotics
DIY Autonomous Submarine Navigates Using Colour Detection
ROS Discourse
Steam Deck as Teleoperation Controller for Robotics
Latent Space
Giving Agents Computers - Ivan Burazin, Daytona
Latent Space
[AINews] New AI Infra Unicorns: Exa, Modal, TurboPuffer

About the Curator

Richard Bland
Founder, Marbl Codes

27+ years in software development, curating the tech news that matters.

MEM Digital Ltd t/a Marbl Codes
Co. 13753194 (England & Wales)
VAT: 400325657
24-25 High Street, Wellingborough, NN8 4JZ
© 2026 MEM Digital Ltd