Artificial Intelligence Thursday, 9 April 2026

The $0 Agent: How Tiered Model Routing Kills AI Bills


Here's a question that should make every startup builder pause: what if most of your AI agent's tasks don't actually need an AI?

A recent breakdown from freeCodeCamp walks through building an SEO analysis agent that went from $0.006 per URL to $0 per URL by asking one simple question at every step: does this task actually require a language model?

The answer, most of the time, was no.

The Three-Tier Strategy

The approach is straightforward: route each task to the cheapest thing that can solve it. First, try pure Python - regular expressions, string parsing, basic logic. If that doesn't work, escalate to Claude Haiku, the fast and cheap model. Only when both fail do you reach for Claude Sonnet, the expensive one.
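As a minimal sketch, that routing loop might look like the following. The task names are invented for illustration, and `call_haiku` / `call_sonnet` are hypothetical stand-ins for whatever model-client wrappers you actually use:

```python
def try_python(task, text):
    """Tier 1: deterministic handlers for tasks that need no model."""
    if task == "word_count":
        return len(text.split())
    return None  # signal: this tier can't handle the task


def call_haiku(task, text):
    """Placeholder for the cheap-model API call."""
    return f"haiku:{task}"


def call_sonnet(task, text):
    """Placeholder for the expensive-model API call."""
    return f"sonnet:{task}"


# Tasks you've verified the small model handles fine.
HAIKU_TASKS = {"semantic_relevance"}


def route(task, text):
    # Cheapest thing first: pure Python.
    result = try_python(task, text)
    if result is not None:
        return result
    # Next: the fast, cheap model.
    if task in HAIKU_TASKS:
        return call_haiku(task, text)
    # Last resort: the expensive model.
    return call_sonnet(task, text)
```

The key design choice is that each tier signals "can't handle this" (here, by returning `None`) rather than guessing, so escalation stays explicit and auditable.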

For the SEO agent, the breakdown looked like this: extracting metadata from HTML? Python's BeautifulSoup handles it. Checking keyword density? String operations and counters. Analysing semantic relevance between title and content? Now you need Haiku. Generating strategic recommendations based on competitive analysis? That's Sonnet territory.
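The Tier 1 tasks really are this small. A sketch of the first two, using only the standard library for portability (the guide itself reaches for BeautifulSoup, and a regex on HTML is fine only for a rough pass like this):

```python
import re
from collections import Counter


def extract_title(html):
    """Pull the <title> text from a page. A plain regex is enough
    for well-formed pages; BeautifulSoup is the sturdier choice."""
    m = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return m.group(1).strip() if m else None


def keyword_density(text, keyword):
    """Fraction of words in `text` that match `keyword` - pure string
    work, no model required."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)
```

Neither function costs anything per call, and both fail loudly and locally when the input is malformed.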

The result wasn't just cost savings. The agent ran faster because most tasks bypassed the API entirely. Error handling got simpler because there were fewer external calls to manage. And the whole system became easier to debug - when something breaks in a regex, you know exactly where to look.

What This Means for Builders

The insight here isn't really about SEO agents. It's about architecture. Most AI agents in production right now are over-engineered - every task gets routed to the same expensive model because that's the path of least resistance when you're prototyping.

But production is different. In production, you have real usage patterns. You know which tasks repeat ten thousand times a day. You know which outputs are predictable enough that a smaller model handles them fine. You know where the edge cases hide.

That knowledge is leverage. Use it.

The three-tier pattern creates natural checkpoints. Before you send anything to an LLM, ask: could Python do this? If you're using Sonnet, ask: would Haiku work? These aren't rhetorical questions. Test them. In many systems, a large share of LLM calls can be downgraded without any measurable loss in output quality.
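Those downgrade tests can be automated. A hedged sketch, where `run_small` and `run_large` wrap the two models (stubbed in the example below) and `judge` is whatever equivalence check makes sense for your outputs:

```python
def can_downgrade(samples, run_small, run_large, judge, threshold=0.95):
    """Return True if the cheaper model's output matches the expensive
    model's on at least `threshold` of a sample of real inputs."""
    matches = sum(1 for s in samples if judge(run_small(s), run_large(s)))
    return matches / len(samples) >= threshold
```

Run it over a few hundred logged production requests before flipping a task down a tier; the sample should reflect real usage, not hand-picked easy cases.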

The Real Savings

Going from $0.006 to $0 per URL sounds small until you multiply it. Process a million URLs and you've saved $6,000. Process ten million and you're talking about meaningful infrastructure costs - the difference between sustainable and unsustainable unit economics for a bootstrapped product.

But the bigger win is speed. API calls have latency - even fast models take 200-500ms per request. Python runs in single-digit milliseconds. When you're chaining together five or six tasks to complete one user request, cutting out three LLM calls can halve your response time.

For business owners watching API bills climb every month, this is the pattern to steal. You don't need a new model. You don't need more compute. You need better routing logic.

Where This Falls Apart

Tiered routing isn't free. You're trading dollars for complexity. Now you have three systems to maintain instead of one. Now you need fallback logic when Python fails, escalation logic when Haiku isn't good enough, and monitoring to catch when Sonnet gets called more often than expected.
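A sketch of what that fallback-and-monitoring overhead looks like in practice (the tier names and handler signature here are assumptions for illustration, not the guide's API):

```python
from collections import Counter

# Monitoring: how often each tier actually fires. Alert when the
# expensive tier's count grows faster than expected.
tier_calls = Counter()


def route_with_fallback(task, handlers):
    """handlers: ordered (tier_name, handler) pairs, cheapest first.
    A handler returns None or raises to escalate to the next tier."""
    for name, handler in handlers:
        try:
            result = handler(task)
        except Exception:
            result = None  # tier failed: fall through to the next one
        if result is not None:
            tier_calls[name] += 1
            return result
    raise RuntimeError(f"no tier could handle task {task!r}")
```

Even this toy version shows the cost: every tier needs a failure convention, and the counters only help if someone is actually watching them.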

That's real overhead. For a prototype or a weekend project, it's not worth it. Use Sonnet for everything and move fast.

But once you're processing thousands of requests a day, the maths changes. The complexity pays for itself. The trick is knowing when to make the switch - and the answer is usually earlier than you think.

The freeCodeCamp guide includes working Python code, tier-by-tier implementation steps, and specific examples of what belongs where. If you're running an agent in production and the API bills are starting to sting, it's worth an afternoon to see how much you can route away from the expensive models.

Because the cheapest LLM call is the one you never make.


Today's Sources

  • freeCodeCamp: How to Build a Cost-Efficient AI Agent with Tiered Model Routing
  • Wired AI: Conflicting Rulings Leave Anthropic in 'Supply-Chain Risk' Limbo
  • TechCrunch: Poke makes using AI agents as easy as sending a text
  • arXiv cs.LG: A Benchmark of Classical and Deep Learning Models for Agricultural Commodity Price Forecasting
  • arXiv cs.LG: FLeX: Fourier-based Low-rank EXpansion for multilingual transfer
  • arXiv cs.LG: Probabilistic Language Tries: A Unified Framework for Compression, Decision Policies, and Execution Reuse
  • arXiv – Quantum Physics: Heterogeneous architectures enable a 138x reduction in physical qubit requirements for fault-tolerant quantum computing
  • arXiv – Quantum Physics: Accelerating Quantum State Encoding with SIMD: Design, Implementation, and Benchmarking
  • arXiv – Quantum Physics: Optimization of entanglement harvesting with arbitrary temporal profiles
  • Dev.to: I built a state machine where invalid transitions can't compile
  • Dev.to: MCP in Practice - Part 7: MCP Transport and Auth in Practice
  • GitHub Blog: GitHub availability report: March 2026
  • DZone: Why Queues Don't Fix Scaling Problems
  • Dev.to: Sourcery GitHub Integration: PR Review Setup
  • DZone: AI-Assisted Code Migration: Practical Techniques for Modernizing Legacy Systems

About the Curator

Richard Bland
Founder, Marbl Codes

27+ years in software development, curating the tech news that matters.


© 2026 MEM Digital Ltd t/a Marbl Codes