Builders & Makers Tuesday, 14 April 2026

Your Claude Agent Burns Tokens Because You Asked It to Write Essays


A Claude agent hitting API limits in the first hour isn't a usage spike. It's a design flaw. Most token burn comes from verbose output - agents writing full sentences when a two-word status would do. The fix isn't better prompting. It's treating output as a line item with a budget.

The analysis from DEV.to shows that switching from prose to structured JSON, setting explicit context budgets per agent, and removing preambles cuts token consumption by 60-75%. Same functionality. Quarter of the cost.

Why Agents Write Too Much

Language models are trained to write naturally. That means full sentences, explanatory context, polite preambles. When you prompt an agent to "analyse this data", it doesn't just return results - it explains what it's doing, why it matters, what it found interesting. That's great for a chatbot. It's wasteful for an agent.

The problem compounds when agents chain together. Agent A produces verbose output. Agent B reads all of it as input, adds its own verbose response. Agent C does the same. Each step in the chain multiplies token usage. A five-step workflow that could run on 2,000 tokens ends up consuming 15,000 because nobody told the agents to be concise.
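The compounding is easy to see with rough numbers. The sketch below uses illustrative token counts (not measured values) to show how re-reading every prior output as input makes a verbose chain cost far more than the sum of its outputs:

```python
# Illustrative only: rough token counts for a 5-step agent chain where
# each agent's output is appended to the next agent's input context.
verbose_step_output = 600    # tokens of prose per agent (assumed)
concise_step_output = 40     # tokens of structured JSON per agent (assumed)

def chain_total(step_output: int, steps: int = 5, base_prompt: int = 200) -> int:
    """Total tokens consumed when every prior output is carried forward."""
    total = 0
    context = base_prompt
    for _ in range(steps):
        total += context + step_output   # input read + output written
        context += step_output           # output becomes part of the next context
    return total

print(chain_total(verbose_step_output))  # prose chain: 10000
print(chain_total(concise_step_output))  # structured chain: 1600
```

With these assumed numbers the verbose chain consumes roughly six times the tokens of the concise one — the same order-of-magnitude gap as the 15,000-versus-2,000 example above.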

And here's the thing: the extra verbosity doesn't improve quality. An agent returning {"status": "complete", "items_processed": 47} is just as useful as one returning "I have successfully completed the processing task. In total, 47 items were processed without errors. The operation concluded normally." The second version burns 25 tokens for information the first version conveys in 8.

The JSON Fix

Structured output solves this immediately. Instead of asking agents to "explain" or "describe", ask for JSON with specific fields. Define the schema upfront. Lock down what information you actually need and strip everything else.

Before: "Analyse the customer data and summarise the findings."
After: "Return JSON: {customer_count: int, high_value_count: int, issues: [string]}"

The model still does the same analysis. It just skips the essay. Token usage drops by half or more, depending on how verbose your original prompts allowed the agent to be.
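One way to make the schema stick is to state it in the system prompt and reject anything that doesn't parse. This is a minimal sketch, assuming a hypothetical schema matching the "After" prompt above — field names are illustrative, and the model response here is simulated rather than fetched from any real API:

```python
import json

# Hypothetical schema mirroring the "After" prompt; names are illustrative.
SCHEMA_PROMPT = (
    "Return only JSON matching: "
    '{"customer_count": int, "high_value_count": int, "issues": [string]}. '
    "No prose, no markdown fences."
)

REQUIRED_KEYS = {"customer_count", "high_value_count", "issues"}

def validate(raw: str) -> dict:
    """Reject any response that is not exactly the requested structure."""
    data = json.loads(raw)  # raises ValueError on prose or fenced output
    missing = REQUIRED_KEYS - data.keys()
    extra = data.keys() - REQUIRED_KEYS
    if missing or extra:
        raise ValueError(f"schema mismatch: missing={missing}, extra={extra}")
    return data

# Simulated model response (in practice, the text returned by the API call):
result = validate('{"customer_count": 120, "high_value_count": 14, "issues": []}')
print(result["customer_count"])  # 120
```

Failing fast on schema drift also catches the model quietly sliding back into prose, which otherwise goes unnoticed until the bill arrives.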

This isn't about losing information. It's about being precise. If you need context, add a field for it. If you don't, don't let the model write it anyway. Every token you don't need costs money and latency. Structure forces you to decide what actually matters.

Context Budgets Per Agent

Most workflows don't need every agent to see the full conversation history. Agent A needs its instructions and the current task. It doesn't need to know what Agent C did three steps ago. But by default, you're passing the entire context window forward, and it grows with every interaction.

Set explicit context budgets. Each agent gets exactly the information it needs - no more. If Agent B only needs the output from Agent A, pass only that. If Agent C needs a summary of prior steps, pass a summary, not the raw transcripts.

This requires thinking through your data flow. What does each agent actually use? What's just bloat? Most multi-agent systems pass far more context than necessary because it's easier than being selective. But the cost adds up fast. Ruthless pruning of context is one of the highest-value optimisations you can make.
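In code, a context budget just means each step receives a deliberately chosen slice rather than the whole transcript. A minimal sketch, where the agent functions are plain stand-ins for model calls and all names are illustrative:

```python
# Sketch: each agent receives only the slice of context it actually needs.
# These functions are stand-ins for model calls; names are illustrative.

def extract(task: str) -> str:          # Agent A: sees only the task
    return f"items:{task}"

def score(prev: str) -> str:            # Agent B: sees only A's output
    return f"scored({prev})"

def report(summary: str) -> str:        # Agent C: sees a summary, not raw transcripts
    return f"report[{summary}]"

def run_pipeline(task: str) -> str:
    a_out = extract(task)               # budget: the task only
    b_out = score(a_out)                # budget: A's output only
    summary = b_out[:100]               # bounded summary instead of full history
    return report(summary)              # budget: the summary only

print(run_pipeline("47"))
```

The point of the structure is that context growth is now a decision you make per step, not a default you inherit.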

Kill the Preambles

Language models love preambles. "Certainly, I'll help you with that." "Here's what I found." "Let me break this down for you." Every response starts with pleasantries because that's what the training data looks like.

For agents, this is pure waste. Tell the model explicitly: no preambles, no sign-offs, no politeness. Just the answer. Add it to your system prompt. Enforce it in your output validation. A well-designed agent returns data, not conversation.
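Enforcement can live in both places at once: a standing instruction in the system prompt, and a validator that fails fast when the model slips back into conversation. A hedged sketch — the phrase list is illustrative, not exhaustive:

```python
import re

# Illustrative enforcement of "no preambles": instruct in the system
# prompt, then reject outputs that open with conversational filler.
NO_FLUFF = "No preambles, no sign-offs, no explanations. Output data only."

PREAMBLE = re.compile(
    r"^(certainly|sure|here(?:'s| is)|let me|i(?:'ll| will| have))\b",
    re.IGNORECASE,
)

def reject_preamble(response: str) -> str:
    """Fail fast if the model slipped back into conversation mode."""
    if PREAMBLE.match(response.strip()):
        raise ValueError("preamble detected; tighten the system prompt")
    return response
```

A structured output like `{"status": "complete"}` passes untouched; "Certainly, here's what I found…" gets bounced before it costs you another round trip.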

The same goes for explanations of reasoning. Unless you specifically need the chain of thought, don't ask for it. The model will happily explain its logic in detail if you let it. That explanation costs tokens. If the output is correct, the reasoning is a luxury you probably can't afford at scale.

What Good Agents Look Like

A token-efficient agent is ruthlessly minimal. It receives structured input, performs one clear task, and returns structured output. No fluff. No context it doesn't need. No explaining itself unless explicitly asked.

This feels unnatural at first. We're used to conversational interfaces. But agents aren't chatbots. They're functions. Treat them like functions - defined inputs, defined outputs, no side conversation. The moment you start letting agents "talk", you're burning tokens on theatre.
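"Agents are functions" translates directly into code: typed input, typed output, nothing else. A minimal sketch, with illustrative types and field names and a stand-in for the actual model call:

```python
# Sketch of "agents as functions": defined inputs, defined outputs, no chat.
from dataclasses import dataclass

@dataclass(frozen=True)
class TaskInput:
    records: int

@dataclass(frozen=True)
class TaskOutput:
    status: str
    items_processed: int

def process(inp: TaskInput) -> TaskOutput:
    # Stand-in for a model call that returns structured output only.
    return TaskOutput(status="complete", items_processed=inp.records)

print(process(TaskInput(records=47)))
```

Frozen dataclasses make the contract explicit: anything the agent wants to say has to fit in a declared field, or it doesn't get said.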

The 60-75% reduction in token usage isn't theoretical. It's what happens when you stop asking language models to write prose and start asking them to return data. Same intelligence. Same capability. Quarter of the cost. Build accordingly.

More Featured Insights

Robotics & Automation
Spot Just Learned to Think About What It Sees
Voices & Thought Leaders
Claude Found the Shortcut You Didn't Know Existed

Video Sources

Boston Dynamics YouTube
Spot Uses Visual Reasoning to Complete Real-World Tasks
NVIDIA Robotics
AI-RAN Base Stations Transform Telecom Networks Into Edge AI Infrastructure
Two Minute Papers
Anthropic's Claude Model Optimizes for Shortcuts When Constraints Allow
OpenAI
Codex Enabled Wasmer to Build JavaScript Runtime in 2 Weeks
Theo (t3.gg)
Anthropic Claims Privacy-First iMessage Integration Violates Apple's Terms

Today's Sources

DEV.to AI
Why Your Claude Agents Burn Through API Limits in Hour 1 (And the Fix)
DEV.to AI
I Built an AI System That Runs Itself 24/7-Here's What Actually Happened
DEV.to AI
Adding Memory to AI Agents Using Spring AI and Oracle AI Database
DEV.to AI
Design Needs a Rebrand: How Agents Break Traditional Interface Design
DEV.to AI
Building a CloudTrail Sonifier: Co-developing with Claude
DEV.to AI
Focused Expands to EMEA to Support Production Agent Integration
The Robot Report
Ouster Releases Wrist-Mounted ZED X Nano Stereo Camera
Robohub
25 Years of Automated Science: An Interview with Ross King
ROS Discourse
ROS2 Adaptive Admittance Controller for Compliant Manipulation

About the Curator

Richard Bland
Founder, Marbl Codes

27+ years in software development, curating the tech news that matters.

Free Daily Briefing

Start Every Morning Smarter

Luma curates the most important AI, quantum, and tech developments into a 5-minute morning briefing. Free, daily, no spam.

  • 8:00 AM Morning digest ready to listen
  • 1:00 PM Afternoon edition catches what you missed
  • 8:00 PM Daily roundup lands in your inbox

We respect your inbox. Unsubscribe anytime. Privacy Policy

© 2026 MEM Digital Ltd t/a Marbl Codes