GitHub's engineering team did something rare in the AI space - they actually measured what their agents were doing. What they found was eye-opening: between 37% and 62% of their token consumption was pure waste.
The insight came from instrumenting their own CI/CD agentic workflows. These are the AI agents that help developers write code, review pull requests, and automate repository tasks. GitHub built automated tools to track exactly where tokens were being spent - and more importantly, where they were being burned for no reason.
The biggest offender? Unused tools. Agents were being given access to dozens of functions they never called. Every unused tool adds to the context window - that's the working memory an LLM uses to reason about what to do next. Fill it with irrelevant options and you're paying for the AI to ignore things.
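To make that overhead concrete, here's a rough sketch of what tool definitions alone can cost. It assumes OpenAI-style JSON-schema tool definitions and the tiktoken tokeniser; `ALL_TOOLS` is a hypothetical example, and the exact accounting varies by provider:

```python
import json
import tiktoken  # OpenAI's tokeniser library

# Hypothetical tool definitions in the common JSON-schema style.
# A real agent might carry dozens of these on every single call.
ALL_TOOLS = [
    {
        "name": "search_issues",
        "description": "Search repository issues by keyword.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    # ... dozens more, most of which the agent never calls
]

enc = tiktoken.get_encoding("cl100k_base")

def tool_token_cost(tools: list[dict]) -> int:
    """Rough estimate: tool schemas are serialised into the prompt,
    so every definition costs tokens on every request, used or not."""
    return len(enc.encode(json.dumps(tools)))

print(f"Tool definitions cost ~{tool_token_cost(ALL_TOOLS)} tokens per call")
```

Multiply that per-call overhead by every step of every agent run and the waste compounds quickly.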
The second issue was subtler but just as costly: data gathering happening inside the reasoning loop. Imagine asking someone to solve a maths problem, but first making them read an entire textbook to find the relevant formula. That's what these agents were doing - fetching data, processing it, then using it to make a decision, all within the same expensive LLM call.
Moving Data Out of the Decision Loop
The fix was architectural. GitHub's team moved data gathering into a separate step - fetch what you need first, then hand the relevant bits to the LLM for reasoning. The LLM still makes the decision, but it's not wading through raw data to get there.
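Here's a minimal sketch of the before-and-after. The helpers (`fetch_pr_diff`, `extract_changed_functions`, `call_llm`) are hypothetical placeholders for your own data layer and model client, not GitHub's actual code:

```python
def fetch_pr_diff(pr_number: int) -> str:
    """Placeholder: pull the raw diff from your VCS API."""
    return "diff --git a/app.py b/app.py\n..."

def extract_changed_functions(diff: str) -> str:
    """Placeholder: plain parsing in ordinary code, no LLM involved."""
    return "\n".join(line for line in diff.splitlines() if line.startswith("diff"))

def call_llm(prompt: str) -> str:
    """Placeholder: your model client of choice."""
    return f"[model response to {len(prompt)} chars of prompt]"

def review_pr_wasteful(pr_number: int) -> str:
    # Anti-pattern: the entire raw diff lands in the context window,
    # and you pay for every line the model has to wade through.
    raw_diff = fetch_pr_diff(pr_number)
    return call_llm(f"Review this pull request:\n{raw_diff}")

def review_pr_lean(pr_number: int) -> str:
    # The fix: gather and distill upstream, then hand the LLM
    # only the bits it needs to make the decision.
    raw_diff = fetch_pr_diff(pr_number)
    relevant = extract_changed_functions(raw_diff)
    return call_llm(f"Review these changed functions:\n{relevant}")
```

The decision still happens in the LLM call; everything that doesn't need the model's judgement happens in cheap, deterministic code first.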
This isn't just about cost. Smaller context windows mean faster responses. Less noise in the prompt means better decisions. When you strip out the irrelevant, what's left gets more attention.
The other intervention was ruthless pruning of tool access. If an agent hadn't used a tool in recent runs, it got removed from the options. This required instrumentation - tracking which functions were actually being called versus which were just sitting there inflating the token count.
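A simple version of that instrumentation might look like the sketch below. The window size and the keep-if-used rule are illustrative choices, not GitHub's actual thresholds:

```python
class ToolUsageTracker:
    """Record which tools each agent run actually calls,
    then prune the ones that never get used."""

    def __init__(self, window: int = 50):
        self.window = window            # how many recent runs to consider
        self.runs: list[set[str]] = []  # tool names called in each run

    def record_run(self, tools_called: set[str]) -> None:
        self.runs.append(tools_called)
        self.runs = self.runs[-self.window:]  # keep only the recent window

    def prune(self, available_tools: list[str]) -> list[str]:
        """Keep only tools that were called at least once in recent runs."""
        used = set().union(*self.runs) if self.runs else set()
        return [t for t in available_tools if t in used]

tracker = ToolUsageTracker()
tracker.record_run({"search_issues"})
tracker.record_run({"search_issues", "get_file"})
print(tracker.prune(["search_issues", "get_file", "create_branch", "merge_pr"]))
# -> ['search_issues', 'get_file']  # the two dead tools drop out
```

The point is the loop: record, prune, then keep measuring so a tool that becomes useful again can earn its way back in.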
Why This Matters Beyond GitHub
Most companies building agentic workflows don't have GitHub's resources to instrument everything. But the principle scales down. If you're building agents - whether for customer support, code review, or internal automation - you're almost certainly paying for tokens you don't need.
The pattern GitHub identified shows up everywhere: agents with too many tools, prompts stuffed with context "just in case", and reasoning loops doing work that could happen upstream. Every one of those is a lever you can pull.
For developers running agents on a budget, the lesson is clear. Before optimising your prompts or switching models, measure where your tokens are going. You might find that half your spend is on functions that never get called and data that never gets used. That's not a model problem. That's an architecture problem.
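If you need a starting point, a crude per-category tally is often enough to spot the imbalance. This sketch assumes OpenAI-style usage counts (`prompt_tokens`, `completion_tokens`) come back with each response; the category labels and numbers are made up for illustration:

```python
from collections import defaultdict

spend: dict[str, int] = defaultdict(int)

def record_usage(category: str, prompt_tokens: int, completion_tokens: int) -> None:
    """Tally tokens under a label like 'tool_schemas' or 'fetched_data'."""
    spend[category] += prompt_tokens + completion_tokens

# After each agent step, log where the tokens went:
record_usage("reasoning", prompt_tokens=1200, completion_tokens=300)
record_usage("tool_schemas", prompt_tokens=2500, completion_tokens=0)

total = sum(spend.values())
for category, tokens in sorted(spend.items(), key=lambda kv: -kv[1]):
    print(f"{category:>14}: {tokens:>6} tokens ({tokens / total:.0%})")
```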
GitHub's full write-up is available on their blog. It's worth reading if you're running production agents - they go into generous detail on the specifics of their instrumentation approach.
The bigger takeaway: agentic AI is still new enough that basic housekeeping - measuring, pruning, separating concerns - can cut your bills in half. That's not clever prompt engineering. That's just paying attention.