Artificial Intelligence · Friday, 8 May 2026

GitHub Cut Their AI Bills by Half With One Simple Trick


GitHub's engineering team did something rare in the AI space - they actually measured what their agents were doing. What they found was eye-opening: between 37% and 62% of their token consumption was pure waste.

The insight came from instrumenting their own CI/CD agentic workflows. These are the AI agents that help developers write code, review pull requests, and automate repository tasks. GitHub built automated tools to track exactly where tokens were being spent - and more importantly, where they were being burned for no reason.

The biggest offender? Unused tools. Agents were being given access to dozens of functions they never called. Every unused tool adds to the context window - that's the working memory an LLM uses to reason about what to do next. Fill it with irrelevant options and you're paying for the AI to ignore things.
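To see why this adds up, here's a minimal sketch of the overhead. The tool schemas, the ~4-characters-per-token ratio, and the numbers are illustrative assumptions, not GitHub's actual tools or figures - but the shape of the problem is the same: every tool definition you send costs tokens on every request, called or not.

```python
import json

# Hypothetical tool schemas, in the JSON shape most chat APIs accept.
TOOLS = {
    "create_issue": {"description": "Open a new issue",
                     "parameters": {"title": "string", "body": "string"}},
    "merge_pr": {"description": "Merge a pull request",
                 "parameters": {"number": "integer"}},
    "run_ci": {"description": "Trigger a CI run",
               "parameters": {"ref": "string"}},
}

def estimate_tokens(obj) -> int:
    # Rough heuristic: roughly 4 characters per token for English/JSON text.
    return len(json.dumps(obj)) // 4

def context_overhead(tool_names) -> int:
    """Tokens spent per request just to describe the listed tools."""
    return sum(estimate_tokens(TOOLS[name]) for name in tool_names)

all_tools = list(TOOLS)
used_tools = ["create_issue"]  # suppose only one is ever actually called
wasted = context_overhead(all_tools) - context_overhead(used_tools)
print(f"wasted tokens per request: {wasted}")
```

Multiply that per-request waste by every step of every agent run and the bill grows quickly.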

The second issue was subtler but just as costly: data gathering happening inside the reasoning loop. Imagine asking someone to solve a maths problem, but first making them read an entire textbook to find the relevant formula. That's what these agents were doing - fetching data, processing it, then using it to make a decision, all within the same expensive LLM call.

Moving Data Out of the Decision Loop

The fix was architectural. GitHub's team moved data gathering into a separate step - fetch what you need first, then hand the relevant bits to the LLM for reasoning. The LLM still makes the decision, but it's not wading through raw data to get there.
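The refactor can be sketched in a few lines. Everything here is a stand-in - the `llm` function, the diff, and the `condense` filter are invented for illustration, not GitHub's code - but it shows the two-step pattern: gather and condense data deterministically first, then pass only the relevant slice into the reasoning call.

```python
def llm(prompt: str) -> str:
    # Stand-in for a real model call; all we care about here is
    # how much context gets passed in.
    return f"decision from {len(prompt)} chars of context"

def fetch_diff() -> str:
    # Step 1: gather data outside the reasoning call.
    return ("diff --git a/app.py b/app.py\n"
            "+import logging\n"
            "-print('debug')")

def condense(diff: str) -> str:
    # Step 2: cheap, deterministic filtering - keep only changed lines.
    return "\n".join(l for l in diff.splitlines()
                     if l.startswith(("+", "-")))

# Before: raw data rides along inside the reasoning prompt.
before = "Review this change:\n" + fetch_diff()

# After: fetch and condense first, then hand the LLM only what it needs.
after = "Review this change:\n" + condense(fetch_diff())

print(llm(after))
```

The LLM still makes the judgement call at the end; it just stops being the (expensive) place where data wrangling happens.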

This isn't just about cost. Smaller context windows mean faster responses. Less noise in the prompt means better decisions. When you strip out the irrelevant, what's left gets more attention.

The other intervention was ruthless pruning of tool access. If an agent hadn't used a tool in recent runs, it got removed from the options. This required instrumentation - tracking which functions were actually being called versus which were just sitting there inflating the token count.
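The instrumentation side of this is not exotic. A sketch of the idea, assuming nothing about GitHub's internals - just a counter per tool across runs, and a prune step that drops anything below a usage threshold:

```python
from collections import Counter

class ToolUsageTracker:
    """Counts tool calls across runs so unused tools can be pruned."""

    def __init__(self, tool_names):
        # Start every tool at zero so never-called tools still show up.
        self.calls = Counter({name: 0 for name in tool_names})

    def record(self, name: str) -> None:
        self.calls[name] += 1

    def prune(self, min_calls: int = 1):
        """Return only the tools called at least min_calls times."""
        return [name for name, count in self.calls.items()
                if count >= min_calls]

tracker = ToolUsageTracker(["create_issue", "merge_pr", "run_ci"])
for _ in range(5):
    tracker.record("create_issue")
tracker.record("run_ci")

print(tracker.prune())  # merge_pr was never called, so it gets dropped
```

The next agent run then registers only the pruned list, so unused schemas stop occupying the context window at all.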

Why This Matters Beyond GitHub

Most companies building agentic workflows don't have GitHub's resources to instrument everything. But the principle scales down. If you're building agents - whether for customer support, code review, or internal automation - you're almost certainly paying for tokens you don't need.

The pattern GitHub identified shows up everywhere: agents with too many tools, prompts stuffed with context "just in case", and reasoning loops doing work that could happen upstream. Every one of those is a lever you can pull.

For developers running agents on a budget, the lesson is clear. Before optimising your prompts or switching models, measure where your tokens are going. You might find that half your spend is on functions that never get called and data that never gets used. That's not a model problem. That's an architecture problem.
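Measuring doesn't require heavy tooling either. A minimal sketch (categories and token counts below are made up for illustration): attribute every chunk of prompt to a category, then look at the percentage breakdown.

```python
from collections import defaultdict

class TokenLedger:
    """Attributes token spend to categories so waste becomes visible."""

    def __init__(self):
        self.spend = defaultdict(int)

    def charge(self, category: str, tokens: int) -> None:
        self.spend[category] += tokens

    def breakdown(self):
        # Percentage of total spend per category, one decimal place.
        total = sum(self.spend.values()) or 1
        return {cat: round(100 * t / total, 1)
                for cat, t in self.spend.items()}

ledger = TokenLedger()
ledger.charge("task_prompt", 400)
ledger.charge("tool_definitions", 900)  # mostly never-called tools
ledger.charge("inlined_raw_data", 700)
print(ledger.breakdown())
```

If the breakdown shows most of your spend going to tool definitions and inlined data rather than the actual task, that's the architecture problem announcing itself.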

GitHub's full write-up, "Improving token efficiency in GitHub Agentic Workflows", is available on their blog. It's worth reading if you're running production agents - they're generous with the specifics of their instrumentation approach.

The bigger takeaway: agentic AI is still new enough that basic housekeeping - measuring, pruning, separating concerns - can cut your bills in half. That's not clever prompt engineering. That's just paying attention.


