Artificial Intelligence | Tuesday, 26 May 2026

Frontier AI Models Waste 90% of Their Reasoning Steps

New research reveals a structural inefficiency in how frontier reasoning models think: between 61% and 93% of their internal reasoning steps are redundant. These steps don't change the final answer. They're cognitive wheel-spinning, baked into the training process itself.

The study, published on arXiv, examined models trained with reinforcement learning to "think out loud" through multi-step reasoning. The promise was transparency: we could see the model's work, verify its logic and catch errors early. The reality is messier.

Researchers found that if you remove most of a model's reasoning chain (the internal monologue it generates before answering), the final answer stays the same. In the most redundant cases, a model reaches the same conclusion whether it shows 100 steps of work or 10. The extra 90 steps aren't wrong. They're just... there: decorative scaffolding around a decision the model had already made.
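
One way to picture this kind of measurement is a truncation test: cut the chain at each prefix, ask for an answer, and find the shortest prefix that already reproduces the full chain's answer. The sketch below is an illustration under stated assumptions, not the study's protocol: `answer_from` is a hypothetical callable standing in for re-running the model on a truncated chain, and `toy_answer` is an invented stand-in model.

```python
from typing import Callable, Sequence


def minimal_sufficient_prefix(
    steps: Sequence[str], answer_from: Callable[[Sequence[str]], str]
) -> int:
    """Smallest k such that steps[:k] yields the same answer as the full chain."""
    full_answer = answer_from(steps)
    for k in range(len(steps) + 1):
        if answer_from(steps[:k]) == full_answer:
            return k
    return len(steps)  # not reached when answer_from is deterministic


def redundancy_fraction(
    steps: Sequence[str], answer_from: Callable[[Sequence[str]], str]
) -> float:
    """Share of steps that can be cut without changing the answer."""
    if not steps:
        return 0.0
    k = minimal_sufficient_prefix(steps, answer_from)
    return (len(steps) - k) / len(steps)


def toy_answer(prefix: Sequence[str]) -> str:
    # Hypothetical stand-in for the model: answers with the most recent
    # "so answer is ..." conclusion it has seen, else "unknown".
    for step in reversed(prefix):
        if step.startswith("so answer is"):
            return step.split()[-1]
    return "unknown"


chain = ["parse question", "so answer is 42"] + ["double-check: still 42"] * 8
print(redundancy_fraction(chain, toy_answer))  # 0.8
```

Here the answer is fixed after two of ten steps, so eight steps (80%) are redundant by this definition.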

Why This Happens

The redundancy isn't a bug in any single model. It's structural to how these systems are trained. When you reward a model for reaching the right answer through multi-step reasoning, you don't reward efficiency. You reward correctness. The model learns to generate reasoning chains that look thorough and convince the reward signal, not reasoning chains that are minimal and necessary.
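
A deliberately simplified outcome-only reward shows the incentive. `outcome_reward` is a hypothetical function, not any lab's actual reward model: it scores the final answer and never looks at the reasoning, so a one-step chain and a twenty-step chain earn exactly the same reward.

```python
def outcome_reward(final_answer: str, reference: str, reasoning_steps: list[str]) -> float:
    """Hypothetical outcome-only reward: 1.0 for a correct answer, else 0.0."""
    # reasoning_steps is accepted but never read: the length and necessity
    # of the chain have no effect on the score.
    return 1.0 if final_answer.strip() == reference.strip() else 0.0


short_chain = ["12 * 3 = 36"]
long_chain = ["12 * 3 = 36", "let me verify", "12 + 12 + 12 = 36", "yes, 36"] * 5

print(outcome_reward("36", "36", short_chain))  # 1.0
print(outcome_reward("36", "36", long_chain))   # 1.0
```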

Think of it like a student who's learned that longer essays get better marks. They pad. They repeat themselves in slightly different words. They add steps that don't advance the argument but make the work look more substantial. The teacher (the reward model) can't tell the difference between genuine depth and performative elaboration, so the student optimises for length, not clarity.

Frontier models are doing the same thing. They've learned that verbose reasoning chains correlate with high rewards, so they generate verbose reasoning chains. The system can't distinguish between a step that changes the outcome and a step that just looks plausible.

What This Means for Inference Costs

Redundant reasoning isn't just an academic curiosity. It's expensive. Every reasoning step costs tokens. Tokens cost money. If 90% of those steps are redundant, you're paying for cognitive theatre, not cognitive work.

For developers building on reasoning models, this changes the maths. Reasoning tokens are typically billed as output tokens whether or not they change the answer, so if most of them don't, you're paying for a hallway conversation when you needed a meeting. At 90% redundancy, the output is the same but the bill is 10x higher.
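
The arithmetic is simple enough to sketch. The token count, the $60-per-million price and the 90% redundancy figure are illustrative assumptions, not quoted prices or results from the study.

```python
def reasoning_cost(
    reasoning_tokens: int,
    redundant_fraction: float,
    usd_per_million_tokens: float,
) -> tuple[float, float]:
    """Return (total cost, cost of redundant tokens) in USD for one request."""
    total = reasoning_tokens * usd_per_million_tokens / 1_000_000
    return total, total * redundant_fraction


# Hypothetical request: 10,000 reasoning tokens at $60 per million, 90% redundant.
total, wasted = reasoning_cost(10_000, 0.90, 60.0)
print(f"total ${total:.2f}, redundant ${wasted:.2f}")  # total $0.60, redundant $0.54

# Paying for every token when only 10% are needed means a 1 / 0.10 = 10x bill.
necessary_fraction = 0.10
overpay_factor = 1 / necessary_fraction
```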

This also matters for latency. Reasoning models are slower than direct-answer models because they generate long chains of thought before producing a response. If most of that chain is redundant, we're waiting for nothing. A model that could answer in 10 steps instead takes 100, and the user sits there watching a spinner.
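
A rough decode-time estimate shows how directly step count turns into waiting time. It assumes sequential generation at a constant throughput and ignores network and prompt-processing overhead; the 50 tokens per step and 100 tokens per second are hypothetical.

```python
def decode_seconds(steps: int, tokens_per_step: int, tokens_per_second: float) -> float:
    """Time spent generating the reasoning chain, ignoring network and prompt overhead."""
    return steps * tokens_per_step / tokens_per_second


# Hypothetical: 50 tokens per step, generated at 100 tokens per second.
full_chain = decode_seconds(100, 50, 100.0)   # 50.0 seconds
pruned_chain = decode_seconds(10, 50, 100.0)  # 5.0 seconds
print(f"{full_chain:.0f}s vs {pruned_chain:.0f}s")  # 50s vs 5s
```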

The Training Problem

Fixing this requires rethinking how reasoning models are trained. Right now, the reward signal only cares about the final answer. If the model gets it right, all the steps that led there get reinforced - even the redundant ones. The system has no incentive to prune unnecessary reasoning.
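
In a basic policy-gradient setup such as REINFORCE with a baseline (an assumption here; the models in the study may be trained with other RL algorithms), an outcome-only reward produces one advantage value for the whole chain, and that value scales the update for every step. `per_step_advantages` is a hypothetical helper that makes this explicit.

```python
def per_step_advantages(num_steps: int, reward: float, baseline: float) -> list[float]:
    """Advantage assigned to each reasoning step under an outcome-only reward."""
    # One number for the whole chain: reward minus a baseline (for example,
    # the mean reward of other samples for the same prompt).
    advantage = reward - baseline
    # Every step, necessary or redundant, is pushed in the same direction
    # with the same strength.
    return [advantage] * num_steps


# Hypothetical: a correct 10-step chain (reward 1.0) against a baseline of 0.5.
advantages = per_step_advantages(10, 1.0, 0.5)
print(advantages)  # [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
```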

What would help: rewarding concise reasoning, not just correct reasoning. Penalise models for taking more steps than necessary. Train them to recognise when they've gathered enough information to commit to an answer, rather than continuing to elaborate. Teach them to stop thinking when thinking stops helping.
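
A minimal sketch of such a penalty, assuming a fixed step budget per problem: `length_penalized_reward`, `step_budget`, `penalty_per_extra_step` and `max_penalty` are hypothetical names and values, not a method proposed in the study. Capping the penalty below 1.0 keeps any correct answer worth more than any incorrect one, so the model is not pushed towards short wrong answers.

```python
def length_penalized_reward(
    correct: bool,
    num_steps: int,
    step_budget: int,
    penalty_per_extra_step: float = 0.01,
    max_penalty: float = 0.5,
) -> float:
    """Hypothetical reward that still prioritises correctness but charges for excess steps."""
    base = 1.0 if correct else 0.0
    extra_steps = max(0, num_steps - step_budget)
    # Cap the penalty below 1.0 so a long correct chain still beats a short wrong one.
    penalty = min(max_penalty, penalty_per_extra_step * extra_steps)
    return base - penalty


print(length_penalized_reward(True, 10, step_budget=10))   # 1.0
print(length_penalized_reward(True, 100, step_budget=10))  # 0.5
print(length_penalized_reward(False, 10, step_budget=10))  # 0.0
```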

This is harder than it sounds. You'd need a way to measure necessity - to distinguish between a step that adds new information and a step that rephrases existing information. That requires supervision at the chain level, not just at the answer level. Most training pipelines don't have that.
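
As a crude illustration of chain-level supervision, the sketch below flags steps whose wording heavily overlaps an earlier step (Jaccard similarity over lowercase word sets). This is a hypothetical lexical proxy with an arbitrary 0.7 threshold: it catches near-verbatim rephrasing, but it would miss paraphrases that use different words and cannot tell whether a step actually changes the answer.

```python
import re


def word_set(text: str) -> set[str]:
    """Lowercase alphanumeric words in a step."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def flag_rephrased_steps(steps: list[str], threshold: float = 0.7) -> list[bool]:
    """True for steps whose Jaccard word overlap with any earlier step is >= threshold."""
    flags = []
    for i, step in enumerate(steps):
        current = word_set(step)
        repeated = False
        for earlier in steps[:i]:
            previous = word_set(earlier)
            union = current | previous
            if union and len(current & previous) / len(union) >= threshold:
                repeated = True
                break
        flags.append(repeated)
    return flags


steps = [
    "The train covers 120 km in 2 hours",
    "So its speed is 60 km per hour",
    "So the speed is 60 km per hour",
]
print(flag_rephrased_steps(steps))  # [False, False, True]
```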

Why It Matters Now

Reasoning models are being positioned as the next frontier in AI capability. OpenAI's o1, Google's Gemini reasoning mode, and other systems promise better performance on complex tasks by "thinking harder". But if 90% of that thinking is redundant, the performance gains come at a cost that doesn't scale.

For businesses evaluating reasoning models, this research is a reminder to test inference costs in production, not just capability benchmarks. A model that scores 5% higher on a reasoning benchmark but takes 10x longer to respond might not be worth it. The redundancy tax is real, and it adds up across millions of API calls.
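
A back-of-the-envelope comparison, using invented per-call costs, accuracies and call volume, shows why cost per correct answer is a more useful production metric than benchmark score alone. Model "B" here is five points more accurate but spends 10x as much per call.

```python
def monthly_spend(calls: int, cost_per_call: float) -> float:
    return calls * cost_per_call


def cost_per_correct_answer(cost_per_call: float, accuracy: float) -> float:
    # Every call is paid for, but only a fraction `accuracy` of them is useful.
    return cost_per_call / accuracy


CALLS = 1_000_000  # hypothetical monthly volume

# Hypothetical models: (cost per call in USD, accuracy).
models = {"A": (0.01, 0.80), "B": (0.10, 0.85)}
for name, (cost, acc) in models.items():
    print(f"{name}: ${monthly_spend(CALLS, cost):,.0f}/month, "
          f"${cost_per_correct_answer(cost, acc):.4f} per correct answer")
# A: $10,000/month, $0.0125 per correct answer
# B: $100,000/month, $0.1176 per correct answer
```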

The good news: this is a training problem, not a capability ceiling. Models can reason efficiently - they're just not trained to. The next generation of reasoning systems will need to optimise for both correctness and economy. Until then, expect to pay for a lot of thinking that doesn't actually think.

Today's Sources

arXiv cs.AI: "How Much Thinking is Enough? Quantifying Redundancy in LLM Reasoning"
arXiv cs.AI: "Confidence Calibration in Large Language Models"
arXiv cs.LG: "Algometrics: Forecasting Under Algorithmic Feedback"
arXiv cs.AI: "In Search of Open-Endedness: Replicating Picbreeder with Vision-Language Models"
arXiv cs.LG: "CAFD: Concept-Aware DNN Fault Detection using VLMs"
Quantum Pirates: "The Week in Quantum Computing: $3B+ in Government & Commercial Capital"
Phys.org Quantum Physics: "'Butterfly' Molecule Spotted, Completing 20-Year Quantum Zoo Hunt"
Phys.org Quantum Physics: "Hydrogen Puts Quantum Wormhole Conjecture to the Test"
Phys.org Quantum Physics: "Randomization Improves Quantum Computer Performance Under Noise"
Dev.to: "Cutting Every Possible Minute from the CI of a Backend with More Than 1,000 Tests"
Dev.to: "Harness Engineering: Stop Re-Prompting Your Coding Agent Every Session"
Stack Overflow Blog: "Do You Have What It Takes to Run AI in Production?"
Dev.to: "HTML Meta Referrer: Canonical Reference"
InfoQ: "Java News Roundup: WildFly, Micronaut, Spring AI, Apache Fory, GlassFish Plugin"

About the Curator

Richard Bland
Founder, Marbl Codes

27+ years in software development, curating the tech news that matters.
