Artificial Intelligence - Thursday, 19 February 2026

Testing AI by Breaking It on Purpose


There's a problem with how we test AI systems. We throw data at them and see what comes out. But what if we could work backwards - design inputs specifically to expose the exact behaviours we're worried about?

That's what researchers have done with a new approach called reverse n-wise output testing. Instead of starting from the inputs, they start from the outputs they care about - then craft inputs to reveal specific failure modes.

Why Traditional Testing Misses Problems

Most AI testing works like this: feed the model varied inputs, check the outputs, hope you catch issues. The problem? You're guessing. You might miss critical failure patterns simply because you didn't happen to test the right combination of inputs.

Think of it like testing a bridge by driving random vehicles across it. You might miss the specific weight distribution that causes a crack. What you really want is to design the test load to stress exactly the points you're worried about.

This new method does exactly that for AI. It identifies output behaviours you want to test - say, biased decisions or safety violations - then works backwards to generate inputs that trigger those specific outcomes.

How It Actually Works

The reverse n-wise approach starts with the outputs you care about. Want to test if your AI makes fair hiring decisions across different demographic combinations? Define those output scenarios first.

Then the system generates inputs designed to produce those exact outputs. It's not random. It's targeted. The "n-wise" part means the method covers every combination of n output behaviours - every pair, or triple, and so on - so you test every meaningful combination of output behaviours, not just the ones you stumble across.
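The coverage idea is simple enough to sketch in a few lines. This is a minimal illustration of n-wise combination of output behaviours, not the researchers' implementation - the output dimensions and their values here are invented for the example.

```python
from itertools import combinations, product

# Hypothetical output dimensions for a hiring model: each dimension lists
# the distinct output behaviours we want covered by at least one test.
output_dimensions = {
    "decision": ["hire", "reject"],
    "explanation": ["present", "absent"],
    "confidence": ["high", "low"],
}

def nwise_output_targets(dimensions, n=2):
    """Yield every n-wise combination of output behaviours.

    For n=2 this is pairwise coverage: every pair of dimensions, and
    every pair of values within those dimensions, gets its own target.
    """
    for dims in combinations(dimensions, n):
        for values in product(*(dimensions[d] for d in dims)):
            yield dict(zip(dims, values))

targets = list(nwise_output_targets(output_dimensions, n=2))
print(len(targets))  # 3 dimension pairs x 4 value pairs = 12 targets
```

Each target - say, `{"decision": "reject", "confidence": "high"}` - then becomes a goal for input generation: find an input that makes the model produce exactly that output combination.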

Early results show this catches significantly more faults than traditional input-based testing. That matters because AI systems deployed in healthcare, finance, or autonomous vehicles can't afford to have blind spots.

What This Means for Builders

If you're building with AI, this is a shift in how you think about testing. Stop asking "what inputs should I test?" Start asking "what failure modes am I worried about?"

Define the behaviours that would constitute problems - discrimination, safety violations, incorrect classifications in critical scenarios. Then design tests that specifically hunt for those outcomes.
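Hunting for one concrete failure mode can look like this. Everything here is a placeholder - the toy model, the field names, the demographic values - but the shape is the point: the test is defined by the outcome you fear, not by the inputs you happened to have.

```python
# A minimal output-focused test: hunt for one specific failure mode,
# demographic-dependent hiring decisions. All names are hypothetical.

def model_under_test(candidate):
    # Stand-in for the real model; a fair toy rule for illustration.
    return "hire" if candidate["years_experience"] >= 3 else "reject"

def hunt_discrimination(model, base_candidate, demographic_values):
    """Return demographic values whose decision differs from the baseline.

    Same qualifications, varied demographic field: any difference in the
    decision is exactly the output behaviour we defined as a problem.
    """
    baseline = model({**base_candidate, "demographic": demographic_values[0]})
    return [v for v in demographic_values[1:]
            if model({**base_candidate, "demographic": v}) != baseline]

base = {"years_experience": 5}
violations = hunt_discrimination(model_under_test, base, ["A", "B", "C"])
print(violations)  # an empty list means this failure mode did not fire
```

A non-empty result is a caught fault, with the offending input attached - far more actionable than a statistical blip in a random test run.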

This approach doesn't replace traditional testing. It complements it. Use broad input testing to catch unexpected issues. Use output-focused testing to verify the specific behaviours that matter most.

The practical implication? More reliable AI systems with fewer catastrophic blind spots. That's not theoretical. That's what happens when you test what you actually care about instead of hoping random inputs will reveal problems.

For anyone deploying AI in production, the question isn't whether to adopt this approach. It's how quickly you can integrate it into your testing pipeline.



About the Curator

Richard Bland
Founder, Marbl Codes

27+ years in software development, curating the tech news that matters.


© 2026 MEM Digital Ltd t/a Marbl Codes