GitHub Security Lab released an open-source framework that uses AI agents to find security vulnerabilities in code. The results are worth paying attention to - not because AI found bugs (plenty of tools do that), but because it found the kind of bugs traditional tools consistently miss.
The framework discovered over 80 vulnerabilities across open-source projects. Twenty-one percent were rated high severity. These weren't simple coding errors like buffer overflows or SQL injection - the ones static analysis tools catch easily. These were logic bugs and authorisation issues. The kind that require understanding what the code is supposed to DO, not just what it syntactically says.
Why Logic Bugs Are Hard to Catch
Traditional static application security testing (SAST) tools work by pattern matching. They scan code for known dangerous patterns: unsanitised user input, hardcoded credentials, missing validation checks. They're excellent at this. But they struggle with logic bugs.
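To make "pattern matching" concrete, here is a toy scanner in Python - a deliberately crude sketch, not any real SAST product, with made-up rule names - that flags hardcoded credentials and string-built SQL the way a pattern-based tool would:

```python
import re

# Toy SAST-style scanner (illustrative only): flag lines that match
# known dangerous patterns. Real tools work on parsed ASTs and data
# flow, but the core idea is the same - match shapes, not meaning.
PATTERNS = {
    "hardcoded-credential": re.compile(r"""(password|api_key)\s*=\s*['"]\w+['"]""", re.I),
    "string-built-sql": re.compile(r"execute\(.*(\+|%|format\()", re.I),
}

def scan(source: str) -> list[tuple[int, str]]:
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule_name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, rule_name))
    return findings

sample = '''
password = "hunter2"
cursor.execute("SELECT * FROM users WHERE id = " + user_id)
'''
print(scan(sample))  # [(2, 'hardcoded-credential'), (3, 'string-built-sql')]
```

Both lines get flagged because they match a known shape. A permission check that compares against the wrong resource matches no shape at all, which is exactly why it slips through.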
A logic bug isn't a syntax error. It's a case where the code does exactly what you told it to, but what you told it to do is wrong. Think: a permission check that validates a user owns Resource A before letting them modify Resource B. The code runs fine. The logic is broken.
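Here is that Resource A / Resource B mistake as a minimal Python sketch. The names and data structures are invented for illustration (an IDOR-style bug, not one of the framework's actual findings):

```python
# Hypothetical authorisation logic bug: the ownership check inspects
# the wrong resource. Syntactically valid, type-checks, runs fine.
folders = {"f1": {"owner": "alice"}}
documents = {
    "d1": {"folder": "f1", "body": "alice's notes"},
    "d2": {"folder": "f2", "body": "bob's notes"},
}

def update_body_buggy(user: str, folder_id: str, doc_id: str, new_body: str) -> None:
    # BUG: validates that the user owns the folder named in the request,
    # but never checks that doc_id actually belongs to that folder.
    if folders[folder_id]["owner"] != user:
        raise PermissionError("not the folder owner")
    documents[doc_id]["body"] = new_body

def update_body_fixed(user: str, doc_id: str, new_body: str) -> None:
    # Fix: derive the folder from the document being modified,
    # then check ownership of that same folder.
    folder_id = documents[doc_id]["folder"]
    if folder_id not in folders or folders[folder_id]["owner"] != user:
        raise PermissionError("not the folder owner")
    documents[doc_id]["body"] = new_body

# alice owns f1, but d2 lives in bob's folder - the buggy path lets her edit it:
update_body_buggy("alice", "f1", "d2", "overwritten")
print(documents["d2"]["body"])  # overwritten
```

No pattern matcher flags this, because every individual line is a perfectly ordinary operation. Only a model of who should be allowed to touch what reveals the problem.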
GitHub's framework addresses this with a two-stage process: threat modelling followed by targeted auditing. The AI agent first builds a mental model of what the application does - its attack surface, data flows, trust boundaries. Then it audits specific areas where that model suggests vulnerabilities might hide.
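The shape of that two-stage process can be sketched as follows. Everything here is hypothetical scaffolding - the stage names, heuristics, and data structures are invented for illustration, not the framework's actual API:

```python
from dataclasses import dataclass, field

# Hypothetical two-stage pipeline: build a threat model first, then
# audit only the areas the model flags. Both stages are stubbed out;
# in the real framework an AI agent does the reading and reasoning.

@dataclass
class ThreatModel:
    entry_points: list = field(default_factory=list)      # attack surface
    trust_boundaries: list = field(default_factory=list)  # where privilege changes

def build_threat_model(codebase: dict) -> ThreatModel:
    # Stage 1 (stub): infer what the application does. These substring
    # checks are crude stand-ins for genuine code understanding.
    model = ThreatModel()
    for path, source in codebase.items():
        if "request" in source:  # stand-in for "handles user input"
            model.entry_points.append(path)
        if "owner" in source or "is_admin" in source:
            model.trust_boundaries.append(path)
    return model

def audit(codebase: dict, model: ThreatModel) -> list:
    # Stage 2 (stub): look closely only where the model suggests
    # vulnerabilities might hide - here, files that are both an entry
    # point and a trust boundary - instead of scanning the whole tree.
    suspects = set(model.entry_points) & set(model.trust_boundaries)
    return sorted(suspects)

codebase = {
    "billing.py": "def charge(request, owner): ...",
    "util.py": "def slugify(text): ...",
}
model = build_threat_model(codebase)
print(audit(codebase, model))  # ['billing.py']
```

The point of the structure is the ordering: understanding comes first, and the expensive close inspection is spent only where the model says it will pay off.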
This matters because it mirrors how human security researchers work. You don't grep for "eval()" and call it done. You understand the system, identify where things could go wrong, then look closely at those places.
What Business Owners Should Know
If you're running software - whether built in-house or relying on open-source dependencies - this shift is significant. The framework is open source, which means security teams can run it themselves without vendor lock-in.
The practical implication: security testing just got cheaper and more thorough. Logic bugs are expensive to find manually. Security researchers charge appropriately because the work requires deep understanding. Automating even part of this process means broader coverage at lower cost.
But there's a flip side. If AI can find these vulnerabilities, so can attackers. The window between "vulnerability exists" and "vulnerability exploited" is shrinking. Organisations that aren't scanning their codebases regularly - with modern tools - are falling behind.
The Developer Perspective
For developers, this is both helpful and humbling. Helpful because catching authorisation bugs before production saves everyone a bad day. Humbling because it highlights how hard it is to write secure code when juggling feature deadlines.
The framework doesn't replace human review. It augments it. A 21% high-severity hit rate means 79% of findings were lower priority or false positives. You still need judgment to separate signal from noise. But it surfaces issues that might not get spotted otherwise, especially in large codebases where no single person understands every module.
One interesting detail: this works particularly well on open-source projects. The AI has access to the full codebase, issue trackers, documentation, and commit history. Closed-source internal projects may not provide the same context, though the core technique still applies.
What This Means for Security Practice
Security is moving from checklist compliance to continuous threat modelling. Instead of running scans quarterly and filing tickets, teams can integrate this kind of analysis into CI/CD pipelines. Find problems when they're introduced, not months later during a penetration test.
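As a sketch of what "integrate into CI/CD" can look like in practice - with an invented findings format, not the framework's real output - a pipeline step can parse the scan report and fail the build when high-severity findings appear:

```python
import json

# Hypothetical CI gate (findings schema invented for illustration):
# block the pipeline when a scan reports high-severity findings.

def gate(report_json: str, fail_on: str = "high") -> int:
    findings = json.loads(report_json)
    blocking = [f for f in findings if f.get("severity") == fail_on]
    for f in blocking:
        print(f"BLOCKING {f['id']} in {f['file']}: {f['title']}")
    return 1 if blocking else 0  # a nonzero exit code fails the CI job

report = json.dumps([
    {"id": "V-1", "file": "auth.py", "severity": "high",
     "title": "ownership check runs against the wrong resource"},
    {"id": "V-2", "file": "log.py", "severity": "low",
     "title": "verbose error message"},
])
print(gate(report))  # 1
```

Wired into a merge check, this turns a quarterly scan-and-ticket cycle into feedback on the pull request that introduced the problem.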
This also democratises security expertise. Smaller teams without dedicated security staff can run sophisticated analysis that previously required specialist knowledge. The AI handles the threat modelling grunt work. Humans focus on fixing what it finds.
The release is well-timed. Open-source supply chain security is a growing concern - and rightly so. Dependencies make up the majority of code in most applications. Knowing those dependencies have been scanned for logic bugs, not just pattern-matched for known CVEs, is reassuring.
For anyone building software, the playbook is straightforward: integrate tools like this into your development workflow, treat security findings as technical debt that compounds if ignored, and remember that AI doesn't replace security expertise - it scales it. The question isn't whether AI will change security testing. It already has. The question is whether your team is using it yet.