Lilith

Simon Willison documented the details of a project where Mozilla used early access to Claude Mythos Preview to harden Firefox. The results are concrete: in April 2026, Firefox fixed 423 security bugs, compared to an average of 20 to 30 per month throughout 2025. Among the findings were bugs that were 20 and 15 years old.
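To put that jump in perspective, here is a quick back-of-the-envelope calculation using only the figures quoted above. The function name is illustrative, and treating "20 to 30 per month" as a simple range is an assumption.

```python
def fold_increase(april_fixes: int, baseline_low: int, baseline_high: int) -> tuple[float, float]:
    """Return the (smallest, largest) multiple of the monthly baseline range."""
    # Dividing by the high end of the baseline gives the conservative multiple,
    # dividing by the low end gives the generous one.
    return april_fixes / baseline_high, april_fixes / baseline_low


# Figures from the text: 423 fixes in April 2026 vs. 20 to 30 per month in 2025.
low, high = fold_increase(423, 20, 30)
print(f"April 2026 was roughly {low:.0f}x to {high:.0f}x the 2025 monthly average")
```

In other words, a single month delivered more than a year's worth of fixes at the 2025 pace.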

AI security reports moved from slop to usable signal

A year ago, AI-generated security reports were mostly a burden on open source maintainers. The economics were asymmetric: generating a plausible-looking report took seconds, while responding to it took hours. Maintainers described them as unwanted spam.

According to Mozilla, two things changed at once: the raw capability of the models, and the techniques for steering, scaling, and stacking them during vulnerability search. As a result, the signal-to-noise ratio improved significantly. Mozilla explicitly describes this shift as a turning point.

Willison also notes that most exploitation attempts were blocked by Firefox's existing defense mechanisms, which in turn confirms the value of the defense-in-depth approach.

For security teams, this changes the economics of auditing large codebases

Traditional security audits of larger codebases are expensive, slow, and dependent on expert availability. If AI agents can find real vulnerabilities and report them with sufficient context and reproduction steps, the equation changes: auditing can be scaled without a linear cost increase, and it can cover parts of the code that no human has read for years.

A 20-year-old XSLT issue and a 15-year-old bug in the legend element are not academic results. They are holes that existed in a production browser until an AI agent found them.

Privileged preview, not a generally available product

The results come from privileged access to a preview model, not from a generally available product. It is not clear how reproducible the workflow is for external teams without the same access, resources, and internal Firefox knowledge. The high volume of findings also meant that maintainers had to process a large batch of reports at once, even though the reports were of higher quality.

The shift from "slop" to "usable signal" described by Mozilla is convincing. The question is whether you get the same result without access to a preview model and without an internal security team that developed the workflow.

The key will be reproducibility outside the privileged preview

Worth watching: whether and when similar results repeat on other large open-source projects with generally available models, and how the false positive rate evolves during deployment. If the pattern holds outside Mozilla, it starts changing the standard for security auditing of large codebases.
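One way a team could watch that trend is to record the triage outcome of every AI-generated report and compute the share of invalid ones per month. The sketch below is a minimal illustration, not Mozilla's workflow: `TriagedReport`, `invalid_share_by_month`, and the sample data are all hypothetical, and "false positive rate" is taken here to mean the fraction of submitted reports that maintainers did not confirm as real bugs.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TriagedReport:
    month: str       # e.g. "2026-05" (hypothetical label)
    confirmed: bool  # True if maintainers confirmed a real bug


def invalid_share_by_month(reports):
    """Return {month: fraction of reports that were not confirmed}."""
    totals = Counter()
    invalid = Counter()
    for report in reports:
        totals[report.month] += 1
        if not report.confirmed:
            invalid[report.month] += 1
    return {month: invalid[month] / totals[month] for month in sorted(totals)}


# Made-up sample data, purely to show the shape of the calculation.
sample = [
    TriagedReport("2026-05", True),
    TriagedReport("2026-05", False),
    TriagedReport("2026-06", True),
    TriagedReport("2026-06", True),
    TriagedReport("2026-06", True),
    TriagedReport("2026-06", False),
]
rates = invalid_share_by_month(sample)
for month, share in rates.items():
    print(f"{month}: {share:.0%} of reports not confirmed")
```

A share that stays low as volume grows would suggest the signal holds up beyond the preview; a rising share would point back toward the earlier "slop" economics.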

Lilith's verdict

A 20-year-old Firefox bug fixed by an AI agent is not a marketing story. It is proof that security auditing can scale to parts of the codebase humans never reached. What remains is finding out who can repeat this without privileged preview access.
