Mythos finds a curl vulnerability
Daniel Stenberg, creator and lead developer of curl:
My personal conclusion can however not end up with anything else than that the big hype around this model so far was primarily marketing. I see no evidence that this setup finds issues to any particular higher or more advanced degree than the other tools have done before Mythos. Maybe this model is a little bit better, but even if it is, it is not better to a degree that seems to make a significant dent in code analyzing.
But:
I signed the contract for getting access, but then nothing happened. Weeks went past and I was told there was a hiccup somewhere and access was delayed.
Eventually, I was instead offered that someone else, who has access to the model, could run a scan and analysis on curl for me using Mythos and send me a report. To me, the distinction isn’t that important. It’s not that I would have a lot of time to explore lots of different prompts and doing deep dive adventures anyway. Getting the tool to generate a first proper scan and analysis would be great, whoever did it. I happily accepted this offer.
So Daniel didn't have access to Mythos. Someone else ran the analysis on his behalf. It's unclear what methodology this "someone else" used, how familiar they were with the curl codebase, or how well they were acquainted with the sort of security issues the project has seen before.
What if Daniel had run the scan himself? I'm willing to bet the results would've been radically different.
I'm not saying all the hype around Mythos is necessarily justified—Anthropic is an AI lab after all, and AI labs lie. However, it's becoming clear that LLMs are remarkably effective at finding bugs and security issues as long as they have the right guidance. For an example of what Claude can do with expert guidance and access to custom tools, see Using LLMs to find Python C-extension bugs.
Broadly speaking, I believe Daniel would agree with this sentiment. He writes:
But allow me to highlight and reiterate what I have said before: AI powered code analyzers are significantly better at finding security flaws and mistakes in source code than any traditional code analyzers did in the past. All modern AI models are good at this now. Anyone with time and some experimental spirits can find security problems now. The high quality chaos is real.
Any project that has not scanned their source code with AI powered tooling will likely find huge number of flaws, bugs and possible vulnerabilities with this new generation of tools. Mythos will, and so will many of the others.
Not using AI code analyzers in your project means that you leave adversaries and attackers time and opportunity to find and exploit the flaws you don’t find.
Lately I find myself drawn to how LLMs can help improve existing human-authored (or mostly human-authored) code. I'm no longer thrilled with the idea of using them to write most of my code for me—been there, dealt with the cognitive debt—but I'm intrigued by how I could use them as superhuman code reviewers to catch my mistakes.
What would a coding harness designed primarily around improving code quality look like?
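Here's one possible shape, as a minimal sketch. Everything in it is my own assumption, not anything from Daniel's post or from Mythos: `ask_model` is a stand-in for whatever LLM call you'd actually use, and the list of review concerns is illustrative. The core idea is that the harness, not the model, owns the loop: it gathers the diff, runs one focused pass per concern, and collects findings.

```python
import subprocess


def get_diff(base: str = "HEAD~1") -> str:
    """Collect the changes under review as a unified diff via git."""
    return subprocess.run(
        ["git", "diff", base],
        capture_output=True, text=True, check=True,
    ).stdout


def review(diff: str, ask_model) -> list[str]:
    """Run one focused review pass per concern and collect findings.

    `ask_model` is a placeholder: any callable that takes a prompt
    string and returns the model's text reply. The concerns here are
    just examples; a real harness would tailor them to the project.
    """
    concerns = ["memory safety", "error handling", "API misuse"]
    findings = []
    for concern in concerns:
        prompt = (
            f"Review this diff strictly for {concern}. "
            "Reply NONE if you find nothing.\n\n" + diff
        )
        reply = ask_model(prompt).strip()
        if reply != "NONE":
            findings.append(f"[{concern}] {reply}")
    return findings
```

Narrow, repeated passes like this are the opposite of "write my code for me": the human authors the change, and the model's only job is to object to it, one concern at a time.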