Historical notes

This doc discusses the rationale behind some of the high-level design decisions for American Fuzzy Lop. It's adapted from a discussion with Rob Graham. See README.md for the general instruction manual, and technical_details.md for additional implementation-level insights.

1) Influences

In short, afl-fuzz is inspired chiefly by the work done by Tavis Ormandy back in 2007. Tavis did some very persuasive experiments using gcov block coverage to select optimal test cases out of a large corpus of data, and then using them as a starting point for traditional fuzzing workflows.

(By “persuasive”, I mean: netting a significant number of interesting vulnerabilities.)
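
To make that coverage-based selection idea a bit more concrete, here is a minimal, hypothetical sketch of such corpus distillation: run every candidate input through an instrumented build and keep only those that reach basic blocks not already covered by anything kept so far. The block_coverage() helper is an assumed stand-in for running the target and parsing gcov output; it is not part of any real tool.

```python
# Hypothetical sketch of coverage-driven corpus distillation.
# block_coverage() is a stand-in: in practice it would run an instrumented
# build of the target on one input and parse the resulting gcov data into
# a set of (source file, basic block) identifiers.
from typing import Iterable, List, Set, Tuple

Block = Tuple[str, int]  # (source file, basic block id)


def block_coverage(test_case: bytes) -> Set[Block]:
    """Placeholder: execute the target on test_case and return covered blocks."""
    raise NotImplementedError


def distill(corpus: Iterable[bytes]) -> List[bytes]:
    """Greedily keep inputs that cover blocks not seen in anything kept so far."""
    seen: Set[Block] = set()
    keep: List[bytes] = []
    for case in corpus:
        new_blocks = block_coverage(case) - seen
        if new_blocks:
            keep.append(case)
            seen |= new_blocks
    return keep
```

The surviving files then serve as seeds for a conventional fuzzing run, which is the workflow described above.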

In parallel to this, both Tavis and I were interested in evolutionary fuzzing. Tavis had his experiments, and I was working on a tool called bunny-the-fuzzer, released sometime in 2007.

Bunny used a generational algorithm not much different from afl-fuzz, but also tried to reason about the relationship between various input bits and the internal state of the program, with hopes of deriving some additional value from that. The reasoning / correlation part was probably in part inspired by other projects done around the same time by Will Drewry and Chris Evans.

The state correlation approach sounded very sexy on paper, but ultimately, made the fuzzer complicated, brittle, and cumbersome to use; every other target program would require a tweak or two. Because Bunny didn’t fare a whole lot better than less sophisticated brute-force tools, I eventually decided to write it off. You can still find its original documentation at:

https://code.google.com/p/bunny-the-fuzzer/wiki/BunnyDoc

There has been a fair amount of independent work, too. Most notably, a few weeks earlier that year, Jared DeMott had a Defcon presentation about a fuzzer that relied on code coverage as a fitness function.

Jared’s approach was by no means identical to what afl-fuzz does, but it was in the same ballpark. His fuzzer tried to explicitly solve for the maximum coverage with a single input file; in comparison, afl simply selects for cases that do something new (which yields better results - see technical_details.md).
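
The difference is easiest to see in code. Below is a loose, illustrative sketch (not AFL's actual algorithm): rather than searching for a single input that maximizes total coverage, the loop simply keeps any mutated case that reaches an edge the corpus has not reached before. run_and_get_edges() and mutate() are hypothetical placeholders for instrumented execution and the mutation engine.

```python
# Illustrative sketch of "select for cases that do something new".
# run_and_get_edges() and mutate() are hypothetical placeholders.
import random
from typing import List, Set


def run_and_get_edges(data: bytes) -> Set[int]:
    """Placeholder: run the instrumented target, return the set of edge ids hit."""
    raise NotImplementedError


def mutate(data: bytes) -> bytes:
    """Placeholder: apply one random mutation (bit flip, byte overwrite, ...)."""
    raise NotImplementedError


def fuzz(seeds: List[bytes], iterations: int) -> List[bytes]:
    queue = list(seeds)
    global_edges: Set[int] = set()
    for case in queue:
        global_edges |= run_and_get_edges(case)

    for _ in range(iterations):
        child = mutate(random.choice(queue))
        edges = run_and_get_edges(child)
        if edges - global_edges:      # reached something new: keep it
            queue.append(child)
            global_edges |= edges
    return queue
```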

A few years later, Gabriel Campana released fuzzgrind, a tool that relied purely on Valgrind and a constraint solver to maximize coverage without any brute-force bits; and Microsoft Research folks talked extensively about their still non-public, solver-based SAGE framework.

In the past six years or so, I’ve also seen a fair number of academic papers that dealt with smart fuzzing (focusing chiefly on symbolic execution) and a couple papers that discussed proof-of-concept applications of genetic algorithms with the same goals in mind. I’m unconvinced how practical most of these experiments were; I suspect that many of them suffer from the bunny-the-fuzzer’s curse of being cool on paper and in carefully designed experiments, but failing the ultimate test of being able to find new, worthwhile security bugs in otherwise well-fuzzed, real-world software.

In some ways, the baseline that the “cool” solutions have to compete against is a lot more impressive than it may seem, making it difficult for competitors to stand out. For a singular example, check out the work by Gynvael and Mateusz Jurczyk, applying “dumb” fuzzing to ffmpeg, a prominent and security-critical component of modern browsers and media players:

http://googleonlinesecurity.blogspot.com/2014/01/ffmpeg-and-thousand-fixes.html

Effortlessly getting comparable results with state-of-the-art symbolic execution in equally complex software still seems fairly unlikely, and hasn’t been demonstrated in practice so far.

But I digress; ultimately, attribution is hard, and glorifying the fundamental concepts behind AFL is probably a waste of time. The devil is very much in the often-overlooked details, which brings us to…

2) Design goals for afl-fuzz

In short, I believe that the current implementation of afl-fuzz takes care of several itches that seemed impossible to scratch with other tools:

  1. Speed. It’s genuinely hard to compete with brute force when your “smart” approach is resource-intensive. If your instrumentation makes it 10x more likely to find a bug, but runs 100x slower, your users are getting a bad deal.

    To avoid starting with a handicap, afl-fuzz is meant to let you fuzz most of the intended targets at roughly their native speed - so even if it doesn’t add value, you do not lose much.

    On top of this, the tool leverages instrumentation to actually reduce the amount of work in a couple of ways: for example, by carefully trimming the corpus or skipping non-functional but non-trimmable regions in the input files (a sketch of the trimming idea appears after this list).

  2. Rock-solid reliability. It’s hard to compete with brute force if your approach is brittle and fails unexpectedly. Automated testing is attractive because it’s simple to use and scalable; anything that goes against these principles is an unwelcome trade-off and means that your tool will be used less often and with less consistent results.

    Most of the approaches based on symbolic execution, taint tracking, or complex syntax-aware instrumentation are currently fairly unreliable with real-world targets. Perhaps more importantly, their failure modes can render them strictly worse than “dumb” tools, and such degradation can be difficult for less experienced users to notice and correct.

    In contrast, afl-fuzz is designed to be rock solid, chiefly by keeping it simple. In fact, at its core, it’s designed to be just a very good traditional fuzzer with a wide range of interesting, well-researched strategies to go by. The fancy parts just help it focus the effort in places where it matters the most.

  3. Simplicity. The author of a testing framework is probably the only person who truly understands the impact of all the settings offered by the tool - and who can dial them in just right. Yet, even the most rudimentary fuzzer frameworks often come with countless knobs and fuzzing ratios that need to be guessed by the operator ahead of time. This can do more harm than good.

    AFL is designed to avoid this as much as possible. The three knobs you can play with are the output file, the memory limit, and the ability to override the default, auto-calibrated timeout. The rest is just supposed to work. When it doesn’t, user-friendly error messages outline the probable causes and workarounds, and get you back on track right away.

  4. Chainability. Most general-purpose fuzzers can’t be easily employed against resource-hungry or interaction-heavy tools, necessitating the creation of custom in-process fuzzers or the investment of massive CPU power (most of which is wasted on tasks not directly related to the code we actually want to test).

    AFL tries to scratch this itch by allowing users to use more lightweight targets (e.g., standalone image parsing libraries) to create small corpora of interesting test cases that can be fed into a manual testing process or a UI harness later on.
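
As promised in point 1, here is a simplified, hypothetical sketch of the corpus-trimming idea (loosely in the spirit of what afl-fuzz and afl-tmin do, not their exact algorithms): repeatedly try to delete chunks of the input and accept each deletion that leaves the observed execution path unchanged. path_checksum() is an assumed placeholder for running the instrumented target and hashing its coverage map.

```python
# Simplified trimming sketch; path_checksum() is a hypothetical placeholder.
def path_checksum(data: bytes) -> int:
    """Placeholder: run the instrumented target on data and hash the coverage map."""
    raise NotImplementedError


def trim(data: bytes, min_chunk: int = 4) -> bytes:
    """Greedily drop chunks whose removal does not change the execution path."""
    baseline = path_checksum(data)
    chunk = max(len(data) // 16, min_chunk)
    while chunk >= min_chunk:
        pos = 0
        while pos < len(data):
            candidate = data[:pos] + data[pos + chunk:]
            if candidate and path_checksum(candidate) == baseline:
                data = candidate      # same path without this chunk: keep the shorter input
            else:
                pos += chunk          # removal changed behavior (or emptied the input): skip it
        chunk //= 2
    return data
```

Shorter inputs mean faster executions and less mutation effort wasted on bytes that do nothing, which is the "reduce the amount of work" angle mentioned in point 1.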

As mentioned in technical_details.md, AFL does all this not by systematically applying a single overarching CS concept, but by experimenting with a variety of small, complementary methods that were shown to reliably yield results better than chance. The use of instrumentation is a part of that toolkit, but is far from being the most important one.

Ultimately, what matters is that afl-fuzz is designed to find cool bugs - and has a pretty robust track record of doing just that.