The LibAFL Fuzzing Library

by Andrea Fioraldi and Dominik Maier

Welcome to LibAFL, the Advanced Fuzzing Library. This book is intended as a gentle introduction to the library.

This version of the LibAFL book is coupled with the release 1.0 beta of the library.

This document is still work-in-progress and incomplete. The structure and the concepts explained here are subject to change in future revisions, as the structure of LibAFL itself will evolve.

The HTML version of this book is available online at https://aflplus.plus/libafl-book/ and offline from the LibAFL repository in the docs/ folder. Build it using mdbook build in this folder, or run mdbook serve to view the book.

Introduction

Fuzzers are important tools for security researchers and developers alike. A wide range of state-of-the-art tools like AFL++, libFuzzer or honggfuzz are available to users. They do their job in a very effective way, finding thousands of bugs.

From the perspective of a power user, however, these tools are limited. Their design does not treat extensibility as a first-class citizen. Usually, a fuzzer developer can choose to either fork one of these existing tools or create a new fuzzer from scratch. In either case, researchers end up with tons of fuzzers, all of which are incompatible with each other. Their outstanding features cannot simply be combined for new projects. By reinventing the wheel over and over, we may completely miss out on features that are complex to reimplement.

To tackle this issue, we created LibAFL, a library that is not just another fuzzer, but a collection of reusable pieces for individual fuzzers. LibAFL, written in Rust, helps you develop a fuzzer tailored for your specific needs. Be it a specific target, a particular instrumentation backend, or a custom mutator, you can leverage existing bits and pieces to craft the fastest and most efficient fuzzer you can envision.

Why LibAFL?

LibAFL gives you many of the benefits of an off-the-shelf fuzzer, while being completely customizable. Some of its highlight features currently include:

  • multi platform: LibAFL works pretty much anywhere you can find a Rust compiler. We have already used it on Windows, Android, macOS, and Linux, on x86_64, aarch64, ...
  • portable: LibAFL can be built in no_std mode. This means it does not require a specific OS-dependent runtime to function. Define an allocator and a way to map pages, and you are good to inject LibAFL in obscure targets like embedded devices, hypervisors, or maybe even WebAssembly?
  • adaptable: Given years of experience fine-tuning AFLplusplus and our academic fuzzing background, we could incorporate recent fuzzing trends into LibAFL's design and make it future-proof. To give an example, as opposed to old-skool fuzzers, a BytesInput is just one of the potential forms of inputs: feel free to use and mutate an Abstract Syntax Tree instead, for structured fuzzing.
  • scalable: As part of LibAFL, we developed Low Level Message Passing, LLMP for short, which allows LibAFL to scale almost linearly over cores. That is, if you choose to use this feature - it is your fuzzer, after all. Scaling to multiple machines over TCP is also possible, using LLMP's broker2broker feature.
  • fast: We do everything we can at compile time so that the runtime overhead is as minimal as it can get.
  • bring your own target: We support binary-only modes, like QEMU-Mode and Frida-Mode with ASAN and CmpLog, as well as multiple compilation passes for source-based instrumentation. Of course, we also support custom instrumentation, as you can see in the Python example based on Google's Atheris.
  • usable: This one is on you to decide. Dig right in!

Getting Started

To get started with LibAFL, there are a few initial steps to take. In this chapter, we discuss how to download and build LibAFL using Rust's cargo command. We also describe the structure of LibAFL's components, so-called crates, and the purpose of each individual crate.

Setup

The first step is to download LibAFL and all dependencies that are not automatically installed with cargo.

Command Line Notation

In this chapter and throughout the book, we show some commands used in the terminal. Lines that you should enter in a terminal all start with $. You don’t need to type in the $ character; it indicates the start of each command. Lines that don’t start with $ typically show the output of the previous command. Additionally, PowerShell-specific examples will use > rather than $.

While you technically do not need to install LibAFL and can use the version from crates.io directly, we recommend downloading or cloning the GitHub version. This gets you the example fuzzers, additional utilities, and the latest patches. The easiest way to do this is with git.

$ git clone git@github.com:AFLplusplus/LibAFL.git

You can alternatively, on a UNIX-like machine, download a compressed archive and extract it with:

$ wget https://github.com/AFLplusplus/LibAFL/archive/main.tar.gz
$ tar xvf main.tar.gz
$ rm main.tar.gz
$ ls LibAFL-main # this is the extracted folder

Clang installation

One of the external dependencies of LibAFL is the Clang C/C++ compiler. While most of the code is in pure Rust, we still need a C compiler because stable Rust does not yet support some features that parts of LibAFL need, such as weak linking and LLVM builtins linking. For these parts, we use C to expose the missing functionality to our Rust codebase.

In addition, if you want to perform source-level fuzz testing of C/C++ applications, you will likely need Clang with its instrumentation options to compile the programs under test.

On Linux you could use your distribution's package manager to get Clang, but these packages are not always up-to-date. Instead, we suggest using the Debian/Ubuntu prebuilt packages from LLVM that are available using their official repository.

For Microsoft Windows, you can download the installer package that LLVM generates periodically.

Despite Clang being the default C compiler on macOS, we discourage the use of the build shipped by Apple and encourage installing LLVM from Homebrew, using brew install llvm.

Alternatively, you can download and build the LLVM source tree - Clang included - following the steps explained here.

Rust installation

If you do not have Rust installed, you can easily follow the steps described here to install it on any supported system. Be aware that Rust versions shipped with Linux distributions may be outdated; LibAFL always targets the latest stable version available via rustup upgrade.

We suggest installing Clang and LLVM first.

Building LibAFL

Like most Rust projects, LibAFL can be built using cargo from the root directory of the project with:

$ cargo build --release

Note that the --release flag is optional for development, but you need to add it to fuzz at a decent speed. Slowdowns of 10x or more are not uncommon for debug builds.

The LibAFL repository is composed of multiple crates. The top-level Cargo.toml is the workspace file grouping these crates. Calling cargo build from the root directory will compile all crates in the workspace.

Build Example Fuzzers

The best starting point for experienced rustaceans is to read through, and adapt, the example fuzzers.

We group these fuzzers in the ./fuzzers directory of the LibAFL repository. The directory contains a set of crates that are not part of the workspace.

Each of these example fuzzers uses particular features of LibAFL, sometimes combined with different instrumentation backends (e.g. SanitizerCoverage, Frida, ...).

You can use these crates as examples and as skeletons for custom fuzzers with similar feature sets. Each fuzzer will have a README.md file in its directory, describing the fuzzer and its features.

To build an example fuzzer, you have to invoke cargo build --release from its respective folder (fuzzers/[FUZZER_NAME]).

Crates

LibAFL is composed of different crates. A crate is an individual library in Rust's Cargo build system that you can use by adding it to your project's Cargo.toml, like:

[dependencies]
libafl = { version = "*" }

For LibAFL, each crate has its own self-contained purpose, and users may not need all of them in their project. Following the naming convention of the folders in the project's root, they are:

libafl

This is the main crate that contains all the components needed to build a fuzzer.

This crate has a number of feature flags that enable and disable certain aspects of LibAFL. The features can be found in LibAFL's Cargo.toml under "[features]", and are usually explained with comments there. Some features worthy of remark are:

  • std enables the parts of the code that use the Rust standard library. Without this flag, LibAFL is no_std compatible. This disables a range of features, but allows us to use LibAFL in embedded environments; read the no_std section for further details.
  • derive enables the usage of the derive(...) macros defined in libafl_derive from libafl.
  • rand_trait allows you to use LibAFL's very fast (but insecure!) random number generator wherever compatibility with Rust's rand crate is needed.
  • llmp_bind_public makes LibAFL's LLMP bind to a public TCP port, over which other fuzzer nodes can communicate with this instance.
  • introspection adds performance statistics to LibAFL.

You can choose the features by using features = ["feature1", "feature2", ...] for LibAFL in your Cargo.toml. Out of this list, std, derive, and rand_trait are set by default. You can disable them by setting default-features = false in your Cargo.toml.
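
For example, a minimal configuration might disable the default features and hand-pick only what is needed. This is just a sketch; the feature set you actually need depends on your fuzzer:

[dependencies]
libafl = { version = "*", default-features = false, features = ["derive"] }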

libafl_sugar

The sugar crate abstracts away most of the complexity of LibAFL's API. Instead of high flexibility, it aims to be high-level and easy-to-use. It is not as flexible as stitching your fuzzer together from each individual component, but allows you to build a fuzzer with minimal lines of code. To see it in action, take a look at the libfuzzer_stb_image_sugar example fuzzer.

libafl_derive

This is a proc-macro crate paired with the libafl crate.

At the moment, it just exposes the derive(SerdeAny) macro that can be used to define Metadata structs; see the section about Metadata for details.

libafl_targets

This crate exposes code to interact with, and to instrument, targets. Its individual parts are enabled and disabled at compile time using feature flags.

Currently, the supported flags are:

  • pcguard_edges defines the SanitizerCoverage trace-pc-guard hooks to track the executed edges in a map.
  • pcguard_hitcounts defines the SanitizerCoverage trace-pc-guard hooks to track the executed edges with hitcounts (like AFL) in a map.
  • libfuzzer exposes a compatibility layer with libFuzzer style harnesses.
  • value_profile defines the SanitizerCoverage trace-cmp hooks to track the matching bits of each comparison in a map.

libafl_cc

This is a library that provides utilities to wrap compilers and create source-level fuzzers.

At the moment, only the Clang compiler is supported. To understand it deeper, look through the tutorials and examples.

libafl_frida

This library bridges LibAFL with Frida as instrumentation backend.

With this crate, you can instrument targets on Linux/macOS/Windows/Android for coverage collection.

Additionally, it supports CmpLog and AddressSanitizer instrumentation and runtimes for aarch64.

libafl_qemu

This library bridges LibAFL with QEMU user-mode to fuzz ELF cross-platform binaries.

It works on Linux and can collect edge coverage without collisions! It also supports a wide range of hooks and instrumentation options.

A Simple LibAFL Fuzzer

This chapter discusses a naive fuzzer using the LibAFL API. You will learn about basic entities such as State, Observer, and Executor. While the following chapters discuss the components of LibAFL in detail, here we introduce the fundamentals.

We are going to fuzz a simple Rust function that panics under a condition. The fuzzer will be single-threaded and will stop after the crash, just like libFuzzer normally does.

You can find a complete version of this tutorial as an example fuzzer in fuzzers/baby_fuzzer.

Warning

This example fuzzer is too naive for any real-world usage. Its purpose is solely to show the main components of the library; for a more in-depth walkthrough on building a custom fuzzer, go directly to the Tutorial chapter.

Creating a project

We use cargo to create a new Rust project with LibAFL as a dependency.

$ cargo new baby_fuzzer
$ cd baby_fuzzer

The generated Cargo.toml looks like the following:

[package]
name = "baby_fuzzer"
version = "0.1.0"
authors = ["Your Name <you@example.com>"]
edition = "2018"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]

In order to use LibAFL, we must add it as a dependency by adding libafl = { path = "path/to/libafl/" } under [dependencies]. You can use the LibAFL version from crates.io if you want; in that case, use libafl = "*" to get the latest version (or pin it to the current version).

As we are going to fuzz Rust code, we want a panic to not simply cause the program to exit, but to raise an abort that can then be caught by the fuzzer. To do that, we specify panic = "abort" in the profiles.

Alongside this setting, we add some optimization flags for the compilation, when building in release mode.

The final Cargo.toml should look similar to the following:

[package]
name = "baby_fuzzer"
version = "0.1.0"
authors = ["Your Name <you@example.com>"]
edition = "2018"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
libafl = { path = "path/to/libafl/" }

[profile.dev]
panic = "abort"

[profile.release]
panic = "abort"
lto = true
codegen-units = 1
opt-level = 3
debug = true

The function under test

Opening src/main.rs, we have an empty main function. To start, we create the closure that we want to fuzz. It takes a buffer as input and panics if it starts with "abc". ExitKind is used to inform the fuzzer about the harness' exit status.

extern crate libafl;
use libafl::{
    bolts::AsSlice,
    inputs::{BytesInput, HasTargetBytes},
    executors::ExitKind,
};

fn main(){
    let mut harness = |input: &BytesInput| {
        let target = input.target_bytes();
        let buf = target.as_slice();
        if buf.len() > 0 && buf[0] == 'a' as u8 {
            if buf.len() > 1 && buf[1] == 'b' as u8 {
                if buf.len() > 2 && buf[2] == 'c' as u8 {
                    panic!("=)");
                }
            }
        }
        ExitKind::Ok
    };
    // To test the panic:
    let input = BytesInput::new(Vec::from("abc"));
    #[cfg(feature = "panic")]
    harness(&input);
}

Generating and running some tests

One of the main components that a LibAFL-based fuzzer uses is the State, a container for the data that is evolved during the fuzzing process. It includes all state, such as the Corpus of inputs, the current state of the RNG, and potential Metadata for the testcases and the run. In our main, we create a basic State instance like the following:

// create a State from scratch
let mut state = StdState::new(
    // RNG
    StdRand::with_seed(current_nanos()),
    // Corpus that will be evolved, we keep it in memory for performance
    InMemoryCorpus::new(),
    // Corpus in which we store solutions (crashes in this example),
    // on disk so the user can get them after stopping the fuzzer
    OnDiskCorpus::new(PathBuf::from("./crashes")).unwrap(),
    &mut (),
    &mut ()
).unwrap();

  • The first parameter is a random number generator, which is part of the fuzzer State. In this case, we use the default StdRand, but you can choose a different one. We seed it with the current nanoseconds.

  • The second parameter is an instance of something implementing the Corpus trait, InMemoryCorpus in this case. The corpus is the container of the testcases evolved by the fuzzer; in this case, we keep it entirely in memory.

    To avoid type annotation errors, you can use InMemoryCorpus::<BytesInput>::new() instead of InMemoryCorpus::new(). Otherwise, the type annotation will be inferred automatically once the executor is added.

  • The third parameter is another corpus that stores the "solution" testcases for the fuzzer. For our purposes, a solution is an input that triggers the panic. In this case, we want to store it to disk under the crashes directory, so we can inspect it.

  • The last two parameters are the feedback and the objective; we will discuss them later.

Another required component is the EventManager. It handles some events such as the addition of a testcase to the corpus during the fuzzing process. For our purpose, we use the simplest one that just displays the information about these events to the user using a Monitor instance.

// The Monitor trait defines how the fuzzer stats are displayed to the user
let mon = SimpleMonitor::new(|s| println!("{}", s));

// The event manager handles the various events generated during the fuzzing loop
// such as the notification of the addition of a new item to the corpus
let mut mgr = SimpleEventManager::new(mon);

In addition, we have the Fuzzer, an entity that contains some actions that alter the State. One of these actions is the scheduling of testcases to the fuzzer, using a Scheduler. We create a QueueScheduler, a scheduler that serves testcases to the fuzzer in a FIFO fashion.

// A queue policy to get testcases from the corpus
let scheduler = QueueScheduler::new();

// A fuzzer with feedbacks and a corpus scheduler
let mut fuzzer = StdFuzzer::new(scheduler, (), ());

Last but not least, we need an Executor, the entity responsible for running our program under test. In this example, we want to run the harness function in-process (without forking off a child, for example), so we use the InProcessExecutor.

// Create the executor for an in-process function
let mut executor = InProcessExecutor::new(
    &mut harness,
    (),
    &mut fuzzer,
    &mut state,
    &mut mgr,
)
.expect("Failed to create the Executor");

It takes a reference to the harness, the state, and the event manager. We will discuss the second parameter later. The executor expects the harness to return an ExitKind object, which is why we added ExitKind::Ok to our harness function earlier.

Now we have the 4 major entities ready for running our tests, but we still cannot generate testcases.

For this purpose, we use a Generator, RandPrintablesGenerator, which generates a string of printable bytes.

use libafl::generators::RandPrintablesGenerator;

// Generator of printable bytearrays of max size 32
let mut generator = RandPrintablesGenerator::new(32);

// Generate 8 initial inputs
state
    .generate_initial_inputs(&mut fuzzer, &mut executor, &mut generator, &mut mgr, 8)
    .expect("Failed to generate the initial corpus".into());

Now you can prepend the necessary use directives to your main.rs and compile the fuzzer.


extern crate libafl;

use std::path::PathBuf;
use libafl::{
    bolts::{AsSlice, current_nanos, rands::StdRand},
    corpus::{InMemoryCorpus, OnDiskCorpus},
    events::SimpleEventManager,
    executors::{inprocess::InProcessExecutor, ExitKind},
    fuzzer::StdFuzzer,
    generators::RandPrintablesGenerator,
    inputs::{BytesInput, HasTargetBytes},
    monitors::SimpleMonitor,
    schedulers::QueueScheduler,
    state::StdState,
};

When running, you should see something similar to:

$ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.04s
     Running `target/debug/baby_fuzzer`
[LOG Debug]: Loaded 0 over 8 initial testcases

Evolving the corpus with feedbacks

So far, we have simply run 8 randomly generated testcases, but none of them has been stored in the corpus. If you are very lucky, you may have triggered the panic by chance, but you will not see any saved file in crashes.

Now we want to turn our simple fuzzer into a feedback-based one and increase the chance of generating the right input to trigger the panic. We are going to implement a simple feedback based on the 3 conditions that need to hold to reach the panic. To do that, we need a way to keep track of whether a condition is satisfied.

An Observer records information about properties of a fuzzing run and feeds it back to the fuzzer. We use the StdMapObserver, the default observer that uses a map to keep track of covered elements. In our fuzzer, each condition is mapped to an entry of such a map.

We represent such a map as a static mut variable. As we don't rely on any instrumentation engine, we have to manually track the satisfied conditions by calling signals_set in our harness:


extern crate libafl;
use libafl::{
    bolts::AsSlice,
    inputs::{BytesInput, HasTargetBytes},
    executors::ExitKind,
};

// Coverage map with explicit assignments due to the lack of instrumentation
static mut SIGNALS: [u8; 16] = [0; 16];

fn signals_set(idx: usize) {
    unsafe { SIGNALS[idx] = 1 };
}

// The closure that we want to fuzz
let mut harness = |input: &BytesInput| {
    let target = input.target_bytes();
    let buf = target.as_slice();
    signals_set(0); // set SIGNALS[0]
    if buf.len() > 0 && buf[0] == 'a' as u8 {
        signals_set(1); // set SIGNALS[1]
        if buf.len() > 1 && buf[1] == 'b' as u8 {
            signals_set(2); // set SIGNALS[2]
            if buf.len() > 2 && buf[2] == 'c' as u8 {
                panic!("=)");
            }
        }
    }
    ExitKind::Ok
};

The observer can be created directly from the SIGNALS map, in the following way:

// Create an observation channel using the signals map
let observer = StdMapObserver::new("signals", unsafe { &mut SIGNALS });

The observers are usually kept in the corresponding executor, as they keep track of information that is valid for just one run. We then have to modify our InProcessExecutor creation to include the observer, as follows:

// Create the executor for an in-process function with just one observer
let mut executor = InProcessExecutor::new(
    &mut harness,
    tuple_list!(observer),
    &mut fuzzer,
    &mut state,
    &mut mgr,
)
.expect("Failed to create the Executor".into());

Now that the fuzzer can observe which condition is satisfied, we need a way to rate an input as interesting (i.e. worth adding to the corpus) based on this observation. Here comes the notion of Feedback.

The Feedback is part of the State and provides a way to rate an input and its corresponding execution as interesting, by looking at the information in the observers. Feedbacks can maintain a cumulative state of the information seen so far as metadata in the State; in our case, it maintains the set of conditions satisfied in previous runs.

We use MaxMapFeedback, a feedback that implements a novelty search over the map of the MapObserver. Basically, if there is a value in the observer's map that is greater than the maximum value registered so far for the same entry, it rates the input as interesting and updates its state.
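
Conceptually, that check boils down to comparing each map entry against a history of maxima. The following is a simplified sketch of the idea, not LibAFL's actual implementation:

// Simplified sketch of max-map novelty search (not the real LibAFL code).
// `history` holds the maximum value seen so far for each map entry.
fn is_interesting(history: &mut [u8], map: &[u8]) -> bool {
    let mut interesting = false;
    for (hist, &cur) in history.iter_mut().zip(map.iter()) {
        if cur > *hist {
            *hist = cur; // record the new maximum for this entry
            interesting = true;
        }
    }
    interesting
}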

The objective Feedback is another kind of Feedback, one that decides whether an input is a "solution". When an input is rated as interesting by the objective, it is saved to the solutions corpus (./crashes in our case) instead of the regular corpus. We use CrashFeedback to tell the fuzzer that an input causing the program to crash is a solution for us.

We need to update our State creation to pass in the feedback and the objective, and the Fuzzer creation to include them as well:

extern crate libafl;
use libafl::{
    bolts::{current_nanos, rands::StdRand, tuples::tuple_list},
    corpus::{InMemoryCorpus, OnDiskCorpus},
    feedbacks::{MaxMapFeedback, CrashFeedback},
    fuzzer::StdFuzzer,
    state::StdState,
    observers::StdMapObserver,
};

// Feedback to rate the interestingness of an input
let mut feedback = MaxMapFeedback::new(&observer);

// A feedback to choose if an input is a solution or not
let mut objective = CrashFeedback::new();

// create a State from scratch
let mut state = StdState::new(
    // RNG
    StdRand::with_seed(current_nanos()),
    // Corpus that will be evolved, we keep it in memory for performance
    InMemoryCorpus::new(),
    // Corpus in which we store solutions (crashes in this example),
    // on disk so the user can get them after stopping the fuzzer
    OnDiskCorpus::new(PathBuf::from("./crashes")).unwrap(),
    &mut feedback,
    &mut objective
).unwrap();

// ...

// A fuzzer with feedbacks and a corpus scheduler
let mut fuzzer = StdFuzzer::new(scheduler, feedback, objective);

The actual fuzzing

Now, after including the correct use statements, we can run the program, but the outcome is not so different from the previous one, as the random generator does not take into account what we save as interesting in the corpus. To make use of the corpus, we need to plug in a Mutator.

Stages perform actions on individual inputs, taken from the corpus. For instance, the MutationalStage executes the harness several times in a row, every time with mutated inputs.

As the last step, we create a MutationalStage that uses a mutator inspired by the havoc mutator of AFL.

use libafl::{
    mutators::scheduled::{havoc_mutations, StdScheduledMutator},
    stages::mutational::StdMutationalStage,
    fuzzer::Fuzzer,
};

// ...

// Setup a mutational stage with a basic bytes mutator
let mutator = StdScheduledMutator::new(havoc_mutations());
let mut stages = tuple_list!(StdMutationalStage::new(mutator));

fuzzer
    .fuzz_loop(&mut stages, &mut executor, &mut state, &mut mgr)
    .expect("Error in the fuzzing loop");

fuzz_loop will request a testcase from the fuzzer for each iteration, using the scheduler, and will then invoke the stage.

After adding this code, we have a proper fuzzer that can run and find the input that panics the function in less than a second.

$ cargo run
   Compiling baby_fuzzer v0.1.0 (/home/andrea/Desktop/baby_fuzzer)
    Finished dev [unoptimized + debuginfo] target(s) in 1.56s
     Running `target/debug/baby_fuzzer`
[New Testcase] clients: 1, corpus: 2, objectives: 0, executions: 1, exec/sec: 0
[LOG Debug]: Loaded 1 over 8 initial testcases
[New Testcase] clients: 1, corpus: 3, objectives: 0, executions: 804, exec/sec: 0
[New Testcase] clients: 1, corpus: 4, objectives: 0, executions: 1408, exec/sec: 0
thread 'main' panicked at '=)', src/main.rs:35:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Crashed with SIGABRT
Child crashed!
[Objective] clients: 1, corpus: 4, objectives: 1, executions: 1408, exec/sec: 0
Waiting for broker...
Bye!

As you can see, after the panic message, the objectives count of the log increased by one and you will find the crashing input in crashes/.

The complete code can be found in ./fuzzers/baby_fuzzer.

More Examples

Examples can be found under ./fuzzers.

  • baby_fuzzer_gramatron: Gramatron is a fuzzer that uses grammar automatons in conjunction with aggressive mutation operators to synthesize complex bug triggers
  • baby_fuzzer_grimoire: Grimoire is a fully automated coverage-guided fuzzer which works without any form of human interaction or pre-configuration
  • baby_fuzzer_nautilus: Nautilus is a coverage-guided, grammar-based fuzzer
  • baby_fuzzer_tokens: a basic token-level fuzzer with token-level mutations
  • baby_fuzzer_with_forkexecutor: an example for InProcessForkExecutor
  • baby_no_std: a minimalistic example of how to create a LibAFL-based fuzzer that works in no_std environments like TEEs, kernels, or on bare metal

Core Concepts

LibAFL is designed around some core concepts that we think can effectively abstract most of the other fuzzers designs.

Here, we discuss these concepts and provide some examples related to other fuzzers.

Observer

An Observer, or Observation Channel, is an entity that provides information observed during the execution of the program under test to the fuzzer.

The information contained in the Observer is not preserved across executions.

As an example, the coverage shared map filled during the execution to report the executed edges used by fuzzers such as AFL and HonggFuzz can be considered an Observation Channel. This information is not preserved across runs and it is an observation of a dynamic property of the program.

In terms of code, in the library this entity is described by the Observer trait.

In addition to holding the volatile data connected with the last execution of the target, the structures implementing this trait can define some execution hooks that are executed before and after each fuzz case. In these hooks, the observer can modify the fuzzer's state.
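
A rough mental model (simplified, and not the actual trait definition in the library) could look like this: the observer owns the volatile data and exposes hooks that run around each execution.

// Simplified mental model of an observation channel (not LibAFL's real Observer trait).
trait ObservationChannel {
    // Reset the volatile data before the target runs.
    fn pre_exec(&mut self) {}
    // Post-process the data gathered during the run.
    fn post_exec(&mut self) {}
}

struct CoverageMapObserver {
    map: Vec<u8>,
}

impl ObservationChannel for CoverageMapObserver {
    fn pre_exec(&mut self) {
        // Clear the coverage map between runs, as the data is not preserved.
        self.map.iter_mut().for_each(|e| *e = 0);
    }
}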

Executor

In different fuzzers, the concept of executing the program under test is not always the same. For instance, for in-memory fuzzers like libFuzzer, an execution is a call to a harness function; for hypervisor-based fuzzers like kAFL, instead, an entire operating system is started from a snapshot for each run.

In our model, an Executor is the entity that defines not only how to execute the target, but all the volatile operations that are related to just a single run of the target.

The Executor is, for instance, responsible for informing the program about the input that the fuzzer wants to use in the run, be it by writing it to a memory location or by passing it as a parameter to the harness function.

In our model, it can also hold a set of Observers connected with each execution.

In Rust, we bind this concept to the Executor trait. A structure implementing this trait must also implement HasObservers if it wants to hold a set of Observers.

By default, we implement some commonly used Executors such as InProcessExecutor in which the target is a harness function providing in-process crash detection. Another Executor is the ForkserverExecutor that implements an AFL-like mechanism to spawn child processes to fuzz.

A common pattern when creating an Executor is wrapping an existing one; for instance, TimeoutExecutor wraps an executor and installs a timeout callback before calling the original run function of the wrapped executor.

InProcessExecutor

Let's begin with the base case: InProcessExecutor. This executor runs the harness program (function) inside the fuzzer process.

When you want to execute the harness as fast as possible, you will most probably want to use this InProcessExecutor.

One thing to note here: if your harness is likely to have heap corruption bugs, you will want to use another allocator so that the corrupted heap does not affect the fuzzer itself (for example, we adopt MiMalloc in some of our fuzzers). Alternatively, you can compile your harness with AddressSanitizer to make sure you can catch these heap bugs.

ForkserverExecutor

Next, we'll take a look at the ForkserverExecutor. In this case, it is afl-cc (from AFLplusplus/AFLplusplus) that compiles the harness code, and therefore we can't use EDGES_MAP anymore. Fortunately, we have a way to tell the forkserver which map to record the coverage in.

As you can see from the forkserver example,

//Coverage map shared between observer and executor
let mut shmem = StdShMemProvider::new().unwrap().new_shmem(MAP_SIZE).unwrap();
//let the forkserver know the shmid
shmem.write_to_env("__AFL_SHM_ID").unwrap();
let mut shmem_buf = shmem.as_mut_slice();

Here we create a shared memory region, shmem, and write its ID to the environment variable __AFL_SHM_ID. The instrumented binary, or the forkserver, then finds this shared memory region (from the aforementioned env var) and records its coverage there. On the fuzzer side, you can pass this shmem map to your Observer to obtain coverage feedback, combined with any Feedback.
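
On the fuzzer side, wiring the shared map into a map observer could look roughly like this. The constructor names follow the forkserver example at the time of writing and may differ in newer versions; `shmem_buf` comes from the snippet above:

// Wrap the shared coverage map in an observer; HitcountsMapObserver adds AFL-style bucketing.
// Assumes `shmem_buf` from the snippet above; check your LibAFL version for the exact API.
use libafl::observers::{HitcountsMapObserver, StdMapObserver};

let edges_observer = HitcountsMapObserver::new(StdMapObserver::new("shared_mem", shmem_buf));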

Another feature of the ForkserverExecutor worth mentioning is shared memory testcases. Normally, the mutated input is passed between the forkserver and the instrumented binary via a .cur_input file. You can improve your forkserver fuzzer's performance by passing the input via shared memory instead.

See AFL++'s documentation or the fuzzer example in forkserver_simple/src/program.c for reference.

It is very simple: when you call ForkserverExecutor::new() with use_shmem_testcase set to true, the ForkserverExecutor sets things up, and your harness can simply fetch the input from __AFL_FUZZ_TESTCASE_BUF.

InprocessForkExecutor

Finally, we'll talk about the InProcessForkExecutor. It has only one difference from InProcessExecutor: it forks before running the harness, and that's it.

But why would we want to do so? Well, under some circumstances, you may find your harness pretty unstable, or it may wreak havoc on global state. In this case, you want to fork first, so the harness runs in a child process and doesn't break things.

However, we have to take care of the shared memory; it's the child process that runs the harness code and writes the coverage to the map.

We have to make the map shared between the parent process and the child process, so we'll use shared memory again. You should compile your harness with the pointer_maps feature (of libafl_targets) enabled; this way, we have a pointer, EDGES_MAP_PTR, that can point to any coverage map.

On your fuzzer side, you can allocate a shared memory region and make the EDGES_MAP_PTR point to your shared memory.

let mut shmem;
unsafe{
    shmem = StdShMemProvider::new().unwrap().new_shmem(MAX_EDGES_NUM).unwrap();
}
let shmem_buf = shmem.as_mut_slice();
unsafe{
    EDGES_PTR = shmem_buf.as_ptr();
}

Again, you can pass this shmem map to your Observer and Feedback to obtain coverage feedbacks.

Feedback

The Feedback is an entity that classifies the outcome of an execution of the program under test as interesting or not. Typically, if an execution is interesting, the corresponding input used to feed the target program is added to a corpus.

Most of the time, the notion of Feedback is deeply linked to the Observer, but they are different concepts.

The Feedback, in most cases, processes the information reported by one or more observers to decide if the execution is interesting. The concept of "interestingness" is abstract, but typically it is related to a novelty search (i.e. interesting inputs are those that reach a previously unseen edge in the control flow graph).

As an example, given an Observer that reports all the sizes of memory allocations, a maximization Feedback can be used to maximize these sizes to spot pathological inputs in terms of memory consumption.

In terms of code, the library offers the Feedback and the FeedbackState traits. The first is used to implement functors that, given the state of the observers from the last execution, tell if the execution was interesting. The second is tied to Feedback and holds the data that the feedback wants to persist in the fuzzer's state, for instance the cumulative map of all the edges seen so far in the case of a feedback based on edge coverage.

Multiple Feedbacks can be combined into a boolean formula, for instance considering an execution as interesting if it triggers new code paths or executes in less time than the average execution time, using feedback_or.
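
As a concrete sketch, combining an edge-coverage feedback with a time feedback might look like the following. The constructor names (TimeObserver, TimeFeedback, MaxMapFeedback) and the feedback_or! macro follow the LibAFL API at the time of writing and may have changed since; `observer` is assumed to be a map observer created earlier.

use libafl::{
    feedback_or,
    feedbacks::{MaxMapFeedback, TimeFeedback},
    observers::TimeObserver,
};

// Rate an execution as interesting if it yields new coverage OR a notable execution time.
let time_observer = TimeObserver::new("time");
let mut feedback = feedback_or!(
    MaxMapFeedback::new(&observer),
    TimeFeedback::with_observer(&time_observer)
);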

TODO objective feedbacks and fast feedback logic operators

Input

Formally, the input of a program is the data taken from external sources that affect the program behaviour.

In our model of an abstract fuzzer, we define the Input as the internal representation of the program input (or a part of it).

In the straightforward case, the input of the program is a byte array and in fuzzers such as AFL we store and manipulate exactly these byte arrays.

But it is not always the case. A program can expect inputs that are not byte arrays (e.g. a sequence of syscalls) and the fuzzer does not represent the Input in the same way that the program consumes it.

In case of a grammar fuzzer for instance, the Input is generally an Abstract Syntax Tree because it is a data structure that can be easily manipulated while maintaining the validity, but the program expects a byte array as input, so just before the execution, the tree is serialized to a sequence of bytes.

In the Rust code, an Input is a trait that can be implemented only by structures that are serializable and have only owned data as fields.
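
As a hypothetical illustration, a fuzzer for syscall sequences might define its input as a fully owned, serializable structure like the sketch below, and then implement LibAFL's Input trait for it (trait implementation omitted):

extern crate serde;
use serde::{Deserialize, Serialize};

// Hypothetical input type: a sequence of syscalls instead of a raw byte array.
// All fields are owned and serializable, as required for a LibAFL Input.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Syscall {
    number: u64,
    args: Vec<u64>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SyscallInput {
    calls: Vec<Syscall>,
}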

Corpus

The Corpus is where testcases are stored. We define a Testcase as an Input and a set of related metadata like execution time for instance.

A Corpus can store testcases in different ways, for example on disk, or in memory, or it can implement a cache to speed up on-disk storage.

Usually, a testcase is added to the Corpus when it is considered as interesting, but a Corpus is used also to store testcases that fulfill an objective (like crashing the tested program for instance).

Related to the Corpus is the way in which the fuzzer asks for the next testcase to fuzz, picking it from the Corpus. In LibAFL, this is the CorpusScheduler, the entity representing the policy used to pop testcases from the Corpus, FIFO for instance.

Speaking about the code, Corpus and CorpusScheduler are traits.

Mutator

The Mutator is an entity that takes one or more Inputs and generates a new derived one.

Mutators can be composed and they are generally linked to a specific Input type.

There can be, for instance, a Mutator that applies more than a single type of mutation to the input. Consider a generic Mutator for a byte stream: a bit flip is just one of the possible mutations, but not the only one; there is also, for instance, the random replacement of a byte or the copy of a chunk.
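
To make this concrete, here is a minimal sketch of a single byte-stream mutation, a bit flip at a chosen position, written without any LibAFL traits:

// Minimal sketch of a bit-flip mutation on a byte buffer (no LibAFL traits involved).
fn flip_bit(input: &mut [u8], rand_pos: usize) {
    if input.is_empty() {
        return;
    }
    let byte = (rand_pos / 8) % input.len(); // choose a byte from the random position
    let bit = rand_pos % 8;                  // choose a bit within that byte
    input[byte] ^= 1 << bit;                 // flip it
}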

In LibAFL, Mutator is a trait.

Generator

A Generator is a component designed to generate an Input from scratch.

Typically, a random generator is used to generate random inputs.

Generators are traditionally less used in Feedback-driven Fuzzing, but there are exceptions, like Nautilus, that uses a Grammar generator to create the initial corpus and a sub-tree Generator as a mutation of its grammar Mutator.

In the code, Generator is a trait.

Stage

A Stage is an entity that operates on a single Input taken from the Corpus.

For instance, a Mutational Stage, given an input from the corpus, applies a Mutator and executes the generated input one or more times. How many times this has to be done can be scheduled; AFL, for instance, uses a performance score of the input to choose how many times the havoc mutator should be invoked. This can also depend on other parameters, for instance the length of the input if we want to just apply a sequential bitflip, or it can be a fixed value.

A stage can also be an analysis stage, for instance, the Colorization stage of Redqueen that aims to introduce more entropy in a testcase or the Trimming stage of AFL that aims to reduce the size of a testcase.

There are several stages in the LibAFL codebases implementing the Stage trait.

Design

In this chapter, we discuss how we designed the library taking into account the core concepts while allowing code reuse and extensibility.

Architecture

The LibAFL architecture is built around some entities to allow code reuse and low-cost abstractions.

Initially, we thought about implementing LibAFL in an Object Oriented language, such as C++. When we landed on Rust, we immediately changed our minds, as we realized that, while Rust allows a sort of OOP pattern, we could build the library using a saner approach, like the one described in this blog post about game design in Rust.

The LibAFL code reuse mechanism is thus based on components rather than sub-classes, but there are still some OOP patterns in the library.

Thinking about similar fuzzers, you can observe that most of the time the data structures that are modified are the ones related to testcases and to the fuzzer's global state.

Besides the entities previously described, we introduce the Testcase and State entities. The Testcase is a container for an Input stored in the Corpus and its metadata (so, in the implementation, the Corpus stores Testcases), and the State contains all the metadata that is evolved while running the fuzzer, the Corpus included.

The State, in the implementation, contains only owned objects that are serializable, and it is serializable itself. Some fuzzers may want to serialize their state when pausing, or, when doing in-process fuzzing, serialize on crash and deserialize in the new process to continue fuzzing with all metadata preserved.

Additionally, we group the entities that are "actions", like the CorpusScheduler and the Feedbacks, in a common place, the Fuzzer.

Metadata

A metadata in LibAFL is a self-contained structure that holds data associated with the State or with a Testcase.

In terms of code, a metadata can be defined as a Rust struct registered in the SerdeAny registry.


extern crate libafl;
extern crate serde;

use libafl::SerdeAny;
use serde::{Serialize, Deserialize};

#[derive(Debug, Serialize, Deserialize, SerdeAny)]
pub struct MyMetadata {
    //...
}

The struct must be static, so it cannot hold references to borrowed objects.

As an alternative to derive(SerdeAny), which is a proc-macro in libafl_derive, the user can use libafl::impl_serdeany!(MyMetadata);.

Usage

Metadata objects are primarily intended to be used inside SerdeAnyMap and NamedSerdeAnyMap.

With these maps, the user can retrieve instances by type (and name). Internally, the instances are stored as SerdeAny trait objects.

Structs that want to have a set of metadata must implement the HasMetadata trait.

By default, Testcase and State implement it and each hold a SerdeAnyMap.

(De)Serialization

We are interested in storing the State's Metadata so that it is not lost when a fuzzer crashes or stops. To do that, it must be serialized and deserialized using Serde.

As Metadata is stored in a SerdeAnyMap as trait objects, they cannot be deserialized using Serde by default.

To cope with this problem, in LibAFL each SerdeAny struct must be registered in a global registry that keeps track of types and allows the (de)serialization of the registered types.

Normally, the impl_serdeany macro does that for the user, creating a constructor function that fills the registry. However, when using LibAFL in no_std mode, this operation must be carried out manually before any other operation in the main function.

To do that, the developer needs to know each metadata type that is used inside the fuzzer and call RegistryBuilder::register::<MyMetadata>() for each of them at the beginning of main.
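A minimal sketch of such a main, reusing the MyMetadata struct from the example above and assuming the import path used at the time of writing (it may differ between LibAFL versions), could look like:

extern crate libafl;
use libafl::bolts::serdeany::RegistryBuilder;

fn main() {
    // Register every metadata type used by this fuzzer before doing anything else.
    // MyMetadata is the SerdeAny struct defined in the example above.
    RegistryBuilder::register::<MyMetadata>();
    // ... then set up the State, Fuzzer, Executor, and start fuzzing.
}
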

Message Passing

LibAFL offers a standard mechanism for message passing over processes and machines with a low overhead. We use message passing to inform the other connected clients/fuzzers/nodes about new testcases, metadata, and statistics about the current run. Depending on individual needs, LibAFL can also write testcase contents to disk, while still using events to notify other fuzzers, using an OnDiskCorpus.

In our tests, message passing scales very well to share new testcases and metadata between multiple running fuzzer instances for multi-core fuzzing. Specifically, it scales a lot better than using memory locks on a shared corpus, and a lot better than sharing the testcases via the filesystem, as AFL traditionally does. Think "all cores are green" in htop, aka., no kernel interaction.

The EventManager interface is used to send Events over the wire using Low Level Message Passing, a custom message passing mechanism over shared memory or TCP.

Low Level Message Passing (LLMP)

LibAFL comes with a reasonably lock-free message passing mechanism that scales well across cores and, using its broker2broker mechanism, even to connected machines via TCP. Most example fuzzers use this mechanism, and it is the best EventManager if you want to fuzz on more than a single core. In the following, we will describe the inner workings of LLMP.

LLMP has one broker process that can forward messages sent by any client process to all other clients. The broker can also intercept and filter the messages it receives instead of forwarding them. A common use-case for messages filtered by the broker are the status messages sent from each client directly to the broker. The broker uses this information to paint a simple UI with up-to-date information about all clients; the other clients, however, don't need to receive this information.

Speedy Local Messages via Shared Memory

Throughout LibAFL, we use a wrapper around different operating systems' shared maps, called ShMem. Shared maps, here called shared memory for the sake of not colliding with Rust's map() functions, are the backbone of LLMP. Each client, usually a fuzzer trying to share stats and new testcases, maps an outgoing ShMem map. With very few exceptions, only this client writes to this map; therefore, we do not run into race conditions and can live without locks. The broker reads from all clients' ShMem maps. It checks all incoming client maps periodically and then forwards new messages to its outgoing broadcast ShMem, mapped by all connected clients.

To send new messages, a client places a new message at the end of its shared memory and then updates a static field to notify the broker. Once the outgoing map is full, the sender allocates a new ShMem using the respective ShMemProvider. It then sends the information needed to map the newly allocated page in connected processes to the old page, using an end of page (EOP) message. Once the receiver maps the new page, it flags it as safe for unmapping from the sending process (to avoid race conditions if we have more than a single EOP in a short time), and then continues to read from the new ShMem.

The schema for client's maps to the broker is as follows:

[client0]        [client1]    ...    [clientN]
  |                  |                 /
[client0_out] [client1_out] ... [clientN_out]
  |                 /                /
  |________________/                /
  |________________________________/
 \|/
[broker]

The broker loops over all incoming maps and checks for new messages. On std builds, the broker will sleep a few milliseconds after a loop, since we do not need the messages to arrive instantly. After the broker receives a new message from clientN (clientN_out->current_id != last_message->message_id), the broker copies the message content to its own broadcast shared memory.

The clients periodically, for example after finishing n mutations, check for new incoming messages by checking if (current_broadcast_map->current_id != last_message->message_id). While the broker uses the same EOP mechanism to map new ShMems for its outgoing map, it never unmaps old pages. This additional memory overhead serves a good purpose: by keeping all broadcast pages around, we make sure that new clients can join a fuzzing campaign at a later point in time. They just need to re-read all broadcast messages from start to finish.
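
In Rust-like pseudocode, the client-side polling described above boils down to something like this (a simplified sketch, not the actual LLMP implementation; Message and BroadcastPage are stand-in types):

// Simplified sketch of the client-side check for new broadcast messages.
struct Message { id: u64 }
struct BroadcastPage { current_id: u64, messages: Vec<Message> }

fn poll_new_messages<'a>(page: &'a BroadcastPage, last_seen_id: &mut u64) -> Vec<&'a Message> {
    // Only do work if the broker has published something since our last check.
    if page.current_id == *last_seen_id {
        return Vec::new();
    }
    let new: Vec<&Message> = page.messages.iter().filter(|m| m.id > *last_seen_id).collect();
    *last_seen_id = page.current_id;
    new
}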

So the outgoing messages flow like this over the outgoing broadcast Shmem:

[broker]
  |
[current_broadcast_shmem]
  |
  |___________________________________
  |_________________                  \
  |                 \                  \
  |                  |                  |
 \|/                \|/                \|/
[client0]        [client1]    ...    [clientN]

To use LLMP in LibAFL, you usually want to use an LlmpEventManager or its restarting variant. They are the default if using LibAFL's Launcher.

If you want to use LLMP in its raw form, without any LibAFL abstractions, take a look at the llmp_test example in ./libafl/examples. You can run the example using cargo run --example llmp_test with the appropriate modes, as indicated by its help output. First, you will have to create a broker using LlmpBroker::new(). Then, create some LlmpClients in other threads and register them with the main thread using LlmpBroker::register_client. Finally, call LlmpBroker::loop_forever().

B2B: Connecting Fuzzers via TCP

For broker2broker communication, all broadcast messages are additionally forwarded via network sockets. To facilitate this, we spawn an additional client thread in the broker, that reads the broadcast shared memory, just like any other client would. For broker2broker communication, this b2b client listens for TCP connections from other, remote brokers. It keeps a pool of open sockets to other, remote, b2b brokers around at any time. When receiving a new message on the local broker shared memory, the b2b client will forward it to all connected remote brokers via TCP. Additionally, the broker can receive messages from all connected (remote) brokers, and forward them to the local broker over a client ShMem.

As a side note, the TCP listener used for b2b communication is also used for an initial handshake when a new client tries to connect to a broker locally, simply exchanging the initial ShMem descriptions.

Spawning Instances

Multiple fuzzer instances can be spawned in different ways.

Manually, via a TCP port

The straightforward way to do multi-threading is to use the LlmpRestartingEventManager, specifically setup_restarting_mgr_std. It abstracts away all the pesky details about restarting on crashes (for in-memory fuzzers) and about multi-threading. With it, every instance you launch manually tries to connect to a TCP port on the local machine.

If the port is not yet bound, this instance becomes the broker, itself binding to the port to await new clients.

If the port is already bound, the EventManager will try to connect to it. The instance becomes a client and can now communicate with all other nodes.

Launching nodes manually has the benefit that you can have multiple nodes with different configurations, such as clients fuzzing with and without ASAN.

While it's called "restarting" manager, it uses fork on Unix operating systems as optimization and only actually restarts from scratch on Windows.

Launcher

The Launcher is the lazy way to do multiprocessing. You can use the Launcher builder to create a fuzzer that spawns multiple nodes, all using restarting event managers. An example may look like this:

    Launcher::builder()
        .configuration(EventConfig::from_name(&configuration))
        .shmem_provider(shmem_provider)
        .monitor(mon)
        .run_client(&mut run_client)
        .cores(cores)
        .broker_port(broker_port)
        .stdout_file(stdout_file)
        .remote_broker_addr(broker_addr)
        .build()
        .launch()

This first starts a broker, then spawns n clients, according to the value passed to cores. The value is a string indicating the cores to bind to, for example, 0,2,5 or 0-3. For each client, run_client will be called. On Windows, the Launcher will restart each client, while on Unix, it will use fork.

Other ways

The LlmpEventManager family is the easiest way to spawn instances, but for obscure targets, you may need to come up with other solutions. LLMP is even, in theory, no_std compatible, and completely different EventManagers can also be used for message passing. If you are in this situation, please either read through the current implementations and/or reach out to us.

Configurations

Configurations for individual fuzzer nodes are relevant for multi-node fuzzing. This chapter describes how to run nodes with different configurations in one fuzzing cluster. This allows, for example, a node compiled with ASAN to know that it needs to rerun new testcases coming from a node without ASAN, while nodes with the same binary/configuration do not.

Under Construction!

This section is under construction. Please check back later (or open a PR)

Tutorial

In this chapter, we will build a custom fuzzer using the Lain mutator in Rust.

This tutorial will introduce you to writing extensions to LibAFL, like Feedbacks and Testcase metadata.

Introduction

Under Construction!

This section is under construction. Please check back later (or open a PR)

Advanced Features

In addition to core building blocks for fuzzers, LibAFL also has features for more advanced/niche fuzzing techniques. The following sections are dedicated to these features.

Concolic Tracing and Hybrid Fuzzing

LibAFL has support for concolic tracing based on the SymCC instrumenting compiler.

For those uninitiated, the following attempts to describe concolic tracing from the ground up using an example. Then, we'll go through the relationship of SymCC and LibAFL concolic tracing. Finally, we'll walk through building a basic hybrid fuzzer using LibAFL.

Concolic Tracing by Example

Suppose you want to fuzz the following program:


fn target(input: &[u8]) -> i32 {
    match &input {
        // fictitious crashing input
        &[1, 3, 3, 7] => 1337,
        // standard error handling code
        &[] => -1,
        // representative of normal execution
        _ => 0
    }
}

A simple coverage-maximizing fuzzer that generates new inputs somewhat randomly will have a hard time finding an input that triggers the fictitious crashing input. Many techniques have been proposed to make fuzzing less random and more directly attempt to mutate the input to flip specific branches, such as the ones involved in crashing the above program.

Concolic tracing allows us to construct an input that exercises a new path in the program (such as the crashing one in the example) analytically instead of stochastically (i.e. by guessing). In principle, concolic tracing works by observing all executed instructions in an execution of the program that depend on the input. To understand what this entails, we'll run an example with the above program.

First, we'll simplify the program to simple if-then-else-statements:


fn target(input: &[u8]) -> i32 {
    if input.len() == 4 {
        if input[0] == 1 {
            if input[1] == 3 {
                if input[2] == 3 {
                    if input[3] == 7 {
                        return 1337;
                    } else {
                        return 0;
                    }
                } else {
                    return 0;
                }
            } else {
                return 0;
            }
        } else {
            return 0;
        }
    } else {
        if input.len() == 0 {
            return -1;
        } else {
            return 0;
        }
    }
}

Next, we'll trace the program on the input []. The trace would look like this:

Branch { // if input.len() == 4
    condition: Equals { 
        left: Variable { name: "input_len" }, 
        right: Integer { value: 4 } 
    }, 
    taken: false // This condition turned out to be false...
}
Branch { // if input.len() == 0
    condition: Equals { 
        left: Variable { name: "input_len" }, 
        right: Integer { value: 0 } 
    }, 
    taken: true // This condition turned out to be true!
}

Using this trace, we can easily deduce that we can force the program to take a different path by having an input of length 4 or having an input with non-zero length. We do this by negating each branch condition and analytically solving the resulting 'expression'. In fact, we can create these expressions for any computation and give them to an SMT-Solver that will generate an input that satisfies the expression (as long as such an input exists).
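
For example, to flip the first branch of the trace above, we negate its condition, that is, we ask the solver for an input where the length check is taken:

Branch { // if input.len() == 4, this time we require it to be taken
    condition: Equals {
        left: Variable { name: "input_len" },
        right: Integer { value: 4 }
    },
    taken: true // a solver can satisfy this with any input of length 4, e.g. [0, 0, 0, 0]
}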

In hybrid fuzzing, we combine this tracing + solving approach with more traditional fuzzing techniques.

Concolic Tracing in LibAFL, SymCC and SymQEMU

The concolic tracing support in LibAFL is implemented using SymCC. SymCC is a compiler plugin for clang that can be used as a drop-in replacement for a normal C or C++ compiler. SymCC instruments the compiled code with callbacks into a runtime that can be supplied by the user. These callbacks allow the runtime to construct a trace similar to the one in the previous example.

SymCC and its Runtimes

SymCC ships with 2 runtimes:

  • a 'simple' runtime that attempts to solve any branches it comes across using Z3 and
  • a QSym-based runtime, which does a bit more filtering on the expressions and also solves using Z3.

The integration with LibAFL, however, requires you to BYORT (bring your own runtime) using the symcc_runtime crate. This crate allows you to easily build a custom runtime out of the built-in building blocks, or to create entirely new runtimes with full flexibility. Check out the symcc_runtime docs for more information on how to build your own runtime.

SymQEMU

SymQEMU is a sibling project to SymCC. Instead of instrumenting the target at compile-time, it inserts instrumentation via dynamic binary translation, building on top of the QEMU emulation stack. This means that using SymQEMU, any (x86) binary can be traced without the need to build in instrumentation ahead of time. The symcc_runtime crate supports this use case and runtimes built with symcc_runtime also work with SymQEMU.

Hybrid Fuzzing in LibAFL

The LibAFL repository contains an example hybrid fuzzer.

There are three main steps involved with building a hybrid fuzzer using LibAFL:

  1. Building a runtime,
  2. choosing an instrumentation method and
  3. building the fuzzer.

Note that the order of these steps is important. For example, we need to have the runtime ready before we can instrument the target with SymCC.

Building a Runtime

Building a custom runtime can be done easily using the symcc_runtime crate. Note that a custom runtime is a separate shared object file, which means that we need a separate crate for our runtime. Check out the example hybrid fuzzer's runtime and the symcc_runtime docs for inspiration.

Instrumentation

There are two main instrumentation methods to make use of concolic tracing in LibAFL:

  • Using a compile-time instrumented target with SymCC. This only works when the source is available for the target and the target is reasonably easy to build using the SymCC compiler wrapper.
  • Using SymQEMU to dynamically instrument the target at runtime. This avoids the need for a separately instrumented target and does not require source code. It should be noted, however, that the 'quality' of the generated expressions can be significantly worse, and SymQEMU generally produces significantly more, and significantly more convoluted, expressions than SymCC. Therefore, it is recommended to use SymCC over SymQEMU when possible.

Using SymCC

The target needs to be instrumented ahead of fuzzing using SymCC. How exactly this is done does not matter. However, the SymCC compiler needs to be made aware of the location of the runtime that it should instrument against. This is done by setting the SYMCC_RUNTIME_DIR environment variable to the directory which contains the runtime (typically the target/(debug|release) folder of your runtime crate).
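
For illustration, a manual invocation could look like the following, assuming SymCC's symcc compiler wrapper is on your PATH and your runtime crate was built in release mode (the paths are placeholders):

$ export SYMCC_RUNTIME_DIR=/path/to/my_runtime/target/release
$ symcc -o target_instrumented target.c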

The example hybrid fuzzer instruments the target in its build.rs build script. It does this by cloning and building a copy of SymCC and then using this version to instrument the target. The symcc_libafl crate contains helper functions for cloning and building SymCC.

Make sure you satisfy the build requirements of SymCC before attempting to build it.

Using SymQEMU

Build SymQEMU according to its build instructions. By default, SymQEMU looks for the runtime in a sibling directory. Since we don't have a runtime there, we need to let it know where the runtime is by setting the --symcc-build argument of the configure script to the path of your runtime.

Building the Fuzzer

No matter the instrumentation method, the interface between the fuzzer and the instrumented target should now be consistent. The only difference between using SymCC and SymQEMU should be the binary that represents the target: in the case of SymCC, it will be the binary that was built with instrumentation, while with SymQEMU it will be the emulator binary (e.g. x86_64-linux-user/symqemu-x86_64), followed by your uninstrumented target binary and arguments.

You can use the CommandExecutor to execute your target (example). When configuring the command, make sure you pass the input file path in the SYMCC_INPUT_FILE environment variable, if your target reads its input from a file (instead of standard input).

Serialization and Solving

While it is perfectly possible to build a custom runtime that also performs the solving step of hybrid fuzzing in the context of the target process, the intended use of the LibAFL concolic tracing support is to serialize the (filtered and pre-processed) branch conditions using the TracingRuntime. This serialized representation can be deserialized in the fuzzer process for solving using a ConcolicObserver wrapped in a ConcolicTracingStage, which will attach a ConcolicMetadata to every TestCase.

The ConcolicMetadata can be used to replay the concolic trace and solve it using an SMT solver. Most use-cases involving concolic tracing, however, will need to define some policy around which branches they want to solve. The SimpleConcolicMutationalStage can be used for testing purposes. It will attempt to solve all branches, like the original simple backend from SymCC, using Z3.

Example

The example fuzzer shows how to use the ConcolicTracingStage together with the SimpleConcolicMutationalStage to build a basic hybrid fuzzer.

Using LibAFL in no_std environments

It is possible to use LibAFL in no_std environments, e.g. on custom platforms like microcontrollers, kernels, hypervisors, and more.

You can simply add LibAFL to your Cargo.toml file:

libafl = { path = "path/to/libafl/", default-features = false}

Then build your project e.g. for aarch64-unknown-none using

cargo build --no-default-features --target aarch64-unknown-none

Use custom timing

The minimum amount of input LibAFL needs for no_std is a monotonically increasing timestamp. For this, anywhere in your project you need to implement the external_current_millis function, which returns the current time in milliseconds.

// Assume this is a clock source from a custom stdlib, which you want to use, and which returns the current time in seconds.

int my_real_seconds(void)
{
    return *CLOCK;
}

Here we use it in Rust. external_current_millis is then called from LibAFL. Note that it needs to be no_mangle in order to get picked up by LibAFL at link time.

#[no_mangle]
pub extern "C" fn external_current_millis() -> u64 {
    unsafe { my_real_seconds()*1000 }
}