Concolic Tracing and Hybrid Fuzzing
LibAFL has support for concolic tracing based on the SymCC instrumenting compiler.
For those uninitiated, the following attempts to describe concolic tracing from the ground up using an example. Then, we'll go through the relationship of SymCC and LibAFL concolic tracing. Finally, we'll walk through building a basic hybrid fuzzer using LibAFL.
Concolic Tracing by Example
Suppose you want to fuzz the following program:
#![allow(unused)] fn main() { fn target(input: &[u8]) -> i32 { match &input { // fictitious crashing input &[1, 3, 3, 7] => 1337, // standard error handling code &[] => -1, // representative of normal execution _ => 0 } } }
A simple coverage-maximizing fuzzer that generates new inputs somewhat randomly will have a hard time finding an input that triggers the fictitious crashing input. Many techniques have been proposed to make fuzzing less random and more directly attempt to mutate the input to flip specific branches, such as the ones involved in crashing the above program.
Concolic tracing allows us to construct an input that exercises a new path in the program (such as the crashing one in the example) analytically instead of stochastically (ie. guessing). In principle, concolic tracing works by observing all executed instructions in an execution of the program that depend on the input. To understand what this entails, we'll run an example with the above program.
First, we'll simplify the program to simple if-then-else-statements:
#![allow(unused)] fn main() { fn target(input: &[u8]) -> i32 { if input.len() == 4 { if input[0] == 1 { if input[1] == 3 { if input[2] == 3 { if input[3] == 7 { return 1337; } else { return 0; } } else { return 0; } } else { return 0; } } else { return 0; } } else { if input.len() == 0 { return -1; } else { return 0; } } } }
Next, we'll trace the program on the input []
.
The trace would look like this:
Branch { // if input.len() == 4
condition: Equals {
left: Variable { name: "input_len" },
right: Integer { value: 4 }
},
taken: false // This condition turned out to be false...
}
Branch { // if input.len() == 0
condition: Equals {
left: Variable { name: "input_len" },
right: Integer { value: 0 }
},
taken: true // This condition turned out to be true!
}
Using this trace, we can easily deduce that we can force the program to take a different path by having an input of length 4 or having an input with non-zero length. We do this by negating each branch condition and analytically solving the resulting 'expression'. In fact, we can create these expressions for any computation and give them to an SMT-Solver that will generate an input that satisfies the expression (as long as such an input exists).
In hybrid fuzzing, we combine this tracing + solving approach with more traditional fuzzing techniques.
Concolic Tracing in LibAFL, SymCC and SymQEMU
The concolic tracing support in LibAFL is implemented using SymCC. SymCC is a compiler plugin for clang that can be used as a drop-in replacement for a normal C or C++ compiler. SymCC will instrument the compiled code with callbacks into a runtime that can be supplied by the user. These callbacks allow the runtime to construct a trace that similar to the previous example.
SymCC and its Runtimes
SymCC ships with 2 runtimes:
- a 'simple' runtime that attempts to solve any branches it comes across using Z3 and
- a QSym-based runtime, which does a bit more filtering on the expressions and also solves using Z3.
The integration with LibAFL, however, requires you to BYORT (bring your own runtime) using the symcc_runtime
crate.
This crate allows you to easily build a custom runtime out of the built-in building blocks or create entirely new runtimes with full flexibility.
Checkout out the symcc_runtime
docs for more information on how to build your own runtime.
SymQEMU
SymQEMU is a sibling project to SymCC.
Instead of instrumenting the target at compile-time, it inserts instrumentation via dynamic binary translation, building on top of the QEMU
emulation stack.
This means that using SymQEMU, any (x86) binary can be traced without the need to build in instrumentation ahead of time.
The symcc_runtime
crate supports this use case and runtimes built with symcc_runtime
also work with SymQEMU.
Hybrid Fuzzing in LibAFL
The LibAFL repository contains an example hybrid fuzzer.
There are three main steps involved with building a hybrid fuzzer using LibAFL:
- Building a runtime,
- choosing an instrumentation method and
- building the fuzzer.
Note that the order of these steps is important. For example, we need to have runtime ready before we can do instrumentation with SymCC.
Building a Runtime
Building a custom runtime can be done easily using the symcc_runtime
crate.
Note, that a custom runtime is a separate shared object file, which means that we need a separate crate for our runtime.
Check out the example hybrid fuzzer's runtime and the symcc_runtime
docs for inspiration.
Instrumentation
There are two main instrumentation methods to make use of concolic tracing in LibAFL:
- Using an compile-time instrumented target with SymCC. This only works when the source is available for the target and the target is reasonably easy to build using the SymCC compiler wrapper.
- Using SymQEMU to dynamically instrument the target at runtime. This avoids a separate instrumented target with concolic tracing instrumentation and does not require source code. It should be noted, however, that the 'quality' of the generated expressions can be significantly worse and SymQEMU generally produces significantly more and significantly more convoluted expressions than SymCC. Therefore, it is recommended to use SymCC over SymQEMU when possible.
Using SymCC
The target needs to be instrumented ahead of fuzzing using SymCC.
How exactly this is done does not matter.
However, the SymCC compiler needs to be made aware of the location of the runtime that it should instrument against.
This is done by setting the SYMCC_RUNTIME_DIR
environment variable to the directory which contains the runtime (typically the target/(debug|release)
folder of your runtime crate).
The example hybrid fuzzer instruments the target in its build.rs
build script.
It does this by cloning and building a copy of SymCC and then using this version to instrument the target.
The symcc_libafl
crate contains helper functions for cloning and building SymCC.
Make sure you satisfy the build requirements of SymCC before attempting to build it.
Using SymQEMU
Build SymQEMU according to its build instructions.
By default, SymQEMU looks for the runtime in a sibling directory.
Since we don't have a runtime there, we need to let it know the path to your runtime by setting --symcc-build
argument of the configure
script to the path of your runtime.
Building the Fuzzer
No matter the instrumentation method, the interface between the fuzzer and the instrumented target should now be consistent.
The only difference between using SymCC and SymQEMU should be the binary that represents the target:
In the case of SymCC it will be the binary that was build with instrumentation and with SymQEMU it will be the emulator binary (eg. x86_64-linux-user/symqemu-x86_64
), followed by your uninstrumented target binary and arguments.
You can use the CommandExecutor
to execute your target (example).
When configuring the command, make sure you pass the SYMCC_INPUT_FILE
environment variable the input file path, if your target reads input from a file (instead of standard input).
Serialization and Solving
While it is perfectly possible to build a custom runtime that also performs the solving step of hybrid fuzzing in the context of the target process, the intended use of the LibAFL concolic tracing support is to serialize the (filtered and pre-processed) branch conditions using the TracingRuntime
.
This serialized representation can be deserialized in the fuzzer process for solving using a ConcolicObserver
wrapped in a ConcolicTracingStage
, which will attach a ConcolicMetadata
to every TestCase
.
The ConcolicMetadata
can be used to replay the concolic trace and solved using an SMT-Solver.
Most use-cases involving concolic tracing, however, will need to define some policy around which branches they want to solve.
The SimpleConcolicMutationalStage
can be used for testing purposes.
It will attempt to solve all branches, like the original simple backend from SymCC, using Z3.
Example
The example fuzzer shows how to use the ConcolicTracingStage
together with the SimpleConcolicMutationalStage
to build a basic hybrid fuzzer.