Gotchas when using angr
This section contains a list of gotchas that users/victims of angr frequently run into.
To make symbolic execution more tractable, angr replaces common library functions with summaries written in Python.
We call these summaries SimProcedures.
SimProcedures allow us to mitigate path explosion that would otherwise be introduced by, for example,
strlen running on a symbolic string.
Unfortunately, our SimProcedures are far from perfect. If angr is displaying unexpected behavior, it might be caused by a buggy/incomplete SimProcedure. There are several things that you can do:
- Disable the SimProcedure (you can exclude specific SimProcedures by passing options to the angr.Project class). This has the drawback of likely leading to a path explosion, unless you are very careful about constraining the input to the function in question. The path explosion can be partially mitigated with other angr capabilities (such as Veritesting).
- Replace the SimProcedure with something written directly to the situation in question. For example, our
scanfimplementation is not complete, but if you just need to support a single, known format string, you can write a hook to do exactly that.
- Fix the SimProcedure.
System calls are also implemented as SimProcedures. Unfortunately, there are system calls that we have not yet implemented in angr. There are several workarounds for an unsupported system call:
- Implement the system call. TODO: document this process
- Hook the callsite of the system call (using
project.hook) to make the required modifications to the state in an ad-hoc way.
- Use the
state.posix.queued_syscall_returnslist to queue syscall return values. If a return value is queued, the system call will not be executed, and the value will be used instead. Furthermore, a function can be queued instead as the "return value", which will result in that function being applied to the state when the system call is triggered.
Symbolic memory model
The default memory model used by angr is inspired by Mayhem.
This memory model supports limited symbolic reads and writes.
If the memory index of a read is symbolic and the range of possible values of this index is too wide, the index is concretized to a single value.
If the memory index of a write is symbolic at all, the index is concretized to a single value.
This is configurable by changing the memory concretization strategies of
SimProcedures, and especially system calls such as
write() might run into a situation where the length of a buffer is symbolic.
In general, this is handled very poorly: in many cases, this length will end up being concretized outright or retroactively concretized in later steps of execution.
Even in cases when it is not, the source or destination file might end up looking a bit "weird".