SimuVEX and Bare-Bones Symbolic Execution
Most analyses require an understanding of what the code is doing (semantic meaning), not just what the code is (syntactic meaning). For this, we developed a module called SimuVEX (https://github.com/angr/simuvex). SimuVEX provides a semantic understanding of what a given piece of VEX code does on a given machine state.
In a nutshell, SimuVEX is a symbolic VEX emulator. Given a machine state and a VEX IR block, SimuVEX provides a resulting machine state (or, in the case of condition jumps, several resulting machine states).
SimuVEX uses a series of engines to emulate the effects that of a given section of code has on an input state.
This mechanism has changed recently, so we have removed much related documentation pending a rewrite.
This information is not critical to the use of angr, since it is abstracted away by
PathGroup, but it provides useful insight into angr's functionality.
TODO: much things
SimEngine.process takes an input state and engine-specific arguments (such as a block of VEX IR for
SimEngineVEX) and returns a SimSuccessors object that contains the successor states, with modifications applied.
Since simuvex supports symbolic execution, there can be multiple output successor states for a single input state.
The successor states are stored in individual lists.
|Attribute||Guard Condition||Instruction Pointer||Description|
||True (can be symbolic, but constrained to True)||Can be symbolic (but 256 solutions or less; see
||A normal, satisfiable successor state to the state processed by the engine. The instruction pointer of this state may be symbolic (i.e., a computed jump based on user input), so the state might actually represent several potential continuations of execution going forward.|
||False (can be symbolic, but constrained to False).||Can be symbolic.||Unsatisfiable successors. These are successors whose guard conditions can only be false (i.e., jumps that cannot be taken, or the default branch of jumps that must be taken).|
||True (can be symbolic, but constrained to True).||Concrete value.||As noted above, states in the
||True (can be symbolic, but constrained to True).||Symbolic (with more than 256 solutions).||During the flattening procedure described above, if it turns out that there are more than 256 possible solutions for the instruction pointer, we assume that the instruction pointer has been overwritten with unconstrained data (i.e., a stack overflow with user data). This assumption is not sound in general. Such states are placed in
||Anything||Can be symbolic.||This is
SimProcedures are, first and foremost, symbolic function summaries: angr handles functions imported into the binary by executing a SimProcedure that symbolically implements the given library function, if one exists. SimProcedures are a generic enough interface to do more than this, though - they can be used to run Python code to mutate a state at any point in execution.
SimProcedures are injected into angr's execution pipeline through an interface called hooking. The full interface is described here, but the most important part is the
Project.hook(address, procedure) method. After running this, whenever execution in this project reaches
address, instead of running the binary code at that address, we run the SimProcedure specified by the
Project.hook can also take a plain python function as an argument, instead of a SimProcedure class. That function will be automatically wrapped by a SimProcedure and executed (with the current SimState) as its argument.
TODO: Programming SimProcedures. Cover all the kinds of control flow, inline calls, etc. If you want to program a SimProcedure now, look at the library of already-written ones.
Like any decent execution engine, SimuVEX supports breakpoints. This is pretty cool! A point is set as follows:
import angr, simuvex b = angr.Project('examples/fauxware/fauxware') # get our state s = b.factory.entry_state() # add a breakpoint. This breakpoint will drop into ipdb right before a memory write happens. s.inspect.b('mem_write') # on the other hand, we can have a breakpoint trigger right *after* a memory write happens. # we can also have a callback function run instead of opening ipdb. def debug_func(state): print "State %s is about to do a memory write!" s.inspect.b('mem_write', when=simuvex.BP_AFTER, action=debug_func) # or, you can have it drop you in an embedded IPython! s.inspect.b('mem_write', when=simuvex.BP_AFTER, action='IPython')
There are many other places to break than a memory write. Here is the list. You can break at BP_BEFORE or BP_AFTER for each of these events.
|Event type||Event meaning|
|mem_read||Memory is being read.|
|mem_write||Memory is being written.|
|reg_read||A register is being read.|
|reg_write||A register is being written.|
|tmp_read||A temp is being read.|
|tmp_write||A temp is being written.|
|expr||An expression is being created (i.e., a result of an arithmetic operation or a constant in the IR).|
|statement||An IR statement is being translated.|
|instruction||A new (native) instruction is being translated.|
|irsb||A new basic block is being translated.|
|constraints||New constraints are being added to the state.|
|exit||A successor is being generated from execution.|
|symbolic_variable||A new symbolic variable is being created.|
|call||A call instruction is hit.|
|address_concretization||A symbolic memory access is being resolved.|
These events expose different attributes:
|Event type||Attribute name||Attribute availability||Attribute meaning|
|mem_read||mem_read_address||BP_BEFORE or BP_AFTER||The address at which memory is being read.|
|mem_read||mem_read_length||BP_BEFORE or BP_AFTER||The length of the memory read.|
|mem_read||mem_read_expr||BP_AFTER||The expression at that address.|
|mem_write||mem_write_address||BP_BEFORE or BP_AFTER||The address at which memory is being written.|
|mem_write||mem_write_length||BP_BEFORE or BP_AFTER||The length of the memory write.|
|mem_write||mem_write_expr||BP_BEFORE or BP_AFTER||The expression that is being written.|
|reg_read||reg_read_offset||BP_BEFORE or BP_AFTER||The offset of the register being read.|
|reg_read||reg_read_length||BP_BEFORE or BP_AFTER||The length of the register read.|
|reg_read||reg_read_expr||BP_AFTER||The expression in the register.|
|reg_write||reg_write_offset||BP_BEFORE or BP_AFTER||The offset of the register being written.|
|reg_write||reg_write_length||BP_BEFORE or BP_AFTER||The length of the register write.|
|reg_write||reg_write_expr||BP_BEFORE or BP_AFTER||The expression that is being written.|
|tmp_read||tmp_read_num||BP_BEFORE or BP_AFTER||The number of the temp being read.|
|tmp_read||tmp_read_expr||BP_AFTER||The expression of the temp.|
|tmp_write||tmp_write_num||BP_BEFORE or BP_AFTER||The number of the temp written.|
|tmp_write||tmp_write_expr||BP_AFTER||The expression written to the temp.|
|expr||expr||BP_AFTER||The value of the expression.|
|statement||statement||BP_BEFORE or BP_AFTER||The index of the IR statement (in the IR basic block).|
|instruction||instruction||BP_BEFORE or BP_AFTER||The address of the native instruction.|
|irsb||address||BP_BEFORE or BP_AFTER||The address of the basic block.|
|constraints||added_constraints||BP_BEFORE or BP_AFTER||The list of constraint expressions being added.|
|call||function_name||BP_BEFORE or BP_AFTER||The name of the function being called.|
|exit||exit_target||BP_BEFORE or BP_AFTER||The expression representing the target of a SimExit.|
|exit||exit_guard||BP_BEFORE or BP_AFTER||The expression representing the guard of a SimExit.|
|exit||jumpkind||BP_BEFORE or BP_AFTER||The expression representing the kind of SimExit.|
|symbolic_variable||symbolic_name||BP_BEFORE or BP_AFTER||The name of the symbolic variable being created. The solver engine might modify this name (by appending a unique ID and length). Check the symbolic_expr for the final symbolic expression.|
|symbolic_variable||symbolic_size||BP_BEFORE or BP_AFTER||The size of the symbolic variable being created.|
|symbolic_variable||symbolic_expr||BP_AFTER||The expression representing the new symbolic variable.|
|address_concretization||address_concretization_strategy||BP_BEFORE or BP_AFTER||The SimConcretizationStrategy being used to resolve the address. This can be modified by the breakpoint handler to change the strategy that will be applied. If your breakpoint handler sets this to None, this strategy will be skipped.|
|address_concretization||address_concretization_action||BP_BEFORE or BP_AFTER||The SimAction object being used to record the memory action.|
|address_concretization||address_concretization_memory||BP_BEFORE or BP_AFTER||The SimMemory object on which the action was taken.|
|address_concretization||address_concretization_expr||BP_BEFORE or BP_AFTER||The AST representing the memory index being resolved. The breakpoint handler can modify this to affect the address being resolved.|
|address_concretization||address_concretization_add_constraints||BP_BEFORE or BP_AFTER||Whether or not constraints should/will be added for this read.|
|address_concretization||address_concretization_result||BP_AFTER||The list of resolved memory addresses (integers). The breakpoint handler can overwrite these to effect a different resolution result.|
These attributes can be accessed as members of
state.inspect during the appropriate breakpoint callback to access the appropriate values.
You can even modify these value to modify further uses of the values!
def track_reads(state): print 'Read', state.inspect.mem_read_expr, 'from', state.inspect.mem_read_address ... s.inspect.b('mem_read', when=simuvex.BP_AFTER, action=track_reads)
Additionally, each of these properties can be used as a keyword argument to
inspect.b to make the breakpoint conditional:
# This will break before a memory write if 0x1000 is a possible value of its target expression s.inspect.b('mem_write', mem_write_address=0x1000) # This will break before a memory write if 0x1000 is the *only* value of its target expression s.inspect.b('mem_write', mem_write_address=0x1000, mem_write_address_unique=True) # This will break after instruction 0x8000, but only 0x1000 is a possible value of the last expression that was read from memory s.inspect.b('instruction', when=simuvex.BP_AFTER, instruction=0x8000, mem_read_expr=0x1000)
Cool stuff! In fact, we can even specify a function as a condition:
# this is a complex condition that could do anything! In this case, it makes sure that RAX is 0x41414141 and # that the basic block starting at 0x8004 was executed sometime in this path's history def cond(state): return state.any_str(state.regs.rax) == 'AAAA' and 0x8004 in state.inspect.backtrace s.inspect.b('mem_write', condition=cond)
That is some cool stuff!
Symbolic memory indexing
SimuVEX supports symbolic memory addressing, meaning that offsets into memory may be symbolic. Our implementation of this is inspired by "Mayhem". Specifically, this means that angr concretizes symbolic addresses when they are used as the target of a write. This causes some surprises, as users tend to expect symbolic writes to be treated purely symbolically, or "as symbolically" as we treat symbolic reads, but that is not the default behavior. However, like most things in angr, this is configurable.
The address resolution behavior is governed by concretization strategies, which are subclasses of
Concretization strategies for reads are set in
state.memory.read_strategies and for writes in
These strategies are called, in order, until one of them is able to resolve addresses for the symbolic index.
By setting your own concretization strategies (or through the use of SimInspect
address_concretization breakpoints, described above), you can change the way SimuVEX resolves symbolic addresses.
For example, angr's default concretization strategies for writes are:
- A conditional concretization strategy that allows symbolic writes (with a maximum range of 128 possible solutions) for any indices that are annotated with
- A concretization strategy that simply selects the maximum possible solution of the symbolic index.
To enable symbolic writes for all indices, you can either add the
SYMBOLIC_WRITE_ADDRESSES state option at state creation time or manually insert a
simuvex.concretization_strategies.SimConcretizationStrategyRange object into
The strategy object takes a single argument, which is the maximum range of possible solutions that it allows before giving up and moving on to the next (presumably non-symbolic) strategy.
SimuVEX is extremely customizable through the use of state options, a set of constants stored in
These options are documented in the source code.