The examples so far have used angr's simulated program states (SimState objects) in the barest possible way, to demonstrate basic concepts about angr's operation. Here, you'll learn about the structure of a state object and how to interact with it in a variety of useful ways.
state.regs provides read and write access to the registers through attributes named after each register. state.mem provides typed read and write access to memory: index notation specifies the address, and a following attribute access specifies the type you would like to interpret the memory as.
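The index-then-attribute pattern can be roughly illustrated with a toy model in plain Python. It is not angr's implementation. The classes ToyMemory and TypedView, the small type table, and the little-endian byte order are all assumptions made for this sketch. angr's real interface also returns richer objects than the plain integers used here.

```python
import struct


class ToyMemory:
    """Hypothetical flat byte store with a state.mem-like typed front end."""

    # Hypothetical type table: type name -> little-endian struct format code.
    TYPES = {"uint8_t": "<B", "int32_t": "<i", "uint32_t": "<I", "uint64_t": "<Q"}

    def __init__(self, size=0x100):
        self.data = bytearray(size)

    def __getitem__(self, addr):
        # Index access picks the address...
        return TypedView(self, addr)


class TypedView:
    """...and attribute access picks the type used to read or write it."""

    def __init__(self, mem, addr):
        # Bypass our own __setattr__, which is reserved for typed writes.
        object.__setattr__(self, "_mem", mem)
        object.__setattr__(self, "_addr", addr)

    def _fmt(self, type_name):
        try:
            return ToyMemory.TYPES[type_name]
        except KeyError:
            raise AttributeError(type_name) from None

    def __getattr__(self, type_name):
        return struct.unpack_from(self._fmt(type_name), self._mem.data, self._addr)[0]

    def __setattr__(self, type_name, value):
        struct.pack_into(self._fmt(type_name), self._mem.data, self._addr, value)


mem = ToyMemory()
mem[0x10].uint32_t = 0xDEADBEEF   # typed write at address 0x10
print(hex(mem[0x10].uint32_t))    # the same bytes read back as uint32_t
print(hex(mem[0x10].uint8_t))     # the same address viewed as a single byte
print(mem[0x10].int32_t)          # the same bytes as a signed 32-bit value
```

The same bytes read back differently depending only on the attribute used, which is the point of separating the address (index) from the type (attribute).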
A simple way to see how symbolic execution works is state.step(). This method performs one step of symbolic execution and returns an object called SimSuccessors. Unlike normal emulation, symbolic execution can produce several successor states, and these can be classified in a number of ways. For now, what we care about is the .successors property of this object. It is a list containing all the "normal" successors of a given step.
Why a list rather than a single state? Suppose a line of code like if (x > 4) is reached while x is a symbolic bitvector. Somewhere in the depths of angr, the comparison x > 4 is performed. The result is the symbolic expression <Bool x_32_1 > 4>, which is neither true nor false.
angr therefore takes both branches, generating two separate successor states. In the first state, we add x > 4 as a constraint, and in the second state, we add !(x > 4) as a constraint. Whenever we perform a constraint solve using either of these successor states, the conditions on the state ensure that any solutions we get are valid inputs. Those inputs will cause execution to follow the same path that the given state has followed.
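The forking behavior can be mimicked with a toy model in plain Python. This is not how angr or z3 represent constraints. Here a constraint is just a Python predicate over a single 8-bit input x, and the "solver" is brute force over all 256 values. The names ToyState and step_branch are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Constraint = Callable[[int], bool]  # predicate over the 8-bit symbolic input x


@dataclass
class ToyState:
    constraints: List[Constraint] = field(default_factory=list)

    def solutions(self) -> List[int]:
        # Brute-force stand-in for a constraint solve over every 8-bit value.
        return [x for x in range(256) if all(c(x) for c in self.constraints)]


def step_branch(state: ToyState, cond: Constraint) -> List[ToyState]:
    """Fork on a symbolic condition: one successor per branch outcome."""
    taken = ToyState(state.constraints + [cond])
    not_taken = ToyState(state.constraints + [lambda x: not cond(x)])
    return [taken, not_taken]


entry = ToyState()
x_gt_4, x_le_4 = step_branch(entry, lambda x: x > 4)   # if (x > 4)
print(x_le_4.solutions())                              # inputs reaching the false branch
inner, _ = step_branch(x_gt_4, lambda x: x < 10)       # nested if (x < 10) on the true path
print(inner.solutions())
```

Each successor carries its own copy of the path's constraints. Solving either one therefore only yields inputs that drive execution down that path.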
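When a branch depends on the result of strcmp, as in a password check, the constraints on the successor states are hard to read directly. strcmp is a tricky function to emulate symbolically, and the resulting constraints are very complicated.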
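To recover a concrete input instead, use state.posix.stdin.load(0, state.posix.stdin.size). It retrieves a bitvector representing all the content read from stdin so far. A constraint solve on that bitvector then yields an input consistent with the state's path.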
To go down the state1 path, you must have given the backdoor string "SOSNEAKY" as the password. To go down the state2 path, you must have given something besides "SOSNEAKY". z3 has helpfully provided one of the billions of strings fitting this criterion.
So far, states have been created with project.factory.entry_state(). This is just one of several state constructors available on the project factory:
.blank_state() constructs a "blank slate" state, with most of its data left uninitialized. When accessing uninitialized data, an unconstrained symbolic value will be returned.
.entry_state() constructs a state ready to execute at the main binary's entry point.
.full_init_state() constructs a state that is ready to execute through any initializers that need to run before the main binary's entry point, for example shared library constructors or preinitializers. When it is finished with these, it jumps to the entry point.
.call_state() constructs a state ready to execute a given function.
These constructors can take an addr argument to specify the exact address at which to start.
If the program takes command-line arguments or an environment, you can pass a list of arguments through args, along with a dictionary of environment variables, to constructors such as full_init_state. The values in these structures can be strings or bitvectors. They will be serialized into the state as the arguments and environment of the simulated execution. The default args is an empty list, so if the program you're analyzing expects argv to contain at least one entry, you should always provide it!
If you'd like argc to be symbolic, you can pass a symbolic bitvector for it to constructors such as full_init_state. Be careful, though: if you do this, you should also add a constraint to the resulting state so that your value for argc cannot be larger than the number of arguments you passed through args.
To use the call state, call .call_state(addr, arg1, arg2, ...). Here addr is the address of the function you want to call, and argN is the Nth argument to that function, given as a Python integer, string, or array, or as a bitvector. If you want memory to be allocated so that you actually pass in a pointer to an object, wrap the object in a PointerWrapper, e.g. angr.PointerWrapper("point to me!"). The results of this API can be a little unpredictable, but we're working on it.
To specify the calling convention used for a function with call_state, you can pass a SimCC instance as the cc argument. We try to pick a sane default, but for special cases you will need to help angr out.
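A small brute-force sketch shows why the argc bound described above matters. It treats argc as an 8-bit value and enumerates which values remain feasible with and without the bound. The argument list and the helper feasible_argc are hypothetical. A real analysis would express the bound as a solver constraint on the state rather than enumerate values.

```python
from typing import Callable, List

args = ["./prog", "input.txt"]  # hypothetical argument list passed to the constructor


def feasible_argc(constraints: List[Callable[[int], bool]]) -> List[int]:
    """Every 8-bit argc value that satisfies all the given constraints."""
    return [v for v in range(256) if all(c(v) for c in constraints)]


unbounded = feasible_argc([])
bounded = feasible_argc([lambda v: v <= len(args)])

# Without the bound, argc may claim far more arguments than argv actually holds.
print(max(unbounded), max(bounded))
```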
The state.mem interface is convenient for loading typed data from memory, but it is cumbersome for raw loads and stores to and from ranges of memory. It turns out that state.mem is just a layer of logic for correctly accessing the underlying memory storage, state.memory. That storage is a flat address space filled with bitvector data. You can use state.memory directly for such raw accesses.
By default, state.memory does not byteswap anything (data is treated as big-endian), since its primary purpose is to load and store swaths of data with no attached semantics. However, if you want to perform a byteswap on the loaded or stored data, you can pass the keyword argument endness. If you specify little-endian, the byteswap will happen. The endness should be one of the members of the Endness enum in the archinfo package, which holds declarative data about CPU architectures for angr. Additionally, the endness of the program being analyzed can be found as arch.memory_endness.
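The byteswap behavior can be sketched with a toy byte store in plain Python. The class ToyFlatMemory and its "BE"/"LE" strings are hypothetical stand-ins; in angr the endness comes from archinfo's Endness enum. Unwritten bytes read as zero here, which a symbolic memory would not assume.

```python
class ToyFlatMemory:
    """Hypothetical flat address space of raw bytes, big-endian by default."""

    def __init__(self):
        self.cells = {}  # sparse map: address -> byte value

    @staticmethod
    def _order(endness):
        # "BE" keeps byte order as written; "LE" byteswaps the value.
        orders = {"BE": "big", "LE": "little"}
        if endness not in orders:
            raise ValueError(f"unknown endness {endness!r}")
        return orders[endness]

    def store(self, addr, value, size, endness="BE"):
        for i, b in enumerate(value.to_bytes(size, self._order(endness))):
            self.cells[addr + i] = b

    def load(self, addr, size, endness="BE"):
        # size is in bytes; unwritten cells read as 0 in this toy.
        raw = bytes(self.cells.get(addr + i, 0) for i in range(size))
        return int.from_bytes(raw, self._order(endness))


mem = ToyFlatMemory()
mem.store(0x4000, 0x0123456789ABCDEF0123456789ABCDEF, 16)
print(hex(mem.load(0x4004, 6)))                 # raw bytes, no byteswap
print(hex(mem.load(0x4000, 4, endness="LE")))   # byteswapped on load
```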
There is also a low-level interface for register access, state.registers, that uses exactly the same API as state.memory. Explaining its behavior requires a dive into the abstractions angr uses to work seamlessly with multiple architectures. The short version is that it is simply a register file, with the mapping between registers and offsets defined in archinfo.
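A few lines of plain Python can model the idea that registers are just a register file addressed by offset. In this sketch, the offsets and sizes in REG_OFFSETS, the little-endian storage, and the class names are all hypothetical. In angr the real mapping is defined per architecture in archinfo.

```python
# Hypothetical layout: register name -> (byte offset, size in bytes).
REG_OFFSETS = {"rax": (0, 8), "rbx": (8, 8), "rsp": (16, 8)}


class ToyRegisterFile:
    """Offset-addressed storage, analogous to a state.registers-style interface."""

    def __init__(self, size=24):
        self.storage = bytearray(size)

    def store(self, offset, value, size):
        value &= (1 << (8 * size)) - 1  # wrap around like a fixed-width register
        self.storage[offset:offset + size] = value.to_bytes(size, "little")

    def load(self, offset, size):
        return int.from_bytes(self.storage[offset:offset + size], "little")


class ToyRegs:
    """Name-based front end, analogous to state.regs, translating names to offsets."""

    def __init__(self, regfile):
        object.__setattr__(self, "_rf", regfile)

    @staticmethod
    def _slot(name):
        if name not in REG_OFFSETS:
            raise AttributeError(name)
        return REG_OFFSETS[name]

    def __getattr__(self, name):
        return self._rf.load(*self._slot(name))

    def __setattr__(self, name, value):
        offset, size = self._slot(name)
        self._rf.store(offset, value, size)


rf = ToyRegisterFile()
regs = ToyRegs(rf)
regs.rsp = 0x7FFFFFF0
regs.rax = regs.rsp - 8      # named access...
print(hex(rf.load(0, 8)))    # ...is just a load at rax's offset
```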
Each SimState holds a set (state.options) of all its enabled options. Each option (really just a string) controls the behavior of angr's execution engine in some minute way. A listing of the full domain of options, along with the defaults for different state types, can be found in the appendix. You can access an individual option for adding to a state through angr.options. The individual options are named with CAPITAL_LETTERS. There are also common groupings of options that you might want to use bundled together, named with lowercase_letters.
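When creating a state through any constructor, you may pass keyword arguments that adjust the option set, such as remove_options. These should be sets of options that modify the initial option set from the default.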
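Because options are plain strings held in a set, adjusting them is ordinary set arithmetic. The sketch below uses placeholder option names rather than angr's real ones, which are listed in the appendix. Applying additions before removals is an assumption of the sketch.

```python
# Placeholder option names; the real ones live in angr.options.
DEFAULT_OPTIONS = frozenset({"OPTION_A", "OPTION_B", "OPTION_C"})
example_grouping = frozenset({"OPTION_B", "OPTION_C"})  # a lowercase-named bundle


def initial_options(add_options=frozenset(), remove_options=frozenset()):
    """Hypothetical helper: the defaults, plus additions, minus removals."""
    return (set(DEFAULT_OPTIONS) | set(add_options)) - set(remove_options)


opts = initial_options(add_options={"OPTION_D"}, remove_options=example_grouping)
print(sorted(opts))
```

Returning a fresh set keeps the shared defaults untouched, so changing one state's options cannot leak into another's.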
Almost every property of a state discussed so far, such as memory, registers, and the solver, is actually a plugin attached to the state. This design allows for code modularity. It also makes it easy to implement new kinds of data storage for other aspects of an emulated state, or to provide alternate implementations of plugins.
For example, the normal memory plugin simulates a flat memory space, but analyses can choose to enable the "abstract memory" plugin to provide state.memory instead. The abstract memory plugin uses alternate data types for addresses to simulate free-floating memory mappings independent of address. Conversely, plugins can reduce code complexity: state.memory and state.registers are actually two different instances of the same plugin, since the registers are emulated with an address space as well.
state.globals is an extremely simple plugin: it implements the interface of a standard Python dict, allowing you to store arbitrary data on a state.
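The plugin design can be sketched as a state that is little more than a table of named plugins. Everything here is hypothetical: ToyState, AddressSpacePlugin, GlobalsPlugin, register_plugin, and copy. The sketch only mirrors two points from the text. First, memory and registers can be two instances of one address-space plugin. Second, globals behaves like a dict. Copying every plugin when the state is copied is a design choice of this sketch that keeps successor states independent.

```python
from copy import deepcopy


class AddressSpacePlugin:
    """One plugin class, usable for both memory and registers."""

    def __init__(self):
        self.cells = {}

    def store(self, addr, byte_value):
        self.cells[addr] = byte_value

    def load(self, addr):
        return self.cells.get(addr, 0)


class GlobalsPlugin(dict):
    """Like state.globals: a plain dict for arbitrary per-state data."""


class ToyState:
    def __init__(self):
        self.plugins = {}

    def register_plugin(self, name, plugin):
        self.plugins[name] = plugin

    def __getattr__(self, name):
        # Expose each plugin as an attribute, e.g. state.memory.
        try:
            return self.__dict__["plugins"][name]
        except KeyError:
            raise AttributeError(name) from None

    def copy(self):
        # Each copy gets its own plugin instances, so successors don't share data.
        new = ToyState()
        for name, plugin in self.plugins.items():
            new.register_plugin(name, deepcopy(plugin))
        return new


state = ToyState()
state.register_plugin("memory", AddressSpacePlugin())
state.register_plugin("registers", AddressSpacePlugin())
state.register_plugin("globals", GlobalsPlugin())
state.memory.store(0x1000, 0x41)
state.globals["note"] = "parent"

child = state.copy()
child.globals["note"] = "child"
print(state.globals["note"], child.globals["note"])
```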
state.history is a very important plugin storing historical data about the path a state has taken during execution. It is actually a linked list of several history nodes, each representing a single round of execution. You can traverse this list by following state.history.parent.
The history also provides efficient iterators over certain recorded values. In general, the values from a single step are stored as history.recent_NAME, and the iterator over them across all steps is just history.NAME. For example, for addr in state.history.bbl_addrs: print(hex(addr)) will print out a basic block address trace for the binary. state.history.recent_bbl_addrs is the list of basic blocks executed in the most recent step, state.history.parent.recent_bbl_addrs is the list of basic blocks executed in the previous step, and so on. If you ever need to quickly obtain a flat list of these values, you can access .hardcopy, as in state.history.bbl_addrs.hardcopy. Note that the iterators also support index-based access.
history.descriptions is a listing of string descriptions of each round of execution performed on the state.
history.bbl_addrs is a listing of the basic block addresses executed by the state. There may be more than one per round of execution, and not all addresses may correspond to binary code; some may be addresses at which SimProcedures are hooked.
history.jumpkinds is a listing of the disposition of each control flow transition in the state's history, as VEX enum strings.
history.jump_guards is a listing of the conditions guarding each of the branches that the state has encountered.
history.events is a semantic listing of "interesting events" that happened during execution, such as the presence of a symbolic jump condition, the program popping up a message box, or execution terminating with an exit code.
history.actions is usually empty. If you add the angr.options.refs option grouping to the state, it will be populated with a log of all the memory, register, and temporary value accesses performed by the program.
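A toy history illustrates the linked-list structure and the recent_NAME / NAME pairing. HistoryNode and ToyHistoryIter are hypothetical classes, and only bbl_addrs is modeled. Yielding addresses oldest first (trace order) is a choice made for this sketch rather than a statement about angr's internals.

```python
class ToyHistoryIter:
    """Iterates one recent_NAME list across a node and all of its ancestors."""

    def __init__(self, tip, attr):
        self._tip, self._attr = tip, attr

    def __iter__(self):
        for node in self._tip.lineage():
            yield from getattr(node, self._attr)

    @property
    def hardcopy(self):
        # Flatten into a plain list, like the .hardcopy access described earlier.
        return list(self)


class HistoryNode:
    """One round of execution, linked to the node for the previous round."""

    def __init__(self, parent=None, recent_bbl_addrs=()):
        self.parent = parent
        self.recent_bbl_addrs = list(recent_bbl_addrs)

    def lineage(self):
        """Nodes from the oldest ancestor down to this node."""
        nodes, node = [], self
        while node is not None:
            nodes.append(node)
            node = node.parent
        return nodes[::-1]

    @property
    def bbl_addrs(self):
        return ToyHistoryIter(self, "recent_bbl_addrs")


# Hypothetical block addresses from three steps of execution.
root = HistoryNode(None, [0x400000, 0x400010])
mid = HistoryNode(root, [0x400020])
tip = HistoryNode(mid, [0x400030, 0x400040])

for addr in tip.bbl_addrs:
    print(hex(addr))
```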
You can iterate directly over state.callstack to get the active callstack frames, in order from most recent to oldest. If you just want the topmost frame, that is state.callstack itself, which exposes the following attributes:
callstack.func_addr is the address of the function currently being executed.
callstack.call_site_addr is the address of the basic block that called the current function.
callstack.stack_ptr is the value of the stack pointer at the beginning of the current function.
callstack.ret_addr is the location that the current function will return to if it returns.
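A toy callstack shows how the topmost frame can double as the head of a linked list of frames. ToyFrame, push_call, and the caller field are hypothetical names, and the addresses in the usage example are made up. Only the four frame attributes come from the list above.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class ToyFrame:
    func_addr: int        # function currently executing in this frame
    call_site_addr: int   # basic block that made the call
    stack_ptr: int        # stack pointer at function entry
    ret_addr: int         # where the function returns to
    caller: Optional["ToyFrame"] = None  # next-older frame

    def __iter__(self) -> Iterator["ToyFrame"]:
        """Yield frames from this (most recent) one down to the oldest."""
        frame = self
        while frame is not None:
            yield frame
            frame = frame.caller


def push_call(top, func_addr, call_site_addr, stack_ptr, ret_addr):
    """On a call, the new frame becomes the top and links to the old top."""
    return ToyFrame(func_addr, call_site_addr, stack_ptr, ret_addr, caller=top)


# Hypothetical addresses for a main() that calls a helper().
callstack = ToyFrame(0x401000, 0x400500, 0x7FFF0000, 0x400505)
callstack = push_call(callstack, 0x401200, 0x401010, 0x7FFEFFE0, 0x401015)

print(hex(callstack.func_addr))               # the topmost frame's function
print([hex(f.func_addr) for f in callstack])  # most recent to oldest
```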