Symbolic Expressions and Constraint Solving

angr's power comes not from it being an emulator, but from being able to execute with what we call symbolic variables. Instead of saying that a variable has a concrete numerical value, we can say that it holds a symbol, effectively just a name. Then, performing arithmetic operations with that variable will yield a tree of operations (termed an abstract syntax tree or AST, from compiler theory). ASTs can be translated into constraints for an SMT solver, like z3, in order to ask questions like "given the output of this sequence of operations, what must the input have been?" Here, you'll learn how to use angr to answer this.

Working with Bitvectors

Let's get a dummy project and state so we can start playing with numbers.

>>> import angr, monkeyhex
>>> proj = angr.Project('/bin/true')
>>> state = proj.factory.entry_state()

A bitvector is just a sequence of bits, interpreted with the semantics of a bounded integer for arithmetic. Let's make a few.

# 64-bit bitvectors with concrete values 1 and 100
>>> one = state.solver.BVV(1, 64)
>>> one
 <BV64 0x1>
>>> one_hundred = state.solver.BVV(100, 64)
>>> one_hundred
 <BV64 0x64>

# create a 27-bit bitvector with concrete value 9
>>> weird_nine = state.solver.BVV(9, 27)
>>> weird_nine
<BV27 0x9>

As you can see, you can have any sequence of bits and call them a bitvector. You can do math with them too:

>>> one + one_hundred
<BV64 0x65>

# You can provide normal python integers and they will be coerced to the appropriate type:
>>> one_hundred + 0x100
<BV64 0x164>

# The semantics of normal wrapping arithmetic apply
>>> one_hundred - one*200
<BV64 0xffffffffffffff9c>

You cannot say one + weird_nine, though. It is a type error to perform an operation on bitvectors of differing lengths. You can, however, extend weird_nine so it has an appropriate number of bits:

>>> weird_nine.zero_extend(64 - 27)
<BV64 0x9>
>>> one + weird_nine.zero_extend(64 - 27)
<BV64 0xa>

zero_extend will pad the bitvector on the left with the given number of zero bits. You can also use sign_extend to pad with a duplicate of the highest bit, preserving the value of the bitvector under two's compliment signed integer semantics.

Now, let's introduce some symbols into the mix.

# Create a bitvector symbol named "x" of length 64 bits
>>> x = state.solver.BVS("x", 64)
>>> x
<BV64 x_9_64>
>>> y = state.solver.BVS("y", 64)
>>> y
<BV64 y_10_64>

x and y are now symbolic variables, which are kind of like the variables you learned to work with in 7th grade algebra. Notice that the name you provided has been been mangled by appending an incrementing counter and You can do as much arithmetic as you want with them, but you won't get a number back, you'll get an AST instead.

>>> x + one
<BV64 x_9_64 + 0x1>

>>> (x + one) / 2
<BV64 (x_9_64 + 0x1) / 0x2>

>>> x - y
<BV64 x_9_64 - y_10_64>

Technically x and y and even one are also ASTs - any bitvector is a tree of operations, even if that tree is only one layer deep. To understand this, let's learn how to process ASTs.

Each AST has a .op and a .args. The op is a string naming the operation being performed, and the args are the values the operation takes as input. Unless the op is BVV or BVS (or a few others...), the args are all other ASTs, the tree eventually terminating with BVVs or BVSs.

>>> tree = (x + 1) / (y + 2)
>>> tree
<BV64 (x_9_64 + 0x1) / (y_10_64 + 0x2)>
>>> tree.op
>>> tree.args
(<BV64 x_9_64 + 0x1>, <BV64 y_10_64 + 0x2>)
>>> tree.args[0].op
>>> tree.args[0].args
(<BV64 x_9_64>, <BV64 0x1>)
>>> tree.args[0].args[1].op
>>> tree.args[0].args[1].args
(1, 64)

From here on out, we will use the word "bitvector" to refer to any AST whose topmost operation produces a bitvector. There can be other data types represented through ASTs, including floating point numbers and, as we're about to see, booleans.

Symbolic Constraints

Performing comparison operations between any two similarly-typed ASTs will yield another AST - not a bitvector, but now a symbolic boolean.

>>> x == 1
<Bool x_9_64 == 0x1>
>>> x == one
<Bool x_9_64 == 0x1>
>>> x > 2
<Bool x_9_64 > 0x2>
>>> x + y == one_hundred + 5
<Bool (x_9_64 + y_10_64) == 0x69>
>>> one_hundred > 5
<Bool True>
>>> one_hundred > -5
<Bool False>

One tidbit you can see from this is that the comparisons are unsigned by default. The -5 in the last example is coerced to <BV64 0xfffffffffffffffb>, which is definitely not less than one hundred. If you want the comparison to be signed, you can say one_hundred.SGT(-5) (that's "signed greater-than"). A full list of operations can be found at the end of this chapter.

This snippet also illustrates an important point about working with angr - you should never directly use a comparison between variables in the condition for an if- or while-statement, since the answer might not have a concrete truth value. Even if there is a concrete truth value, if one > one_hundred will raise an exception. Instead, you should use solver.is_true and solver.is_false, which test for concrete truthyness/falsiness without performing a constraint solve.

>>> yes = one == 1
>>> no = one == 2
>>> maybe = x == y
>>> state.solver.is_true(yes)
>>> state.solver.is_false(yes)
>>> state.solver.is_true(no)
>>> state.solver.is_false(no)
>>> state.solver.is_true(maybe)
>>> state.solver.is_false(maybe)

Constraint Solving

You can use treat any symbolic boolean as an assertion about the valid values of a symbolic variable by adding it as a constraint to the state. You can then query for a valid value of a symbolic variable by asking for an evaluation of a symbolic expression.

An example will probably be more clear than an explanation here:

>>> state.solver.add(x > y)
>>> state.solver.add(y > 2)
>>> state.solver.add(10 > x)
>>> state.solver.eval(x)

By adding these constraints to the state, we've forced the constraint solver to consider them as assertions that must be satisfied about any values it returns. If you run this code, you might get a different value for x, but that value will definitely be greater than 3 (since y must be greater than 2 and x must be greater than y) and less than 10. Furthermore, if you then say state.solver.eval(y), you'll get a value of y which is consistent with the value of x that you got. If you don't add any constraints between two queries, the results will be consistent with each other.

From here, it's easy to see how to do the task we proposed at the beginning of the chapter - finding the input that produced a given output.

# get a fresh state without constraints
>>> state = proj.factory.entry_state()
>>> input = state.solver.BVS('input', 64)
>>> operation = (((input + 4) * 3) >> 1) + input
>>> output = 200
>>> state.solver.add(operation == output)
>>> state.solver.eval(input)

Note that, again, this solution only works because of the bitvector semantics. If we were operating over the domain of integers, there would be no solutions!

If we add conflicting or contradictory constraints, such that there are no values that can be assigned to the variables such that the constraints are satisfied, the state becomes unsatisfiable, or unsat, and queries against it will raise an exception. You can check the satisfiability of a state with state.satisfiable().

>>> state.solver.add(input < 2**32)
>>> state.satisfiable()

You can also evaluate more complex expressions, not just single variables.

# fresh state
>>> state = proj.factory.entry_state()
>>> state.solver.add(x - y >= 4)
>>> state.solver.add(y > 0)
>>> state.solver.eval(x)
>>> state.solver.eval(y)
>>> state.solver.eval(x + y)

From this we can see that eval is a general purpose method to convert any bitvector into a python primitive while respecting the integrity of the state. This is why we use eval to convert from concrete bitvectors to python ints, too!

Also note that the x and y variables can be used in this new state despite having been created using an old state. Variables are not tied to any one state, and can exist freely.

Floating point numbers

z3 has support for the theory of IEEE754 floating point numbers, and so angr can use them as well. The main difference is that instead of a width, a floating point number has a sort. You can create floating point symbols and values with FPV and FPS.

# fresh state
>>> state = proj.factory.entry_state()
>>> a = state.solver.FPV(3.2, state.solver.fp.FSORT_DOUBLE)
>>> a
<FP64 FPV(3.2, DOUBLE)>

>>> b = state.solver.FPS('b', state.solver.fp.FSORT_DOUBLE)
>>> b
<FP64 FPS('FP_b_0_64', DOUBLE)>

>>> a + b
<FP64 fpAdd('RNE', FPV(3.2, DOUBLE), FPS('FP_b_0_64', DOUBLE))>

>>> a + 4.4
<FP64 FPV(7.6000000000000005, DOUBLE)>

>>> b + 2 < 0
<Bool fpLT(fpAdd('RNE', FPS('FP_b_0_64', DOUBLE), FPV(2.0, DOUBLE)), FPV(0.0, DOUBLE))>

So there's a bit to unpack here - for starters the pretty-printing isn't as smart about floating point numbers. But past that, most operations actually have a third parameter, implicitly added when you use the binary operators - the rounding mode. The IEEE754 spec supports multiple rounding modes (round-to-nearest, round-to-zero, round-to-positive, etc), so z3 has to support them. If you want to specify the rounding mode for an operation, use the fp operation explicitly (solver.fpAdd for example) with a rounding mode (one of solver.fp.RM_*) as the first argument.

Constraints and solving work in the same way, but with eval returning a floating point number:

>>> state.solver.add(b + 2 < 0)
>>> state.solver.add(b + 2 > -1)
>>> state.solver.eval(b)

This is nice, but sometimes we need to be able to work directly with the representation of the float as a bitvector. You can interpret bitvectors as floats and vice versa, with the methods raw_to_bv and raw_to_fp:

>>> a.raw_to_bv()
<BV64 0x400999999999999a>
>>> b.raw_to_bv()
<BV64 fpToIEEEBV(FPS('FP_b_0_64', DOUBLE))>

>>> state.solver.BVV(0, 64).raw_to_fp()
<FP64 FPV(0.0, DOUBLE)>
>>> state.solver.BVS('x', 64).raw_to_fp()
<FP64 fpToFP(x_1_64, DOUBLE)>

These conversions preserve the bit-pattern, as if you casted a float pointer to an int pointer or vice versa. However, if you want to preserve the value as closely as possible, as if you casted a float to an int (or vice versa), you can use a different set of methods, val_to_fp and val_to_bv. These methods must take the size or sort of the target value as a parameter, due to the floating-point nature of floats.

>>> a
<FP64 FPV(3.2, DOUBLE)>
>>> a.val_to_bv(12)
<BV12 0x3>
>>> a.val_to_bv(12).val_to_fp(state.solver.fp.FSORT_FLOAT)
<FP32 FPV(3.0, FLOAT)>

These methods can also take a signed parameter, designating the signedness of the source or target bitvector.

More Solving Methods

eval will give you one possible solution to an expression, but what if you want several? What if you want to ensure that the solution is unique? The solver provides you with several methods for common solving patterns:

  • solver.eval(expression) will give you one possible solution to the given expression.
  • solver.eval_one(expression) will give you the solution to the given expression, or throw an error if more than one solution is possible.
  • solver.eval_upto(expression, n) will give you up to n solutions to the given expression, returning fewer than n if fewer than n are possible.
  • solver.eval_atleast(expression, n) will give you n solutions to the given expression, throwing an error if fewer than n are possible.
  • solver.eval_exact(expression, n) will give you n solutions to the given expression, throwing an error if fewer or more than are possible.
  • solver.min(expression) will give you the minimum possible solution to the given expression.
  • solver.max(expression) will give you the maximum possible solution to the given expression.

Additionally, all of these methods can take the following keyword arguments:

  • extra_constraints can be passed as a tuple of constraints. These constraints will be taken into account for this evaluation, but will not be added to the state.
  • cast_to can be passed a data type to cast the result to. Currently, this can only be str, which will cause the method to return the byte representation of the underlying data. For example, state.solver.eval(state.solver.BVV(0x41424344, 32), cast_to=str) will return "ABCD".


That was a lot!! After reading this, you should be able to create and manipulate bitvectors, booleans, and floating point values to form trees of operations, and then query the constraint solver attached to a state for possible solutions under a set of constraints. Hopefully by this point you understand the power of using ASTs to represent computations, and the power of a constraint solver.

In the appendix, you can find a reference for all the additional operations you can apply to ASTs, in case you ever need a quick table to look at.

results matching ""

    No results matching ""