This lists the major changes in angr. Tracking minor changes are left as an exercise for the reader :-)
For the last month, we have been working on a major refactor of the angr to change the way that angr reasons about the code that it analyzes. Until now, angr has been bound to the VEX intermediate representation to lift native code, supporting a wide range of architectures but not being very expandable past them. This release represents the ground work for what we call translation and execution engines. These engines are independent backends, pluggable into the angr framework, that will allow angr to reason about a wide range of targets. For now, we have restructured the existing VEX and Unicorn Engine support into this engine paradigm, but as we discuss in our blog post, the plan is to create engines to enable angr's reasoning of Java bytecode and source code, and to augment angr's environment support through the use of external dynamic sandboxes.
For now, these changes are mostly internal. We have attempted to maintain compatibility for end-users, but those building systems atop angr will have to adapt to the modern codebase. The following are the major changes:
- simuvex: we have introduced SimEngine. SimEngine is a base class for abstractions over native code. For example, angr's VEX-specific functionality is now concentrated in SimEngineVEX, and new engines (such as SimEngineLLVM) can be implemented (even outside of simuvex itself) to support the analysis of new types of code.
- simuvex: as part of the engines refactor, the SimRun class has been eliminated. Instead of different subclasses of SimRun that would be instantiated from an input state, engines each have a
processfunction that, from an input state, produces a SimSuccessors instance containing lists of different successor states (normal, unsat, unconstrained, etc) and any engine-specific artifacts (such as the VEX statements. Take a look at
state.mem[x:] = ynow requires a type for storage (for example
state.mem[x:].dword = y).
- simuvex: the way of calling inline SimProcedures has been changed. Now you have to create a SimProcedure, and then call
execute()on it and pass in a program state as well as the arguments.
- simuvex: accessing registers through
state.regs.eax) always triggers SimInspect breakpoints and creates new actions. Now you can access a register by prefixing its name with an underscore (e.g.
state._ip) to avoid triggering breakpoints or creating actions.
- angr: the way hooks work has slightly changed, though is backwards-compatible. The new angr.Hook class acts as a wrapper for hooks (SimProcedures and functions), keeping things cleaner in the
- angr: we have deprecated the keyword argument
max_sizeand changed it to to
angr.Blockconstructor (i.e., the argument to
project.factory.blockand more upstream methods (
- angr: we have deprecated
project.factory.sim_runand changed it to to
project.factory.successors, and it now generates a
project.factory.sim_blockhas been deprecated and replaced with
- angr: angr syscalls are no longer hooks. Instead, the syscall table is now in
project._simos.syscall_table. This will be made "public" after a usability refactor. If you were using
project.is_hooked(addr)to see if an address has a related SimProcedure, now you probably want to check if there is a related syscall as well (using
project._simos.syscall_table.get_by_addr(addr) is not None).
- pyvex: to support custom lifters to VEX, pyvex has introduced the concept of backend lifters. Lifters can be written in pure python to produce VEX IR, allowing for extendability of angr's VEX-based analyses to other hardware architectures.
As usual, there are many other improvements and minor bugfixes.
- claripy: support
unsat_core()to get the core of unsatness of constraints. It is in fact a thin wrapper of the
unsat_core()function provided by Z3. Also a new state option
CONSTRAINT_TRACKING_IN_SOLVERis added to SimuVEX. That state option must be enabled if you want to use
unsat_core()on any state.
SimMemory.store()now takes a new parameter
disable_actions. Setting it to True will prevent any SimAction creation.
- angr: CFGFast has a better support for ARM binaries, especially for code in THUMB mode.
- angr: thanks to an improvement in SimuVEX, CFGAccurate now uses slightly less memory than before.
addr_traceis made much faster.
- angr: Fix a crash during CFG generation or symbolic execution on platforms/architectures with no syscall defined.
- angr: as part of the refactor,
BackwardSlicingis temporarily disabled. It will be re-enabled once all DDG-related refactor are merged to master.
Additionally, packaging and build-system improvements coordinated between the angr and Unicorn Engine projects have allowed angr's Unicorn support to be built on Windows. Because of this,
unicorn is now a dependency for
Looking forward, angr is poised to become a program analysis engine for binaries and more!
It has been over a month since the last release 22.214.171.124. Again, we’ve made some significant changes and improvements on the code base.
- angr: Labels are now stored in KnowledgeBase.
- angr: Add a new analysis:
Disassembly. The new Disassembly analysis provides an easy-to-use interface to render assembly of functions.
- angr: Fix the issue that
ForwardAnalysismay prematurely terminate while there are still un-processed jobs.
- angr: Many small improvements and bug fixes on
- angr: Many small improvements and bug fixes on
VFG. Bring back widening support. Fix the issue that
VFGmay not terminate under certain cases. Implement a new graph traversal algorithm to have an optimal traversal order. Allow state merging at non-merge-points, which allows faster convergence.
- angr-management: Display a progress during initial CFG recovery.
- angr-management: Display a “Load binary” window upon binary loading. Some analysis options can be adjusted there.
- angr-management: Disassembly view: Edge routing on the graph is improved.
- angr-management: Disassembly view: Support starting a new symbolic execution task from an arbitrary address in the program.
- angr-management: Disassembly view: Support renaming of function names and labels.
- angr-management: Disassembly view: Support “Jump to address”.
- angr-management: Disassembly view: Display resolved and unresolved jump targets. All jump targets are double-clickable.
- SimuVEX: Move region mapping from
SimMemory. This will allow an easier conversion between
SimSymbolicMemory, which is to say, conversion between symbolic states and static states is now possible.
- SimuVEX & claripy: Provide support for
unsat_corein Z3. It returns a set of constraints that led to unsatness of the constraint set on the current state.
- archinfo: Add a new Boolean variable
branch_delay_slotfor each architecture. It is set to True on MIPS32.
Major point release! An incredible number of things have changed in the month run-up to the Cyber Grand Challenge.
- Integration with Unicorn Engine supported for concrete execution. A new SimRun type, SimUnicorn, may step through many basic blocks at once, so long as there is no operation on symbolic data. Please use our fork of unicorn engine, which has many patches applied. All these patches are pending merge into upstream.
- Lots of improvements and bug fixes to CFGFast. Rumors are angr’s CFG was only "optimized" for x86-64 binaries (which is really because most of our test cases are compiled as 64-bit ELFs). Now it is also “optimized” for x86 binaries :) (editor's note: angr is built with cross-architecture analysis in mind. CFG construction is pretty much the only component which has architecture-specific behavior.)
- Lots of improvements to the VFG analysis, including speed and accuracy. However, there is still a lot to be done.
- Lots of speed optimizations in general - CFGFast should be 3-6x faster under CPython with much less memory usage.
- Now data dependence graph gives you a real dependence graph between variable definitions. Try
simplified_data_graphon a DDG object!
- New state option
simuvex.o.STRICT_PAGE_ACCESSwill cause a
SimSegfaultErrorto be raised whenever the guest reads/writes/executes memory that is either unmapped or doesn't have the appropriate permissions.
- Merging of paths (as opposed to states) is performed in a much smarter way.
- The behavior of the
support_selfmodifying_codeproject option is changed: Before, this would allow the state to be used as a fallback source of instruction bytes when no backer from CLE is available. Now, this option makes instruction lifting use the state as the source of bytes always. When the option is disabled and execution jumps outside the normal binary, the state will be used automatically.
- Actually support self-modifying code - if a basic block of code modifies itself, the block will be re-lifted before the next instruction starts.
- Syscalls are handled differently now - Before you would see a SimRun for a syscall helper, now you'll just see a SimProcedure for the given syscall. Additionally, each syscall has its own address in a "syscalls segment", and syscalls are treated as jumps to this segment. This simplifies a lot of things analysis-wise.
- CFGAccurate accepts a
base_graphkeyword to its constructor, e.g.
CFGFast().graph, or even
.graphof a function, to use as a base for analysis.
- New fast memory model for cases where symbolic-addressed reads and writes are unlikely.
- Conflicts between the
avoidparameters to the Explorer otiegnqwvk are resolved correctly. (credit clslgrnc)
- New analysis
StaticHookerwhich hooks library functions in unstripped statically linked binaries.
Liftercan be used without creating an angr Project. You must manually specify the architecture and bytestring in calls to
.fresh_block(). If you like, you can also specify the architecture as a parameter to the constructor and omit it from the lifting calls.
- Add two new analyses developed for the CGC (mostly as examples of doing static analysis with angr): Reassembler and BinaryOptimizer.
In general, there have been enormous amounts of speed improvements in this release. Depending on the workload, angr should run about twice as fast. Aside from this, there have also been many submodule-specific changes:
Quite a few changes and improvements are made to
CFGAccurate in order to have better and faster CFG recovery.
The two biggest changes in
CFGFast are jump table resolution and data references collection, respectively.
CFGFast resolves indirect jumps by default.
You may get a list of indirect jumps recovered in
CFGFast by accessing the
For many cases, it resolves the jump table accurately.
Data references collection is still in alpha mode.
To test data references collection, just pass
collect_data_references=True when creating a fast CFG, and access the
memory_data attribute after the CFG is constructed.
CFG recovery on ARM binaries is also improved.
A new paradigm called an "otiegnqwvk", or an "exploration technique", allows the packaging of special logic related to path group stepping.
Reads/writes to the x87 fpu registers now work correctly - there is special logic that rotates a pointer into part of the register file to simulate the x87 stack.
With the recent changes to Claripy, we have configured SimuVEX to use the composite solver by default. This should be transparent, but should be considered if strange issues (or differences in behavior) arise during symbolic execution.
Fixed a bug in claripy where
__div__ was not always doing unsigned division, and added new methods
SMod for signed division and signed remainder, respectively.
Claripy frontends have been completely rewritten into a mixin-centric solver design. Basic frontend functionality (i.e., calling into the solver or dealing with backends) is handled by frontends (in
claripy.frontends), and additional functionality (such as caching, deciding when to simplify, etc) is handled by frontend mixins (in
claripy.frontend_mixins). This makes it considerably easier to customize solvers to your specific needE. For examples, look at
Alongside the solver rewrite, the composite solver (which splits constraints into independent constraint sets for faster solving) has been immensely improved and is now functional and fast.
Syscalls are no longer handled by
Instead, syscalls are now handled by
Previously, the address of a syscall SimProcedure is the address right after the syscall instruction (e.g.
int 80h), which collides with the real basic block starting at that address, and is very confusing.
Now each syscall SimProcedure has its own address, just as a normal SimProcedure.
To support this, there is another region mapped for the syscall addresses,
Some refactoring and bug fixes in
Claripy has been given the ability to handle annotations on ASTs.
An annotation can be used to customize the behavior of some backends without impacting others.
For more information, check the docstrings of
New state constructor -
call_state. Comes with a refactor to
SimCC, a refactor to
callable, and the removal of
All these changes are thoroughly documented, in
SimType to make it easier to use types - they can be instanciated without a SimState and one can be added later.
Comes with some usability improvements to SimMemView.
Also, there's a better wrapper around PyCParser for generating SimType instances from c declarations and definitions.
Again, thoroughly documented, still in the structured data doc.
CFG is now an alias to
CFGFast instead of
CFGFast should work under most cases, and it's way faster than
We believe such a change is necessary, and will make angr more approachable to new users.
You will have to change your code from
CFGAccurate if you are relying on specific functionalities that only exist in
CFGAccurate, for example, context-sensitivity and state-preserving.
An exception will be raised by angr if any parameter passed to
CFG is only supported by
For more detailed explanation, please take a look at the documentation of
PyVEX has a structural overhaul. The
IRConst modules no longer exist as submodules, and those module names are deprecated.
pyvex.const if you need to access the members of those modules.
The names of the first three parameters to
pyvex.IRSB (the required ones) have been changed.
If you were passing the positional args to IRSB as keyword args, consider switching to positional args.
The order is
The optional parameter
sargc to the
full_init_state constructors has been removed and replaced with an
sargc predates being able to have claripy ASTs independent from a solver.
The new system is to pass in the exact value, ast or integer, that you'd like to have as the guest program's arg count.
CLE and angr can now accept file-like streams, that is, objects that support
stream.seek() can be passed in wherever a filepath is expected.
Documentation is much more complete, especially for PyVEX and angr's symbolic execution control components.
There have been several improvements to claripy that should be transparent to users:
- There's been a refactoring of the VSA StridedInterval classes to fix cases where operations were not sound. Precision might suffer as a result, however.
- Some general speed improvements.
- We've introduced a new backend into claripy: the ReplacementBackend. This frontend generates replacement sets from constraints added to it, and uses these replacement sets to increase the precision of VSA. Additionally, we have introduced the HybridBackend, which combines this functionality with a constraint solver, allowing for memory index resolution using VSA.
angr itself has undergone some improvements, with API changes as a result:
- We are moving toward a new way to store information that angr has recovered about a program: the knowledge base. When an analysis recovers some truth about a program (i.e., "there's a basic block at 0x400400", or "the block at 0x400400 has a jump to 0x400500"), it gets stored in a knowledge-base. Analysis that used to store data (currently, the CFG) now store them in a knowledge base and can share the global knowledge base of the project, now accessible via
project.kb. Over time, this knowledge base will be expanded in the course of any analysis or symbolic execution, so angr is constantly learning more information about the program it is analyzing.
- A forward data-flow analysis framework (called ForwardAnalysis) has been introduced, and the CFG was rewritten on top of it. The framework is still in alpha stage - expect more changes to be made. Documentation and more details will arrive shortly. The goal is to refactor other data-flow analysis, like CFGFast, VFG, DDG, etc. to use ForwardAnalysis.
- We refactored the CFG to a) improve code readability, and b) eliminate some bad designs that linger due to historical reasons.
Claripy has a new manager for backends, allowing external backends (i.e., those implemented by other modules) to be used.
The result is that
claripy.backend_concrete is now
claripy.backend_vsa is now
claripy.backends.vsa, and so on.
Improved the ability to recover from failures in instruction decoding.
You can now hook specific addresses at which VEX fails to decode with
project.hook, even if those addresses are not the beginning of a basic block.
This is a pretty beefy release, with over half of claripy having been rewritten and major changes to other analyses. Internally, Claripy has been unified -- the VSA mode and symbolic mode now work on the same structures instead of requiring structures to be created differently. This opens the door for awesome capabilities in the future, but could also result in unexpected behavior if we failed to account for something.
Claripy has had some major interface changes:
- claripy.BV has been renamed to claripy.BVS (bit-vector symbol). It can now create bitvectors out of strings (i.e., claripy.BVS(0x41, 8) and claripy.BVS("A") are identical).
- state.BV and state.BVV are deprecated. Please use state.se.BVS and state.se.BVV.
- BV.model is deprecated. If you're using it, you're doing something wrong, anyways. If you really need a specific model, convert it with the appropriate backend (i.e., claripy.backend_concrete.convert(bv)).
There have also been some changes to analyses:
- Interface: CFG argument
keep_input_statehas been renamed to
keep_state. With this option enabled, both input and final states are kept.
- Interface: Two arguments
BackwardSlicinghave been deprecated. Instead,
BackwardSlicingtakes a single argument,
targets. This means that we now support slicing from multiple sources.
- Performance: The speed of CFG recovery has been slightly improved. There is a noticeable speed improvement on MIPS binaries.
- Several bugs have been fixed in DDG, and some sanity checks were added to make it more usable.
And some general changes to angr itself:
- StringSpec is deprecated! You can now pass claripy bitvectors directly as arguments.