angr.analyses

class angr.analyses.CDG

Bases: Analysis

Implements a control dependence graph.

__init__(cfg, start=None, no_construct=False)

Constructor.

Parameters:
  • cfg – The control flow graph upon which this control dependence graph will build

  • start – The starting point to begin constructing the control dependence graph

  • no_construct – Skip the construction step. Only used in unit-testing.

property graph
get_post_dominators()

Return the post-dom tree

get_dependants(run)

Return a list of nodes that are control dependent on the given node in the control dependence graph

get_guardians(run)

Return a list of nodes on whom the specific node is control dependent in the control dependence graph
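To make the terms "guardian" and "dependant" concrete, here is a minimal, self-contained sketch of how control dependence can be derived from post-dominators on a toy graph. It is not angr's implementation: the function names `post_dominators` and `control_dependence` and the dict-of-successors graph format are assumptions made for illustration. The sketch also assumes every non-exit node has at least one successor and can reach the exit node.

```python
def post_dominators(succs, exit_node):
    """Iteratively compute the post-dominator set of every node."""
    nodes = set(succs)
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            # A node is post-dominated by itself plus whatever
            # post-dominates all of its successors.
            common = set.intersection(*(pdom[s] for s in succs[n]))
            new = {n} | common
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def control_dependence(succs, exit_node):
    """Map each node to the set of nodes it is control dependent on."""
    pdom = post_dominators(succs, exit_node)
    guardians = {n: set() for n in succs}
    for x, targets in succs.items():
        for s in targets:
            for y in pdom[s]:
                # y depends on x if y post-dominates a successor of x
                # but does not strictly post-dominate x itself.
                if y == x or y not in pdom[x]:
                    guardians[y].add(x)
    return guardians


# A diamond: A branches to B or C, and both rejoin at D.
toy_cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
guardians = control_dependence(toy_cfg, "D")
# B and C are guarded by A; D runs regardless of the branch taken.
```

The returned mapping plays the role of get_guardians(); inverting it gives the analogue of get_dependants().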

class angr.analyses.CFG

Bases: CFGFast

tl;dr: CFG is just a wrapper around CFGFast for compatibility issues. It will be fully replaced by CFGFast in future releases. Feel free to use CFG if you intend to use CFGFast. Please use CFGEmulated if you have to use the old, slow, dynamically-generated version of CFG.

For multiple historical reasons, angr’s CFG is accurate but slow, which is not what most people expect. We developed CFGFast for light-speed CFG recovery, and renamed the old CFG class to CFGEmulated. For compatibility concerns, CFG was kept as an alias to CFGEmulated.

However, so many new users of angr would load up a binary and generate a CFG immediately after running “pip install angr”, and draw the conclusion that “angr’s CFG is so slow - angr must be unusable!” Therefore, we made the hard decision: CFG will be an alias to CFGFast, instead of CFGEmulated.

To ease the transition of your existing code and scripts, the following changes are made:

  • A CFG class, which is a sub class of CFGFast, is created.

  • You will see both a warning message printed out to stderr and an exception raised by angr if you are passing CFG any parameter that only CFGEmulated supports. This exception is not a sub class of AngrError, so you wouldn’t capture it with your old code by mistake.

  • In the near future, this wrapper class will be removed completely, and CFG will be a simple alias to CFGFast.

We expect most interfaces to be the same between CFGFast and CFGEmulated. Some functionality (such as context sensitivity and state keeping) exists only in CFGEmulated; use CFGEmulated when you need it.
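The choice described above can be sketched as follows. The helper names `pick_cfg_analysis` and `recover_cfg` are hypothetical, and the rule "use CFGEmulated only when states or context sensitivity are needed" is a simplification of the text above. angr is imported lazily inside the function that needs it.

```python
def pick_cfg_analysis(need_states=False, need_context_sensitivity=False):
    """Return the name of the analysis to request from project.analyses."""
    if need_states or need_context_sensitivity:
        return "CFGEmulated"
    return "CFGFast"


def recover_cfg(binary_path, need_states=False, context_sensitivity_level=None):
    """Recover a CFG with whichever analysis fits the stated needs."""
    import angr  # lazy import: pick_cfg_analysis() works without angr

    project = angr.Project(binary_path, auto_load_libs=False)
    name = pick_cfg_analysis(need_states, context_sensitivity_level is not None)
    kwargs = {}
    if name == "CFGEmulated":
        # These keyword arguments are only understood by CFGEmulated.
        kwargs["keep_state"] = need_states
        if context_sensitivity_level is not None:
            kwargs["context_sensitivity_level"] = context_sensitivity_level
    return getattr(project.analyses, name)(**kwargs)
```

Requesting the analysis by its explicit name avoids relying on what the CFG alias currently points to.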

class angr.analyses.DDG

Bases: Analysis

This is a fast data dependence graph directly generated from our CFG analysis result. The only reason for its existence is the speed. There is zero guarantee for being sound or accurate. You are supposed to use it only when you want to track the simplest data dependence, and you do not care about soundness or accuracy.

For a better data dependence graph, please consider performing a better static analysis first (like Value-set Analysis), and then construct a dependence graph on top of the analysis result (for example, the VFG in angr).

The DDG is based on a CFG, which should ideally be a CFGEmulated generated with the following options:

  • keep_state=True to keep all input states

  • state_add_options=angr.options.refs to store memory, register, and temporary value accesses

You may want to consider a high value for context_sensitivity_level as well when generating the CFG.

Also note that since we are using states from CFG, any improvement in analysis performed on CFG (like a points-to analysis) will directly benefit the DDG.
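A minimal sketch of building a DDG on top of a CFGEmulated configured as recommended above. The helper names `cfg_options_for_ddg` and `build_ddg` are hypothetical, and the context sensitivity level of 2 is an arbitrary choice, not a recommendation from angr.

```python
def cfg_options_for_ddg(refs_options, context_sensitivity_level=2):
    """Keyword arguments for CFGEmulated so that a DDG can be built on it."""
    return {
        "keep_state": True,                  # keep all input states
        "state_add_options": refs_options,   # record memory/register/tmp accesses
        "context_sensitivity_level": context_sensitivity_level,
    }


def build_ddg(binary_path, start=None):
    """Generate a CFGEmulated with stored states, then a DDG from it."""
    import angr  # lazy import: the helper above is usable without angr

    project = angr.Project(binary_path, auto_load_libs=False)
    cfg = project.analyses.CFGEmulated(**cfg_options_for_ddg(angr.options.refs))
    return project.analyses.DDG(cfg, start=start)
```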

__init__(cfg, start=None, call_depth=None, block_addrs=None)
Parameters:
  • cfg – Control flow graph. Please make sure each node has an associated state with it, e.g. by passing the keep_state=True and state_add_options=angr.options.refs arguments to CFGEmulated.

  • start – An address. Specifies where we start the generation of this data dependence graph.

  • call_depth – None or integers. A non-negative integer specifies how deep we would like to track in the call tree. None disables call_depth limit.

  • block_addrs (iterable or None) – A collection of block addresses that the DDG analysis should be performed on.

property graph

Returns:

A networkx DiGraph instance representing the dependence relations between statements.

Return type:

networkx.DiGraph

property data_graph

Get the data dependence graph.

Returns:

A networkx DiGraph instance representing data dependence.

Return type:

networkx.DiGraph

property simplified_data_graph

property ast_graph
pp()

Pretty printing.

dbg_repr()

Representation for debugging.

get_predecessors(code_location)

Returns all predecessors of the code location.

Parameters:

code_location – A CodeLocation instance.

Returns:

A list of all predecessors.

function_dependency_graph(func)

Get a dependency graph for the function func.

Parameters:

func – The Function object in CFG.function_manager.

Returns:

A networkx.DiGraph instance.

data_sub_graph(pv, simplified=True, killing_edges=False, excluding_types=None)

Get a subgraph from the data graph or the simplified data graph that starts from node pv.

Parameters:
  • pv (ProgramVariable) – The starting point of the subgraph.

  • simplified (bool) – When True, the simplified data graph is used, otherwise the data graph is used.

  • killing_edges (bool) – Are killing edges included or not.

  • excluding_types (iterable) – Excluding edges whose types are among those excluded types.

Returns:

A subgraph.

Return type:

networkx.MultiDiGraph

find_definitions(variable, location=None, simplified_graph=True)

Find all definitions of the given variable.

Parameters:
  • variable (SimVariable)

  • simplified_graph (bool) – True if you just want to search in the simplified graph instead of the normal graph. Usually the simplified graph suffices for finding definitions of register or memory variables.

Returns:

A collection of all variable definitions to the specific variable.

Return type:

list

find_consumers(var_def, simplified_graph=True)

Find all consumers to the specified variable definition.

Parameters:
  • var_def (ProgramVariable) – The variable definition.

  • simplified_graph (bool) – True if we want to search in the simplified graph, False otherwise.

Returns:

A collection of all consumers to the specified variable definition.

Return type:

list

find_killers(var_def, simplified_graph=True)

Find all killers to the specified variable definition.

Parameters:
  • var_def (ProgramVariable) – The variable definition.

  • simplified_graph (bool) – True if we want to search in the simplified graph, False otherwise.

Returns:

A collection of all killers to the specified variable definition.

Return type:

list

find_sources(var_def, simplified_graph=True)

Find all sources to the specified variable definition.

Parameters:
  • var_def (ProgramVariable) – The variable definition.

  • simplified_graph (bool) – True if we want to search in the simplified graph, False otherwise.

Returns:

A collection of all sources to the specified variable definition.

Return type:

list

class angr.analyses.VFG

Bases: ForwardAnalysis[SimState, VFGNode, VFGJob, BlockID, SimState], Analysis

This class represents a control-flow graph with static analysis result.

Perform abstract interpretation analysis starting from the given function address. The output is an invariant at the beginning (or the end) of each basic block.

Steps:

  • Generate a CFG first if CFG is not provided.

  • Identify all merge points (denote the set of merge points as Pw) in the CFG.

  • Cut those loop back edges (can be derived from Pw) so that we gain an acyclic CFG.

  • Identify all variables that are 1) from memory loading, 2) from initial values, or 3) phi functions. Denote the set of those variables as S_{var}.

  • Start real AI analysis and try to compute a fix point of each merge point. Perform widening/narrowing only on variables in S_{var}.
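The last step can be illustrated with a toy interval analysis of a counting loop (`i = 0; while ...: i = i + 1`). This is a sketch under stated assumptions, not VFG's code: the interval domain, the `join` and `widen` operators, and the rule "widen once the iteration count exceeds max_iterations_before_widening" are all assumptions. Only the two parameter names are borrowed from the VFG constructor.

```python
from typing import Tuple

Interval = Tuple[float, float]
INF = float("inf")


def join(a: Interval, b: Interval) -> Interval:
    """Least upper bound of two intervals."""
    return (min(a[0], b[0]), max(a[1], b[1]))


def widen(old: Interval, new: Interval) -> Interval:
    """Push any bound that is still growing straight to infinity."""
    lo = old[0] if new[0] >= old[0] else -INF
    hi = old[1] if new[1] <= old[1] else INF
    return (lo, hi)


def analyze_counter_loop(max_iterations_before_widening: int = 8,
                         max_iterations: int = 40) -> Tuple[Interval, int]:
    """Compute a fix point for `i` at the loop head (the merge point)."""
    entry: Interval = (0.0, 0.0)       # i = 0 on loop entry
    head = entry
    for iteration in range(1, max_iterations + 1):
        body_out = (head[0] + 1, head[1] + 1)   # effect of i = i + 1
        new_head = join(head, join(entry, body_out))
        if iteration > max_iterations_before_widening:
            new_head = widen(head, new_head)
        if new_head == head:           # nothing changed: fix point reached
            return head, iteration
        head = new_head
    return head, max_iterations        # gave up without converging


invariant, rounds = analyze_counter_loop()
```

Without widening the upper bound would grow by one per iteration and never stabilize; widening trades precision for termination.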

__init__(cfg=None, context_sensitivity_level=2, start=None, function_start=None, interfunction_level=0, initial_state=None, avoid_runs=None, remove_options=None, timeout=None, max_iterations_before_widening=8, max_iterations=40, widening_interval=3, final_state_callback=None, status_callback=None, record_function_final_states=False)
Parameters:
  • cfg (CFGEmulated | None) – The control-flow graph to base this analysis on. If none is provided, we will construct a CFGEmulated.

  • context_sensitivity_level (int) – The level of context-sensitivity of this VFG. It ranges from 0 to infinity. Default 2.

  • function_start (int | None) – The address of the function to analyze.

  • interfunction_level (int) – The level of interfunction-ness to be

  • initial_state (SimState | None) – A state to use as the initial one

  • avoid_runs (list[int] | None) – A list of runs to avoid

  • remove_options (set[str] | None) – State options to remove from the initial state. It only works when initial_state is None

  • timeout (int | None)

  • final_state_callback (Callable[[SimState, CallStack], Any] | None) – callback function invoked when encountering a final state

  • status_callback (Callable[[VFG], Any] | None) – callback function used in _analysis_core_baremetal

  • start (int | None)

  • max_iterations_before_widening (int)

  • max_iterations (int)

  • widening_interval (int)

  • record_function_final_states (bool)

Return type:

None

property function_initial_states
property function_final_states
get_any_node(addr)

Get any VFG node corresponding to the basic block at @addr. Note that depending on the context sensitivity level, there might be multiple nodes corresponding to different contexts. This function will return the first one it encounters, which might not be what you want.

Return type:

VFGNode | None

Parameters:

addr (int)

get_all_nodes(addr)
Return type:

Generator[VFGNode]

irsb_from_node(node)
copy()
class angr.analyses.VSA_DDG

Bases: Analysis

A data dependency graph based on VSA states. That means we don’t (and shouldn’t) expect any symbolic expressions.

__init__(vfg=None, start_addr=None, interfunction_level=0, context_sensitivity_level=2, keep_data=False)

Constructor.

Parameters:
  • vfg – An already constructed VFG. If not specified, a new VFG will be created with other specified parameters. vfg and start_addr cannot both be unspecified.

  • start_addr – The address where to start the analysis (typically, a function’s entry point).

  • interfunction_level – See VFG analysis.

  • context_sensitivity_level – See VFG analysis.

  • keep_data – Whether we keep set of addresses as edges in the graph, or just the cardinality of the sets, which can be used as a “weight”.

get_predecessors(code_location)

Returns all predecessors of code_location.

Parameters:

code_location – A CodeLocation instance.

Returns:

A list of all predecessors.

get_all_nodes(simrun_addr, stmt_idx)

Get all DDG nodes matching the given basic block address and statement index.

class angr.analyses.AnalysesHub

Bases: PluginVendor[Any]

This class contains functions for all the registered and runnable analyses.

__init__(project)
class angr.analyses.Analysis

Bases: object

This class represents an analysis on the program.

Variables:
  • project – The project for this analysis.

  • kb (KnowledgeBase) – The knowledgebase object.

  • _progress_callback – A callback function for receiving the progress of this analysis. It only takes one argument, which is a float number from 0.0 to 100.0 indicating the current progress.

  • _show_progressbar (bool) – If a progressbar should be shown during the analysis. It’s independent from _progress_callback.

  • _progressbar (progress.Progress) – The progress bar object.

project: Project
kb: KnowledgeBase
errors: list[AnalysisLogEntry] = []
named_errors: defaultdict[str, list[AnalysisLogEntry]] = {}
log: list
property ram_usage: float

Return the current RAM usage of the Python process, in bytes. The value is updated at most once per second.
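A minimal sketch of a progress callback, passed through the progress_callback keyword that analyses inherit from Analysis. The helper names `make_progress_recorder` and `recover_cfg_with_progress` are hypothetical. The callback accepts extra keyword arguments defensively, on the assumption that some angr versions pass more than the percentage.

```python
def make_progress_recorder():
    """Return a callback plus the list it appends progress values to."""
    seen = []

    def on_progress(percentage, **kwargs):
        # percentage is documented as a float from 0.0 to 100.0
        seen.append(float(percentage))

    return on_progress, seen


def recover_cfg_with_progress(binary_path):
    """Run CFGFast while recording the progress values it reports."""
    import angr  # lazy import: the recorder itself needs no angr

    project = angr.Project(binary_path, auto_load_libs=False)
    on_progress, seen = make_progress_recorder()
    cfg = project.analyses.CFGFast(progress_callback=on_progress)
    return cfg, seen
```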

class angr.analyses.BackwardSlice

Bases: Analysis

Represents a backward slice of the program.

__init__(cfg, cdg, ddg, targets=None, cfg_node=None, stmt_id=None, control_flow_slice=False, same_function=False, no_construct=False)

Create a backward slice from a specific statement based on provided control flow graph (CFG), control dependence graph (CDG), and data dependence graph (DDG).

The data dependence graph can be either CFG-based or value-set analysis (VSA) based. A CFG-based DDG is much faster to generate, but it only reflects the states seen while generating the CFG, and it is neither sound nor accurate. The VSA-based DDG (called VSA_DDG) is based on static analysis, which gives you a much better result.

Parameters:
  • cfg – The control flow graph.

  • cdg – The control dependence graph.

  • ddg – The data dependence graph.

  • targets – A list of targets of the backward slice. Each target can be a tuple in the form (cfg_node, stmt_idx) or a CodeLocation instance.

  • cfg_node – Deprecated. The target CFGNode to reach. It should exist in the CFG.

  • stmt_id – Deprecated. The target statement to reach.

  • control_flow_slice – True/False, indicates whether we should slice only based on CFG. Sometimes when acquiring DDG is difficult or impossible, you can just create a slice on your CFG. Well, if you don’t even have a CFG, then…

  • no_construct – Only used for testing and debugging to easily create a BackwardSlice object.
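The parameters above fit together roughly as follows. This is a sketch rather than a reference recipe: the helper names `make_targets` and `slice_to_block` are hypothetical, and looking up the target node through `cfg.model.get_any_node` assumes a reasonably recent angr release.

```python
def make_targets(cfg_node, stmt_indices):
    """Build the (cfg_node, stmt_idx) tuples expected by `targets`."""
    return [(cfg_node, idx) for idx in stmt_indices]


def slice_to_block(binary_path, target_addr, stmt_idx):
    """Create a backward slice ending at one statement of one block."""
    import angr  # lazy import: make_targets() works without angr

    project = angr.Project(binary_path, auto_load_libs=False)
    # The CFG-based DDG needs the states kept on each CFG node.
    cfg = project.analyses.CFGEmulated(
        keep_state=True,
        state_add_options=angr.options.refs,
    )
    cdg = project.analyses.CDG(cfg)
    ddg = project.analyses.DDG(cfg)

    target_node = cfg.model.get_any_node(target_addr)
    if target_node is None:
        raise ValueError(f"no CFG node at {target_addr:#x}")

    return project.analyses.BackwardSlice(
        cfg, cdg, ddg, targets=make_targets(target_node, [stmt_idx])
    )
```

Calling dbg_repr() on the returned object gives a textual view of the slice.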

dbg_repr(max_display=10)

Debugging output of this slice.

Parameters:

max_display – The maximum number of SimRun slices to show.

Returns:

A string representation.

dbg_repr_run(run_addr)

Debugging output of a single SimRun slice.

Parameters:

run_addr – Address of the SimRun.

Returns:

A string representation.

annotated_cfg(start_point=None)

Returns an AnnotatedCFG based on slicing result.

is_taint_related_to_ip(simrun_addr, stmt_idx, taint_type, simrun_whitelist=None)

Query in taint graph to check if a specific taint will taint the IP in the future or not. The taint is specified with the tuple (simrun_addr, stmt_idx, taint_type).

Parameters:
  • simrun_addr – Address of the SimRun.

  • stmt_idx – Statement ID.

  • taint_type – Type of the taint, might be one of the following: ‘reg’, ‘tmp’, ‘mem’.

  • simrun_whitelist – A list of SimRun addresses that are whitelisted, i.e. the tainted exit will be ignored if it is in those SimRuns.

Returns:

True/False

is_taint_impacting_stack_pointers(simrun_addr, stmt_idx, taint_type, simrun_whitelist=None)

Query in taint graph to check if a specific taint will taint the stack pointer in the future or not. The taint is specified with the tuple (simrun_addr, stmt_idx, taint_type).

Parameters:
  • simrun_addr – Address of the SimRun.

  • stmt_idx – Statement ID.

  • taint_type – Type of the taint, might be one of the following: ‘reg’, ‘tmp’, ‘mem’.

  • simrun_whitelist – A list of SimRun addresses that are whitelisted.

Returns:

True/False.

class angr.analyses.BinDiff

Bases: Analysis

This class computes a diff between two binaries represented by angr Projects.

__init__(other_project, cfg_a=None, cfg_b=None)
Parameters:

other_project – The second project to diff

functions_probably_identical(func_a_addr, func_b_addr, check_consts=False)

Compare two functions and return True if they appear identical.

Parameters:
  • func_a_addr – The address of the first function (in the first binary).

  • func_b_addr – The address of the second function (in the second binary).

Returns:

Whether or not the functions appear to be identical.

property identical_functions

Returns:

A list of function matches that appear to be identical.

property differing_functions

Returns:

A list of function matches that appear to differ.

differing_functions_with_consts()
Returns:

A list of function matches that appear to differ including just by constants

property differing_blocks

Returns:

A list of block matches that appear to differ.

property identical_blocks

Returns:

A list of all block matches that appear to be identical.

property blocks_with_differing_constants

Returns:

A dict mapping block matches with differing constants to the tuple of constants.

property unmatched_functions
get_function_diff(function_addr_a, function_addr_b)
Parameters:
  • function_addr_a – The address of the first function (in the first binary)

  • function_addr_b – The address of the second function (in the second binary)

Returns:

the FunctionDiff of the two functions
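A minimal sketch of diffing two binaries and printing the function matches. The helper names `format_matches` and `diff_binaries` are hypothetical, and the sketch assumes each function match is a pair of integer addresses (one per binary), which the text above does not state explicitly.

```python
def format_matches(matches):
    """Render (addr_a, addr_b) pairs as readable hex strings."""
    return [f"{a:#x} <-> {b:#x}" for a, b in matches]


def diff_binaries(path_a, path_b):
    """Run BinDiff from the first project against the second."""
    import angr  # lazy import: format_matches() works without angr

    proj_a = angr.Project(path_a, auto_load_libs=False)
    proj_b = angr.Project(path_b, auto_load_libs=False)
    diff = proj_a.analyses.BinDiff(proj_b)
    return {
        "identical": format_matches(diff.identical_functions),
        "differing": format_matches(diff.differing_functions),
    }
```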

class angr.analyses.BinaryOptimizer

Bases: Analysis

This is a collection of binary optimization techniques we used in Mechanical Phish during the finals of Cyber Grand Challenge. It focuses on dealing with some serious speed-impacting code constructs, and sort of worked on some CGC binaries compiled with O0. Use this analysis as a reference of how to use data dependency graph and such.

There is no guarantee that BinaryOptimizer will ever work on non-CGC binaries. Feel free to give us PR or MR, but please do not ask for support of non-CGC binaries.

BLOCKS_THRESHOLD = 500
__init__(cfg, techniques)
optimize()
class angr.analyses.BoyScout

Bases: Analysis

Try to determine the architecture and endianness of a binary blob.

__init__(cookiesize=1)
class angr.analyses.CFGArchOptions

Bases: object

Stores architecture-specific options and settings, as well as the detailed explanation of those options and settings.

Suppose ao is the CFGArchOptions object, and there is an option called ret_jumpkind_heuristics, you can access it by ao.ret_jumpkind_heuristics and set its value via ao.ret_jumpkind_heuristics = True
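The attribute-style access described above can be pictured with a small stand-in class. `ArchOptionsSketch` is hypothetical and is not the real CFGArchOptions: it takes an architecture name instead of an archinfo.Arch object, and the type check on assignment is an assumption suggested by the (type, default) pairs in OPTIONS.

```python
class ArchOptionsSketch:
    """Stand-in showing options exposed as attributes."""

    OPTIONS = {
        "ARMEL": {
            "ret_jumpkind_heuristics": (bool, True),
            "switch_mode_on_nodecode": (bool, True),
        },
    }

    def __init__(self, arch_name, **options):
        specs = self.OPTIONS.get(arch_name, {})
        # Bypass our own __setattr__ for the two bookkeeping dicts.
        object.__setattr__(self, "_specs", specs)
        object.__setattr__(
            self, "_options", {name: default for name, (_, default) in specs.items()}
        )
        for name, value in options.items():
            setattr(self, name, value)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        options = self.__dict__.get("_options", {})
        if name in options:
            return options[name]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        if name not in self._specs:
            raise AttributeError(f"unknown option {name!r}")
        expected_type, _ = self._specs[name]
        if not isinstance(value, expected_type):
            raise TypeError(f"{name} expects {expected_type.__name__}")
        self._options[name] = value


ao = ArchOptionsSketch("ARMEL")
ao.ret_jumpkind_heuristics = False   # set an option via attribute access
```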

Variables:
  • OPTIONS (dict) – A dict of all default options for different architectures.

  • arch (archinfo.Arch) – The architecture object.

  • _options (dict) – Values of all CFG options that are specific to the current architecture.

OPTIONS = {'ARMCortexM': {'has_arm_code': (<class 'bool'>, False), 'pattern_match_ifuncs': (<class 'bool'>, True), 'ret_jumpkind_heuristics': (<class 'bool'>, True), 'switch_mode_on_nodecode': (<class 'bool'>, False)}, 'ARMEL': {'has_arm_code': (<class 'bool'>, True), 'pattern_match_ifuncs': (<class 'bool'>, True), 'ret_jumpkind_heuristics': (<class 'bool'>, True), 'switch_mode_on_nodecode': (<class 'bool'>, True)}, 'ARMHF': {'has_arm_code': (<class 'bool'>, True), 'pattern_match_ifuncs': (<class 'bool'>, True), 'ret_jumpkind_heuristics': (<class 'bool'>, True), 'switch_mode_on_nodecode': (<class 'bool'>, True)}}
__init__(arch, **options)

Constructor.

Parameters:
  • arch (archinfo.Arch) – The architecture instance.

  • options (dict) – Architecture-specific options, which will be used to initialize this object.

arch = None
class angr.analyses.CFGEmulated

Bases: ForwardAnalysis, CFGBase

This class represents a control-flow graph.

tag: str = 'CFGEmulated'
addr_type: Literal['int', 'block_id', 'soot'] = 'block_id'
__init__(context_sensitivity_level=1, start=None, avoid_runs=None, enable_function_hints=False, call_depth=None, call_tracing_filter=None, initial_state=None, starts=None, keep_state=False, indirect_jump_target_limit=100000, resolve_indirect_jumps=True, enable_advanced_backward_slicing=False, enable_symbolic_back_traversal=False, indirect_jump_resolvers=None, additional_edges=None, no_construct=False, normalize=False, max_iterations=1, address_whitelist=None, base_graph=None, iropt_level=None, max_steps=None, state_add_options=None, state_remove_options=None, model=None)

All parameters are optional.

Parameters:
  • context_sensitivity_level – The level of context-sensitivity of this CFG (see documentation for further details). It ranges from 0 to infinity. Default 1.

  • avoid_runs – A list of runs to avoid.

  • enable_function_hints – Whether to use function hints (constants that might be used as exit targets) or not.

  • call_depth – How deep in the call stack to trace.

  • call_tracing_filter – Filter to apply on a given path and jumpkind to determine if it should be skipped when call_depth is reached.

  • initial_state – An initial state to use to begin analysis.

  • starts (iterable) – A collection of starting points to begin analysis. It can contain the following three different types of entries: an address specified as an integer, a 2-tuple that includes an integer address and a jumpkind, or a SimState instance. Unsupported entries in starts will lead to an AngrCFGError being raised.

  • keep_state – Whether to keep the SimStates for each CFGNode.

  • resolve_indirect_jumps – Whether to enable the indirect jump resolvers for resolving indirect jumps

  • enable_advanced_backward_slicing – Whether to enable an intensive technique for resolving indirect jumps

  • enable_symbolic_back_traversal – Whether to enable an intensive technique for resolving indirect jumps

  • indirect_jump_resolvers (list) – A custom list of indirect jump resolvers. If this list is None or empty, default indirect jump resolvers specific to this architecture and binary types will be loaded.

  • additional_edges – A dict mapping addresses of basic blocks to addresses of successors to manually include and analyze forward from.

  • no_construct (bool) – Skip the construction procedure. Only used in unit-testing.

  • normalize (bool) – If the CFG as well as all Function graphs should be normalized or not.

  • max_iterations (int) – The maximum number of iterations that each basic block should be “executed”. 1 by default. Larger numbers of iterations are usually required for complex analyses like loop analysis.

  • address_whitelist (iterable) – A list of allowed addresses. Any basic blocks outside of this collection of addresses will be ignored.

  • base_graph (networkx.DiGraph) – A basic control flow graph to follow. Each node inside this graph must have the following properties: addr and size. CFG recovery will strictly follow nodes and edges shown in the graph, and discard any control flow that does not follow an existing edge in the base graph. For example, you can pass in a Function local transition graph as the base graph, and CFGEmulated will traverse nodes and edges and extract useful information.

  • iropt_level (int) – The optimization level of VEX IR (0, 1, 2). The default level will be used if iropt_level is None.

  • max_steps (int) – The maximum number of basic blocks to recover for the longest path from each start before pausing the recovery procedure.

  • state_add_options – State options that will be added to the initial state.

  • state_remove_options – State options that will be removed from the initial state.

copy()

Make a copy of the CFG.

Return type:

CFGEmulated

Returns:

A copy of the CFG instance.

resume(starts=None, max_steps=None)

Resume a paused or terminated control flow graph recovery.

Parameters:
  • starts (iterable) – A collection of new starts to resume from. If starts is None, we will resume CFG recovery from where it was paused before.

  • max_steps (int) – The maximum number of blocks on the longest path starting from each start before pausing the recovery.

Returns:

None

remove_cycles()

Forces graph to become acyclic, removes all loop back edges and edges between overlapped loop headers and their successors.
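The core idea of removing loop back edges can be sketched with a depth-first search on a toy graph. This is an illustration only, not what remove_cycles() does internally: the function name `remove_back_edges` and the dict-of-successors format are assumptions, and the handling of overlapped loop headers mentioned above is not modeled.

```python
def remove_back_edges(succs, start):
    """Return (acyclic successor map, list of removed back edges)."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited, on the DFS stack, done
    color = {n: WHITE for n in succs}
    kept = {n: [] for n in succs}
    removed = []

    def visit(node):
        color[node] = GRAY
        for succ in succs[node]:
            if color[succ] == GRAY:
                # Edge to a node still on the stack closes a loop.
                removed.append((node, succ))
            else:
                kept[node].append(succ)
                if color[succ] == WHITE:
                    visit(succ)
        color[node] = BLACK

    visit(start)
    return kept, removed


toy_cfg = {
    "entry": ["head"],
    "head": ["body", "exit"],
    "body": ["head"],      # back edge to the loop header
    "exit": [],
}
acyclic, back_edges = remove_back_edges(toy_cfg, "entry")
```

Nodes unreachable from the start node keep an empty successor list in this sketch, and the recursion makes it suitable only for small graphs.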

downsize()

Remove saved states from all CFGNodes to reduce memory usage.

Returns:

None

unroll_loops(max_loop_unrolling_times)

Unroll loops for each function. The resulting CFG may still contain loops due to recursion, function calls, etc.

Parameters:

max_loop_unrolling_times (int) – The maximum iterations of unrolling.

Returns:

None

force_unroll_loops(max_loop_unrolling_times)

Unroll loops globally. The resulting CFG does not contain any loop, but this method is slow on large graphs.

Parameters:

max_loop_unrolling_times (int) – The maximum iterations of unrolling.

Returns:

None

immediate_dominators(start, target_graph=None)

Get all immediate dominators of sub graph from given node upwards.

Parameters:
  • start (str) – id of the node to navigate forwards from.

  • target_graph (networkx.classes.digraph.DiGraph) – graph to analyse, default is self.graph.

Returns:

each node of graph as index values, with element as respective node’s immediate dominator.

Return type:

dict

immediate_postdominators(end, target_graph=None)

Get all immediate postdominators of sub graph from given node upwards.

Parameters:
  • end (str) – id of the node to navigate from.

  • target_graph (networkx.classes.digraph.DiGraph) – graph to analyse, default is self.graph.

Returns:

each node of graph as index values, with element as respective node’s immediate postdominator.

Return type:

dict

remove_fakerets()

Get rid of fake returns (i.e., Ijk_FakeRet edges) from this CFG

Returns:

None

get_topological_order(cfg_node)

Get the topological order of a CFG Node.

Parameters:

cfg_node – A CFGNode instance.

Returns:

An integer representing its order, or None if the CFGNode does not exist in the graph.

get_subgraph(starting_node, block_addresses)

Get a sub-graph out of a bunch of basic block addresses.

Parameters:
  • starting_node (CFGNode) – The beginning of the subgraph

  • block_addresses (iterable) – A collection of block addresses that should be included in the subgraph if there is a path between starting_node and a CFGNode with the specified address, and all nodes on the path should also be included in the subgraph.

Returns:

A new CFG that only contains the specific subgraph.

Return type:

CFGEmulated

get_function_subgraph(start, max_call_depth=None)

Get a sub-graph of a certain function.

Parameters:
  • start – The function start. Currently it should be an integer.

  • max_call_depth – Call depth limit. None indicates no limit.

Returns:

A CFG instance which is a sub-graph of self.graph

property context_sensitivity_level
property graph: SpillingCFG
property unresolvables

Get those SimRuns that have non-resolvable exits.

Returns:

A set of SimRuns

Return type:

set

property deadends

Get all CFGNodes that have an out-degree of 0.

Returns:

A list of CFGNode instances

Return type:

list

class angr.analyses.CFGFast

Bases: ForwardAnalysis[CFGNode, CFGNode, CFGJob, int, object], CFGBase

CFGFast finds functions inside the given binary and builds a control-flow graph very quickly: instead of simulating program execution, keeping track of states, and performing expensive data-flow analysis, it performs only lightweight analyses combined with some heuristics and some strong assumptions.

In order to identify as many functions as possible, as accurately as possible, the following operation sequence is followed:

# Active scanning

  • If the binary has “function symbols” (TODO: this term is not accurate enough), they are starting points of the code scanning

  • If the binary does not have any “function symbol”, we will first perform a function prologue scanning on the entire binary, and start from those places that look like function beginnings

  • Otherwise, the binary’s entry point will be the starting point for scanning

# Passive scanning

  • After all active scans are done, we will go through the whole image and scan all code pieces

Due to the nature of those techniques that are used here, a base address is often not required to use this analysis routine. However, with a correct base address, CFG recovery will almost always yield a much better result. A custom analysis, called GirlScout, is specifically made to recover the base address of a binary blob. After the base address is determined, you may want to reload the binary with the new base address by creating a new Project object, and then re-recover the CFG.

PRINTABLES = b'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r'
SPECIAL_THUNKS = {'AMD64': {b'\xe8\x07\x00\x00\x00\xf3\x90\x0f\xae\xe8\xeb\xf9H\x89\x04$\xc3': ('jmp', 'rax'), b'\xe8\x07\x00\x00\x00\xf3\x90\x0f\xae\xe8\xeb\xf9H\x8dd$\x08\xc3': ('ret',)}}
tag: str = 'CFGFast'
addr_type: Literal['int', 'block_id', 'soot'] = 'int'
__init__(binary=None, objects=None, regions=None, pickle_intermediate_results=False, symbols=True, function_prologues=None, resolve_indirect_jumps=True, force_segment=False, force_smart_scan=None, force_complete_scan=False, indirect_jump_target_limit=100000, data_references=True, cross_references=False, normalize=False, start_at_entry=True, function_starts=None, extra_memory_regions=None, data_type_guessing_handlers=None, arch_options=None, indirect_jump_resolvers=None, base_state=None, exclude_sparse_regions=True, skip_specific_regions=True, heuristic_plt_resolving=None, detect_tail_calls=False, low_priority=False, cfb=None, model=None, eh_frame=True, exceptions=True, skip_unmapped_addrs=True, nodecode_window_size=512, nodecode_threshold=0.3, nodecode_step=16483, check_funcret_max_job=500, indirect_calls_always_return=None, jumptable_resolver_resolves_calls=None, retedges=False, drop_bad_funcs=True, start=None, end=None, collect_data_references=None, extra_cross_references=None, elf_eh_frame=None, **extra_arch_options)
Parameters:
  • binary – The binary to recover CFG on. By default the main binary is used.

  • objects – A list of objects to recover the CFG on. By default it will recover the CFG of all loaded objects.

  • regions (iterable) – A list of tuples in the form of (start address, end address) describing memory regions that the CFG should cover.

  • pickle_intermediate_results (bool) – If we want to store the intermediate results or not.

  • symbols (bool) – Get function beginnings from symbols in the binary.

  • function_prologues (bool | None) – Scan the binary for function prologues, and use those positions as function beginnings

  • resolve_indirect_jumps (bool) – Try to resolve indirect jumps. This is necessary to resolve jump targets from jump tables, etc.

  • force_segment (bool) – Force CFGFast to rely on binary segments instead of sections.

  • force_complete_scan (bool) – Perform a complete scan on the binary and maximize the number of identified code blocks.

  • data_references (bool) – Enables the collection of references to data used by individual instructions. This does not collect ‘cross-references’, particularly those that involve multiple instructions. For that, see cross_references

  • cross_references (bool) – Whether CFGFast should collect “cross-references” from the entire program or not. This will populate the knowledge base with references to and from each recognizable address constant found in the code. Note that, because this performs constant propagation on the entire program, it may be much slower and consume more memory. This option implies data_references=True.

  • normalize (bool) – Normalize the CFG as well as all function graphs after CFG recovery.

  • start_at_entry (bool) – Begin CFG recovery at the entry point of this project. Setting it to False prevents CFGFast from viewing the entry point as one of the starting points of code scanning.

  • function_starts (list) – A list of extra function starting points. CFGFast will try to resume scanning from each address in the list.

  • extra_memory_regions (list) – A list of 2-tuple (start-address, end-address) that shows extra memory regions. Integers falling inside will be considered as pointers.

  • indirect_jump_resolvers (list) – A custom list of indirect jump resolvers. If this list is None or empty, default indirect jump resolvers specific to this architecture and binary types will be loaded.

  • base_state – A state to use as a backer for all memory loads

  • detect_tail_calls (bool) – Enable aggressive tail-call optimization detection.

  • eh_frame (bool) – Retrieve function starts (and maybe sizes later) from the .eh_frame of ELF binaries or exception records of PE binaries.

  • skip_unmapped_addrs – Ignore all branches into unmapped regions. True by default. You may want to set it to False if you are analyzing manually patched binaries or malware samples.

  • indirect_calls_always_return (bool | None) – Should CFG assume indirect calls must return or not. Assuming indirect calls must return will significantly reduce the number of constant propagation runs, but may reduce the overall CFG recovery precision when facing non-returning indirect calls. By default, we only assume indirect calls always return for large binaries (region > 50KB).

  • jumptable_resolver_resolves_calls (bool | None) – Whether JumpTableResolver should resolve indirect calls or not. Most indirect calls in C++ binaries or UEFI binaries cannot be resolved using the jump table resolver and must be resolved using their specific resolvers. By default, JumpTableResolver is only prevented from resolving indirect calls for large binaries (region > 50 KB).

  • check_funcret_max_job – When popping return-site jobs out of the job queue, angr prioritizes jobs for which the callee is known to return. This check may be slow when there are a large number of jobs in different caller functions, a situation that often occurs in obfuscated binaries where many functions never return. The check is disabled when the number of jobs in the queue exceeds this threshold.

  • start (int) – (Deprecated) The beginning address of CFG recovery.

  • end (int) – (Deprecated) The end address of CFG recovery.

  • arch_options (CFGArchOptions) – Architecture-specific options.

  • extra_arch_options – Any key-value pair in kwargs will be seen as an arch-specific option and will be used to set the option value in self._arch_options.

  • retedges (bool) – Whether to add return edges (from function endpoints to their return sites) in the CFG. Return edges are not added by default because they are often not useful during analysis; you can set retedges to True or call make_return_edges() after CFG recovery to create return edges. Note that this option does not impact function graphs.

  • progress_callback – (Inherited from angr.Analysis.) Callback for CFG recovery progress.

  • show_progressbar (bool) – (Inherited from angr.Analysis.) Show a progressbar during CFG recovery.

  • force_smart_scan (bool | None)

  • drop_bad_funcs (bool)

Returns:

None

property graph: SpillingCFG
property memory_data
property jump_tables
property insn_addr_to_memory_data
do_full_xrefs(overlay_state=None)

Perform xref recovery on all functions.

Parameters:

overlay_state (SimState) – An overlay state for loading constant data.

Returns:

None

drop_bad_functions()
make_return_edges()

For each returning function, create return edges in self.graph.

Returns:

None

copy()
output()
class angr.analyses.CFGFastSoot

Bases: CFGFast

addr_type: Literal['int', 'block_id', 'soot'] = 'soot'
drop_bad_functions()
make_functions()

Revisit the entire control flow graph, create Function instances accordingly, and correctly put blocks into each function.

Although Function objects are created during the CFG recovery, they are neither sound nor accurate. With a pre-constructed CFG, this method rebuilds all functions according to the following rules:

  • A block may only belong to one function.

  • Small functions lying between the start point and the end point of another function will be merged into that function.

  • Tail call optimizations are detected.

  • PLT stubs are aligned by 16.

Returns:

None

class angr.analyses.CalleeCleanupFinder

Bases: Analysis

__init__(starts=None, hook_all=False)
analyze(addr)
class angr.analyses.CallingConventionAnalysis

Bases: Analysis

Analyze the calling convention of a function and guess a probable prototype.

The calling convention of a function can be inferred at both its call sites and the function itself. At call sites, we consider all register and stack variables that are not alive after the function call as parameters to this function. In the function itself, we consider all register and stack variables that are read but without initialization as parameters. Then we synthesize the information from both locations and make a reasonable inference of calling convention of this function.
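
The callee-side rule described above (registers that are read before being initialized are treated as parameters) can be illustrated with a small plain-Python sketch. This is not angr's implementation: the instruction format, a list of hypothetical (reads, writes) pairs for straight-line code, and the function name are assumptions made for illustration only.

```python
def uninitialized_reads(instructions):
    """Return registers that are read before any write, in first-use order.

    `instructions` is a hypothetical straight-line list of (reads, writes)
    pairs; it is not a data structure that angr provides.
    """
    written = set()
    params = []
    for reads, writes in instructions:
        for reg in reads:
            # A read of a register nobody has written yet is a parameter candidate.
            if reg not in written and reg not in params:
                params.append(reg)
        written.update(writes)
    return params


# Toy body: rax = rdi + rsi; rcx = rax; rax = rcx + rdx
body = [
    (("rdi", "rsi"), ("rax",)),
    (("rax",), ("rcx",)),
    (("rcx", "rdx"), ("rax",)),
]
candidate_args = uninitialized_reads(body)
```

Here rax and rcx are excluded because they are written before they are read, while rdi, rsi, and rdx remain as parameter candidates.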

Variables:
  • _function – The function to recover calling convention for.

  • _variable_manager – A handy accessor to the variable manager.

  • _cfg – A reference of the CFGModel of the current binary. It is used to discover call sites of the current function in order to perform analysis at call sites.

  • analyze_callsites – True if we should analyze all call sites of the current function to determine the calling convention and arguments. This can be time-consuming if there are many call sites to analyze.

  • cc (SimCC | None) – The recovered calling convention for the function.

  • _collect_facts – True if we should run FunctionFactCollector to collect input arguments and return value size. False if input arguments and return value size are provided by the user.

__init__(func, cfg=None, analyze_callsites=False, caller_func_addr=None, callsite_block_addr=None, callsite_insn_addr=None, func_graph=None, input_args=None, retval_size=None, extra_pop=None, collect_facts=False, collect_facts_arg_uses=False, collect_facts_arg_passthru=False)
Parameters:
  • func (Function | int | str | None)

  • cfg (CFGModel | None)

  • analyze_callsites (bool)

  • caller_func_addr (int | None)

  • callsite_block_addr (int | None)

  • callsite_insn_addr (int | None)

  • func_graph (DiGraph | None)

  • input_args (list[SimRegArg | SimStackArg] | None)

  • retval_size (int | None)

  • extra_pop (int | None)

  • collect_facts (bool)

  • collect_facts_arg_uses (bool)

  • collect_facts_arg_passthru (bool)

is_va_start_amd64(func)
Return type:

tuple[bool, int | None]

Parameters:

func (Function)

class angr.analyses.ClassIdentifier

Bases: Analysis

This is a class identifier for non-stripped or partially stripped binaries. It identifies classes based on the demangled function names and assigns functions to their respective classes based on their names. It also uses the results from the VtableFinder analysis to assign the corresponding vtable to each class.

self.classes contains a mapping between class names and SimCppClass objects

e.g. A::tool() and A::qux() belong to the class A
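
The name-based grouping can be sketched in plain Python as follows. This is only an illustration of the idea, assuming simple demangled names without templates or nested parentheses; the helper name is hypothetical and the real analysis stores SimCppClass objects rather than lists of method names.

```python
from collections import defaultdict


def group_by_class(demangled_names):
    """Map class names to method names, based only on the '::' qualifier."""
    classes = defaultdict(list)
    for name in demangled_names:
        qualified = name.split("(", 1)[0]  # drop the argument list
        if "::" not in qualified:
            continue  # free function, belongs to no class
        class_name, method = qualified.rsplit("::", 1)
        classes[class_name].append(method)
    return dict(classes)


grouped = group_by_class(["A::tool()", "A::qux()", "B::run(int)", "main"])
```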

__init__()
class angr.analyses.CodeCaveAnalysis

Bases: Analysis

Best-effort static location of potential vacant code caves for possible code injection:

  • Padding functions

  • Unreachable code

__init__()
codecaves: list[CodeCave]
class angr.analyses.CodeTagging

Bases: Analysis

__init__(func)
analyze()
has_xor()

Detects if there is any xor operation in the function.

Returns:

Tags

has_bitshifts()

Detects if there is any bit shift operation in the function.

Returns:

Tags.

has_sql()

Detects if there is any reference to strings that look like SQL queries.

class angr.analyses.CompleteCallingConventionsAnalysis

Bases: Analysis

Implements full-binary calling convention analysis. During the initial analysis of a binary, you may set recover_variables to True so that it will perform variable recovery on each function before performing calling convention analysis.

__init__(mode=CallingConventionAnalysisMode.FASTISH, recover_variables=False, low_priority=False, force=False, cfg=None, analyze_callsites=False, skip_signature_matched_functions=False, max_function_blocks=None, max_function_size=None, workers=0, cc_callback=None, prioritize_func_addrs=None, skip_other_funcs=False, auto_start=True, func_graphs=None, target_functions=None)
Parameters:
  • recover_variables – Recover variables on each function before performing calling convention analysis.

  • low_priority – Run in the background - periodically release GIL.

  • force – Perform calling convention analysis on functions even if they have calling conventions or prototypes already specified (or previously recovered).

  • cfg (CFGFast | CFGModel | None) – The control flow graph model, which will be passed to CallingConventionAnalysis.

  • analyze_callsites (bool) – Consider artifacts at call sites when performing calling convention analysis.

  • skip_signature_matched_functions (bool) – Do not perform calling convention analysis on functions that match against existing FLIRT signatures.

  • max_function_blocks (int | None) – Do not perform calling convention analysis on functions with more than the specified number of blocks. Setting it to None disables this check.

  • max_function_size (int | None) – Do not perform calling convention analysis on functions whose sizes are more than max_function_size. Setting it to None disables this check.

  • workers (int) – Number of multiprocessing workers.

  • mode (CallingConventionAnalysisMode)

  • cc_callback (Callable | None)

  • prioritize_func_addrs (list[int] | set[int] | None)

  • skip_other_funcs (bool)

  • auto_start (bool)

  • func_graphs (dict[int, DiGraph] | None)

  • target_functions (set[int] | None)

work()
prioritize_functions(func_addrs_to_prioritize)

Prioritize the analysis of specified functions.

Parameters:

func_addrs_to_prioritize (Iterable[int]) – A collection of function addresses to analyze first.

static function_needs_variable_recovery(func)

Check if running variable recovery on the function is the only way to determine the calling convention of this function.

We do not need to run variable recovery to determine the calling convention of a function if:

  • The function is a SimProcedure.

  • The function is a PLT stub.

  • The function is a library function and we already know its prototype.

Parameters:

func – The function object.

Returns:

True if we must run VariableRecovery before we can determine what the calling convention of this function is. False otherwise.

Return type:

bool
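
The three exemptions listed above amount to a simple predicate. The sketch below restates them in plain Python; the keyword arguments are hypothetical flags chosen for illustration and do not correspond to the actual signature, which takes a function object.

```python
def needs_variable_recovery(*, is_simprocedure=False, is_plt=False,
                            is_library=False, has_prototype=False):
    """Return True unless one of the documented exemptions applies."""
    if is_simprocedure or is_plt:
        return False
    if is_library and has_prototype:
        return False
    return True


plain_function = needs_variable_recovery()
plt_stub = needs_variable_recovery(is_plt=True)
known_libc = needs_variable_recovery(is_library=True, has_prototype=True)
unknown_lib = needs_variable_recovery(is_library=True, has_prototype=False)
```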

class angr.analyses.CongruencyCheck

Bases: Analysis

This is an analysis to ensure that angr executes things identically with different execution backends (i.e., unicorn vs vex).

__init__(throw=False)

Initializes a CongruencyCheck analysis.

Parameters:

throw – whether to raise an exception if an incongruency is found.

set_state_options(left_add_options=None, left_remove_options=None, right_add_options=None, right_remove_options=None)

Checks that the specified state options result in the same states over the next depth states.

set_states(left_state, right_state)

Checks that the specified paths stay the same over the next depth states.

set_simgr(simgr)
run(depth=None)

Checks that the paths in the specified path group stay the same over the next depth bytes.

The path group should have a “left” and a “right” stash, each with a single path.

compare_path_group(pg)
compare_states(sl, sr)

Compares two states for similarity.

compare_paths(pl, pr)
class angr.analyses.DataDependencyGraphAnalysis

Bases: Analysis

This is a DYNAMIC data dependency graph that utilizes a given SimState to produce a DDG graph that is accurate to the path the program took during execution.

This analysis utilizes the SimActionData objects present in the provided SimState’s action history to generate the dependency graph.

__init__(end_state, start_from=None, end_at=None, block_addrs=None)
Parameters:
  • end_state (SimState) – Simulation state used to extract all SimActionData

  • start_from (int | None) – An address or None. Specifies where to start generation of the DDG.

  • end_at (int | None) – An address or None. Specifies where to end generation of the DDG.

  • block_addrs (list[int] | None) – List of block addresses that the DDG analysis should be run on

property graph: DiGraph | None
property simplified_graph: DiGraph | None
property sub_graph: DiGraph | None
get_data_dep(g_node, include_tmp_nodes, backwards)
Return type:

DiGraph | None

Parameters:
class angr.analyses.Decompiler

Bases: Analysis

The decompiler analysis.

Run this on a Function object for which a normalized CFG has been constructed. The fully processed output can be found in result.codegen.text.

__init__(func, cfg=None, options=None, preset=None, optimization_passes=None, sp_tracker_track_memory=True, variable_kb=None, peephole_optimizations=None, vars_must_struct=None, flavor='pseudocode', expr_comments=None, stmt_comments=None, ite_exprs=None, binop_operators=None, decompile=True, regen_clinic=True, inline_functions=None, desired_variables=None, update_memory_data=True, want_full_graph=False, generate_code=True, use_cache=True, update_cache=True, expr_collapse_depth=16, clinic_graph=None, clinic_arg_vvars=None, clinic_start_stage=None, clinic_end_stage=None, clinic_skip_stages=(), static_vvars=None, static_buffers=None, codegen_cls=<class 'angr.analyses.decompiler.structured_codegen.c.CStructuredCodeGenerator'>)
Parameters:
reflow_variable_types(cache)

Re-run type inference on an existing variable recovery result, then rerun codegen to generate new results.

Returns:

Parameters:

cache (DecompilationCache)

find_data_references_and_update_memory_data(seq_node)
Parameters:

seq_node (SequenceNode)

transform_graph_from_ssa(ail_graph)

Translate an SSA AIL graph out of SSA form. This is useful for producing a non-SSA AIL graph for displaying in angr management.

Parameters:

ail_graph (DiGraph) – The AIL graph to transform out of SSA form.

Return type:

DiGraph

Returns:

The translated AIL graph.

transform_seqnode_from_ssa(seq_node)
Return type:

SequenceNode

Parameters:

seq_node (SequenceNode)

llm_refine()

Use the configured LLM to suggest improved variable names, function names, and variable types. Returns True if any changes were made.

Return type:

bool

llm_suggest_variable_names(llm_client=None, code_text=None, raise_exc=False)

Ask the LLM to suggest better variable names for the decompiled code. Returns True if any variables were renamed.

Parameters:
  • raise_exc (bool) – If True, exceptions from the LLM call are propagated to the caller. If False (default), exceptions are caught and the method returns False.

  • code_text (str | None)

Return type:

bool

llm_suggest_function_name(llm_client=None, code_text=None, raise_exc=False)

Ask the LLM to suggest a better function name. Only suggests rename for auto-generated names (starting with sub_ or fcn.). Returns True if the function was renamed.

Parameters:
  • raise_exc (bool) – If True, exceptions from the LLM call are propagated to the caller.

  • code_text (str | None)

Return type:

bool

llm_suggest_variable_types(llm_client=None, code_text=None, raise_exc=False)

Ask the LLM to suggest better C types for variables. Returns True if any variable types were changed.

Parameters:
  • raise_exc (bool) – If True, exceptions from the LLM call are propagated to the caller.

  • code_text (str | None)

Return type:

bool

llm_summarize_function(llm_client=None, code_text=None, raise_exc=False)

Ask the LLM to produce a natural-language summary of what the decompiled function does. The summary is stored in the DecompilationCache and returned.

Returns the summary string, or None if summarization failed.

Parameters:
  • raise_exc (bool) – If True, exceptions from the LLM call are propagated to the caller.

  • code_text (str | None)

Return type:

str | None

static options_to_params(options)

Convert decompilation options to a dict of params.

Parameters:

options (list[tuple[DecompilationOption, Any]]) – The decompilation options.

Return type:

dict[str, Any]

Returns:

A dict of keyword arguments.

class angr.analyses.Disassembly

Bases: Analysis

Produce formatted machine code disassembly.

__init__(function=None, ranges=None, thumb=False, include_ir=False, block_bytes=None)
Parameters:
func_lookup(block)
parse_block(block)

Parse instructions for a given block node

Return type:

None

Parameters:

block (BlockNode)

render(formatting=None, show_edges=True, show_addresses=True, show_bytes=False, ascii_only=None, color=True, min_edge_depth=0)

Render the disassembly to a string, with optional edges and addresses.

Color will be added by default, if enabled. To disable color pass an empty formatting dict.

Return type:

str

Parameters:
  • show_edges (bool)

  • show_addresses (bool)

  • show_bytes (bool)

  • ascii_only (bool | None)

  • color (bool)

  • min_edge_depth (int)

class angr.analyses.DominanceFrontier

Bases: Generic

Computes the dominance frontier of all nodes in a function graph, and provides an easy-to-use interface for querying the frontier information.

__init__(func, func_graph=None, entry=None, exception_edges=False)
Overloads:
  • self, func (Function), func_graph (networkx.DiGraph[T_co]), entry (T_co), exception_edges (bool)

  • self (DominanceFrontier[CodeNode]), func (Function), func_graph (networkx.DiGraph[CodeNode] | None), entry (CodeNode | None), exception_edges (bool)
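
For readers unfamiliar with the term, the dominance frontier of a node x is the set of nodes y such that x dominates a predecessor of y but does not strictly dominate y. The sketch below computes it on a plain adjacency dict. It is a textbook illustration, not the algorithm this class uses, and it assumes every node appears as a key and is reachable from the entry.

```python
def dominators(graph, entry):
    """Iteratively compute the dominator set of each node."""
    nodes = list(graph)
    preds = {n: [] for n in nodes}
    for src, dsts in graph.items():
        for dst in dsts:
            preds[dst].append(src)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            # A node is dominated by itself plus whatever dominates all its predecessors.
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom, preds


def dominance_frontier(graph, entry):
    dom, preds = dominators(graph, entry)
    frontier = {n: set() for n in graph}
    for y in graph:
        for p in preds[y]:
            for x in dom[p]:
                # x dominates a predecessor of y but does not strictly dominate y.
                if x == y or x not in dom[y]:
                    frontier[x].add(y)
    return frontier


# Diamond-shaped graph: A branches to B and C, which join at D.
diamond = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
df = dominance_frontier(diamond, "A")
```

In the diamond, D is the join point, so it is the frontier of both B and C.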

class angr.analyses.FactCollector

Bases: Analysis

An extremely fast analysis that extracts necessary facts of a function for CallingConventionAnalysis to decide on the calling convention and prototype of a function.

__init__(func, max_depth=100, track_arg_uses=False, track_arg_passthru=False)
Parameters:
class angr.analyses.FastConstantPropagation

Bases: Analysis

An extremely fast constant propagation analysis that finds function-wide constant values with potentially high false negative rates.

__init__(func, blocks=None, vex_cross_insn_opt=False, load_callback=None)
Parameters:
class angr.analyses.FlirtAnalysis

Bases: Analysis

FlirtAnalysis accomplishes two purposes:

  • If a FLIRT signature file is specified, it will match the given signature file against the current binary and rename recognized functions accordingly.

  • If no FLIRT signature file is specified, it will use strings to determine possible libraries embedded in the current binary, and then match all possible signatures for the architecture.

__init__(sig=None, max_mismatched_bytes=0, dry_run=False, match_named_functions=False)
Parameters:
class angr.analyses.ForwardAnalysis

Bases: Generic

This is my very first attempt to build a static forward analysis framework that can serve as the base of multiple static analyses in angr, including CFG analysis, VFG analysis, DDG, etc.

In short, ForwardAnalysis performs a forward data-flow analysis by traversing a graph, computing on abstract values, and storing results in abstract states. The user can specify what graph to traverse, how a graph should be traversed, how abstract values and abstract states are defined, etc.

ForwardAnalysis has a few options to toggle, making it suitable to be the base class of several different styles of forward data-flow analysis implementations.

ForwardAnalysis supports a special mode when no graph is available for traversal (for example, when a CFG is being initialized and constructed, no other graph can be used). In that case, the graph traversal functionality is disabled, and the optimal graph traversal order is not guaranteed. The user can provide a job sorting method to sort the jobs in queue and optimize traversal order.

Feel free to discuss with me (Fish) if you have any suggestions or complaints.
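
The general shape of such a forward data-flow analysis can be sketched with a worklist in plain Python. This is a generic illustration under simplifying assumptions (a fixed graph given as an adjacency dict, user-supplied transfer and merge callables); the function and parameter names are hypothetical and do not mirror the ForwardAnalysis interface.

```python
from collections import deque


def forward_analysis(graph, entry, init_state, transfer, merge):
    """Propagate abstract states forward until no state changes."""
    in_states = {entry: init_state}
    out_states = {}
    worklist = deque([entry])
    while worklist:
        node = worklist.popleft()
        out = transfer(node, in_states[node])
        if out_states.get(node) == out:
            continue  # nothing new to push to successors
        out_states[node] = out
        for succ in graph.get(node, ()):
            merged = merge(in_states[succ], out) if succ in in_states else out
            if in_states.get(succ) != merged:
                in_states[succ] = merged
                worklist.append(succ)
    return out_states


# Example analysis: which variables may have been defined after each node.
graph = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}
defs = {"entry": {"x"}, "loop": {"y"}, "exit": set()}
result = forward_analysis(
    graph,
    "entry",
    frozenset(),
    transfer=lambda node, state: state | defs[node],
    merge=lambda a, b: a | b,
)
```

The merge callable plays the role of job merging at join points; the loop node is revisited once because its own output feeds back into its input.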

__init__(order_jobs=False, allow_merging=False, allow_widening=False, status_callback=None, graph_visitor=None)

Constructor

Parameters:
  • order_jobs (bool) – If all jobs should be ordered or not.

  • allow_merging (bool) – If job merging is allowed.

  • allow_widening (bool) – If job widening is allowed.

  • graph_visitor (Optional[GraphVisitor[TypeVar(NodeType)]]) – A graph visitor to provide successors.

  • status_callback (Callable[[ForwardAnalysis], Any] | None)

Returns:

None

property should_abort

Whether the analysis should be terminated.

Returns:

True/False

property graph: DiGraph
property jobs
abort()

Abort the analysis.

Returns:

None

has_job(job)

Checks whether there exists another job which has the same job key.

Return type:

bool

Returns:

True if there exists another job with the same key, False otherwise.

Parameters:

job (JobType) – The job to check.

downsize()
class angr.analyses.Identifier

Bases: Analysis

__init__(cfg=None, require_predecessors=True, only_find=None)
run(only_find=None)
can_call_same_name(addr, name)
get_func_info(func)
static constrain_all_zero(before_state, state, regs)
identify_func(function)
check_tests(cfg_func, match_func)
map_callsites()
do_trace(addr_trace, reverse_accesses, func_info)
get_call_args(func, callsite)
static get_reg_name(arch, reg_offset)

Tries to find the name of a register given its offset in the register file.

Parameters:
  • arch – The architecture.

  • reg_offset – The offset of the register.

Returns:

The register name

find_stack_vars_x86(func)
static make_initial_state(project, stack_length)
Returns:

An initial state with a symbolic stack and good options for ROP.

static make_symbolic_state(project, reg_list, stack_length=80)

Converts an input state into a state with symbolic registers.

Returns:

The symbolic state.

class angr.analyses.InitializationFinder

Bases: ForwardAnalysis, Analysis

Finds possible initializations for global data sections and generate an overlay to be used in other analyses later on.

class angr.analyses.LanguageDetector

Bases: Analysis

Detect the original programming language and compiler used to build a binary.

Supports detection of C (gcc, clang, msvc), Rust, Go, and Swift through multiple heuristic layers: DWARF debug info, .comment sections, symbol patterns, section names, and linked library names.

Usage:

result = project.analyses.LanguageDetector()
print(result.language)     # "rust"
print(result.compiler)     # "rustc"
print(result.confidence)   # "high"
print(result.evidence)     # ["symbol: __rust_alloc", ...]
__init__()
property language: str
property compiler: str | None
property compiler_version: str | None
property confidence: LanguageDetectionConfidenceLevel
property evidence: list[str]
class angr.analyses.LoopAnalysis

Bases: Analysis

Analyze loop nodes in a structured C code representation and extract relevant information about the loop, including:

  • Loop block addresses

  • Loop exits

  • Loop type

  • Loop condition

  • Max iterations

  • Fixed iterations

__init__(cfunc)
Parameters:

cfunc (CFunction)

class angr.analyses.LoopFinder

Bases: Analysis

Extracts all the loops from all the functions in a binary.

__init__(functions=None, normalize=True)
class angr.analyses.LoopUnroller

Bases: Analysis

Unroll a loop in an AIL graph for a specified number of iterations.

__init__(graph, loop_body, unroll_times, save_original, loop_body_incomplete=False)
Parameters:
  • graph (DiGraph)

  • loop_body (set[tuple[int, int | None]])

  • unroll_times (int)

  • save_original (bool)

  • loop_body_incomplete (bool)

class angr.analyses.PackingDetector

Bases: Analysis

This analysis detects if a binary is likely packed or not. We may extend it to identify which packer is in use in the future.

PACKED_MIN_BYTES = 256
PACKED_ENTROPY_MIN_THRESHOLD = 0.88
__init__(cfg=None, region_size_threshold=32)
Parameters:
analyze()
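
The two class constants suggest an entropy-based heuristic. The sketch below shows one way such a check could work; it is an assumption, since the text above gives only the constant names and values. It assumes the threshold applies to the Shannon entropy of the byte distribution scaled to the range [0, 1] and that regions shorter than PACKED_MIN_BYTES are ignored. The helper names are hypothetical.

```python
import math
from collections import Counter

PACKED_MIN_BYTES = 256
PACKED_ENTROPY_MIN_THRESHOLD = 0.88


def normalized_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, scaled to [0, 1]."""
    if not data:
        return 0.0
    total = len(data)
    bits = -sum((c / total) * math.log2(c / total) for c in Counter(data).values())
    return bits / 8.0  # a byte carries at most 8 bits of entropy


def looks_packed(data: bytes) -> bool:
    # Assumed interpretation of the two constants; see the note above.
    return (
        len(data) >= PACKED_MIN_BYTES
        and normalized_entropy(data) >= PACKED_ENTROPY_MIN_THRESHOLD
    )


uniform = bytes(range(256)) * 4   # every byte value equally frequent
padding = b"\x00" * 1024          # a single repeated byte
```

Compressed or encrypted data tends toward the uniform case, while ordinary code and padding score much lower.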
class angr.analyses.PatchFinderAnalysis

Bases: Analysis

Looks for binary patches using some basic heuristics:

  • Looking for interleaved functions

  • Looking for unaligned functions

__init__()
atypical_alignments: list[AtypicallyAlignedFunction]
possibly_patched_out: list[PatchedOutFunctionality]
class angr.analyses.Pathfinder

Bases: Analysis

__init__(start_state, goal_addr, cfg, cache_size=10000)
Parameters:
cache_state(state)
Parameters:

state (SimState)

marker_to_state(marker)
Return type:

SimState | None

Parameters:

marker (SimStateMarker)

analyze()
Return type:

bool

find_best_hypothesis_path()
Return type:

tuple[int, ...]

diagnose_unsat(state)
Parameters:

state (SimState)

test_path(bbl_addr_trace)
Return type:

TestPathReport

Parameters:

bbl_addr_trace (tuple[int, ...])

class angr.analyses.PropagatorAnalysis

Bases: ForwardAnalysis, Analysis

PropagatorAnalysis implements copy propagation. It propagates values (either constant values or variables) and expressions inside a block or across a function.

PropagatorAnalysis only supports VEX. For AIL, please use SPropagator.

PropagatorAnalysis performs certain arithmetic operations between constants, including but not limited to:

  • addition

  • subtraction

  • multiplication

  • division

  • xor

It also performs the following memory operations:

  • Loading values from a known address

  • Writing values to a stack variable
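
The combination of copy propagation and constant folding can be illustrated on a toy straight-line block. The statement format below, (dst, op, a, b) tuples with integer or variable-name operands, is a hypothetical simplification and has nothing to do with VEX; memory operations and division are left out to keep the sketch short.

```python
import operator

FOLDABLE = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "xor": operator.xor,
}


def propagate(statements):
    """Return a mapping from variables to their known replacements."""
    known = {}
    for dst, op, a, b in statements:
        # Substitute operands that already have a known replacement.
        a = known.get(a, a) if isinstance(a, str) else a
        b = known.get(b, b) if isinstance(b, str) else b
        # dst is overwritten: drop its old value and any copy that referred to it.
        known = {var: val for var, val in known.items() if var != dst and val != dst}
        if op == "copy":
            if a != dst:
                known[dst] = a
        elif isinstance(a, int) and isinstance(b, int):
            known[dst] = FOLDABLE[op](a, b)  # fold constants
    return known


block = [
    ("t0", "copy", 4, None),
    ("t1", "add", "t0", 8),      # folds to 12
    ("t2", "copy", "rdi", None), # t2 can be replaced by rdi
    ("t3", "xor", "t2", "t1"),   # depends on rdi, not a constant
    ("t4", "mul", "t1", 2),      # folds to 24
]
replacements = propagate(block)
```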

property prop_key: tuple[str | None, str, int, bool, bool, bool]

Gets a key that represents the function and the “flavor” of the propagation result.

property replacements
class angr.analyses.ProximityGraphAnalysis

Bases: Analysis

Generate a proximity graph.

__init__(func, cfg_model, xrefs, decompilation=None, expand_funcs=None)
Parameters:
class angr.analyses.ReachingDefinitionsAnalysis

Bases: ForwardAnalysis[ReachingDefinitionsState, NodeType, object, object, object], Analysis

ReachingDefinitionsAnalysis is a text-book implementation of a static data-flow analysis that works on either a function or a block. It supports both VEX and AIL. By registering observers to observation points, users may use this analysis to generate use-def chains, def-use chains, and reaching definitions, and perform other traditional data-flow analyses such as liveness analysis.

  • I’ve always wanted to find a better name for this analysis. Now I gave up and decided to live with this name for the foreseeable future (until a better name is proposed by someone else).

  • Aliasing is definitely a problem, and I forgot how aliasing is resolved in this implementation. I’ll leave this as a post-graduation TODO.

  • Some more documentation and examples would be nice.

__init__(subject, func_graph=None, max_iterations=30, track_tmps=False, track_consts=True, observation_points=None, init_state=None, init_context=None, state_initializer=None, cc=None, function_handler=None, observe_all=False, visited_blocks=None, dep_graph=True, observe_callback=None, canonical_size=8, stack_pointer_tracker=None, use_callee_saved_regs_at_return=True, interfunction_level=0, track_liveness=True, func_addr=None, element_limit=5, merge_into_tops=True)
Parameters:
  • subject (Subject | Block | Block | Function | str) – The subject of the analysis: a function, or a single basic block

  • func_graph – Alternative graph for function.graph.

  • max_iterations – The maximum number of iterations before the analysis is terminated.

  • track_tmps – Whether or not temporary variables should be taken into consideration during the analysis.

  • observation_points (Iterable[tuple[Literal['insn', 'node', 'stmt', 'exit'], int | tuple[int, int] | tuple[int, int, int], ObservationPointType]] | None) – A collection of tuples of ("node"|"insn", ins_addr, OP_TYPE) defining where reaching definitions should be copied and stored. OP_TYPE can be OP_BEFORE or OP_AFTER.

  • init_state (ReachingDefinitionsState | None) – An optional initialization state. The analysis creates and works on a copy. Defaults to None, in which case the analysis initializes its own abstract state based on the given <Subject>.

  • init_context – If init_state is not given, this is used to initialize the context field of the initial state’s CodeLocation. The only default-supported type which may go here is a tuple of integers, i.e. a callstack. Anything else requires a custom FunctionHandler.

  • cc – Calling convention of the function.

  • function_handler (FunctionHandler | None) – The function handler to update the analysis state and results on function calls.

  • observe_all – Observe every statement, both before and after.

  • visited_blocks – A set of previously visited blocks.

  • dep_graph (DepGraph | bool | None) – Set this to True to generate a dependency graph for the subject, or pass an existing DepGraph to add the results of the analysis to it. Set it to None to skip dependency graph generation. The graph will be available as result.dep_graph.

  • canonical_size – The size (in bytes) assumed for objects with an UNKNOWN_SIZE in operations where a size is necessary.

  • interfunction_level (int) – The number of functions we should recurse into. This parameter is only used if function_handler is not provided.

  • track_liveness (bool) – Whether to track liveness information. This can consume sizeable amounts of RAM on large functions. (e.g. ~15GB for a function with 4k nodes)

  • merge_into_tops (bool) – Merge known values into TOP if TOP is present. If True: {TOP} V {0xabc} = {TOP} If False: {TOP} V {0xabc} = {TOP, 0xabc}

  • state_initializer (RDAStateInitializer | None)

  • func_addr (int | None)

  • element_limit (int)
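
The effect of merge_into_tops on a merge of two value sets can be restated as a few lines of plain Python. The TOP marker and the helper below are stand-ins for illustration only, not the objects the analysis uses internally.

```python
TOP = "TOP"  # hypothetical marker for an unknown value


def merge_values(a, b, merge_into_tops=True):
    """Union two value sets, optionally collapsing to {TOP} when TOP is present."""
    merged = set(a) | set(b)
    if merge_into_tops and TOP in merged:
        return {TOP}
    return merged


collapsed = merge_values({TOP}, {0xABC})
kept = merge_values({TOP}, {0xABC}, merge_into_tops=False)
```

With the default setting, collapsed is {TOP}; with merge_into_tops=False, kept is {TOP, 0xabc}, matching the two cases in the parameter description.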

property observed_results: dict[tuple[str, int, int], LiveDefinitions]
property all_definitions
property all_uses
property one_result
property dep_graph: DepGraph
property visited_blocks
get_reaching_definitions_by_insn(ins_addr, op_type)
get_reaching_definitions_by_node(node_addr, op_type)
node_observe(node_addr, state, op_type, node_idx=None)
Parameters:
  • node_addr (int) – Address of the node.

  • state (ReachingDefinitionsState) – The analysis state.

  • op_type (ObservationPointType) – Type of the observation point. Must be one of the following: OP_BEFORE, OP_AFTER.

  • node_idx (int | None) – ID of the node. Used in AIL to differentiate blocks with the same address.

Return type:

None

insn_observe(insn_addr, stmt, block, state, op_type)
Parameters:
Return type:

None

stmt_observe(stmt_idx, stmt, block, state, op_type)
Parameters:
Return type:

None

Returns:

exit_observe(node_addr, exit_stmt_idx, block, state, node_idx=None)
Parameters:
property subject
callsites_to(target)
Return type:

Iterable[FunctionCallRelationships]

Parameters:

target (int | str | Function)

class angr.analyses.Reassembler

Bases: Analysis

High-level representation of a binary with a linear representation of all instructions and data regions. After calling “symbolize”, it essentially acts as a binary reassembler.

Tested on CGC, x86 and x86-64 binaries.

Disclaimer: The reassembler is an empirical solution. Don’t be surprised if it does not work on some binaries.

__init__(syntax='intel', remove_cgc_attachments=True, log_relocations=True)
property instructions

Get a list of all instructions in the binary

Returns:

A list of (address, instruction)

Return type:

tuple

property relocations
property inserted_asm_before_label
property inserted_asm_after_label
property main_executable_regions

property main_nonexecutable_regions

section_alignment(section_name)

Get the alignment for the specific section. If the section is not found, 16 is used as default.

Parameters:

section_name (str) – The section.

Returns:

The alignment in bytes.

Return type:

int

main_executable_regions_contain(addr)
Parameters:

addr

Returns:

main_executable_region_limbos_contain(addr)

Sometimes there exists a pointer that points to a few bytes before the beginning of a section, or a few bytes after the end of the section. We take care of that here.

Parameters:

addr (int) – The address to check.

Returns:

A 2-tuple of (bool, the closest base address)

Return type:

tuple

main_nonexecutable_regions_contain(addr)
Parameters:

addr (int) – The address to check.

Returns:

True if the address is inside a non-executable region, False otherwise.

Return type:

bool

main_nonexecutable_region_limbos_contain(addr, tolerance_before=64, tolerance_after=64)

Sometimes there exists a pointer that points to a few bytes before the beginning of a section, or a few bytes after the end of the section. We take care of that here.

Parameters:

addr (int) – The address to check.

Returns:

A 2-tuple of (bool, the closest base address)

Return type:

tuple

register_instruction_reference(insn_addr, ref_addr, sort, operand_offset)
register_data_reference(data_addr, ref_addr)
add_label(name, addr)

Add a new label to the symbol manager.

Parameters:
  • name (str) – Name of the label.

  • addr (int) – Address of the label.

Returns:

None

insert_asm(addr, asm_code, before_label=False)

Insert some assembly code at the specific address. There must be an instruction starting at that address.

Parameters:
  • addr (int) – Address of insertion

  • asm_code (str) – The assembly code to insert

Returns:

None

append_procedure(name, asm_code)

Add a new procedure with specific name and assembly code.

Parameters:
  • name (str) – The name of the new procedure.

  • asm_code (str) – The assembly code of the procedure

Returns:

None

append_data(name, initial_content, size, readonly=False, sort='unknown')

Append a new data entry to the binary with the specified name, content, and size.

Parameters:
  • name (str) – Name of the data entry. Will be used as the label.

  • initial_content (bytes) – The initial content of the data entry.

  • size (int) – Size of the data entry.

  • readonly (bool) – Whether the data entry belongs to the read-only region.

  • sort (str) – Type of the data.

Returns:

None

remove_instruction(ins_addr)
Parameters:

ins_addr

Returns:

randomize_procedures()
Returns:

symbolize()
assembly(comments=False, symbolized=True)
remove_cgc_attachments()

Remove CGC attachments.

Returns:

True if CGC attachments are found and removed, False otherwise

Return type:

bool

remove_unnecessary_stuff()

Remove unnecessary functions and data

Returns:

None

remove_unnecessary_stuff_glibc()
fast_memory_load(addr, size, data_type, endness='Iend_LE')

Load memory bytes from the loader’s memory backend.

Parameters:
  • addr (int) – The address to begin memory loading.

  • size (int) – Size in bytes.

  • data_type – Type of the data.

  • endness (str) – Endianness of this memory load.

Returns:

Data read out of the memory.

Return type:

int or bytes or str or None
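
How `endness` affects the loaded value can be illustrated with a small stand-in. This is a sketch under assumptions, not angr's code: the `memory` dictionary and the `fast_load` helper are hypothetical, and `data_type` is assumed to be either the Python type `int` or `bytes`.

```python
# Hypothetical backing store: start address -> raw bytes at that address.
memory = {0x1000: bytes([0x78, 0x56, 0x34, 0x12])}


def fast_load(addr, size, data_type, endness="Iend_LE"):
    """Read size bytes at addr and decode them according to data_type."""
    raw = memory.get(addr)
    if raw is None or len(raw) < size:
        return None  # nothing mapped at this address
    raw = raw[:size]
    if data_type is int:
        byteorder = "little" if endness == "Iend_LE" else "big"
        return int.from_bytes(raw, byteorder)
    return raw  # bytes are returned unchanged


print(hex(fast_load(0x1000, 4, int)))
print(hex(fast_load(0x1000, 4, int, endness="Iend_BE")))
```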

class angr.analyses.SLivenessAnalysis

Bases: Analysis

Calculates LiveIn and LiveOut sets for each block in a partial-SSA function.

__init__(func, func_graph, entry=None, func_addr=None, arg_vvars=None)
interference_graph()

Generate an interference graph based on the liveness analysis result.

Returns:

A networkx.Graph instance.

Return type:

networkx.Graph[int]

live_vars_by_stmt()

Get a mapping from statements to live variables at the point of the statement.

Return type:

defaultdict[tuple[int, int | None], dict[int, set[int]]]

Returns:

A dictionary mapping statements to sets of live variable IDs.
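
The LiveIn/LiveOut computation and the derived interference graph can be sketched with the standard library alone. This is a generic textbook formulation under stated assumptions, not the analysis's actual code: the `blocks` and `successors` tables are hypothetical, variables are named with strings instead of virtual variable IDs, and the graph is returned as a set of edges rather than a `networkx.Graph`.

```python
# Hypothetical function: block name -> list of (defined_var, used_vars).
blocks = {
    "entry": [("a", ()), ("b", ())],
    "loop": [("c", ("a", "b")), ("a", ("c",))],
    "exit": [(None, ("a",))],
}
successors = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}


def block_use_def(stmts):
    """Variables read before being written in the block, and variables written."""
    use, defs = set(), set()
    for defined, used in stmts:
        use |= {u for u in used if u not in defs}
        if defined is not None:
            defs.add(defined)
    return use, defs


def liveness(blocks, successors):
    """Iterate the backward data-flow equations until a fixed point is reached."""
    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b, stmts in blocks.items():
            use, defs = block_use_def(stmts)
            out = set().union(*(live_in[s] for s in successors[b]))
            new_in = use | (out - defs)
            if out != live_out[b] or new_in != live_in[b]:
                live_out[b], live_in[b] = out, new_in
                changed = True
    return live_in, live_out


def interference_edges(blocks, live_out):
    """A defined variable interferes with everything live right after its definition."""
    edges = set()
    for b, stmts in blocks.items():
        live = set(live_out[b])
        for defined, used in reversed(stmts):
            if defined is not None:
                edges |= {frozenset((defined, v)) for v in live if v != defined}
                live.discard(defined)
            live |= set(used)
    return edges


live_in, live_out = liveness(blocks, successors)
edges = interference_edges(blocks, live_out)
print(live_in, live_out, edges)
```

In this toy function, `a` and `c` never interfere because `a` is redefined from `c` at the point where `c` dies, so a register allocator or variable merger could assign them the same storage.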

class angr.analyses.SPropagatorAnalysis

Bases: Analysis

Constant and expression propagation that only supports SSA AIL graphs.

__init__(subject, *, ail_manager, func_graph=None, only_consts=True, stack_pointer_tracker=None, func_args=None, func_addr=None, stack_arg_offsets=None)
property replacements
property dead_vvar_ids
static is_global_variable_updated(func_graph, block_dict, varid, gv_addr, gv_size, defloc, useloc)
Return type:

bool

static is_vvar_used_for_addr_loading_switch_case(uselocs, blocks)

Check if a virtual variable is used for loading an address in a switch-case construct.

Parameters:
  • uselocs (set[AILCodeLocation]) – The use locations of the virtual variable.

  • blocks – All blocks of the current function.

Return type:

bool

Returns:

True if the virtual variable is used for loading an address in a switch-case construct, False otherwise.

static replace(replacements, loc, expr, value)
Return type:

None

class angr.analyses.SReachingDefinitionsAnalysis

Bases: Analysis

Reaching definitions analysis that only supports SSA AIL graphs.

__init__(subject, func_addr=None, func_graph=None, func_args=None, use_callee_saved_regs_at_return=False, track_tmps=False)
class angr.analyses.SelfModifyingCodeAnalysis

Bases: Analysis

Determine if some piece of code is self-modifying.

This determination is made by simply executing. If an address is executed that is also written to, the code is determined to be self-modifying. The determination is stored in the result property. The regions property contains a list of (addr, length) regions that were both written to and executed.

__init__(subject, max_bytes=0, state=None)
Parameters:
  • subject (None | int | str | Function) – Subject of analysis

  • max_bytes (int) – Maximum number of bytes from subject address. 0 for no limit (default).

  • state (SimState | None) – State to begin executing from.

regions: list[tuple[int, int]]
result: bool
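
The relationship between the two attributes can be sketched as follows. This is an illustration under assumptions, not the analysis itself: the `overlapping_regions` helper and the hard-coded address lists are hypothetical stand-ins for what an execution trace would record.

```python
def overlapping_regions(written, executed):
    """Return sorted (addr, length) runs of addresses both written and executed."""
    both = sorted(set(written) & set(executed))
    regions = []
    for addr in both:
        if regions and regions[-1][0] + regions[-1][1] == addr:
            # addr extends the previous run by one byte
            start, length = regions[-1]
            regions[-1] = (start, length + 1)
        else:
            regions.append((addr, 1))
    return regions


# Hypothetical trace: bytes written by the program and bytes later executed.
written = range(0x400100, 0x400108)
executed = [0x400000, 0x400001, 0x400104, 0x400105, 0x400106, 0x400200]

regions = overlapping_regions(written, executed)
result = bool(regions)  # self-modifying if any written byte was executed
print(regions, result)
```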
class angr.analyses.SootClassHierarchy

Bases: Analysis

Generate complete hierarchy.

__init__()
init_hierarchy()
has_super_class(cls)
is_subclass_including(cls_child, cls_parent)
is_subclass(cls_child, cls_parent)
is_visible_method(cls, method)
is_visible_class(cls_from, cls_to)
get_super_classes(cls)
get_super_classes_including(cls)
get_implementers(interface)
get_sub_interfaces_including(interface)
get_sub_interfaces(interface)
get_sub_classes(cls)
get_sub_classes_including(cls)
resolve_abstract_dispatch(cls, method)
resolve_concrete_dispatch(cls, method)
resolve_special_dispatch(method, container)
resolve_invoke(invoke_expr, method, container)
class angr.analyses.StackPointerTracker

Bases: Analysis, ForwardAnalysis

Track the offset of stack pointer at the end of each basic block of a function.

offset_after(addr, reg)
offset_before(addr, reg)
offset_after_block(block_addr, reg)
offset_before_block(block_addr, reg)
constant_after(addr, reg)
constant_before(addr, reg)
constant_after_block(block_addr, reg)
constant_before_block(block_addr, reg)
property inconsistent
inconsistent_for(reg)
offsets_for(reg)
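
The idea behind the block-level queries can be sketched with a simple forward propagation. This is a simplified model under assumptions, not the tracker's implementation: `sp_delta`, `successors`, and `track` are hypothetical names, only one register is followed, and each block is summarized by its net stack pointer change.

```python
from collections import deque

# Hypothetical function: block address -> net stack pointer change of that block.
sp_delta = {0x1000: -0x20, 0x1010: -0x8, 0x1020: +0x8, 0x1030: +0x20}
successors = {0x1000: [0x1010, 0x1030], 0x1010: [0x1020], 0x1020: [0x1030]}


def track(entry, sp_delta, successors):
    """Propagate the stack pointer offset from the entry block to all successors."""
    before = {entry: 0}
    inconsistent = False
    queue = deque([entry])
    while queue:
        addr = queue.popleft()
        after = before[addr] + sp_delta[addr]
        for succ in successors.get(addr, []):
            if succ not in before:
                before[succ] = after
                queue.append(succ)
            elif before[succ] != after:
                # Two paths reach succ with different offsets.
                inconsistent = True
    after_block = {a: off + sp_delta[a] for a, off in before.items()}
    return before, after_block, inconsistent


before, after_block, inconsistent = track(0x1000, sp_delta, successors)
print(before, after_block, inconsistent)
```

Here both paths into block `0x1030` agree on the offset, so the result is consistent; changing one delta would flip the flag.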
class angr.analyses.StaticHooker

Bases: Analysis

This analysis works on statically linked binaries - it finds the library functions statically linked into the binary and hooks them with the appropriate simprocedures.

Right now it only works on unstripped binaries, but hey! There’s room to grow!

__init__(library, binary=None)
class angr.analyses.StaticObjectFinder

Bases: Analysis

This analysis tries to find objects on the heap based on calls to new(), and subsequent calls to constructors with the ‘this’ pointer.

__init__()
class angr.analyses.Typehoon

Bases: Analysis

A spiritual tribute to the long-standing typehoon project that @jmg (John Grosen) worked on during his days in the angr team. Now I feel really bad about asking the poor guy to work directly on VEX IR without any of the fancy static analysis support that we have right now…

Typehoon analysis implements a pushdown system that simplifies and solves type constraints. Our type constraints are largely an implementation of the paper Polymorphic Type Inference for Machine Code by Noonan, Loginov, and Cok from GrammaTech (with missing functionality support and bugs, of course). Type constraints are collected by running VariableRecoveryFast (maybe VariableRecovery later as well) on a function, and then solved using this analysis.

Users may specify ground truth, which overrides all types at certain program points during constraint solving.

__init__(constraints, func_var, ground_truth=None, var_mapping=None, must_struct=None, stackvar_max_sizes=None, stack_offset_tvs=None, constraint_set_degradation_threshold=150, type_translator=None, tv_manager=None)
update_variable_types(func_addr, var_to_typevars, stack_offset_tvs=None)
Return type:

None

pp_constraints()

Pretty-print constraints between variables using the variable mapping.

Return type:

None

pp_solution()

Pretty-print solutions using the variable mapping.

Return type:

None

class angr.analyses.VariableRecovery

Bases: ForwardAnalysis, VariableRecoveryBase

Recover “variables” from a function using forced execution.

While variables play a very important role in programming, they do not really exist after compilation. However, we can still identify and recover their counterparts in binaries. It is worth noting that not every variable in source code can be identified in binaries, and not every variable recognized in a binary has a corresponding variable in the original source code. In short, there is no guarantee that the variables we identify in a binary are the same variables as in its source code.

This analysis uses heuristics to identify and recover the following types of variables:

  • Register variables.

  • Stack variables.

  • Heap variables. (not implemented yet)

  • Global variables. (not implemented yet)

This analysis takes a function as input and performs a data-flow analysis on its nodes. It runs concrete execution on every statement and hooks all register and memory accesses to discover every place that accesses a variable. It is slow but produces more accurate results. For a faster but less accurate variable recovery, consider using VariableRecoveryFast.

This analysis follows SSA, which means every write creates a new variable in registers or memory (stack, heap, etc.). Things may get tricky when overlapping variable accesses exist (in memory, as you cannot really have overlapping accesses to registers). In such cases, a new variable is created, and this new variable overlaps with one or more existing variables. A decision procedure (which is pretty much TODO) is required at the end of this analysis to resolve the conflicts between overlapping variables.

__init__(func, max_iterations=20, store_live_variables=False)
Parameters:

func (knowledge.Function) – The function to analyze.
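
The SSA behaviour described above, where each write yields a fresh variable and partially overlapping writes produce conflicts, can be sketched like this. It is a toy model under assumptions, not the analysis's data structures: `StackVar`, `record_writes`, and the sample writes are hypothetical, and only partial overlaps are reported.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackVar:
    name: str
    offset: int  # offset relative to the stack pointer at function entry
    size: int    # size in bytes


def record_writes(writes):
    """Create one variable per write and report partially overlapping pairs."""
    variables, overlaps = [], []
    for offset, size in writes:
        new = StackVar(f"var_{len(variables)}", offset, size)
        for old in variables:
            intersects = old.offset < offset + size and offset < old.offset + old.size
            identical = (old.offset, old.size) == (offset, size)
            if intersects and not identical:
                overlaps.append((old.name, new.name))
        variables.append(new)
    return variables, overlaps


# Two 8-byte writes to the same slot, then a 4-byte write into its upper half.
variables, overlaps = record_writes([(-0x10, 8), (-0x10, 8), (-0xC, 4)])
print(variables)
print(overlaps)
```

The pairs in `overlaps` are what a later decision procedure would have to resolve.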

class angr.analyses.VariableRecoveryFast

Bases: ForwardAnalysis, VariableRecoveryBase

Recover “variables” from a function by keeping track of stack pointer offsets and pattern matching VEX statements.

If calling conventions are recovered prior to running VariableRecoveryFast, variables can be recognized more accurately. However, it is not a requirement. In this case, the function graph you pass must contain information indicating the call-out sites inside the analyzed function. These graph edges must be annotated with either "type": "call" or "outside": True.

class angr.analyses.Veritesting

Bases: Analysis

An exploration technique that condenses chunks of code into single (nested) if-then-else constraints. It uses an accurate CFG to conduct Static Symbolic Execution (SSE), that is, conversion to a single constraint.

cfg_cache = {}
all_stashes = ('successful', 'errored', 'deadended', 'deviated', 'unconstrained')
__init__(input_state, boundaries=None, loop_unrolling_limit=10, enable_function_inlining=False, terminator=None, deviation_filter=None)

SSE stands for Static Symbolic Execution, and we also implemented an extended version of Veritesting (Avgerinos, Thanassis, et al, ICSE 2014).

Parameters:
  • input_state – The initial state to begin the execution with.

  • boundaries – Addresses where execution should stop.

  • loop_unrolling_limit – The maximum number of times that Veritesting should unroll a loop.

  • enable_function_inlining – Whether we should enable function inlining and syscall inlining.

  • terminator – A callback function that takes a state as parameter. Veritesting will terminate if this function returns True.

  • deviation_filter – A callback function that takes a state as parameter. Veritesting will put the state into the “deviated” stash if this function returns True.
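
The condensing of branches into nested if-then-else expressions can be illustrated without a solver. This is a conceptual sketch under assumptions, not how Veritesting is implemented: states are modelled as plain dictionaries, conditions as strings, and `merge_branches` and `evaluate` are hypothetical helpers.

```python
def merge_branches(cond, then_state, else_state):
    """Merge two variable -> expression maps into one, using ITE tuples."""
    merged = {}
    for var in sorted(set(then_state) | set(else_state)):
        t, e = then_state.get(var), else_state.get(var)
        # Only values that differ between the branches need an if-then-else.
        merged[var] = t if t == e else ("ite", cond, t, e)
    return merged


def evaluate(expr, env):
    """Resolve a (possibly nested) ITE expression under concrete conditions."""
    if isinstance(expr, tuple) and expr[0] == "ite":
        _, cond, t, e = expr
        return evaluate(t if env[cond] else e, env)
    return expr


# if c1: x = 1  else: (if c2: x = 2  else: x = 3); y is 0 on every path.
inner = merge_branches("c2", {"x": 2, "y": 0}, {"x": 3, "y": 0})
outer = merge_branches("c1", {"x": 1, "y": 0}, inner)
print(outer)
print(evaluate(outer["x"], {"c1": False, "c2": True}))
```

Three paths collapse into a single state whose `x` carries the path conditions, which is the trade-off the technique makes: fewer states, larger expressions.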

is_not_in_cfg(s)

Returns whether s.addr is not a proper node in our CFG.

Parameters:

s (SimState) – The SimState instance to test.

Returns:

False if our CFG contains s.addr, True otherwise.

Return type:

bool

is_overbound(state)

Filter out all states that run out of boundaries or loop too many times.

Parameters:

state (SimState) – The SimState instance to check.

Returns:

True if the state is outside of the mem/loop_ctr boundary.

Return type:

bool

class angr.analyses.VtableFinder

Bases: Analysis

This analysis locates vtables in a binary based on heuristics taken from “Reconstruction of Class Hierarchies for Decompilation of C++ Programs”.

__init__()
is_cross_referenced(addr)
is_function(addr)
analyze()
create_extract_vtable(start_addr, sec_size)
class angr.analyses.XRefsAnalysis

Bases: ForwardAnalysis, Analysis

XRefsAnalysis recovers in-depth x-refs (cross-references) in disassembly code.

Here is an example:

.text:
000023C8                 LDR     R2, =time_now
000023CA                 LDR     R3, [R2]
000023CC                 ADDS    R3, #1
000023CE                 STR     R3, [R2]
000023D0                 BX      LR

.bss:
1FFF36F4 time_now        % 4

You will have the following x-refs for time_now:

23c8 - offset
23ca - read access
23ce - write access
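
The classification in this example can be reproduced with a toy pass over the listing. This is a simplified sketch under assumptions, not the analysis's algorithm: the instruction tuples, the `collect_xrefs` helper, and the register-tracking rule are hypothetical and cover only the mnemonics shown above.

```python
# The listing above as (address, mnemonic, register, operand) tuples.
instructions = [
    (0x23C8, "LDR", "R2", "=time_now"),
    (0x23CA, "LDR", "R3", "[R2]"),
    (0x23CC, "ADDS", "R3", "#1"),
    (0x23CE, "STR", "R3", "[R2]"),
    (0x23D0, "BX", "LR", None),
]


def collect_xrefs(instructions):
    """Return (instruction address, symbol, kind) for each reference found."""
    holds = {}  # register -> symbol whose address the register currently holds
    xrefs = []
    for addr, mnemonic, reg, operand in instructions:
        if mnemonic == "LDR" and operand.startswith("="):
            # Loading the address of a symbol: an offset reference.
            holds[reg] = operand[1:]
            xrefs.append((addr, operand[1:], "offset"))
        elif mnemonic in ("LDR", "STR") and operand.startswith("["):
            base = operand.strip("[]")
            if base in holds:
                kind = "read" if mnemonic == "LDR" else "write"
                xrefs.append((addr, holds[base], kind))
            if mnemonic == "LDR":
                holds.pop(reg, None)  # destination register is overwritten
        else:
            holds.pop(reg, None)  # conservatively forget the register
    return xrefs


for addr, symbol, kind in collect_xrefs(instructions):
    print(f"{addr:x} - {kind} ({symbol})")
```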
angr.analyses.register_analysis(cls, name)

Submodules