angr.analyses
- class angr.analyses.CDG
Bases: Analysis
Implements a control dependence graph.
- __init__(cfg, start=None, no_construct=False)
Constructor.
- Parameters:
cfg – The control flow graph upon which this control dependence graph will be built.
start – The starting point to begin constructing the control dependence graph
no_construct – Skip the construction step. Only used in unit-testing.
- property graph
- get_post_dominators()
Return the post-dominator tree.
- get_dependants(run)
Return a list of nodes that are control dependent on the given node in the control dependence graph
- get_guardians(run)
Return a list of nodes on which the specific node is control dependent in the control dependence graph.
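Control dependence can be derived from post-dominators. The code below is a minimal sketch of the classic construction by Ferrante, Ottenstein and Warren, run on a toy graph stored as an adjacency dict. It is not angr's implementation, and the helper names and the toy graph are hypothetical. Roughly, a node is control dependent on a branch when taking one of the branch's edges guarantees reaching the node, while the branch as a whole does not.

```python
from collections import defaultdict

def post_dominators(succ, exit_node):
    """Iteratively compute the set of post-dominators of every node."""
    nodes = set(succ) | {s for ss in succ.values() for s in ss}
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            succs = succ.get(n, [])
            # intersection() always returns a new set, so pdom entries are never aliased
            new = set.intersection(*(pdom[s] for s in succs)) if succs else set()
            new.add(n)
            if new != pdom[n]:
                pdom[n], changed = new, True
    return pdom

def immediate_post_dominators(pdom):
    """ipdom(n) is the strict post-dominator whose own set equals pdom(n) - {n}."""
    ipdom = {}
    for n, doms in pdom.items():
        strict = doms - {n}
        for d in strict:
            if pdom[d] == strict:
                ipdom[n] = d
                break
    return ipdom

def control_dependence(succ, exit_node):
    """Map each node to the set of branch nodes it is control dependent on."""
    pdom = post_dominators(succ, exit_node)
    ipdom = immediate_post_dominators(pdom)
    guardians = defaultdict(set)
    for a, succs in succ.items():
        for b in succs:
            if b in pdom[a]:
                continue  # b post-dominates a, so the edge a->b decides nothing
            # Walk up the post-dominator tree from b until reaching ipdom(a).
            runner, stop = b, ipdom.get(a)
            while runner is not None and runner != stop:
                guardians[runner].add(a)
                runner = ipdom.get(runner)
    return guardians

def dependants(guardians, node):
    """Nodes that are control dependent on node (compare get_dependants)."""
    return {n for n, gs in guardians.items() if node in gs}

# An if/else diamond: cond branches to then/else, which join again.
succ = {"entry": ["cond"], "cond": ["then", "else"], "then": ["join"],
        "else": ["join"], "join": ["exit"], "exit": []}
guardians = control_dependence(succ, "exit")
print(sorted(dependants(guardians, "cond")))  # ['else', 'then']
```

Here `guardians[node]` plays the role of get_guardians, and `dependants` plays the role of get_dependants. The join block depends on nothing because every path from the branch reaches it.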
- class angr.analyses.CFG
Bases: CFGFast
tl;dr: CFG is just a wrapper around CFGFast for compatibility reasons. It will be fully replaced by CFGFast in future releases. Feel free to use CFG if you intend to use CFGFast. Please use CFGEmulated if you need the old, slow, dynamically generated version of CFG.
For multiple historical reasons, angr’s CFG was accurate but slow, which is not what most people expect. We developed CFGFast for light-speed CFG recovery and renamed the old CFG class to CFGEmulated. For compatibility, CFG was kept as an alias to CFGEmulated.
However, many new users of angr would load a binary and generate a CFG immediately after running “pip install angr”, and then conclude that “angr’s CFG is so slow - angr must be unusable!” Therefore, we made the hard decision: CFG will be an alias to CFGFast instead of CFGEmulated.
To ease the transition of your existing code and scripts, the following changes are made:
A CFG class, which is a subclass of CFGFast, is created.
You will see both a warning message printed to stderr and an exception raised by angr if you pass CFG any parameter that only CFGEmulated supports. This exception is not a subclass of AngrError, so your old code will not catch it by mistake.
In the near future, this wrapper class will be removed completely, and CFG will be a simple alias to CFGFast.
We expect most interfaces to be the same between CFGFast and CFGEmulated. Naturally, some functionality (like context sensitivity and state keeping) only exists in CFGEmulated; that is when you want to use CFGEmulated instead.
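The compatibility guard described above warns on stderr and then raises an exception that deliberately sits outside the AngrError hierarchy. It can be sketched as follows. Every class here, and the set of "CFGEmulated-only" parameter names, is a hypothetical stand-in rather than angr's real code.

```python
import sys

class AngrError(Exception):
    """Stand-in for angr's base exception class."""

class CFGFast:
    """Stand-in for the fast CFG recovery analysis."""
    def __init__(self, **kwargs):
        self.kwargs = kwargs

# Hypothetical subset of parameters that only CFGEmulated understands.
EMULATED_ONLY = {"context_sensitivity_level", "keep_state", "call_depth"}

class CFGEmulatedOnlyArgument(Exception):
    """Deliberately not derived from AngrError, so `except AngrError` won't hide it."""

class CFG(CFGFast):
    def __init__(self, **kwargs):
        unsupported = sorted(EMULATED_ONLY.intersection(kwargs))
        if unsupported:
            msg = "CFG is now CFGFast; %s only work with CFGEmulated" % ", ".join(unsupported)
            print("WARNING: " + msg, file=sys.stderr)  # the warning on stderr
            raise CFGEmulatedOnlyArgument(msg)         # and the exception
        super().__init__(**kwargs)

try:
    CFG(keep_state=True)
except AngrError:
    print("swallowed")  # never reached: the exception is not an AngrError
except CFGEmulatedOnlyArgument as err:
    print("caught:", err)
```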
- class angr.analyses.DDG
Bases: Analysis
This is a fast data dependence graph generated directly from our CFG analysis result. The only reason for its existence is speed. There is no guarantee that it is sound or accurate. You should use it only when you want to track the simplest data dependences and do not care about soundness or accuracy.
For a better data dependence graph, please consider performing a better static analysis first (like Value-set Analysis), and then construct a dependence graph on top of the analysis result (for example, the VFG in angr).
The DDG is based on a CFG, which should ideally be a CFGEmulated generated with the following options:
keep_state=True to keep all input states
state_add_options=angr.options.refs to store memory, register, and temporary value accesses
You may want to consider a high value for context_sensitivity_level as well when generating the CFG.
Also note that since we are using states from CFG, any improvement in analysis performed on CFG (like a points-to analysis) will directly benefit the DDG.
- __init__(cfg, start=None, call_depth=None, block_addrs=None)
- Parameters:
cfg – Control flow graph. Please make sure each node has an associated state with it, e.g. by passing the keep_state=True and state_add_options=angr.options.refs arguments to CFGEmulated.
start – An address specifying where to start generating this data dependence graph.
call_depth – None or an integer. A non-negative integer specifies how deep we would like to track in the call tree. None disables the call_depth limit.
block_addrs (iterable or None) – A collection of block addresses that the DDG analysis should be performed on.
- property graph
- Returns:
A networkx DiGraph instance representing the dependence relations between statements.
- Return type:
networkx.DiGraph
- property data_graph
Get the data dependence graph.
- Returns:
A networkx DiGraph instance representing data dependence.
- Return type:
networkx.DiGraph
- property simplified_data_graph
- property ast_graph
- pp()
Pretty printing.
- dbg_repr()
Representation for debugging.
- get_predecessors(code_location)
Returns all predecessors of the code location.
- Parameters:
code_location – A CodeLocation instance.
- Returns:
A list of all predecessors.
- function_dependency_graph(func)
Get a dependency graph for the function func.
- Parameters:
func – The Function object in CFG.function_manager.
- Returns:
A networkx.DiGraph instance.
- data_sub_graph(pv, simplified=True, killing_edges=False, excluding_types=None)
Get a subgraph from the data graph or the simplified data graph that starts from node pv.
- Parameters:
pv (ProgramVariable) – The starting point of the subgraph.
simplified (bool) – When True, the simplified data graph is used, otherwise the data graph is used.
killing_edges (bool) – Are killing edges included or not.
excluding_types (iterable) – Excluding edges whose types are among those excluded types.
- Returns:
A subgraph.
- Return type:
networkx.MultiDiGraph
- find_definitions(variable, location=None, simplified_graph=True)
Find all definitions of the given variable.
- Parameters:
variable (SimVariable)
simplified_graph (bool) – True if you just want to search in the simplified graph instead of the normal graph. Usually the simplified graph suffices for finding definitions of register or memory variables.
- Returns:
A collection of all variable definitions to the specific variable.
- find_consumers(var_def, simplified_graph=True)
Find all consumers to the specified variable definition.
- Parameters:
var_def (ProgramVariable) – The variable definition.
simplified_graph (bool) – True if we want to search in the simplified graph, False otherwise.
- Returns:
A collection of all consumers to the specified variable definition.
- find_killers(var_def, simplified_graph=True)
Find all killers to the specified variable definition.
- Parameters:
var_def (ProgramVariable) – The variable definition.
simplified_graph (bool) – True if we want to search in the simplified graph, False otherwise.
- Returns:
A collection of all killers to the specified variable definition.
- find_sources(var_def, simplified_graph=True)
Find all sources to the specified variable definition.
- Parameters:
var_def (ProgramVariable) – The variable definition.
simplified_graph (bool) – True if we want to search in the simplified graph, False otherwise.
- Returns:
A collection of all sources to the specified variable definition.
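The query methods above all read edges around a single definition. A toy model makes the distinctions concrete:

- Consumers are reached by non-killing out-edges.
- Killers are reached by killing out-edges.
- Sources are reached by non-killing in-edges.

This is a minimal sketch in which definitions are plain strings and edge types are "use" or "kill". It does not use angr's ProgramVariable or networkx, and these edge-type semantics are an assumption modelled on the descriptions above.

```python
# Toy DDG: (source definition, target definition) -> edge type.
EDGES = {
    ("r1@0x10", "r2@0x14"): "use",   # r2 is computed from r1
    ("r1@0x10", "r1@0x18"): "kill",  # r1 is overwritten at 0x18
    ("r2@0x14", "mem@0x1c"): "use",  # r2 is stored to memory
}

def find_consumers(var_def):
    return {dst for (src, dst), kind in EDGES.items() if src == var_def and kind != "kill"}

def find_killers(var_def):
    return {dst for (src, dst), kind in EDGES.items() if src == var_def and kind == "kill"}

def find_sources(var_def):
    return {src for (src, dst), kind in EDGES.items() if dst == var_def and kind != "kill"}

def data_sub_graph(pv, killing_edges=False):
    """Collect every edge reachable from pv, optionally skipping killing edges."""
    seen, stack, sub = {pv}, [pv], {}
    while stack:
        node = stack.pop()
        for (src, dst), kind in EDGES.items():
            if src != node or (kind == "kill" and not killing_edges):
                continue
            sub[(src, dst)] = kind
            if dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return sub

print(find_consumers("r1@0x10"), find_killers("r1@0x10"))
```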
- class angr.analyses.VFG
Bases: ForwardAnalysis[SimState, VFGNode, VFGJob, BlockID, SimState], Analysis
This class represents a control-flow graph with static analysis results.
Perform abstract interpretation analysis starting from the given function address. The output is an invariant at the beginning (or the end) of each basic block.
Steps:
Generate a CFG first if one is not provided.
Identify all merge points (denote the set of merge points as Pw) in the CFG.
Cut the loop back edges (which can be derived from Pw) so that we obtain an acyclic CFG.
Identify all variables that come 1) from memory loads, 2) from initial values, or 3) from phi functions. Denote the set of those variables as S_{var}.
Start the actual abstract interpretation (AI) analysis and try to compute a fixpoint at each merge point. Perform widening/narrowing only on variables in S_{var}.
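The widening/narrowing step can be illustrated with intervals at a single merge point, the head of the loop `i = 0; while i < 100: i += 1`. This is a minimal textbook sketch, not VFG's abstract domain. The name `max_iterations_before_widening` only mirrors the VFG parameter, and the loop body is hard-coded as an assumption.

```python
import math

def join(a, b):
    return (min(a[0], b[0]), max(a[1], b[1]))

def widen(old, new):
    # Any bound that is still growing jumps straight to infinity.
    lo = old[0] if new[0] >= old[0] else -math.inf
    hi = old[1] if new[1] <= old[1] else math.inf
    return (lo, hi)

def narrow(old, new):
    # Only infinite bounds introduced by widening are refined.
    lo = new[0] if old[0] == -math.inf else old[0]
    hi = new[1] if old[1] == math.inf else old[1]
    return (lo, hi)

def loop_body(head):
    # Models `while i < 100: i += 1`: filter by the guard, then increment.
    lo, hi = head[0], min(head[1], 99)
    return (lo + 1, hi + 1)

def loop_head_invariant(init=(0, 0), max_iterations_before_widening=3, max_iterations=40):
    head = init
    for iteration in range(1, max_iterations + 1):
        new = join(init, loop_body(head))      # merge the entry edge and the back edge
        if new[0] >= head[0] and new[1] <= head[1]:
            break                              # post-fixpoint: new is contained in head
        if iteration < max_iterations_before_widening:
            head = new
        else:
            head = widen(head, new)
    # One narrowing step recovers the precision lost by widening.
    return narrow(head, join(init, loop_body(head)))

print(loop_head_invariant())  # (0, 100)
```

Without widening, the interval would grow by one per iteration for about a hundred rounds. With widening it jumps to `(0, inf)` after three rounds, and narrowing then restores the upper bound implied by the guard.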
- __init__(cfg=None, context_sensitivity_level=2, start=None, function_start=None, interfunction_level=0, initial_state=None, avoid_runs=None, remove_options=None, timeout=None, max_iterations_before_widening=8, max_iterations=40, widening_interval=3, final_state_callback=None, status_callback=None, record_function_final_states=False)
- Parameters:
cfg (CFGEmulated | None) – The control-flow graph to base this analysis on. If none is provided, we will construct a CFGEmulated.
context_sensitivity_level (int) – The level of context-sensitivity of this VFG. It ranges from 0 to infinity. Default 2.
function_start (int | None) – The address of the function to analyze.
interfunction_level (int) – The level of interfunction-ness.
initial_state (SimState | None) – A state to use as the initial one.
remove_options (set[str] | None) – State options to remove from the initial state. It only works when initial_state is None.
final_state_callback (Callable[[SimState, CallStack], Any] | None) – Callback function invoked when encountering final states.
status_callback (Callable[[VFG], Any] | None) – Callback function used in _analysis_core_baremetal.
start (int | None)
max_iterations_before_widening (int)
max_iterations (int)
widening_interval (int)
record_function_final_states (bool)
- Return type:
None
- property function_initial_states
- property function_final_states
- get_any_node(addr)
Get any VFG node corresponding to the basic block at @addr. Note that depending on the context sensitivity level, there might be multiple nodes corresponding to different contexts. This function will return the first one it encounters, which might not be what you want.
- irsb_from_node(node)
- copy()
- class angr.analyses.VSA_DDG
Bases: Analysis
A data dependency graph based on VSA states. That means we don’t (and shouldn’t) expect any symbolic expressions.
- __init__(vfg=None, start_addr=None, interfunction_level=0, context_sensitivity_level=2, keep_data=False)
Constructor.
- Parameters:
vfg – An already constructed VFG. If not specified, a new VFG will be created with other specified parameters. vfg and start_addr cannot both be unspecified.
start_addr – The address where to start the analysis (typically, a function’s entry point).
interfunction_level – See VFG analysis.
context_sensitivity_level – See VFG analysis.
keep_data – Whether to keep the sets of addresses as edge data in the graph, or just the cardinality of each set, which can be used as a “weight”.
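The keep_data switch trades memory for detail. In the minimal sketch below, edges live in a plain dict rather than angr's graph, and `add_dependency` is a hypothetical helper. With keep_data, an edge keeps the set of addresses through which the dependence flows; otherwise it keeps only the size of that set.

```python
def add_dependency(graph, src, dst, addrs, keep_data=False):
    """Record that dst depends on src through the memory addresses in addrs."""
    addrs = set(addrs)
    # keep_data=True keeps the address set itself; otherwise only its size is kept as a weight.
    graph[(src, dst)] = addrs if keep_data else len(addrs)
    return graph

detailed = add_dependency({}, "store@0x8048010", "load@0x8048020", [0x1000, 0x1004], keep_data=True)
weighted = add_dependency({}, "store@0x8048010", "load@0x8048020", [0x1000, 0x1004, 0x1000])
print(weighted)  # {('store@0x8048010', 'load@0x8048020'): 2}
```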
- get_predecessors(code_location)
Returns all predecessors of code_location.
- Parameters:
code_location – A CodeLocation instance.
- Returns:
A list of all predecessors.
- get_all_nodes(simrun_addr, stmt_idx)
Get all DDG nodes matching the given basic block address and statement index.
- class angr.analyses.AnalysesHub
Bases: PluginVendor[Any]
This class contains functions for all the registered and runnable analyses.
- __init__(project)
- class angr.analyses.Analysis
Bases: object
This class represents an analysis on the program.
- Variables:
project – The project for this analysis.
kb (KnowledgeBase) – The knowledgebase object.
_progress_callback – A callback function for receiving the progress of this analysis. It only takes one argument, which is a float number from 0.0 to 100.0 indicating the current progress.
_show_progressbar (bool) – Whether a progress bar should be shown during the analysis. It is independent of _progress_callback.
_progressbar (progress.Progress) – The progress bar object.
- project: Project
- kb: KnowledgeBase
- errors: list[AnalysisLogEntry] = []
- named_errors: defaultdict[str, list[AnalysisLogEntry]] = {}
- log: list
- property ram_usage: float
Return the current RAM usage of the Python process, in bytes. The value is updated at most once per second.
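Two behaviours are described above: the progress contract (a single float from 0.0 to 100.0) and the at-most-once-per-second RAM refresh. Both can be sketched with a hypothetical analysis class. The RAM reader and the clock are injectable stand-ins so that the caching logic is visible; angr's real way of measuring process memory is not reproduced here.

```python
import time

class ToyAnalysis:
    def __init__(self, items, progress_callback=None,
                 read_ram=lambda: 0.0, clock=time.monotonic):
        self._items = items
        self._progress_callback = progress_callback
        self._read_ram = read_ram  # hypothetical stand-in returning RSS in bytes
        self._clock = clock
        self._ram_cache = None
        self._ram_time = None

    def run(self):
        total = len(self._items)
        for i, _item in enumerate(self._items, start=1):
            # ... the actual per-item analysis would happen here ...
            if self._progress_callback is not None:
                self._progress_callback(i * 100.0 / total)  # 0.0 to 100.0

    @property
    def ram_usage(self):
        now = self._clock()
        # Refresh the cached value at most once per second.
        if self._ram_time is None or now - self._ram_time >= 1.0:
            self._ram_cache = self._read_ram()
            self._ram_time = now
        return self._ram_cache

progress = []
ToyAnalysis([1, 2, 3, 4], progress_callback=progress.append).run()
print(progress)  # [25.0, 50.0, 75.0, 100.0]
```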
- class angr.analyses.BackwardSlice
Bases: Analysis
Represents a backward slice of the program.
- __init__(cfg, cdg, ddg, targets=None, cfg_node=None, stmt_id=None, control_flow_slice=False, same_function=False, no_construct=False)
Create a backward slice from a specific statement based on provided control flow graph (CFG), control dependence graph (CDG), and data dependence graph (DDG).
The data dependence graph can be either CFG-based or value-set analysis based. A CFG-based DDG is much faster to generate, but it only reflects the states observed while generating the CFG, and it is neither sound nor accurate. The VSA-based DDG (called VSA_DDG) is based on static analysis, which gives you a much better result.
- Parameters:
cfg – The control flow graph.
cdg – The control dependence graph.
ddg – The data dependence graph.
targets – A list of targets of the backward slice. Each target can be a tuple in the form of (cfg_node, stmt_idx), or a CodeLocation instance.
cfg_node – Deprecated. The target CFGNode to reach. It should exist in the CFG.
stmt_id – Deprecated. The target statement to reach.
control_flow_slice – True/False, indicates whether we should slice only based on CFG. Sometimes when acquiring DDG is difficult or impossible, you can just create a slice on your CFG. Well, if you don’t even have a CFG, then…
no_construct – Only used for testing and debugging to easily create a BackwardSlice object.
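Conceptually, a backward slice is the set of statements from which the target is reachable when control- and data-dependence edges are followed backwards. The minimal sketch below works over toy (block, stmt_idx) locations. It assumes both dependence graphs are given as dicts from a location to the locations it depends on. With control_flow_slice=True, only control dependences are followed, loosely mirroring the parameter above.

```python
def backward_slice(target, control_deps, data_deps, control_flow_slice=False):
    """Return every location the target (transitively) depends on, including itself."""
    sliced, worklist = {target}, [target]
    while worklist:
        loc = worklist.pop()
        preds = set(control_deps.get(loc, ()))
        if not control_flow_slice:
            preds |= set(data_deps.get(loc, ()))
        for pred in preds - sliced:
            sliced.add(pred)
            worklist.append(pred)
    return sliced

# Toy dependences between (block, stmt_idx) locations.
control_deps = {("B2", 0): [("B1", 3)], ("B3", 1): [("B1", 3)]}
data_deps = {("B2", 0): [("B0", 1)], ("B0", 1): [("B0", 0)], ("B3", 1): [("B0", 2)]}
print(sorted(backward_slice(("B2", 0), control_deps, data_deps)))
```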
- dbg_repr(max_display=10)
Debugging output of this slice.
- Parameters:
max_display – The maximum number of SimRun slices to show.
- Returns:
A string representation.
- dbg_repr_run(run_addr)
Debugging output of a single SimRun slice.
- Parameters:
run_addr – Address of the SimRun.
- Returns:
A string representation.
- annotated_cfg(start_point=None)
Returns an AnnotatedCFG based on slicing result.
- is_taint_related_to_ip(simrun_addr, stmt_idx, taint_type, simrun_whitelist=None)
Query in taint graph to check if a specific taint will taint the IP in the future or not. The taint is specified with the tuple (simrun_addr, stmt_idx, taint_type).
- Parameters:
simrun_addr – Address of the SimRun.
stmt_idx – Statement ID.
taint_type – Type of the taint, might be one of the following: ‘reg’, ‘tmp’, ‘mem’.
simrun_whitelist – A list of SimRun addresses that are whitelisted, i.e. the tainted exit will be ignored if it is in those SimRuns.
- Returns:
True/False
- is_taint_impacting_stack_pointers(simrun_addr, stmt_idx, taint_type, simrun_whitelist=None)
Query in taint graph to check if a specific taint will taint the stack pointer in the future or not. The taint is specified with the tuple (simrun_addr, stmt_idx, taint_type).
- Parameters:
simrun_addr – Address of the SimRun.
stmt_idx – Statement ID.
taint_type – Type of the taint, might be one of the following: ‘reg’, ‘tmp’, ‘mem’.
simrun_whitelist – A list of SimRun addresses that are whitelisted.
- Returns:
True/False.
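Both taint queries amount to forward reachability in a taint graph, starting from the node (simrun_addr, stmt_idx, taint_type) and ending at a sink such as the instruction pointer or the stack pointer. The minimal sketch below assumes a dict-based taint graph and a hypothetical sink marker. An edge into the sink is ignored when it leaves a whitelisted SimRun, which approximates the whitelist semantics described above.

```python
def taint_reaches(graph, start, sink, simrun_whitelist=()):
    """True if taint at start can flow to sink; nodes are (simrun_addr, stmt_idx, taint_type)."""
    whitelist = set(simrun_whitelist)
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for succ in graph.get(node, ()):
            if succ == sink:
                # Ignore the tainted exit if it leaves a whitelisted SimRun.
                if node[0] not in whitelist:
                    return True
                continue
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return False

IP = "ip"  # hypothetical sink marker for the instruction pointer
graph = {
    (0x400, 2, "tmp"): [(0x400, 5, "reg")],
    (0x400, 5, "reg"): [(0x410, 1, "mem")],
    (0x410, 1, "mem"): [IP],
}
print(taint_reaches(graph, (0x400, 2, "tmp"), IP))  # True
```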
- class angr.analyses.BinDiff
Bases: Analysis
This class computes a diff between two binaries represented by angr Projects.
- __init__(other_project, cfg_a=None, cfg_b=None)
- Parameters:
other_project – The second project to diff
- functions_probably_identical(func_a_addr, func_b_addr, check_consts=False)
Compare two functions and return True if they appear identical.
- Parameters:
func_a_addr – The address of the first function (in the first binary).
func_b_addr – The address of the second function (in the second binary).
- Returns:
Whether or not the functions appear to be identical.
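One way to picture "probably identical" is to compare functions block by block after abstracting constants away. This minimal sketch assumes each function is a list of blocks and each block a list of (mnemonic, operands) tuples. It is not BinDiff's actual matching algorithm. Treating check_consts as "also require equal constants" is an assumption based only on the parameter name.

```python
def normalize(block, check_consts):
    """Replace integer operands with a placeholder unless constants must match."""
    out = []
    for mnemonic, operands in block:
        ops = tuple(op if check_consts or not isinstance(op, int) else "CONST"
                    for op in operands)
        out.append((mnemonic, ops))
    return tuple(out)

def functions_probably_identical(func_a, func_b, check_consts=False):
    if len(func_a) != len(func_b):
        return False
    return all(normalize(a, check_consts) == normalize(b, check_consts)
               for a, b in zip(func_a, func_b))

f1 = [[("mov", ("eax", 1)), ("ret", ())]]
f2 = [[("mov", ("eax", 2)), ("ret", ())]]
print(functions_probably_identical(f1, f2))                     # True
print(functions_probably_identical(f1, f2, check_consts=True))  # False
```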
- property identical_functions
- Returns:
A list of function matches that appear to be identical.
- property differing_functions
- Returns:
A list of function matches that appear to differ.
- differing_functions_with_consts()
- Returns:
A list of function matches that appear to differ, including those that differ only by constants.
- property differing_blocks
- Returns:
A list of block matches that appear to differ.
- property identical_blocks
- Returns:
A list of all block matches that appear to be identical.
- property blocks_with_differing_constants
- Returns:
A dict mapping block matches with differing constants to the tuples of those constants.
- property unmatched_functions
- get_function_diff(function_addr_a, function_addr_b)
- Parameters:
function_addr_a – The address of the first function (in the first binary)
function_addr_b – The address of the second function (in the second binary)
- Returns:
the FunctionDiff of the two functions
- class angr.analyses.BinaryOptimizer
Bases: Analysis
This is a collection of binary optimization techniques we used in Mechanical Phish during the finals of the Cyber Grand Challenge. It focuses on dealing with some serious speed-impacting code constructs, and it sort of worked on some CGC binaries compiled with -O0. Use this analysis as a reference for how to use the data dependency graph and similar analyses.
There is no guarantee that BinaryOptimizer will ever work on non-CGC binaries. Feel free to send us a PR or MR, but please do not ask for support of non-CGC binaries.
- BLOCKS_THRESHOLD = 500
- __init__(cfg, techniques)
- optimize()
- class angr.analyses.BoyScout
Bases: Analysis
Try to determine the architecture and endianness of a binary blob.
- __init__(cookiesize=1)
- class angr.analyses.CFGArchOptions
Bases: object
Stores architecture-specific options and settings, as well as detailed explanations of those options and settings.
Suppose ao is the CFGArchOptions object and there is an option called ret_jumpkind_heuristics. You can access it via ao.ret_jumpkind_heuristics and set its value via ao.ret_jumpkind_heuristics = True.
- Variables:
OPTIONS (dict) – A dict of all default options for different architectures.
arch (archinfo.Arch) – The architecture object.
_options (dict) – Values of all CFG options that are specific to the current architecture.
- OPTIONS = {'ARMCortexM': {'has_arm_code': (<class 'bool'>, False), 'pattern_match_ifuncs': (<class 'bool'>, True), 'ret_jumpkind_heuristics': (<class 'bool'>, True), 'switch_mode_on_nodecode': (<class 'bool'>, False)}, 'ARMEL': {'has_arm_code': (<class 'bool'>, True), 'pattern_match_ifuncs': (<class 'bool'>, True), 'ret_jumpkind_heuristics': (<class 'bool'>, True), 'switch_mode_on_nodecode': (<class 'bool'>, True)}, 'ARMHF': {'has_arm_code': (<class 'bool'>, True), 'pattern_match_ifuncs': (<class 'bool'>, True), 'ret_jumpkind_heuristics': (<class 'bool'>, True), 'switch_mode_on_nodecode': (<class 'bool'>, True)}}
- __init__(arch, **options)
Constructor.
- Parameters:
arch (archinfo.Arch) – The architecture instance.
options (dict) – Architecture-specific options, which will be used to initialize this object.
- arch = None
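The attribute-style access described above can be sketched with `__getattr__`/`__setattr__` over a per-architecture table shaped like OPTIONS, where each entry is (type, default). This is a minimal sketch with several assumptions: the class name `ToyArchOptions`, passing the architecture as a plain name string, and the type check on assignment.

```python
OPTIONS = {
    "ARMEL": {"has_arm_code": (bool, True), "ret_jumpkind_heuristics": (bool, True)},
}

class ToyArchOptions:
    def __init__(self, arch_name, **options):
        defaults = OPTIONS.get(arch_name, {})
        # Bypass our own __setattr__ while creating the internal tables.
        object.__setattr__(self, "_types", {k: t for k, (t, _) in defaults.items()})
        object.__setattr__(self, "_options", {k: d for k, (_, d) in defaults.items()})
        for key, value in options.items():
            setattr(self, key, value)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, i.e. for option names.
        options = object.__getattribute__(self, "_options")
        if name in options:
            return options[name]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        if name not in self._types:
            raise AttributeError("unknown option %r" % name)
        if not isinstance(value, self._types[name]):
            raise TypeError("%s expects %s" % (name, self._types[name].__name__))
        self._options[name] = value

ao = ToyArchOptions("ARMEL", has_arm_code=False)
ao.ret_jumpkind_heuristics = False
print(ao.has_arm_code, ao.ret_jumpkind_heuristics)  # False False
```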
- class angr.analyses.CFGEmulated
Bases: ForwardAnalysis, CFGBase
This class represents a control-flow graph.
- __init__(context_sensitivity_level=1, start=None, avoid_runs=None, enable_function_hints=False, call_depth=None, call_tracing_filter=None, initial_state=None, starts=None, keep_state=False, indirect_jump_target_limit=100000, resolve_indirect_jumps=True, enable_advanced_backward_slicing=False, enable_symbolic_back_traversal=False, indirect_jump_resolvers=None, additional_edges=None, no_construct=False, normalize=False, max_iterations=1, address_whitelist=None, base_graph=None, iropt_level=None, max_steps=None, state_add_options=None, state_remove_options=None, model=None)
All parameters are optional.
- Parameters:
context_sensitivity_level – The level of context-sensitivity of this CFG (see documentation for further details). It ranges from 0 to infinity. Default 1.
avoid_runs – A list of runs to avoid.
enable_function_hints – Whether to use function hints (constants that might be used as exit targets) or not.
call_depth – How deep in the call stack to trace.
call_tracing_filter – Filter to apply on a given path and jumpkind to determine if it should be skipped when call_depth is reached.
initial_state – An initial state to use to begin analysis.
starts (iterable) – A collection of starting points to begin analysis. It can contain the following three different types of entries: an address specified as an integer, a 2-tuple that includes an integer address and a jumpkind, or a SimState instance. Unsupported entries in starts will lead to an AngrCFGError being raised.
keep_state – Whether to keep the SimStates for each CFGNode.
resolve_indirect_jumps – Whether to enable the indirect jump resolvers for resolving indirect jumps
enable_advanced_backward_slicing – Whether to enable an intensive technique for resolving indirect jumps
enable_symbolic_back_traversal – Whether to enable an intensive technique for resolving indirect jumps
indirect_jump_resolvers (list) – A custom list of indirect jump resolvers. If this list is None or empty, default indirect jump resolvers specific to this architecture and binary types will be loaded.
additional_edges – A dict mapping addresses of basic blocks to addresses of successors to manually include and analyze forward from.
no_construct (bool) – Skip the construction procedure. Only used in unit-testing.
normalize (bool) – If the CFG as well as all Function graphs should be normalized or not.
max_iterations (int) – The maximum number of iterations that each basic block should be “executed”. 1 by default. Larger numbers of iterations are usually required for complex analyses like loop analysis.
address_whitelist (iterable) – A list of allowed addresses. Any basic blocks outside of this collection of addresses will be ignored.
base_graph (networkx.DiGraph) – A basic control flow graph to follow. Each node inside this graph must have the following properties: addr and size. CFG recovery will strictly follow nodes and edges shown in the graph, and discard any control flow that does not follow an existing edge in the base graph. For example, you can pass in a Function local transition graph as the base graph, and CFGEmulated will traverse nodes and edges and extract useful information.
iropt_level (int) – The optimization level of VEX IR (0, 1, 2). The default level will be used if iropt_level is None.
max_steps (int) – The maximum number of basic blocks to recover for the longest path from each start before pausing the recovery procedure.
state_add_options – State options that will be added to the initial state.
state_remove_options – State options that will be removed from the initial state.
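context_sensitivity_level bounds how much of the call stack is used to tell apart two visits to the same block. A common way to realize this is the k-limited call string, sketched below. It illustrates the general idea under the assumption that the context is the last k call sites; it is not a transcription of CFGEmulated's internals, and `node_key` is a hypothetical helper.

```python
def node_key(block_addr, call_stack, context_sensitivity_level):
    """Identify a CFG node by its block address plus the k most recent call sites."""
    k = context_sensitivity_level
    # call_stack[-0:] would be the whole stack, so level 0 is handled explicitly.
    context = tuple(call_stack[-k:]) if k > 0 else ()
    return (block_addr, context)

# The same callee block reached from two different call sites.
stacks = [[0x400100, 0x400200], [0x400100, 0x400300]]
for level in (0, 1, 2):
    keys = {node_key(0x500000, s, level) for s in stacks}
    print(level, len(keys))  # level 0 -> 1 node; levels 1 and 2 -> 2 nodes
```

Higher levels keep more nodes apart, and therefore more states, which is the precision/cost trade-off the parameter controls.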
- copy()
Make a copy of the CFG.
- Returns:
A copy of the CFG instance.
- resume(starts=None, max_steps=None)
Resume a paused or terminated control flow graph recovery.
- Parameters:
starts (iterable) – A collection of new starts to resume from. If starts is None, we will resume CFG recovery from where it was paused before.
max_steps (int) – The maximum number of blocks on the longest path starting from each start before pausing the recovery.
- Returns:
None
- remove_cycles()
Force the graph to become acyclic by removing all loop back edges and the edges between overlapped loop headers and their successors.
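Back-edge removal can be sketched with a depth-first search: an edge that points to a node still on the DFS stack closes a cycle. This minimal sketch works on an adjacency dict. It does not attempt the extra handling of overlapped loop headers mentioned above.

```python
def remove_back_edges(succ, entry):
    """Return a copy of succ without the back edges found by a DFS from entry."""
    acyclic = {n: list(ss) for n, ss in succ.items()}
    on_stack, visited = set(), set()

    def dfs(node):
        visited.add(node)
        on_stack.add(node)
        for nxt in succ.get(node, ()):
            if nxt in on_stack:
                acyclic[node].remove(nxt)  # back edge: node -> one of its DFS ancestors
            elif nxt not in visited:
                dfs(nxt)
        on_stack.discard(node)

    dfs(entry)
    return acyclic

loop = {"A": ["B"], "B": ["C", "D"], "C": ["B"], "D": []}
print(remove_back_edges(loop, "A"))  # {'A': ['B'], 'B': ['C', 'D'], 'C': [], 'D': []}
```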
- downsize()
Remove saved states from all CFGNodes to reduce memory usage.
- Returns:
None
- unroll_loops(max_loop_unrolling_times)
Unroll loops for each function. The resulting CFG may still contain loops due to recursion, function calls, etc.
- Parameters:
max_loop_unrolling_times (int) – The maximum iterations of unrolling.
- Returns:
None
- force_unroll_loops(max_loop_unrolling_times)
Unroll loops globally. The resulting CFG does not contain any loop, but this method is slow on large graphs.
- Parameters:
max_loop_unrolling_times (int) – The maximum iterations of unrolling.
- Returns:
None
- immediate_dominators(start, target_graph=None)
Get all immediate dominators of the subgraph from the given node upwards.
- immediate_postdominators(end, target_graph=None)
Get all immediate postdominators of the subgraph from the given node upwards.
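Immediate dominators can be computed with the iterative algorithm of Cooper, Harvey and Kennedy. Below is a self-contained sketch on an adjacency dict. It is the textbook algorithm rather than angr's own code, and the target_graph parameter is not modelled. Running the same function on the reversed graph, starting from the exit node, yields immediate post-dominators.

```python
def immediate_dominators(succ, start):
    """Cooper-Harvey-Kennedy iterative dominator algorithm."""
    order, seen = [], set()

    def dfs(node):
        seen.add(node)
        for nxt in succ.get(node, ()):
            if nxt not in seen:
                dfs(nxt)
        order.append(node)  # postorder

    dfs(start)
    rpo = order[::-1]
    index = {node: i for i, node in enumerate(rpo)}  # reverse-postorder numbers
    preds = {node: [] for node in rpo}
    for node in rpo:
        for nxt in succ.get(node, ()):
            preds[nxt].append(node)

    idom = {start: start}

    def intersect(a, b):
        # Walk both fingers up the current dominator tree until they meet.
        while a != b:
            while index[a] > index[b]:
                a = idom[a]
            while index[b] > index[a]:
                b = idom[b]
        return a

    changed = True
    while changed:
        changed = False
        for node in rpo[1:]:
            processed = [p for p in preds[node] if p in idom]
            new_idom = processed[0]
            for p in processed[1:]:
                new_idom = intersect(p, new_idom)
            if idom.get(node) != new_idom:
                idom[node] = new_idom
                changed = True
    return idom

cfg = {"entry": ["a"], "a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["a", "exit"], "exit": []}
print(immediate_dominators(cfg, "entry"))
```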
- remove_fakerets()
Get rid of fake returns (i.e., Ijk_FakeRet edges) from this CFG
- Returns:
None
- get_topological_order(cfg_node)
Get the topological order of a CFG Node.
- Parameters:
cfg_node – A CFGNode instance.
- Returns:
An integer representing its order, or None if the CFGNode does not exist in the graph.
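A topological order can be assigned with Kahn's algorithm once the graph is acyclic. This minimal sketch assumes an acyclic adjacency dict. Like the method above, the lookup returns None for a node that is not in the graph. Ties are broken by insertion order, which is an arbitrary choice.

```python
from collections import deque

def topological_order(succ):
    """Map each node of an acyclic graph to its position in a topological order."""
    nodes = dict.fromkeys(succ)          # keeps insertion order
    for ss in succ.values():
        for s in ss:
            nodes.setdefault(s)          # include nodes that only appear as successors
    indegree = {n: 0 for n in nodes}
    for ss in succ.values():
        for s in ss:
            indegree[s] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = {}
    while queue:
        node = queue.popleft()
        order[node] = len(order)
        for s in succ.get(node, ()):
            indegree[s] -= 1
            if indegree[s] == 0:
                queue.append(s)
    return order

def get_topological_order(order, node):
    return order.get(node)  # None when the node does not exist in the graph

order = topological_order({"A": ["B", "C"], "B": ["D"], "C": ["D"]})
print(order)  # {'A': 0, 'B': 1, 'C': 2, 'D': 3}
```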
- get_subgraph(starting_node, block_addresses)
Get a sub-graph from a set of basic block addresses.
- Parameters:
starting_node (CFGNode) – The beginning of the subgraph
block_addresses (iterable) – A collection of block addresses that should be included in the subgraph if there is a path between starting_node and a CFGNode with the specified address, and all nodes on the path should also be included in the subgraph.
- Returns:
A new CFG that only contains the specified subgraph.
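The inclusion rule above keeps a node only if it lies on a path from starting_node to a block whose address was requested. That set is the intersection of forward reachability from the start and backward reachability from the targets. This minimal sketch works over integer block addresses in an adjacency dict. It returns a filtered dict rather than a new CFG object.

```python
def reachable(adj, sources):
    seen, stack = set(sources), list(sources)
    while stack:
        for nxt in adj.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def get_subgraph(succ, starting_node, block_addresses):
    pred = {}
    for src, dsts in succ.items():
        for dst in dsts:
            pred.setdefault(dst, []).append(src)
    forward = reachable(succ, [starting_node])
    targets = [a for a in block_addresses if a in forward]
    # Nodes on some path start -> target are reachable both ways.
    keep = forward & reachable(pred, targets)
    return {n: [s for s in succ.get(n, ()) if s in keep] for n in keep}

blocks = {0x10: [0x20, 0x30], 0x20: [0x40], 0x30: [0x50], 0x40: [], 0x50: []}
print(get_subgraph(blocks, 0x10, [0x40]))
```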
- get_function_subgraph(start, max_call_depth=None)
Get a sub-graph of a certain function.
- Parameters:
start – The function start. Currently it should be an integer.
max_call_depth – Call depth limit. None indicates no limit.
- Returns:
A CFG instance which is a sub-graph of self.graph
- property context_sensitivity_level
- property graph: SpillingCFG
- property unresolvables
Get those SimRuns that have non-resolvable exits.
- Returns:
A set of SimRuns
- property deadends
Get all CFGNodes that have an out-degree of 0.
- Returns:
A list of CFGNode instances
- class angr.analyses.CFGFast
Bases: ForwardAnalysis[CFGNode, CFGNode, CFGJob, int, object], CFGBase
We find functions inside the given binary and build a control-flow graph in a very fast manner: instead of simulating program executions, keeping track of states, and performing expensive data-flow analysis, CFGFast only performs light-weight analyses combined with some heuristics and some strong assumptions.
In order to identify as many functions as possible, as accurately as possible, the following operation sequence is followed:
# Active scanning
If the binary has “function symbols” (TODO: this term is not accurate enough), they are starting points of the code scanning
If the binary does not have any “function symbol”, we will first perform a function prologue scanning on the entire binary, and start from those places that look like function beginnings
Otherwise, the binary’s entry point will be the starting point for scanning
# Passive scanning
After all active scans are done, we will go through the whole image and scan all code pieces
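The fallback "function prologue scanning" step can be pictured as a byte-pattern search. This minimal sketch assumes a single x86-64 prologue pattern, `push rbp; mov rbp, rsp` (bytes 55 48 89 e5). Real CFGFast uses architecture-specific pattern sets, and the function name here is hypothetical. The returned addresses would then serve as starting points for active scanning.

```python
import re

# x86-64 "push rbp; mov rbp, rsp"
PROLOGUE = re.compile(re.escape(b"\x55\x48\x89\xe5"))

def scan_prologues(blob, base_addr=0):
    """Return addresses in blob where the prologue pattern starts."""
    return [base_addr + m.start() for m in PROLOGUE.finditer(blob)]

blob = b"\x90\x90" + b"\x55\x48\x89\xe5" + b"\xc3" + b"\x55\x48\x89\xe5"
print([hex(a) for a in scan_prologues(blob, 0x400000)])  # ['0x400002', '0x400007']
```

The base address matters here: the same blob loaded at the wrong base would yield function starts that are off by a constant, which is one reason a correct base address improves CFG recovery.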
Due to the nature of those techniques that are used here, a base address is often not required to use this analysis routine. However, with a correct base address, CFG recovery will almost always yield a much better result. A custom analysis, called GirlScout, is specifically made to recover the base address of a binary blob. After the base address is determined, you may want to reload the binary with the new base address by creating a new Project object, and then re-recover the CFG.
- PRINTABLES = b'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r'
- SPECIAL_THUNKS = {'AMD64': {b'\xe8\x07\x00\x00\x00\xf3\x90\x0f\xae\xe8\xeb\xf9H\x89\x04$\xc3': ('jmp', 'rax'), b'\xe8\x07\x00\x00\x00\xf3\x90\x0f\xae\xe8\xeb\xf9H\x8dd$\x08\xc3': ('ret',)}}
- __init__(binary=None, objects=None, regions=None, pickle_intermediate_results=False, symbols=True, function_prologues=None, resolve_indirect_jumps=True, force_segment=False, force_smart_scan=None, force_complete_scan=False, indirect_jump_target_limit=100000, data_references=True, cross_references=False, normalize=False, start_at_entry=True, function_starts=None, extra_memory_regions=None, data_type_guessing_handlers=None, arch_options=None, indirect_jump_resolvers=None, base_state=None, exclude_sparse_regions=True, skip_specific_regions=True, heuristic_plt_resolving=None, detect_tail_calls=False, low_priority=False, cfb=None, model=None, eh_frame=True, exceptions=True, skip_unmapped_addrs=True, nodecode_window_size=512, nodecode_threshold=0.3, nodecode_step=16483, check_funcret_max_job=500, indirect_calls_always_return=None, jumptable_resolver_resolves_calls=None, retedges=False, drop_bad_funcs=True, start=None, end=None, collect_data_references=None, extra_cross_references=None, elf_eh_frame=None, **extra_arch_options)
- Parameters:
binary – The binary to recover CFG on. By default the main binary is used.
objects – A list of objects to recover the CFG on. By default it will recover the CFG of all loaded objects.
regions (iterable) – A list of tuples in the form of (start address, end address) describing memory regions that the CFG should cover.
pickle_intermediate_results (bool) – If we want to store the intermediate results or not.
symbols (bool) – Get function beginnings from symbols in the binary.
function_prologues (bool | None) – Scan the binary for function prologues, and use those positions as function beginnings.
resolve_indirect_jumps (bool) – Try to resolve indirect jumps. This is necessary to resolve jump targets from jump tables, etc.
force_segment (bool) – Force CFGFast to rely on binary segments instead of sections.
force_complete_scan (bool) – Perform a complete scan on the binary and maximize the number of identified code blocks.
data_references (bool) – Enables the collection of references to data used by individual instructions. This does not collect ‘cross-references’, particularly those that involve multiple instructions. For that, see cross_references.
cross_references (bool) – Whether CFGFast should collect “cross-references” from the entire program or not. This will populate the knowledge base with references to and from each recognizable address constant found in the code. Note that, because this performs constant propagation on the entire program, it may be much slower and consume more memory. This option implies data_references=True.
normalize (bool) – Normalize the CFG as well as all function graphs after CFG recovery.
start_at_entry (bool) – Begin CFG recovery at the entry point of this project. Setting it to False prevents CFGFast from viewing the entry point as one of the starting points of code scanning.
function_starts (list) – A list of extra function starting points. CFGFast will try to resume scanning from each address in the list.
extra_memory_regions (list) – A list of 2-tuple (start-address, end-address) that shows extra memory regions. Integers falling inside will be considered as pointers.
indirect_jump_resolvers (list) – A custom list of indirect jump resolvers. If this list is None or empty, default indirect jump resolvers specific to this architecture and binary types will be loaded.
base_state – A state to use as a backer for all memory loads
detect_tail_calls (bool) – Enable aggressive tail-call optimization detection.
eh_frame (bool) – Retrieve function starts (and maybe sizes later) from the .eh_frame of ELF binaries or exception records of PE binaries.
skip_unmapped_addrs – Ignore all branches into unmapped regions. True by default. You may want to set it to False if you are analyzing manually patched binaries or malware samples.
indirect_calls_always_return (bool | None) – Whether CFG should assume that indirect calls must return. Assuming indirect calls must return significantly reduces the number of constant propagation runs, but may reduce the overall CFG recovery precision when facing non-returning indirect calls. By default, we only assume indirect calls always return for large binaries (region > 50 KB).
jumptable_resolver_resolves_calls (bool | None) – Whether JumpTableResolver should resolve indirect calls or not. Most indirect calls in C++ binaries or UEFI binaries cannot be resolved using the jump table resolver and must be resolved using their specific resolvers. By default, we only disable JumpTableResolver from resolving indirect calls for large binaries (region > 50 KB).
check_funcret_max_job – When popping return-site jobs out of the job queue, angr will prioritize jobs for which the callee is known to return. This check may be slow when there are a large number of jobs in different caller functions, a situation that often occurs in obfuscated binaries where many functions never return. This parameter acts as a threshold: the check is disabled when the number of jobs in the queue exceeds it.
start (int) – (Deprecated) The beginning address of CFG recovery.
end (int) – (Deprecated) The end address of CFG recovery.
arch_options (CFGArchOptions) – Architecture-specific options.
extra_arch_options – Any key-value pair in kwargs will be seen as an arch-specific option and will be used to set the option value in self._arch_options.
retedges (bool) – Whether to add return edges (from function endpoints to their return sites) in the CFG. Return edges are not added by default because they are often not useful during analysis; you can set retedges to True or call make_return_edges() after CFG recovery to create return edges. Note that this option does not impact function graphs.
progress_callback – (Inherited from angr.Analysis.) Callback for CFG recovery progress.
show_progressbar (bool) – (Inherited from angr.Analysis.) Show a progressbar during CFG recovery.
force_smart_scan (bool | None)
drop_bad_funcs (bool)
- Returns:
None
- property graph: SpillingCFG
- property memory_data
- property jump_tables
- property insn_addr_to_memory_data
- do_full_xrefs(overlay_state=None)
Perform xref recovery on all functions.
- Parameters:
overlay_state (SimState) – An overlay state for loading constant data.
- Returns:
None
- drop_bad_functions()
- make_return_edges()
For each returning function, create return edges in self.graph.
- Returns:
None
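Conceptually, a return edge links every returning block of a callee back to the return site of each call to it. The sketch below is hypothetical and does not use angr: it assumes a plain dict-of-sets graph and (call_block, callee, return_site) records instead of angr's CFG nodes, and treats a callee with no known returning blocks as non-returning.

```python
def add_return_edges(graph, call_sites, ret_blocks):
    """Add edges from callee return blocks back to each caller's return site.

    graph: dict mapping block -> set of successor blocks (modified in place)
    call_sites: iterable of (call_block, callee, return_site) tuples
    ret_blocks: dict mapping callee -> list of its returning blocks; callees
                missing from this dict are treated as non-returning
    Returns the list of newly added edges.
    """
    added = []
    for _call_block, callee, return_site in call_sites:
        for ret in ret_blocks.get(callee, []):
            succs = graph.setdefault(ret, set())
            if return_site not in succs:  # keep the operation idempotent
                succs.add(return_site)
                added.append((ret, return_site))
    return added


# Hypothetical example: main calls f, and f returns through block f_ret.
graph = {"main_0": {"f_0"}, "f_0": {"f_ret"}, "f_ret": set(), "main_1": set()}
edges = add_return_edges(graph, [("main_0", "f", "main_1")], {"f": ["f_ret"]})
print(edges)  # [('f_ret', 'main_1')]
```

Running it twice adds nothing new, which mirrors the idea that return edges are a derived view that can be regenerated after CFG recovery.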
- copy()
- output()
- class angr.analyses.CFGFastSoot
Bases:
CFGFast
- drop_bad_functions()
- make_functions()
Revisit the entire control flow graph, create Function instances accordingly, and correctly put blocks into each function.
Although Function objects are created during CFG recovery, they are neither sound nor accurate. With a pre-constructed CFG, this method rebuilds all functions following these rules:
A block may only belong to one function.
Small functions lying between the start point and the end point of another function are merged into that function.
Tail call optimizations are detected.
PLT stubs are aligned by 16.
- Returns:
None
- class angr.analyses.CalleeCleanupFinder
Bases:
Analysis
- __init__(starts=None, hook_all=False)
- analyze(addr)
- class angr.analyses.CallingConventionAnalysis
Bases:
Analysis
Analyze the calling convention of a function and guess a probable prototype.
The calling convention of a function can be inferred both at its call sites and in the function itself. At call sites, we consider all register and stack variables that are not alive after the function call as parameters to this function. In the function itself, we consider all register and stack variables that are read without being initialized as parameters. We then synthesize the information from both locations to make a reasonable inference of the calling convention of this function.
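The in-function half of this inference boils down to "read before written". A minimal sketch, assuming a straight-line function encoded as a hypothetical list of (reads, writes) register sets per instruction and the System V AMD64 argument registers; the real analysis works on VEX/AIL, follows control flow, and also uses call sites.

```python
# System V AMD64 integer argument registers, in order.
SYSV_ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def infer_register_args(instructions, arg_regs=SYSV_ARG_REGS):
    """Return argument registers that are read before the function writes them.

    instructions: straight-line list of (reads, writes) register-name sets,
                  in execution order; reads of one instruction happen before
                  its writes (e.g. `add rdi, rsi` reads rdi/rsi, writes rdi)
    """
    written = set()
    read_uninitialized = set()
    for reads, writes in instructions:
        # A read of a register that has not been written yet must see a value
        # set up by the caller, so it is a parameter candidate.
        read_uninitialized |= set(reads) - written
        written |= set(writes)
    # Preserve the calling convention's order so the result maps to arg0, arg1, ...
    return [reg for reg in arg_regs if reg in read_uninitialized]


insns = [
    ({"rdi", "rsi"}, {"rax"}),  # mov rax, [rdi + rsi]
    ({"rax", "rdx"}, {"rax"}),  # add rax, rdx
    ({"rax"}, set()),           # rax is consumed as the return value
]
print(infer_register_args(insns))  # ['rdi', 'rsi', 'rdx']
```

A practical implementation must also discount callee-saved registers that are merely spilled (e.g. `push rbx`), which this sketch does not attempt.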
- Variables:
_function – The function to recover calling convention for.
_variable_manager – A handy accessor to the variable manager.
_cfg – A reference to the CFGModel of the current binary. It is used to discover call sites of the current function in order to perform analysis at call sites.
analyze_callsites – True if we should analyze all call sites of the current function to determine the calling convention and arguments. This can be time-consuming if there are many call sites to analyze.
cc (SimCC | None) – The recovered calling convention for the function.
_collect_facts – True if we should run FunctionFactCollector to collect input arguments and return value size. False if input arguments and return value size are provided by the user.
- __init__(func, cfg=None, analyze_callsites=False, caller_func_addr=None, callsite_block_addr=None, callsite_insn_addr=None, func_graph=None, input_args=None, retval_size=None, extra_pop=None, collect_facts=False, collect_facts_arg_uses=False, collect_facts_arg_passthru=False)
- Parameters:
cfg (CFGModel | None)
analyze_callsites (bool)
caller_func_addr (int | None)
callsite_block_addr (int | None)
callsite_insn_addr (int | None)
func_graph (DiGraph | None)
input_args (list[SimRegArg | SimStackArg] | None)
retval_size (int | None)
extra_pop (int | None)
collect_facts (bool)
collect_facts_arg_uses (bool)
collect_facts_arg_passthru (bool)
- class angr.analyses.ClassIdentifier
Bases:
Analysis
This is a class identifier for non-stripped or partially stripped binaries. It identifies classes based on demangled function names and assigns functions to their respective classes based on those names. It also uses the results of the VtableFinder analysis to assign the corresponding vtables to the classes.
self.classes maps class names to SimCppClass objects.
For example, A::tool() and A::qux() belong to the class A.
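The name-based assignment can be illustrated with plain strings. The sketch below is a hypothetical simplification, not ClassIdentifier's code: it splits each demangled name at the last `::` before the parameter list. By name alone it cannot tell a namespace-qualified free function (`ns::f()`) from a method, and it ignores templates and operators.

```python
from collections import defaultdict


def group_methods_by_class(demangled_names):
    """Map class name -> method names, using the 'Class::method(...)' shape."""
    classes = defaultdict(list)
    for name in demangled_names:
        # Drop the parameter list so '::' inside argument types is ignored.
        head = name.split("(", 1)[0]
        if "::" not in head:
            continue  # a free function such as main()
        cls, method = head.rsplit("::", 1)
        classes[cls].append(method)
    return dict(classes)


names = ["A::tool()", "A::qux()", "ns::B::run(std::string const&)", "main"]
print(group_methods_by_class(names))
# {'A': ['tool', 'qux'], 'ns::B': ['run']}
```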
- __init__()
- class angr.analyses.CodeCaveAnalysis
Bases:
Analysis
Best-effort static location of potential vacant code caves for possible code injection:
Padding functions
Unreachable code
- __init__()
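One way to spot padding is to look for long runs of a single filler byte. A minimal sketch, assuming x86 filler values (0x00, 0x90 for `nop`, 0xCC for `int3`) and a hypothetical 16-byte minimum; CodeCaveAnalysis's own criteria may differ.

```python
def find_padding_runs(data, base_addr, min_len=16, pad_bytes=(0x00, 0x90, 0xCC)):
    """Return (address, length) for runs of one repeated padding byte.

    pad_bytes defaults to x86 padding values: 0x00, 0x90 (nop), 0xCC (int3).
    """
    runs = []
    i, n = 0, len(data)
    while i < n:
        j = i
        while j < n and data[j] == data[i]:  # extend the run of identical bytes
            j += 1
        if data[i] in pad_bytes and j - i >= min_len:
            runs.append((base_addr + i, j - i))
        i = j
    return runs


code = b"\x55\x48" + b"\xcc" * 20 + b"\xc3" + b"\x90" * 4
print(find_padding_runs(code, 0x1000))  # [(4098, 20)], i.e. 20 int3 bytes at 0x1002
```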
- class angr.analyses.CodeTagging
Bases:
Analysis
- __init__(func)
- analyze()
- has_xor()
Detects if there is any xor operation in the function.
- Returns:
Tags
- has_bitshifts()
Detects if there is any bit-shift operation in the function.
- Returns:
Tags.
- has_sql()
Detects if there is any reference to strings that look like SQL queries.
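"Look like SQL queries" is not defined further here. One plausible heuristic, sketched below with a hypothetical regex over the strings a function references (not angr's actual rule set), is to require a characteristic keyword pair rather than a single keyword, so that UI strings such as "Select a file" do not match.

```python
import re

# Hypothetical heuristic: require a keyword *pair* typical of SQL statements.
SQL_PATTERN = re.compile(
    r"\b(SELECT\s+.+?\s+FROM|INSERT\s+INTO|UPDATE\s+\w+\s+SET|DELETE\s+FROM|CREATE\s+TABLE)\b",
    re.IGNORECASE,
)


def has_sql(strings):
    """Return True if any referenced string looks like an SQL statement."""
    return any(SQL_PATTERN.search(s) for s in strings)


print(has_sql(["usage: %s <file>", "SELECT id, name FROM users WHERE id = ?"]))  # True
print(has_sql(["Select a file", "Update available"]))  # False
```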
- class angr.analyses.CompleteCallingConventionsAnalysis
Bases:
Analysis
Implements full-binary calling convention analysis. During the initial analysis of a binary, you may set recover_variables to True so that it performs variable recovery on each function before performing calling convention analysis.
- __init__(mode=CallingConventionAnalysisMode.FASTISH, recover_variables=False, low_priority=False, force=False, cfg=None, analyze_callsites=False, skip_signature_matched_functions=False, max_function_blocks=None, max_function_size=None, workers=0, cc_callback=None, prioritize_func_addrs=None, skip_other_funcs=False, auto_start=True, func_graphs=None, target_functions=None)
- Parameters:
recover_variables – Recover variables on each function before performing calling convention analysis.
low_priority – Run in the background - periodically release GIL.
force – Perform calling convention analysis on functions even if they have calling conventions or prototypes already specified (or previously recovered).
cfg (CFGFast | CFGModel | None) – The control flow graph model, which will be passed to CallingConventionAnalysis.
analyze_callsites (bool) – Consider artifacts at call sites when performing calling convention analysis.
skip_signature_matched_functions (bool) – Do not perform calling convention analysis on functions that match existing FLIRT signatures.
max_function_blocks (int | None) – Do not perform calling convention analysis on functions with more than the specified number of blocks. Setting it to None disables this check.
max_function_size (int | None) – Do not perform calling convention analysis on functions whose sizes exceed max_function_size. Setting it to None disables this check.
workers (int) – Number of multiprocessing workers.
cc_callback (Callable | None)
skip_other_funcs (bool)
auto_start (bool)
- work()
- prioritize_functions(func_addrs_to_prioritize)
Prioritize the analysis of specified functions.
- static function_needs_variable_recovery(func)
Check if running variable recovery on the function is the only way to determine the calling convention of this function.
We do not need to run variable recovery to determine the calling convention of a function if:
The function is a SimProcedure.
The function is a PLT stub.
The function is a library function and we already know its prototype.
- Parameters:
func – The function object.
- Returns:
True if we must run VariableRecovery before we can determine what the calling convention of this function is. False otherwise.
- Return type:
- class angr.analyses.CongruencyCheck
Bases:
Analysis
This is an analysis to ensure that angr executes things identically with different execution backends (i.e., unicorn vs vex).
- __init__(throw=False)
Initializes a CongruencyCheck analysis.
- Parameters:
throw – whether to raise an exception if an incongruency is found.
- set_state_options(left_add_options=None, left_remove_options=None, right_add_options=None, right_remove_options=None)
Checks that the specified state options result in the same states over the next depth states.
- set_states(left_state, right_state)
Checks that the specified paths stay the same over the next depth states.
- set_simgr(simgr)
- run(depth=None)
Checks that the paths in the specified path group stay the same over the next depth bytes.
The path group should have a “left” and a “right” stash, each with a single path.
- compare_path_group(pg)
- compare_states(sl, sr)
Compares two states for similarity.
- compare_paths(pl, pr)
- class angr.analyses.DataDependencyGraphAnalysis
Bases:
Analysis
This is a DYNAMIC data dependency graph analysis that uses a given SimState to produce a DDG that is accurate to the path the program took during execution.
This analysis utilizes the SimActionData objects present in the provided SimState’s action history to generate the dependency graph.
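The idea of turning an action history into dependency edges can be sketched on a simplified trace. The sketch assumes a hypothetical flattened trace of (ins_addr, kind, location) tuples instead of real SimActionData objects, and models two edge kinds: read-after-write across instructions, and operand-to-result within one instruction.

```python
def build_ddg(actions):
    """Build data-dependency edges from an ordered trace of data actions.

    actions: list of (ins_addr, kind, location) with kind "read" or "write"
    Returns a set of (src_index, dst_index) edges between action indices.
    """
    edges = set()
    last_write = {}  # location -> index of its most recent write
    reads_in_insn = []  # reads performed by the current instruction
    current_insn = None
    for idx, (ins_addr, kind, loc) in enumerate(actions):
        if ins_addr != current_insn:
            current_insn, reads_in_insn = ins_addr, []
        if kind == "read":
            if loc in last_write:
                edges.add((last_write[loc], idx))  # value flows write -> read
            reads_in_insn.append(idx)
        elif kind == "write":
            for r in reads_in_insn:
                edges.add((r, idx))  # operands flow into the written value
            last_write[loc] = idx
        else:
            raise ValueError(f"unknown action kind: {kind!r}")
    return edges


trace = [
    (0x10, "read", "mem_0x2000"),  # mov rax, [0x2000]
    (0x10, "write", "rax"),
    (0x14, "read", "rax"),         # add rax, rbx
    (0x14, "read", "rbx"),
    (0x14, "write", "rax"),
    (0x18, "read", "rax"),         # cmp rax, 0
]
ddg = build_ddg(trace)
print(sorted(ddg))  # [(0, 1), (1, 2), (2, 4), (3, 4), (4, 5)]
```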
- __init__(end_state, start_from=None, end_at=None, block_addrs=None)
- Parameters:
end_state (SimState) – Simulation state used to extract all SimActionData.
start_from (int | None) – An address or None. Specifies where to start generation of the DDG.
end_at (int | None) – An address or None. Specifies where to end generation of the DDG.
block_addrs (list[int] | None) – List of block addresses that the DDG analysis should be run on.
- property graph: DiGraph | None
- property simplified_graph: DiGraph | None
- property sub_graph: DiGraph | None
- get_data_dep(g_node, include_tmp_nodes, backwards)
- Return type:
DiGraph | None
- Parameters:
g_node (BaseDepNode)
include_tmp_nodes (bool)
backwards (bool)
- class angr.analyses.Decompiler
Bases:
Analysis
The decompiler analysis.
Run this on a Function object for which a normalized CFG has been constructed. The fully processed output can be found in result.codegen.text
- __init__(func, cfg=None, options=None, preset=None, optimization_passes=None, sp_tracker_track_memory=True, variable_kb=None, peephole_optimizations=None, vars_must_struct=None, flavor='pseudocode', expr_comments=None, stmt_comments=None, ite_exprs=None, binop_operators=None, decompile=True, regen_clinic=True, inline_functions=None, desired_variables=None, update_memory_data=True, want_full_graph=False, generate_code=True, use_cache=True, update_cache=True, expr_collapse_depth=16, clinic_graph=None, clinic_arg_vvars=None, clinic_start_stage=None, clinic_end_stage=None, clinic_skip_stages=(), static_vvars=None, static_buffers=None, codegen_cls=<class 'angr.analyses.decompiler.structured_codegen.c.CStructuredCodeGenerator'>)
- Parameters:
preset (str | DecompilationPreset | None)
peephole_optimizations (Iterable[type[PeepholeOptimizationStmtBase] | type[PeepholeOptimizationExprBase]] | None)
update_memory_data (bool)
want_full_graph (bool)
generate_code (bool)
use_cache (bool)
update_cache (bool)
expr_collapse_depth (int)
static_vvars (dict | None)
static_buffers (dict | None)
- reflow_variable_types(cache)
Re-run type inference on an existing variable recovery result, then rerun codegen to generate new results.
- Returns:
- Parameters:
cache (DecompilationCache)
- find_data_references_and_update_memory_data(seq_node)
- Parameters:
seq_node (SequenceNode)
- transform_graph_from_ssa(ail_graph)
Translate an SSA AIL graph out of SSA form. This is useful for producing a non-SSA AIL graph for displaying in angr management.
- Parameters:
ail_graph (DiGraph) – The AIL graph to transform out of SSA form.
- Return type:
DiGraph
- Returns:
The translated AIL graph.
- transform_seqnode_from_ssa(seq_node)
- Return type:
- Parameters:
seq_node (SequenceNode)
- llm_refine()
Use the configured LLM to suggest improved variable names, function names, and variable types. Returns True if any changes were made.
- Return type:
- llm_suggest_variable_names(llm_client=None, code_text=None, raise_exc=False)
Ask the LLM to suggest better variable names for the decompiled code. Returns True if any variables were renamed.
- llm_suggest_function_name(llm_client=None, code_text=None, raise_exc=False)
Ask the LLM to suggest a better function name. Only suggests a rename for auto-generated names (those starting with sub_ or fcn.). Returns True if the function was renamed.
- llm_suggest_variable_types(llm_client=None, code_text=None, raise_exc=False)
Ask the LLM to suggest better C types for variables. Returns True if any variable types were changed.
- llm_summarize_function(llm_client=None, code_text=None, raise_exc=False)
Ask the LLM to produce a natural-language summary of what the decompiled function does. The summary is stored in the DecompilationCache and returned.
Returns the summary string, or None if summarization failed.
- class angr.analyses.Disassembly
Bases:
Analysis
Produce formatted machine code disassembly.
- __init__(function=None, ranges=None, thumb=False, include_ir=False, block_bytes=None)
- func_lookup(block)
- parse_block(block)
Parse instructions for a given block node
- render(formatting=None, show_edges=True, show_addresses=True, show_bytes=False, ascii_only=None, color=True, min_edge_depth=0)
Render the disassembly to a string, with optional edges and addresses.
Color will be added by default, if enabled. To disable color pass an empty formatting dict.
- class angr.analyses.DominanceFrontier
Bases:
Generic
Computes the dominance frontier of all nodes in a function graph, and provides an easy-to-use interface for querying the frontier information.
- __init__(func, func_graph=None, entry=None, exception_edges=False)
- Overloads:
self, func (Function), func_graph (networkx.DiGraph[T_co]), entry (T_co), exception_edges (bool)
self (DominanceFrontier[CodeNode]), func (Function), func_graph (networkx.DiGraph[CodeNode] | None), entry (CodeNode | None), exception_edges (bool)
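For reference, the dominance frontier of a node n is the set of join points where n's dominance ends: nodes with a predecessor dominated by n that are not themselves strictly dominated by n. The self-contained sketch below uses the iterative algorithm of Cooper, Harvey, and Kennedy on a plain successor dict; it is not DominanceFrontier's implementation and does not accept angr Function objects.

```python
def immediate_dominators(succs, entry):
    """Iterative immediate-dominator computation (Cooper, Harvey, and Kennedy).

    succs: dict mapping node -> list of successor nodes
    Returns (idom, preds) for all nodes reachable from entry.
    """
    postorder, seen = [], set()

    def dfs(node):  # recursive DFS, fine for small graphs
        seen.add(node)
        for succ in succs.get(node, []):
            if succ not in seen:
                dfs(succ)
        postorder.append(node)

    dfs(entry)
    rpo = postorder[::-1]  # reverse postorder
    index = {node: i for i, node in enumerate(rpo)}
    preds = {node: [] for node in rpo}
    for node in rpo:
        for succ in succs.get(node, []):
            preds[succ].append(node)

    idom = {entry: entry}

    def intersect(a, b):
        # Walk both nodes up the dominator tree until they meet.
        while a != b:
            while index[a] > index[b]:
                a = idom[a]
            while index[b] > index[a]:
                b = idom[b]
        return a

    changed = True
    while changed:
        changed = False
        for node in rpo[1:]:
            done = [p for p in preds[node] if p in idom]
            new_idom = done[0]
            for p in done[1:]:
                new_idom = intersect(p, new_idom)
            if idom.get(node) != new_idom:
                idom[node] = new_idom
                changed = True
    return idom, preds


def dominance_frontiers(succs, entry):
    idom, preds = immediate_dominators(succs, entry)
    frontier = {node: set() for node in idom}
    for node in idom:
        if len(preds[node]) >= 2:  # only join points can be in a frontier
            for p in preds[node]:
                runner = p
                while runner != idom[node]:
                    frontier[runner].add(node)
                    runner = idom[runner]
    return frontier


diamond = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
df = dominance_frontiers(diamond, "A")  # B and C both have frontier {D}
```

In a loop such as A → B → C → B, the header B ends up in its own frontier, which is where SSA construction places phi nodes for variables redefined in the loop body.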
- class angr.analyses.FactCollector
Bases:
Analysis
An extremely fast analysis that extracts the facts about a function that CallingConventionAnalysis needs to decide on the calling convention and prototype of that function.
- class angr.analyses.FastConstantPropagation
Bases:
Analysis
An extremely fast constant propagation analysis that finds function-wide constant values, potentially with high false negative rates.
- class angr.analyses.FlirtAnalysis
Bases:
Analysis
FlirtAnalysis serves two purposes:
If a FLIRT signature file is specified, it will match the given signature file against the current binary and rename recognized functions accordingly.
If no FLIRT signature file is specified, it will use strings to determine possible libraries embedded in the current binary, and then match all possible signatures for the architecture.
- __init__(sig=None, max_mismatched_bytes=0, dry_run=False, match_named_functions=False)
- Parameters:
sig (FlirtSignature | str | None)
max_mismatched_bytes (int)
dry_run (bool)
match_named_functions (bool)
- class angr.analyses.ForwardAnalysis
Bases:
Generic
This is my very first attempt to build a static forward analysis framework that can serve as the base of multiple static analyses in angr, including CFG analysis, VFG analysis, DDG, etc.
In short, ForwardAnalysis performs a forward data-flow analysis by traversing a graph, computing on abstract values, and storing results in abstract states. The user can specify what graph to traverse, how the graph should be traversed, how abstract values and abstract states are defined, etc.
ForwardAnalysis has a few options to toggle, making it suitable to be the base class of several different styles of forward data-flow analysis implementations.
ForwardAnalysis supports a special mode when no graph is available for traversal (for example, when a CFG is being initialized and constructed, no other graph can be used). In that case, the graph traversal functionality is disabled, and the optimal graph traversal order is not guaranteed. The user can provide a job sorting method to sort the jobs in queue and optimize traversal order.
Feel free to discuss with me (Fish) if you have any suggestions or complaints.
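At its core, a forward data-flow analysis of this kind is a worklist fixed-point iteration. A minimal sketch, assuming a successor dict and plain transfer/merge callbacks; it does not model ForwardAnalysis's job objects, job ordering, or widening (order_jobs, allow_widening). The demo runs a must-analysis of definitely-assigned variables, where merging is set intersection, so the finite lattice guarantees termination without widening.

```python
from collections import deque


def forward_worklist(succs, entry, init_state, transfer, merge):
    """Run a forward data-flow analysis to a fixed point.

    succs: dict mapping node -> list of successor nodes
    transfer(node, in_state) -> out_state
    merge(old_in_state, incoming_state) -> joined state
    Returns a dict mapping each reached node to its in-state.
    """
    in_states = {entry: init_state}
    worklist = deque([entry])
    while worklist:
        node = worklist.popleft()
        out_state = transfer(node, in_states[node])
        for succ in succs.get(node, []):
            if succ in in_states:
                new_state = merge(in_states[succ], out_state)
                if new_state == in_states[succ]:
                    continue  # nothing changed, no need to revisit succ
            else:
                new_state = out_state
            in_states[succ] = new_state
            if succ not in worklist:
                worklist.append(succ)
    return in_states


# Demo: definitely-assigned variables (merge = intersection).
ASSIGNS = {"entry": {"x"}, "then": {"y"}, "else": set(), "join": {"z"}}
GRAPH = {"entry": ["then", "else"], "then": ["join"], "else": ["join"], "join": []}
result = forward_worklist(
    GRAPH, "entry", frozenset(),
    transfer=lambda node, state: state | ASSIGNS[node],
    merge=lambda a, b: a & b,
)
print(sorted(result["join"]))  # ['x']
```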
- __init__(order_jobs=False, allow_merging=False, allow_widening=False, status_callback=None, graph_visitor=None)
Constructor
- Parameters:
order_jobs (bool) – If all jobs should be ordered or not.
allow_merging (bool) – If job merging is allowed.
allow_widening (bool) – If job widening is allowed.
graph_visitor (Optional[GraphVisitor[TypeVar(NodeType)]]) – A graph visitor to provide successors.
status_callback (Callable[[ForwardAnalysis], Any] | None)
- Returns:
None
- property should_abort
Whether the analysis should be terminated.
- Returns:
True/False
- property graph: DiGraph
- property jobs
- abort()
Abort the analysis.
- Returns:
None
- has_job(job)
Checks whether there exists another job which has the same job key.
- Parameters:
job (JobType) – The job to check.
- Returns:
True if there exists another job with the same key, False otherwise.
- downsize()
- class angr.analyses.Identifier
Bases:
Analysis
- __init__(cfg=None, require_predecessors=True, only_find=None)
- run(only_find=None)
- can_call_same_name(addr, name)
- get_func_info(func)
- static constrain_all_zero(before_state, state, regs)
- identify_func(function)
- check_tests(cfg_func, match_func)
- map_callsites()
- do_trace(addr_trace, reverse_accesses, func_info)
- get_call_args(func, callsite)
- static get_reg_name(arch, reg_offset)
Tries to find the name of a register given its offset in the register file.
- Parameters:
arch – The architecture.
reg_offset – The offset of the register.
- Returns:
The register name
- find_stack_vars_x86(func)
- static make_initial_state(project, stack_length)
- Returns:
an initial state with a symbolic stack and good options for rop
- static make_symbolic_state(project, reg_list, stack_length=80)
Converts an input state into a state with symbolic registers.
- Returns:
the symbolic state
- class angr.analyses.InitializationFinder
Bases:
ForwardAnalysis, Analysis
Finds possible initializations for global data sections and generates an overlay to be used in other analyses later on.
- class angr.analyses.LanguageDetector
Bases:
Analysis
Detect the original programming language and compiler used to build a binary.
Supports detection of C (gcc, clang, msvc), Rust, Go, and Swift through multiple heuristic layers: DWARF debug info, .comment sections, symbol patterns, section names, and linked library names.
Usage:
result = project.analyses.LanguageDetector()
print(result.language)    # "rust"
print(result.compiler)    # "rustc"
print(result.confidence)  # "high"
print(result.evidence)    # ["symbol: __rust_alloc", ...]
- __init__()
- property language: str
- property confidence: LanguageDetectionConfidenceLevel
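Of the heuristic layers listed above, the symbol-pattern layer is the easiest to illustrate. The sketch below is hypothetical: the pattern table, the "medium"/"low" levels, and the two-hits-means-high rule are assumptions, and LanguageDetector also weighs DWARF info, .comment sections, section names, and linked libraries.

```python
# Hypothetical symbol hints: (language, compiler, substrings to look for).
SYMBOL_HINTS = [
    ("rust", "rustc", ("__rust_alloc", "rust_begin_unwind", "_ZN4core")),
    ("go", "gc", ("runtime.main", "runtime.morestack")),
    ("swift", "swiftc", ("swift_retain", "swift_release")),
]


def detect_from_symbols(symbols):
    """Guess the source language from symbol names alone (one heuristic layer)."""
    best = None
    for language, compiler, patterns in SYMBOL_HINTS:
        evidence = [f"symbol: {s}" for s in symbols if any(p in s for p in patterns)]
        # Prefer the language with the most matching symbols.
        if evidence and (best is None or len(evidence) > len(best[2])):
            best = (language, compiler, evidence)
    if best is None:
        return {"language": "unknown", "compiler": None, "confidence": "low", "evidence": []}
    language, compiler, evidence = best
    # Assumed confidence rule: two or more independent hits count as "high".
    confidence = "high" if len(evidence) >= 2 else "medium"
    return {"language": language, "compiler": compiler,
            "confidence": confidence, "evidence": evidence}


print(detect_from_symbols(["main", "__rust_alloc"])["language"])  # rust
```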
- class angr.analyses.LoopAnalysis
Bases:
Analysis
Analyze loop nodes in a structured C code representation and extract relevant information about the loop, including:
Loop block addresses
Loop exits
Loop type
Loop condition
Max iterations
Fixed iterations
- __init__(cfunc)
- Parameters:
cfunc (CFunction)
- class angr.analyses.LoopFinder
Bases:
Analysis
Extracts all the loops from all the functions in a binary.
- __init__(functions=None, normalize=True)
- class angr.analyses.LoopUnroller
Bases:
Analysis
Unroll a loop in an AIL graph for a specified number of iterations.
- class angr.analyses.PackingDetector
Bases:
Analysis
This analysis detects whether a binary is likely packed. We may extend it to identify which packer is in use in the future.
- PACKED_MIN_BYTES = 256
- PACKED_ENTROPY_MIN_THRESHOLD = 0.88
- __init__(cfg=None, region_size_threshold=32)
- analyze()
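The two class constants suggest an entropy test over sufficiently large regions. A minimal sketch, assuming the threshold applies to Shannon entropy normalized to [0, 1] (bits per byte divided by 8) and that PACKED_MIN_BYTES is a minimum region size; how PackingDetector actually combines these with region_size_threshold is not shown here.

```python
import math
from collections import Counter

# Constants copied from the PackingDetector attributes above.
PACKED_MIN_BYTES = 256
PACKED_ENTROPY_MIN_THRESHOLD = 0.88


def normalized_entropy(data):
    """Shannon entropy of the byte histogram, divided by 8 so it lies in [0, 1]."""
    if not data:
        return 0.0
    total = len(data)
    bits = sum(-(c / total) * math.log2(c / total) for c in Counter(data).values())
    return bits / 8.0


def looks_packed(region):
    # Assumed rule: enough bytes, and nearly uniformly distributed byte values.
    return (len(region) >= PACKED_MIN_BYTES
            and normalized_entropy(region) >= PACKED_ENTROPY_MIN_THRESHOLD)


print(looks_packed(bytes(range(256))))  # True: every byte value once, entropy 1.0
print(looks_packed(b"\x00" * 4096))     # False: entropy 0.0
```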
- class angr.analyses.PatchFinderAnalysis
Bases:
Analysis
Looks for binary patches using some basic heuristics:
Looking for interleaved functions
Looking for unaligned functions
- __init__()
- atypical_alignments: list[AtypicallyAlignedFunction]
- possibly_patched_out: list[PatchedOutFunctionality]
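Both heuristics can be sketched over a table of function address ranges. This is a hypothetical illustration, not PatchFinderAnalysis's code: it assumes each function is summarized by a [start, end) range and that 16-byte alignment is the compiler's norm (typical on x86, but an assumption here).

```python
def atypically_aligned(funcs, alignment=16):
    """Names of functions whose start address is not a multiple of `alignment`."""
    ordered = sorted(funcs.items(), key=lambda kv: kv[1][0])
    return [name for name, (start, _end) in ordered if start % alignment]


def interleaved(funcs):
    """Pairs of functions whose [start, end) address ranges overlap."""
    ordered = sorted(funcs.items(), key=lambda kv: kv[1][0])
    pairs = []
    for i, (name_a, (_start_a, end_a)) in enumerate(ordered):
        for name_b, (start_b, _end_b) in ordered[i + 1:]:
            if start_b >= end_a:
                break  # later functions start even further away
            pairs.append((name_a, name_b))
    return pairs


funcs = {
    "f": (0x1000, 0x1040),
    "g": (0x1040, 0x10A0),
    "h": (0x1063, 0x1070),  # hypothetical patch dropped inside g
}
print(atypically_aligned(funcs))  # ['h']
print(interleaved(funcs))         # [('g', 'h')]
```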
- class angr.analyses.Pathfinder
Bases:
Analysis
- __init__(start_state, goal_addr, cfg, cache_size=10000)
- cache_state(state)
- Parameters:
state (SimState)
- marker_to_state(marker)
- Return type:
- Parameters:
marker (SimStateMarker)
- analyze()
- Return type:
- diagnose_unsat(state)
- Parameters:
state (SimState)
- test_path(bbl_addr_trace)
- Return type:
- Parameters:
- class angr.analyses.PropagatorAnalysis
Bases:
ForwardAnalysis, Analysis
PropagatorAnalysis implements copy propagation. It propagates values (either constant values or variables) and expressions inside a block or across a function.
PropagatorAnalysis only supports VEX. For AIL, please use SPropagator.
PropagatorAnalysis performs certain arithmetic operations between constants, including but not limited to:
addition
subtraction
multiplication
division
xor
It also performs the following memory operations:
Loading values from a known address
Writing values to a stack variable
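Within one block, constant propagation with folding over the arithmetic operations listed above can be sketched as follows. The tuple-based IR and register names are hypothetical stand-ins for VEX; the sketch only tracks constants (not register-to-register copies), assumes unsigned division, and omits the memory operations.

```python
import operator

# Binary operators from the list above; "div" is assumed to be unsigned
# integer division.
BINOPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "div": operator.floordiv,
    "xor": operator.xor,
}


def propagate_block(stmts, bits=64):
    """Constant propagation and folding over one straight-line block.

    stmts: list of (dst, op, *args); op is "mov" (one arg) or a key of BINOPS
           (two args); each arg is an int constant or a register name (str)
    Returns (constants known at block end, rewritten statements).
    """
    mask = (1 << bits) - 1
    env = {}  # register -> known constant value
    out = []
    for dst, op, *args in stmts:
        # Substitute registers whose values are known constants.
        vals = [env.get(a, a) if isinstance(a, str) else a for a in args]
        result = None
        if all(isinstance(v, int) for v in vals):
            if op == "mov":
                result = vals[0] & mask
            elif not (op == "div" and vals[1] == 0):  # never fold x / 0
                result = BINOPS[op](*vals) & mask
        if result is None:
            env.pop(dst, None)  # dst now holds an unknown value
            out.append((dst, op, *vals))
        else:
            env[dst] = result
            out.append((dst, "mov", result))
    return env, out


block = [
    ("rax", "mov", 5),
    ("rbx", "mov", 3),
    ("rcx", "mul", "rax", "rbx"),
    ("rcx", "xor", "rcx", 0xF),
    ("rdx", "add", "rdi", "rcx"),  # rdi is unknown on entry
]
env, rewritten = propagate_block(block)
print(env)            # {'rax': 5, 'rbx': 3, 'rcx': 0}
print(rewritten[-1])  # ('rdx', 'add', 'rdi', 0)
```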
- property prop_key: tuple[str | None, str, int, bool, bool, bool]
Gets a key that represents the function and the “flavor” of the propagation result.
- property replacements
- class angr.analyses.ProximityGraphAnalysis
Bases:
Analysis
Generate a proximity graph.
- __init__(func, cfg_model, xrefs, decompilation=None, expand_funcs=None)
- Parameters:
func (Function)
cfg_model (CFGModel)
xrefs (XRefManager)
decompilation (Decompiler | None)
- class angr.analyses.ReachingDefinitionsAnalysis
Bases:
ForwardAnalysis[ReachingDefinitionsState, NodeType, object, object, object], Analysis
ReachingDefinitionsAnalysis is a textbook implementation of a static data-flow analysis that works on either a function or a block. It supports both VEX and AIL. By registering observers to observation points, users may use this analysis to generate use-def chains, def-use chains, and reaching definitions, and perform other traditional data-flow analyses such as liveness analysis.
I’ve always wanted to find a better name for this analysis. Now I gave up and decided to live with this name for the foreseeable future (until a better name is proposed by someone else).
Aliasing is definitely a problem, and I forgot how aliasing is resolved in this implementation. I’ll leave this as a post-graduation TODO.
Some more documentation and examples would be nice.
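As a point of reference, the textbook version of the analysis is a gen/kill fixed point. The sketch below is a self-contained illustration over hypothetical (def_id, variable) lists, not this class's API; it ignores aliasing and observation points entirely.

```python
def reaching_definitions(blocks, succs):
    """Return, for each block, the definitions that may reach its entry.

    blocks: dict mapping block -> ordered list of (def_id, variable) definitions
    succs: dict mapping block -> list of successor blocks
    """
    all_defs = {d: v for defs in blocks.values() for d, v in defs}
    gen, kill = {}, {}
    for b, defs in blocks.items():
        last = {}
        for d, v in defs:
            last[v] = d  # a later definition in the block shadows earlier ones
        gen[b] = frozenset(last.values())
        # Every other definition of a variable defined here is killed.
        kill[b] = frozenset(d for d, v in all_defs.items() if v in last) - gen[b]
    preds = {b: [] for b in blocks}
    for b, targets in succs.items():
        for s in targets:
            preds[s].append(b)
    reach_in = {b: frozenset() for b in blocks}
    reach_out = dict(gen)
    changed = True
    while changed:  # iterate until no set changes (fixed point)
        changed = False
        for b in blocks:
            new_in = frozenset().union(*(reach_out[p] for p in preds[b]))
            new_out = gen[b] | (new_in - kill[b])
            if new_in != reach_in[b] or new_out != reach_out[b]:
                reach_in[b], reach_out[b] = new_in, new_out
                changed = True
    return reach_in


blocks = {"B0": [("d1", "x"), ("d2", "y")], "B1": [("d3", "x")], "B2": []}
succs = {"B0": ["B1"], "B1": ["B1", "B2"]}  # B1 is a self-loop
rd = reaching_definitions(blocks, succs)
print(sorted(rd["B2"]))  # ['d2', 'd3']
```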
- __init__(subject, func_graph=None, max_iterations=30, track_tmps=False, track_consts=True, observation_points=None, init_state=None, init_context=None, state_initializer=None, cc=None, function_handler=None, observe_all=False, visited_blocks=None, dep_graph=True, observe_callback=None, canonical_size=8, stack_pointer_tracker=None, use_callee_saved_regs_at_return=True, interfunction_level=0, track_liveness=True, func_addr=None, element_limit=5, merge_into_tops=True)
- Parameters:
subject (Subject | Block | Block | Function | str) – The subject of the analysis: a function, or a single basic block.
func_graph – Alternative graph for function.graph.
max_iterations – The maximum number of iterations before the analysis is terminated.
track_tmps – Whether or not temporary variables should be taken into consideration during the analysis.
observation_points (Iterable[tuple[Literal['insn', 'node', 'stmt', 'exit'], int | tuple[int, int] | tuple[int, int, int], ObservationPointType]] | None) – A collection of tuples of (“node”|”insn”, ins_addr, OP_TYPE) defining where reaching definitions should be copied and stored. OP_TYPE can be OP_BEFORE or OP_AFTER.
init_state (ReachingDefinitionsState | None) – An optional initialization state. The analysis creates and works on a copy. Defaults to None, in which case the analysis initializes its own abstract state based on the given <Subject>.
init_context – If init_state is not given, this is used to initialize the context field of the initial state’s CodeLocation. The only default-supported type which may go here is a tuple of integers, i.e. a callstack. Anything else requires a custom FunctionHandler.
cc – Calling convention of the function.
function_handler (FunctionHandler | None) – The function handler to update the analysis state and results on function calls.
observe_all – Observe every statement, both before and after.
visited_blocks – A set of previously visited blocks.
dep_graph (DepGraph | bool | None) – An initial dependency graph to add the result of the analysis to. Set it to None to skip dependency graph generation.
canonical_size – The sizes (in bytes) that objects with an UNKNOWN_SIZE are treated as for operations where sizes are necessary.
dep_graph – Set this to True to generate a dependency graph for the subject. It will be available as result.dep_graph.
interfunction_level (int) – The number of functions we should recurse into. This parameter is only used if function_handler is not provided.
track_liveness (bool) – Whether to track liveness information. This can consume sizeable amounts of RAM on large functions (e.g., ~15 GB for a function with 4k nodes).
merge_into_tops (bool) – Merge known values into TOP if TOP is present. If True: {TOP} V {0xabc} = {TOP}. If False: {TOP} V {0xabc} = {TOP, 0xabc}.
state_initializer (RDAStateInitializer | None)
func_addr (int | None)
element_limit (int)
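The merge_into_tops rule amounts to the following set merge. TOP here is a hypothetical sentinel object standing in for angr's TOP value, and the known values are plain Python ints.

```python
class _Top:
    """Hypothetical stand-in for angr's TOP value."""

    def __repr__(self):
        return "TOP"


TOP = _Top()


def merge_values(a, b, merge_into_tops=True):
    """Merge two value sets, optionally letting TOP absorb known values."""
    merged = set(a) | set(b)
    if merge_into_tops and TOP in merged:
        return {TOP}  # TOP absorbs every known value
    return merged


print(merge_values({TOP}, {0xABC}))  # {TOP}
```

Absorbing known values into TOP keeps value sets small, at the cost of discarding concrete candidates that a later consumer might have used.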
- property all_definitions
- property all_uses
- property one_result
- property dep_graph: DepGraph
- property visited_blocks
- get_reaching_definitions_by_insn(ins_addr, op_type)
- get_reaching_definitions_by_node(node_addr, op_type)
- node_observe(node_addr, state, op_type, node_idx=None)
- Parameters:
node_addr (int) – Address of the node.
state (ReachingDefinitionsState) – The analysis state.
op_type (ObservationPointType) – Type of the observation point. Must be one of the following: OP_BEFORE, OP_AFTER.
node_idx (int | None) – ID of the node. Used in AIL to differentiate blocks with the same address.
- Return type:
- insn_observe(insn_addr, stmt, block, state, op_type)
- Parameters:
insn_addr (int) – Address of the instruction.
state (ReachingDefinitionsState) – The abstract analysis state.
op_type (ObservationPointType) – Type of the observation point. Must be one of the following: OP_BEFORE, OP_AFTER.
- Return type:
- stmt_observe(stmt_idx, stmt, block, state, op_type)
- Parameters:
stmt_idx (int)
state (ReachingDefinitionsState)
op_type (ObservationPointType)
- Return type:
- Returns:
- exit_observe(node_addr, exit_stmt_idx, block, state, node_idx=None)
- property subject
- callsites_to(target)
- Return type:
- Parameters:
- class angr.analyses.Reassembler
Bases:
Analysis
High-level representation of a binary with a linear representation of all instructions and data regions. After calling “symbolize”, it essentially acts as a binary reassembler.
Tested on CGC, x86 and x86-64 binaries.
Disclaimer: The reassembler is an empirical solution. Don’t be surprised if it does not work on some binaries.
- __init__(syntax='intel', remove_cgc_attachments=True, log_relocations=True)
- property instructions
Get a list of all instructions in the binary
- Returns:
A list of (address, instruction)
- Return type:
- property relocations
- property inserted_asm_before_label
- property inserted_asm_after_label
- property main_executable_regions
return:
- property main_nonexecutable_regions
return:
- section_alignment(section_name)
Get the alignment for the specific section. If the section is not found, 16 is used as default.
- main_executable_regions_contain(addr)
- Parameters:
addr
- Returns:
- main_executable_region_limbos_contain(addr)
Sometimes a pointer points to a few bytes before the beginning of a section, or a few bytes after the end of the section. We take care of that here.
- main_nonexecutable_regions_contain(addr)
- main_nonexecutable_region_limbos_contain(addr, tolerance_before=64, tolerance_after=64)
Sometimes a pointer points to a few bytes before the beginning of a section, or a few bytes after the end of the section. We take care of that here.
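A tolerance check of this kind can be sketched as follows, assuming the tolerances apply before a section's start and after its end, that regions are (start, end) pairs with an exclusive end, and a hypothetical (found, boundary) return shape that picks the nearest boundary. This illustrates the intent and is not guaranteed to match Reassembler's exact return value.

```python
def region_limbo(addr, regions, tolerance_before=64, tolerance_after=64):
    """Check whether addr lies just outside a section boundary.

    regions: iterable of (start, end) pairs, end exclusive
    Returns (True, boundary) for the closest matching boundary, else (False, None).
    """
    best, best_dist = None, None
    for start, end in regions:
        # A few bytes before the start of the section.
        if start - tolerance_before <= addr < start:
            dist = start - addr
            if best_dist is None or dist < best_dist:
                best, best_dist = start, dist
        # A few bytes past the end of the section.
        if end <= addr < end + tolerance_after:
            dist = addr - end
            if best_dist is None or dist < best_dist:
                best, best_dist = end, dist
    return (True, best) if best is not None else (False, None)


sections = [(0x1000, 0x2000), (0x3000, 0x4000)]
print(region_limbo(0x0FF8, sections) == (True, 0x1000))  # True
```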
- register_instruction_reference(insn_addr, ref_addr, sort, operand_offset)
- register_data_reference(data_addr, ref_addr)
- add_label(name, addr)
Add a new label to the symbol manager.
- insert_asm(addr, asm_code, before_label=False)
Insert some assembly code at the specific address. There must be an instruction starting at that address.
- append_procedure(name, asm_code)
Add a new procedure with specific name and assembly code.
- append_data(name, initial_content, size, readonly=False, sort='unknown')
Append a new data entry into the binary with specific name, content, and size.
- remove_instruction(ins_addr)
- Parameters:
ins_addr
- Returns:
- randomize_procedures()
- Returns:
- symbolize()
- assembly(comments=False, symbolized=True)
- remove_cgc_attachments()
Remove CGC attachments.
- Returns:
True if CGC attachments are found and removed, False otherwise
- Return type:
- remove_unnecessary_stuff()
Remove unnecessary functions and data
- Returns:
None
- remove_unnecessary_stuff_glibc()
- fast_memory_load(addr, size, data_type, endness='Iend_LE')
Load memory bytes from loader’s memory backend.
- class angr.analyses.SLivenessAnalysis
Bases:
Analysis
Calculates LiveIn and LiveOut sets for each block in a partial-SSA function.
- __init__(func, func_graph, entry=None, func_addr=None, arg_vvars=None)
- interference_graph()
Generate an interference graph based on the liveness analysis result.
- Returns:
A networkx.Graph instance.
- Return type:
networkx.Graph[int]
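The LiveIn/LiveOut equations and the interference graph derived from them can be sketched without angr. This is a hypothetical illustration: variables are plain strings, each statement is a (defs, uses) pair, phi nodes of the partial-SSA form are ignored, and the graph is returned as a set of two-element frozensets rather than a networkx.Graph.

```python
def liveness(blocks, succs):
    """Backward liveness over blocks of (defs, uses) statements.

    Returns (live_in, live_out) dicts mapping block -> set of variables.
    """
    use, defs = {}, {}
    for b, stmts in blocks.items():
        u, d = set(), set()
        for s_defs, s_uses in stmts:
            u |= set(s_uses) - d  # used before being defined in this block
            d |= set(s_defs)
        use[b], defs[b] = u, d
    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            out = set().union(*(live_in[s] for s in succs.get(b, [])))
            inn = use[b] | (out - defs[b])
            if out != live_out[b] or inn != live_in[b]:
                live_out[b], live_in[b] = out, inn
                changed = True
    return live_in, live_out


def interference_graph(blocks, live_out):
    """Two variables interfere if one is defined while the other is live."""
    edges = set()
    for b, stmts in blocks.items():
        live = set(live_out[b])
        for s_defs, s_uses in reversed(stmts):  # walk the block backwards
            for d in s_defs:
                for v in live:
                    if v != d:
                        edges.add(frozenset((d, v)))
            live -= set(s_defs)
            live |= set(s_uses)
    return edges


blocks = {"B0": [({"x"}, set())], "B1": [({"y"}, {"x"})], "B2": [(set(), {"x", "y"})]}
succs = {"B0": ["B1"], "B1": ["B2"]}
live_in, live_out = liveness(blocks, succs)
print(sorted(live_in["B2"]))  # ['x', 'y']
```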
- class angr.analyses.SPropagatorAnalysis
Bases:
Analysis
Constant and expression propagation that only supports SSA AIL graphs.
- __init__(subject, *, ail_manager, func_graph=None, only_consts=True, stack_pointer_tracker=None, func_args=None, func_addr=None, stack_arg_offsets=None)
- property replacements
- property dead_vvar_ids
- static is_global_variable_updated(func_graph, block_dict, varid, gv_addr, gv_size, defloc, useloc)
- Return type:
- Parameters:
varid (int)
gv_addr (int)
gv_size (int)
defloc (AILCodeLocation)
useloc (AILCodeLocation)
- static is_vvar_used_for_addr_loading_switch_case(uselocs, blocks)
Check if a virtual variable is used for loading an address in a switch-case construct.
- Parameters:
uselocs (set[AILCodeLocation]) – The use locations of the virtual variable.
blocks – All blocks of the current function.
- Return type:
- Returns:
True if the virtual variable is used for loading an address in a switch-case construct, False otherwise.
- static replace(replacements, loc, expr, value)
- Return type:
- Parameters:
replacements (dict[AILCodeLocation, dict[VirtualVariable | Tmp, Expression]])
loc (AILCodeLocation)
expr (VirtualVariable | Tmp)
value (Expression)
- class angr.analyses.SReachingDefinitionsAnalysis
Bases:
Analysis
Constant and expression propagation that only supports SSA AIL graphs.
- __init__(subject, func_addr=None, func_graph=None, func_args=None, use_callee_saved_regs_at_return=False, track_tmps=False)
- class angr.analyses.SelfModifyingCodeAnalysis
Bases:
Analysis
Determine if some piece of code is self-modifying.
This determination is made by simply executing. If an address is executed that is also written to, the code is determined to be self-modifying. The determination is stored in the result property. The regions property contains a list of (addr, length) regions that were both written to and executed.
- __init__(subject, max_bytes=0, state=None)
- result: bool
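The "written and also executed" criterion can be sketched on recorded ranges. The sketch is hypothetical: it assumes memory writes and executed instructions have already been collected as (addr, length) pairs, for example from an execution trace, and reports the overlap as merged (addr, length) regions, analogous to the regions property; a non-empty list corresponds to result being True.

```python
def _covered_bytes(ranges):
    """Set of byte addresses covered by (addr, length) ranges."""
    covered = set()
    for addr, length in ranges:
        covered.update(range(addr, addr + length))
    return covered


def self_modified_regions(writes, executed):
    """Merged (addr, length) regions that were both written and executed.

    writes: (addr, length) memory writes
    executed: (addr, size) executed instructions
    """
    hits = sorted(_covered_bytes(writes) & _covered_bytes(executed))
    regions = []
    for a in hits:
        if regions and regions[-1][0] + regions[-1][1] == a:
            start, length = regions[-1]
            regions[-1] = (start, length + 1)  # extend the current run
        else:
            regions.append((a, 1))
    return regions


regions = self_modified_regions(
    writes=[(0x2000, 4), (0x3000, 8)],
    executed=[(0x1000, 5), (0x2000, 2), (0x2002, 3)],
)
print(regions, bool(regions))  # [(8192, 4)] True
```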
- class angr.analyses.SootClassHierarchy
Bases:
Analysis
Generate complete hierarchy.
- __init__()
- init_hierarchy()
- has_super_class(cls)
- is_subclass_including(cls_child, cls_parent)
- is_subclass(cls_child, cls_parent)
- is_visible_method(cls, method)
- is_visible_class(cls_from, cls_to)
- get_super_classes(cls)
- get_super_classes_including(cls)
- get_implementers(interface)
- get_sub_interfaces_including(interface)
- get_sub_interfaces(interface)
- get_sub_classes(cls)
- get_sub_classes_including(cls)
- resolve_abstract_dispatch(cls, method)
- resolve_concrete_dispatch(cls, method)
- resolve_special_dispatch(method, container)
- resolve_invoke(invoke_expr, method, container)
- class angr.analyses.StackPointerTracker
Bases:
Analysis, ForwardAnalysis
Track the offset of the stack pointer at the end of each basic block of a function.
- offset_after(addr, reg)
- offset_before(addr, reg)
- offset_after_block(block_addr, reg)
- offset_before_block(block_addr, reg)
- constant_after(addr, reg)
- constant_before(addr, reg)
- constant_after_block(block_addr, reg)
- constant_before_block(block_addr, reg)
- property inconsistent
- inconsistent_for(reg)
- offsets_for(reg)
- class angr.analyses.StaticHooker
Bases:
Analysis
This analysis works on statically linked binaries: it finds the library functions statically linked into the binary and hooks them with the appropriate simprocedures.
Right now it only works on unstripped binaries, but hey! There’s room to grow!
- __init__(library, binary=None)
- class angr.analyses.StaticObjectFinder
Bases:
Analysis
This analysis tries to find objects on the heap based on calls to new() and subsequent calls to constructors with the ‘this’ pointer.
- __init__()
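The heuristic is sketched below over a pre-extracted list of calls. The sketch assumes the callee name, argument values and return value of each call are already known, and that `this` is passed as the first argument, as in the Itanium C++ ABI. `find_heap_objects` and the callee spelling `operator new` are illustrative assumptions. The real analysis works on angr's program representation instead.

```python
def find_heap_objects(calls, constructors, new_name="operator new"):
    """calls: (callee, args, ret) tuples in program order (hypothetical format).

    An object is recorded when a pointer returned by new() is later passed
    as the first argument (the 'this' pointer) to a known constructor.
    """
    allocations = {}  # pointer returned by new() -> requested size
    objects = {}      # pointer -> (size, constructor that initialized it)
    for callee, args, ret in calls:
        if callee == new_name:
            allocations[ret] = args[0] if args else None
        elif callee in constructors and args and args[0] in allocations:
            this = args[0]
            # Keep the first constructor seen for this allocation.
            objects.setdefault(this, (allocations[this], callee))
    return objects


calls = [
    ("operator new", [24], 0x5000),
    ("Foo::Foo", [0x5000], None),
    ("operator new", [8], 0x6000),   # never constructed, so not reported
    ("puts", [0x7000], None),
]
print(find_heap_objects(calls, {"Foo::Foo"}))
```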
- class angr.analyses.Typehoon
Bases:
Analysis
A spiritual tribute to the long-standing typehoon project that @jmg (John Grosen) worked on during his days on the angr team. Now I feel really bad about asking the poor guy to work directly on VEX IR without any of the fancy static analysis support we have right now…
Typehoon analysis implements a pushdown system that simplifies and solves type constraints. Our type constraints are largely an implementation of the paper Polymorphic Type Inference for Machine Code by Noonan, Loginov, and Cok from GrammaTech (with missing functionality and bugs, of course). Type constraints are collected by running VariableRecoveryFast (and perhaps VariableRecovery in the future) on a function, and are then solved by this analysis.
Users may specify ground truth, which overrides the types at certain program points during constraint solving.
- __init__(constraints, func_var, ground_truth=None, var_mapping=None, must_struct=None, stackvar_max_sizes=None, stack_offset_tvs=None, constraint_set_degradation_threshold=150, type_translator=None, tv_manager=None)
- Parameters:
constraints
ground_truth – A set of SimType-style solutions for some or all type variables. They will be respected during type solving.
var_mapping (dict[SimVariable, set[TypeVariable]] | None)
must_struct (set[TypeVariable] | None)
stackvar_max_sizes (dict[TypeVariable, int] | None)
stack_offset_tvs (dict[int, TypeVariable] | None)
constraint_set_degradation_threshold (int)
type_translator (TypeTranslator | None)
tv_manager (TypeVariableManager | None)
- update_variable_types(func_addr, var_to_typevars, stack_offset_tvs=None)
- Return type:
- Parameters:
var_to_typevars (dict[SimVariable, set[TypeVariable]])
stack_offset_tvs (dict[int, TypeVariable] | None)
- pp_constraints()
Pretty-print constraints between variables using the variable mapping.
- Return type:
- pp_solution()
Pretty-print solutions using the variable mapping.
- Return type:
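The snippet below is not the pushdown-system solver described above. It substitutes a much simpler fixed-point propagation that treats each subtype constraint as an equality. Its only purpose is to show how ground truth is respected: types supplied in `ground_truth` are never overwritten, and other type variables inherit them along constraint edges. `propagate_types` and the string type names are hypothetical.

```python
def propagate_types(constraints, ground_truth):
    """Toy solver (NOT Typehoon's algorithm).

    constraints  -- iterable of (sub, sup) pairs meaning type variable sub <: sup
    ground_truth -- dict: type variable -> type; these entries are never overwritten
    """
    solution = dict(ground_truth)
    changed = True
    while changed:
        changed = False
        for sub, sup in constraints:
            # Simplification: copy a known type across the edge in either direction.
            for src, dst in ((sub, sup), (sup, sub)):
                if src in solution and dst not in solution:
                    solution[dst] = solution[src]
                    changed = True
    return solution


cons = [("tv_arg", "tv_local"), ("tv_local", "tv_ret"), ("tv_x", "tv_y")]
print(propagate_types(cons, {"tv_local": "int32"}))
```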
- class angr.analyses.VariableRecovery
Bases:
ForwardAnalysis, VariableRecoveryBase
Recover “variables” from a function using forced execution.
While variables play a very important role in programming, they do not really exist after compilation. However, we can still identify and recover their counterparts in binaries. It is worth noting that not every variable in the source code can be identified in the binary, and not every variable recognized in the binary has a corresponding variable in the original source code. In short, there is no guarantee that the variables we identify in a binary are the same variables as in its source code.
This analysis uses heuristics to identify and recover the following types of variables:
- Register variables.
- Stack variables.
- Heap variables (not implemented yet).
- Global variables (not implemented yet).
This analysis takes a function as input and performs a data-flow analysis on its nodes. It runs concrete execution on every statement and hooks all register and memory accesses to discover every place that accesses a variable. It is slow, but its results are more accurate. For fast but less accurate variable recovery, consider using VariableRecoveryFast.
This analysis follows SSA, which means every write creates a new variable in registers or memory (stack, heap, etc.). Things get tricky when overlapping variable accesses exist (only in memory, since register accesses cannot really overlap). In such cases, a new variable is created that overlaps one or more existing variables. A decision procedure (which is still largely TODO) is required at the end of this analysis to resolve the conflicts between overlapping variables.
- __init__(func, max_iterations=20, store_live_variables=False)
- Parameters:
func (knowledge.Function) – The function to analyze.
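The SSA rule for stack variables can be illustrated with a toy: every write creates a fresh variable, and writes whose byte ranges partially overlap an earlier variable are collected for the later decision procedure. `recover_stack_variables`, the `var_N` naming and the (offset, size) input format are hypothetical. Treating writes to exactly the same slot as non-conflicting is an assumption of this sketch.

```python
def recover_stack_variables(writes):
    """writes: (offset, size) stack writes in execution order (hypothetical format).

    Returns (variables, overlaps): one new variable per write (SSA-style), and
    (new_name, old_name) pairs whose byte ranges partially overlap.
    """
    variables = []  # (name, offset, size), one entry per write
    overlaps = []   # conflicts left for a later decision procedure
    for offset, size in writes:
        name = f"var_{len(variables)}"
        for old_name, old_off, old_size in variables:
            same_slot = (old_off, old_size) == (offset, size)
            intersects = offset < old_off + old_size and old_off < offset + size
            if intersects and not same_slot:
                overlaps.append((name, old_name))
        variables.append((name, offset, size))
    return variables, overlaps


# Two 4-byte writes to sp-8, then a misaligned 4-byte write to sp-6.
variables, overlaps = recover_stack_variables([(-8, 4), (-8, 4), (-6, 4), (-16, 4)])
print(overlaps)
```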
- class angr.analyses.VariableRecoveryFast
Bases:
ForwardAnalysis, VariableRecoveryBase
Recover “variables” from a function by keeping track of stack pointer offsets and pattern matching VEX statements.
If calling conventions are recovered before running VariableRecoveryFast, variables can be recognized more accurately. However, this is not a requirement. If they have not been recovered, the function graph you pass must contain information indicating the call-out sites inside the analyzed function. These graph edges must be annotated with either "type": "call" or "outside": True.
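Finding the call-out sites from those annotations can be sketched as a filter over edge triples in the shape returned by networkx's `edges(data=True)`. `call_out_sites` is a hypothetical helper, and the sample edges are made up. Only the `"type": "call"` and `"outside": True` annotations come from the text above.

```python
def call_out_sites(edges):
    """edges: (src, dst, attrs) triples, as yielded by networkx's edges(data=True).

    Returns the sorted source nodes of edges annotated as call-outs.
    """
    return sorted({
        src
        for src, _dst, attrs in edges
        if attrs.get("type") == "call" or attrs.get("outside") is True
    })


edges = [
    (0x10, 0x20, {"type": "transition"}),
    (0x20, 0x500, {"type": "call"}),                     # call to another function
    (0x30, 0x600, {"type": "transition", "outside": True}),  # jump out of the function
]
print([hex(a) for a in call_out_sites(edges)])
```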
- class angr.analyses.Veritesting
Bases:
Analysis
An exploration technique for condensing chunks of code into single (nested) if-then-else constraints, using an accurate CFG to conduct Static Symbolic Execution (SSE), i.e., converting a code region into a single constraint.
- cfg_cache = {}
- all_stashes = ('successful', 'errored', 'deadended', 'deviated', 'unconstrained')
- __init__(input_state, boundaries=None, loop_unrolling_limit=10, enable_function_inlining=False, terminator=None, deviation_filter=None)
SSE stands for Static Symbolic Execution. We also implemented an extended version of Veritesting (Avgerinos et al., ICSE 2014).
- Parameters:
input_state – The initial state to begin the execution with.
boundaries – Addresses where execution should stop.
loop_unrolling_limit – The maximum number of times Veritesting should unroll a loop.
enable_function_inlining – Whether we should enable function inlining and syscall inlining.
terminator – A callback function that takes a state as its parameter. Veritesting will terminate if this function returns True.
deviation_filter – A callback function that takes a state as its parameter. Veritesting will put the state into the “deviated” stash if this function returns True.
- is_not_in_cfg(s)
Return whether s.addr is not a proper node in our CFG.
- Parameters:
s (SimState) – The SimState instance to test.
- Returns bool:
False if our CFG contains s.addr, True otherwise.
- is_overbound(state)
Filter out all states that run out of boundaries or loop too many times.
- Parameters:
state (SimState) – The SimState instance to check.
- Returns bool:
True if the state is outside of the memory or loop counter boundary.
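The core of SSE is folding several paths that reach the same merge point into one nested if-then-else term. The minimal sketch below represents terms as plain tuples. `merge_paths`, `evaluate` and the `('ite', guard, then, else)` encoding are hypothetical stand-ins for the solver expressions angr actually builds. Guards here are Python callables, so the merged term can be evaluated directly.

```python
def merge_paths(paths, default=None):
    """Fold (guard, value) pairs from paths reaching the same point into nested ITEs.

    Earlier paths take priority: the result is
    ite(g0, v0, ite(g1, v1, ... default)).
    """
    merged = default
    for guard, value in reversed(paths):
        merged = ("ite", guard, value, merged)
    return merged


def evaluate(term, env):
    """Walk a merged term; each guard is a callable taking the variable environment."""
    while isinstance(term, tuple) and term[0] == "ite":
        _, guard, then_val, else_val = term
        term = then_val if guard(env) else else_val
    return term


# Two branches of `if (x > 0) y = 1; else if (x < 0) y = -1; else y = 0;`
merged_y = merge_paths([(lambda e: e["x"] > 0, 1), (lambda e: e["x"] < 0, -1)], default=0)
print(evaluate(merged_y, {"x": -3}))
```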
- class angr.analyses.VtableFinder
Bases:
Analysis
This analysis locates vtables in a binary based on heuristics taken from “Reconstruction of Class Hierarchies for Decompilation of C++ Programs”.
- __init__()
- is_cross_referenced(addr)
- is_function(addr)
- analyze()
- create_extract_vtable(start_addr, sec_size)
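The heuristic can be approximated as follows. A vtable candidate starts at a cross-referenced address that holds a function pointer. It then extends over consecutive function pointers until it reaches a non-function value or another cross-referenced address. `find_vtables`, its inputs and the 8-byte pointer size are assumptions for illustration. This is not the exact rule set from the paper or from angr's implementation.

```python
def find_vtables(words, start, xrefs, functions, ptr_size=8):
    """words: pointer-sized values read from a data section beginning at `start`.

    xrefs     -- set of addresses that are referenced from code
    functions -- set of known function entry addresses
    Returns a list of (vtable_addr, [function pointers]).
    """
    vtables = []
    i = 0
    while i < len(words):
        addr = start + i * ptr_size
        if addr in xrefs and words[i] in functions:
            entries = [words[i]]
            i += 1
            # Extend while entries are function pointers and no new vtable starts.
            while (i < len(words) and words[i] in functions
                   and start + i * ptr_size not in xrefs):
                entries.append(words[i])
                i += 1
            vtables.append((addr, entries))
        else:
            i += 1
    return vtables


funcs = {0x401000, 0x401100, 0x401200, 0x401300, 0x401400}
words = [0x401000, 0x401100, 0, 0x401200, 0x401300, 0x401400]
print(find_vtables(words, 0x1000, {0x1000, 0x1018, 0x1020}, funcs))
```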
- class angr.analyses.XRefsAnalysis
Bases:
ForwardAnalysis, Analysis
XRefsAnalysis recovers in-depth x-refs (cross-references) in disassembly code.
Here is an example:
.text:
    000023C8  LDR  R2, =time_now
    000023CA  LDR  R3, [R2]
    000023CC  ADDS R3, #1
    000023CE  STR  R3, [R2]
    000023D0  BX   LR
.bss:
    1FFF36F4  time_now % 4
You will have the following x-refs for time_now:
23c8 - offset
23ca - read access
23ce - write access
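The three kinds of x-refs in the example can be traced with a tiny register-tracking pass over that exact instruction sequence. `classify_xrefs` and its tuple encoding of instructions are hypothetical, and the pass understands only the handful of ARM forms used above.

```python
def classify_xrefs(insns, symbols):
    """insns: (addr, mnemonic, operands) tuples; symbols: set of symbol names.

    Returns (insn_addr, symbol, kind) with kind in {"offset", "read", "write"}.
    """
    reg_holds = {}  # register -> symbol whose address it currently holds
    xrefs = []
    for addr, mnem, ops in insns:
        # LDR Rd, =sym loads the symbol's address: an "offset" reference.
        if mnem == "LDR" and ops[1].startswith("="):
            sym = ops[1][1:]
            if sym in symbols:
                reg_holds[ops[0]] = sym
                xrefs.append((addr, sym, "offset"))
                continue
        # LDR/STR through a register holding a symbol address: read/write access.
        if mnem in ("LDR", "STR") and len(ops) > 1 and ops[1].startswith("["):
            sym = reg_holds.get(ops[1].strip("[]"))
            if sym is not None:
                xrefs.append((addr, sym, "read" if mnem == "LDR" else "write"))
        # Any instruction except STR overwrites its first operand register.
        if mnem != "STR" and ops:
            reg_holds.pop(ops[0], None)
    return xrefs


insns = [
    (0x23C8, "LDR", ["R2", "=time_now"]),
    (0x23CA, "LDR", ["R3", "[R2]"]),
    (0x23CC, "ADDS", ["R3", "#1"]),
    (0x23CE, "STR", ["R3", "[R2]"]),
    (0x23D0, "BX", ["LR"]),
]
for a, sym, kind in classify_xrefs(insns, {"time_now"}):
    print(f"{a:x} - {kind}")
```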
- angr.analyses.register_analysis(cls, name)
Submodules