angr.analyses.cfg

class angr.analyses.cfg.CFG

Bases: CFGFast

tl;dr: CFG is just a wrapper around CFGFast, kept for compatibility reasons. It will be fully replaced by CFGFast in future releases. Feel free to use CFG if you intend to use CFGFast. Please use CFGEmulated if you have to use the old, slow, dynamically-generated version of CFG.

For historical reasons, angr’s original CFG was accurate but slow, which is not what most people expect. We developed CFGFast for light-speed CFG recovery, and renamed the old CFG class to CFGEmulated. For compatibility, CFG was kept as an alias to CFGEmulated.

However, many new users of angr would load a binary and generate a CFG immediately after running “pip install angr”, and conclude that “angr’s CFG is so slow - angr must be unusable!” Therefore, we made the hard decision: CFG will be an alias to CFGFast instead of CFGEmulated.

To ease the transition of your existing code and scripts, the following changes are made:

  • A CFG class, which is a subclass of CFGFast, is created.

  • If you pass CFG any parameter that only CFGEmulated supports, angr prints a warning message to stderr and raises an exception. This exception is not a subclass of AngrError, so your old code will not capture it by mistake.

  • In the near future, this wrapper class will be removed completely, and CFG will be a simple alias to CFGFast.

We expect most interfaces to be the same between CFGFast and CFGEmulated. Some functionality (such as context sensitivity and state keeping) exists only in CFGEmulated; use CFGEmulated when you need it.
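The transition behaviour described above can be illustrated with a minimal, self-contained sketch. It does not use angr; every class name here is hypothetical, and the set of rejected parameters is only an illustrative selection of names that appear in the CFGEmulated signature but not in the CFGFast one.

```python
import sys


class AngrErrorSketch(Exception):
    """Hypothetical stand-in for the AngrError hierarchy."""


class OutdatedParameterError(Exception):
    """Deliberately NOT a subclass of AngrErrorSketch, so handlers written
    for AngrErrorSketch will not swallow it by accident."""


class FastSketch:
    """Hypothetical stand-in for the fast analysis."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


# Illustrative selection only (assumption): parameters that the emulated
# analysis accepts and the fast one does not.
EMULATED_ONLY = {"context_sensitivity_level", "keep_state", "call_depth"}


class CFGSketch(FastSketch):
    """Thin wrapper: forwards to FastSketch, rejects emulated-only options."""

    def __init__(self, **kwargs):
        outdated = sorted(kwargs.keys() & EMULATED_ONLY)
        if outdated:
            # Warn on stderr first, then raise.
            print(f"warning: emulated-only parameters: {outdated}", file=sys.stderr)
            raise OutdatedParameterError(", ".join(outdated))
        super().__init__(**kwargs)
```

With this shape, `CFGSketch(normalize=True)` behaves like the fast analysis, while `CFGSketch(keep_state=True)` fails loudly instead of silently producing a different kind of CFG.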

class angr.analyses.cfg.CFBlanket

Bases: Analysis

A Control-Flow Blanket is a representation for storing all instructions, data entries, and bytes of a full program.

Region types:

  • section

  • segment

  • extern

  • tls

  • kernel

__init__(exclude_region_types=None, on_object_added=None)
Parameters:
  • on_object_added (Callable[[int, Any], None] | None) – Callable with parameters (addr, obj) called after an object is added to the blanket.

  • exclude_region_types (set[str] | None)

property regions

Return all memory regions.

floor_addr(addr)
floor_item(addr)
floor_items(addr=None, reverse=False)
ceiling_addr(addr)
ceiling_item(addr)
ceiling_items(addr=None, reverse=False, include_first=True)
add_obj(addr, obj)

Adds an object obj to the blanket at the specified address addr.
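The floor and ceiling lookups are not described in detail here. As a rough sketch, assuming they carry the usual sorted-map meaning (floor: the greatest address less than or equal to addr; ceiling: the smallest address greater than or equal to addr), the idea can be shown without angr. The class name and the KeyError behaviour are assumptions made for illustration.

```python
import bisect


class BlanketSketch:
    """Hypothetical address-ordered container; not angr's CFBlanket."""

    def __init__(self):
        self._addrs = []  # sorted list of addresses
        self._objs = {}   # addr -> object

    def add_obj(self, addr, obj):
        # Keep the address list sorted; replace the object if addr exists.
        if addr not in self._objs:
            bisect.insort(self._addrs, addr)
        self._objs[addr] = obj

    def floor_addr(self, addr):
        # Greatest stored address <= addr.
        i = bisect.bisect_right(self._addrs, addr)
        if i == 0:
            raise KeyError(addr)
        return self._addrs[i - 1]

    def ceiling_addr(self, addr):
        # Smallest stored address >= addr.
        i = bisect.bisect_left(self._addrs, addr)
        if i == len(self._addrs):
            raise KeyError(addr)
        return self._addrs[i]

    def floor_item(self, addr):
        key = self.floor_addr(addr)
        return key, self._objs[key]


blanket = BlanketSketch()
for a in (0x1000, 0x1010, 0x1020):
    blanket.add_obj(a, f"block@{a:#x}")
```

An address that falls inside a block, such as 0x1015, then resolves to the enclosing entry through `floor_item`.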

add_function(func)

Add a function func and all blocks of this function to the blanket.

dbg_repr()

The debugging representation of this CFBlanket.

Returns:

The debugging representation of this CFBlanket.

Return type:

str

class angr.analyses.cfg.CFGArchOptions

Bases: object

Stores architecture-specific options and settings, as well as the detailed explanation of those options and settings.

Suppose ao is the CFGArchOptions object, and there is an option called ret_jumpkind_heuristics, you can access it by ao.ret_jumpkind_heuristics and set its value via ao.ret_jumpkind_heuristics = True

Variables:
  • OPTIONS (dict) – A dict of all default options for different architectures.

  • arch (archinfo.Arch) – The architecture object.

  • _options (dict) – Values of all CFG options that are specific to the current architecture.
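The attribute-style access described for ao can be sketched as follows. This is a hypothetical re-implementation for illustration, not angr's code: the class name, the type check on assignment, and the fallback to plain attributes are assumptions. The option table reuses two ARMEL entries from OPTIONS.

```python
class ArchOptionsSketch:
    """Hypothetical stand-in for CFGArchOptions."""

    # Same shape as OPTIONS: arch name -> option name -> (type, default).
    OPTIONS = {
        "ARMEL": {
            "ret_jumpkind_heuristics": (bool, True),
            "switch_mode_on_nodecode": (bool, True),
        },
    }

    def __init__(self, arch_name, **options):
        spec = self.OPTIONS.get(arch_name, {})
        # Bypass our own __setattr__ while the bookkeeping dicts are created.
        object.__setattr__(self, "_types", {k: t for k, (t, _) in spec.items()})
        object.__setattr__(self, "_options", {k: d for k, (_, d) in spec.items()})
        for name, value in options.items():
            setattr(self, name, value)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        options = self.__dict__.get("_options", {})
        if name in options:
            return options[name]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        if name in self._options:
            # Known option: validate the type and store it in _options.
            if not isinstance(value, self._types[name]):
                raise TypeError(f"{name} expects {self._types[name].__name__}")
            self._options[name] = value
        else:
            object.__setattr__(self, name, value)


ao = ArchOptionsSketch("ARMEL", switch_mode_on_nodecode=False)
ao.ret_jumpkind_heuristics = False
```

Storing the values in a single dict keeps the per-architecture defaults and the user overrides in one place while still allowing `ao.option_name` access.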

OPTIONS = {'ARMCortexM': {'has_arm_code': (<class 'bool'>, False), 'pattern_match_ifuncs': (<class 'bool'>, True), 'ret_jumpkind_heuristics': (<class 'bool'>, True), 'switch_mode_on_nodecode': (<class 'bool'>, False)}, 'ARMEL': {'has_arm_code': (<class 'bool'>, True), 'pattern_match_ifuncs': (<class 'bool'>, True), 'ret_jumpkind_heuristics': (<class 'bool'>, True), 'switch_mode_on_nodecode': (<class 'bool'>, True)}, 'ARMHF': {'has_arm_code': (<class 'bool'>, True), 'pattern_match_ifuncs': (<class 'bool'>, True), 'ret_jumpkind_heuristics': (<class 'bool'>, True), 'switch_mode_on_nodecode': (<class 'bool'>, True)}}
__init__(arch, **options)

Constructor.

Parameters:
  • arch (archinfo.Arch) – The architecture instance.

  • options (dict) – Architecture-specific options, which will be used to initialize this object.

arch = None
class angr.analyses.cfg.CFGBase

Bases: Analysis

The base class for control flow graphs.

tag: str = None
addr_type: Literal['int', 'block_id', 'soot'] = None
__init__(sort, context_sensitivity_level, normalize=False, binary=None, objects=None, regions=None, exclude_sparse_regions=True, skip_specific_regions=True, force_segment=False, base_state=None, resolve_indirect_jumps=True, indirect_jump_resolvers=None, indirect_jump_target_limit=100000, detect_tail_calls=False, low_priority=False, skip_unmapped_addrs=True, sp_tracking_track_memory=True, model=None)
Parameters:
  • sort (str) – ‘fast’ or ‘emulated’.

  • context_sensitivity_level (int) – The level of context-sensitivity of this CFG (see documentation for further details). It ranges from 0 to infinity.

  • normalize (bool) – Whether the CFG as well as all Function graphs should be normalized.

  • binary (cle.backends.Backend) – The binary to recover CFG on. By default, the main binary is used.

  • objects – A list of objects to recover the CFG on. By default, it will recover the CFG of all loaded objects.

  • regions (iterable) – A list of tuples in the form of (start address, end address) describing memory regions that the CFG should cover.

  • force_segment (bool) – Force CFGFast to rely on binary segments instead of sections.

  • base_state (angr.SimState) – A state to use as a backer for all memory loads.

  • resolve_indirect_jumps (bool) – Whether to try to resolve indirect jumps. This is necessary to resolve jump targets from jump tables, etc.

  • indirect_jump_resolvers (list) – A custom list of indirect jump resolvers. If this list is None or empty, default indirect jump resolvers specific to this architecture and binary types will be loaded.

  • indirect_jump_target_limit (int) – Maximum indirect jump targets to be recovered.

  • skip_unmapped_addrs – Ignore all branches into unmapped regions. True by default. You may want to set it to False if you are analyzing manually patched binaries or malware samples.

  • detect_tail_calls (bool) – Aggressive tail-call optimization detection. This option is only respected in make_functions().

  • sp_tracking_track_memory (bool) – Whether or not to track memory writes if tracking the stack pointer. This increases the accuracy of stack pointer tracking, especially for architectures without a base pointer. Only used if detect_tail_calls is enabled.

  • model (None or CFGModel) – The CFGModel instance to write to. A new CFGModel instance will be created and registered with the knowledge base if model is None.

Returns:

None

indirect_jumps: dict[int, IndirectJump]
property model: CFGModel

Get the CFGModel instance.

Returns:

The CFGModel instance that this analysis currently uses.

property normalized
property context_sensitivity_level
property functions

A reference to the FunctionManager in the current knowledge base.

Returns:

FunctionManager with all functions

Return type:

angr.knowledge_plugins.FunctionManager

make_copy(copy_to)

Copy self attributes to the new object.

Parameters:

copy_to (CFGBase) – The target to copy to.

Returns:

None

copy()
output()
generate_index()

Generate an index of all nodes in the graph in order to speed up get_any_node() with anyaddr=True.

Returns:

None

get_loop_back_edges()
property graph: SpillingCFG
remove_edge(block_from, block_to)
is_thumb_addr(addr)
record_memory_data_addr(addr)

Record the address of a newly added memory data object.

Return type:

None

Parameters:

addr (int)

reset_memory_data_addrs()

Reset the set of addresses of newly added memory data objects.

Return type:

None

normalize()

Normalize the CFG, making sure that there are no overlapping basic blocks.

Note that this method will not alter transition graphs of each function in self.kb.functions. You may call normalize() on each Function object to normalize their transition graphs.

Returns:

None
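To make the goal of normalization concrete, here is a simplified sketch that operates on plain (address, size) pairs. It is not angr's algorithm: it simply truncates each block at the start of the next block that begins inside it, which is one way to reach a state with no overlapping blocks. The function name and the input shape are assumptions.

```python
def normalize_blocks(blocks):
    """Return sorted (addr, size) pairs in which no two blocks overlap.

    blocks: iterable of (addr, size) pairs, possibly overlapping.
    """
    sizes = {}
    for addr, size in blocks:
        # If the same address appears twice, keep the larger block.
        sizes[addr] = max(size, sizes.get(addr, 0))

    starts = sorted(sizes)
    result = []
    for i, addr in enumerate(starts):
        end = addr + sizes[addr]
        # Truncate the block if the next block starts before it ends.
        if i + 1 < len(starts) and starts[i + 1] < end:
            end = starts[i + 1]
        result.append((addr, end - addr))
    return result


overlapping = [(0x100, 0x20), (0x110, 0x10), (0x200, 0x8)]
normalized = normalize_blocks(overlapping)
```

Here the block at 0x100 originally covers the block at 0x110; after normalization it ends where the second block begins.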

mark_function_alignments()

Find all potential function alignments and mark them.

Note that it is not always correct to simply remove them, because these functions may not be actual alignments but part of an actual function that was incorrectly marked as an individual function because of failures in resolving indirect jumps. An example is in the test binary x86_64/dir_gcc_-O0 0x40541d (indirect jump at 0x4051b0). If the indirect jump cannot be correctly resolved, removing function 0x40541d will cause a missing label failure in reassembler.

Returns:

None

make_functions()

Revisit the entire control flow graph, create Function instances accordingly, and correctly put blocks into each function.

Although Function objects are created during the CFG recovery, they are neither sound nor accurate. With a pre-constructed CFG, this method rebuilds all functions according to the following rules:

  • A block may only belong to one function.

  • Small functions lying inside the startpoint and the endpoint of another function will be merged with the other function

  • Tail call optimizations are detected.

  • PLT stubs are aligned by 16.

Returns:

None

class angr.analyses.cfg.CFGEmulated

Bases: ForwardAnalysis, CFGBase

This class represents a control-flow graph.

tag: str = 'CFGEmulated'
addr_type: Literal['int', 'block_id', 'soot'] = 'block_id'
__init__(context_sensitivity_level=1, start=None, avoid_runs=None, enable_function_hints=False, call_depth=None, call_tracing_filter=None, initial_state=None, starts=None, keep_state=False, indirect_jump_target_limit=100000, resolve_indirect_jumps=True, enable_advanced_backward_slicing=False, enable_symbolic_back_traversal=False, indirect_jump_resolvers=None, additional_edges=None, no_construct=False, normalize=False, max_iterations=1, address_whitelist=None, base_graph=None, iropt_level=None, max_steps=None, state_add_options=None, state_remove_options=None, model=None)

All parameters are optional.

Parameters:
  • context_sensitivity_level – The level of context-sensitivity of this CFG (see documentation for further details). It ranges from 0 to infinity. Default 1.

  • avoid_runs – A list of runs to avoid.

  • enable_function_hints – Whether to use function hints (constants that might be used as exit targets) or not.

  • call_depth – How deep in the call stack to trace.

  • call_tracing_filter – Filter to apply on a given path and jumpkind to determine if it should be skipped when call_depth is reached.

  • initial_state – An initial state to use to begin analysis.

  • starts (iterable) – A collection of starting points to begin analysis. It can contain the following three different types of entries: an address specified as an integer, a 2-tuple that includes an integer address and a jumpkind, or a SimState instance. Unsupported entries in starts will lead to an AngrCFGError being raised.

  • keep_state – Whether to keep the SimStates for each CFGNode.

  • resolve_indirect_jumps – Whether to enable the indirect jump resolvers for resolving indirect jumps

  • enable_advanced_backward_slicing – Whether to enable an intensive technique for resolving indirect jumps

  • enable_symbolic_back_traversal – Whether to enable an intensive technique for resolving indirect jumps

  • indirect_jump_resolvers (list) – A custom list of indirect jump resolvers. If this list is None or empty, default indirect jump resolvers specific to this architecture and binary types will be loaded.

  • additional_edges – A dict mapping addresses of basic blocks to addresses of successors to manually include and analyze forward from.

  • no_construct (bool) – Skip the construction procedure. Only used in unit-testing.

  • normalize (bool) – If the CFG as well as all Function graphs should be normalized or not.

  • max_iterations (int) – The maximum number of iterations that each basic block should be “executed”. 1 by default. Larger numbers of iterations are usually required for complex analyses like loop analysis.

  • address_whitelist (iterable) – A list of allowed addresses. Any basic blocks outside of this collection of addresses will be ignored.

  • base_graph (networkx.DiGraph) – A basic control flow graph to follow. Each node inside this graph must have the following properties: addr and size. CFG recovery will strictly follow nodes and edges shown in the graph, and discard any control flow that does not follow an existing edge in the base graph. For example, you can pass in a Function local transition graph as the base graph, and CFGEmulated will traverse nodes and edges and extract useful information.

  • iropt_level (int) – The optimization level of VEX IR (0, 1, 2). The default level will be used if iropt_level is None.

  • max_steps (int) – The maximum number of basic blocks to recover for the longest path from each start before pausing the recovery procedure.

  • state_add_options – State options that will be added to the initial state.

  • state_remove_options – State options that will be removed from the initial state.

copy()

Make a copy of the CFG.

Return type:

CFGEmulated

Returns:

A copy of the CFG instance.

resume(starts=None, max_steps=None)

Resume a paused or terminated control flow graph recovery.

Parameters:
  • starts (iterable) – A collection of new starts to resume from. If starts is None, we will resume CFG recovery from where it was paused before.

  • max_steps (int) – The maximum number of blocks on the longest path starting from each start before pausing the recovery.

Returns:

None

remove_cycles()

Forces the graph to become acyclic by removing all loop back edges and edges between overlapped loop headers and their successors.
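As a rough illustration of the back-edge part of this idea, the sketch below finds back edges with a depth-first search over a plain adjacency dict and drops them, which leaves the reachable part of the graph acyclic. It is a textbook approach, not angr's implementation, and it does not model the handling of overlapped loop headers. The function names and the graph representation are assumptions.

```python
def find_back_edges(graph, start):
    """Return edges (u, v) where v is still on the DFS stack when seen from u.

    graph: dict mapping every node to a list of its successors.
    """
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}
    back_edges = []

    color[start] = GRAY
    stack = [(start, iter(graph[start]))]
    while stack:
        node, successors = stack[-1]
        for succ in successors:
            if color[succ] == GRAY:
                # succ is an ancestor on the current DFS path: a back edge.
                back_edges.append((node, succ))
            elif color[succ] == WHITE:
                color[succ] = GRAY
                stack.append((succ, iter(graph[succ])))
                break
        else:
            # All successors handled; the node is finished.
            color[node] = BLACK
            stack.pop()
    return back_edges


def remove_back_edges(graph, start):
    """Return a copy of graph without the back edges found from start."""
    doomed = set(find_back_edges(graph, start))
    return {n: [s for s in succs if (n, s) not in doomed]
            for n, succs in graph.items()}


loop_graph = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": []}
acyclic = remove_back_edges(loop_graph, "A")
```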

downsize()

Remove saved states from all CFGNodes to reduce memory usage.

Returns:

None

unroll_loops(max_loop_unrolling_times)

Unroll loops for each function. The resulting CFG may still contain loops due to recursion, function calls, etc.

Parameters:

max_loop_unrolling_times (int) – The maximum iterations of unrolling.

Returns:

None

force_unroll_loops(max_loop_unrolling_times)

Unroll loops globally. The resulting CFG does not contain any loop, but this method is slow on large graphs.

Parameters:

max_loop_unrolling_times (int) – The maximum iterations of unrolling.

Returns:

None

immediate_dominators(start, target_graph=None)

Get the immediate dominators of the sub-graph starting from the given node.

Parameters:
  • start (str) – id of the node to navigate forwards from.

  • target_graph (networkx.classes.digraph.DiGraph) – graph to analyse, default is self.graph.

Returns:

A dict that maps each node of the graph to its immediate dominator.

Return type:

dict
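For readers unfamiliar with the shape of this result, the following self-contained sketch computes an immediate-dominator mapping over a plain adjacency dict using the well-known iterative algorithm by Cooper, Harvey, and Kennedy. It is not the code angr runs; the function name, the graph representation, and the convention that the start node maps to itself are assumptions.

```python
def immediate_dominators_sketch(graph, start):
    """Map every node reachable from start to its immediate dominator."""
    # 1. Postorder numbering via iterative DFS.
    order, seen = [], {start}
    stack = [(start, iter(graph.get(start, ())))]
    while stack:
        node, successors = stack[-1]
        for succ in successors:
            if succ not in seen:
                seen.add(succ)
                stack.append((succ, iter(graph.get(succ, ()))))
                break
        else:
            order.append(node)
            stack.pop()
    index = {node: i for i, node in enumerate(order)}

    # 2. Predecessor lists restricted to reachable nodes.
    preds = {node: [] for node in order}
    for node in order:
        for succ in graph.get(node, ()):
            preds[succ].append(node)

    idom = {start: start}

    def intersect(a, b):
        # Walk both nodes up the dominator tree until they meet.
        while a != b:
            while index[a] < index[b]:
                a = idom[a]
            while index[b] < index[a]:
                b = idom[b]
        return a

    # 3. Iterate in reverse postorder until nothing changes.
    changed = True
    while changed:
        changed = False
        for node in reversed(order):
            if node == start:
                continue
            done = [p for p in preds[node] if p in idom]
            new = done[0]
            for p in done[1:]:
                new = intersect(new, p)
            if idom.get(node) != new:
                idom[node] = new
                changed = True
    return idom


diamond = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
idoms = immediate_dominators_sketch(diamond, "A")
```

In the diamond above, D can be reached through either B or C, so neither dominates it and its immediate dominator is A.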

immediate_postdominators(end, target_graph=None)

Get the immediate postdominators of the sub-graph starting from the given node.

Parameters:
  • end (str) – id of the node to navigate from.

  • target_graph (networkx.classes.digraph.DiGraph) – graph to analyse, default is self.graph.

Returns:

A dict that maps each node of the graph to its immediate postdominator.

Return type:

dict

remove_fakerets()

Get rid of fake returns (i.e., Ijk_FakeRet edges) from this CFG

Returns:

None

get_topological_order(cfg_node)

Get the topological order of a CFG Node.

Parameters:

cfg_node – A CFGNode instance.

Returns:

An integer representing its order, or None if the CFGNode does not exist in the graph.

get_subgraph(starting_node, block_addresses)

Get a sub-graph out of a bunch of basic block addresses.

Parameters:
  • starting_node (CFGNode) – The beginning of the subgraph

  • block_addresses (iterable) – A collection of block addresses. A CFGNode with one of these addresses is included in the subgraph if there is a path between starting_node and that CFGNode; all nodes on the path are included as well.

Returns:

A new CFG that only contains the specific subgraph.

Return type:

CFGEmulated
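The selection rule described for block_addresses (keep a target if it is reachable from the starting node, and keep every node on a path to it) can be sketched on a plain adjacency dict. This is an illustration only: nodes are bare addresses rather than CFGNode instances, and the function names are hypothetical.

```python
def reachable(graph, sources):
    """Return all nodes reachable from any node in sources."""
    seen = set(sources)
    work = list(sources)
    while work:
        node = work.pop()
        for succ in graph.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                work.append(succ)
    return seen


def subgraph_between(graph, start, targets):
    """Keep only nodes that lie on some path from start to a target."""
    # Build the reversed graph so we can search backwards from the targets.
    reverse = {}
    for node, succs in graph.items():
        for succ in succs:
            reverse.setdefault(succ, []).append(node)

    forward = reachable(graph, [start])
    backward = reachable(reverse, [t for t in targets if t in forward])
    keep = forward & backward
    return {n: [s for s in graph.get(n, ()) if s in keep] for n in keep}


cfg_like = {1: [2, 3], 2: [4], 3: [5], 4: [], 5: []}
sub = subgraph_between(cfg_like, 1, [4])
```

A node is kept exactly when it is reachable from the start and can itself reach a requested target, so the branch through 3 and 5 is dropped in this example.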

get_function_subgraph(start, max_call_depth=None)

Get a sub-graph of a certain function.

Parameters:
  • start – The function start. Currently it should be an integer.

  • max_call_depth – Call depth limit. None indicates no limit.

Returns:

A CFG instance which is a sub-graph of self.graph

property context_sensitivity_level
property graph: SpillingCFG
property unresolvables

Get those SimRuns that have non-resolvable exits.

Returns:

A set of SimRuns

Return type:

set

property deadends

Get all CFGNodes that have an out-degree of 0.

Returns:

A list of CFGNode instances

Return type:

list

class angr.analyses.cfg.CFGFast

Bases: ForwardAnalysis[CFGNode, CFGNode, CFGJob, int, object], CFGBase

CFGFast finds functions inside the given binary and builds a control-flow graph very quickly: instead of simulating program execution, keeping track of states, and performing expensive data-flow analysis, CFGFast performs only lightweight analyses combined with heuristics and some strong assumptions.

In order to identify as many functions as possible, and as accurately as possible, the following operation sequence is followed:

# Active scanning

  • If the binary has “function symbols” (TODO: this term is not accurate enough), they are starting points of the code scanning

  • If the binary does not have any “function symbol”, we will first perform a function prologue scanning on the entire binary, and start from those places that look like function beginnings

  • Otherwise, the binary’s entry point will be the starting point for scanning

# Passive scanning

  • After all active scans are done, we will go through the whole image and scan all code pieces

Due to the nature of those techniques that are used here, a base address is often not required to use this analysis routine. However, with a correct base address, CFG recovery will almost always yield a much better result. A custom analysis, called GirlScout, is specifically made to recover the base address of a binary blob. After the base address is determined, you may want to reload the binary with the new base address by creating a new Project object, and then re-recover the CFG.
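The function prologue scanning mentioned under active scanning can be pictured with a small byte-pattern search. The sketch below looks for one common x86-64 prologue, `push rbp; mov rbp, rsp` (bytes 55 48 89 e5), in a raw code buffer. The choice of pattern, the function name, and the base address are assumptions for illustration; real prologue sets are architecture-specific and a match is only a hint that a function may begin there.

```python
import re

# push rbp; mov rbp, rsp (one common x86-64 prologue, used here as an example)
PROLOGUE = re.compile(rb"\x55\x48\x89\xe5")


def scan_prologues(code, base_addr):
    """Return candidate function start addresses found in a code buffer."""
    return [base_addr + match.start() for match in PROLOGUE.finditer(code)]


code = (
    b"\x90\x90"              # two nops of padding
    b"\x55\x48\x89\xe5\xc3"  # prologue followed by ret
    b"\x55\x48\x89\xe5\xc3"  # a second candidate function
)
candidates = scan_prologues(code, 0x400000)
```

Because the reported addresses are computed from base_addr, a wrong base address shifts every candidate, which is one reason a correct base address improves recovery.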

PRINTABLES = b'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r'
SPECIAL_THUNKS = {'AMD64': {b'\xe8\x07\x00\x00\x00\xf3\x90\x0f\xae\xe8\xeb\xf9H\x89\x04$\xc3': ('jmp', 'rax'), b'\xe8\x07\x00\x00\x00\xf3\x90\x0f\xae\xe8\xeb\xf9H\x8dd$\x08\xc3': ('ret',)}}
tag: str = 'CFGFast'
addr_type: Literal['int', 'block_id', 'soot'] = 'int'
__init__(binary=None, objects=None, regions=None, pickle_intermediate_results=False, symbols=True, function_prologues=None, resolve_indirect_jumps=True, force_segment=False, force_smart_scan=None, force_complete_scan=False, indirect_jump_target_limit=100000, data_references=True, cross_references=False, normalize=False, start_at_entry=True, function_starts=None, extra_memory_regions=None, data_type_guessing_handlers=None, arch_options=None, indirect_jump_resolvers=None, base_state=None, exclude_sparse_regions=True, skip_specific_regions=True, heuristic_plt_resolving=None, detect_tail_calls=False, low_priority=False, cfb=None, model=None, eh_frame=True, exceptions=True, skip_unmapped_addrs=True, nodecode_window_size=512, nodecode_threshold=0.3, nodecode_step=16483, check_funcret_max_job=500, indirect_calls_always_return=None, jumptable_resolver_resolves_calls=None, retedges=False, drop_bad_funcs=True, start=None, end=None, collect_data_references=None, extra_cross_references=None, elf_eh_frame=None, **extra_arch_options)
Parameters:
  • binary – The binary to recover CFG on. By default the main binary is used.

  • objects – A list of objects to recover the CFG on. By default it will recover the CFG of all loaded objects.

  • regions (iterable) – A list of tuples in the form of (start address, end address) describing memory regions that the CFG should cover.

  • pickle_intermediate_results (bool) – If we want to store the intermediate results or not.

  • symbols (bool) – Get function beginnings from symbols in the binary.

  • function_prologues (bool | None) – Scan the binary for function prologues, and use those positions as function beginnings

  • resolve_indirect_jumps (bool) – Try to resolve indirect jumps. This is necessary to resolve jump targets from jump tables, etc.

  • force_segment (bool) – Force CFGFast to rely on binary segments instead of sections.

  • force_complete_scan (bool) – Perform a complete scan on the binary and maximize the number of identified code blocks.

  • data_references (bool) – Enables the collection of references to data used by individual instructions. This does not collect ‘cross-references’, particularly those that involve multiple instructions. For that, see cross_references

  • cross_references (bool) – Whether CFGFast should collect “cross-references” from the entire program or not. This will populate the knowledge base with references to and from each recognizable address constant found in the code. Note that, because this performs constant propagation on the entire program, it may be much slower and consume more memory. This option implies data_references=True.

  • normalize (bool) – Normalize the CFG as well as all function graphs after CFG recovery.

  • start_at_entry (bool) – Begin CFG recovery at the entry point of this project. Setting it to False prevents CFGFast from viewing the entry point as one of the starting points of code scanning.

  • function_starts (list) – A list of extra function starting points. CFGFast will try to resume scanning from each address in the list.

  • extra_memory_regions (list) – A list of 2-tuple (start-address, end-address) that shows extra memory regions. Integers falling inside will be considered as pointers.

  • indirect_jump_resolvers (list) – A custom list of indirect jump resolvers. If this list is None or empty, default indirect jump resolvers specific to this architecture and binary types will be loaded.

  • base_state – A state to use as a backer for all memory loads

  • detect_tail_calls (bool) – Enable aggressive tail-call optimization detection.

  • eh_frame (bool) – Retrieve function starts (and maybe sizes later) from the .eh_frame of ELF binaries or exception records of PE binaries.

  • skip_unmapped_addrs – Ignore all branches into unmapped regions. True by default. You may want to set it to False if you are analyzing manually patched binaries or malware samples.

  • indirect_calls_always_return (bool | None) – Whether the CFG should assume that indirect calls always return. Assuming indirect calls must return will significantly reduce the number of constant propagation runs, but may reduce the overall CFG recovery precision when facing non-returning indirect calls. By default, we only assume indirect calls always return for large binaries (region > 50 KB).

  • jumptable_resolver_resolves_calls (bool | None) – Whether JumpTableResolver should resolve indirect calls or not. Most indirect calls in C++ binaries or UEFI binaries cannot be resolved using jump table resolver and must be resolved using their specific resolvers. By default, we will only disable JumpTableResolver from resolving indirect calls for large binaries (region > 50 KB).

  • check_funcret_max_job – When popping return-site jobs out of the job queue, angr will prioritize jobs for which the callee is known to return. This check may be slow when there are a large number of jobs in different caller functions, and this situation often occurs in obfuscated binaries where many functions never return. This parameter acts as a threshold: the check is disabled when the number of jobs in the queue exceeds it.

  • start (int) – (Deprecated) The beginning address of CFG recovery.

  • end (int) – (Deprecated) The end address of CFG recovery.

  • arch_options (CFGArchOptions) – Architecture-specific options.

  • extra_arch_options – Any key-value pair in kwargs will be seen as an arch-specific option and will be used to set the option value in self._arch_options.

  • retedges (bool) – Whether to add return edges (from function endpoints to their return sites) in the CFG. Return edges are not added by default because they are often not useful during analysis; you can set retedges to True or call make_return_edges() after CFG recovery to create return edges. Note that this option does not impact function graphs.

  • progress_callback – (Inherited from angr.Analysis.) Callback for CFG recovery progress.

  • show_progressbar (bool) – (Inherited from angr.Analysis.) Show a progressbar during CFG recovery.

  • force_smart_scan (bool | None)

  • drop_bad_funcs (bool)

Returns:

None

stage: str
property graph: SpillingCFG
property memory_data
property jump_tables
property insn_addr_to_memory_data
do_full_xrefs(overlay_state=None)

Perform xref recovery on all functions.

Parameters:

overlay_state (SimState) – An overlay state for loading constant data.

Returns:

None

drop_bad_functions()
make_return_edges()

For each returning function, create return edges in self.graph.

Returns:

None
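What a return edge is can be shown with a small sketch on plain data. Return edges run from a function's endpoints (its returning blocks) to the return sites of its callers. The data shapes and the function name below are hypothetical and chosen only for illustration; this is not how angr stores calls or endpoints.

```python
def add_return_edges(edges, calls, endpoints):
    """Return a new edge set with return edges added.

    edges:     set of (src_addr, dst_addr) pairs already in the graph.
    calls:     list of (callee_addr, return_site_addr) pairs.
    endpoints: dict mapping callee_addr to the addresses of its returning
               blocks; a function that never returns has no entry.
    """
    result = set(edges)
    for callee, return_site in calls:
        for endpoint in endpoints.get(callee, ()):
            result.add((endpoint, return_site))
    return result


edges = {(0x1000, 0x2000)}                  # call edge: caller -> callee
calls = [(0x2000, 0x1005), (0x3000, 0x100A)]
endpoints = {0x2000: [0x2010]}              # 0x3000 never returns
with_returns = add_return_edges(edges, calls, endpoints)
```

Only the returning function at 0x2000 contributes an edge; the call to 0x3000 gets none because no endpoint is known for it.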

copy()
output()
class angr.analyses.cfg.CFGFastSoot

Bases: CFGFast

addr_type: Literal['int', 'block_id', 'soot'] = 'soot'
drop_bad_functions()
make_functions()

Revisit the entire control flow graph, create Function instances accordingly, and correctly put blocks into each function.

Although Function objects are created during the CFG recovery, they are neither sound nor accurate. With a pre-constructed CFG, this method rebuilds all functions according to the following rules:

  • A block may only belong to one function.

  • Small functions lying inside the startpoint and the endpoint of another function will be merged with the other function

  • Tail call optimizations are detected.

  • PLT stubs are aligned by 16.

Returns:

None

Submodules

cfb

cfg

cfg_arch_options

cfg_base

cfg_emulated

cfg_fast

cfg_fast_soot

cfg_job_base

indirect_jump_resolvers

meta_structs

pe_msvc_eh_structs

Struct definitions and parsing utilities for MSVC C++ exception handling structures found in x86 PE binaries.