angr.analyses.cfg.cfg_fast

exception angr.analyses.cfg.cfg_fast.ContinueScanningNotification

Bases: RuntimeError

A notification raised by _next_code_addr_core() to indicate no code address is found and _next_code_addr_core() should be invoked again.

class angr.analyses.cfg.cfg_fast.ARMDecodingMode

Bases: object

Enums indicating decoding mode for ARM code.

ARM = 0
THUMB = 1
class angr.analyses.cfg.cfg_fast.DecodingAssumption

Bases: object

Describes the decoding mode (ARM/THUMB) for a given basic block identified by its address.

__init__(addr, size, mode)
Parameters:
add_data_seg(addr, size)
Return type:

None

Parameters:
class angr.analyses.cfg.cfg_fast.FunctionReturn

Bases: object

FunctionReturn describes a function call at a specific location and its return location. Instances are hashable and equatable.

__init__(callee_func_addr, caller_func_addr, call_site_addr, return_to)
callee_func_addr
caller_func_addr
call_site_addr
return_to
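
Because instances are hashable and equatable, duplicates collapse when stored in a set. The following is a minimal sketch of that property using a hypothetical stand-in class (FunctionReturnSketch is not part of angr, and the addresses are made up):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionReturnSketch:
    """Hypothetical stand-in for FunctionReturn; not angr's implementation."""

    callee_func_addr: int
    caller_func_addr: int
    call_site_addr: int
    return_to: int


# Two records describing the same call site and return location.
first = FunctionReturnSketch(0x401000, 0x400500, 0x400520, 0x400525)
second = FunctionReturnSketch(0x401000, 0x400500, 0x400520, 0x400525)

# Equal field values compare equal and hash identically,
# so a set keeps only one copy.
pending_returns = {first, second}
```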
class angr.analyses.cfg.cfg_fast.PendingJobs

Bases: object

A collection of pending jobs during CFG recovery.

__init__(kb, deregister_job_callback)
add_job(job)
pop_job(returning=True)

Pop a job from the pending jobs list.

When returning == True, jobs whose functions are known to return (function.returning is True) are prioritized. As an optimization, the pending jobs list is sorted according to job.function.returning.

Parameters:

returning (bool) – Only pop a pending job if the corresponding function returns.

Returns:

A pending job if we can find one, or None if we cannot find any that satisfies the requirement.

Return type:

angr.analyses.cfg.cfg_fast.CFGJob
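
The prioritization described above can be sketched as follows. This is an illustrative model only: PendingJobSketch and pop_job_sketch are hypothetical names, and the sketch assumes that returning=False means any pending job may be popped.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PendingJobSketch:
    """Hypothetical pending job: an address plus what is known about its callee."""

    addr: int
    # True = callee returns, False = callee never returns, None = unknown
    callee_returning: Optional[bool]


def pop_job_sketch(
    pending: List[PendingJobSketch], returning: bool = True
) -> Optional[PendingJobSketch]:
    """Pop a job, optionally restricted to callees that are known to return."""
    if returning:
        for index, job in enumerate(pending):
            if job.callee_returning is True:
                return pending.pop(index)
        return None  # nothing satisfies the requirement
    # Assumption: without the restriction, any pending job is acceptable.
    return pending.pop() if pending else None


pending = [
    PendingJobSketch(0x10, None),
    PendingJobSketch(0x20, True),
    PendingJobSketch(0x30, False),
]
first_pop = pop_job_sketch(pending)                    # the job at 0x20
second_pop = pop_job_sketch(pending)                   # no returning callee left
third_pop = pop_job_sketch(pending, returning=False)   # any remaining job
```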

cleanup()

Remove those pending exits if: a) they are the return exits of non-returning SimProcedures; b) they are the return exits of non-returning syscalls; or c) they are the return exits of non-returning functions.

Returns:

None

add_returning_function(func_addr)

Mark a function as returning.

Parameters:

func_addr (int) – Address of the function that returns.

Returns:

None

add_nonreturning_function(func_addr)

Mark a function as not returning.

Parameters:

func_addr (int) – Address of the function that does not return.

Returns:

None

clear_updated_functions()

Clear the updated_functions set.

Returns:

None

class angr.analyses.cfg.cfg_fast.FunctionEdge

Bases: object

Describes an edge in functions’ transition graphs. Base class for all types of edges.

apply(cfg)
ins_addr
src_func_addr
stmt_idx
class angr.analyses.cfg.cfg_fast.FunctionTransitionEdge

Bases: FunctionEdge

Describes a transition edge in functions’ transition graphs.

__init__(src_node, dst_addr, src_func_addr, to_outside=False, dst_func_addr=None, stmt_idx=None, ins_addr=None, is_exception=False)
src_node
dst_addr
to_outside
dst_func_addr
is_exception
apply(cfg)
class angr.analyses.cfg.cfg_fast.FunctionCallEdge

Bases: FunctionEdge

Describes a call edge in functions’ transition graphs.

__init__(src_node, dst_addr, ret_addr, src_func_addr, syscall=False, stmt_idx=None, ins_addr=None)
src_node
dst_addr
ret_addr
syscall
apply(cfg)
class angr.analyses.cfg.cfg_fast.FunctionFakeRetEdge

Bases: FunctionEdge

Describes a FakeReturn (also called fall-through) edge in functions’ transition graphs.

__init__(src_node, dst_addr, src_func_addr, confirmed=None)
src_node
dst_addr
confirmed
apply(cfg)
class angr.analyses.cfg.cfg_fast.FunctionReturnEdge

Bases: FunctionEdge

Describes a return (from a function call or a syscall) edge in functions’ transition graphs.

__init__(ret_from_addr, ret_to_addr, dst_func_addr)
ret_from_addr
ret_to_addr
dst_func_addr
apply(cfg)
class angr.analyses.cfg.cfg_fast.CFGJobType

Bases: Enum

Defines the type of work of a CFGJob.

NORMAL = 0
FUNCTION_PROLOGUE = 1
COMPLETE_SCANNING = 2
IFUNC_HINTS = 3
DATAREF_HINTS = 4
EH_FRAME_HINTS = 5
class angr.analyses.cfg.cfg_fast.CFGJob

Bases: object

Defines a job to work on during CFG recovery.

__init__(addr, func_addr, jumpkind, ret_target=None, last_addr=None, src_node=None, src_ins_addr=None, src_stmt_idx=None, returning_source=None, syscall=False, func_edges=None, job_type=CFGJobType.NORMAL, gp=None)
Parameters:
  • addr (int)

  • func_addr (int)

  • jumpkind (str)

  • ret_target (int | None)

  • last_addr (int | None)

  • src_node (CFGNode | None)

  • src_ins_addr (int | None)

  • src_stmt_idx (int | None)

  • syscall (bool)

  • func_edges (list | None)

  • job_type (CFGJobType)

  • gp (int | None)

addr
func_addr
jumpkind
ret_target
last_addr
src_node
src_ins_addr
src_stmt_idx
returning_source
syscall
job_type
gp
add_function_edge(edge)
apply_function_edges(cfg, clear=False)
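
The add_function_edge() / apply_function_edges() pair suggests a deferred-application pattern: edges are collected on the job and applied later in one pass. The sketch below models that pattern with hypothetical classes (EdgeSketch and DeferredJobSketch are not angr types, and a plain dict stands in for the cfg argument):

```python
class EdgeSketch:
    """Hypothetical edge: records a transition and applies it to a graph later."""

    def __init__(self, src_func_addr, src_addr, dst_addr):
        self.src_func_addr = src_func_addr
        self.src_addr = src_addr
        self.dst_addr = dst_addr

    def apply(self, graphs):
        # graphs maps a function address to a set of (src, dst) pairs.
        graphs.setdefault(self.src_func_addr, set()).add(
            (self.src_addr, self.dst_addr)
        )


class DeferredJobSketch:
    """Hypothetical job that queues edges instead of applying them immediately."""

    def __init__(self):
        self.func_edges = []

    def add_function_edge(self, edge):
        self.func_edges.append(edge)

    def apply_function_edges(self, graphs, clear=False):
        for edge in self.func_edges:
            edge.apply(graphs)
        if clear:
            self.func_edges = []


job = DeferredJobSketch()
job.add_function_edge(EdgeSketch(0x400000, 0x400000, 0x400010))
job.add_function_edge(EdgeSketch(0x400000, 0x400010, 0x400020))

graphs = {}
job.apply_function_edges(graphs, clear=True)
```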
class angr.analyses.cfg.cfg_fast.CFGFast

Bases: ForwardAnalysis[CFGNode, CFGNode, CFGJob, int, object], CFGBase

CFGFast finds functions inside the given binary and builds a control-flow graph quickly: instead of simulating program execution, keeping track of states, and performing expensive data-flow analysis, CFGFast performs only light-weight analyses combined with some heuristics and some strong assumptions.

In order to identify as many functions as possible, and as accurately as possible, the following operation sequence is followed:

# Active scanning

  • If the binary has “function symbols” (TODO: this term is not accurate enough), they are starting points of the code scanning

  • If the binary does not have any “function symbol”, we will first perform a function prologue scanning on the entire binary, and start from those places that look like function beginnings

  • Otherwise, the binary’s entry point will be the starting point for scanning

# Passive scanning

  • After all active scans are done, we will go through the whole image and scan all code pieces

Due to the nature of those techniques that are used here, a base address is often not required to use this analysis routine. However, with a correct base address, CFG recovery will almost always yield a much better result. A custom analysis, called GirlScout, is specifically made to recover the base address of a binary blob. After the base address is determined, you may want to reload the binary with the new base address by creating a new Project object, and then re-recover the CFG.
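
A minimal sketch of the reload step, assuming the binary is a raw blob loaded through CLE's blob backend with the main_opts keys backend, arch, base_addr, and entry_point. The helper names, the file path, and the addresses are hypothetical; verify the loader options against the CLE version you use.

```python
def blob_load_options(base_addr, arch, entry_point=None):
    """Build loader options for a raw blob at a known base address."""
    opts = {"backend": "blob", "arch": arch, "base_addr": base_addr}
    if entry_point is not None:
        opts["entry_point"] = entry_point
    return opts


def recover_cfg_at_base(path, base_addr, arch):
    """Create a new Project at `base_addr` and re-run CFGFast on it."""
    import angr  # imported lazily so blob_load_options works without angr

    project = angr.Project(path, main_opts=blob_load_options(base_addr, arch))
    return project.analyses.CFGFast()


# Hypothetical values for an ARM firmware image.
opts = blob_load_options(0x08000000, "ARMEL", entry_point=0x08000100)
```

Calling recover_cfg_at_base("firmware.bin", 0x08000000, "ARMEL") would then repeat CFG recovery against the relocated image.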

PRINTABLES = b'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r'
SPECIAL_THUNKS = {'AMD64': {b'\xe8\x07\x00\x00\x00\xf3\x90\x0f\xae\xe8\xeb\xf9H\x89\x04$\xc3': ('jmp', 'rax'), b'\xe8\x07\x00\x00\x00\xf3\x90\x0f\xae\xe8\xeb\xf9H\x8dd$\x08\xc3': ('ret',)}}
tag: str = 'CFGFast'
addr_type: Literal['int', 'block_id', 'soot'] = 'int'
__init__(binary=None, objects=None, regions=None, pickle_intermediate_results=False, symbols=True, function_prologues=None, resolve_indirect_jumps=True, force_segment=False, force_smart_scan=None, force_complete_scan=False, indirect_jump_target_limit=100000, data_references=True, cross_references=False, normalize=False, start_at_entry=True, function_starts=None, extra_memory_regions=None, data_type_guessing_handlers=None, arch_options=None, indirect_jump_resolvers=None, base_state=None, exclude_sparse_regions=True, skip_specific_regions=True, heuristic_plt_resolving=None, detect_tail_calls=False, low_priority=False, cfb=None, model=None, eh_frame=True, exceptions=True, skip_unmapped_addrs=True, nodecode_window_size=512, nodecode_threshold=0.3, nodecode_step=16483, check_funcret_max_job=500, indirect_calls_always_return=None, jumptable_resolver_resolves_calls=None, retedges=False, drop_bad_funcs=True, start=None, end=None, collect_data_references=None, extra_cross_references=None, elf_eh_frame=None, **extra_arch_options)
Parameters:
  • binary – The binary to recover CFG on. By default the main binary is used.

  • objects – A list of objects to recover the CFG on. By default it will recover the CFG of all loaded objects.

  • regions (iterable) – A list of tuples in the form of (start address, end address) describing memory regions that the CFG should cover.

  • pickle_intermediate_results (bool) – If we want to store the intermediate results or not.

  • symbols (bool) – Get function beginnings from symbols in the binary.

  • function_prologues (bool | None) – Scan the binary for function prologues, and use those positions as function beginnings.

  • resolve_indirect_jumps (bool) – Try to resolve indirect jumps. This is necessary to resolve jump targets from jump tables, etc.

  • force_segment (bool) – Force CFGFast to rely on binary segments instead of sections.

  • force_complete_scan (bool) – Perform a complete scan on the binary and maximize the number of identified code blocks.

  • data_references (bool) – Enables the collection of references to data used by individual instructions. This does not collect “cross-references”, particularly those that involve multiple instructions. For that, see cross_references.

  • cross_references (bool) – Whether CFGFast should collect “cross-references” from the entire program or not. This will populate the knowledge base with references to and from each recognizable address constant found in the code. Note that, because this performs constant propagation on the entire program, it may be much slower and consume more memory. This option implies data_references=True.

  • normalize (bool) – Normalize the CFG as well as all function graphs after CFG recovery.

  • start_at_entry (bool) – Begin CFG recovery at the entry point of this project. Setting it to False prevents CFGFast from viewing the entry point as one of the starting points of code scanning.

  • function_starts (list) – A list of extra function starting points. CFGFast will try to resume scanning from each address in the list.

  • extra_memory_regions (list) – A list of 2-tuples (start address, end address) describing extra memory regions. Integers falling inside these regions will be considered pointers.

  • indirect_jump_resolvers (list) – A custom list of indirect jump resolvers. If this list is None or empty, default indirect jump resolvers specific to this architecture and binary types will be loaded.

  • base_state – A state to use as a backer for all memory loads.

  • detect_tail_calls (bool) – Enable aggressive tail-call optimization detection.

  • eh_frame (bool) – Retrieve function starts (and maybe sizes later) from the .eh_frame of ELF binaries or exception records of PE binaries.

  • skip_unmapped_addrs – Ignore all branches into unmapped regions. True by default. You may want to set it to False if you are analyzing manually patched binaries or malware samples.

  • indirect_calls_always_return (bool | None) – Whether the CFG should assume that indirect calls always return. Assuming indirect calls always return will significantly reduce the number of constant propagation runs, but may reduce the overall CFG recovery precision when facing non-returning indirect calls. By default, we only assume indirect calls always return for large binaries (region > 50 KB).

  • jumptable_resolver_resolves_calls (bool | None) – Whether JumpTableResolver should resolve indirect calls or not. Most indirect calls in C++ binaries or UEFI binaries cannot be resolved using jump table resolver and must be resolved using their specific resolvers. By default, we will only disable JumpTableResolver from resolving indirect calls for large binaries (region > 50 KB).

  • check_funcret_max_job – When popping return-site jobs out of the job queue, angr will prioritize jobs for which the callee is known to return. This check may be slow when there are a large number of jobs in different caller functions, a situation that often occurs in obfuscated binaries where many functions never return. The check is disabled when the number of jobs in the queue exceeds this threshold.

  • start (int) – (Deprecated) The beginning address of CFG recovery.

  • end (int) – (Deprecated) The end address of CFG recovery.

  • arch_options (CFGArchOptions) – Architecture-specific options.

  • extra_arch_options – Any key-value pair in kwargs will be seen as an arch-specific option and will be used to set the option value in self._arch_options.

  • retedges (bool) – Whether to add return edges (from function endpoints to their return sites) in the CFG. Return edges are not added by default because they are often not useful during analysis; you can set retedges to True or call make_return_edges() after CFG recovery to create return edges. Note that this option does not impact function graphs.

  • progress_callback – (Inherited from angr.Analysis.) Callback for CFG recovery progress.

  • show_progressbar (bool) – (Inherited from angr.Analysis.) Show a progressbar during CFG recovery.

  • force_smart_scan (bool | None)

  • drop_bad_funcs (bool)

Returns:

None
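
A short usage sketch that passes a few of the parameters documented above. The helper names and the choice of options are illustrative assumptions, not recommended defaults; angr is imported lazily so the option builder can be inspected on its own.

```python
def cfg_fast_options(regions=None):
    """Collect a handful of CFGFast keyword arguments documented above."""
    opts = {
        "normalize": True,               # normalize CFG and function graphs
        "resolve_indirect_jumps": True,  # needed for jump-table targets
        "data_references": True,
        "force_complete_scan": False,
    }
    if regions is not None:
        # Each region is a (start address, end address) tuple.
        opts["regions"] = list(regions)
    return opts


def recover_cfg(path, regions=None):
    """Load `path` without shared libraries and run CFGFast on it."""
    import angr  # requires angr to be installed

    project = angr.Project(path, auto_load_libs=False)
    cfg = project.analyses.CFGFast(**cfg_fast_options(regions))
    return project, cfg


# Hypothetical region covering a single code range.
options = cfg_fast_options(regions=[(0x400000, 0x401000)])
```

After recover_cfg() returns, the recovered nodes are available through cfg.graph and the recovered functions through cfg.kb.functions.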

stage: str
property graph: SpillingCFG
property memory_data
property jump_tables
property insn_addr_to_memory_data
do_full_xrefs(overlay_state=None)

Perform xref recovery on all functions.

Parameters:

overlay_state (SimState) – An overlay state for loading constant data.

Returns:

None

drop_bad_functions()
make_return_edges()

For each returning function, create return edges in self.graph.

Returns:

None
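
The idea behind return edges (from function endpoints to their return sites, for returning functions only) can be sketched with plain data structures. This is a simplified model under stated assumptions, not angr's implementation: the function name, the dict layout, and the addresses are all hypothetical.

```python
def make_return_edges_sketch(functions, call_sites, edges):
    """Add (endpoint, return_site) edges for every returning callee.

    functions:  func_addr -> {"returning": bool, "endpoints": [addr, ...]}
    call_sites: list of (callee_func_addr, return_to_addr) pairs
    edges:      set of (src, dst) pairs, updated in place
    """
    for callee_addr, return_to in call_sites:
        info = functions.get(callee_addr)
        if info is None or not info["returning"]:
            continue  # unknown or non-returning callee: no return edge
        for endpoint in info["endpoints"]:
            edges.add((endpoint, return_to))
    return edges


functions = {
    0x401000: {"returning": True, "endpoints": [0x401020, 0x401030]},
    0x402000: {"returning": False, "endpoints": [0x402010]},
}
call_sites = [(0x401000, 0x400525), (0x402000, 0x400530)]
return_edges = make_return_edges_sketch(functions, call_sites, set())
```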

copy()
output()