pypcode

Pythonic interface to SLEIGH

class pypcode.Address

Bases: object

Low level machine byte address.

property offset

The offset within the space.

property space

The address space.

class pypcode.AddrSpace

Bases: object

A region where processor data is stored.

property name

The name of this address space.

class pypcode.Arch(name, ldefpath)[source]

Bases: object

Main class representing an architecture describing available languages.

Parameters:
  • name (str)

  • ldefpath (str)

archpath: str
archname: str
ldefpath: str
ldef: ElementTree
languages: Sequence[ArchLanguage]
classmethod enumerate()[source]

Enumerate all available architectures and languages.

Language definitions ship with pypcode and can be found in processors/<architecture>/data/languages/<variant>.ldefs.

Return type:

Generator[Arch, None, None]
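
As a rough illustration of how the enumeration might be consumed, the sketch below collects language ids across architectures. It assumes each ArchLanguage.ldef element carries an id attribute, as language entries in .ldefs files normally do. The names language_ids and demo are hypothetical helpers, not part of pypcode.

```python
def language_ids(archs):
    """Return a sorted list of language ids found across the given architectures.

    Assumption: each language's ``ldef`` XML element has an ``id`` attribute.
    Languages without one are skipped.
    """
    ids = []
    for arch in archs:
        for lang in arch.languages:
            langid = lang.ldef.attrib.get("id")
            if langid is not None:
                ids.append(langid)
    return sorted(ids)


def demo():
    # Imported lazily so the helper above stays usable without pypcode installed.
    import pypcode

    for langid in language_ids(pypcode.Arch.enumerate()):
        print(langid)
```

Calling demo() should list ids of the same form as the "x86:LE:64:default" string used in the Context examples.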

class pypcode.ArchLanguage(archdir, ldef)[source]

Bases: object

A specific language for an architecture. Provides access to language, pspec, and cspecs.

Parameters:
  • archdir (str)

  • ldef (Element)

archdir: str
ldef: Element
property pspec_path: str
property slafile_path: str
property description: str
property pspec: Element | None
property cspecs: Mapping[Tuple[str, str], Element]
init_context_from_pspec(ctx)[source]
Return type:

None

Parameters:

ctx (Context)

classmethod from_id(langid)[source]

Return language with given id, or None if the language could not be found.

Return type:

Optional[ArchLanguage]

Parameters:

langid (str)

exception pypcode.BadDataError

Bases: Exception

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

class pypcode.Context(language)[source]

Bases: Context

Context for translation.

Parameters:

language (ArchLanguage)

language: ArchLanguage
registers: Dict[str, Varnode]
disassemble

Disassemble and format machine code as assembly code.

In [1]: import pypcode
   ...: ctx = pypcode.Context("x86:LE:64:default")
   ...: dx = ctx.disassemble(b"\x48\x35\x78\x56\x34\x12\xc3")
   ...: for ins in dx.instructions:
   ...:     print(f"{ins.addr.offset:#x}/{ins.length}: {ins.mnem} {ins.body}")
   ...: 
0x0/6: XOR RAX,0x12345678
0x6/1: RET 
Instructions are decoded from buf and formatted as Instruction objects until:
  • the end of the buffer is reached,

  • max_bytes or max_instructions is reached, or

  • an exception occurs.

If an exception occurs following successful disassembly of at least one instruction, the exception is discarded and the successful disassembly is returned. If the exception occurs at disassembly of the first instruction, it will be raised. See below for possible exceptions.

Parameters:
  • buf (bytes) – Machine code to disassemble.

  • base_address (int) – Base address of the code at offset being decoded, 0 by default.

  • offset (int) – Offset into buf at which to begin disassembly, 0 by default.

  • max_bytes (int) – Maximum number of bytes to disassemble, or 0 for no limit (default).

  • max_instructions (int) – Maximum number of instructions to disassemble, or 0 for no limit (default).

Returns:

The disassembled machine code. Instructions are accessible through Disassembly.instructions.

Return type:

Disassembly

Raises:

BadDataError – The instruction at base_address could not be decoded.
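
Because an exception raised after the first instruction is discarded, a caller may want to check how much of the buffer was actually decoded. The sketch below is one way to do that, by summing Instruction.length. The names disassemble_with_coverage and demo are hypothetical helpers, not part of pypcode.

```python
def disassemble_with_coverage(ctx, buf, base_address=0):
    """Disassemble ``buf`` and report how many bytes were actually decoded.

    Returns ``(instructions, consumed)``. When ``consumed < len(buf)``,
    decoding stopped early, for example because a later instruction
    raised an error that was discarded.
    """
    dx = ctx.disassemble(buf, base_address)
    instructions = list(dx.instructions)
    consumed = sum(ins.length for ins in instructions)
    return instructions, consumed


def demo():
    import pypcode  # imported lazily; requires pypcode to be installed

    ctx = pypcode.Context("x86:LE:64:default")
    code = b"\x48\x35\x78\x56\x34\x12\xc3"  # xor rax, 0x12345678; ret
    instructions, consumed = disassemble_with_coverage(ctx, code)
    for ins in instructions:
        print(f"{ins.addr.offset:#x}/{ins.length}: {ins.mnem} {ins.body}")
    if consumed < len(code):
        print(f"stopped after {consumed} of {len(code)} bytes")
```

To resume after an early stop, a caller could call disassemble again with offset advanced by consumed and base_address advanced by the same amount, since base_address refers to the code at offset.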

getAllRegisters

Get a mapping of all register locations to their corresponding names.

getRegisterName

Get the name of a register.

Parameters:
  • space (AddrSpace) – The address space.

  • offset (int) – Offset within the address space.

  • size (int) – Size of the register, in bytes.

Returns:

The register name, or the empty string if the register could not be identified.

Return type:

str
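
Since getRegisterName returns the empty string for unknown locations, callers usually need a fallback. The sketch below shows one possible wrapper. The fallback format and the names describe_location and demo are illustrative choices, not part of pypcode.

```python
def describe_location(ctx, space, offset, size):
    """Return a register name if one matches, else a generic description."""
    name = ctx.getRegisterName(space, offset, size)
    if name:
        return name
    # Empty string means no register matched: describe the raw location instead.
    return f"{space.name}[{offset:#x}:{size}]"


def demo():
    import pypcode  # imported lazily; requires pypcode to be installed

    ctx = pypcode.Context("x86:LE:64:default")
    rax = ctx.registers["RAX"]  # Context.registers maps names to Varnodes
    print(describe_location(ctx, rax.space, rax.offset, rax.size))
```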

reset

Reset the context.

setVariableDefault

Provide a default value for a context variable.

translate

Translate machine code to P-Code.

In [1]: import pypcode
   ...: ctx = pypcode.Context("x86:LE:64:default")
   ...: tx = ctx.translate(b"\x48\x35\x78\x56\x34\x12\xc3")  # xor rax, 0x12345678; ret
   ...: for op in tx.ops:
   ...:     print(pypcode.PcodePrettyPrinter.fmt_op(op))
   ...: 
IMARK ram[0:6]
CF = 0x0
OF = 0x0
RAX = RAX ^ 0x12345678
SF = RAX s< 0x0
ZF = RAX == 0x0
unique[13480:8] = RAX & 0xff
unique[13500:1] = popcount(unique[13480:8])
unique[13580:1] = unique[13500:1] & 0x1
PF = unique[13580:1] == 0x0
IMARK ram[6:1]
RIP = *[ram]RSP
RSP = RSP + 0x8
return RIP
Instructions are decoded from buf and translated to a sequence of PcodeOps until:
  • the end of the buffer is reached,

  • max_bytes or max_instructions is reached,

  • if the BB_TERMINATING flag is set, an instruction which performs a branch is encountered, or

  • an exception occurs.

A PcodeOp with opcode OpCode.IMARK is used to identify machine instructions corresponding to a translation. OpCode.IMARK ops precede the corresponding P-Code translation, and will have one or more input Varnodes identifying the address and length in bytes of the source machine instruction(s). The number of input Varnodes depends on the number of instructions that were decoded for the translation of the particular instruction.

On architectures with branch delay slots, the effects of the delay slot instructions will be included in the translation of the branch instruction. For this reason, it is possible that more instructions than specified in max_instructions may be translated. The OpCode.IMARK op identifying the branch instruction will contain an input Varnode corresponding to the branch instruction, with additional input Varnodes identifying the corresponding delay slot instructions.

If an exception occurs following successful translation of at least one instruction, the exception is discarded and the successful translation is returned. If the exception occurs during translation of the first instruction, the exception will be raised. See below for possible exceptions.

Parameters:
  • buf (bytes) – Machine code to translate.

  • base_address (int) – Base address of the code at offset being decoded.

  • offset (int) – Offset into buf at which to begin translation.

  • max_bytes (int) – Maximum number of bytes to translate.

  • max_instructions (int) – Maximum number of instructions to translate.

  • flags (int) – Flags controlling translation. See TranslateFlags.

Returns:

The P-Code translation of the input machine code. P-Code ops are accessible through Translation.ops.

Return type:

Translation

Raises:
  • BadDataError – The instruction at base_address could not be decoded.

  • UnimplError – The P-Code for instruction at base_address is not yet implemented.
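
Because each OpCode.IMARK op precedes the P-Code for its machine instruction, a flat op sequence can be regrouped per instruction. The sketch below shows one way to do this. The names group_by_instruction and demo are hypothetical helpers, and the IMARK opcode is passed in so the helper has no hard dependency on pypcode.

```python
def group_by_instruction(ops, imark):
    """Group a flat op sequence into ``(imark_op, body_ops)`` pairs.

    Every op whose opcode equals ``imark`` starts a new group. Ops seen
    before the first marker are ignored, which should not happen for
    translations where IMARK ops precede their P-Code.
    """
    groups = []
    for op in ops:
        if op.opcode == imark:
            groups.append((op, []))
        elif groups:
            groups[-1][1].append(op)
    return groups


def demo():
    import pypcode  # imported lazily; requires pypcode to be installed

    ctx = pypcode.Context("x86:LE:64:default")
    tx = ctx.translate(b"\x48\x35\x78\x56\x34\x12\xc3")  # xor rax, 0x12345678; ret
    for marker, body in group_by_instruction(tx.ops, pypcode.OpCode.IMARK):
        first = marker.inputs[0]  # address and length of the source instruction
        print(f"{first.offset:#x} ({first.size} bytes): {len(body)} ops")
```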

exception pypcode.DecoderError

Bases: Exception

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

class pypcode.Disassembly

Bases: object

Machine Code Disassembly.

property instructions

The disassembled instructions.

class pypcode.Instruction

Bases: object

Disassembled machine code instruction.

property addr

Address of this instruction.

property body

Operand string of this instruction.

property length

Length, in bytes, of this instruction.

property mnem

Mnemonic string of this instruction.

exception pypcode.LowlevelError

Bases: Exception

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

class pypcode.OpCode

Bases: object

BOOL_AND = pypcode.pypcode_native.OpCode.BOOL_AND
BOOL_NEGATE = pypcode.pypcode_native.OpCode.BOOL_NEGATE
BOOL_OR = pypcode.pypcode_native.OpCode.BOOL_OR
BOOL_XOR = pypcode.pypcode_native.OpCode.BOOL_XOR
BRANCH = pypcode.pypcode_native.OpCode.BRANCH
BRANCHIND = pypcode.pypcode_native.OpCode.BRANCHIND
CALL = pypcode.pypcode_native.OpCode.CALL
CALLIND = pypcode.pypcode_native.OpCode.CALLIND
CALLOTHER = pypcode.pypcode_native.OpCode.CALLOTHER
CAST = pypcode.pypcode_native.OpCode.CAST
CBRANCH = pypcode.pypcode_native.OpCode.CBRANCH
COPY = pypcode.pypcode_native.OpCode.COPY
CPOOLREF = pypcode.pypcode_native.OpCode.CPOOLREF
EXTRACT = pypcode.pypcode_native.OpCode.EXTRACT
FLOAT_ABS = pypcode.pypcode_native.OpCode.FLOAT_ABS
FLOAT_ADD = pypcode.pypcode_native.OpCode.FLOAT_ADD
FLOAT_CEIL = pypcode.pypcode_native.OpCode.FLOAT_CEIL
FLOAT_DIV = pypcode.pypcode_native.OpCode.FLOAT_DIV
FLOAT_EQUAL = pypcode.pypcode_native.OpCode.FLOAT_EQUAL
FLOAT_FLOAT2FLOAT = pypcode.pypcode_native.OpCode.FLOAT_FLOAT2FLOAT
FLOAT_FLOOR = pypcode.pypcode_native.OpCode.FLOAT_FLOOR
FLOAT_INT2FLOAT = pypcode.pypcode_native.OpCode.FLOAT_INT2FLOAT
FLOAT_LESS = pypcode.pypcode_native.OpCode.FLOAT_LESS
FLOAT_LESSEQUAL = pypcode.pypcode_native.OpCode.FLOAT_LESSEQUAL
FLOAT_MULT = pypcode.pypcode_native.OpCode.FLOAT_MULT
FLOAT_NAN = pypcode.pypcode_native.OpCode.FLOAT_NAN
FLOAT_NEG = pypcode.pypcode_native.OpCode.FLOAT_NEG
FLOAT_NOTEQUAL = pypcode.pypcode_native.OpCode.FLOAT_NOTEQUAL
FLOAT_ROUND = pypcode.pypcode_native.OpCode.FLOAT_ROUND
FLOAT_SQRT = pypcode.pypcode_native.OpCode.FLOAT_SQRT
FLOAT_SUB = pypcode.pypcode_native.OpCode.FLOAT_SUB
FLOAT_TRUNC = pypcode.pypcode_native.OpCode.FLOAT_TRUNC
IMARK = pypcode.pypcode_native.OpCode.IMARK
INDIRECT = pypcode.pypcode_native.OpCode.INDIRECT
INSERT = pypcode.pypcode_native.OpCode.INSERT
INT_2COMP = pypcode.pypcode_native.OpCode.INT_2COMP
INT_ADD = pypcode.pypcode_native.OpCode.INT_ADD
INT_AND = pypcode.pypcode_native.OpCode.INT_AND
INT_CARRY = pypcode.pypcode_native.OpCode.INT_CARRY
INT_DIV = pypcode.pypcode_native.OpCode.INT_DIV
INT_EQUAL = pypcode.pypcode_native.OpCode.INT_EQUAL
INT_LEFT = pypcode.pypcode_native.OpCode.INT_LEFT
INT_LESS = pypcode.pypcode_native.OpCode.INT_LESS
INT_LESSEQUAL = pypcode.pypcode_native.OpCode.INT_LESSEQUAL
INT_MULT = pypcode.pypcode_native.OpCode.INT_MULT
INT_NEGATE = pypcode.pypcode_native.OpCode.INT_NEGATE
INT_NOTEQUAL = pypcode.pypcode_native.OpCode.INT_NOTEQUAL
INT_OR = pypcode.pypcode_native.OpCode.INT_OR
INT_REM = pypcode.pypcode_native.OpCode.INT_REM
INT_RIGHT = pypcode.pypcode_native.OpCode.INT_RIGHT
INT_SBORROW = pypcode.pypcode_native.OpCode.INT_SBORROW
INT_SCARRY = pypcode.pypcode_native.OpCode.INT_SCARRY
INT_SDIV = pypcode.pypcode_native.OpCode.INT_SDIV
INT_SEXT = pypcode.pypcode_native.OpCode.INT_SEXT
INT_SLESS = pypcode.pypcode_native.OpCode.INT_SLESS
INT_SLESSEQUAL = pypcode.pypcode_native.OpCode.INT_SLESSEQUAL
INT_SREM = pypcode.pypcode_native.OpCode.INT_SREM
INT_SRIGHT = pypcode.pypcode_native.OpCode.INT_SRIGHT
INT_SUB = pypcode.pypcode_native.OpCode.INT_SUB
INT_XOR = pypcode.pypcode_native.OpCode.INT_XOR
INT_ZEXT = pypcode.pypcode_native.OpCode.INT_ZEXT
LOAD = pypcode.pypcode_native.OpCode.LOAD
LZCOUNT = pypcode.pypcode_native.OpCode.LZCOUNT
MULTIEQUAL = pypcode.pypcode_native.OpCode.MULTIEQUAL
NEW = pypcode.pypcode_native.OpCode.NEW
PIECE = pypcode.pypcode_native.OpCode.PIECE
POPCOUNT = pypcode.pypcode_native.OpCode.POPCOUNT
PTRADD = pypcode.pypcode_native.OpCode.PTRADD
PTRSUB = pypcode.pypcode_native.OpCode.PTRSUB
RETURN = pypcode.pypcode_native.OpCode.RETURN
SEGMENTOP = pypcode.pypcode_native.OpCode.SEGMENTOP
STORE = pypcode.pypcode_native.OpCode.STORE
SUBPIECE = pypcode.pypcode_native.OpCode.SUBPIECE
class pypcode.OpFormat[source]

Bases: object

General op pretty-printer.

static fmt_vn(vn)[source]
Return type:

str

Parameters:

vn (Varnode)

fmt(op)[source]
Return type:

str

Parameters:

op (PcodeOp)

class pypcode.OpFormatBinary(operator)[source]

Bases: OpFormat

General binary op pretty-printer.

Parameters:

operator (str)

operator
fmt(op)[source]
Return type:

str

Parameters:

op (PcodeOp)

static fmt_vn(vn)
Return type:

str

Parameters:

vn (Varnode)

class pypcode.OpFormatFunc(operator)[source]

Bases: OpFormat

Function-call style op pretty-printer.

Parameters:

operator (str)

operator
fmt(op)[source]
Return type:

str

Parameters:

op (PcodeOp)

static fmt_vn(vn)
Return type:

str

Parameters:

vn (Varnode)

class pypcode.OpFormatSpecial[source]

Bases: OpFormat

Specialized op pretty-printers.

fmt_BRANCH(op)[source]
Return type:

str

Parameters:

op (PcodeOp)

fmt_BRANCHIND(op)[source]
Return type:

str

Parameters:

op (PcodeOp)

fmt_CALL(op)[source]
Return type:

str

Parameters:

op (PcodeOp)

fmt_CALLIND(op)[source]
Return type:

str

Parameters:

op (PcodeOp)

static fmt_vn(vn)
Return type:

str

Parameters:

vn (Varnode)

fmt_CBRANCH(op)[source]
Return type:

str

Parameters:

op (PcodeOp)

fmt_LOAD(op)[source]
Return type:

str

Parameters:

op (PcodeOp)

fmt_RETURN(op)[source]
Return type:

str

Parameters:

op (PcodeOp)

fmt_STORE(op)[source]
Return type:

str

Parameters:

op (PcodeOp)

fmt(op)[source]
Return type:

str

Parameters:

op (PcodeOp)

class pypcode.OpFormatUnary(operator)[source]

Bases: OpFormat

General unary op pretty-printer.

Parameters:

operator (str)

operator
fmt(op)[source]
Return type:

str

Parameters:

op (PcodeOp)

static fmt_vn(vn)
Return type:

str

Parameters:

vn (Varnode)

class pypcode.PcodeOp

Bases: object

Low-level representation of a single P-Code operation.

property inputs

Input varnodes for this operation.

property opcode

Opcode for this operation.

property output

Output varnode for this operation.
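
The three properties above are enough for simple analyses over a translation. The sketch below counts opcodes and lists the locations that ops write to. It assumes output is None for ops that produce no output, which this reference does not state explicitly. The names opcode_histogram, written_locations and demo are hypothetical helpers.

```python
from collections import Counter


def opcode_histogram(ops):
    """Count how often each opcode appears in a sequence of ops."""
    return Counter(op.opcode for op in ops)


def written_locations(ops):
    """Collect ``(space name, offset, size)`` for every op that has an output.

    Assumption: ``op.output`` is ``None`` when an op writes nothing.
    """
    return [
        (op.output.space.name, op.output.offset, op.output.size)
        for op in ops
        if op.output is not None
    ]


def demo():
    import pypcode  # imported lazily; requires pypcode to be installed

    ctx = pypcode.Context("x86:LE:64:default")
    tx = ctx.translate(b"\x48\x35\x78\x56\x34\x12\xc3")  # xor rax, 0x12345678; ret
    for opcode, count in opcode_histogram(tx.ops).most_common():
        print(opcode, count)
    for location in written_locations(tx.ops):
        print(location)
```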

class pypcode.PcodePrettyPrinter[source]

Bases: object

P-code pretty-printer.

DEFAULT_OP_FORMAT = <pypcode.OpFormat object>
OP_FORMATS = {
    pypcode.pypcode_native.OpCode.COPY: <pypcode.OpFormatUnary object>,
    pypcode.pypcode_native.OpCode.LOAD: <pypcode.OpFormatSpecial object>,
    pypcode.pypcode_native.OpCode.STORE: <pypcode.OpFormatSpecial object>,
    pypcode.pypcode_native.OpCode.BRANCH: <pypcode.OpFormatSpecial object>,
    pypcode.pypcode_native.OpCode.CBRANCH: <pypcode.OpFormatSpecial object>,
    pypcode.pypcode_native.OpCode.BRANCHIND: <pypcode.OpFormatSpecial object>,
    pypcode.pypcode_native.OpCode.CALL: <pypcode.OpFormatSpecial object>,
    pypcode.pypcode_native.OpCode.CALLIND: <pypcode.OpFormatSpecial object>,
    pypcode.pypcode_native.OpCode.RETURN: <pypcode.OpFormatSpecial object>,
    pypcode.pypcode_native.OpCode.INT_EQUAL: <pypcode.OpFormatBinary object>,
    pypcode.pypcode_native.OpCode.INT_NOTEQUAL: <pypcode.OpFormatBinary object>,
    pypcode.pypcode_native.OpCode.INT_SLESS: <pypcode.OpFormatBinary object>,
    pypcode.pypcode_native.OpCode.INT_SLESSEQUAL: <pypcode.OpFormatBinary object>,
    pypcode.pypcode_native.OpCode.INT_LESS: <pypcode.OpFormatBinary object>,
    pypcode.pypcode_native.OpCode.INT_LESSEQUAL: <pypcode.OpFormatBinary object>,
    pypcode.pypcode_native.OpCode.INT_ZEXT: <pypcode.OpFormatFunc object>,
    pypcode.pypcode_native.OpCode.INT_SEXT: <pypcode.OpFormatFunc object>,
    pypcode.pypcode_native.OpCode.INT_ADD: <pypcode.OpFormatBinary object>,
    pypcode.pypcode_native.OpCode.INT_SUB: <pypcode.OpFormatBinary object>,
    pypcode.pypcode_native.OpCode.INT_CARRY: <pypcode.OpFormatFunc object>,
    pypcode.pypcode_native.OpCode.INT_SCARRY: <pypcode.OpFormatFunc object>,
    pypcode.pypcode_native.OpCode.INT_SBORROW: <pypcode.OpFormatFunc object>,
    pypcode.pypcode_native.OpCode.INT_2COMP: <pypcode.OpFormatUnary object>,
    pypcode.pypcode_native.OpCode.INT_NEGATE: <pypcode.OpFormatUnary object>,
    pypcode.pypcode_native.OpCode.INT_XOR: <pypcode.OpFormatBinary object>,
    pypcode.pypcode_native.OpCode.INT_AND: <pypcode.OpFormatBinary object>,
    pypcode.pypcode_native.OpCode.INT_OR: <pypcode.OpFormatBinary object>,
    pypcode.pypcode_native.OpCode.INT_LEFT: <pypcode.OpFormatBinary object>,
    pypcode.pypcode_native.OpCode.INT_RIGHT: <pypcode.OpFormatBinary object>,
    pypcode.pypcode_native.OpCode.INT_SRIGHT: <pypcode.OpFormatBinary object>,
    pypcode.pypcode_native.OpCode.INT_MULT: <pypcode.OpFormatBinary object>,
    pypcode.pypcode_native.OpCode.INT_DIV: <pypcode.OpFormatBinary object>,
    pypcode.pypcode_native.OpCode.INT_SDIV: <pypcode.OpFormatBinary object>,
    pypcode.pypcode_native.OpCode.INT_REM: <pypcode.OpFormatBinary object>,
    pypcode.pypcode_native.OpCode.INT_SREM: <pypcode.OpFormatBinary object>,
    pypcode.pypcode_native.OpCode.BOOL_NEGATE: <pypcode.OpFormatUnary object>,
    pypcode.pypcode_native.OpCode.BOOL_XOR: <pypcode.OpFormatBinary object>,
    pypcode.pypcode_native.OpCode.BOOL_AND: <pypcode.OpFormatBinary object>,
    pypcode.pypcode_native.OpCode.BOOL_OR: <pypcode.OpFormatBinary object>,
    pypcode.pypcode_native.OpCode.FLOAT_EQUAL: <pypcode.OpFormatBinary object>,
    pypcode.pypcode_native.OpCode.FLOAT_NOTEQUAL: <pypcode.OpFormatBinary object>,
    pypcode.pypcode_native.OpCode.FLOAT_LESS: <pypcode.OpFormatBinary object>,
    pypcode.pypcode_native.OpCode.FLOAT_LESSEQUAL: <pypcode.OpFormatBinary object>,
    pypcode.pypcode_native.OpCode.FLOAT_NAN: <pypcode.OpFormatFunc object>,
    pypcode.pypcode_native.OpCode.FLOAT_ADD: <pypcode.OpFormatBinary object>,
    pypcode.pypcode_native.OpCode.FLOAT_DIV: <pypcode.OpFormatBinary object>,
    pypcode.pypcode_native.OpCode.FLOAT_MULT: <pypcode.OpFormatBinary object>,
    pypcode.pypcode_native.OpCode.FLOAT_SUB: <pypcode.OpFormatBinary object>,
    pypcode.pypcode_native.OpCode.FLOAT_NEG: <pypcode.OpFormatUnary object>,
    pypcode.pypcode_native.OpCode.FLOAT_ABS: <pypcode.OpFormatFunc object>,
    pypcode.pypcode_native.OpCode.FLOAT_SQRT: <pypcode.OpFormatFunc object>,
    pypcode.pypcode_native.OpCode.FLOAT_INT2FLOAT: <pypcode.OpFormatFunc object>,
    pypcode.pypcode_native.OpCode.FLOAT_FLOAT2FLOAT: <pypcode.OpFormatFunc object>,
    pypcode.pypcode_native.OpCode.FLOAT_TRUNC: <pypcode.OpFormatFunc object>,
    pypcode.pypcode_native.OpCode.FLOAT_CEIL: <pypcode.OpFormatFunc object>,
    pypcode.pypcode_native.OpCode.FLOAT_FLOOR: <pypcode.OpFormatFunc object>,
    pypcode.pypcode_native.OpCode.FLOAT_ROUND: <pypcode.OpFormatFunc object>,
    pypcode.pypcode_native.OpCode.CPOOLREF: <pypcode.OpFormatFunc object>,
    pypcode.pypcode_native.OpCode.NEW: <pypcode.OpFormatFunc object>,
    pypcode.pypcode_native.OpCode.POPCOUNT: <pypcode.OpFormatFunc object>,
    pypcode.pypcode_native.OpCode.LZCOUNT: <pypcode.OpFormatFunc object>
}
classmethod fmt_op(op)[source]
Return type:

str

Parameters:

op (PcodeOp)
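
The OP_FORMATS and DEFAULT_OP_FORMAT attributes suggest a table-driven design: look up a formatter by opcode and fall back to a default. The sketch below illustrates that pattern only. The exact logic of fmt_op is not documented here, so this is an assumption and not the library's implementation, and TableDrivenPrinter and demo are hypothetical names.

```python
class TableDrivenPrinter:
    """Pick a formatter per opcode from a mapping, with a default fallback."""

    def __init__(self, formats, default):
        self.formats = formats  # mapping: opcode -> object with a fmt(op) method
        self.default = default  # formatter used when the opcode is not in the mapping

    def fmt_op(self, op):
        formatter = self.formats.get(op.opcode, self.default)
        return formatter.fmt(op)


def demo():
    import pypcode  # imported lazily; requires pypcode to be installed

    printer = TableDrivenPrinter(
        pypcode.PcodePrettyPrinter.OP_FORMATS,
        pypcode.PcodePrettyPrinter.DEFAULT_OP_FORMAT,
    )
    ctx = pypcode.Context("x86:LE:64:default")
    for op in ctx.translate(b"\xc3").ops:  # ret
        # Output may differ from PcodePrettyPrinter.fmt_op, which may add more detail.
        print(printer.fmt_op(op))
```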

class pypcode.TranslateFlags(value)[source]

Bases: IntEnum

Flags that can be passed to Context.translate.

BB_TERMINATING = 1
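
As a sketch of how BB_TERMINATING might be used, the helper below walks a buffer one basic block at a time, using the sizes of the IMARK input Varnodes to work out how many bytes each translation consumed. It passes 0 for max_bytes and max_instructions on the assumption that 0 means no limit for translate, as documented for disassemble. The names translate_blocks and demo are hypothetical helpers.

```python
def translate_blocks(ctx, buf, base_address, imark, bb_terminating):
    """Yield ``(block_address, ops)`` for consecutive basic blocks in ``buf``."""
    offset = 0
    while offset < len(buf):
        # base_address refers to the code at offset, so advance both together.
        # Assumption: 0 for max_bytes and max_instructions means "no limit".
        tx = ctx.translate(buf, base_address + offset, offset, 0, 0, bb_terminating)
        ops = list(tx.ops)
        # Each IMARK input varnode covers one decoded machine instruction.
        consumed = sum(
            vn.size for op in ops if op.opcode == imark for vn in op.inputs
        )
        if consumed == 0:
            break  # nothing decoded; avoid looping forever
        yield base_address + offset, ops
        offset += consumed


def demo():
    import pypcode  # imported lazily; requires pypcode to be installed

    ctx = pypcode.Context("x86:LE:64:default")
    code = b"\x48\x35\x78\x56\x34\x12\xc3"  # xor rax, 0x12345678; ret
    blocks = translate_blocks(
        ctx, code, 0x1000, pypcode.OpCode.IMARK, pypcode.TranslateFlags.BB_TERMINATING
    )
    for address, ops in blocks:
        print(f"block at {address:#x}: {len(ops)} ops")
```

If the first instruction of a later block cannot be decoded, the exception from translate propagates out of the generator, so callers that scan untrusted bytes may want to catch BadDataError and UnimplError around the loop.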
conjugate()

Returns self, the complex conjugate of any int.

bit_length()

Number of bits necessary to represent self in binary.

>>> bin(37)
'0b100101'
>>> (37).bit_length()
6
bit_count()

Number of ones in the binary representation of the absolute value of self.

Also known as the population count.

>>> bin(13)
'0b1101'
>>> (13).bit_count()
3
to_bytes(length, byteorder, *, signed=False)

Return an array of bytes representing an integer.

length

Length of bytes object to use. An OverflowError is raised if the integer is not representable with the given number of bytes.

byteorder

The byte order used to represent the integer. If byteorder is ‘big’, the most significant byte is at the beginning of the byte array. If byteorder is ‘little’, the most significant byte is at the end of the byte array. To request the native byte order of the host system, use ‘sys.byteorder’ as the byte order value.

signed

Determines whether two’s complement is used to represent the integer. If signed is False and a negative integer is given, an OverflowError is raised.

from_bytes(bytes, byteorder, *, signed=False)

Return the integer represented by the given array of bytes.

bytes

Holds the array of bytes to convert. The argument must either support the buffer protocol or be an iterable object producing bytes. Bytes and bytearray are examples of built-in objects that support the buffer protocol.

byteorder

The byte order used to represent the integer. If byteorder is ‘big’, the most significant byte is at the beginning of the byte array. If byteorder is ‘little’, the most significant byte is at the end of the byte array. To request the native byte order of the host system, use ‘sys.byteorder’ as the byte order value.

signed

Indicates whether two’s complement is used to represent the integer.

as_integer_ratio()

Return integer ratio.

Return a pair of integers, whose ratio is exactly equal to the original int and with a positive denominator.

>>> (10).as_integer_ratio()
(10, 1)
>>> (-10).as_integer_ratio()
(-10, 1)
>>> (0).as_integer_ratio()
(0, 1)
real

the real part of a complex number

imag

the imaginary part of a complex number

numerator

the numerator of a rational number in lowest terms

denominator

the denominator of a rational number in lowest terms

class pypcode.Translation

Bases: object

P-Code translation.

property ops

The translated sequence of P-Code ops.

exception pypcode.UnimplError

Bases: Exception

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

class pypcode.Varnode

Bases: object

Data defining a specific memory location.

getRegisterName

Return the register name if this Varnode references a register, otherwise return the empty string.

getSpaceFromConst

Recover encoded address space from constant value.

property offset

The offset within the space.

property size

The number of bytes in the location.

property space

The address space.