Binary Ninja Intermediate Language Series, Part 2: Medium Level IL¶
The Medium Level Intermediate Language (MLIL) is the second major representation in the Binary Ninja Intermediate Language (BNIL) family of intermediate languages. Much like LLIL, this representation is tree-based and shares many of the same instructions. It is distinct in a few key ways:
- Registers have been translated to variables.
- The stack as a concept is not present.
- Variables have types associated with them.
- Call sites have their parameters inferred and associated with them.
- Data flow has been calculated and constants are propagated.
- Some dead code is eliminated (MLIL only; Mapped MLIL does not do this).
Purposes of MLIL¶
- Simplified representation
- Small discrete operations
- Can be more accurate to binary representation than decompilation
- Powerful Data flow (PossibleValueSet) APIs
- Accurate (though verbose) variable identification
In the rest of this article we will explore the variable object, the type object, the confidence system, and finally the instruction set.
The Variable Object¶
First, it's important to understand what we mean when we talk about an MLIL variable. Using the call instruction inst retrieved in The Instruction Set section below (inst = current_mlil[8]), we can get a Variable object.
>>> inst.output
[<var int64_t rax>]
>>> var = inst.output[0]
>>> type(var)
<class 'binaryninja.function.Variable'>
Variables in MLIL have a very specific meaning that is not completely obvious at first: each one represents a single storage location within the scope of a single function. For those not well versed in program analysis, a storage location is where a value is located at a given point in time. During compilation, a compiler performs a step called register allocation: the process of mapping the potentially unbounded number of variables in the original source code onto a finite set of registers. When there are more variables and intermediate values than available registers, the compiler spills them onto the stack. Thus a single high-level-language variable can be mapped across a number of storage locations, and can even live in multiple registers and on the stack at the same time. Unlike high-level-language variables, however, an MLIL variable represents one and only one storage location. Binary Ninja's High Level IL (HLIL) is responsible for storing this mapping.
So let's look at the properties available on a Variable object.
source_type¶
The source_type represents the storage location type and can be one of the following:
enum VariableSourceType
{
StackVariableSourceType,
RegisterVariableSourceType,
FlagVariableSourceType
};
>>> var.source_type
<VariableSourceType.RegisterVariableSourceType: 1>
storage¶
The storage property changes meaning depending on the VariableSourceType. When a variable is of type RegisterVariableSourceType, its storage property represents the index into the register list for the given architecture. If the source_type is StackVariableSourceType, its storage property represents the stack offset of the variable.
>>> var
<var int64_t rax>
>>> var.source_type
<VariableSourceType.RegisterVariableSourceType: 1>
>>> bv.arch._regs_by_index[var.storage]
'rax'
>>> var2
<var int64_t var_260>
>>> var2.source_type
<VariableSourceType.StackVariableSourceType: 0>
>>> hex(var2.storage)
'-0x260'
Given the above information, it should now be clear how variable names are constructed. First we determine the source_type of the variable. If it's a RegisterVariableSourceType, we use the register's name directly. If it's a StackVariableSourceType, we use var_ followed by the hexadecimal digits of -storage (var_260 for a storage of -0x260). Finally, we append a count each time that storage location is reused.
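The naming rules above can be sketched in plain Python. This is an illustration only, not Binary Ninja's implementation: default_variable_name and reg_names are hypothetical names, the "_<count>" suffix format is an assumption, and only negative stack offsets are handled.

```python
from enum import IntEnum


class VariableSourceType(IntEnum):
    # Mirrors the enumeration shown earlier in this article.
    StackVariableSourceType = 0
    RegisterVariableSourceType = 1
    FlagVariableSourceType = 2


def default_variable_name(source_type, storage, reg_names, reuse_count=0):
    """Build a default name following the rules described above.

    reg_names is a hypothetical mapping from register index to register
    name; the "_<count>" suffix format is an assumption for illustration.
    """
    if source_type == VariableSourceType.RegisterVariableSourceType:
        base = reg_names[storage]
    elif source_type == VariableSourceType.StackVariableSourceType:
        # format(..., "x") yields the hex digits without the "0x" prefix.
        base = "var_" + format(-storage, "x")
    else:
        raise ValueError("naming for this source type is not covered here")
    return base if reuse_count == 0 else f"{base}_{reuse_count}"


print(default_variable_name(
    VariableSourceType.StackVariableSourceType, -0x260, {}))  # var_260
```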
index¶
The index is an identifier chosen to be unique across different analysis passes.
type¶
The type property returns the Type object associated with the variable:
>>> var.type
<type: int64_t, 0% confidence>
Type objects are described in detail in the next section.
The Type Object¶
Type objects are very similar to standard C types. A Type object's type can be determined through the object's type_class property. Valid types are in the TypeClass enumeration:
enum TypeClass
{
VoidTypeClass = 0,
BoolTypeClass = 1,
IntegerTypeClass = 2,
FloatTypeClass = 3,
StructureTypeClass = 4,
EnumerationTypeClass = 5,
PointerTypeClass = 6,
ArrayTypeClass = 7,
FunctionTypeClass = 8,
VarArgsTypeClass = 9,
ValueTypeClass = 10,
NamedTypeReferenceClass = 11,
WideCharTypeClass = 12
};
Type objects all contain a confidence property; this is currently only used for type inference, but can also be used by users implementing their own analyses. Below is a reference for each of the type objects and their unique properties.
VoidTypeClass¶
A void object is one that nothing is known about. For instance, if a reference is taken to a static memory address, a variable with a void type is created there: we know the address is used, but not what size is being accessed. The instruction that takes the address of the static memory location will have a void pointer type.
BoolTypeClass¶
A boolean type is an integer which has a value of False (0) or True (!0).
IntegerTypeClass¶
An integer type has a sign, a width (in bytes), and a display type. The display type determines how the integer should be displayed; the options are self-explanatory:
enum IntegerDisplayType
{
DefaultIntegerDisplayType,
BinaryDisplayType,
SignedOctalDisplayType,
UnsignedOctalDisplayType,
SignedDecimalDisplayType,
UnsignedDecimalDisplayType,
SignedHexadecimalDisplayType,
UnsignedHexadecimalDisplayType,
CharacterConstantDisplayType,
PointerDisplayType,
FloatDisplayType,
DoubleDisplayType
};
FloatTypeClass¶
The float type is an IEEE 754 variable-precision type, and can represent floating point numbers up to 10 bytes in width. All floating point numbers are assumed to be signed.
WideCharTypeClass¶
The wide character holds a Unicode character constant whose interpretation can change depending on the analysis.unicode group of settings.
VarArgsTypeClass¶
A varargs type is used to indicate that a function is variadic and thus represents the set of additional parameters being passed to a given function.
ValueTypeClass¶
A value type is simply a constant value. It is used mainly in demangling for types which only have a name or value.
FunctionTypeClass¶
The function type describes the return type, parameter list, and calling convention of a function, among many other properties.
- can_return - boolean value indicating if the function can return
- calling_convention - the calling convention this function uses
- const - boolean value indicating if this is a const function
- has_variable_arguments - boolean value indicating if this function is variadic
- parameters - contains a list of Type objects
- platform - the Platform object associated with this function
- return_value - the return type of this function
- stack_adjustment - the size in bytes of the stack adjustment that this function makes
PointerTypeClass¶
A pointer type simply describes a pointer and what it points to in the target/element_type property.
ArrayTypeClass¶
Array types function similarly to pointer types; however, the array type knows how large the object that it points to is:
- target/element_type - the type of element this array is constructed of
- count - the count of array elements
- width - the size of the array (count * target.width)
EnumerationTypeClass¶
Enumeration types function much the same way they do in C, providing a mapping between a name and a corresponding constant. The object itself contains a members property, a list of EnumerationMember objects each containing a name and value.
StructureTypeClass¶
Structure types are simple in principle but are complicated by the need for them to be referenced by a NamedTypeReference for them to be useful. Structures come in three flavors: struct, class, and union. While the first two differ only in name, in unions all members overlap. Structure objects contain a list of StructureMember objects, each of which contains a name, offset, and type. Structures can be packed or aligned; this is accessible through the packed property.
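To make the difference between the flavors concrete, here is a minimal sketch of how member offsets differ between a struct (or class) and a union. The layout helper is hypothetical and assumes a packed layout with no padding; it does not use the Binary Ninja API.

```python
def layout(kind, members):
    """Return (name, offset) pairs for members given as (name, width) tuples.

    Assumes a packed layout with no padding, purely for illustration.
    """
    placed = []
    offset = 0
    for name, width in members:
        placed.append((name, offset))
        if kind != "union":
            # struct and class members follow one another;
            # union members all stay at offset 0 and overlap.
            offset += width
    return placed


members = [("id", 4), ("flags", 2), ("value", 8)]
print(layout("struct", members))  # [('id', 0), ('flags', 4), ('value', 6)]
print(layout("union", members))   # [('id', 0), ('flags', 0), ('value', 0)]
```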
NamedTypeReferenceClass¶
NamedTypeReference types are symbolic references to other types. They function much like a C typedef (i.e. Name X corresponds to type Y). The NamedTypeReference has a type_class property describing what sort of type it is pointing at.
enum NamedTypeReferenceClass
{
UnknownNamedTypeClass = 0,
TypedefNamedTypeClass = 1,
ClassNamedTypeClass = 2,
StructNamedTypeClass = 3,
UnionNamedTypeClass = 4,
EnumNamedTypeClass = 5
};
Most of the above should be self-explanatory except for the UnknownNamedTypeClass, which is used in the name demangler, as the mangler doesn't disambiguate between named Enumerations and named Structures. NamedTypeReference objects also have a UUID type_id.
The Instruction Set¶
The instruction set is made up of MediumLevelILInstruction objects. Let's start exploring by using the Python console to poke around at some instructions. Open up a binary in Binary Ninja and retrieve an MLIL instruction:
>>> inst = current_mlil[8]
>>> inst
<il: rax = 0x402cb0("PORT")>
>>> type(inst)
<class 'binaryninja.mediumlevelil.MediumLevelILInstruction'>
current_mlil is mapped to whatever function is currently being viewed and is not generally available to those writing plugins, as your plugin could be headless. The bracket operators tell the API to get the MLIL instruction at index 8 for the current function.
There are a number of properties that can be queried on the MediumLevelILInstruction object, and the validity of these properties changes depending on what the current operation is. If we look at the operation of inst we can see it is an MLIL_CALL instruction.
>>> inst.operation
<MediumLevelILOperation.MLIL_CALL: 51>
From the code in mediumlevelil.py we can see that the MLIL_CALL operation has three properties in addition to the properties available to all MediumLevelILInstruction objects:
MediumLevelILOperation.MLIL_CALL: [("output", "var_list"), ("dest", "expr"), ("params", "expr_list")],
Thus we can query the call's output, which is a list of variables:
>>> inst.output
[<var int64_t rax>]
The call's dest (destination expression), which in this case is an MLIL_CONST_PTR:
>>> inst.dest
<il: 0x402cb0>
>>> inst.dest.operation
<MediumLevelILOperation.MLIL_CONST_PTR: 14>
>>> inst.dest.value
<const ptr 0x402cb0>
>>> hex(inst.dest.value.value)
'0x402cb0'
The parameter list can be accessed through the params property:
>>> inst.params
[<il: "PORT">]
>>> inst.params[0]
<il: "PORT">
>>> type(inst.params[0])
<class 'binaryninja.mediumlevelil.MediumLevelILInstruction'>
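Putting operation, dest, and params together, the following sketch collects the targets of calls whose destination is a constant pointer. It runs without Binary Ninja: Op and the SimpleNamespace objects are hypothetical stand-ins shaped like the console session above, and constant_call_targets is a name made up for this example. Whether the function behaves the same on real MLIL instructions has not been verified here.

```python
from enum import Enum
from types import SimpleNamespace as NS


class Op(Enum):
    # Hypothetical stand-in for MediumLevelILOperation.
    MLIL_CALL = 1
    MLIL_CONST_PTR = 2
    MLIL_VAR = 3
    MLIL_SET_VAR = 4


def constant_call_targets(instructions):
    """Return (address, parameter count) for calls to constant addresses."""
    found = []
    for inst in instructions:
        if inst.operation.name != "MLIL_CALL":
            continue
        if inst.dest.operation.name != "MLIL_CONST_PTR":
            continue  # indirect call: the destination is not a constant
        found.append((inst.dest.value.value, len(inst.params)))
    return found


# Mock instructions mirroring rax = 0x402cb0("PORT") and two other shapes.
direct = NS(operation=Op.MLIL_CALL,
            dest=NS(operation=Op.MLIL_CONST_PTR, value=NS(value=0x402cb0)),
            params=["PORT"], output=["rax"])
indirect = NS(operation=Op.MLIL_CALL,
              dest=NS(operation=Op.MLIL_VAR), params=[], output=[])
other = NS(operation=Op.MLIL_SET_VAR)

targets = constant_call_targets([direct, indirect, other])
print([(hex(addr), count) for addr, count in targets])  # [('0x402cb0', 1)]
```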
Control Flow¶
- MLIL_JUMP - Branch to the dest expression's address
- MLIL_JUMP_TO - A jump table dispatch instruction. Uses the dest expression to calculate which MLIL instruction target in targets to branch to
- MLIL_CALL - Branch to the dest expression function, saving the return address, with the list of parameters params and returning the list of return values output
- MLIL_CALL_UNTYPED - This is a call instruction where stack resolution could not be determined, and thus a list of parameters and return values does not exist
- MLIL_CALL_OUTPUT - This expression holds a set of return values dest from a call
- MLIL_CALL_PARAM - This expression holds the set of parameters src for a call instruction
- MLIL_RET - Return to the calling function.
- MLIL_RET_HINT - Indirect jump to dest expression (only used in internal analysis passes.)
- MLIL_NORET - This instruction will never be executed; the instruction before it is a call that doesn't return
- MLIL_IF - Branch to the true/false MLIL instruction identifier depending on the result of the condition expression
- MLIL_GOTO - Branch to the dest expression id
- MLIL_TAILCALL - This instruction calls the expression dest using params as input and output for return values
- MLIL_TAILCALL_UNTYPED - A tailcall where stack resolution could not be determined, and thus a list of parameters and return values does not exist
- MLIL_SYSCALL - Make a system/service call with parameters params and output output
- MLIL_SYSCALL_UNTYPED - Makes a system/service call, but an exact set of parameters couldn't be determined.
Variable Reads and Writes¶
- MLIL_SET_VAR - Sets a variable dest to the result of an expression src
- MLIL_SET_VAR_ALIASED - Sets a variable prev to the result of an expression src with the additional information that other variables point to the same variable destination
- MLIL_SET_VAR_ALIASED_FIELD - Sets a field at an offset of the variable prev with the expression src with the additional information that the variable is aliased by other variables
- MLIL_SET_VAR_FIELD - Sets variable dest at offset to the src expression
- MLIL_SET_VAR_SPLIT - Sets a pair of variables high:low to the result of the src expression
- MLIL_LOAD - Read size bytes from the memory address src
- MLIL_LOAD_STRUCT - Read from the struct offset at src + offset
- MLIL_STORE - Stores size bytes into dest from src
- MLIL_STORE_STRUCT - Stores size bytes into struct offset dest + offset from src
- MLIL_VAR - A variable expression src
- MLIL_VAR_ALIASED - A variable expression src that is known to have other variables pointing to the same destination
- MLIL_VAR_ALIASED_FIELD -
- MLIL_VAR_FIELD - A variable and offset expression src, offset
- MLIL_VAR_SPLIT - A split pair of variables high:low which can be used as a single expression
- MLIL_VAR_PHI - A PHI represents the combination of several prior versions of a variable when different basic blocks coalesce into a single destination and it's unknown which path was taken.
- MLIL_MEM_PHI - A memory PHI represents memory modifications that could have occurred down different source basic blocks, similar to a VAR_PHI.
- MLIL_ADDRESS_OF - The address of variable src
- MLIL_ADDRESS_OF_FIELD - The address and offset of the variable src
- MLIL_CONST - A constant integral value constant
- MLIL_CONST_DATA - A constant data reference
- MLIL_CONST_PTR - A constant integral value which is used as a pointer constant
- MLIL_EXTERN_PTR - A symbolic pointer constant + offset to a symbol that exists outside the binary
- MLIL_FLOAT_CONST - A floating point constant constant
- MLIL_IMPORT - A constant integral value representing an imported address
- MLIL_LOW_PART - size bytes from the low end of src expression
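Two of the less obvious entries, the high:low pairing used by MLIL_SET_VAR_SPLIT and MLIL_VAR_SPLIT and the truncation done by MLIL_LOW_PART, can be modeled with plain integer arithmetic. This is a sketch of the value semantics only; var_split and low_part are hypothetical helper names, and the assumption that both halves are the same width is made for illustration.

```python
def var_split(high, low, size):
    """Combine high:low halves, each `size` bytes wide, into one value."""
    mask = (1 << (8 * size)) - 1
    return ((high & mask) << (8 * size)) | (low & mask)


def low_part(src, size):
    """Keep only the `size` low-order bytes of src."""
    return src & ((1 << (8 * size)) - 1)


print(hex(var_split(0x1, 0x2, 4)))           # 0x100000002
print(hex(low_part(0x1122334455667788, 4)))  # 0x55667788
```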
Arithmetic Operations¶
- MLIL_ADD - Adds left expression to right expression
- MLIL_ADC - Adds with carry the left expression to the right expression with carry from the carry expression
- MLIL_SUB - Subtracts the right expression from the left expression
- MLIL_SBB - Subtracts with borrow the right expression from the left expression with carry from the carry expression
- MLIL_AND - Bitwise AND left expression with the right expression
- MLIL_OR - Bitwise OR left expression with the right expression
- MLIL_XOR - Bitwise XOR left expression with the right expression
- MLIL_LSL - Logical shift left the left expression by the number of bits stored in the right expression
- MLIL_LSR - Logical shift right the left expression by the number of bits stored in the right expression
- MLIL_ASR - Arithmetic shift right the left expression by the number of bits stored in the right expression
- MLIL_ROL - Rotate left the left expression by the number of bits stored in the right expression
- MLIL_RLC - Rotate left with carry the left expression and the carry expression by the number of bits stored in the right expression
- MLIL_ROR - Rotate right the left expression by the number of bits stored in the right expression
- MLIL_RRC - Rotate right with carry the left expression and the carry expression by the number of bits stored in the right expression
- MLIL_MUL - Single-precision multiply the left expression with the right expression
- MLIL_MULU_DP - Double-precision unsigned multiply the left expression with the right expression, result expression is twice the size of the input expressions
- MLIL_MULS_DP - Double-precision signed multiply the left expression with the right expression, result expression is twice the size of the input expressions
- MLIL_DIVU - Unsigned single-precision divide left expression by the right expression
- MLIL_DIVU_DP - Unsigned double-precision divide left expression by the right expression
- MLIL_DIVS - Signed single-precision divide left expression by the right expression
- MLIL_DIVS_DP - Signed double-precision divide left expression by the right expression
- MLIL_MODU - Unsigned single-precision modulus of left expression by the right expression
- MLIL_MODU_DP - Unsigned double-precision modulus of left expression by the right expression
- MLIL_MODS - Signed single-precision modulus of left expression by the right expression
- MLIL_MODS_DP - Signed double-precision modulus of left expression by the right expression
- MLIL_NEG - Sign inversion of src expression
- MLIL_NOT - Bitwise inversion of src expression
- MLIL_FADD - IEEE754 floating point addition of left expression with right expression
- MLIL_FSUB - IEEE754 floating point subtraction of left expression with right expression
- MLIL_FMUL - IEEE754 floating point multiplication of left expression with right expression
- MLIL_FDIV - IEEE754 floating point division of left expression with right expression
- MLIL_FSQRT - IEEE754 floating point square root of src expression
- MLIL_FNEG - IEEE754 floating point sign negation of src expression
- MLIL_FABS - IEEE754 floating point absolute value of src expression
- MLIL_FLOAT_TO_INT - IEEE754 floating point to integer conversion of src expression
- MLIL_INT_TO_FLOAT - Integer to IEEE754 floating point conversion of src expression
- MLIL_FLOAT_CONV - Convert bytes in src expression to IEEE754 floating point
- MLIL_ROUND_TO_INT - Rounds the IEEE754 floating point number src expression
- MLIL_FLOOR - Computes the floating point floor of the IEEE754 number in src
- MLIL_CEIL - Computes the floating point ceiling of the IEEE754 number in src
- MLIL_FTRUNC - Computes the floating point truncation of the IEEE754 number in src
- MLIL_SX - Sign extends the src expression
- MLIL_ZX - Zero extends the src expression
- MLIL_ADD_OVERFLOW - Calculates overflow of the addition of left expression with right expression
- MLIL_BOOL_TO_INT - Converts a bool src to an integer
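Several of these operations differ only in how they treat the sign bit or the operand width. The sketch below models a handful of them on fixed-width integers in plain Python, with sizes in bytes. The helper names are hypothetical and this is a reading of the descriptions above, not Binary Ninja's implementation.

```python
def mask(size):
    """All-ones bit mask for a value `size` bytes wide."""
    return (1 << (8 * size)) - 1


def to_signed(value, size):
    """Reinterpret a `size`-byte value as a two's complement integer."""
    value &= mask(size)
    sign_bit = 1 << (8 * size - 1)
    return value - (1 << (8 * size)) if value & sign_bit else value


def lsr(left, right, size):   # MLIL_LSR: zeros shift in from the top
    return (left & mask(size)) >> right


def asr(left, right, size):   # MLIL_ASR: the sign bit is replicated
    return (to_signed(left, size) >> right) & mask(size)


def rol(left, right, size):   # MLIL_ROL: bits shifted out re-enter at the bottom
    bits = 8 * size
    right %= bits
    left &= mask(size)
    return ((left << right) | (left >> (bits - right))) & mask(size)


def sx(src, src_size, dest_size):   # MLIL_SX
    return to_signed(src, src_size) & mask(dest_size)


def zx(src, src_size, dest_size):   # MLIL_ZX
    return (src & mask(src_size)) & mask(dest_size)


def mulu_dp(left, right, size):     # MLIL_MULU_DP: result is twice as wide
    return ((left & mask(size)) * (right & mask(size))) & mask(2 * size)


print(hex(lsr(0x80, 1, 1)), hex(asr(0x80, 1, 1)))  # 0x40 0xc0
print(hex(rol(0x81, 1, 1)))                        # 0x3
print(hex(sx(0x80, 1, 2)), hex(zx(0x80, 1, 2)))    # 0xff80 0x80
print(hex(mulu_dp(0xFF, 0xFF, 1)))                 # 0xfe01
```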
Comparison Instructions¶
- MLIL_CMP_E - Compare expression evaluates to true if left expression is equal to right
- MLIL_CMP_NE - Compare expression evaluates to true if left expression is not equal to right
- MLIL_CMP_SLT - Compare expression evaluates to true if left expression is signed less than right
- MLIL_CMP_ULT - Compare expression evaluates to true if left expression is unsigned less than right
- MLIL_CMP_SLE - Compare expression evaluates to true if left expression is signed less than or equal to right
- MLIL_CMP_ULE - Compare expression evaluates to true if left expression is unsigned less than or equal to right
- MLIL_CMP_SGE - Compare expression evaluates to true if left expression is signed greater than or equal to right
- MLIL_CMP_UGE - Compare expression evaluates to true if left expression is unsigned greater than or equal to right
- MLIL_CMP_SGT - Compare expression evaluates to true if left expression is signed greater than right
- MLIL_CMP_UGT - Compare expression evaluates to true if left expression is unsigned greater than right
- MLIL_TEST_BIT - Test if bit right in expression left is set
- MLIL_FCMP_E - Floating point compare expressions - evaluates to true if left expression is equal to right
- MLIL_FCMP_NE - Floating point compare expressions - evaluates to true if left expression is not equal to right
- MLIL_FCMP_LT - Floating point compare expressions - evaluates to true if left expression is less than right
- MLIL_FCMP_LE - Floating point compare expressions - evaluates to true if left expression is less than or equal to right
- MLIL_FCMP_GE - Floating point compare expressions - evaluates to true if left expression is greater than or equal to right
- MLIL_FCMP_GT - Floating point compare expressions - evaluates to true if left expression is greater than right
- MLIL_FCMP_O - Floating point compare expressions - evaluates to true if both left and right expressions are ordered (not NaN)
- MLIL_FCMP_UO - Floating point compare expressions - evaluates to true if either left or right expression is unordered (NaN)
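The signed and unsigned variants can give opposite answers for the same bit patterns, and the ordered/unordered pair exists because NaN compares false against everything. A minimal stand-alone sketch of those two points, using hypothetical helper names and sizes in bytes:

```python
import math


def to_signed(value, size):
    """Reinterpret a `size`-byte value as a two's complement integer."""
    bits = 8 * size
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def cmp_slt(left, right, size):   # MLIL_CMP_SLT
    return to_signed(left, size) < to_signed(right, size)


def cmp_ult(left, right, size):   # MLIL_CMP_ULT
    m = (1 << (8 * size)) - 1
    return (left & m) < (right & m)


def fcmp_uo(left, right):         # MLIL_FCMP_UO: either operand is NaN
    return math.isnan(left) or math.isnan(right)


def fcmp_o(left, right):          # MLIL_FCMP_O: neither operand is NaN
    return not fcmp_uo(left, right)


print(cmp_slt(0xFF, 0x01, 1))      # True: 0xFF is -1 when signed
print(cmp_ult(0xFF, 0x01, 1))      # False: 0xFF is 255 when unsigned
print(fcmp_uo(float("nan"), 1.0))  # True
print(fcmp_o(2.0, 1.0))            # True
```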
Miscellaneous Instructions¶
- MLIL_NOP - No operation
- MLIL_BP - Breakpoint instruction
- MLIL_TRAP - Interrupt/trap instruction with vector expression
- MLIL_INTRINSIC - Intrinsic instruction defined by the architecture
- MLIL_FREE_VAR_SLOT - Free the dest expression from the register stack
- MLIL_UNDEF - The expression performs undefined behavior
- MLIL_UNIMPL - The expression is not implemented
- MLIL_UNIMPL_MEM - The expression is not implemented but does access src memory