Binary Ninja Intermediate Language Series, Part 2: Medium Level IL¶
The Medium Level Intermediate Language (MLIL) is the second major representation in the Binary Ninja Intermediate Language (BNIL) family of intermediate languages. Much like LLIL, this representation is tree-based and shares many of the same instructions, but it is distinct in a few key ways:
- Registers have been translated to variables.
- The stack as a concept is not present.
- Variables have types associated with them.
- Call sites have their parameters inferred and associated with them.
- Data flow has been calculated and constants are propagated.
- Some dead code is eliminated (MLIL only; Mapped MLIL doesn't do this).
Purposes of MLIL¶
- Simplified representation
- Small discrete operations
- Can be more accurate to binary representation than decompilation
- Powerful Data flow (PossibleValueSet) APIs
- Accurate (though verbose) variable identification
In the rest of this article we will explore the variable object, the type object, the confidence system, and finally the instruction set.
The Variable Object¶
First, it's important to understand what we mean when we talk about a MLIL variable. Using the call instruction inst that is retrieved in the Instruction Set section later in this article, we can get a Variable object from its output list.
>>> inst.output
[<var int64_t rax>]
>>> var = inst.output[0]
>>> type(var)
<class 'binaryninja.function.Variable'>
Variables in MLIL have a very specific meaning that is not completely obvious at first: each MLIL variable represents a single storage location within the scope of a single function. For those not well versed in program analysis, a storage location is where a value lives at a given point in time. During compilation, a compiler performs a step called register allocation, which maps the potentially unbounded number of variables in the original source code onto a finite set of registers. When there are more variables and intermediate values than available registers, the compiler spills some of them onto the stack. Thus, a single high-level-language variable can be mapped across a number of storage locations, and it can even live in multiple registers and on the stack at the same time. Unlike high-level-language variables, however, each MLIL variable represents one and only one storage location. Binary Ninja's High Level IL (HLIL) will be responsible for storing this mapping.
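To make the distinction concrete, here is a small, purely illustrative sketch. The source variable counter, its storage history, and the tuple encoding are all invented for this example and do not come from Binary Ninja. The sketch only shows that one source-level variable can pass through several storage locations, each of which would be a separate MLIL variable.

```python
# Hypothetical storage history of one source-level variable named `counter`.
# Each entry is (kind of storage, register name or stack offset).
counter_locations = [
    ("register", "rdi"),  # arrives in an argument register
    ("stack", -0x18),     # spilled to the stack
    ("register", "rax"),  # reloaded into a different register
    ("stack", -0x18),     # spilled again to the same slot
]

# Every distinct storage location corresponds to one MLIL variable.
# dict.fromkeys removes duplicates while keeping first-seen order.
distinct_locations = list(dict.fromkeys(counter_locations))

for kind, where in distinct_locations:
    print(kind, where)
```

Here four uses of the same source variable collapse into three storage locations, because the stack slot at offset -0x18 is used twice.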
So let's look at the properties available on a Variable object.
source_type¶
The source_type represents the storage location type and can be one of the following:
enum VariableSourceType
{
StackVariableSourceType,
RegisterVariableSourceType,
FlagVariableSourceType
};
>>> var.source_type
<VariableSourceType.RegisterVariableSourceType: 1>
storage¶
The storage property changes meaning depending on the VariableSourceType. When a variable is of type RegisterVariableSourceType, its storage property represents the index into the register list for the given architecture. If the source_type is StackVariableSourceType, its storage property represents the stack offset of the variable.
>>> var
<var int64_t rax>
>>> var.source_type
<VariableSourceType.RegisterVariableSourceType: 1>
>>> bv.arch._regs_by_index[var.storage]
'rax'
>>> var2
<var int64_t var_260>
>>> var2.source_type
<VariableSourceType.StackVariableSourceType: 0>
>>> hex(var2.storage)
'-0x260'
Given the above information, it might now be intuitive how variable names are constructed. First we determine the source_type of the variable. If it's a RegisterVariableSourceType, we use the register's name directly. If it's a StackVariableSourceType, we use var_ followed by the hexadecimal digits of -storage (for example, var_260 for a storage of -0x260). Finally, we append a count each time that storage location is reused.
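The naming rule can be sketched in plain Python. This is a minimal illustration, not Binary Ninja's implementation: the helper default_variable_name, the reg_names list, and the "_N" suffix used for reused locations are assumptions made for the example, and only register variables and negative stack offsets are covered.

```python
from enum import IntEnum


class VariableSourceType(IntEnum):
    # Mirrors the enumeration shown earlier in this section.
    StackVariableSourceType = 0
    RegisterVariableSourceType = 1
    FlagVariableSourceType = 2


def default_variable_name(source_type, storage, reg_names, reuse_count=0):
    """Build a default name from a storage location (hypothetical helper)."""
    if source_type == VariableSourceType.RegisterVariableSourceType:
        # storage is an index into the architecture's register list.
        base = reg_names[storage]
    elif source_type == VariableSourceType.StackVariableSourceType:
        if storage >= 0:
            raise ValueError("this sketch only handles negative stack offsets")
        # storage is a stack offset; use the hex digits of its negation.
        base = "var_" + format(-storage, "x")
    else:
        raise ValueError("naming for this source type is not covered here")
    # Assumed suffix format: append a counter once the location is reused.
    return base if reuse_count == 0 else f"{base}_{reuse_count}"


print(default_variable_name(VariableSourceType.RegisterVariableSourceType, 0, ["rax", "rcx"]))
print(default_variable_name(VariableSourceType.StackVariableSourceType, -0x260, []))
```

The two calls reproduce the names rax and var_260 seen in the console session above.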
index¶
The index is an identifier chosen to be unique across different analysis passes.
type¶
The type property returns the Type object associated with the variable:
>>> var.type
<type: int64_t, 0% confidence>
Type objects are described in detail in the next section.
The Type Object¶
Type objects are very similar to standard C types. A Type object's type can be determined through the object’s type_class property. Valid types are in the TypeClass enumeration:
enum TypeClass
{
VoidTypeClass = 0,
BoolTypeClass = 1,
IntegerTypeClass = 2,
FloatTypeClass = 3,
StructureTypeClass = 4,
EnumerationTypeClass = 5,
PointerTypeClass = 6,
ArrayTypeClass = 7,
FunctionTypeClass = 8,
VarArgsTypeClass = 9,
ValueTypeClass = 10,
NamedTypeReferenceClass = 11,
WideCharTypeClass = 12
};
Type objects all contain a confidence property; this is currently only used for type inference, but can also be used by users implementing their own analyses. Below is a reference for each of the type objects and their unique properties.
VoidTypeClass¶
A void type is given to an object about which nothing is known. For instance, if a reference is taken to a static memory address, a variable is created there with a void type: we know the address is used, but not what size is being accessed. The expression that takes the address of that static memory location will be typed as a void pointer.
BoolTypeClass¶
A boolean type is an integer which has a value of False (0) or True (!0).
IntegerTypeClass¶
An integer type has a sign, a width (in bytes), and a display type. The display type determines how the integer should be displayed; the options are self-explanatory:
enum IntegerDisplayType
{
DefaultIntegerDisplayType,
BinaryDisplayType,
SignedOctalDisplayType,
UnsignedOctalDisplayType,
SignedDecimalDisplayType,
UnsignedDecimalDisplayType,
SignedHexadecimalDisplayType,
UnsignedHexadecimalDisplayType,
CharacterConstantDisplayType,
PointerDisplayType,
FloatDisplayType,
DoubleDisplayType
};
FloatTypeClass¶
The float type is an IEEE 754 variable precision type, and can represent floating point numbers up to 10 bytes in width. All floating point numbers are assumed to be signed.
WideCharTypeClass¶
The wide character holds a unicode character constant whose interpretation can change depending on the analysis.unicode group of settings.
VarArgsTypeClass¶
A varargs type is used to indicate that a function is variadic and thus represents the set of additional parameters being passed to a given function.
ValueTypeClass¶
A value type is simply a constant value. It is used mainly in demangling for types which only have a name or value.
FunctionTypeClass¶
The function type describes the return type, parameter list, and calling convention of a function, among many other properties.
- can_return - boolean value indicating if the function can return
- calling_convention - the calling convention this function uses
- const - boolean value indicating if this is a const function
- has_variable_arguments - boolean value indicating if this function is variadic
- parameters - contains a list of Type objects
- platform - the Platform object associated with this function
- return_value - the return type of this function
- stack_adjustment - the size in bytes of the stack adjustment that this function makes
PointerTypeClass¶
A pointer type simply describes a pointer and what it points to in the target/element_type property.
ArrayTypeClass¶
Array types function similarly to pointer types; however, the array type knows how large the object that it points to is:
- target/element_type - the type of element this array is constructed of
- count - the count of array elements
- width - the size of the array (count * target.width)
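The relationship between count, target, and width can be shown with a small stand-alone model. The ElementType and ArrayType classes below are hypothetical stand-ins written for this example, not the Binary Ninja Type classes.

```python
from dataclasses import dataclass


@dataclass
class ElementType:
    # Stand-in for the element type: a name and a width in bytes.
    name: str
    width: int


@dataclass
class ArrayType:
    # Stand-in for an array type: what it holds and how many elements.
    target: ElementType
    count: int

    @property
    def width(self):
        # Total size in bytes, as described above: count * target.width.
        return self.count * self.target.width


arr = ArrayType(ElementType("int32_t", 4), 16)
print(arr.width)
```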
EnumerationTypeClass¶
Enumeration types function much the same way they do in C, providing a mapping between names and their corresponding constants. The object contains a members property, which is a list of EnumerationMember objects, each containing a name and a value.
StructureTypeClass¶
Structure types are simple in principle but are complicated by the need to reference them through a NamedTypeReference for them to be useful. Structures come in three flavors: struct, class, and union. The first two differ only in name, whereas in a union all members overlap. Structure objects contain a list of StructureMember objects, each of which has a name, offset, and type. Structures can be packed or aligned; the packed property indicates which.
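The difference between struct and union layout can be illustrated with a toy model. The Member class and the members_covering helper are hypothetical and exist only for this sketch; the member widths are supplied directly rather than derived from a Type object.

```python
from dataclasses import dataclass


@dataclass
class Member:
    # Stand-in for a structure member: a name, a byte offset, and a width.
    name: str
    offset: int
    width: int


def members_covering(members, byte_offset):
    """Return the names of all members that occupy the given byte offset."""
    return [m.name for m in members if m.offset <= byte_offset < m.offset + m.width]


# struct { int32_t a; int32_t b; }: members sit at increasing offsets.
struct_members = [Member("a", 0, 4), Member("b", 4, 4)]
# union { int32_t a; int64_t b; }: every member starts at offset 0.
union_members = [Member("a", 0, 4), Member("b", 0, 8)]

print(members_covering(struct_members, 2))
print(members_covering(union_members, 2))
```

In the struct, byte 2 belongs to a single member; in the union, the same byte is shared by both members.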
NamedTypeReferenceClass¶
NamedTypeReference types are symbolic references to other types. They function much like a C typedef (i.e. Name X corresponds to type Y). The NamedTypeReference has a type_class property describing what sort of type it is pointing at.
enum NamedTypeReferenceClass
{
UnknownNamedTypeClass = 0,
TypedefNamedTypeClass = 1,
ClassNamedTypeClass = 2,
StructNamedTypeClass = 3,
UnionNamedTypeClass = 4,
EnumNamedTypeClass = 5
};
Most of the above should be self-explanatory except for UnknownNamedTypeClass, which is used by the name demangler because a mangled name doesn't disambiguate between named enumerations and named structures. NamedTypeReference objects also have a UUID type_id.
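The typedef-like behaviour of named references can be sketched as a lookup that follows names until it reaches a concrete type. The dictionary encoding and the resolve_named_type function are assumptions made for this example; they do not reflect how Binary Ninja stores or resolves types internally.

```python
def resolve_named_type(name, named_types):
    """Follow name -> type links until a concrete type is reached."""
    seen = set()
    while True:
        if name in seen:
            raise ValueError(f"cyclic type reference involving {name!r}")
        seen.add(name)
        kind, payload = named_types[name]
        if kind != "ref":
            # Reached a concrete type description.
            return kind, payload
        # payload is the name of another type; keep following.
        name = payload


# Hypothetical type table: size_t is a reference to uint64_t.
named_types = {
    "size_t": ("ref", "uint64_t"),
    "uint64_t": ("integer", {"width": 8, "signed": False}),
}

print(resolve_named_type("size_t", named_types))
```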
The Instruction Set¶
The instruction set is made up of MediumLevelILInstruction objects. Let's start exploring by using the Python console to poke around at some instructions. Open up a binary in Binary Ninja and retrieve an MLIL instruction:
>>> inst = current_mlil[8]
>>> inst
<il: rax = 0x402cb0("PORT")>
>>> type(inst)
<class 'binaryninja.mediumlevelil.MediumLevelILInstruction'>
current_mlil is mapped to whatever function is currently being viewed and is not generally available to those writing plugins, as your plugin could be headless. The bracket operators tell the API to get the MLIL instruction at index 8 for the current function.
There are a number of properties that can be queried on the MediumLevelILInstruction object, and the validity of these properties changes depending on what the current operation is. If we look at the operation of inst we can see it is a MLIL_CALL instruction.
>>> inst.operation
<MediumLevelILOperation.MLIL_CALL: 51>
From the code in mediumlevelil.py we can see that the MLIL_CALL operation has three properties in addition to those available to all MediumLevelILInstruction objects:
MediumLevelILOperation.MLIL_CALL: [("output", "var_list"), ("dest", "expr"), ("params", "expr_list")],
Thus, we can query the call's output which is a list of variables:
>>> inst.output
[<var int64_t rax>]
The call's dest (destination expression) is, in this case, a MLIL_CONST_PTR:
>>> inst.dest
<il: 0x402cb0>
>>> inst.dest.operation
<MediumLevelILOperation.MLIL_CONST_PTR: 14>
>>> inst.dest.value
<const ptr 0x402cb0>
>>> hex(inst.dest.value.value)
'0x402cb0'
The parameter list can be accessed through the params property:
>>> inst.params
[<il: "PORT">]
>>> inst.params[0]
<il: "PORT">
>>> type(inst.params[0])
<class 'binaryninja.mediumlevelil.MediumLevelILInstruction'>
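Because operands of kind expr and expr_list are themselves instructions, an operand table like the MLIL_CALL entry above is enough to walk the whole expression tree. The following is a minimal sketch that runs without Binary Ninja: the Node class is a hypothetical stand-in for MediumLevelILInstruction, the MLIL_CONST_PTR table entry is an assumption, and the parameter address 0x4a0000 is a placeholder.

```python
from dataclasses import dataclass, field

# Operand layout per operation. The MLIL_CALL entry is the one quoted above;
# the MLIL_CONST_PTR entry is assumed for this sketch.
OPERAND_LAYOUT = {
    "MLIL_CALL": [("output", "var_list"), ("dest", "expr"), ("params", "expr_list")],
    "MLIL_CONST_PTR": [("constant", "int")],
}


@dataclass
class Node:
    # Hypothetical stand-in for an MLIL instruction: an operation name
    # plus a dictionary of named operands.
    operation: str
    operands: dict = field(default_factory=dict)


def walk(node):
    """Yield node and every sub-expression, depth first."""
    yield node
    for name, kind in OPERAND_LAYOUT.get(node.operation, []):
        value = node.operands[name]
        if kind == "expr":
            yield from walk(value)
        elif kind == "expr_list":
            for child in value:
                yield from walk(child)


# Toy model of: rax = 0x402cb0("PORT")
call = Node("MLIL_CALL", {
    "output": ["rax"],
    "dest": Node("MLIL_CONST_PTR", {"constant": 0x402cb0}),
    "params": [Node("MLIL_CONST_PTR", {"constant": 0x4a0000})],  # placeholder address
})

print([n.operation for n in walk(call)])
```

Driving the traversal from a table keeps the walker generic: supporting another operation only requires another table entry.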
Control Flow¶
- MLIL_JUMP - Branch to the dest expression's address
- MLIL_JUMP_TO - A jump table dispatch instruction. Uses the dest expression to calculate the MLIL instruction target targets to branch to
- MLIL_CALL - Branch to the dest expression function, saving the return address, with the list of parameters params and returning the list of return values output
- MLIL_CALL_UNTYPED - This is a call instruction where stack resolution could not be determined, and thus a list of parameters and return values do not exist
- MLIL_CALL_OUTPUT - This expression holds a set of return values dest from a call
- MLIL_CALL_PARAM - This expression holds the set of parameters src for a call instruction
- MLIL_RET - Return to the calling function.
- MLIL_RET_HINT - Indirect jump to dest expression (only used in internal analysis passes.)
- MLIL_NORET - This instruction will never be executed, the instruction before it is a call that doesn't return
- MLIL_IF - Branch to the true/false MLIL instruction identifier depending on the result of the condition expression
- MLIL_GOTO - Branch to the dest expression id
- MLIL_TAILCALL - This instruction calls the expression dest using params as input and output for return values
- MLIL_TAILCALL_UNTYPED - A tailcall where the stack resolution could not be determined and thus a list of parameters and return values do not exist
- MLIL_SYSCALL - Make a system/service call with parameters params and output output
- MLIL_SYSCALL_UNTYPED - Makes a system/service call, but an exact set of parameters couldn't be determined.
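Since MLIL_IF and MLIL_GOTO name their targets by MLIL instruction index, control-flow edges can be recovered with a simple pass over a function's instructions. The sketch below uses plain tuples as a hypothetical encoding of instructions rather than real MediumLevelILInstruction objects, and it deliberately does not model MLIL_JUMP or MLIL_JUMP_TO.

```python
def successors(index, instr, count):
    """Return the instruction indices that may execute after instr."""
    op = instr[0]
    if op == "MLIL_IF":
        # (op, true_target, false_target)
        return [instr[1], instr[2]]
    if op == "MLIL_GOTO":
        # (op, dest)
        return [instr[1]]
    if op in ("MLIL_RET", "MLIL_NORET", "MLIL_TAILCALL"):
        # Control leaves the function or never continues.
        return []
    if op in ("MLIL_JUMP", "MLIL_JUMP_TO"):
        raise NotImplementedError("indirect targets are not modelled in this sketch")
    # Everything else falls through to the next instruction, if there is one.
    return [index + 1] if index + 1 < count else []


# Hypothetical six-instruction function.
toy_function = [
    ("MLIL_SET_VAR",),   # 0
    ("MLIL_IF", 2, 4),   # 1
    ("MLIL_CALL",),      # 2
    ("MLIL_GOTO", 5),    # 3
    ("MLIL_SET_VAR",),   # 4
    ("MLIL_RET",),       # 5
]

edges = {i: successors(i, ins, len(toy_function)) for i, ins in enumerate(toy_function)}
print(edges)
```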
Variable Reads and Writes¶
- MLIL_SET_VAR - Sets a variable dest to the result of an expression src
- MLIL_SET_VAR_ALIASED - Sets a variable prev to the result of an expression src with the additional information that other variables point to the same variable destination
- MLIL_SET_VAR_ALIASED_FIELD - Sets a field at an offset of the variable prev with the expression src with the additional information that the variable is aliased by other variables
- MLIL_SET_VAR_FIELD - Sets variable dest at offset to the src expression
- MLIL_SET_VAR_SPLIT - Sets a pair of variables high:low to the result of the src expression
- MLIL_LOAD - Read size bytes from the memory address src
- MLIL_LOAD_STRUCT - Read from the struct offset at src + offset
- MLIL_STORE - Stores size bytes into dest from src
- MLIL_STORE_STRUCT - Stores size bytes into struct offset dest + offset from src
- MLIL_VAR - A variable expression src
- MLIL_VAR_ALIASED - A variable expression src that is known to have other variables pointing to the same destination
- MLIL_VAR_ALIASED_FIELD -
- MLIL_VAR_FIELD - A variable and offset expression src, offset
- MLIL_VAR_SPLIT - A split pair of variables high:low which can be used as a single expression
- MLIL_VAR_PHI - A PHI represents the combination of several prior versions of a variable when different basic blocks coalesce into a single destination and it's unknown which path was taken.
- MLIL_MEM_PHI - A memory PHI represents memory modifications that could have occurred down different source basic blocks similar to a VAR_PHI.
- MLIL_ADDRESS_OF - The address of variable src
- MLIL_ADDRESS_OF_FIELD - The address and offset of the variable src
- MLIL_CONST - A constant integral value constant
- MLIL_CONST_DATA - A constant data reference (constant data reference)
- MLIL_CONST_PTR - A constant integral value which is used as a pointer constant
- MLIL_EXTERN_PTR - A symbolic pointer constant + offset to a symbol that exists outside the binary
- MLIL_FLOAT_CONST - A floating point constant constant
- MLIL_IMPORT - A constant integral value representing an imported address
- MLIL_LOW_PART - size bytes from the low end of src expression
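The split, field, and low-part expressions are easiest to understand as bit manipulation on integers. The helper functions below are illustrative only and are not part of the Binary Ninja API. The var_field helper additionally assumes that offset counts bytes from the least significant end of the variable.

```python
def byte_mask(size):
    """Mask covering `size` bytes."""
    return (1 << (8 * size)) - 1


def var_split(high, low, part_size):
    """Model of MLIL_VAR_SPLIT: treat high:low as one value of 2 * part_size bytes."""
    return ((high & byte_mask(part_size)) << (8 * part_size)) | (low & byte_mask(part_size))


def low_part(src, size):
    """Model of MLIL_LOW_PART: keep `size` bytes from the low end of src."""
    return src & byte_mask(size)


def var_field(src, offset, size):
    """Model of MLIL_VAR_FIELD, assuming offset is counted from the low end."""
    return (src >> (8 * offset)) & byte_mask(size)


combined = var_split(0x11223344, 0xAABBCCDD, 4)
print(hex(combined))
print(hex(low_part(combined, 4)))
print(hex(var_field(combined, 4, 4)))
```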
Arithmetic Operations¶
- MLIL_ADD - Adds left expression to right expression
- MLIL_ADC - Adds with carry the left expression to the right expression with carry from the carry expression
- MLIL_SUB - Subtracts the right expression from the left expression
- MLIL_SBB - Subtracts with borrow the right expression from the left expression with carry from the carry expression
- MLIL_AND - Bitwise AND left expression with the right expression
- MLIL_OR - Bitwise OR left expression with the right expression
- MLIL_XOR - Bitwise XOR left expression with the right expression
- MLIL_LSL - Logical shift left the left expression by the number of bits stored in the right expression
- MLIL_LSR - Logical shift right the left expression by the number of bits stored in the right expression
- MLIL_ASR - Arithmetic shift right the left expression by the number of bits stored in the right expression
- MLIL_ROL - Rotate left the left expression by the number of bits stored in the right expression
- MLIL_RLC - Rotate left with carry the left expression and the carry expression by the number of bits stored in the right expression
- MLIL_ROR - Rotate right the left expression by the number of bits stored in the right expression
- MLIL_RRC - Rotate right with carry the left expression and the carry expression by the number of bits stored in the right expression
- MLIL_MUL - Single-precision multiply the left expression with the right expression
- MLIL_MULU_DP - Double-precision unsigned multiply the left expression with the right expression, result expression is twice the size of the input expressions
- MLIL_MULS_DP - Double-precision signed multiply the left expression with the right expression, result expression is twice the size of the input expressions
- MLIL_DIVU - Unsigned single-precision divide left expression by the right expression
- MLIL_DIVU_DP - Unsigned double-precision divide left expression by the right expression
- MLIL_DIVS - Signed single-precision divide left expression by the right expression
- MLIL_DIVS_DP - Signed double-precision divide left expression by the right expression
- MLIL_MODU - Unsigned single-precision modulus of left expression by the right expression
- MLIL_MODU_DP - Unsigned double-precision modulus of left expression by the right expression
- MLIL_MODS - Signed single-precision modulus of left expression by the right expression
- MLIL_MODS_DP - Signed double-precision modulus of left expression by the right expression
- MLIL_NEG - Sign inversion of src expression
- MLIL_NOT - Bitwise inversion of src expression
- MLIL_FADD - IEEE754 floating point addition of left expression with right expression
- MLIL_FSUB - IEEE754 floating point subtraction of left expression with right expression
- MLIL_FMUL - IEEE754 floating point multiplication of left expression with right expression
- MLIL_FDIV - IEEE754 floating point division of left expression with right expression
- MLIL_FSQRT - IEEE754 floating point square root of src expression
- MLIL_FNEG - IEEE754 floating point sign negation of src expression
- MLIL_FABS - IEEE754 floating point absolute value of src expression
- MLIL_FLOAT_TO_INT - IEEE754 floating point to integer conversion of src expression
- MLIL_INT_TO_FLOAT - Integer to IEEE754 floating point conversion of src expression
- MLIL_FLOAT_CONV - Convert bytes in src expression to IEEE754 floating point
- MLIL_ROUND_TO_INT - Rounds the IEEE754 floating point number src expression
- MLIL_FLOOR - Computes the floating point floor of the IEEE754 number in src
- MLIL_CEIL - Computes the floating point ceiling of the IEEE754 number in src
- MLIL_FTRUNC - Computes the floating point truncation of the IEEE754 number in src
- MLIL_SX - Sign extends the src expression
- MLIL_ZX - Zero extends the src expression
- MLIL_ADD_OVERFLOW - Calculates overflow of the addition of left expression with right expression
- MLIL_BOOL_TO_INT - Converts a bool src to an integer
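The fixed-width behaviour of several of these operations can be modelled with Python integers and explicit masks. The functions below are a sketch of the usual semantics for a few representative operations, written for illustration; they are not taken from Binary Ninja, and all sizes are given in bytes.

```python
def mask(size):
    """Mask covering `size` bytes."""
    return (1 << (8 * size)) - 1


def to_signed(value, size):
    """Interpret the low `size` bytes of value as a two's-complement integer."""
    bits = 8 * size
    value &= mask(size)
    return value - (1 << bits) if value >> (bits - 1) else value


def adc(left, right, carry, size):
    # Add with carry, truncated to the operand size.
    return (left + right + carry) & mask(size)


def lsr(left, right, size):
    # Logical shift right: zeros are shifted in from the top.
    return (left & mask(size)) >> right


def asr(left, right, size):
    # Arithmetic shift right: the sign bit is replicated.
    return (to_signed(left, size) >> right) & mask(size)


def rol(left, right, size):
    # Rotate left: bits shifted out of the top re-enter at the bottom.
    bits = 8 * size
    right %= bits
    left &= mask(size)
    return ((left << right) | (left >> (bits - right))) & mask(size)


def sx(src, src_size, dest_size):
    # Sign extend from src_size bytes to dest_size bytes.
    return to_signed(src, src_size) & mask(dest_size)


def zx(src, src_size, dest_size):
    # Zero extend: the upper bytes of the wider result are zero.
    return src & mask(src_size)


def mulu_dp(left, right, size):
    # Unsigned multiply whose result is twice the operand size.
    return ((left & mask(size)) * (right & mask(size))) & mask(2 * size)


print(hex(lsr(0x80, 1, 1)), hex(asr(0x80, 1, 1)))
print(hex(zx(0x80, 1, 2)), hex(sx(0x80, 1, 2)))
print(hex(rol(0x81, 1, 1)), hex(adc(0xFF, 0x01, 1, 1)), hex(mulu_dp(0xFF, 0xFF, 1)))
```

Comparing lsr with asr, and zx with sx, on the same input 0x80 shows where the signed and unsigned variants diverge.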
Comparison Instructions¶
- MLIL_CMP_E - Compare expression evaluates to true if left expression is equal to right
- MLIL_CMP_NE - Compare expression evaluates to true if left expression is not equal to right
- MLIL_CMP_SLT - Compare expression evaluates to true if left expression is signed less than right
- MLIL_CMP_ULT - Compare expression evaluates to true if left expression is unsigned less than right
- MLIL_CMP_SLE - Compare expression evaluates to true if left expression is signed less than or equal to right
- MLIL_CMP_ULE - Compare expression evaluates to true if left expression is unsigned less than or equal to right
- MLIL_CMP_SGE - Compare expression evaluates to true if left expression is signed greater than or equal to right
- MLIL_CMP_UGE - Compare expression evaluates to true if left expression is unsigned greater than or equal to right
- MLIL_CMP_SGT - Compare expression evaluates to true if left expression is signed greater than right
- MLIL_CMP_UGT - Compare expression evaluates to true if left expression is unsigned greater than right
- MLIL_TEST_BIT - Test if bit right in expression left is set
- MLIL_FCMP_E - Floating point compare expressions - evaluates to true if left expression is equal to right
- MLIL_FCMP_NE - Floating point compare expressions - evaluates to true if left expression is not equal to right
- MLIL_FCMP_LT - Floating point compare expressions - evaluates to true if left expression is less than right
- MLIL_FCMP_LE - Floating point compare expressions - evaluates to true if left expression is less than or equal to right
- MLIL_FCMP_GE - Floating point compare expressions - evaluates to true if left expression is greater than or equal to right
- MLIL_FCMP_GT - Floating point compare expressions - evaluates to true if left expression is greater than right
- MLIL_FCMP_O - Floating point compare expressions - evaluates to true if both left and right expressions are ordered (not NaN)
- MLIL_FCMP_UO - Floating point compare expressions - evaluates to true if either left or right expression is unordered (NaN)
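The signed and unsigned comparisons differ only in how the same bit pattern is interpreted, and the ordered/unordered floating point comparisons depend only on whether an operand is NaN. The following stand-alone sketch models both ideas; the function names are chosen for this example and are not Binary Ninja APIs.

```python
import math


def to_signed(value, size):
    """Interpret the low `size` bytes of value as a two's-complement integer."""
    bits = 8 * size
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def cmp_ult(left, right, size):
    # Unsigned less than: compare the raw bit patterns.
    m = (1 << (8 * size)) - 1
    return (left & m) < (right & m)


def cmp_slt(left, right, size):
    # Signed less than: compare the two's-complement interpretations.
    return to_signed(left, size) < to_signed(right, size)


def fcmp_uo(left, right):
    # Unordered: at least one operand is NaN.
    return math.isnan(left) or math.isnan(right)


def fcmp_o(left, right):
    # Ordered: neither operand is NaN.
    return not fcmp_uo(left, right)


# 0xFF is 255 when unsigned but -1 when treated as a signed byte.
print(cmp_ult(0xFF, 0x01, 1), cmp_slt(0xFF, 0x01, 1))
print(fcmp_o(1.0, 2.0), fcmp_uo(float("nan"), 1.0))
```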
Miscellaneous Instructions¶
- MLIL_NOP - No operation
- MLIL_BP - Breakpoint instruction
- MLIL_TRAP - Interrupt/trap instruction with vector expression
- MLIL_INTRINSIC - Intrinsic instruction defined by the architecture
- MLIL_FREE_VAR_SLOT - Free the dest expression from the register stack
- MLIL_UNDEF - The expression performs undefined behavior
- MLIL_UNIMPL - The expression is not implemented
- MLIL_UNIMPL_MEM - The expression is not implemented but does access src memory
