Parsing the IL of a Method Body

Sorin Serban

9 May 2006

CPOL

This article shows how to turn the IL array returned by the `MethodBody.GetILAsByteArray()` method into a result that is both readable and usable from code.

Screenshot of the demo

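Introduction

.NET lets you inspect an assembly through its `System.Reflection` namespace. You can get all the types defined in it, their fields and properties, and basically everything you need. One thing is still missing, though: the body of a method. In an in-depth inspection you would expect to find the variables, loops, and decisions used inside a method body. Microsoft overlooked this need, but it does give us something: the IL code. That is not enough on its own, because it is just a byte array that means nothing to the average programmer.

What is needed is a set of objects that represent the actual instructions in that IL code. That is what I want to provide.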

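Background

Any programmer who has worked with reflection has heard of Reflector, the excellent tool written by Lutz Roeder. Reflector can decompile any .NET assembly and show the user equivalent code for every programming element in it.

Notice that I said "equivalent". Reflection cannot give you the original code: compilation strips out comments and unused variables, and only valid, necessary code ends up in the compiled output. So we cannot get exactly the same code back.

Reflector is a great tool, but we may want to get similar results from our own code. How can we do that? Let's start with the classic "hello world" example to see what we want to achieve and what the framework actually gives us. Here is the C# code: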

public void SayHello()
{
    Console.Out.WriteLine("Hello world");
}

When we use reflection to get the body of the `SayHello` method and ask for its IL code, we get a byte array such as:

0,40,52,0,0,10,114,85,1,0,112,111,53,0,0,10,0,42

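Well, that is not very readable. All we know is that it is IL code and that we want to turn it into something we can work with. The easiest way is to convert it to its textual MSIL (Microsoft Intermediate Language) form. The MSIL for the `SayHello` method looks like this, and it is what my library should return: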

0000 : nop
0001 : call System.IO.TextWriter System.Console::get_Out()
0006 : ldstr "Hello world"
0011 : callvirt instance System.Void System.IO.TextWriter::WriteLine()
0016 : nop
0017 : ret
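
For readers who want to reproduce the byte array shown earlier, here is a minimal sketch that uses only `System.Reflection`. The `Greeter` and `IlDump` names are invented for this illustration. The exact bytes depend on the compiler and its settings (the `nop` instructions typically appear only in debug builds, and the metadata tokens differ from assembly to assembly), so the output will most likely not match the listing byte for byte.

```csharp
using System;
using System.Reflection;

// Hypothetical type that hosts the method from the example above.
public class Greeter
{
    public void SayHello()
    {
        Console.Out.WriteLine("Hello world");
    }
}

public static class IlDump
{
    // Returns the raw IL of a public method as a comma-separated list of byte values.
    public static string DumpIl(Type type, string methodName)
    {
        MethodInfo method = type.GetMethod(methodName);
        MethodBody body = method.GetMethodBody();   // null for abstract or extern methods
        byte[] il = body.GetILAsByteArray();
        string[] parts = Array.ConvertAll(il, b => b.ToString());
        return string.Join(",", parts);
    }

    public static void Main()
    {
        Console.WriteLine(DumpIl(typeof(Greeter), "SayHello"));
    }
}
```

Whatever the build settings, the last value should be 42 (0x2A), the opcode for `ret`.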

Using the Code

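`SDILReader` is a library that contains only three classes. To get the MSIL of a method body, create a `MethodBodyReader` object and pass its constructor the `MethodInfo` of the method you want to disassemble.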

MethodInfo mi = null;
// Obtain the MethodInfo of the method we want to disassemble.
// Usually you open the assembly, get the module, get the type,
// and then get the method from that type.
//
...
// instantiate a method body reader
SDILReader.MethodBodyReader mr = new MethodBodyReader(mi);
// get the text representation of the msil
string msil = mr.GetBodyCode();  
// or parse the list of instructions of the MSIL
for (int i = 0; i < mr.instructions.Count; i++)
{
    // do something with mr.instructions[i]
}
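
The comments in the snippet mention going from the assembly to the type and then to the method. A minimal sketch of that lookup, using only `System.Reflection`, might look like the following. `MethodFinder` and `Find` are names made up for this illustration and are not part of `SDILReader`; the sketch also assumes the assembly is already loaded.

```csharp
using System;
using System.Reflection;

public static class MethodFinder
{
    // Looks up a method by type name and method name in an already loaded assembly.
    // Returns the first match, or null when the type or the method does not exist.
    public static MethodInfo Find(Assembly assembly, string typeName, string methodName)
    {
        Type type = assembly.GetType(typeName);
        if (type == null)
        {
            return null;
        }

        const BindingFlags flags = BindingFlags.Public | BindingFlags.NonPublic |
                                   BindingFlags.Instance | BindingFlags.Static |
                                   BindingFlags.DeclaredOnly;

        // Iterating avoids the AmbiguousMatchException that GetMethod(name)
        // throws when the method is overloaded.
        foreach (MethodInfo candidate in type.GetMethods(flags))
        {
            if (candidate.Name == methodName)
            {
                return candidate;
            }
        }
        return null;
    }

    public static void Main()
    {
        Assembly assembly = typeof(MethodFinder).Assembly;
        MethodInfo mi = Find(assembly, typeof(MethodFinder).FullName, "Find");
        Console.WriteLine(mi);
    }
}
```

The resulting `MethodInfo` is what would be passed to the `MethodBodyReader` constructor.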

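How It Works

Good question. To get started, we first need to understand the structure of the IL array that .NET reflection provides.

IL Code Structure

IL is essentially a sequence of operations to execute. Each operation is a pair: <opcode, operand>. The opcode is the one- or two-byte value of a `System.Reflection.Emit.OpCode`. The operand, when there is one, is either an immediate value (a constant, a branch offset, a variable index) or the address of the metadata for the entity the operation works on: a method, a type, a field, a string. The .NET Framework calls this address a metadata token. To interpret the array, we therefore have to do something like this:

Read the next byte to find out which operation we are dealing with. If it is 0xFE, the opcode is two bytes long, so read one more byte.
Depending on the operation, the operand occupies the next 0, 1, 2, 4, or 8 bytes (a `switch` instruction carries a variable-length jump table). Read the operand; for many operations it is a metadata token.
Use the `MethodInfo.Module` object to resolve the metadata token to the object it refers to.
Store the <operation, operand> pair.
Repeat until the end of the IL array is reached.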
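
The second step, deciding how many operand bytes follow an opcode, can be sketched from the opcode's `OperandType`. This is an illustration only: the `OperandSizes` class and the `GetOperandSize` method are hypothetical names, not part of `SDILReader`, and the `switch` instruction is reported as -1 because its length depends on the jump table it carries.

```csharp
using System;
using System.Reflection.Emit;

public static class OperandSizes
{
    // Number of operand bytes that follow an opcode,
    // or -1 for the variable-length table of the switch instruction.
    public static int GetOperandSize(OperandType type)
    {
        switch (type)
        {
            case OperandType.InlineNone:
                return 0;
            case OperandType.ShortInlineBrTarget:
            case OperandType.ShortInlineI:
            case OperandType.ShortInlineVar:
                return 1;
            case OperandType.InlineVar:
                return 2;
            case OperandType.InlineI8:
            case OperandType.InlineR:
                return 8;
            case OperandType.InlineSwitch:
                return -1; // 4-byte count N, followed by N 4-byte branch offsets
            default:
                return 4;  // metadata tokens, 32-bit integers and floats, long branch targets
        }
    }

    public static void Main()
    {
        Console.WriteLine(GetOperandSize(OpCodes.Nop.OperandType));    // 0
        Console.WriteLine(GetOperandSize(OpCodes.Br_S.OperandType));   // 1
        Console.WriteLine(GetOperandSize(OpCodes.Ldstr.OperandType));  // 4
        Console.WriteLine(GetOperandSize(OpCodes.Ldc_I8.OperandType)); // 8
    }
}
```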

ILInstruction

The `ILInstruction` class stores an <operation, operand> pair. It also has a simple method that converts this information into a readable `string`.
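
To make the idea concrete, here is a rough sketch of what such a class could look like. It is not the library's actual `ILInstruction`: the `InstructionSketch` name and the formatting rules are assumptions, chosen to mimic the MSIL listing shown earlier. Only the `Code`, `Operand`, and `Offset` members mirror names used in the article's parsing code.

```csharp
using System;
using System.Reflection.Emit;

// Hypothetical stand-in for the library's ILInstruction class.
public class InstructionSketch
{
    public OpCode Code { get; set; }      // the operation
    public object Operand { get; set; }   // resolved operand, or null when there is none
    public int Offset { get; set; }       // position of the instruction in the IL array

    // Formats the instruction as "<offset> : <opcode> <operand>".
    public override string ToString()
    {
        string text = Offset.ToString("D4") + " : " + Code.Name;
        if (Operand == null)
        {
            return text;
        }
        if (Operand is string)
        {
            return text + " \"" + Operand + "\"";   // string literals are quoted
        }
        return text + " " + Operand;
    }

    public static void Main()
    {
        InstructionSketch ldstr = new InstructionSketch
        {
            Code = OpCodes.Ldstr,
            Operand = "Hello world",
            Offset = 6
        };
        Console.WriteLine(ldstr);   // 0006 : ldstr "Hello world"
    }
}
```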

MethodBodyReader

The `MethodBodyReader` class does all the heavy lifting. Its constructor calls a `private` method, `ConstructInstructions`, which parses the IL array.

int position = 0;
instructions = new List<ILInstruction>();
while (position < il.Length)
{
    ILInstruction instruction = new ILInstruction();

    // get the operation code of the current instruction
    OpCode code = OpCodes.Nop;
    ushort value = il[position++];
    if (value != 0xfe)
    {
        code = Globals.singleByteOpCodes[(int)value];
    }
    else
    {
        value = il[position++];
        code = Globals.multiByteOpCodes[(int)value];
        value = (ushort)(value | 0xfe00);
    }
    instruction.Code = code;
    // offset of the instruction's first byte (opcodes prefixed with 0xfe are two bytes long)
    instruction.Offset = position - code.Size;
    int metadataToken = 0;
    // get the operand of the current operation
    switch (code.OperandType)
    {
        case OperandType.InlineBrTarget:
            metadataToken = ReadInt32(il, ref position);
            metadataToken += position;
            instruction.Operand = metadataToken;
            break;
        case OperandType.InlineField:
            metadataToken = ReadInt32(il, ref position);
            instruction.Operand = module.ResolveField(metadataToken);
            break;
        ....
    }
    instructions.Add(instruction);
}
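
The parsing code calls a `ReadInt32` helper that the article does not list. A plausible sketch, assuming it reads a little-endian 32-bit value (the byte order IL uses for its operands) and advances the position, is shown below. The `IlReader` class name is invented for this illustration.

```csharp
using System;

public static class IlReader
{
    // Reads a little-endian 32-bit integer starting at 'position'
    // and moves 'position' past the four bytes that were consumed.
    public static int ReadInt32(byte[] il, ref int position)
    {
        int value = il[position]
                  | (il[position + 1] << 8)
                  | (il[position + 2] << 16)
                  | (il[position + 3] << 24);
        position += 4;
        return value;
    }

    public static void Main()
    {
        // The byte array of SayHello from the beginning of the article.
        byte[] il = { 0, 40, 52, 0, 0, 10, 114, 85, 1, 0, 112, 111, 53, 0, 0, 10, 0, 42 };
        int position = 2;   // operand of the call instruction at offset 1
        int token = ReadInt32(il, ref position);
        Console.WriteLine("0x{0:X8}", token);   // 0x0A000034
        Console.WriteLine(position);            // 6
    }
}
```

The top byte of the token (0x0A here) identifies the metadata table, in this case a member reference, and the remaining three bytes are the row index within that table.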

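The loop above looks simple, but it is not quite that simple: the `switch` has 18 cases, and I did not cover every opcode, only the most common ones. There are more than 240 opcodes. They are loaded at application startup into two `static` arrays: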

public static OpCode[] multiByteOpCodes;
public static OpCode[] singleByteOpCodes;

public static void LoadOpCodes()
{
    singleByteOpCodes = new OpCode[0x100];
    multiByteOpCodes = new OpCode[0x100];
    FieldInfo[] infoArray1 = typeof(OpCodes).GetFields();
    for (int num1 = 0; num1 < infoArray1.Length; num1++)
    {
        FieldInfo info1 = infoArray1[num1];
        if (info1.FieldType == typeof(OpCode))
        {
            OpCode code1 = (OpCode)info1.GetValue(null);
            ushort num2 = (ushort)code1.Value;
            if (num2 < 0x100)
            {
                singleByteOpCodes[(int)num2] = code1;
            }
            else
            {
                if ((num2 & 0xff00) != 0xfe00)
                {
                    throw new Exception("Invalid OpCode.");
                }
                multiByteOpCodes[num2 & 0xff] = code1;
            }
        }
    }
}
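
To see how an opcode value maps to a slot in these two tables, the following sketch splits `OpCode.Value` the same way `LoadOpCodes` does. `OpCodeIndex` and `IsMultiByte` are hypothetical names used only for this illustration.

```csharp
using System;
using System.Reflection.Emit;

public static class OpCodeIndex
{
    // Computes the table slot of an opcode. Returns true for two-byte
    // opcodes (0xfe prefix) and false for single-byte opcodes.
    public static bool IsMultiByte(OpCode code, out int index)
    {
        ushort value = (ushort)code.Value;   // OpCode.Value is a signed short
        index = value & 0xff;
        return value >= 0x100;
    }

    public static void Main()
    {
        int index;

        bool multi = IsMultiByte(OpCodes.Ldstr, out index);
        Console.WriteLine("ldstr: multi-byte={0}, index=0x{1:x2}", multi, index);

        multi = IsMultiByte(OpCodes.Ceq, out index);
        Console.WriteLine("ceq:   multi-byte={0}, index=0x{1:x2}", multi, index);
    }
}
```

`ldstr` (0x72) lands in the single-byte table at index 0x72, while `ceq` (0xFE01) lands in the multi-byte table at index 0x01.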

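Once the `MethodBodyReader` object has been created, we can use it to walk the list of instructions or to get their `string` representation. That's it; happy decompiling.

Future Work

What remains now is to turn the MSIL into C# code.

History

9 May 2006

Original version posted

28 June 2007

After a long time, I finally managed to go through the issues raised by readers of this article. Here is the result:

I added support for generics.
`OperandType.InlineTok` is now handled correctly as well.
Various other small issues were fixed.

Be sure to download the source code again from the link at the beginning of the project.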
