Don't Rely on Obfuscation

Fady Anwar

21 Feb 2008

CPOL

An article demonstrating why you should not rely on obfuscation to protect your .NET applications.

Introduction

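Unlike native code, managed code has always been known to decompile easily back into source code, which makes it simple to reverse engineer. That is why obfuscation came about. After compiling managed code, we modify it so that decompilers break or decompilation becomes pointless, because the output is garbage that cannot be understood or that no longer compiles once modified. Obfuscation is usually done by renaming classes, methods, and variables to random names so that the decompiled code is hard to read; with some obfuscators, the code decompiled from the obfuscated application also produces build errors when it is compiled again. Although obfuscation has sometimes proven effective, it has major weaknesses and limitations that make relying on it a poor choice.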

For the demonstration in this article, I will use C# as my managed language and the Dotfuscator Community Edition that ships with Microsoft Visual Studio as my obfuscation tool.

Example

Suppose we have an application that checks whether the user is authenticated before performing some action.

private void btnSubmit_Click(object sender, EventArgs e)
{
    //we authenticate the user here using the method Authenticate()
    if (Authenticate())
    {
        //if the user credential is valid then...
        MessageBox.Show("access granted");
        this.Run();
    }
    else
    {
        //else we kick him/her out
        MessageBox.Show("invalid credentials");
        this.Close();
    }
}

When we obfuscate this code and then decompile it, we get the following (I used Lutz Roeder's Reflector for the decompilation):

private void a(object A_0, EventArgs A_1)
{
    if (this.c())
    {
        MessageBox.Show("access granted");
        this.b();
    }
    else
    {
        MessageBox.Show("invalid credentials");
        base.Close();
    }
}

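As we can see, most of the code has been renamed, but the message strings are untouched. In addition, .NET Framework classes and methods such as `MessageBox` and `Show()` have not been renamed either, which is a big problem. Compiling the decompiled code may produce compile-time errors, because the obfuscated code may reuse the same name for methods and classes, but that is not a problem for IL (Intermediate Language). So, if we simply use *ildasm* to disassemble the application's EXE, we get the following: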

C:\Program Files\Microsoft Visual Studio 8\VC>ildasm C:\Dotfuscated\password.exe /out=c:\password.il
.method private hidebysig instance void 
a(object A_0, 
class [mscorlib]System.EventArgs A_1) cil managed 
{ 
    // Code size 44 (0x2c) 
    .maxstack 8 
    IL_0000: ldarg.0 
    IL_0001: call instance bool a::c() 
    IL_0006: brfalse.s IL_001a 
    IL_0008: ldstr "access granted" 
    IL_000d: call valuetype [System.Windows.Forms]System.Windows.Forms.DialogResult 
             [System.Windows.Forms]System.Windows.Forms.MessageBox::Show(string) 
    IL_0012: pop 
    IL_0013: ldarg.0 
    IL_0014: call instance void a::b() 
    IL_0019: ret 
    IL_001a: ldstr "invalid credentials" 
    IL_001f: call valuetype [System.Windows.Forms]System.Windows.Forms.DialogResult 
             [System.Windows.Forms]System.Windows.Forms.MessageBox::Show(string) 
    IL_0024: pop 
    IL_0025: ldarg.0 
    IL_0026: call instance void [System.Windows.Forms]System.Windows.Forms.Form::Close() 
    IL_002b: ret 
} // end of method a::a

As we can see, our message strings again appear in clear text. So do the .NET Framework namespaces in use. And here is our message box again: System.Windows.Forms.MessageBox::Show(string).

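So, what is the problem? The problem is that the old cracking techniques used on Win32 applications can still be applied very easily to .NET assemblies. If I were an experienced cracker, I would disassemble this application to IL and search for the "invalid credentials" string that appears every time I enter a wrong password. Then I would look a few lines up, find the branch instruction at IL_0006, change brfalse.s to brtrue.s, and reassemble the application with ILAsm: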

C:\Program Files\Microsoft Visual Studio 8\VC>ilasm c:\password.il /out=c:\password.exe

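The next time I run the rebuilt application and enter an invalid user name and password, I get the "access granted" welcome message instead of being rejected. Obviously, the same approach can be applied to cracking license keys and similar checks.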
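The search-and-flip step described above can be sketched in a few lines of C#. This is only an illustration that works on the text of an *ildasm* dump; the `BranchPatcher` class and its method name are hypothetical, and it assumes the check is a plain brfalse/brtrue branch somewhere above the marker string.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only: it edits IL text, not a binary.
public static class BranchPatcher
{
    // Finds the first line containing `marker`, walks upward to the nearest
    // brfalse/brtrue instruction and inverts it.
    // Returns the index of the patched line, or -1 if nothing was changed.
    public static int FlipBranchAbove(IList<string> ilLines, string marker)
    {
        int hit = -1;
        for (int i = 0; i < ilLines.Count; i++)
        {
            if (ilLines[i].Contains(marker)) { hit = i; break; }
        }
        if (hit < 0) return -1;

        for (int i = hit - 1; i >= 0; i--)
        {
            if (ilLines[i].Contains("brfalse"))
            {
                ilLines[i] = ilLines[i].Replace("brfalse", "brtrue");
                return i;
            }
            if (ilLines[i].Contains("brtrue"))
            {
                ilLines[i] = ilLines[i].Replace("brtrue", "brfalse");
                return i;
            }
        }
        return -1;
    }
}
```

Called with the lines of password.il and the marker "invalid credentials", this would rewrite the instruction at IL_0006, which is the same edit I made by hand.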

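But that is only because my current obfuscation tool did not obfuscate the message strings, right? What if we obfuscate every string in the application? To do this I will use an evaluation copy of Dotfuscator, which has a feature called "string encryption". Personally, I do not consider this encryption but rather encoding or obfuscation, because you cannot meaningfully encrypt something and ship the algorithm and the key along with it. Here is the disassembled code after string obfuscation: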

.method private hidebysig instance void 
eval_a(object A_0, 
class [mscorlib]System.EventArgs A_1) cil managed 
{ 
    // Code size 71 (0x47) 
    .maxstack 9 
    .locals init (int32 V_0) 
    IL_0000: ldc.i4 0x3 
    IL_0005: stloc V_0 
    IL_0009: ldarg.0 
    IL_000a: call instance bool eval_a::eval_c() 
    IL_000f: brfalse.s IL_002c 
    IL_0011: ldstr bytearray (F0 90 F2 90 F4 96 F6 92 F8 8A FA 88 FC DD FE 98 
    00 73 02 62 04 6B 06 73 08 6C 0A 6F ) // .s.b.k.s.l.o 
    IL_0016: ldloc V_0 
    IL_001a: call string a$PST06000001(string, int32) 
    IL_001f: call valuetype [System.Windows.Forms]System.Windows.Forms.DialogResult 
             [System.Windows.Forms]System.Windows.Forms.MessageBox::Show(string) 
    IL_0024: pop 
    IL_0025: ldarg.0 
    IL_0026: call instance void eval_a::b() 
    IL_002b: ret 
    IL_002c: ldstr bytearray (F0 98 F2 9D F4 83 F6 96 F8 95 FA 92 FC 99 FE DF 
    00 62 02 71 04 60 06 63 08 6C 0A 65 0C 79 0E 66 // .b.q.`.c.l.e.y.f 
    10 70 12 7F 14 66 ) // .p...f 
    IL_0031: ldloc V_0 
    IL_0035: call string a$PST06000001(string, 
    int32) 
    IL_003a: call valuetype [System.Windows.Forms]System.Windows.Forms.DialogResult 
             [System.Windows.Forms]System.Windows.Forms.MessageBox::Show(string) 
    IL_003f: pop 
    IL_0040: ldarg.0 
    IL_0041: call instance void [System.Windows.Forms]System.Windows.Forms.Form::Close() 
    IL_0046: ret 
} // end of method eval_a::eval_a

Note: the "eval" prefix appears because I am using an evaluation copy of Dotfuscator.

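As we can see, all the strings have been obfuscated, but the names of all the .NET Framework classes and methods in use are still clear and readable. You can obfuscate anything except the .NET Framework namespaces, classes, and methods: if you renamed them, how would you call them on the user's machine?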
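To illustrate why I call the "string encryption" feature encoding rather than encryption, here is a minimal sketch of a reversible string scrambler. It is not Dotfuscator's actual algorithm; the `StringScrambler` class and its XOR scheme are assumptions made up for this example. The only similarity is the shape of the call seen in the IL, a method that takes a string and an int32 and returns the readable string.

```csharp
using System;

// Hypothetical scheme, for illustration only (not what Dotfuscator does).
public static class StringScrambler
{
    // XORs every UTF-16 code unit with a one-byte key derived from the seed
    // and the character position. XOR is its own inverse, so the same
    // method both scrambles and unscrambles.
    public static string Transform(string text, int seed)
    {
        char[] chars = text.ToCharArray();
        for (int i = 0; i < chars.Length; i++)
        {
            chars[i] = (char)(chars[i] ^ ((seed + i) & 0xFF));
        }
        return new string(chars);
    }
}
```

Because the application has to display the original text, the decoding routine and the seed must both ship inside the assembly. Anyone who can read the IL can call the same routine, or simply ignore the strings, as the rest of this article does.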

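Again, if I were an experienced cracker who knows what to look for, I would look for the rarest .NET Framework methods and classes called in this application. `MessageBox` is a good example, and `Form::Close()` is another good choice. I would search the new IL for them, look a few lines up for a branch instruction until I found the one at IL_000f, change it from brfalse.s to brtrue.s again, and rebuild the application with ILAsm. When I run it, I get another "access granted" message, and as you can see, it took only five minutes.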
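Looking for the rarest framework calls can also be sketched in code. The following is a rough illustration that scans the text of an *ildasm* dump for assembly-qualified member references and sorts them by how often they occur. The `FrameworkCallFinder` class and its regular expression are assumptions for this example and will not cover every form of IL member reference (generics and nested types, for instance).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class FrameworkCallFinder
{
    // Matches assembly-qualified member references such as
    // [System.Windows.Forms]System.Windows.Forms.Form::Close
    private static readonly Regex ExternalMember =
        new Regex(@"\[[^\]]+\][\w.]+::\.?\w+");

    // Counts each external member reference and returns them rarest first.
    public static List<KeyValuePair<string, int>> RarestFirst(IEnumerable<string> ilLines)
    {
        var counts = new Dictionary<string, int>();
        foreach (string line in ilLines)
        {
            foreach (Match m in ExternalMember.Matches(line))
            {
                int n;
                counts.TryGetValue(m.Value, out n); // n stays 0 when the key is new
                counts[m.Value] = n + 1;
            }
        }
        return counts
            .OrderBy(kv => kv.Value)
            .ThenBy(kv => kv.Key, StringComparer.Ordinal)
            .ToList();
    }
}
```

For the method shown above, `Form::Close` would come out first with a single occurrence, ahead of `MessageBox::Show` with two, which is why it makes such a convenient landmark.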

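But that is because the application flow is too clear and has not been obfuscated, right? What if we use the "control flow obfuscation" feature in Dotfuscator to obfuscate the application flow? The output IL looks like this: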

.method private hidebysig instance void 
eval_a(object A_0, 
class [mscorlib]System.EventArgs A_1) cil managed 
{ 
    // Code size 81 (0x51) 
    .maxstack 2 
    .locals init (int32 V_0) 
    IL_0000: ldc.i4 0xa 
    IL_0005: stloc V_0 
    IL_0009: ldarg.0 
    IL_000a: call instance bool eval_a::eval_c() 
    IL_000f: brfalse.s IL_0036 
    IL_0011: ldc.i4.1 
    IL_0012: br.s IL_0017 
    IL_0014: ldc.i4.0 
    IL_0015: br.s IL_0017 
    IL_0017: brfalse.s IL_0019 
    IL_0019: br.s IL_001b 
    IL_001b: ldstr bytearray (57 39 59 39 5B 3F 5D 
                   3B 5F 13 61 11 63 44 65 01 // W9Y9[?];_.a.cDe. 
                   67 1A 69 0B 6B 02 6D 1A 6F 15 71 16 ) // g.i.k.m.o.q. 
    IL_0020: ldloc V_0 
    IL_0024: call string a$PST06000001(string, 
    int32) 
    IL_0029: call valuetype [System.Windows.Forms]System.Windows.Forms.DialogResult 
             [System.Windows.Forms]System.Windows.Forms.MessageBox::Show(string) 
    IL_002e: pop 
    IL_002f: ldarg.0 
    IL_0030: call instance void eval_a::b() 
    IL_0035: ret 
    IL_0036: ldstr bytearray (57 31 59 34 5B 2A 5D 
             3F 5F 0C 61 0B 63 00 65 46 // W1Y4[*]?_.a.c.eF 
             67 0B 69 18 6B 09 6D 0A 6F 15 71 1C 73 00 75 1F // g.i.k.m.o.q.s.u. 
             77 19 79 16 7B 0F ) // w.y.{. 
    IL_003b: ldloc V_0 
    IL_003f: call string a$PST06000001(string, int32) 
    IL_0044: call valuetype [System.Windows.Forms]System.Windows.Forms.DialogResult 
             [System.Windows.Forms]System.Windows.Forms.MessageBox::Show(string) 
    IL_0049: pop 
    IL_004a: ldarg.0 
    IL_004b: call instance void [System.Windows.Forms]System.Windows.Forms.Form::Close() 
    IL_0050: ret 
} // end of method eval_a::eval_a

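There are many branches, but visual inspection shows that most of them just lead to other branches, which lead to still other branches, until they reach a real one. Visual inspection also shows that most of them are unconditional: they are plain jumps. (The brfalse.s at IL_0017 is conditional in form, but its target is simply the next instruction, so it changes nothing.) So it is easy to spot the real branch at IL_000f and do the same thing as before: invert the condition, rebuild the application, and get the same result.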
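The visual inspection above can be approximated mechanically. Here is a minimal sketch, assuming an *ildasm* text dump with one instruction per line: it keeps only brfalse/brtrue branches whose target is something other than the very next instruction. The `BranchFilter` class is hypothetical, and other conditional opcodes (beq, bne.un, and so on) are deliberately ignored to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class BranchFilter
{
    // Parses lines of the form "IL_000f: brfalse.s IL_0036".
    private static readonly Regex Branch =
        new Regex(@"^\s*(IL_[0-9a-fA-F]{4}):\s+(br\S*)\s+(IL_[0-9a-fA-F]{4})");
    private static readonly Regex Label =
        new Regex(@"^\s*(IL_[0-9a-fA-F]{4}):");

    // Returns the labels of brfalse/brtrue instructions whose target is not
    // simply the next instruction, i.e. branches that can change the flow.
    public static List<string> MeaningfulConditionals(IList<string> ilLines)
    {
        var result = new List<string>();
        for (int i = 0; i < ilLines.Count; i++)
        {
            Match m = Branch.Match(ilLines[i]);
            if (!m.Success) continue;

            string opcode = m.Groups[2].Value;
            bool conditional =
                opcode.StartsWith("brfalse", StringComparison.Ordinal) ||
                opcode.StartsWith("brtrue", StringComparison.Ordinal);
            if (!conditional) continue; // plain br / br.s jumps are skipped

            if (m.Groups[3].Value != NextLabel(ilLines, i))
                result.Add(m.Groups[1].Value);
        }
        return result;
    }

    // Label of the first instruction after `index`, or null if there is none.
    private static string NextLabel(IList<string> ilLines, int index)
    {
        for (int j = index + 1; j < ilLines.Count; j++)
        {
            Match m = Label.Match(ilLines[j]);
            if (m.Success) return m.Groups[1].Value;
        }
        return null;
    }
}
```

Run against the branch instructions in the listing above, this would leave IL_000f as the only candidate, which matches what the eye picks out.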

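But that is only because the code is not complex enough. What if we make the code a bit more complex and obfuscate it using the previous methods?

Code that looks like this: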

private void btnSubmit_Click(object sender, EventArgs e) 
{ 
    if (CheckConnection()) 
    { 
        if (CheckDB()) 
        { 
            //we authenticate the user here using the method Authenticate() 
            if (Authenticate()) 
            { 
                //if the user credential is valid then... 
                MessageBox.Show("access granted"); 
                this.Run(); 
            } 
            else 
            { 
                //else we kick him out 
                MessageBox.Show("invalid credentials"); 
                this.Close(); 
            } 
        } 
    } 
}

becomes this after disassembly:

.method private hidebysig instance void 
eval_a(object A_0, 
class [mscorlib]System.EventArgs A_1) cil managed 
{ 
    // Code size 81 (0x51) 
    .maxstack 2 
    .locals init (int32 V_0) 
    IL_0000: ldc.i4 0xa 
    IL_0005: stloc V_0 
    IL_0009: ldarg.0 
    IL_000a: call instance bool eval_a::eval_c() 
    IL_000f: brtrue.s IL_0036 
    IL_0011: ldc.i4.1 
    IL_0012: br.s IL_0017 
    IL_0014: ldc.i4.0 
    IL_0015: br.s IL_0017 
    IL_0017: brfalse.s IL_0019 
    IL_0019: br.s IL_001b 
    IL_001b: ldstr bytearray (57 39 59 39 5B 3F 5D 
             3B 5F 13 61 11 63 44 65 01 // W9Y9[?];_.a.cDe. 
             67 1A 69 0B 6B 02 6D 1A 6F 15 71 16 ) // g.i.k.m.o.q. 
    IL_0020: ldloc V_0 
    IL_0024: call string a$PST06000001(string, 
    int32) 
    IL_0029: call valuetype [System.Windows.Forms]System.Windows.Forms.DialogResult 
             [System.Windows.Forms]System.Windows.Forms.MessageBox::Show(string) 
    IL_002e: pop 
    IL_002f: ldarg.0 
    IL_0030: call instance void eval_a::b() 
    IL_0035: ret 
    IL_0036: ldstr bytearray (57 31 59 34 5B 2A 5D 3F 
             5F 0C 61 0B 63 00 65 46 // W1Y4[*]?_.a.c.eF 
             67 0B 69 18 6B 09 6D 0A 6F 15 71 1C 73 00 75 1F // g.i.k.m.o.q.s.u. 
    77 19 79 16 7B 0F ) // w.y.{. 
    IL_003b: ldloc V_0 
    IL_003f: call string a$PST06000001(string, 
    int32) 
    IL_0044: call valuetype [System.Windows.Forms]System.Windows.Forms.DialogResult 
             [System.Windows.Forms]System.Windows.Forms.MessageBox::Show(string) 
    IL_0049: pop 
    IL_004a: ldarg.0 
    IL_004b: call instance void [System.Windows.Forms]System.Windows.Forms.Form::Close() 
    IL_0050: ret 
} // end of method eval_a::eval_a

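This time it is harder to crack, but visual inspection shows only two conditional branches, at IL_000f and IL_0017, before the `Form::Close()` call. I can try each of them, or I can just modify the first branch before the `Form::Close()` and `MessageBox::Show(string)` calls and rebuild the application again and again until I get another "access granted" message.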
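The trial-and-error approach is easy to automate, which is part of why it is so cheap. As a rough sketch (the `PatchCandidates` class is hypothetical and again works only on IL text), the following produces one patched copy of the listing per brfalse/brtrue instruction found before a landmark such as `Form::Close`. Each copy would then be reassembled with ILAsm and tried in turn.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class PatchCandidates
{
    // For every brfalse/brtrue that appears before the first line containing
    // `marker`, adds a copy of the listing with only that branch inverted.
    public static List<List<string>> Variants(IList<string> ilLines, string marker)
    {
        var variants = new List<List<string>>();
        for (int i = 0; i < ilLines.Count; i++)
        {
            string line = ilLines[i];
            if (line.Contains(marker)) break; // stop at the landmark

            string flipped;
            if (line.Contains("brfalse"))
                flipped = line.Replace("brfalse", "brtrue");
            else if (line.Contains("brtrue"))
                flipped = line.Replace("brtrue", "brfalse");
            else
                continue;

            var copy = new List<string>(ilLines);
            copy[i] = flipped;
            variants.Add(copy);
        }
        return variants;
    }
}
```

With only two conditional branches before the landmark, that is two builds at most.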

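So, what is the conclusion?

Conclusion

Well, obfuscation is a good way to protect our intellectual property, and it is better than exposing sensitive information in clear text. But, as I have demonstrated in this article, we cannot rely on obfuscation alone to protect our applications, because any application that depends solely on obfuscation is easy to crack. I did it without any special tools, and it took only a few minutes.

Thanks for reading. I look forward to your comments and feedback.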