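1. Introduction

A large share of malware uses obfuscation to evade detection. To get up to speed on Android obfuscation techniques, I read the following two sources: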

Understanding Android Obfuscation Techniques: A Large-Scale Investigation in the Wild
https://github.com/ClaudiuGeorgiu/Obfuscapk

They gave me a basic understanding of Android obfuscation methods and of how obfuscation can be detected. My notes follow.

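2. Android obfuscation methods

2.1 The four main obfuscation methods

These come from the first source, the paper, which lists four popular obfuscation methods:

identifier renaming: renaming identifiers such as variable, function, and class names
string encryption: encrypting the strings used in the code
Java reflection: invoking code through the reflection mechanism (much malware encrypts the method name first and then calls it via reflection, which makes detection harder still)
packing: wrapping the app in a packer

Of the four, the first three are commonly used obfuscation methods. What is unusual is that the paper also counts packing as a form of obfuscation.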
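To make identifier renaming concrete, here is a minimal Python sketch of the core idea: every distinct identifier is mapped to a short, meaningless name. The helper names `short_names` and `build_rename_map` and the sample identifiers are made up for illustration. A real tool also has to leave alone names that are referenced from outside the code (for example from the manifest) and avoid reserved words, which this sketch ignores.

```python
import itertools
import string


def short_names():
    """Yield a, b, ..., z, aa, ab, ... as replacement identifiers."""
    for length in itertools.count(1):
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            yield "".join(letters)


def build_rename_map(identifiers):
    """Map each distinct identifier to a short name, in first-seen order."""
    names = short_names()
    mapping = {}
    for ident in identifiers:
        if ident not in mapping:
            mapping[ident] = next(names)
    return mapping


# Hypothetical identifiers collected from an app.
mapping = build_rename_map(["LoginActivity", "password", "sendSms", "password"])
print(mapping)
```

After renaming, the names no longer carry any hint about what the code does, which is exactly what hampers manual analysis.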

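2.2 More engineering-oriented obfuscation methods

The second link points to the README of Obfuscapk, an obfuscation tool. Obfuscapk groups the obfuscation methods commonly used in practice into the following categories: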

Trivial: as the name suggests, this category includes simple operations (that do not modify much the original application), like signing the apk file with a new signature.

These are small changes to the original APK that leave the original code (DEX) untouched, such as re-signing the APK.

Rename: operations that change the names of the used identifiers (classes, fields, methods).

Changes the names of identifiers (classes, fields, methods).

Encryption: packaging encrypted code/resources and decrypting them during the app execution. When Obfuscapk starts, it automatically generates a random secret key (32 characters long, using ASCII letters and digits) that will be used for encryption.

Encrypts code and resource files (as I understand it, this is essentially first-generation packing).
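The key described in the README passage quoted above (32 characters, ASCII letters and digits) could be generated roughly as follows. This is only a sketch: the function name `generate_secret_key` is made up, and the choice of the `secrets` module is an assumption, since the README does not say which random source Obfuscapk actually uses.

```python
import secrets
import string

# Alphabet and length as described in the README: ASCII letters and digits, 32 characters.
KEY_ALPHABET = string.ascii_letters + string.digits
KEY_LENGTH = 32


def generate_secret_key(length=KEY_LENGTH):
    """Return a random key drawn from ASCII letters and digits."""
    return "".join(secrets.choice(KEY_ALPHABET) for _ in range(length))


key = generate_secret_key()
print(len(key))
```

Because a fresh key is generated on every run, obfuscating the same app twice yields different encrypted payloads.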

Code: all the operations that involve the modification of the decompiled source code.

Modifies the "source code" in the DEX, that is, the decompiled smali.

Resources: operations on the resource files (like modifying the manifest).

Modifies the resource files and the manifest, for example by reordering the items in the manifest.

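2.3 More detailed obfuscation methods

These are also listed on Obfuscapk's GitHub page and can serve as a concrete guide to Android obfuscation methods. The tag in brackets is the category from section 2.2: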

AdvancedReflection [Code]

Uses reflection to invoke dangerous APIs (for example, reading the phone number). This is one way of modifying the code.

ArithmeticBranch [Code]

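Inserts junk code (code that has no effect), much like the junk instructions used in native binaries. As the name suggests, the junk here consists of arithmetic computations and a branch.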

AssetEncryption [Encryption]

Encrypts the asset files.

CallIndirection [Code]

Changes the control-flow graph (CFG) while keeping the code logic unchanged,
for example by adding wrapper methods around the method being called.

ClassRename [Rename]

Renames packages.
Renames classes.

ConstStringEncryption [Encryption]

Encrypts the constant strings in the code (the DEX strings).
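The principle behind constant-string encryption can be sketched in a few lines of Python: the plaintext literal is replaced by ciphertext, and the app decrypts it at run time just before use. Note the assumptions: a toy repeating-key XOR plus Base64 stands in for a real cipher, and the function names, the key, and the sample string are all made up for illustration. This is not the scheme any particular tool uses.

```python
import base64
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key (toy cipher, for illustration only)."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def encrypt_string(plain: str, key: str) -> str:
    """What the obfuscator does at build time: store ciphertext instead of the literal."""
    cipher = xor_bytes(plain.encode("utf-8"), key.encode("utf-8"))
    return base64.b64encode(cipher).decode("ascii")


def decrypt_string(encoded: str, key: str) -> str:
    """What the injected helper does at run time, right before the string is used."""
    cipher = base64.b64decode(encoded)
    return xor_bytes(cipher, key.encode("utf-8")).decode("utf-8")


key = "hypotheticalKey123"  # hypothetical key
stored = encrypt_string("content://sms/inbox", key)
print(stored)
print(decrypt_string(stored, key))
```

A static scanner that searches the DEX string table for the original literal no longer finds it; only the ciphertext and a call to the decryption helper remain.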

DebugRemoval [Code]

Removes debug information.

FieldRename [Rename]

Renames fields.

Goto [Code]

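Inserts goto instructions into methods.
This is one way of changing the CFG.
For example, given a method, insert a goto at the start that jumps to the end of the method, and at the end insert another goto that jumps back to the method's real first instruction.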
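The goto detour just described can be sketched as a text transformation over the smali lines of a single method. This is an illustrative sketch, not Obfuscapk's actual code: the function name `add_goto_detour` and the labels `:detour_start` and `:detour_end` are made up, and the input is assumed to be exactly one method. Methods without a `.locals` or `.registers` line (abstract or native methods) are left untouched.

```python
def add_goto_detour(method_lines):
    """Return a copy of one smali method with a jump to the end and back to the start."""
    out = []
    patched = False
    for line in method_lines:
        stripped = line.strip()
        if not patched and stripped.startswith((".locals", ".registers")):
            out.append(line)
            # Head of the method: jump to the tail, then mark the real start.
            out.append("    goto/32 :detour_end")
            out.append("    :detour_start")
            patched = True
        elif patched and stripped.startswith(".end method"):
            # Tail of the method: execution lands here first, then jumps back.
            out.append("    :detour_end")
            out.append("    goto/32 :detour_start")
            out.append(line)
        else:
            out.append(line)
    return out


method = [
    ".method public static add(II)I",
    "    .locals 1",
    "    add-int v0, p0, p1",
    "    return v0",
    ".end method",
]
print("\n".join(add_goto_detour(method)))
```

The method still computes the same result, but its CFG now starts with two extra jumps.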

LibEncryption [Encryption]

Encrypts the native libs (.so files).

MethodOverload [Code]

Uses Java's overloading mechanism,
for example by creating a new method with the same name that takes additional random arguments.

MethodRename [Rename]

Renames methods.

NewAlignment [Trivial]

Realign the application.

NewSignature [Trivial]

Re-signs the app.

Nop [Code]

Inserts NOP instructions at random.

RandomManifest [Resource]

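Randomly reorders the contents of the manifest file.
XML attributes are unordered, and the order of sibling items in the manifest generally does not matter, so they can be rearranged freely.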
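A manifest shuffle of this kind can be sketched with Python's standard `xml.etree.ElementTree`. The sketch assumes a decoded, plain-text AndroidManifest.xml (inside an APK the manifest is stored as binary XML, so it has to be decoded first). The function name `shuffle_manifest`, the `seed` parameter, and the sample manifest are made up for illustration.

```python
import random
import xml.etree.ElementTree as ET

# Keep the usual "android:" prefix when the tree is serialized again.
ET.register_namespace("android", "http://schemas.android.com/apk/res/android")


def shuffle_manifest(xml_text, seed=None):
    """Randomly reorder the children of every element in a decoded manifest."""
    rng = random.Random(seed)
    root = ET.fromstring(xml_text)
    # Materialize the element list first so reordering does not disturb iteration.
    for element in list(root.iter()):
        children = list(element)
        rng.shuffle(children)
        element[:] = children
    return ET.tostring(root, encoding="unicode")


# Hypothetical manifest used only for this example.
manifest = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.app">
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.SEND_SMS" />
    <application android:label="Demo">
        <activity android:name=".MainActivity" />
        <service android:name=".SyncService" />
    </application>
</manifest>"""

shuffled = shuffle_manifest(manifest, seed=7)
print(shuffled)
```

The set of elements and attributes is unchanged; only their order differs, which is enough to change the file's hash and defeat signatures that rely on exact byte layout.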

Rebuild [Trivial]

Rebuilds the app into a new APK file.

Reflection [Code]

Invokes methods through the reflection mechanism.

Reorder [Code]

Changes the order of code blocks.

ResStringEncryption [Encryption]

Encrypts the strings in the resource files.

VirusTotal [Other]

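Sends the obfuscated app to VirusTotal (VT) for scanning, to check how well the obfuscation works. This is useful from a malware author's point of view.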

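3. Obfuscation detection methods

The first paper describes simple detection methods that cover four cases: identifier renaming, string encryption, reflection, and packing.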

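Source of labeled samples

The APKs come from a public dataset such as F-Droid.
Each sample in the dataset is obfuscated with different tools and different obfuscation methods, then labeled.
The label is one of the four: identifier renaming, string encryption, reflection, or packing.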

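Feature extraction

For the identifier-renaming classifier
- Extract all identifiers
- Extract character-level 3-grams from each identifier
- Combine the 3-grams of all identifiers into a fixed-length vector
For the string-encryption classifier
- As above: extract 3-grams from the strings and combine them into a fixed-length vector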
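The feature extraction steps above can be sketched as follows. How the paper maps 3-grams to a fixed-length vector is not covered in these notes, so the sketch makes its own assumption: each 3-gram is hashed into one of `VECTOR_SIZE` buckets and the counts are normalized to frequencies. `VECTOR_SIZE`, `char_ngrams`, and `ngram_vector` are names made up for illustration.

```python
import zlib

VECTOR_SIZE = 1024  # hypothetical vector length


def char_ngrams(text, n=3):
    """Return all character n-grams of a string, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_vector(strings, n=3, size=VECTOR_SIZE):
    """Hash the n-grams of all strings into a fixed-length frequency vector."""
    vector = [0.0] * size
    total = 0
    for s in strings:
        for gram in char_ngrams(s, n):
            # crc32 is deterministic across runs, unlike Python's built-in hash().
            index = zlib.crc32(gram.encode("utf-8")) % size
            vector[index] += 1.0
            total += 1
    if total:
        vector = [count / total for count in vector]
    return vector


print(char_ngrams("onCreate"))
features = ngram_vector(["onCreate", "getDeviceId", "a", "b"])
print(len(features))
```

The same function serves both classifiers: feed it the app's identifiers for renaming detection, or its string constants for string-encryption detection. Renamed identifiers such as `a` and `b` are too short to produce any 3-gram, so an obfuscated app's vector looks very different from that of an app with readable names.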

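Classification methods (obfuscation detection methods)

Identifier renaming and string encryption: after feature extraction, classify with an SVM.
Reflection: detect by pattern, for example the call sequence [Class.forName() → getMethod() → invoke()].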
Packing: detect by the following patterns, which cover 6 common packers
- Packer 1: Ali
  - File pattern: lib/armeabi/libmobisec.so | aliprotect.dat
  - Code pattern: com.ali.fixHelper | com.ali.mobisecenhance.StubApplication
- Packer 2: Tencent
  - File pattern: lib/armeabi/libmain.so | lib/armeabi/libshell.so | lib/armeabi/mix.dex
  - Code pattern: com.tencent.StubShell
- Packer 3: Qihoo
  - File pattern: assets/libjiagu.so
  - Code pattern: com.qihoo.util.StubApplication
- Packer 4: iJiami
  - File pattern: assets/ijiami.dat | */armeabi/libexec.so | */armeabi/libexecmain.so
  - Code pattern: com.shell.SuperApplication
- Packer 5: Bangcle
  - File pattern: assets/bangcle_classes.jar | lib/armeabi/libsecexe.so | lib/armeabi/libsecmain.so
  - Code pattern: com.secshell.shellwrapper.SecAppWrapper | com.bangcle.protect.ApplicationWrapper
- Packer 6: Baidu
  - File pattern: assets/baiduprotect.jar | lib/armeabi/libbaiduprotect.so
  - Code pattern: com.baidu.protect.StubApplication

Put simply, a pattern here means a matching technique such as string matching, file-name matching, or regular-expression matching.
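Both pattern checks can be sketched in Python. The packer table is copied from the list above. Everything else is an assumption made for illustration: the names `REFLECTION_CHAIN`, `PACKER_PATTERNS`, `has_reflection_chain`, and `detect_packers`, the choice to search smali text for the three reflection calls in order, and the rule that a single matching file or class is enough to report a packer.

```python
import fnmatch

# smali signatures of the three calls in the reflection pattern, in call order.
REFLECTION_CHAIN = [
    "Ljava/lang/Class;->forName(",
    "Ljava/lang/Class;->getMethod(",
    "Ljava/lang/reflect/Method;->invoke(",
]

PACKER_PATTERNS = {
    "Ali": {"files": ["lib/armeabi/libmobisec.so", "aliprotect.dat"],
            "classes": ["com.ali.fixHelper", "com.ali.mobisecenhance.StubApplication"]},
    "Tencent": {"files": ["lib/armeabi/libmain.so", "lib/armeabi/libshell.so",
                          "lib/armeabi/mix.dex"],
                "classes": ["com.tencent.StubShell"]},
    "Qihoo": {"files": ["assets/libjiagu.so"],
              "classes": ["com.qihoo.util.StubApplication"]},
    "iJiami": {"files": ["assets/ijiami.dat", "*/armeabi/libexec.so",
                         "*/armeabi/libexecmain.so"],
               "classes": ["com.shell.SuperApplication"]},
    "Bangcle": {"files": ["assets/bangcle_classes.jar", "lib/armeabi/libsecexe.so",
                          "lib/armeabi/libsecmain.so"],
                "classes": ["com.secshell.shellwrapper.SecAppWrapper",
                            "com.bangcle.protect.ApplicationWrapper"]},
    "Baidu": {"files": ["assets/baiduprotect.jar", "lib/armeabi/libbaiduprotect.so"],
              "classes": ["com.baidu.protect.StubApplication"]},
}


def has_reflection_chain(smali_text):
    """True if forName, getMethod and invoke appear in this order in the smali text."""
    position = 0
    for needle in REFLECTION_CHAIN:
        position = smali_text.find(needle, position)
        if position < 0:
            return False
        position += len(needle)
    return True


def detect_packers(file_names, class_names):
    """Return the packers whose file or code pattern matches (assumed: either suffices)."""
    class_names = set(class_names)
    found = []
    for packer, rule in PACKER_PATTERNS.items():
        file_hit = any(
            fnmatch.fnmatchcase(name, pattern) or name.endswith("/" + pattern)
            for name in file_names
            for pattern in rule["files"]
        )
        class_hit = any(cls in class_names for cls in rule["classes"])
        if file_hit or class_hit:
            found.append(packer)
    return found


print(detect_packers(["classes.dex", "assets/libjiagu.so"], ["com.example.Main"]))
```

In practice the file names could come from `zipfile.ZipFile(apk_path).namelist()`, and the class names and smali text from a disassembler's output.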

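4. Usage of obfuscation techniques across datasets

The first paper also does something interesting: it measures how widely each obfuscation technique is used in different datasets:

[Figure: usage of the four obfuscation techniques across datasets]

According to these statistics, all four obfuscation techniques are heavily used by malware. Mechanisms such as identifier renaming and reflection are also widely used by benign samples (Google Play).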
