Introduction to Fuzzing

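Coverage-guided fuzzers collect coverage information through one of four tracing approaches¹:

Compiler instrumentation at basic-block edges: instrumentation is precise and easy to optimize, but the source code must be available.
Static binary rewriting: no source code is needed, but the technique is still an active research topic because the accuracy of static instrumentation is hard to guarantee and the scope for optimization is limited. These constraints affect both the quality and accuracy of the coverage information and the performance of the rewritten binary.
Dynamic binary instrumentation: no source code is needed and code can be inserted easily and accurately, but the overhead of dynamically translating the binary can be unacceptably high.
Hardware-assisted tracing: no source code is needed; built-in hardware tracing extensions capture control-flow information directly at run time.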
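
To make the first approach concrete, the following is a minimal sketch of what compiler-inserted edge instrumentation might record, modeled on the widely used AFL-style scheme of combining the previous and current block IDs into a bitmap index. The names `MAP_SIZE`, `cov_map`, `trace_block`, `reset_coverage`, and `edges_covered` are illustrative, and the block IDs are assumed to be constants assigned at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAP_SIZE 65536u            /* illustrative coverage map size */

static uint8_t  cov_map[MAP_SIZE]; /* one hit counter per hashed edge */
static uint32_t prev_block;        /* shifted ID of the previously executed block */

/* Call that the instrumentation would place at the start of every basic block. */
static void trace_block(uint32_t block_id) {
    assert(block_id < MAP_SIZE);
    cov_map[(block_id ^ prev_block) % MAP_SIZE]++;
    /* Shift so that edge A->B and edge B->A map to different slots. */
    prev_block = block_id >> 1;
}

/* Reset per-execution state before running the next input. */
static void reset_coverage(void) {
    memset(cov_map, 0, sizeof cov_map);
    prev_block = 0;
}

/* Count the distinct map slots hit during the current execution. */
static size_t edges_covered(void) {
    size_t n = 0;
    for (size_t i = 0; i < MAP_SIZE; i++)
        if (cov_map[i]) n++;
    return n;
}
```

A fuzzer built on such a map would keep an input whenever it sets a slot that no earlier input has set. The other three approaches differ mainly in how the equivalent of `trace_block` is attached to the target, not in what is recorded.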

Intrusive and non-intrusive tracing²:

Traces can be generated by trace code that is executed within tasks and/or interrupt service routines, just like application code that is executed on the same CPU. This is the most flexible approach, as both the content and the amount of trace information output can be defined in software. However, this tracing method comes with a significant drawback: It uses resources that are shared with the application software, hence tracing may significantly reduce the amount of memory available for the applications, increase the gross execution times of the applications and, in the case of real-time systems, impair functionality. This is why it is called intrusive tracing.

The most common case is that adding trace code is detrimental to the functionality of the applications in real-time systems because the resource requirements for intrusive tracing have been underestimated in the early stages of the project, such that tracing would eventually eat up resources that are required by the application. Therefore, the resource requirements for tracing must be properly considered throughout the whole development lifecycle. Removing trace code from real-time systems may also cause functional issues, typically just before the final production software release. This is the worst case, as trace information is no longer available in this scenario.

Non-intrusive tracing does not change the intrinsic timing behavior of the system under test. This approach simplifies the software development process a lot and requires dedicated hardware support for tracing. External trace probes connected to the target system, in conjunction with on-chip debug modules, capture code execution on instruction level, memory accesses and other events on the target processor. This approach is the best option when it comes to debugging the code execution down to the instruction level. The PCB design of the device under test must provide the connectors required by the external probe.

Another option for non-intrusive tracing is on-chip tracing, where most of the trace hardware is packed into the same chip that also contains the CPU that executes the application code. Non-intrusive tracing can, however, be restricted by limitations of the respective trace module or probe, such as buffer sizes, bus bandwidth or the size of an external probe.

Due to cost savings (no expensive third-party trace hardware required), reduced footprint (very small connectors instead of larger probe connectors), and limited trace bandwidth requirements, the on-chip tracing method is the preferred approach for generating the trace data required for in-depth timing analysis on task, runnable and ISR level. On-chip tracing is a suitable tracing method for devices under test with form factors very close to the final volume production devices.

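Fuzzing Network Protocols

Network protocols are typically stateful: the same input can produce different outputs depending on the current state, so fuzzers that target network protocols generally need to be stateful as well. This kind of fuzzing poses several difficulties:

Generating well-formed messages that reach and exercise a specific state
Generalizing to different protocols
Keeping test cases valid: they must pass format checks such as length fields, protocol authentication, and checksums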
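
The validity problem is commonly handled by repairing derived fields after mutation. The sketch below assumes a hypothetical packet layout (1-byte type, 1-byte payload length, payload, 1-byte additive checksum); real protocols define their own fields and checksum algorithms, and the function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: [type][len][payload, len bytes][checksum] */
#define HDR_SIZE 2u

/* Sum of all bytes modulo 256; a stand-in for the protocol's real checksum. */
static uint8_t checksum8(const uint8_t *data, size_t n) {
    uint8_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = (uint8_t)(sum + data[i]);
    return sum;
}

/* Recompute length and checksum so that a mutated packet passes format checks.
 * `size` is the total packet size in bytes, including header and checksum.
 * Returns 0 on success, -1 if the size cannot be encoded in this layout. */
static int fixup_packet(uint8_t *pkt, size_t size) {
    assert(pkt != NULL);
    if (size < HDR_SIZE + 1 || size - HDR_SIZE - 1 > 255)
        return -1;
    pkt[1] = (uint8_t)(size - HDR_SIZE - 1);   /* payload length */
    pkt[size - 1] = checksum8(pkt, size - 1);  /* checksum over everything before it */
    return 0;
}

/* What a receiver-side format check might look like. */
static int packet_is_valid(const uint8_t *pkt, size_t size) {
    assert(pkt != NULL);
    return size >= HDR_SIZE + 1
        && pkt[1] == size - HDR_SIZE - 1
        && pkt[size - 1] == checksum8(pkt, size - 1);
}
```

Calling `fixup_packet` after every mutation keeps the target from rejecting inputs at the parsing stage, so that mutations of the type or payload bytes actually reach the state-handling code.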

AFLNET

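AFLNET was the first greybox fuzzer for stateful protocols. It extracts response codes from the server's responses to represent state, uses sequences of response codes to infer a state model of the protocol implementation, and then uses that model to guide fuzzing.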
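
As a rough illustration of this state representation (not AFLNET's actual code), the sketch below pulls three-digit status codes out of a buffer of FTP- or SMTP-style responses, assuming each response line begins with its code and ends with a newline. The function name `extract_response_codes` is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Extract the leading 3-digit code of each line in `buf` (length `len`).
 * Writes at most `max_codes` codes to `codes` and returns how many were found. */
static size_t extract_response_codes(const char *buf, size_t len,
                                     int *codes, size_t max_codes) {
    size_t count = 0;
    size_t line_start = 0;
    assert(buf != NULL && codes != NULL);

    while (line_start < len && count < max_codes) {
        /* A code is three digits at the very start of a line. */
        if (line_start + 3 <= len
            && isdigit((unsigned char)buf[line_start])
            && isdigit((unsigned char)buf[line_start + 1])
            && isdigit((unsigned char)buf[line_start + 2])) {
            codes[count++] = (buf[line_start] - '0') * 100
                           + (buf[line_start + 1] - '0') * 10
                           + (buf[line_start + 2] - '0');
        }
        /* Advance to the character after the next '\n'. */
        while (line_start < len && buf[line_start] != '\n')
            line_start++;
        line_start++;
    }
    return count;
}
```

For an FTP login exchange the extracted sequence might be 220, 331, 230; each consecutive pair is then treated as a transition in the inferred state model.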

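Limitations:

State representation: AFLNET requires responses to contain status codes, which protocols are not obliged to provide. Status codes also have limited expressive power and may produce redundant states.
Testing efficiency: there is no explicit signal that the target has finished processing a message, so AFLNET uses a fixed timer to pace message sending, and the time window may be too short or too long.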

STATEAFL

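STATEAFL represents the server state by the program's in-memory state. It instruments the target to collect state information and infer a state model. In each round of network interaction, STATEAFL dumps program variable values to an analysis queue and performs post-execution analysis to update the state model.

Limitations:

It faces the same testing-efficiency problem as AFLNET, and the post-execution analysis adds overhead that further lowers fuzzing throughput.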

NSFuzz

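NSFuzz infers the state model from a variable-based state representation to guide fuzzing, and uses a synchronization mechanism based on the network event loop to improve throughput.

Heuristic for identifying state variables: static analysis looks for state variables only in the event-loop code, and focuses on variables that are both read and written and that are either assigned enum-typed values or are integer members of data structures.

State representation: two statements maintain a shared_state array, which is updated whenever the value of a state variable is updated. When the fuzzer receives the message-processing result over the communication pipe, it hashes this array and treats the hash as the program's current state.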

shared_state[hash(var_id) ^ cur_store_val] = 1;
shared_state[hash(var_id) ^ pre_store_val] = 0;
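
Putting the two statements in context, the following self-contained sketch shows how the instrumentation and the state hash might fit together. It is an illustration rather than NSFuzz's implementation: the array size, `hash_var_id`, `on_state_var_store`, and the FNV-1a state hash are all assumptions. The sketch also clears the old slot before setting the new one, so that storing an unchanged value leaves its slot set; that ordering is a choice made here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STATE_MAP_SIZE 256u                 /* assumed map size */

static uint8_t shared_state[STATE_MAP_SIZE];

/* Assumed per-variable hash that spreads variable IDs across the map. */
static uint32_t hash_var_id(uint32_t var_id) {
    return (var_id * 2654435761u) % STATE_MAP_SIZE;
}

/* Hook the instrumentation would call on every store to a state variable. */
static void on_state_var_store(uint32_t var_id, uint32_t pre_store_val,
                               uint32_t cur_store_val) {
    uint32_t base = hash_var_id(var_id);
    shared_state[(base ^ pre_store_val) % STATE_MAP_SIZE] = 0;
    shared_state[(base ^ cur_store_val) % STATE_MAP_SIZE] = 1;
}

/* FNV-1a over the whole map; the fuzzer would compute this after the target
 * signals that it has finished processing a message. */
static uint32_t current_state_hash(void) {
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < STATE_MAP_SIZE; i++) {
        h ^= shared_state[i];
        h *= 16777619u;
    }
    return h;
}
```

Because the map only reflects the current values of the state variables, returning to an earlier combination of values yields the same hash, which lets the fuzzer recognize a previously seen state.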

IoTHunter

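IoTHunter proposes a multi-stage message generation method for fuzzing stateful network protocols in IoT firmware. The work is split into fuzzing known states and exploring unknown states. It uses integer mutation to change the packet type and checks the packet format (for example length and checksum).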

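Data-Flow-Guided Fuzzing

Control-flow-guided fuzzing focuses on the order in which program operations execute (for example branches and loops), whereas data-flow-guided fuzzing focuses on how variables are defined and used. The sites where a variable is defined and used need not be related by any control dependence. In fuzzing, data flow is mainly tracked with dynamic taint analysis (DTA): the target program's input data is marked as tainted where it is defined, and its accesses and uses are tracked at run time.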
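
A toy model helps show what DTA tracks: every value carries a shadow label, and every operation must propagate labels alongside values. The sketch below is a hand-written simulation with hypothetical names (`tval`, `taint_source`, `t_const`, `t_add`, `branch_depends_on`), not the API of any real DTA framework.

```c
#include <assert.h>
#include <stdint.h>

/* A value paired with a taint bitmask: bit i set means the value depends
 * on input byte i (so this toy supports at most 32 input bytes). */
typedef struct {
    uint8_t  value;
    uint32_t taint;
} tval;

/* Taint source: input byte `offset` is labeled at the point it is defined. */
static tval taint_source(const uint8_t *input, unsigned offset) {
    assert(input != 0 && offset < 32);
    tval v = { input[offset], UINT32_C(1) << offset };
    return v;
}

/* Constants carry no taint. */
static tval t_const(uint8_t c) {
    tval v = { c, 0 };
    return v;
}

/* Propagation rule for a binary operation: the result depends on both operands. */
static tval t_add(tval a, tval b) {
    tval v = { (uint8_t)(a.value + b.value), a.taint | b.taint };
    return v;
}

/* Taint sink: report which input bytes influence a branch condition. */
static uint32_t branch_depends_on(tval cond) {
    return cond.taint;
}
```

Even in this toy, every arithmetic operation does extra work on the shadow label, and a real implementation has to apply an equivalent rule to every instruction and memory access.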

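In practice, precise DTA is hard to achieve and expensive. Some real-world programs also fail to build once DTA is applied. Most greybox fuzzers therefore avoid DTA in order to achieve higher throughput.

Lightweight alternatives to DTA exist (for example REDQUEEN and GREYONE), but coverage metrics for fuzzers that combine control flow and data flow have not yet been fully explored.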

DATAFLOW

Source code

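DATAFLOW applies data-flow analysis alongside program execution to guide fuzzing, relying on imprecise inference to reduce overhead and increase throughput. The authors carried out a preliminary evaluation of the usefulness of data flow and concluded that, for most targets, data flow is no better than control flow, but that it can be useful in specific scenarios where control flow is decoupled from semantics, such as parsers.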
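
To contrast with edge coverage, a data-flow coverage signal can be sketched as a map over definition-use pairs: each variable remembers where it was last defined, and each use records the (definition site, use site) pair. This is a simplified illustration, not the tool's implementation; the site IDs, map size, hash, and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DU_MAP_SIZE 4096u          /* assumed coverage map size (2^12) */

static uint8_t du_map[DU_MAP_SIZE];

/* A variable tagged with the ID of the site that last defined it. */
typedef struct {
    int      value;
    uint16_t def_site;
} tracked_int;

/* Instrumented definition: store the value and remember the defining site. */
static void def_at(tracked_int *v, int value, uint16_t site) {
    assert(v != NULL);
    v->value = value;
    v->def_site = site;
}

/* Instrumented use: mark the (def site, use site) pair as covered. */
static int use_at(const tracked_int *v, uint16_t site) {
    assert(v != NULL);
    uint32_t pair = ((uint32_t)v->def_site << 16) | site;
    /* Multiplicative hash; the top 12 bits index the 4096-entry map. */
    du_map[(pair * 2654435761u) >> 20] = 1;
    return v->value;
}

/* Number of distinct def-use pairs seen (up to hash collisions). */
static size_t def_use_pairs_covered(void) {
    size_t n = 0;
    for (size_t i = 0; i < DU_MAP_SIZE; i++)
        n += du_map[i];
    return n;
}
```

Two executions that take the same branches but reach a use from different definitions look identical to an edge-coverage map and distinct to this one, which is the kind of case (for example in parsers) where data-flow feedback may help.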