Security Analysis of Wasm Software Ecosystem

This article is reproduced from the official WeChat account of the OpenHarmony TSC: "Summit Review No. 12 | Wasm Software Ecosystem Security Analysis".

Speaker | Wang Haoyu

Review and arrangement | Liao Tao

Typesetting proofreading | Li Pingping

Guest profile

Wang Haoyu is a professor and doctoral supervisor at Huazhong University of Science and Technology and director of the university's OpenHarmony Technology Club. His research focuses on security, privacy, and reliability issues in emerging software systems. In the past five years he has published nearly 70 papers at CCF Class A and CSRankings top conferences, and his top-conference publication record in software security and system measurement is among the best in China.

Content source

The 1st Open Atom Open Source Foundation OpenHarmony Technology Summit - Developer Tools Sub-Forum

Video review

Video link: Summit Review Issue 12 | Wasm Software Ecosystem Security Analysis (Wang Haoyu), on Bilibili

Contents

WebAssembly (Wasm) is an efficient, low-level, portable bytecode format standardized by the W3C. Wasm is increasingly used in browsers, serverless computing, cross-platform containers, and blockchain DApps. What can Wasm and the OpenHarmony ecosystem offer each other? At the first OpenHarmony Technology Summit, Professor Wang Haoyu of the School of Network Security at Huazhong University of Science and Technology shared current work in Wasm security and looked ahead to new directions for combining Wasm with OpenHarmony.

01 Introduction to the Wasm software ecosystem

Currently, almost all mainstream high-level languages, such as C, C++, Rust, Go, Java, and C#, can be compiled to Wasm, and all mainstream browsers support it. The industry has also implemented many standalone Wasm virtual machines (runtimes) that support interpreter, AOT, JIT, and other execution modes.

WebAssembly (Wasm) and its runtime environment

The execution architecture and design features of Wasm are:

  • Type-safe stack instructions: a linear-time type-checking algorithm fully determines the number and types of the values on the stack;
  • Structured control flow instructions: control can only be transferred along the nesting structure of code blocks, which simplifies compiler implementation;
  • Growable linear memory: one page is 64 KB; the module declares the initial and maximum number of memory pages, and memory can grow dynamically at run time. Critical data such as the function call stack and return addresses is maintained by the runtime, outside linear memory, to ensure security;
  • Complete separation of instructions and data: function "addresses" are expressed as indices, and indirect jumps are implemented through jump tables.
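
The linear-time type check in the first bullet can be illustrated with a minimal Python sketch. It is an illustration under simplifying assumptions, not the validation algorithm of any particular runtime: it handles only straight-line code, the `SIGNATURES` table covers a handful of real Wasm instructions, and `validate` is a hypothetical helper name introduced here.

```python
# Stack effect of a few real Wasm instructions:
# (operand types popped, result types pushed).
SIGNATURES = {
    "i32.const": ([], ["i32"]),
    "i64.const": ([], ["i64"]),
    "i32.add": (["i32", "i32"], ["i32"]),
    "i64.add": (["i64", "i64"], ["i64"]),
    "i32.eqz": (["i32"], ["i32"]),
    "i32.wrap_i64": (["i64"], ["i32"]),
}


def validate(instructions, results):
    """Check a straight-line instruction sequence in a single pass.

    Returns True only if every instruction finds operands of the right
    type on the abstract stack and the final stack equals `results`.
    """
    stack = []
    for op in instructions:
        params, outs = SIGNATURES[op]
        # Operands are popped in reverse order of declaration.
        for expected in reversed(params):
            if not stack or stack.pop() != expected:
                return False
        stack.extend(outs)
    return stack == list(results)


print(validate(["i32.const", "i32.const", "i32.add"], ["i32"]))  # True
print(validate(["i32.const", "i64.const", "i32.add"], ["i32"]))  # False
```

Each instruction is visited once and only an abstract stack of types is tracked, which is why the check runs in time linear in the number of instructions.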

The application prospects of Wasm are broad. For example, Wasm supports efficient computation on the web, so large applications can run in the browser; it also supports cross-platform container technology suited to embedded, trusted computing, and cloud computing scenarios. In addition, Wasm is widely used for DApps and smart contracts in Web 3.0 and blockchain.

Wasm's multi-language, cross-platform, and high-performance characteristics make it well suited to OpenHarmony's open source ecosystem, which targets new Internet of Everything scenarios, and give it broad application prospects on mobile devices. Michael Yuan, maintainer of the WasmEdge open source project, and others have initiated the OpenHarmony Wasm-SIG proposal, which is dedicated to publicizing, implementing, and promoting the integration of Wasm and OpenHarmony. Third-party developers would be able to safely and efficiently run Wasm programs written in C, C++, Rust, and other languages on OpenHarmony terminal devices, which helps expand the developer community of the OpenHarmony ecosystem.

02 Wasm security and related research

There are also many security issues in the Wasm ecosystem that have attracted the attention of the academic community, including front-end compiler security, code porting security, Wasm binary security, Wasm-related malicious applications, Wasm trusted execution environment, and so on.

  • Code memory safety: Because the Wasm ecosystem is still relatively immature, vulnerabilities that already have mature defenses in traditional binaries can still be exploited. For example, without a stack canary mechanism, attackers can easily exploit stack overflow vulnerabilities; Wasm also lacks heap protection mechanisms.
  • Program porting safety: A large number of existing programs can be compiled "directly" to Wasm, but doing so may introduce bugs or security issues. Porting can change code behavior (for example through differences in pointer size, memory capacity, and environment variables), and improper handling during porting may lead to security problems, such as heap memory management that is implemented differently and is hard to get right, or missing security measures.
  • Malicious Wasm programs: A large share of Wasm programs are currently used for malicious mining and similar activities. Malware can also use Wasm as an obfuscation or packing technique.
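
To make the missing-canary problem in the first bullet concrete, the following Python sketch simulates one 64 KB page of linear memory with a `bytearray`. It is a simulation, not real Wasm: the layout (an 8-byte buffer followed directly by a flag byte) and all names are assumptions made for illustration.

```python
PAGE_SIZE = 64 * 1024  # one Wasm page

# Hypothetical layout inside linear memory: an 8-byte buffer
# immediately followed by a one-byte "is_admin" flag.
BUF_ADDR, BUF_LEN = 0, 8
FLAG_ADDR = BUF_ADDR + BUF_LEN


def unchecked_copy(memory, dest, data):
    """Mimic a strcpy-style copy compiled to Wasm.

    Only the bounds of the whole linear memory are enforced (a real
    runtime would trap); nothing guards the boundary between objects
    that live inside it.
    """
    if dest + len(data) > len(memory):
        raise MemoryError("out of bounds memory access")
    memory[dest:dest + len(data)] = data


def demo():
    memory = bytearray(PAGE_SIZE)
    # Nine bytes are written into an eight-byte buffer.
    unchecked_copy(memory, BUF_ADDR, b"A" * BUF_LEN + b"\x01")
    return memory[FLAG_ADDR]


print(demo())  # 1: the neighbouring flag was silently overwritten
```

The sandbox boundary holds, since the write never leaves linear memory, but data inside the sandbox is corrupted without any trap. That is the gap a canary or similar in-module protection would have to close.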

Wasm (Security) Issues and Related Research in Academia

However, research on Wasm security is still in its infancy. First, the new features and scenarios that Wasm introduces keep bringing new security issues and challenges. Second, there is almost no general-purpose program analysis framework for Wasm: most tools perform platform-specific Wasm binary analysis (supporting only part of the instruction set and modeling only platform-related library functions) and cannot analyze general Wasm binaries. In addition, Wasm binary decompilers are still at an early stage, Wasm virtual machines and compilers are not yet mature, and Wasm code obfuscation and code protection techniques are relatively scarce.

In response to these security issues, Professor Wang Haoyu's team has carried out security enhancement work on Wasm binary translation, Wasm program analysis, and bug detection for Wasm runtimes and compilers. For example, in the blockchain smart contract scenario, the team implemented secure binary translation from EVM bytecode to eWasm bytecode, and proposed analysis tools including the Wasm symbolic execution framework EOSafe, the Wasm fuzzing framework WASAI, the general-purpose Wasm binary rewriting framework BREWasm, and the Wasm binary obfuscation tool Chaos. The team also proposed fuzzing techniques for Wasm runtimes and has found dozens of code defects in runtimes such as wasmer, wasmtime, WAMR, wasm3, and WasmEdge.

Professor Wang Haoyu's team Wasm-related research work

03 Wasm binary rewriting and its security application

In the developer tools sub-forum of this summit, Professor Wang Haoyu introduced a general-purpose Wasm binary rewriting tool proposed by his team. Wasm binary rewriting requires no source code and works across platforms and languages. Its application scenarios include Wasm program repair, test case generation, code instrumentation, support for dynamic analysis, Wasm vulnerability detection, Wasm fuzzing, and Wasm binary protection and obfuscation. At present, most academic work on Wasm binary rewriting and instrumentation is limited to simple instruction-level modifications, such as adding a few instructions before or after a given instruction, and control flow changes are restricted to specific patterns. Yet a general-purpose Wasm binary rewriting framework is the foundation for much of the Wasm research described above.

There are some challenges in implementing a general-purpose Wasm binary rewriting framework.

(1) Coupling between Wasm sections: In Wasm, the information about a single function, including its signature and its instructions, is distributed across different sections, so rewriting one section alone is not enough to implement even a small feature. Developers must also be familiar with the different data structures of several sections in order to rewrite a single function;

(2) Structured control flow and control flow modification: Wasm has no goto-like jump instruction, and jumps can only be expressed by nesting code blocks, which makes flexible control flow rewriting very challenging;

(3) Wasm stack balance checking and repair: A correct Wasm binary must satisfy static validation rules. For example, all information about a function is tied to the function's index, and the instructions of a function must keep the stack balanced. If rewriting leaves inconsistent indices or a function whose instructions are not stack-balanced, the resulting Wasm binary is invalid.

Professor Wang Haoyu's team proposed a solution for each of these challenges. For challenge (1), in addition to fine-grained rewriting of the data structures in each section, the framework abstracts the structure of the sections into a set of semantics and provides a large number of semantic rewriting APIs, so that developers do not need to care about the underlying modification logic of each section.

For challenge (2), the team proposed atomizing the control flow structure. When a Wasm module is loaded, its instructions are split and grouped into code blocks (atomization). These atomic control flow structures can be combined to build more complex control flow structures, and after modification the structure built from atomic code blocks is converted back into Wasm instructions.

For challenge (3), two auxiliary modules, indices-fixer and stack-calculator, repair index errors and stack balance.
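
What the two auxiliary modules have to do can be sketched in a few lines of Python. This is not BREWasm's implementation: the instruction encoding as `(opcode, argument)` tuples, the helper names `insert_function` and `rebalance`, and the repair strategy of appending `drop` instructions are all assumptions chosen for illustration. The sketch also ignores imports, exports, element segments, and the start function, which reference function indices as well.

```python
def insert_function(functions, position, new_body):
    """Insert `new_body` at `position` and renumber call targets.

    Every existing `call n` with n >= position must become `call n+1`,
    because the callee moved one slot up. `new_body` is assumed to
    already use post-insertion indices.
    """
    def fix(body):
        return [
            ("call", arg + 1) if op == "call" and arg >= position else (op, arg)
            for op, arg in body
        ]

    fixed = [fix(body) for body in functions]
    fixed.insert(position, list(new_body))
    return fixed


# Net stack height change of a few real Wasm instructions.
STACK_DELTA = {"i32.const": 1, "i32.add": -1, "drop": -1, "nop": 0}


def rebalance(body, expected_height=0):
    """Append `drop`s until the body leaves `expected_height` values."""
    extra = sum(STACK_DELTA[op] for op, _ in body) - expected_height
    if extra < 0:
        raise ValueError("body consumes more values than it produces")
    return list(body) + [("drop", None)] * extra


funcs = [[("call", 1)], [("i32.const", 7)]]
print(insert_function(funcs, 1, [("nop", None)]))
# [[('call', 2)], [('nop', None)], [('i32.const', 7)]]
```

Doing this bookkeeping by hand after every edit is error-prone, which is the motivation for centralizing it in dedicated helper modules.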

BREWasm framework

Based on the above solutions, Professor Wang Haoyu's team proposed a general-purpose Wasm binary rewriting framework, BREWasm. The framework consists of the following five parts:

  • Wasm Parser: uses a simple DSL to abstract Wasm sections and data structures, and parses a binary into a list of operable objects;
  • Section Rewriter: implements fine-grained section rewriting APIs on top of the section and data structure abstractions;
  • Semantics Rewriter: combines the section rewriting APIs into a set of Semantic APIs with richer semantics;
  • Control Flow Reconstructor: implements a set of Control Flow APIs that can modify control flow flexibly and arbitrarily without the developer having to track stack balance;
  • Wasm Encoder: re-encodes the rewritten list of operable objects into a valid Wasm binary according to the section and data structure abstractions.
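
The first step of any such parser is walking the section headers of a module. The Python sketch below does only that, following the standard binary layout (the `\0asm` magic, a version field, then sections consisting of a one-byte id, a LEB128-encoded size, and the payload). It is an independent illustration rather than BREWasm code, and `list_sections` is a hypothetical helper name.

```python
SECTION_NAMES = {
    0: "custom", 1: "type", 2: "import", 3: "function", 4: "table",
    5: "memory", 6: "global", 7: "export", 8: "start", 9: "element",
    10: "code", 11: "data", 12: "data count",
}


def read_u32_leb128(data, offset):
    """Decode an unsigned LEB128 integer; return (value, next_offset)."""
    result = shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, offset
        shift += 7


def list_sections(data):
    """Return [(section name, payload size)] for a Wasm binary."""
    if data[:4] != b"\0asm" or data[4:8] != b"\x01\x00\x00\x00":
        raise ValueError("not a Wasm version 1 binary")
    offset, sections = 8, []
    while offset < len(data):
        section_id = data[offset]
        size, payload_start = read_u32_leb128(data, offset + 1)
        sections.append((SECTION_NAMES.get(section_id, "unknown"), size))
        offset = payload_start + size  # skip the payload
    return sections


# A minimal module with one empty function `() -> ()`.
module = b"\0asm\x01\x00\x00\x00" + bytes([
    1, 4, 1, 0x60, 0, 0,   # type section: one func type, no params/results
    3, 2, 1, 0,            # function section: function 0 uses type 0
    10, 4, 1, 2, 0, 0x0B,  # code section: one body, no locals, `end`
])
print(list_sections(module))
# [('type', 4), ('function', 2), ('code', 4)]
```

Even this one-function module spreads the function over three sections (its signature, its type reference, and its body), which illustrates the coupling described in challenge (1).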

Wasm control flow atomization schematic and some Control Flow APIs provided in BREWasm

BREWasm can be applied to scenarios such as Wasm code obfuscation, stack overflow protection for Wasm programs, and Wasm program instrumentation. For Wasm binary obfuscation, BREWasm splits the original Wasm code blocks to obtain the basic elements for control flow rewriting, arranges these elements into a switch-case control flow structure, and places that structure inside a while loop; in this way, control flow flattening of any Wasm program can be achieved with only a few lines of code. For stack overflow protection, a few BREWasm APIs are enough to hook functions that may overflow the stack: a canary is placed on the stack before the function is called, and after the function returns its value is checked to determine whether a stack overflow occurred during execution. For program instrumentation, BREWasm can instrument Wasm binaries to implement dynamic taint analysis, call graph analysis, memory access analysis, malicious mining detection, and similar functions; it can also take user-specified instrumentation rules, instrument every Wasm instruction automatically, and import externally implemented analysis APIs into the Wasm binary. In addition, BREWasm can readily be applied to scenarios such as Wasm code transformation, Wasm program repair, and Wasm fuzz testing.

Example of BREWasm implementing control flow flattening for arbitrary Wasm programs
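
The idea behind control flow flattening, a switch-case dispatcher inside a while loop, can be shown independently of Wasm. The Python sketch below is not the BREWasm example referred to above: it flattens a small summation loop by hand, and the block labels, the `state` dictionary, and `run_flattened` are hypothetical names.

```python
def block_init(state):
    state["i"], state["total"] = 1, 0
    return "check"


def block_check(state):
    # The loop condition becomes an ordinary block that picks a label.
    return "body" if state["i"] <= state["n"] else None


def block_body(state):
    state["total"] += state["i"]
    state["i"] += 1
    return "check"


# Each basic block is one "case"; the original nesting is gone.
BLOCKS = {"init": block_init, "check": block_check, "body": block_body}


def run_flattened(blocks, entry, state):
    label = entry
    while label is not None:          # the outer while loop
        label = blocks[label](state)  # the switch-case dispatch
    return state


print(run_flattened(BLOCKS, "init", {"n": 4})["total"])  # 10
```

After flattening, every block sits at the same nesting level and the real order of execution is visible only through the label values at run time, which is what makes the transformed program harder to read.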

04 Summary and Outlook

A cross-language, cross-platform, and cross-scenario open source software ecosystem is the direction of development, but it also introduces many new attack surfaces. The characteristics of Wasm make it well suited to OpenHarmony's open source ecosystem for new Internet of Everything scenarios, and its security issues cannot be ignored. We look forward to academia and industry working together to contribute to the open source ecosystem of the Internet of Everything and to keep strengthening the security of emerging software.


Origin blog.csdn.net/OpenHarmony_dev/article/details/132691984