This article is the third article in the series "The Definitive Guide to WebAssembly". List of articles in the series:
"The Definitive Guide to WebAssembly" (1) Introduction to WebAssembly
"The Definitive Guide to WebAssembly" (2) Getting Started with WebAssembly
Operating systems usually run programs that are packaged in compiled form. Each operating system has its own format that defines where execution starts, what data is required, and where the instructions for the different pieces of functionality live. WebAssembly is no exception. In this chapter, we'll see how this behavior is packaged and how the host knows how to handle it.
A software engineer can spend an entire career ignoring how programs are loaded and executed through this process. Their world begins and ends with merely arriving at int main(int argc, char **argv), static void main(String[] args), or if __name__ == "__main__":. These are the well-known entry points for C, Java, and Python programs, so this is where the programmer assumes responsibility for control flow. However, the operating system or program runtime needs to build up and tear down the executable structure before the program starts and after it exits. The loader needs to know where the instructions start, how the data elements are initialized, what other modules or libraries need to be loaded, and so on.
These details are usually defined by the executable file format. On Linux, this is the Executable and Linkable Format (ELF) [1]; on Windows, it is the Portable Executable (PE) format [2]; on macOS, it is the Mach-O format [3]. Obviously, these are platform-specific formats for native executables. More portable systems like Java and .NET use an intermediate bytecode representation, but they still have a well-defined structure and work in a similar way.
One of the first design considerations for WebAssembly MVP is to define the module structure so that the WebAssembly host knows what to look for and verify, and where to start when executing the deployment unit.
In Chapter 2, you saw a more complex module structure than the one we start with in this chapter. We'll walk through its parts step by step, and then show you some tools for exploring the textual and visual structure of WebAssembly modules. In the previous chapter, we briefly discussed the binary format. It is compact and quick to transfer and load. You probably won't spend much time looking at binary details because you're focused on the software side, but it's useful to be familiar with the layout of modules, so let's take a look.
Module structure
The empty module is the most basic module of WebAssembly. An empty module does not need any content to be a valid module, as shown in Example 3-1.
Example 3-1. An empty but valid WebAssembly module
(module)
Obviously, there isn't much to see here, but the module can still be converted to binary form. You'll notice in the output below that it doesn't take up much space and, of course, does nothing.
brian@tweezer ~/g/w/s/ch03> wat2wasm empty.wat
brian@tweezer ~/g/w/s/ch03> ls -alF
total 16
drwxr-xr-x 4 brian staff 128 Dec 21 14:45 ./
drwxr-xr-x 4 brian staff 128 Dec 14 12:37 ../
-rw-r--r-- 1 brian staff 8 Dec 21 14:45 empty.wasm
-rw-r--r-- 1 brian staff 8 Dec 14 12:37 empty.wat
If you're more visually oriented, you might like to use WebAssembly Code Explorer, available from the wasdk GitHub repository [4] . You can use it online in a browser [5] or clone it to run an HTTP server. I'll use the Python 3 web server as before.
brian@tweezer ~/g/wasmcodeexplorer> python3 -m http.server 10003
Serving HTTP on :: port 10003 (http://[::]:10003/) ...
Again, it doesn't look like much for an empty module, but it will be a useful summary once we start adding some elements to it. Operating systems usually identify a file format from the first few bytes of the file, which are often called magic numbers. For WebAssembly, these bytes are 0x00 0x61 0x73 0x6D: a null byte followed by the hexadecimal values of the characters a, s, and m. They are followed by the version number 1, expressed as the four bytes 0x01 0x00 0x00 0x00.
In Figure 3-1, you can see the magic bytes and version 1 of the WebAssembly file format as a series of numbers on the left, and the empty module structure on the right.
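To make the header layout concrete, here is a minimal sketch that builds the eight bytes of the empty module by hand and checks them from JavaScript. The helper name hasWasmHeader is made up for this illustration; WebAssembly.validate() is part of the standard JavaScript API.

```javascript
// The 8-byte empty module: magic number "\0asm" followed by version 1.
const emptyModule = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic: NUL 'a' 's' 'm'
  0x01, 0x00, 0x00, 0x00, // version: 1, stored little-endian
]);

// Hypothetical helper: inspects only the header, not the rest of the module.
function hasWasmHeader(bytes) {
  if (bytes.length < 8) return false;
  const magic = String.fromCharCode(bytes[1], bytes[2], bytes[3]);
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const version = view.getUint32(4, true); // true = little-endian
  return bytes[0] === 0x00 && magic === "asm" && version === 1;
}

console.log(hasWasmHeader(emptyModule));        // does the header look like WebAssembly?
console.log(WebAssembly.validate(emptyModule)); // does the engine accept the module?
```

The header check is only a quick sniff test; WebAssembly.validate() is what actually decides whether the bytes form a valid module.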
For command-line inspection of modules you have several options. The wasm-objdump executable in the WABT toolkit is very useful. See the appendix for help installing the various tools discussed in this book.
If you run the command without any switches, it prints an error message listing the ones it accepts. As you'll see, these switches make a bigger difference when you have more details to explore.
brian@tweezer ~/g/w/s/ch03> wasm-objdump empty.wasm
At least one of the following switches must be given:
-d/--disassemble
-h/--headers
-x/--details
-s/--full-contents
Now we just need to verify that our module, although useless, is valid by using the detail switch. This also indicates that we are dealing with version 1 of the format.
brian@tweezer ~/g/w/s/ch03> wasm-objdump -x empty.wasm
empty.wasm: file format wasm 0x1
Section Details:
Explore the various parts of the module
There is a circular-dependency problem in the way we introduce these concepts. The module format must support all the various elements of WebAssembly, some of which we will only cover in later chapters. We'll focus primarily on what we've seen so far, with the promise of revisiting the other sections soon.
The overall structure of the module is based on a series of optional numbered sections, each covering a specific feature of WebAssembly. In Table 3-1 we can see a list and description of these parts.
Table 3-1. WebAssembly module sections
ID | Name | Description |
0 | Custom | Debug or metadata information for use by third parties |
1 | Type | Type definitions used in a module |
2 | Import | Import elements used by a module |
3 | Function | Type signatures associated with functions in a module |
4 | Table | A table defining indirect, opaque references used by a module |
5 | Memory | The linear memory structure used by a module |
6 | Global | Global variables defined by a module |
7 | Export | Export elements provided by a module |
8 | Start | An optional startup function used to start a module |
9 | Element | Elements defined by a module |
10 | Code | The bodies of the functions defined by a module |
11 | Data | Data elements defined by a module |
12 | Data Count | The number of data elements defined by a module |
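As a rough sketch of how a tool might walk these numbered sections, the code below skips the 8-byte header and then reads one section ID and one LEB128-encoded size at a time. The sample bytes are hand-assembled for this illustration (a module with a single exported function that subtracts two i32 values), and the helper names readVarUint and listSections are hypothetical.

```javascript
// Names for the section IDs listed in Table 3-1.
const SECTION_NAMES = [
  "Custom", "Type", "Import", "Function", "Table", "Memory", "Global",
  "Export", "Start", "Element", "Code", "Data", "Data Count",
];

// Decode an unsigned LEB128 integer starting at `offset`.
function readVarUint(bytes, offset) {
  let result = 0;
  let shift = 0;
  let byte;
  do {
    byte = bytes[offset++];
    result |= (byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  return { value: result >>> 0, next: offset };
}

// Hypothetical helper: list the sections of a module in file order.
function listSections(bytes) {
  const sections = [];
  let offset = 8; // skip the magic number and version
  while (offset < bytes.length) {
    const id = bytes[offset];
    const size = readVarUint(bytes, offset + 1);
    sections.push({ id, name: SECTION_NAMES[id] ?? "Unknown", size: size.value });
    offset = size.next + size.value; // jump over the section payload
  }
  return sections;
}

// Hand-assembled sample module (an assumption of this sketch, not tool output).
const sampleModule = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,
  0x03, 0x02, 0x01, 0x00,
  0x07, 0x0b, 0x01, 0x07, 0x68, 0x6f, 0x77, 0x5f, 0x6f, 0x6c, 0x64, 0x00, 0x00,
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6b, 0x0b,
]);

for (const section of listSections(sampleModule)) {
  console.log(section.id, section.name, section.size);
}
```

Because every section carries its own size, a reader can skip sections it does not understand, which is part of what makes custom sections safe to add.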
Consider the following example from Chapter 2.
Example 3-2. A simple WebAssembly text file
(module
(func $how_old (param $year_now i32) (param $year_born i32) (result i32) ①
local.get $year_now
local.get $year_born
i32.sub)
(export "how_old" (func $how_old)) ②
)
1. The internal function $how_old
2. The exported function how_old
We use the wat2wasm tool to convert it to binary form. If we try to interrogate the structure produced by this conversion, we see the following:
> wasm-objdump -x hello.wasm
hello.wasm: file format wasm 0x1
Section Details:
Type [1]:
- type [0] (i32, i32) -> i32
Function [1]:
- func [0] sig=0 <how_old>
Export [1]:
- func [0] <how_old> -> "how_old"
Code [1]:
- func [0] size=7 <how_old>
Note that there are many more sections than in our empty module. First, we have a Type section, which defines a signature. It declares a type that accepts two i32 values and returns one i32. This is the appropriate signature for our how_old function. The type is not given a name, but it can still be used to set expectations and to validate how functions are wired together.
Next we have a Function section, which links our type (type[0] in the Type section) to the named function. Because we export our function to make it available to the host environment or to other modules, we see the internal function <how_old> exported under the name how_old. Finally, we have a Code section that contains the actual body of our only function.
Figure 3-2 shows what our module looks like in WebAssembly Code Explorer.
Red indicates section boundaries, but you can also get more detail by hovering over the sections in the browser. For example, if you hover over one of the purple bytes in the Export section, it should show the name of the exported function, how_old. You can see the actual instructions in the green and blue bytes of the final Code section.
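For readers who want to relate the four sections to actual bytes, the following sketch spells out a hand-assembled equivalent of the module, one section per comment, and then calls the export. The byte values are my own encoding of Example 3-2 rather than output copied from wat2wasm. The synchronous WebAssembly.Module and WebAssembly.Instance constructors are used only because the module is tiny.

```javascript
const howOldBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic number and version
  // Type section (id 1): one signature, (i32, i32) -> i32
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,
  // Function section (id 3): func[0] uses type[0]
  0x03, 0x02, 0x01, 0x00,
  // Export section (id 7): func[0] is exported as "how_old"
  0x07, 0x0b, 0x01, 0x07, 0x68, 0x6f, 0x77, 0x5f, 0x6f, 0x6c, 0x64, 0x00, 0x00,
  // Code section (id 10): one 7-byte body: local.get 0, local.get 1, i32.sub, end
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6b, 0x0b,
]);

const howOldModule = new WebAssembly.Module(howOldBytes);
const howOldInstance = new WebAssembly.Instance(howOldModule);

// The export name from the Export section becomes a property on `exports`.
const age = howOldInstance.exports.how_old(2021, 2000);
console.log(age); // 21
```

Only the exported name how_old is visible from JavaScript; the internal label $how_old does not appear in these bytes at all.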
If you compare Example 3-2 with the output above, you'll notice that our variable names are not carried over by default; the wasm-objdump output makes this clear. To keep them for debugging purposes, you need to say so in the wat2wasm command:
> wat2wasm hello.wat -o hellodebug.wasm --debug-names
> wasm-objdump -x hellodebug.wasm
hellodebug.wasm: file format wasm 0x1
Section Details:
Type [1]:
- type [0] (i32, i32) -> i32
Function [1]:
- func [0] sig=0 <how_old>
Export [1]:
- func [0] <how_old> -> "how_old"
Code [1]:
- func [0] size=7 <how_old>
Custom:
- name: "name"
- func [0] <how_old>
- func [0] local [0] <year_now>
- func [0] local [1] <year_born>
Note that wat2wasm uses a custom section to preserve function and local variable names. Other tools may use custom sections for their own purposes, but this is generally how debugging information is captured. In Figure 3-3, you can see that there are more bytes in the module because of this custom section.
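Custom sections can also be read back from JavaScript with WebAssembly.Module.customSections(). The sketch below uses a hand-assembled module whose only content is a custom section with the made-up name "note" and the payload "hi"; both are assumptions chosen to keep the bytes short. The debug names written by wat2wasm live in a custom section called "name", which could be requested the same way.

```javascript
// An otherwise empty module with one custom section named "note".
const withCustomSection = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic number and version
  0x00, 0x07,                   // section id 0 (Custom), 7 bytes of content
  0x04, 0x6e, 0x6f, 0x74, 0x65, // section name: length 4, "note"
  0x68, 0x69,                   // payload: "hi"
]);

const noteModule = new WebAssembly.Module(withCustomSection);

// Returns one ArrayBuffer per custom section with the requested name.
const noteSections = WebAssembly.Module.customSections(noteModule, "note");
const notePayload = new TextDecoder().decode(noteSections[0]);

console.log(noteSections.length, notePayload);
```

The engine ignores the contents of custom sections when validating, so the module above is valid even though it defines no functions.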
Use modules
Once you understand the process of inspecting the static binary structure of a WebAssembly module, you'll want to move on to working with it in a more dynamic way. We've seen the basics of instantiating modules through the JavaScript API in a few examples, such as in Example 2-4, but there are other things we can do.
The code in Example 3-2 generates an Export section, but as we saw in Table 3-1, there is also a potential Import section that receives elements from the host environment. This can eventually include Memory and Table instances, as we'll see in subsequent chapters, but for now we can import a function into the module that lets us communicate more directly with the browser's console window. Keep in mind that we're still sorting out the low-level details, and your day-to-day experience with these technologies will likely be at a higher level.
Take a look at Example 3-3, a new version of our example that exports a second function. More importantly, it also imports a function.
(module
(func $log (import "imports" "log_func") (param i32)) ①
(func $how_old (param $year_now i32) (param $year_born i32) (result i32) ②
local.get $year_now
local.get $year_born
i32.sub)
(func $log_how_old (param $year_now i32) (param $year_born i32) ③
local.get $year_now
local.get $year_born
call $how_old
call $log
)
(export "how_old" (func $how_old)) ④
(export "log_how_old" (func $log_how_old)) ⑤
)
1. Import a function from the host that expects an i32 parameter
2. The same $how_old function as before
3. A new function that takes two parameters, calls $how_old, and then calls the function we imported
4. Export our old how_old function as before
5. Export our new log_how_old function
As you can see, we have a new function to call in the module, but we can't call it yet. Our previous functionality is still available with no changes. Our new function calls the old function to do the math, but it requires a host function named log_func to log the result. To clarify some of the differences, let's generate the .wasm output and then dump the module structure.
brian@tweezer ~/g/w/s/ch03> wat2wasm hellolog.wat
brian@tweezer ~/g/w/s/ch03> wasm-objdump -x hellolog.wasm
hellolog.wasm: file format wasm 0x1
Section Details:
Type [3]:
- type [0] (i32) -> nil
- type [1] (i32, i32) -> i32
- type [2] (i32, i32) -> nil
Import [1]:
- func [0] sig=0 <imports.log_func> <- imports.log_func
Function [2]:
- func [1] sig=1 <how_old>
- func [2] sig=2 <log_how_old>
Export [2]:
- func [1] <how_old> -> "how_old"
- func [2] <log_how_old> -> "log_how_old"
Code [2]:
- func [1] size=7 <how_old>
- func [2] size=10 <log_how_old>
This is the first time we have an entry in the Import section. It is defined with a type we haven't seen yet. If you look at the Type section, you'll see that we now specify three types: one that takes an i32 and returns nothing, one that takes two i32 parameters and returns an i32, and a new one that takes two i32 parameters and returns nothing.
The first of these types is the one used by our import. We want the host environment to give us a function that we can call with an i32. The purpose of this function is to print out its argument in some way, not to return anything, so it doesn't need a return type. This function will be looked up in the importObject, which we have ignored until now on the JavaScript side. The second type is the same as before. The third belongs to the function that calls our $how_old function with its arguments and then logs the result, so it doesn't need a return value either. The Import and Function sections show the links between functions and signatures.
To supply the required element through importObject, we need some HTML, as shown in Example 3-4.
Example 3-4. An HTML file that instantiates our module and calls the imported function through an exported one
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<title>WASM Import test</title>
<script src="utils.js"></script>
</head>
<body>
<script>
var importObject = {
  imports: {
    log_func: function (arg) {
      console.log("You are this old: " + arg + " years.");
    },
    log_func_2: function (arg) {
      alert("You are this old: " + arg + " years.");
    }
  }
};
// log_how_old returns nothing; the message is printed by log_func.
fetchAndInstantiate('hellolog.wasm', importObject).then(function (instance) {
  instance.exports.log_how_old(2021, 2000);
});
</script>
</body>
</html>
Compare the import statement in Example 3-3 with the structure of this object. Note that there is an imports namespace that contains a function named log_func. This is the structure specified by our import statement. The $log_how_old function pushes its two arguments onto the stack and then uses the call instruction to invoke our previous function, $how_old. Remember, that function subtracts one argument from the other and leaves the result on top of the stack. At this point, we don't need to push the value onto the stack again; we can simply call the imported function, which we named $log. The result of the previous function becomes the parameter of this new call. Take the time to make sure you understand the relationship between parameters, return values, and functions.
If you copy the utils.js file from the previous chapter (which provides the fetchAndInstantiate() function) and serve it over HTTP as we did before, you can load the new HTML file. Initially you won't see anything, because our log_func just dumps its argument to console.log(). However, if you look at the console in your browser's developer tools, you should see something like Figure 3-4.
If you change importObject to something like Example 3-5 and then reload the HTML file in the browser, you will no longer see the console message; you should see a pop-up alert instead. Obviously, nothing has changed in our WebAssembly code. We're just passing in a different function from the JavaScript side, so we see a different result. We'll see more complex interactions as we delve deeper into this topic, but hopefully you're starting to understand how WebAssembly and JavaScript code interact through imports and exports.
Example 3-5. The same WebAssembly module can be instantiated and called in different ways
var importObject = {
  imports: {
    log_func: function (arg) {
      alert("You are this old: " + arg + " years.");
    }
  }
};
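The same idea can be tried without a browser or an HTTP server. The sketch below is an illustration under a few assumptions: the bytes are hand-assembled to match the wasm-objdump listing for hellolog.wasm shown earlier (not copied from wat2wasm output), the helper instantiateWithLogger is hypothetical, and the synchronous constructors are used only because the module is tiny.

```javascript
// hellolog module, hand-assembled to mirror the wasm-objdump listing.
const helloLogBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,
  // Type: (i32) -> nil, (i32, i32) -> i32, (i32, i32) -> nil
  0x01, 0x10, 0x03,
  0x60, 0x01, 0x7f, 0x00,
  0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,
  0x60, 0x02, 0x7f, 0x7f, 0x00,
  // Import: "imports" "log_func", a function of type 0
  0x02, 0x14, 0x01,
  0x07, 0x69, 0x6d, 0x70, 0x6f, 0x72, 0x74, 0x73,
  0x08, 0x6c, 0x6f, 0x67, 0x5f, 0x66, 0x75, 0x6e, 0x63,
  0x00, 0x00,
  // Function: func[1] has type 1, func[2] has type 2
  0x03, 0x03, 0x02, 0x01, 0x02,
  // Export: "how_old" -> func[1], "log_how_old" -> func[2]
  0x07, 0x19, 0x02,
  0x07, 0x68, 0x6f, 0x77, 0x5f, 0x6f, 0x6c, 0x64, 0x00, 0x01,
  0x0b, 0x6c, 0x6f, 0x67, 0x5f, 0x68, 0x6f, 0x77, 0x5f, 0x6f, 0x6c, 0x64, 0x00, 0x02,
  // Code: how_old (7 bytes), then log_how_old (10 bytes)
  0x0a, 0x14, 0x02,
  0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6b, 0x0b,
  0x0a, 0x00, 0x20, 0x00, 0x20, 0x01, 0x10, 0x01, 0x10, 0x00, 0x0b,
]);

const helloLog = new WebAssembly.Module(helloLogBytes);

// Hypothetical helper: instantiate the module with whichever host function we pass in.
function instantiateWithLogger(logFunc) {
  return new WebAssembly.Instance(helloLog, { imports: { log_func: logFunc } });
}

const messages = [];
const collecting = instantiateWithLogger((age) => {
  messages.push("You are this old: " + age + " years.");
});
const printing = instantiateWithLogger((age) => console.log("Age:", age));

collecting.exports.log_how_old(2021, 2000); // handled by the first host function
printing.exports.log_how_old(2021, 2000);   // same Wasm code, different host behavior
console.log(messages);
```

Compiling once and instantiating twice also shows that a module is a reusable template: each instance gets its own set of imports.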
Instantiating modules and calling their functions will be your main interaction with them through the JavaScript API, but there are some additional behaviors you can use. If you want to know which functions a module imports or exports, you can use the JavaScript API to ask the loaded module. If, instead of calling fetchAndInstantiate() from utils.js, you change the HTML to contain the code shown in Example 3-6, you will see the results shown in Figure 3-5.
Example 3-6. We can do more with the JavaScript API, including streaming compilation.
WebAssembly.compileStreaming(fetch('hellolog.wasm'))
  .then(function (mod) {
    var imports = WebAssembly.Module.imports(mod);
    console.log(imports[0]);
    var exports = WebAssembly.Module.exports(mod);
    console.log(exports);
  });
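One practical use of this reflection is checking an importObject before instantiating. The sketch below is only an illustration: the helper missingImports is hypothetical, and the bytes are a hand-assembled equivalent of hellolog.wasm based on the wasm-objdump listing earlier in the chapter, compiled synchronously so the example runs without a server.

```javascript
// hellolog module, hand-assembled to mirror the wasm-objdump listing.
const reflectBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,
  0x01, 0x10, 0x03,                                     // Type section, 3 types
  0x60, 0x01, 0x7f, 0x00,
  0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,
  0x60, 0x02, 0x7f, 0x7f, 0x00,
  0x02, 0x14, 0x01,                                     // Import section, 1 import
  0x07, 0x69, 0x6d, 0x70, 0x6f, 0x72, 0x74, 0x73,       // "imports"
  0x08, 0x6c, 0x6f, 0x67, 0x5f, 0x66, 0x75, 0x6e, 0x63, // "log_func"
  0x00, 0x00,
  0x03, 0x03, 0x02, 0x01, 0x02,                         // Function section
  0x07, 0x19, 0x02,                                     // Export section, 2 exports
  0x07, 0x68, 0x6f, 0x77, 0x5f, 0x6f, 0x6c, 0x64, 0x00, 0x01,
  0x0b, 0x6c, 0x6f, 0x67, 0x5f, 0x68, 0x6f, 0x77, 0x5f, 0x6f, 0x6c, 0x64, 0x00, 0x02,
  0x0a, 0x14, 0x02,                                     // Code section, 2 bodies
  0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6b, 0x0b,
  0x0a, 0x00, 0x20, 0x00, 0x20, 0x01, 0x10, 0x01, 0x10, 0x00, 0x0b,
]);

const reflectModule = new WebAssembly.Module(reflectBytes);

// Hypothetical helper: which declared imports does importObject fail to provide?
function missingImports(module, importObject) {
  return WebAssembly.Module.imports(module)
    .filter(({ module: ns, name }) => !(importObject[ns] && name in importObject[ns]))
    .map(({ module: ns, name, kind }) => `${ns}.${name} (${kind})`);
}

const exportNames = WebAssembly.Module.exports(reflectModule).map((e) => e.name);
console.log(exportNames);                                                   // names from the Export section
console.log(missingImports(reflectModule, {}));                             // log_func is not provided
console.log(missingImports(reflectModule, { imports: { log_func() {} } })); // nothing is missing
```

The helper only checks that a property exists; the engine still verifies kinds and signatures when the module is actually instantiated.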
Once we understand more concepts and start using higher-level languages to express our actions, the full power of WebAssembly begins to emerge.
So far, we've been using a block of code in a file called utils.js, which looks like Example 3-7. For simple modules this is fine, but as your modules get larger there is some built-in latency that we can remove. Performance refers not only to runtime performance, but also to load-time performance.
Example 3-7. We have been instantiating modules in a simple way
function fetchAndInstantiate(url, importObject) {
  return fetch(url).then(response =>
    response.arrayBuffer()
  ).then(bytes =>
    WebAssembly.instantiate(bytes, importObject)
  ).then(results =>
    results.instance
  );
}
The problem here is that although we use Promises to avoid blocking the main thread, we read the whole module into an ArrayBuffer before instantiating it. In effect, we wait for the entire network transfer to complete before we start compiling the module. One of the first post-MVP features was support for compiling while the bytes are still being transferred over the network. The structure of the module format lends itself to this kind of optimization, so it would be a shame not to use it.
Although there is no single "right" way to instantiate your module (for example, in some cases you may want to create multiple instances of a module), in most cases the code in Example 3-8 is a slightly more efficient approach.
Example 3-8. The recommended way to instantiate a module in most cases.
(async () => {
  const fetchPromise = fetch(url);
  const { instance } = await WebAssembly.instantiateStreaming(fetchPromise);
  // Use the module
  const result = instance.exports.method(param1, param2);
  console.log(result);
})();
Note that we're not creating an ArrayBuffer; we're passing the Promise returned by fetch() to the instantiateStreaming() method of the WebAssembly object. This allows the baseline compiler to start compiling functions as they arrive over the network. In most cases, the code compiles faster than it can be transferred over the network, so by the time the download finishes, the module should be validated and ready to use. With JavaScript, by contrast, the verification process typically begins only once the download has finished, so we see an improvement in startup time.
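In practice, streaming instantiation can fail for reasons unrelated to the module itself, most commonly when the server does not send the application/wasm content type. One possible way to combine both approaches is sketched below. The helper name instantiateWithFallback and its fetchModule parameter are assumptions of this sketch, not part of the chapter's utils.js.

```javascript
// Hypothetical helper. `fetchModule` is any function that returns a
// Promise<Response>, for example () => fetch("hellolog.wasm").
async function instantiateWithFallback(fetchModule, importObject = {}) {
  if (typeof WebAssembly.instantiateStreaming === "function") {
    try {
      // Preferred path: compile while the bytes are still arriving.
      return await WebAssembly.instantiateStreaming(fetchModule(), importObject);
    } catch (err) {
      // Often a missing application/wasm Content-Type; retry the slow way.
      console.warn("Streaming instantiation failed, falling back:", err.message);
    }
  }
  // Fallback path: download everything first, then compile (as in Example 3-7).
  const response = await fetchModule();
  const bytes = await response.arrayBuffer();
  return WebAssembly.instantiate(bytes, importObject);
}
```

A call might look like instantiateWithFallback(() => fetch('hellolog.wasm'), importObject).then(({ instance }) => instance.exports.log_how_old(2021, 2000)). Passing a function rather than a Response matters here because a response body can only be consumed once, so the fallback path needs a fresh request.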
There is currently no official way to cache WebAssembly modules, but streaming compilation is an unobtrusive way to improve startup time. Cache-Control headers and other standard handling of network artifacts will avoid unnecessary redownloading of modules (unless they have been updated).
Future integration with ES6 modules
As we can see, while being able to work through the JavaScript API is obviously useful, doing so is low-level and repetitive, which is why we put it in a reusable utility script file. In the future, we hope that it will be easier to use WebAssembly modules from HTML, since they will be available as ES6 modules.
This is a bit tricky because of the asynchronous processing required at the top level and because the module graph is loaded in three phases: construction, instantiation, and evaluation. There are subtle differences between binary WebAssembly modules and JavaScript-based modules in how validation works, when compilation occurs, and how module environment records are traversed and linked.
There is a proposal to add platform support that smooths over these differences. At the time of writing, it is at Stage 2 of the proposal process. Lin Clark gives a good introduction to its intricacies on YouTube [6].
The goal is to allow a declarative form, as shown in Example 3-9.
Example 3-9. Proposed declarative form for loading WebAssembly modules
import {something} from "./myModule.wasm";
something ();
This not only simplifies the instantiation of WebAssembly modules, but also lets them participate in the JavaScript module dependency graph. If developers don't have to manage WebAssembly dependencies differently from JavaScript ones, it becomes easier to mix behavior expressed in multiple languages into a complete solution.
The proposal is cleanly designed and well supported, but it involves careful orchestration across the HTML specification, the ES6 module specification, implementations, JavaScript bundlers, and the larger Node.js community. My guess is that it won't be long before we see progress on this proposal.
Now that we understand the structural elements of a WebAssembly binary, you should be able to easily inspect your own and third-party modules manually and programmatically. The next step is to look at the more dynamic elements of the WebAssembly module. We'll focus first on the Memory instance to simulate the functionality of contiguous memory blocks in a more traditional programming runtime.
Reference links
[1] Executable and Linkable Format (ELF): https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
[2] Portable Executable Format (PE): https://en.wikipedia.org/wiki/Portable_Executable
[3] Mach-O Format: https://en.wikipedia.org/wiki/Mach-O
[4] wasdk GitHub repository: https://github.com/wasdk/wasmcodeexplorer
[5] Online use: https://wasdk.github.io/wasmcodeexplorer/
[6] On YouTube: https://www.youtube.com/watch?v=qR_b5gajwug&ab_channel=MozillaHacks