1. Introduction to sonic-rs
sonic-rs is a high-performance, SIMD-based JSON library for Rust, and the Rust member of the sonic family of JSON libraries.
The ByteDance sonic open source project now includes JSON libraries in several languages (listed below). sonic-go was open-sourced first and uses JIT and SIMD techniques; sonic-cpp uses C++ templates and SIMD. Both libraries are deployed at large scale within ByteDance. As part of cost optimization, and to help Go services migrate to Rust while improving Rust JSON performance, we drew on this JSON optimization experience to develop sonic-rs, a high-performance JSON library written in pure Rust.
- sonic: https://github.com/bytedance/sonic (Golang JSON library)
- sonic-cpp: https://github.com/bytedance/sonic-cpp (C++ JSON library)
- sonic-rs: https://github.com/cloudwego/sonic-rs (Rust JSON library)
sonic-rs already covers a fairly complete set of JSON features. It is largely aligned with the functionality of serde-json, and additionally offers richer functionality and higher-performance interfaces. The main features of sonic-rs are:
- Largely compatible with the Serde ecosystem, with additional support for the FastStr type from Volo
- Supports encoding and decoding of dynamically typed values, and on-demand parsing
- Supports types such as LazyValue and RawNumber
- Supports UTF-8 validation and standard floating-point precision
For performance, we used the Rust structs and JSON data from the official serde-rs benchmark ( https://github.com/serde-rs/json-benchmark ) to compare how fast serde-json, simd-json and sonic-rs parse JSON into Rust structs. In these results, sonic-rs is roughly 1.5 to 2 times as fast as simd-json and about 2 times as fast as serde-json:
twitter/sonic_rs::from_slice_unchecked
time: [694.74 µs 707.83 µs 723.19 µs]
twitter/sonic_rs::from_slice
time: [796.44 µs 827.74 µs 861.30 µs]
twitter/simd_json::from_slice
time: [1.0615 ms 1.0872 ms 1.1153 ms]
twitter/serde_json::from_slice
time: [2.2659 ms 2.2895 ms 2.3167 ms]
twitter/serde_json::from_str
time: [1.3504 ms 1.3842 ms 1.4246 ms]
citm_catalog/sonic_rs::from_slice_unchecked
time: [1.2271 ms 1.2467 ms 1.2711 ms]
citm_catalog/sonic_rs::from_slice
time: [1.3344 ms 1.3671 ms 1.4050 ms]
citm_catalog/simd_json::from_slice
time: [2.0648 ms 2.0970 ms 2.1352 ms]
citm_catalog/serde_json::from_slice
time: [2.9391 ms 2.9870 ms 3.0481 ms]
citm_catalog/serde_json::from_str
time: [2.5736 ms 2.6079 ms 2.6518 ms]
canada/sonic_rs::from_slice_unchecked
time: [3.7779 ms 3.8059 ms 3.8368 ms]
canada/sonic_rs::from_slice
time: [3.9676 ms 4.0212 ms 4.0906 ms]
canada/simd_json::from_slice
time: [7.9582 ms 8.0932 ms 8.2541 ms]
canada/serde_json::from_slice
time: [9.2184 ms 9.3560 ms 9.5299 ms]
canada/serde_json::from_str
time: [9.0383 ms 9.2563 ms 9.5048 ms]
2. sonic-rs optimization practice
The optimizations in sonic-rs are mainly based on SIMD, and some of them borrow ideas from other JSON libraries such as simd-json. SIMD (single instruction, multiple data) is a parallelization technique in which one instruction processes multiple data elements at once. Most CPUs today support one or more SIMD instruction sets, for example SSE, AVX2 and AVX512 on x86_64 and NEON on aarch64. For a suitable task, code optimized with SIMD instructions executes fewer instructions and therefore performs better.
In terms of overall design, sonic-rs does not adopt the two-stage parsing approach of simd-json. Instead, it applies SIMD optimization to the hot spots of JSON parsing and serialization, including string serialization, on-demand parsing and floating-point number parsing.
2.1 SIMD-optimized string serialization
String serialization is the hot spot of JSON serialization. The serializer has to scan each string for characters that need escaping, and for longer strings, checking them byte by byte is time-consuming. This scan is well suited to acceleration with SIMD.
The following code scans for characters that need escaping using AVX2 instructions. When compiled for the Haswell architecture with O3 optimization, it comes down to just six SIMD instructions, so six SIMD instructions scan 32 bytes at a time. Compared with scalar code, the number of executed instructions drops sharply, which reduces execution time.
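    #include <immintrin.h>

    // Returns a vector whose bytes are 0xFF wherever the input byte must be
    // escaped in a JSON string: control characters (0x00-0x1F), '"' or '\\'.
    static inline __m256i _mm256_find_quote(__m256i vv) {
        __m256i e1 = _mm256_cmpgt_epi8 (vv, _mm256_set1_epi8(-1));   // signed byte > -1: 0x00-0x7F
        __m256i e2 = _mm256_cmpgt_epi8 (vv, _mm256_set1_epi8(31));   // signed byte > 31: 0x20-0x7F
        __m256i e3 = _mm256_cmpeq_epi8 (vv, _mm256_set1_epi8('"'));  // double quote
        __m256i e4 = _mm256_cmpeq_epi8 (vv, _mm256_set1_epi8('\\')); // backslash
        __m256i r1 = _mm256_andnot_si256 (e2, e1);                   // ~e2 & e1: 0x00-0x1F
        __m256i r2 = _mm256_or_si256 (e3, e4);
        __m256i rv = _mm256_or_si256 (r1, r2);
        return rv;
    }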
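For readers more comfortable with Rust, the same predicate can be written as a scalar reference. This is only an illustrative sketch and not code from sonic-rs: the names `escape_mask` and `first_escape` are hypothetical, and the loop checks one byte at a time where the AVX2 version checks 32 bytes per step.

```rust
/// Scalar reference for the escape scan (hypothetical helper, not sonic-rs code).
/// Returns a bitmask over the first 32 bytes of `chunk`: bit i is set when
/// byte i must be escaped in a JSON string, i.e. it is a control character
/// below 0x20, a double quote, or a backslash.
fn escape_mask(chunk: &[u8]) -> u32 {
    let mut mask = 0u32;
    for (i, &b) in chunk.iter().take(32).enumerate() {
        if b < 0x20 || b == b'"' || b == b'\\' {
            mask |= 1u32 << i;
        }
    }
    mask
}

/// Returns the index of the first byte in `s` that needs escaping,
/// walking the input in 32-byte blocks like the AVX2 routine would.
fn first_escape(s: &[u8]) -> Option<usize> {
    for (block, chunk) in s.chunks(32).enumerate() {
        let mask = escape_mask(chunk);
        if mask != 0 {
            // The lowest set bit marks the first offending byte in this block.
            return Some(block * 32 + mask.trailing_zeros() as usize);
        }
    }
    None
}
```

A serializer built this way can copy everything before the returned index verbatim and fall back to per-byte escaping only from that point on.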
2.2 SIMD-optimized on-demand parsing
Many business scenarios use only a few fields of a JSON document. These are a good fit for on-demand parsing, which skips the unneeded fields while parsing. The difficulty lies in skipping JSON objects and arrays efficiently.
Because the brackets of objects and arrays in valid JSON must be balanced, sonic-rs uses SIMD to implement an efficient bracket matching algorithm. It first uses SIMD to build bitmaps of the object and array brackets, and then counts the brackets. Once the brackets are balanced, the object or array has been skipped.
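The bracket-counting idea can be sketched in scalar Rust. This is a simplified illustration under stated assumptions, not the sonic-rs implementation: `skip_container` is a hypothetical name, the input is assumed to be valid JSON, and each byte is examined individually where the SIMD version would derive the bracket positions from bitmaps. Brackets inside string values are ignored, since they do not take part in the matching.

```rust
/// Returns the index just past the object or array that begins at `start`,
/// or None if `start` is not an opening bracket or the brackets never balance.
/// Hypothetical scalar sketch; assumes the input is valid JSON.
fn skip_container(json: &[u8], start: usize) -> Option<usize> {
    match *json.get(start)? {
        b'{' | b'[' => {}
        _ => return None,
    }
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    for (i, &b) in json.iter().enumerate().skip(start) {
        if in_string {
            // Inside a string only an unescaped quote matters.
            if escaped {
                escaped = false;
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
            }
            continue;
        }
        match b {
            b'"' => in_string = true,
            b'{' | b'[' => depth += 1,
            b'}' | b']' => {
                depth -= 1;
                if depth == 0 {
                    // Opening and closing brackets are balanced: skip is done.
                    return Some(i + 1);
                }
            }
            _ => {}
        }
    }
    None
}
```

Calling `skip_container(json, pos)` on the value of an unwanted field yields the position where parsing can resume, without building any value for the skipped content.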
2.3 SIMD-optimized floating-point number parsing
Floating-point number parsing is a performance hotspot in JSON parsing. Within ByteDance, we found that most floating-point numbers in JSON have relatively long mantissas, which makes them well suited to SIMD optimization. Take the 16-byte mantissa "1234342112345678" as an example:
- First, read the string into a vector register. At this point, each element of the vector still holds an ASCII code.
- Second, use vector subtraction to subtract the ASCII code '0' from every byte, giving v1. Each element of v1 is now a decimal digit.
- Then, use vector instructions to multiply and add adjacent elements of v1 (multiply the high digit by 10 and add the low digit), giving v2. Each element of v2 is now a two-digit decimal number.
- Continuing in the same way, SIMD instructions accumulate the values layer by layer until v16 is obtained. v16 contains a single 16-digit number, which is the final mantissa.
- Finally, use vector instructions to convert v16 to a u64.
The whole process parses the mantissa without visiting its characters one at a time.
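The layered accumulation can be imitated without vector registers by packing eight digits into one u64 (a technique often called SWAR). The sketch below is an assumption-laden illustration rather than the sonic-rs code: `parse_8_digits` and `parse_16_digits` are hypothetical names, and the last layer combines two 8-digit halves with an ordinary multiplication instead of a vector instruction.

```rust
/// Parses exactly eight ASCII digits using arithmetic on a single u64.
/// Hypothetical sketch of the layered multiply-add, not sonic-rs code.
fn parse_8_digits(chunk: [u8; 8]) -> Option<u64> {
    if !chunk.iter().all(u8::is_ascii_digit) {
        return None;
    }
    // Little-endian load: the first (most significant) digit is the lowest byte.
    let ascii = u64::from_le_bytes(chunk);
    // v1: subtract '0' from every byte, leaving one decimal digit per byte.
    let v1 = ascii - 0x3030_3030_3030_3030;
    // v2: high digit * 10 + low digit, one two-digit number per 16-bit lane.
    let v2 = (v1 * 10 + (v1 >> 8)) & 0x00FF_00FF_00FF_00FF;
    // v4: combine pairs of lanes into one four-digit number per 32-bit lane.
    let v4 = (v2 * 100 + (v2 >> 16)) & 0x0000_FFFF_0000_FFFF;
    // v8: combine the two four-digit lanes into one eight-digit number.
    let v8 = (v4 * 10_000 + (v4 >> 32)) & 0x0000_0000_FFFF_FFFF;
    Some(v8)
}

/// Parses a 16-digit mantissa by combining two 8-digit halves (the v16 step).
fn parse_16_digits(mantissa: &[u8; 16]) -> Option<u64> {
    let mut hi = [0u8; 8];
    let mut lo = [0u8; 8];
    hi.copy_from_slice(&mantissa[..8]);
    lo.copy_from_slice(&mantissa[8..]);
    Some(parse_8_digits(hi)? * 100_000_000 + parse_8_digits(lo)?)
}
```

With the mantissa from the text, `parse_16_digits(b"1234342112345678")` returns `Some(1234342112345678)`. No intermediate step overflows, because every lane stays well below its lane width.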
3. sonic-rs status and plans
sonic-rs has been open source for more than three months and has now reached version 0.3. It supports stable Rust and the aarch64 architecture. The following documents are available to help developers:
- Guide for Golang users migrating to Rust and sonic-rs: https://github.com/cloudwego/sonic-rs/blob/main/docs/for_Golang_user_zh.md
- Guide for Rust serde-json users migrating to sonic-rs: https://github.com/cloudwego/sonic-rs/blob/main/docs/serdejson_compatibility.md
- Performance optimization details: https://github.com/cloudwego/sonic-rs/blob/main/docs/performance_zh.md
In the future, sonic-rs will continue to improve performance, ease of use and stability. Planned work includes zero-copy parsing into common data types such as Bytes and FastStr, and runtime detection of SIMD instructions. Interested developers are welcome to join us.
Project links
GitHub: https://github.com/cloudwego
Official website: www.cloudwego.io