Technical Deep Dive: Verifying FHE in RISC Zero - From Hidden to Proven: The ZK Path of FHE Verification (Part 1)

1. Introduction

For open source code implementation, see:

The L2IV Research team recently explored using ZKPs to verify FHE, motivated by two emerging application scenarios:

  • 1) Off-chain computation for fhEVM: Fhenix and Inco are building fully homomorphic encryption-enhanced EVMs on L1 chains based on Zama's fhEVM (fully homomorphic Ethereum Virtual Machine). Off-chain computation removes the need for L1 validators to re-run FHE calculations, which provides scalability; it can also further hide the functions being evaluated, which provides privacy.
  • 2) FHE mining: Inspired by Aleo's proof of succinct work (PoSW) and ZPrize, FHE mining is a noteworthy future direction to encourage FHE ASIC manufacturing and to incentivize FHE miners to become validators of the fhEVM network. The core task of FHE mining is to develop a ZKP system for PoSW in the FHE context.

There are many FHE schemes, and the L2IV Research team is particularly interested in the TFHE variant used by Zama. Zama's TFHE implementation uses the modulus $p = 2^{64}$, which can be computed efficiently on modern CPUs and other hardware platforms. The L2IV Research team's interest in verifying FHE stems from its direct applicability to fhEVM.
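To see why this modulus is hardware-friendly, note that native unsigned 64-bit arithmetic with wrap-around is exactly arithmetic modulo $2^{64}$. The following minimal Rust sketch (our own illustration, not taken from Zama's code) shows that no explicit modular reduction is ever needed:

// Arithmetic modulo 2^64 is just native u64 arithmetic with wrap-around:
// wrapping_add / wrapping_mul compile to single CPU instructions, and the
// reduction mod 2^64 happens "for free" through overflow.
fn main() {
    let a: u64 = u64::MAX; // 2^64 - 1
    let b: u64 = 2;
    assert_eq!(a.wrapping_add(b), 1);            // (2^64 - 1 + 2) mod 2^64 = 1
    assert_eq!(a.wrapping_mul(b), u64::MAX - 1); // (2^64 - 1) * 2 mod 2^64 = 2^64 - 2
    println!("mod-2^64 arithmetic needs no explicit reduction");
}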

However, the modulus $p = 2^{64}$ is a big challenge for ZKP, because only a limited number of ZKP systems can work with this modulus efficiently:

  • 1) Most ZKP systems are designed to operate over a field, but the integers modulo $p = 2^{64}$ do not form a field because 2 has no modular inverse; they only form a ring. The 2021 paper Rinocchio: SNARKs for Ring Arithmetic is designed for rings, but it cannot be used in this scenario because it only supports:
    • designated-verifier ZKPs, which can handle arbitrary rings but are not suitable for blockchain applications; or
    • rings used in conjunction with secure, composite-order pairing-friendly curves, which are incompatible with $p = 2^{64}$.
  • 2) Using a suitable field to simulate 64-bit integer arithmetic. This is an option, although it carries its own overhead. Only a few zkVMs are designed around 64-bit integers, such as RISC Zero and Aleo's Leo; 32-bit fields can also be used to simulate 64-bit integer arithmetic, as in Polygon Miden and Valida (which uses Plonky3).

The L2IV Research team chose RISC Zero for three reasons:

  • 1) Its toolchain and development ecosystem are relatively stable and mature.
  • 2) RISC Zero’s proof generation performance, especially on Apple M2 chips, is quite good.
  • 3) Through the Bonsai API, the proof generation process can be offloaded to RISC Zero's dedicated Bonsai server, thereby avoiding the need to generate proof locally.
This article focuses on a functional but not yet optimized prototype. A series of optimizations, as well as other directions, will be explored in future articles.

This article provides an overview of FHE and RISC Zero, details the process of bringing existing Rust code into RISC Zero, introduces a new data-loading optimization technique for RISC Zero, and demonstrates how to disassemble and analyze the RISC-V code with Ghidra to identify further optimization opportunities.

Update December 14, 2023: We noticed that include_bytes! may not align data correctly, which can cause alignment errors. We therefore switched to the include_bytes_aligned! macro from the include_bytes_aligned crate.

2. What is TFHE?

Fully homomorphic encryption (FHE) is an encryption scheme, denoted $E$, used to encrypt data. Given a plaintext $a$, FHE encryption produces the ciphertext $E(a)$:
$a \rightarrow E(a)$

Full homomorphism means that addition, subtraction, and multiplication can be performed directly on ciphertexts:

  • Addition: $E(a),~E(b) \rightarrow E(a+b)$
  • Subtraction: $E(a),~E(b) \rightarrow E(a-b)$
  • Multiplication: $E(a),~E(b) \rightarrow E(a \times b)$

When the plaintexts are binary bits (0s and 1s), FHE can represent all binary logic gates, including basic gates such as XOR and AND, which form the basis of any binary logic operation in FHE:

  • XOR: $E(a),~E(b) \rightarrow E(a + b - 2 \times a \times b)$
  • AND: $E(a),~E(b) \rightarrow E(a \times b)$
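
These gate identities are plain integer identities on bits and can be sanity-checked at the plaintext level. The following tiny Rust check (our own illustration, independent of any FHE library) verifies them for all bit inputs:

// Check the bit-level identities the homomorphic gates rely on:
// a XOR b = a + b - 2ab and a AND b = a * b, for a, b in {0, 1}.
fn main() {
    for a in 0i64..=1 {
        for b in 0i64..=1 {
            assert_eq!(a ^ b, a + b - 2 * a * b); // XOR identity
            assert_eq!(a & b, a * b);             // AND identity
        }
    }
    println!("gate identities hold for all bit inputs");
}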

Since it can represent all binary gates, FHE can perform arbitrary computations of bounded size. Among blockchain applications, FHE has attracted significant interest for enabling privacy in decentralized finance (DeFi). For example, in a privacy-enhanced decentralized exchange (DEX), FHE can secretly handle the calculations of an automated market maker (AMM).

Much of the computational overhead of FHE comes from managing and mitigating "noise". All existing FHE constructions rely on the learning with errors (LWE) assumption or its variants, which forms the basis of these cryptosystems. At each step of the computation, the output, such as the result of a homomorphic gate evaluated on $E(a)$ and $E(b)$, carries more noise than its inputs, and this output may become the input to subsequent steps. As the computation proceeds, the noise accumulated in the ciphertext grows larger and larger. As shown in the figure below, once the noise in a ciphertext exceeds a certain threshold, the ciphertext can no longer be decrypted. [The figure below is taken from Marc Joye's (Zama, 2021) paper "Guide to Fully Homomorphic Encryption over the [Discretized] Torus".]
[Figure]
To enable unbounded computation in FHE, a way must be found to clean up the noise before it grows too large. This technique is called "bootstrapping", first introduced by Craig Gentry in his seminal 2009 paper on FHE. Bootstrapping uses an encrypted version of the FHE secret key, often referred to as the "bootstrapping key", to homomorphically decrypt and refresh the noisy ciphertext, producing a new ciphertext that encrypts the same data but contains less noise.

As you can imagine, FHE computation is exhausting work for a ciphertext - just like a marathon runner, the ciphertext needs rest to avoid "burning out". As illustrated below, think of FHE bootstrapping as a runner taking a break during a marathon.
[Figure]
Among the various fully homomorphic encryption (FHE) schemes, TFHE has attracted great attention because its bootstrapping is efficient and it is well suited to evaluating Boolean circuits on encrypted data. Zama, Fhenix, and Inco all use TFHE.

Therefore, the main challenge in verifying FHE is accurately verifying the bootstrapping process. In TFHE, bootstrapping involves "blindly rotating" a polynomial according to the ciphertext being bootstrapped, using the bootstrapping key, and then extracting the refreshed ciphertext from the rotated polynomial.

While this may initially sound like advanced cryptography, it is worth noting that the process revolves around manipulating polynomials and matrices, as shown below (taken from Marc Joye's 2021 paper "Guide to Fully Homomorphic Encryption over the [Discretized] Torus"). For those keen to delve deeper, we highly recommend Marc Joye's introduction to TFHE, which is accessible to anyone with a basic understanding of linear algebra.
[Figure]

3. What is RISC Zero?

RISC Zero is a general-purpose zero-knowledge proof system designed around the RISC-V architecture. In other words, anything that can be compiled into a RISC-V ELF (Executable and Linkable Format) program is compatible with RISC Zero. As the VM executes a program, RISC Zero generates a zero-knowledge proof of that execution, called a "receipt". The details are shown in the figure below.
[Figure]

A question people often ask is: why did RISC Zero choose RISC-V instead of another instruction set? There are two main reasons:

  • 1) RISC-V is a simple, general-purpose instruction set. RISC Zero only needs to support the following 46 instructions of riscv32im:
    LB, LH, LW, LBU, LHU, ADDI, SLLI, SLTI, SLTIU, XORI, SRLI, SRAI, ORI, ANDI, AUIPC, SB, SH, SW, ADD, SUB, SLL, SLT, SLTU, XOR, SRL, SRA, OR, AND, MUL, MULH, MULHSU, MULHU, DIV, DIVU, REM, REMU, LUI, BEQ, BNE, BLT, BGE, BGEU, JALR, JAL, ECALL, EBREAK
    
    It is much simpler than the 1131 instructions of modern Intel x86 and the hundreds of instructions of modern ARM. At the same time, RISC-V is not far from other minimalist instruction sets - MIPS (the MIPS company has since moved to RISC-V), WASM, and early generations of ARM such as ARMv4T.
  • 2) Thanks to LLVM, it is easy to compile a wide range of languages to RISC-V.
    LLVM is a compilation toolchain that provides an intermediate layer, called the intermediate representation (IR), between frontends (programming languages) and backends (ISAs). Since RISC-V is one of the supported backends, LLVM allows many frontends (including C, C++, Haskell, Ruby, Rust, Swift, etc.) to be compiled to RISC-V.
    See Chris Lattner's chapter on LLVM in The Architecture of Open Source Applications (Volume 1) for LLVM's three-stage design; RISC-V is one of the available LLVM backends.
    [Figure]

In other words, through a bottom-up approach, RISC Zero:

  • It can support programs written in existing Web2 programming languages.
  • It could also target ZKP domain-specific languages (DSLs) such as ZoKrates, Cairo, Noir, and Circom by creating compilers that convert them to RISC-V.
    For DSLs that are difficult to compile directly to RISC-V, another approach is to first compile the DSL to an intermediate language such as C/C++ and then use the existing LLVM toolchain to convert it to RISC-V.

Another question people ask is: even if RISC-V is a favorable choice, why use a virtual machine at all? Couldn't the "constraint systems" be written in existing ZK-specific DSLs such as ZoKrates, Cairo, Noir, and Circom?
There are two main reasons:

  • 1) Using RISC Zero can significantly reduce development time.
    Traditionally, building a constraint system for ZKP is a difficult task that can only be completed by skilled developers and can take weeks. Worse, debugging a constraint system can take even longer, and it is difficult to ensure the constraint system is bug-free. This is one of the factors limiting the development and adoption of zero-knowledge proofs, because it makes applications hard to build.
    Building historical proofs that verify the history of Bitcoin or Ethereum, or proving the output of an AI with zero knowledge, were once ambitious entrepreneurial projects. Now, with RISC Zero, what was once an ambitious endeavor has become feasible even as a hackathon project. For example, verifying Zama's FHE—a very complex application for which no ZKP constraint system had ever been written before—can be accomplished in RISC Zero with just a few lines of code.
    This shift also simplifies recruiting developers. Existing Rust code can easily be migrated to RISC Zero by someone with prior Rust experience, without requiring deep expertise in zero-knowledge proofs.
  • 2) RISC Zero may even outperform hand-written circuits in some respects. Its key advantage is that it is specifically optimized for 32-bit and 64-bit integer computation and can manage large amounts of storage and memory. Even skilled ZKP engineers find it hard to beat.
    Before RISC Zero, performing such optimizations required top ZKP researchers and engineers, because the techniques were new and largely undocumented. Such optimization would take at least several months to develop, as it may require highly creative use of lookup arguments.
    RISC Zero has encapsulated these cutting-edge technologies into its Bonsai framework, making them both accessible and affordable. If cryptography is like cooking, then RISC Zero is a microwave oven.

Next, we will explore how to use RISC Zero to verify Zama's FHE calculation with zero knowledge. It seems that all that is needed is to reuse some code from the Zama team and add a few lines of code.

4. Use RISC Zero to verify Zama’s FHE calculation

4.1 Verify bootstrapping of FHE ciphertext

First, we need to show how to verify the main steps of bootstrapping a certain FHE ciphertext:

  • Use the bootstrapping key to perform a "blind rotation" on the polynomial, as follows:
#![no_main]
risc0_zkvm::guest::entry!(main);

// load the toy FHE Rust library from Louis Tremblay Thibault (Zama)
use ttfhe::{
    N,
    ggsw::{cmux, GgswCiphertext},
    glwe::GlweCiphertext,
    lwe::LweCiphertext,
};
// macro used to embed the key and ciphertext with a guaranteed alignment
use include_bytes_aligned::include_bytes_aligned;

// load the bootstrapping key and the ciphertext to be bootstrapped
static BSK_BYTES: &[u8] = include_bytes_aligned!(8, "../../../bsk");
static C_BYTES: &[u8] = include_bytes_aligned!(8, "../../../c");

pub fn main() {
    // a zero-copy trick to load the key and the ciphertext into RISC Zero
    let bsk = unsafe {
        std::mem::transmute::<&u8, &[GgswCiphertext; N]>(&BSK_BYTES[0])
    };
    let c = unsafe {
        std::mem::transmute::<&u8, &LweCiphertext>(&C_BYTES[0])
    };

    // initialize the polynomial to be blindly rotated
    let mut c_prime = GlweCiphertext::trivial_encrypt_lut_poly();
    c_prime.rotate_trivial((2 * N as u64) - c.body);

    // perform the blind rotation
    for i in 0..N {
        c_prime = cmux(&bsk[i], &c_prime, &c_prime.rotate(c.mask[i]));
    }

    eprintln!("test res: {}", c_prime.body.coefs[0]);
}

In the code above, aside from trivial lines such as loading dependencies or constant data, the actual key code is only five lines, covering the initialization of the polynomial and its blind rotation:

let mut c_prime = GlweCiphertext::trivial_encrypt_lut_poly();
c_prime.rotate_trivial((2 * N as u64) - c.body);

for i in 0..N {
    c_prime = cmux(&bsk[i], &c_prime, &c_prime.rotate(c.mask[i]));
}

The key functions and algorithms for these FHE steps all come from the toy Rust library developed by Louis Tremblay Thibault of the Zama team (https://github.com/tremblaythibaultl/ttfhe/); specifically:

  • trivial_encrypt_lut_poly
  • rotate_trivial
  • cmux

Using RISC Zero, a proof can be generated for the execution of this RISC-V program. The following RISC Zero host code executes the RISC-V program (an ELF executable) and generates a proof (called a "receipt") that attests to its execution:

// set up the executor environment (no inputs needed here, since the data is embedded in the ELF)
let env = ExecutorEnv::builder().build().unwrap();
let prover = default_prover();

// execute the guest ELF and produce a receipt attesting to the execution
let receipt = prover.prove_elf(env, METHOD_NAME_ELF).unwrap();
// anyone holding the receipt can verify it against the method's image ID
receipt.verify(METHOD_NAME_ID).unwrap();

The receipt can be sent to a third party, who can verify the execution of that RISC-V program without access to its internal workings. For situations where a more compact proof is required, RISC Zero can also generate succinct proofs while retaining verifiability.

4.2 Verify FHE (VFHE)

To demonstrate how to use RISC Zero to verify FHE, we take the toy FHE Rust library developed by Louis Tremblay Thibault of the Zama team (https://github.com/tremblaythibaultl/ttfhe/) as an example. There are two reasons for choosing this library:

  • 1) It is very close to the library used in Zama's production fhEVM
  • 2) It is written in Rust, making it easy to compile directly to RISC-V.

The toy FHE Rust library is minimalist - it only has 6 files and contains 800 lines of code - but it fully supports the three different types of FHE ciphertext that will be used:

  • 1) LWE ciphertext (lwe.rs): Its structure is a vector composed of 1024 64-bit integers.
  • 2) General LWE (GLWE) ciphertext (glwe.rs): Its structure is also a vector composed of 1024 64-bit integers.
  • 3) General Gentry–Sahai–Waters (GGSW) ciphertext (ggsw.rs): Its structure is a $4 \times 1024$ matrix of 64-bit integers.

This is enough to start development with RISC Zero, since the main requirement is an efficient Rust implementation that compiles seamlessly to RISC-V. Louis Tremblay Thibault has developed a preliminary version of these ideas in the VFHE library (https://github.com/tremblaythibaultl/vfhe) as a basic starting point:

#![no_main]
use risc0_zkvm::guest::env;
use ttfhe::{ggsw::BootstrappingKey, glwe::GlweCiphertext, lwe::LweCiphertext};
risc0_zkvm::guest::entry!(main);

pub fn main() {
    // bincode can serialize `bsk` into a blob that weighs 39.9MB on disk.
    // This `env::read()` call doesn't seem to stop - memory is allocated until the process goes OOM.
    let (c, bsk): (LweCiphertext, BootstrappingKey) = env::read();

    let lut = GlweCiphertext::trivial_encrypt_lut_poly();

    // `blind_rotate` is a quite heavy computation that takes ~2s to perform on a M2 MBP.
    // Maybe this is why the process is running OOM?
    let blind_rotated_lut = lut.blind_rotate(c, &bsk);

    let res_ct = blind_rotated_lut.sample_extract();

    env::commit(&res_ct);
}

However, in the comments of the above code, two main problems are clearly pointed out:

  • 1) The first problem concerns data loading. An unresolved challenge is how to efficiently load the bootstrapping key and the ciphertext into the RISC-V program, since both can be large.
    Louis's approach uses RISC Zero's env::read channel, the standard way to feed outside data into the RISC-V machine during proof generation. However, as Louis points out, this approach is not optimal, mainly because of its significant memory requirements and the large number of VM CPU cycles spent just on data loading, which leads to out-of-memory (OOM) issues. RISC Zero's Parker Thompson acknowledged that this may be the source of the problem: "Generally, the cost of reading large chunks of data to the client is quite high."
    As an initial workaround for this data-loading overhead, the data can be embedded directly into the RISC-V program. In Rust, a typical solution is the include_bytes_aligned! macro, which instructs the compiler to embed the data into the RISC-V executable; the bytes can then be deserialized, for example with bincode::deserialize. The code looks like this:
    static BSK_BYTES: &[u8] = include_bytes_aligned!(8, "../../../bsk");
    let bsk: BootstrappingKey = bincode::deserialize(BSK_BYTES).unwrap();
    
    However, significant challenges arise from the large number of cycles the RISC-V program needs to allocate memory and copy data for the entire 64MB bootstrapping key. Benchmarks show that merely proving the correct loading of the key takes at least 2 hours.
    This article presents a "zero-copy" technique in RISC Zero to solve this problem, loading the bootstrapping key with almost zero CPU cycles; the details are covered below.
  • 2) The second major issue involves computation. As Louis commented in the code, the efficiency of the blind rotation (the main step of bootstrapping) may be an issue, as it is not a lightweight computation to begin with ("takes ~2s to perform on a M2 MBP"). This is the bigger challenge in the whole effort of verifying FHE in RISC Zero.
    The L2IV Research team has designed and implemented many tricks and techniques in RISC Zero to optimize this part. The process of optimizing FHE computation in RISC Zero is extensive, and several articles in this series will be dedicated to explaining each tip and technique.

4.2.1 Zero-copy trick for loading large amounts of data

The L2IV Research team takes a deep dive into ways to overcome data loading challenges in RISC-V programs, highlighting ways to significantly reduce RISC-V CPU cycles.
The core idea is to avoid copying data in RISC-V. This is critical because copying a 64MB data set in RISC-V would require over 50 million instructions—one to read the data, one to write the data, and one to update the pointer. All of these instructions are unnecessary to some extent because the Rust compiler already includes the data as part of the RISC-V program, so the data is already available.
Implementing this in Rust is challenging due to its inherently memory-safe design. Standard practice in Rust initializes data structures through a rigorous process: allocating the data structure on the stack or heap, zeroing out its memory, and then copying the data element by element. As a result, Rust costs even more compute cycles, since zeroing the memory alone requires at least another 34 million instructions.
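As a rough back-of-the-envelope check of these figures (our own arithmetic, assuming the 4-byte word size of riscv32im):

$$\frac{64\ \mathrm{MB}}{4\ \mathrm{B/word}} \approx 16.8\mathrm{M}\ \text{words};\qquad 16.8\mathrm{M} \times 3 \approx 50\mathrm{M}\ \text{instructions (copy)};\qquad 16.8\mathrm{M} \times 2 \approx 34\mathrm{M}\ \text{instructions (zeroing)}.$$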
The L2IV Research team’s solution uses certain low-level Rust primitives to “get around” the limitations imposed by the Rust LLVM compiler, allowing for more efficient programming in RISC-V.

In the process of collaborating with Greater Heat on Aleo mining, a valuable technique involving std::mem::transmute was learned. This is a special Rust function that reinterprets the bits of one type as another type; in particular, it can be used to change the type of a pointer. In this application, the bootstrapping key (BSK_BYTES) and the ciphertext to be bootstrapped (C_BYTES) are explicitly embedded (or, more accurately, hardcoded) directly in the RISC-V executable. To avoid copying the data, the type of the pointer is manipulated directly, as follows:

let bsk = unsafe {
    std::mem::transmute::<&u8, &[GgswCiphertext; N]>(&BSK_BYTES[0])
};

let c = unsafe {
    std::mem::transmute::<&u8, &LweCiphertext>(&C_BYTES[0])
};

As shown in the snippet above, a pointer into the ELF segment holding the hard-coded data is obtained, initially as a byte pointer (&u8). It is then converted into a pointer to the bootstrapping key (&[GgswCiphertext; N]) or to the LWE ciphertext (&LweCiphertext). The code must be enclosed in an unsafe block because Rust classifies this low-level function as unsafe and requires explicit acknowledgment of its potential risks. Using unsafe does not in itself mean the code is dangerous; rather, it signals that specialized care is required for such low-level operations.
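
Because transmute performs no checks at all, it can be useful to add a couple of defensive assertions before the cast. The following sketch is our own addition (not part of the original guest code); it checks the size and alignment assumptions the transmute relies on, and the alignment check is exactly what motivated the switch to include_bytes_aligned! mentioned in the December 14 update:

// Hypothetical sanity checks, not in the original guest code: placed inside main()
// before the transmutes, they confirm the embedded byte blobs are large enough and
// properly aligned for the target types.
use core::mem::{align_of, size_of};

assert!(BSK_BYTES.len() >= N * size_of::<GgswCiphertext>());
assert_eq!(BSK_BYTES.as_ptr() as usize % align_of::<GgswCiphertext>(), 0);
assert!(C_BYTES.len() >= size_of::<LweCiphertext>());
assert_eq!(C_BYTES.as_ptr() as usize % align_of::<LweCiphertext>(), 0);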

For those familiar with C/C++, this process can be likened to typecasting. In C/C++, the equivalent code looks like this:

/* C */
BootstrappingKey *bsk = (BootstrappingKey*) &BSK_BYTES[0];
LweCiphertext *c = (LweCiphertext*) &C_BYTES[0];

/* C++ */
BootstrappingKey *bsk = reinterpret_cast<BootstrappingKey*>(&BSK_BYTES[0]);
LweCiphertext *c = reinterpret_cast<LweCiphertext*>(&C_BYTES[0]);

Experimental results show that this approach virtually eliminates the cycles normally required for data loading. The analysis below further verifies this zero-copy behavior using a RISC-V decompiler.

5. RISC-V at a glance

The L2IV Research team has addressed the initial challenges Louis from Zama encountered with FHE bootstrapping verification using RISC Zero. Upcoming articles in this series will delve into the topic of performance improvements and their nuances.

This section focuses on using the RISC-V decompiler to examine RISC Zero-verified programs in a zero-knowledge context. There are two goals:

  • 1) Confirm that the L2IV Research team’s technology effectively achieves zero-copy at the RISC-V assembly level.
  • 2) Comprehensively understand the overall structure of the RISC-V program.

To do this, we use Ghidra (https://github.com/NationalSecurityAgency/ghidra), a comprehensive, free-to-use reverse engineering framework developed by the US National Security Agency (NSA) that supports RISC-V.
[Figure]
The figure above shows the Ghidra CodeBrowser view of a RISC-V program proven by RISC Zero.

Ghidra also supports:

  • Displaying RISC-V assembly code - see the middle pane above.
  • Displaying the decompiled code (in C) - see the right pane above. Recalling the 46 RISC-V instructions listed earlier, note that the assembly code being analyzed uses exactly those instructions.

First focus on the automatically generated decompiled code, as shown below:

/* method_name::main */

void method_name::main(void)

{
  uint *puVar1;
  uint *puVar2;
  int iVar3;
  undefined auStack_c018 [8192];
  undefined auStack_a018 [8192];
  undefined *local_8018;
  code *local_8014;
  int *local_4018 [2];
  undefined1 *local_4010;
  undefined4 local_400c;
  undefined **local_4008;
  undefined4 local_4004;
  
  gp = &__global_pointer$;
  ttfhe::glwe::GlweCiphertext::trivial_encrypt_lut_poly(auStack_c018);
  ttfhe::glwe::GlweCiphertext::rotate_trivial((int)auStack_c018,0x600);
  puVar2 = &anon.874983810a662adbf4687c54e184621b.1.llvm.4718791565163837729;
  puVar1 = (uint *)&anon.874983810a662adbf4687c54e184621b.0.llvm.4718791565163837729;
  iVar3 = 0x400;
  do {
    ttfhe::glwe::GlweCiphertext::rotate(local_4018,(int)auStack_c018,*puVar2);
    ttfhe::ggsw::cmux(&local_8018,puVar1,(int)auStack_c018,(int)local_4018);
    memcpy(auStack_c018,&local_8018,0x4000);
    puVar2 = puVar2 + 2;
    iVar3 = iVar3 + -1;
    puVar1 = puVar1 + 0x4000;
  } while (iVar3 != 0);
  local_8018 = auStack_a018;
  local_8014 = u64>::fmt;
  local_4010 = anon.874983810a662adbf4687c54e184621b.4.llvm.4718791565163837729;
  local_400c = 2;
  local_4018[0] = (int *)0x0;
  local_4008 = &local_8018;
  local_4004 = 1;
  std::io::stdio::_eprint(local_4018);
  return;
}

An initial check confirms that the data loading does achieve zero-copy efficiency. The std::mem::transmute snippet compiles the data loading down to a 16-byte sequence of RISC-V machine code:

37 c5 20 04 93 04 c5 4c 37 c5 20 00 13 04 c5 4c

Disassembly shows these are 4 instructions that load pointer values into the s1 and s0 registers. Essentially, this code assigns the value 0x420c4cc to the s1 register and 0x020c4cc to the s0 register:

        00200844 37 c5 20 04     lui        a0,0x420c

        00200848 93 04 c5 4c     addi      s1,a0,0x4cc

        0020084c 37 c5 20 00     lui        a0,0x20c

        00200850 13 04 c5 4c     addi      s0,a0,0x4cc
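
As a quick sanity check of the address arithmetic (our own, using standard RISC-V semantics: lui places a 20-bit immediate into the upper bits, i.e. shifts it left by 12, and addi adds a 12-bit immediate):

fn main() {
    // lui a0,0x420c ; addi s1,a0,0x4cc  =>  s1 = (0x420c << 12) + 0x4cc
    assert_eq!((0x420cu32 << 12) + 0x4cc, 0x0420_c4cc); // address of the ciphertext (C_BYTES)
    // lui a0,0x020c ; addi s0,a0,0x4cc  =>  s0 = (0x020c << 12) + 0x4cc
    assert_eq!((0x020cu32 << 12) + 0x4cc, 0x0020_c4cc); // address of the bootstrapping key (BSK_BYTES)
}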

The assembly code can be further decompiled into a C-like format for a clearer understanding, as shown below:

uint *puVar1;
uint *puVar2;

puVar2 = &anon.874983810a662adbf4687c54e184621b.1.llvm.4718791565163837729;
puVar1 = (uint *)&anon.874983810a662adbf4687c54e184621b.0.llvm.4718791565163837729;

In the decompiled code, the first label, anon.874983810a662adbf4687c54e184621b.1.llvm.4718791565163837729, indicates the location of the ciphertext bytes, i.e., C_BYTES. Using Ghidra, these ciphertext bytes can be observed directly:
[Figure]
The figure above shows the data in the ELF executable at address 0x420c4cc, i.e., the initial value of the s1 register, which points to the ciphertext to be bootstrapped.

C_BYTES is loaded from a file named "c". The contents of this file can be examined with Hex Fiend, a hex-editing tool. As shown below, the check confirmed the data is consistent.

static C_BYTES: &[u8] = include_bytes!("../../../c");

[Figure]
As shown in the hex editor above, the ciphertext to be bootstrapped is stored in the "c" file.
Similarly, the second label, anon.874983810a662adbf4687c54e184621b.0.llvm.4718791565163837729, locates the bytes corresponding to the bootstrapping key, BSK_BYTES.
[Figure]
The figure above shows the data in the ELF executable at address 0x020c4cc, i.e., the initial value of the s0 register, which points to the bootstrapping key.
Additionally, this data can be verified by cross-checking it with its source file "bsk" to ensure it is consistent with the information above.
[Figure]
The image above, in the hex editor, shows the "bsk" file storing the bootstrap key.

Next, we look at how the bootstrapping key and ciphertext are used in the program. In the Rust source code, both are consumed inside the for loop through an explicit call to the cmux function:

// perform the blind rotation
for i in 0..N {
    c_prime = cmux(&bsk[i], &c_prime, &c_prime.rotate(c.mask[i]));
}

Then, use Ghidra to locate and inspect the corresponding decompiled code:

puVar2 = &anon.874983810a662adbf4687c54e184621b.1.llvm.4718791565163837729;
puVar1 = (uint *)&anon.874983810a662adbf4687c54e184621b.0.llvm.4718791565163837729;
iVar3 = 0x400;
do {
  ttfhe::glwe::GlweCiphertext::rotate(local_4018,(int)auStack_c018,*puVar2);
  ttfhe::ggsw::cmux(&local_8018,puVar1,(int)auStack_c018,(int)local_4018);
  memcpy(auStack_c018,&local_8018,0x4000);
  puVar2 = puVar2 + 2;
  iVar3 = iVar3 + -1;
  puVar1 = puVar1 + 0x4000;
} while (iVar3 != 0);

The decompiled code contains different components:

  • 1) Cursor Assignments: puVar2 is used as the current pointer to c, and puVar1 is the current pointer to bsk. This setup facilitates navigation through ciphertext and bootstrap keys.
  • 2) Loop Mechanics: the loop uses iVar3 as a counter decrementing from 1024, exiting when it reaches 0. Each iteration performs the following operations:
    • 2.1) Ciphertext Manipulation: c_prime, stored on the stack in auStack_c018, is first rotated by c.mask[i] (read through *puVar2), and the result is stored in local_4018.
    • 2.2) cmux Function Invocation: the key cmux function takes bsk[i] (puVar1), the original c_prime, and the rotated c_prime as inputs, and writes the result to local_8018.
    • 2.3) Optimization Opportunity: an interesting aspect is the handling of local_8018, which holds the updated c_prime. It is copied back into the c_prime variable, hinting at a potential optimization: the copy could be eliminated by performing cmux in place.
    • 2.4) Cursor Updates: each iteration also updates puVar2 and puVar1. puVar2 advances by two dwords (64 bits) to the next c.mask[i], while puVar1 advances by 16384 dwords (524288 bits) to the next bsk[i]; the arithmetic is checked in the sketch below.
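
A quick check of those strides (our own arithmetic; each uint in the decompiled C is 4 bytes wide):

fn main() {
    // puVar2 += 2 uints per iteration: one 64-bit mask element of `c`.
    assert_eq!(2 * 4 * 8, 64);            // bits advanced per iteration
    // puVar1 += 0x4000 uints per iteration: one GgswCiphertext of the bootstrapping key.
    assert_eq!(0x4000 * 4 * 8, 524_288);  // bits advanced per iteration
}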

When iVar3 reaches 0, the loop ends. Together, these steps implement FHE bootstrapping in the RISC-V program.

Further analysis with Ghidra allows a closer look at other parts of the program, providing insight into potential optimization opportunities. This process also helps evaluate whether the Rust-to-RISC-V compiler is generating the expected RISC-V instructions.

For example, consider the decompiled code of the cmux function. For context, the original Rust code is shown first:

/// Ciphertext multiplexer. If `ctb` is an encryption of `1`, return `ct1`. Else, return `ct2`.
pub fn cmux(ctb: &GgswCiphertext, ct1: &GlweCiphertext, ct2: &GlweCiphertext) -> GlweCiphertext {
    let mut res = ct2.sub(ct1);
   res = ctb.external_product(&res);
   res = res.add(ct1);
   res
}

The decompiled code shows that the calls to sub and add were effectively inlined during compilation. This inlining creates visible loops in the code that are responsible for emulating 64-bit integer operations. Additionally, the code makes several calls to memset and memcpy. Notably, some instances of memset are used to zero memory, which may not always be necessary. This observation opens up potential optimization avenues, especially eliminating unnecessary memset calls.
[Figure]
The above figure shows the decompiled RISC-V instruction code of the cmux function.

References

[1] L2IV Research team blog, November 16, 2023: Tech Deep Dive: Verifying FHE in RISC Zero, Part I - From Hidden to Proven: The ZK Path of FHE Validation
