A preliminary study on secure multi-party computation and homomorphic encryption

1. Secure multi-party computation

Technologies such as privacy computing, secure multi-party computation, and federated learning have become very popular. However, they demand a solid mathematical background, and the material is fragmented and deep, which makes it hard to build a systematic understanding. This article aims to lay out the overall context and then go into more depth.

"Privacy computing" is short for "privacy-preserving computation", and there is no unified standard definition. The United Nations Global Working Group on Big Data defines it as follows: privacy computing is a class of technical solutions that keep data opaque, undisclosed, and unobtainable by the computing party and other unauthorized parties while the data is being processed and analyzed.

The main role of privacy computing: for consumers, privacy computing applications help ensure the security of personal information; for enterprises and institutions, privacy computing is a key way to fulfill data protection obligations during data collaboration; for governments, privacy computing is an important support for maximizing data value and social welfare.

Privacy computing enables enterprises, while meeting data compliance requirements, to fully mobilize data resource owners, users, operators, and regulators, and to aggregate, trade, and circulate data resources at scale, thereby unlocking the value of data resources held by third-party organizations and promoting the market-oriented allocation of data as a factor of production. With the promulgation of the National Data Security Law, privacy computing has become even more valuable.

Privacy computing makes data "available but invisible" and draws on several families of techniques, roughly classified as follows:

  • Secure multi-party computation (MPC): joint statistics, joint queries, joint custom computation, resistance to malicious attacks, etc.;
  • Trusted execution environment (TEE): hardware isolation that builds a secure, trusted area;
  • Federated learning (FL): privacy-protected joint statistics, joint feature engineering, joint modeling, and prediction;
  • Blockchain: distributed identity, authorization management, cross-domain governance, trusted data sources, audit evidence, and ciphertext cross-validation;
  • Differential privacy: privacy protection for the released results.

Note: some publications do not list the last item, differential privacy, as a separate category, although it provides the same kind of protection.

Secure multi-party computation (MPC) is a cryptographic technique in which several parties jointly compute an agreed function over their combined data, while no party's private data is leaked beyond the computation result and whatever can be deduced from it.

Secure multi-party computation does not refer to a single protocol but to a collection of key techniques. Commonly used ones include:

  • Garbled circuits;
  • Secret sharing: used in many scenarios and currently popular;
  • Homomorphic encryption: used in many scenarios and currently popular;
  • Oblivious transfer: often used for private queries;
  • Zero-knowledge proofs: often used for result verification.

Privacy computing products built around MPC are often called MPC platforms; the reference framework is shown in Figure 1. An MPC platform is a privacy and security product with cryptography at its core. Compliance is its most basic requirement, so it must comply with relevant laws and regulations and meet the applicable standards.

The MPC technology platform supports two technical architectures:

  • The first is to build the platform directly on MPC: first, the MPC protocols, MPC compiler, and MPC application adapters are used to build the MPC computing module; second, the MPC computing module supports general-purpose computation and machine learning in the upper layer; finally, privacy computing capabilities such as private set intersection, private statistics, and joint modeling are realized.
  • The second is to build the platform with MPC enhancing FL: first, the FL computing module is built at the bottom layer and MPC is used to strengthen its security; second, the FL computing module supports the machine learning algorithms in the upper layer; finally, machine-learning-oriented privacy computing functions are realized.

Figure 1: MPC platform system framework

1.1 Security model

Security model: the assumptions made about the behavioral patterns of participants.

Researchers introduce a set of security assumptions when constructing a cryptographic algorithm, and the algorithm is secure only if those assumptions hold. Similarly, a cryptographic protocol composed of several cryptographic algorithms needs further security assumptions because it involves more interacting parties.

For a cryptographic protocol, the set of all required security assumptions, together with the security requirements that must hold under them, is called its security model.

A secure multi-party computation protocol may be attacked by external or internal adversaries during execution. The security model therefore defines an adversary that controls a subset of corrupted participants, which covers external attacks, insider attacks, and collusion. The commonly used adversary models are the semi-honest model and the malicious model. In the semi-honest model, the adversary follows every step of the protocol honestly and only tries to infer other parties' private data from what it observes during execution. In the malicious model, the adversary may deviate from the protocol and take arbitrary actions to obtain other parties' private data.

The security of secure multi-party computation is defined through the ideal world/real world paradigm. First, an ideal world with a trusted third party is defined. Each participant sends its secret data to the trusted third party over a secure channel, the third party evaluates the function on the joint data, and it then sends the output to each participant. The real world has no trusted third party: the participants evaluate the function on the joint data by interacting directly with one another and executing the protocol. If every real-world attack can be simulated in the ideal world, the multi-party computation protocol is said to be secure. Specifically, for any adversary in the real world there exists an adversary in the ideal world such that the joint distribution of inputs and outputs in the ideal-world execution is indistinguishable from that in the real-world execution.

1.1.1 Semi-honest adversary model

In the semi-honest model, a computing party may want to learn the original data of the other parties, but it still follows the protocol. The semi-honest setting assumes a certain degree of trust between participants and suits computation between institutions. A semi-honest party follows the protocol exactly, does not abort midway, and does not tamper with the results, but it may keep intermediate values seen during execution and try to deduce other parties' inputs from them.

1.1.2 Malicious adversary model

In the malicious model, participants may deviate from the protocol arbitrarily: they can abort the protocol at will, disrupt its normal execution, modify intermediate results, or collude with other participants. No trust relationship is assumed, and participants may behave in any (malicious) way. A protocol that is secure in this model has one of two outcomes: either it fails to complete and no party obtains any data, or it completes and the parties learn only the computation result. This model is better suited to computation between individuals, or between individuals and institutions.

1.1.3 Secure two-party computation

The protocol used is garbled circuits (GC) combined with oblivious transfer (OT).

GC + OT is a general-purpose construction in the two-party semi-honest model and supports secure two-party computation of arbitrary logic.

1.1.4 Secure multi-party computation

The protocols used combine homomorphic encryption, secret sharing, and OT (plus commitment schemes, zero-knowledge proofs, and so on). The main constructions are:

  • Yao's GC (garbled circuits): the line of garbled circuit protocols pioneered by Andrew Chi-Chih Yao (Yao Qizhi).
  • SPDZ (arithmetic circuits): a line of protocols based on secret sharing and somewhat homomorphic encryption, pioneered by Ivan Damgård and co-authors.
  • GMW (Boolean circuits): the pioneering work on Boolean sharing.
  • ABY (conversion between arithmetic, Boolean, and garbled circuit shares), including ABY and ABY3.
  • MHE: multi-party homomorphic encryption, built on the FHE family of schemes.

General theoretical solutions to secure multi-party computation already exist and can support any type of computation. The common general frameworks mainly include garbled circuits, secret sharing, homomorphic encryption, and zero-knowledge proofs.

You may wish to read a recent MPC survey first. If you are interested, I recommend the following references, each with its own emphasis.

  • Yehuda Lindell, Secure Multiparty Computation (MPC), 2020.
  • A Pragmatic Introduction to Secure Multi-Party Computation.
  • Applications of Secure Multiparty Computation.

1.2 Garbled circuits

The garbled circuit is a cryptographic protocol for secure computation proposed by Andrew Chi-Chih Yao (Yao Qizhi) in the 1980s. It addresses the setting in which several parties each supply some input and a result is computed from all inputs by an agreed function, while no party wants the others to learn its input.

In the garbled circuit framework, any function is represented as a logic circuit composed of AND gates and XOR gates. The protocol has two roles: the generator and the evaluator. A gate is in effect a truth table. For each gate, the generator picks a random number for each possible truth value on each input and output wire, so that every truth value on every wire corresponds to a random number. The generator then encrypts the random number for each output value under the two random numbers for the corresponding input values. After that, it randomly permutes the resulting ciphertexts to obtain a garbled (encrypted) truth table and sends it to the evaluator together with the random numbers corresponding to the generator's own input. The evaluator must also securely obtain the random numbers corresponding to its own input from the generator, which is done with the oblivious transfer protocol. Holding the random numbers for both inputs, the evaluator can decrypt exactly one row of the garbled truth table and obtains the random number for the correct output value. At the last gate, the evaluator only needs to look up the table that maps the output wire's random numbers to truth values to obtain the output of the whole function.

1.2.1 Oblivious transfer

The garbled circuit (GC) protocol relies on another cryptographic primitive: oblivious transfer (OT).

In oblivious transfer, suppose A holds two values m_1 and m_2. B wants one of them but, for privacy reasons, does not want A to know which one he chose. The OT protocol guarantees that B obtains exactly one of the values and that A does not learn which. This is the most basic OT protocol, known as 1-out-of-2 OT.

There are also the 1-out-of-n OT protocol, in which the receiver picks one of n values, and the m-out-of-n OT protocol, in which the receiver picks m of n values.

Equivalently: in a 1-out-of-2 OT protocol, the sender holds a pair of messages (m_0, m_1) and the receiver holds a selection bit b. After the protocol runs, the receiver has obtained the message m_b corresponding to b but learns nothing about m_{1-b}, and the sender does not know which message the receiver obtained.
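To make the 1-out-of-2 definition concrete, here is a minimal sketch of a Diffie-Hellman style OT, simulated inside one Python process. It is an illustration rather than a construction taken from this article: the modulus `P`, the base `G`, the hash-based key derivation, and the function name `oblivious_transfer` are all assumptions chosen for readability, and the parameters are not suitable for real use.

```python
import hashlib
import secrets

P = 2**127 - 1   # toy prime modulus (assumption: for illustration only)
G = 3            # toy base element (assumption)

def _key(shared: int) -> bytes:
    """Derive a 32-byte one-time pad from a shared group element."""
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()

def _xor(data: bytes, pad: bytes) -> bytes:
    return bytes(d ^ k for d, k in zip(data, pad))

def oblivious_transfer(m0: bytes, m1: bytes, b: int) -> bytes:
    """Simulate sender and receiver of a 1-out-of-2 OT; returns what the receiver learns."""
    assert len(m0) <= 32 and len(m1) <= 32 and b in (0, 1)

    # Sender: pick a secret exponent a and publish A = G^a.
    a = secrets.randbelow(P - 2) + 1
    A = pow(G, a, P)

    # Receiver: hide the selection bit b inside B.
    r = secrets.randbelow(P - 2) + 1
    B = pow(G, r, P) if b == 0 else (A * pow(G, r, P)) % P

    # Sender: derive one key per message and encrypt both messages.
    k0 = _key(pow(B, a, P))
    k1 = _key(pow(B * pow(A, -1, P) % P, a, P))
    c0, c1 = _xor(m0, k0), _xor(m1, k1)

    # Receiver: only the key for the chosen message equals H(A^r).
    kb = _key(pow(A, r, P))
    return _xor(c0 if b == 0 else c1, kb)
```

The sender sees only `B`, which looks like a random group element for either choice of `b`; the receiver can rebuild only one of the two keys because the other would require knowing the sender's exponent `a`.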

1.2.2 Logic circuit

The construction can be extended to any gate.

1.2.3 Garbled circuit

The garbled circuit protocol consists of the following steps.

  • Step 1: Alice generates the garbled circuit. The generation process itself has four main sub-steps;
  • Step 2: Alice and Bob communicate, and Bob obtains the keys for his input from Alice through the OT protocol;
  • Step 3: Bob evaluates (decrypts) the garbled circuit;
  • Step 4: The result is shared, and the logic value of the circuit output is obtained.

The detailed explanation is as follows.

Start from the ordinary truth table of a basic gate. Each input and output value is then assigned a key, which can be understood as a mapping from the original 0/1 value on a wire to a label (an integer, a string, etc.).

The truth table that is actually sent is encrypted and shuffled (that is, its rows are permuted).

Decrypting the garbled circuit

Bob starts decrypting after receiving the garbled circuit. Along with the garbled circuit, Alice sends Bob the key corresponding to her own input (the key is random, so Bob cannot tell what Alice's real input is). Bob also needs the key corresponding to his own input, which only Alice knows. This is where the oblivious transfer protocol is run: it lets Bob pick the key matching his input from the possible input keys without letting Alice learn his choice. With both keys, Bob decrypts the garbled truth table and obtains the output of the garbled circuit.
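The generation and decryption steps can be sketched for a single AND gate. This is a simplified illustration under stated assumptions: labels are 16 random bytes, rows are encrypted by XOR with a SHA-256 hash of the two input labels, and an all-zero tag (an assumption of this sketch) lets the evaluator recognize the one row it can decrypt. Bob's label is passed in directly here, whereas in the protocol he would obtain it through OT. The names `garble_and_gate` and `evaluate` are hypothetical.

```python
import hashlib
import secrets

LABEL_LEN = 16
TAG = bytes(16)  # zero padding that marks a correctly decrypted row (assumption)

def _pad(label_a: bytes, label_b: bytes) -> bytes:
    """32-byte pad derived from the two input labels."""
    return hashlib.sha256(label_a + label_b).digest()

def _xor(x: bytes, y: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(x, y))

def garble_and_gate():
    """Generator side: random labels per wire value, encrypted and shuffled rows."""
    labels = {
        wire: (secrets.token_bytes(LABEL_LEN), secrets.token_bytes(LABEL_LEN))
        for wire in ("a", "b", "out")
    }
    rows = []
    for bit_a in (0, 1):
        for bit_b in (0, 1):
            out_label = labels["out"][bit_a & bit_b]
            pad = _pad(labels["a"][bit_a], labels["b"][bit_b])
            rows.append(_xor(pad, out_label + TAG))
    secrets.SystemRandom().shuffle(rows)   # hide which row belongs to which inputs
    decode = {labels["out"][0]: 0, labels["out"][1]: 1}
    return labels, rows, decode

def evaluate(rows, label_a: bytes, label_b: bytes) -> bytes:
    """Evaluator side: exactly one row decrypts to a value ending in TAG."""
    pad = _pad(label_a, label_b)
    for row in rows:
        plain = _xor(row, pad)
        if plain[LABEL_LEN:] == TAG:
            return plain[:LABEL_LEN]
    raise ValueError("no row could be decrypted with these labels")
```

For example, after `labels, rows, decode = garble_and_gate()`, calling `decode[evaluate(rows, labels["a"][1], labels["b"][1])]` yields the AND of the two input bits, while the evaluator never sees the bits themselves, only labels.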

1.3 Homomorphic encryption (HE)

1.3.1 What is homomorphic encryption

Homomorphic encryption focuses on the security of data processing. It allows encrypted data to be processed directly: after the key holder decrypts the processed ciphertext, the result matches what the same processing would have produced on the plaintext. (Ordinary encryption schemes focus on the security of data storage: users cannot operate on the ciphertext, otherwise decryption yields a wrong result or fails altogether.)

Homomorphic encryption is a natural fit for cloud computing. Suppose a user wants to process some data but has little computing power and would like the cloud to do the work. Handing the data to the cloud directly offers no security guarantee, so homomorphic encryption is used: the cloud processes the encrypted data and returns the encrypted result to the user.

If an encryption scheme is both additively and multiplicatively homomorphic, it is called fully homomorphic encryption (FHE). The hardness assumptions currently used to construct FHE mainly include the Ideal Coset Problem (ICP) on ideal lattices, the Approximate Greatest Common Divisor (AGCD) problem over the integers, and the Learning with Errors (LWE) problem. Since no efficient quantum algorithm is known for the LWE problem, encryption schemes based on the LWE assumption are considered quantum-resistant.

A fully homomorphic encryption scheme is an encryption scheme that allows arbitrarily complex program evaluation of encrypted data.

Learning with Errors (LWE) is the problem of solving a system of linear equations with noise. It was introduced by Oded Regev in 2005, for which he won the 2018 Gödel Prize. In 2009, Craig Gentry constructed the first fully homomorphic encryption scheme, a goal long described as the "holy grail of cryptography".
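As a rough illustration of "linear equations with noise", the sketch below encrypts single bits in the style of LWE-based public-key encryption and shows that adding two ciphertexts decrypts to the XOR of the bits. The parameters `N`, `M`, `Q` and the noise range are toy assumptions picked so that decryption is always correct here; they provide no security, and the function names are hypothetical.

```python
import secrets

N, M, Q = 16, 32, 3329   # toy dimension, number of samples, modulus (assumptions)

def keygen():
    """Secret s and public noisy equations b = A*s + e (mod Q)."""
    s = [secrets.randbelow(Q) for _ in range(N)]
    A = [[secrets.randbelow(Q) for _ in range(N)] for _ in range(M)]
    e = [secrets.randbelow(3) - 1 for _ in range(M)]   # small noise in {-1, 0, 1}
    b = [(sum(x * y for x, y in zip(row, s)) + err) % Q for row, err in zip(A, e)]
    return s, (A, b)

def encrypt(pk, bit):
    """Sum a random subset of the public equations and add bit * Q/2."""
    A, b = pk
    subset = [j for j in range(M) if secrets.randbelow(2)]
    u = [sum(A[j][i] for j in subset) % Q for i in range(N)]
    v = (sum(b[j] for j in subset) + bit * (Q // 2)) % Q
    return u, v

def decrypt(s, ct):
    """Remove <u, s>; what remains is noise (bit 0) or noise + Q/2 (bit 1)."""
    u, v = ct
    d = (v - sum(x * y for x, y in zip(u, s))) % Q
    return 1 if Q // 4 < d < 3 * Q // 4 else 0

def add(ct1, ct2):
    """Component-wise addition of ciphertexts; decrypts to the XOR of the bits."""
    (u1, v1), (u2, v2) = ct1, ct2
    return [(x + y) % Q for x, y in zip(u1, u2)], (v1 + v2) % Q
```

Without the secret `s`, recovering the bit means solving the noisy linear system, which is the LWE problem. Each homomorphic addition also adds the noise terms together, which is why practical schemes need techniques to keep the noise below the decryption threshold.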

Craig Gentry’s intuitive definition of homomorphic encryption:

A way to delegate processing of your data, without giving away access to it.


1.3.2 Homomorphic encryption process definition

The overall process by which Alice has her data processed by the cloud (Cloud) under homomorphic encryption is roughly as follows:

  • Alice encrypts the data and sends the ciphertext to Cloud;
  • Alice tells Cloud how to process the data; the processing is represented here by a function f;
  • Cloud processes the ciphertext under f and sends the processed result back to Alice;
  • Alice decrypts the result.

From this we can read off the functions an HE scheme should provide:

  1. KeyGen: key generation. Run by Alice to generate the key Key used to encrypt the data Data. There are usually also some public parameters PP (Public Parameter);
  2. Encrypt: encryption. Also run by Alice; encrypts Data under Key to obtain the ciphertext CT (Ciphertext);
  3. Evaluate: evaluation. Run by Cloud; operates on the ciphertext according to the processing function f supplied by the user, so that the result is equivalent to an encryption of f(Data) under Key;
  4. Decrypt: decryption. Run by Alice to recover f(Data) from the result returned by Cloud.
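The four functions can be illustrated with a toy version of the Paillier cryptosystem. Paillier is only additively homomorphic, not fully homomorphic, so the sketch fixes f to be a sum; it is meant to show the KeyGen / Encrypt / Evaluate / Decrypt interface, not an FHE scheme. The two fixed Mersenne primes are an assumption for the demo and far too small for real use.

```python
import math
import secrets

def keygen(p=2**31 - 1, q=2**61 - 1):
    """KeyGen: returns (public key n, secret key (lam, mu)). Toy primes, demo only."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                                # inverse of lam modulo n
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt: c = (1 + n)^m * r^n mod n^2, with random r coprime to n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) % n2 * pow(r, n, n2) % n2

def evaluate(n, ciphertexts):
    """Evaluate f = sum: multiplying ciphertexts adds the underlying plaintexts."""
    n2 = n * n
    acc = 1   # 1 is a (trivial) encryption of 0
    for c in ciphertexts:
        acc = acc * c % n2
    return acc

def decrypt(n, sk, c):
    """Decrypt: m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = sk
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n
```

In terms of the roles above, Alice runs `keygen`, `encrypt`, and `decrypt`, while Cloud runs only `evaluate` and never holds the secret key.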

1.3.3 Definition of cryptographic security

The most basic security requirement for an HE scheme is semantic security: intuitively, the ciphertext reveals no information about the plaintext.

There is also a stronger notion, chosen-ciphertext security, which comes in a non-adaptive and an adaptive form, CCA1 and CCA2 respectively. Because a homomorphic ciphertext can by design be transformed into another valid ciphertext, an HE scheme cannot be CCA2-secure.

The contents of this section are subject to confirmation.

1.3.4 Homomorphic encryption learning sequence

From my own learning experience, a workable reading order for the papers is:

  1. BV11: Efficient Fully Homomorphic Encryption from (Standard) LWE.
  2. The Learning with Errors Problem.
  3. BGV12: (Leveled) Fully Homomorphic Encryption without Bootstrapping.
  4. Bra12: Fully Homomorphic Encryption without Modulus Switching from Classical GapSVP.
  5. GSW13: Homomorphic Encryption from Learning with Errors: Conceptually-Simpler, Asymptotically-Faster, Attribute-Based.
  6. CKKS17: Homomorphic Encryption for Arithmetic of Approximate Numbers.
  7. CHIMERA: Combining Ring-LWE-based Fully Homomorphic Encryption Schemes.
  8. PEGASUS: Bridging Polynomial and Non-polynomial Evaluations in Homomorphic Encryption.


1.4 Secret sharing

The idea of secret sharing is to split a secret in a suitable way, with each share held by a different participant. A single participant cannot recover the secret; only a sufficient number of participants working together can. Moreover, if some participants become unavailable, the secret can still be fully recovered as long as enough shares remain.

Secret sharing is a cryptographic technique for storing a secret in split form. Its purpose is to avoid concentrating the secret in one place, thereby spreading risk and tolerating intrusion. It is an important tool in information security and data confidentiality.

The key question in secret sharing is how to design the splitting and recovery methods. A secret sharing scheme consists of two algorithms: share distribution and secret recovery. In share distribution, the dealer splits the secret into several shares and distributes them among a group of participants, so that each participant holds one share. Secret recovery guarantees that only certain authorized subsets of participants can recover the secret, while any other subset cannot recover it and learns no useful information about it.
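The distribution and recovery algorithms can be sketched with Shamir's threshold scheme, in which any `threshold` shares recover the secret and fewer reveal nothing. This is a minimal sketch: the field modulus `PRIME` and the function names `split` and `recover` are assumptions for illustration.

```python
import secrets

PRIME = 2**61 - 1   # field modulus (assumption); the secret must be smaller than this

def split(secret, threshold, n_shares):
    """Share distribution: evaluate a random degree-(threshold-1) polynomial
    whose constant term is the secret at x = 1..n_shares."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def poly(x):
        acc = 0
        for c in reversed(coeffs):      # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, poly(x)) for x in range(1, n_shares + 1)]

def recover(shares):
    """Secret recovery: Lagrange interpolation of the polynomial at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With `shares = split(123456789, 3, 5)`, any three of the five shares passed to `recover` give back the secret, so up to two participants can drop out without losing it.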

Secret sharing in secure multi-party computation proceeds as follows:

  • Split the data -> distribute the shares -> compute on the shares -> obtain partial results -> combine the partial results.
  • Secret sharing ensures that during the computation each participant sees only random-looking numbers, yet the desired result is still obtained at the end.
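The split / distribute / compute / combine pipeline can be sketched with additive secret sharing for a joint sum. All parties are simulated in one process; the modulus `MOD` and the function names are assumptions for illustration.

```python
import secrets

MOD = 2**64   # all arithmetic is done modulo this value (assumption)

def share(value, n_parties):
    """Split value into n_parties random-looking shares that sum to it mod MOD."""
    parts = [secrets.randbelow(MOD) for _ in range(n_parties - 1)]
    parts.append((value - sum(parts)) % MOD)
    return parts

def secure_sum(inputs):
    """Each party shares its input; party j only ever sees the j-th shares."""
    n = len(inputs)
    # Data splitting and distribution: row i holds the shares of party i's input.
    distributed = [share(v, n) for v in inputs]
    # Local computation: party j adds up the shares it received.
    partial = [sum(distributed[i][j] for i in range(n)) % MOD for j in range(n)]
    # Combining: only the partial sums are published and added together.
    return sum(partial) % MOD
```

Each individual share is uniformly random, so a party learns nothing about another party's input from the shares it holds; only the final total is revealed.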

For example, consider computing an average salary: how can A, B, C, and D find the average of their four salaries without revealing their individual salaries?

  1. A generates a random number, adds it to his salary, encrypts the sum with B's public key, and sends it to B.
  2. B decrypts it with his own private key, adds his salary, encrypts the sum with C's public key, and sends it to C.
  3. C decrypts it with his own private key, adds his salary, encrypts the sum with D's public key, and sends it to D.
  4. D decrypts it with his own private key, adds his salary, encrypts the sum with A's public key, and sends it to A.
  5. A decrypts it with his own private key and subtracts the original random number to obtain the total salary.
  6. A divides the total salary by the number of people to get the average salary and announces the result.
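The salary steps can be simulated as follows. This is a sketch of the masking arithmetic only: the public-key encryption of each hop, which protects the running value from eavesdroppers on the link, is deliberately left out, and the modulus `MOD` and function name are assumptions.

```python
import secrets

MOD = 2**64   # running totals are kept modulo this value (assumption)

def average_salary(salaries):
    """Simulate the ring: returns the average and the values each hop forwards."""
    mask = secrets.randbelow(MOD)
    running = (mask + salaries[0]) % MOD      # step 1: A adds mask + own salary
    forwarded = [running]
    for salary in salaries[1:]:               # steps 2-4: B, C, D add their salaries
        running = (running + salary) % MOD
        forwarded.append(running)
    total = (running - mask) % MOD            # step 5: A removes the mask
    return total / len(salaries), forwarded   # step 6: A announces the average
```

Every forwarded value is offset by A's random mask, so B, C, and D each see a number that looks random. Note that this simple ring does not protect against collusion: for instance, B and D together can compute C's salary from the values they see.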

In the secret sharing framework, a function is likewise represented as a logic circuit composed of AND gates and XOR gates. Unlike in the garbled circuit framework, the input and output of every gate are secret-shared among the participants. Because secret sharing is linear, an XOR gate can be evaluated locally by each participant without any interaction. Evaluating an AND gate securely requires interaction between the participants, generally based on Beaver multiplication triples. Multiplication triples are independent of the inputs and can be generated in an offline phase, either by running a cryptographic protocol (such as oblivious transfer) among the parties or by a dedicated third party.

Instead of a logic circuit, the function can also be represented as an arithmetic circuit consisting of multiplication gates and addition gates. Addition gates are evaluated locally by each participant, while multiplication gates require interaction between participants. Secret sharing schemes include Shamir's secret sharing scheme and Feldman's verifiable secret sharing scheme.

MPC frameworks based on secret sharing need only simple arithmetic operations, so computation is relatively fast, but they also have limitations:

  • First, since the secure evaluation of each AND gate/multiplication gate requires interaction between participants, the number of interaction rounds is not fixed but grows with the complexity of the whole computation;
  • Second, because participants must interact pairwise, and even simple functions are expressed as complex logic/arithmetic circuits, the communication overhead is large;
  • Third, when there are many participants it is hard to keep their interactions perfectly synchronized, and all participants must stay online for the entire computation, which is difficult to guarantee in practice, for example for mobile users with limited resources and unstable network conditions.
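The interaction needed for a multiplication gate can be sketched with a Beaver triple over two-party additive shares. The trusted dealer that produces the triple stands in for the offline phase, and both parties are simulated in one process; the modulus `P` and the function names are assumptions for illustration.

```python
import secrets

P = 2**61 - 1   # prime field modulus (assumption)

def share(value):
    """Two-party additive sharing: value = s0 + s1 (mod P)."""
    s0 = secrets.randbelow(P)
    return s0, (value - s0) % P

def reconstruct(s0, s1):
    return (s0 + s1) % P

def beaver_multiply(x_sh, y_sh):
    """Return shares of x * y given shares of x and y."""
    # Offline phase (dealer): random triple with c = a * b, handed out as shares.
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    a_sh, b_sh, c_sh = share(a), share(b), share(a * b % P)

    # Online phase: the parties open d = x - a and e = y - b.
    # These values are masked by a and b, so they reveal nothing about x and y.
    d = reconstruct((x_sh[0] - a_sh[0]) % P, (x_sh[1] - a_sh[1]) % P)
    e = reconstruct((y_sh[0] - b_sh[0]) % P, (y_sh[1] - b_sh[1]) % P)

    # Local computation: z = c + d*b + e*a + d*e; only party 0 adds the d*e term.
    z0 = (c_sh[0] + d * b_sh[0] + e * a_sh[0] + d * e) % P
    z1 = (c_sh[1] + d * b_sh[1] + e * a_sh[1]) % P
    return z0, z1
```

Expanding c + d*b + e*a + d*e with d = x - a and e = y - b gives exactly x*y. The one round in which d and e are opened is the interaction that each multiplication gate costs, which is why the round count grows with circuit depth.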

1.5 Zero-knowledge proof

1.5.1 Concept

Zero-knowledge proofs (ZKPs) were originally proposed by S. Goldwasser, S. Micali, and C. Rackoff in their 1985 paper "The Knowledge Complexity of Interactive Proof Systems". In a zero-knowledge proof, the prover convinces the verifier that a certain assertion is correct without providing the verifier with any useful information beyond that fact.

1.5.2 Zero-knowledge proof process
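
One classic way to illustrate the commit / challenge / response flow of an interactive proof is the Schnorr identification protocol, in which the prover shows knowledge of a discrete logarithm x with y = g^x without revealing x. The sketch below is an illustrative choice rather than a protocol described in this article, and the tiny group (p = 23, q = 11, g = 4) is a toy assumption that offers no security.

```python
import secrets

P, Q, G = 23, 11, 4   # toy group: G has prime order Q modulo P (assumption)

def commit():
    """Prover: pick a random nonce r and send t = G^r."""
    r = secrets.randbelow(Q)
    return r, pow(G, r, P)

def challenge():
    """Verifier: send a random challenge c."""
    return secrets.randbelow(Q)

def respond(x, r, c):
    """Prover: answer with s = r + c * x (mod Q)."""
    return (r + c * x) % Q

def verify(y, t, c, s):
    """Verifier: accept if and only if G^s == t * y^c (mod P)."""
    return pow(G, s, P) == t * pow(y, c, P) % P
```

The check passes for an honest prover because G^s = G^(r + c*x) = t * y^c. The response s is masked by the fresh random nonce r, so a single transcript (t, c, s) does not reveal x to a verifier that picks its challenge at random.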


2. Comparison of main technologies in privacy computing

3. Application-oriented secure multi-party computing technology

3.1 Securing machine learning through MPC

Machine learning is often considered the most important tool in big data systems because it can effectively mine valuable knowledge hidden in large amounts of data. However, even with multi-GPU acceleration, training deep learning models for image processing can still take hours or even days. Many cloud service providers therefore offer machine learning services, such as Alibaba, Azure Machine Learning, and Google Cloud Machine Learning Engine. With these platforms, users can outsource their machine learning tasks to the cloud, but security and privacy concerns make them reluctant to disclose their data to it. Privacy and security have therefore become pressing issues in outsourced learning tasks.

3.1.1 Privacy-preserving machine learning from multiple data sources

In this scenario, data is collected from different users. For example, a smart wearable device might collect data from the wearer and upload it to a data center. Another example is users uploading location information to the cloud in order to obtain location-based services. Because of privacy concerns, some users do not want their sensitive data to be exposed to the cloud. The challenge is therefore how to provide machine learning-based services to all users while protecting each user's sensitive data. So far, two kinds of technical solutions for private learning from distributed data sources have been proposed. One is based on randomization, such as using differential privacy to add noise to the original data. The other is built on MPC; a common approach there is to first design a data aggregation scheme that merges data encrypted under different keys into one integrated data set. There is also a good deal of research on specific privacy-preserving machine learning tasks for multiple data sources, for example a privacy-preserving Naive Bayes classifier.

3.1.2 Protecting data and trained models in two-party situations

In this scenario, a cloud server (Bob) owns a data set and trains a model with a particular machine learning algorithm. Bob does not want to leak this model to other entities. A user (Alice) wants to receive customized services based on her own data but does not want her private data to be leaked. This is a direct application of the classic MPC setting: Alice inputs her private data and Bob inputs the model. At the end of the protocol, Bob learns nothing about Alice's data and Alice learns nothing about Bob's model.

3.2 Private set computation

Set operations are basic building blocks in MPC and essential tools for tasks such as privacy-preserving data querying and data mining. Freedman et al. first studied two-party set intersection and proposed a protocol based on the Paillier homomorphic encryption system. The protocol is secure against semi-honest adversaries in the standard model and against malicious parties in the random oracle (RO) model. Their work opened up private set computation as a research direction, and the tools they used became the basis for subsequent work.

One line of research extends the range of set operations; another designs private set intersection (PSI) protocols under stronger security models or simpler cryptographic assumptions. To obtain a PSI protocol that resists malicious adversaries without inefficient zero-knowledge proofs, Hazay and Lindell proposed using oblivious pseudorandom function (OPRF) evaluation to solve the set intersection and pattern matching problems. For the fully malicious security model, PSI protocols based on other paradigms take different approaches.

There are five main techniques for constructing efficient PSI protocols: naive solutions, third-party-based PSI, public-key-based PSI, circuit-based PSI, and OT-based PSI. OT-based PSI has recently attracted a growing amount of work. Dong et al. proposed a PSI protocol based on OT extension and Bloom filters. Pinkas et al. optimized the work of Dong et al. with a random garbled Bloom filter protocol, which builds on random OT extension and is secure in the semi-honest model. Rindal and Rosulek described an inexpensive protocol paradigm for PSI that achieves a weak form of malicious security. Subsequently, Rindal and Rosulek attempted to extend the PSI protocol to make it secure in the presence of malicious adversaries and showed that Bloom filter-based protocols are not secure against malicious adversaries.
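
To give a feel for the public-key-based family, here is a minimal sketch of a Diffie-Hellman style PSI for semi-honest parties, simulated in one process. It is not the protocol of Freedman et al. or of any other work cited above; the modulus `P`, the hash-to-group mapping, and the function names are assumptions chosen for illustration and are not secure parameters.

```python
import hashlib
import math
import secrets

P = 2**127 - 1   # toy prime modulus (assumption: illustration only)

def _hash_to_group(item: str) -> int:
    """Map an item to a group element via SHA-256 (simplified mapping)."""
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % P

def _secret_exponent() -> int:
    """Random exponent coprime to P - 1, so exponentiation is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k

def dh_psi(alice_items, bob_items):
    """Return the intersection as Alice would learn it."""
    a, b = _secret_exponent(), _secret_exponent()

    # Alice -> Bob: H(x)^a for each of her items.
    alice_once = [pow(_hash_to_group(x), a, P) for x in alice_items]

    # Bob -> Alice: (H(x)^a)^b in the same order, plus H(y)^b for his own items
    # (a real protocol would shuffle the second list).
    alice_twice = [pow(v, b, P) for v in alice_once]
    bob_once = [pow(_hash_to_group(y), b, P) for y in bob_items]

    # Alice: raise Bob's values to a and compare the doubly masked values.
    bob_twice = {pow(v, a, P) for v in bob_once}
    return {x for x, v in zip(alice_items, alice_twice) if v in bob_twice}
```

Because exponentiation commutes, H(x)^(ab) matches on both sides exactly when the underlying items are equal, while items outside the intersection stay hidden behind the other party's secret exponent.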

3.3 Large-scale secure multi-party computation based on cloud server assistance

Cloud computing has introduced new security problems, or brought older ones back into focus, such as searchable encryption, oblivious RAM, verifiable computation, and secure deduplication. These problems can also be abstracted as MPC functionalities and solved with MPC protocols. It is therefore crucial to provide comprehensive and practical MPC solutions for cloud computing environments. Cloud-assisted MPC falls mainly into two categories: schemes based on garbled circuits and schemes based on homomorphic encryption.

3.3.1 Cloud-assisted MPC based on garbled circuits

In secure two-party computation (S2PC) based on garbled circuits, one party is the circuit generator and the other is the circuit evaluator. In the cloud-assisted model, both the generation and the evaluation of garbled circuits can be outsourced to cloud servers. A party that outsources its computing tasks is called an outsourcing party; the others are called non-outsourcing parties.

In such an outsourcing setting, the goal is to design MPC protocols with linear complexity, meaning that the outsourcing party's computation and communication costs grow linearly with the size of its inputs and outputs. Achieving this level of complexity requires a non-collusion assumption. Although the non-collusion model offers weaker security than the standard malicious model of MPC, it is sometimes preferable because it supports a wide range of applications.

The pioneering work of Kamara et al. proposed two secure single-server-aided S2PC protocols in which the circuit evaluation task is outsourced to the cloud. Later, Carter et al. considered outsourcing secure function evaluation for resource-constrained devices such as mobile phones. In contrast to the protocols of Kamara et al., where both parties are assumed to have low computing power but high bandwidth, the work of Carter et al. considers a scenario in which two parties, a low-bandwidth mobile device and an application server, want to run an S2PC protocol with the help of a cloud server. In their protocol, the mobile device outsources the heavy computation required for circuit evaluation to the cloud server. Carter et al. also introduced a primitive called outsourced oblivious transfer, which allows the mobile device to delegate the transfer of garbled keys to the cloud server. Mood et al. mainly considered cloud-assisted protocols that reuse encrypted values in the cloud. They proposed the concept of PartialGC, which allows encrypted values generated during a garbled circuit computation to be reused.

In addition to circuit evaluation, circuit generation can also be outsourced to cloud servers. Carter et al. treat the mobile device as the garbled circuit generator and securely outsource the generation of garbled circuits to an untrusted cloud server.

The work above considers outsourcing protocols in the single-server-aided model. In contrast, Kerschbaum proposed a scheme that outsources the generation of garbled circuits to multiple servers, divided into encryption servers and evaluation servers, which are responsible for encrypting and evaluating garbled circuits respectively. The protocol achieves three types of obliviousness: input-output obliviousness, function obliviousness, and outsourcing obliviousness.

3.3.2 Cloud-assisted MPC based on homomorphic encryption

Homomorphic encryption combines naturally with cloud computing to build cloud-assisted MPC protocols. The basic idea is as follows. To perform a computation, all parties encrypt their data under a fully homomorphic encryption (FHE) scheme and upload the resulting ciphertexts to the cloud. The cloud then computes on these ciphertexts and returns the resulting ciphertext. The data privacy achieved depends on the security of the underlying FHE scheme. However, no single participant should directly hold the decryption key of the FHE scheme. The question therefore becomes how a party can decrypt the result of the computation.

Asharov et al. solve this problem by sharing the key among all parties. They built a cloud-assisted MPC protocol based on a threshold FHE scheme. In this approach, for every computation all participants must jointly generate the system parameters, including private keys, public keys, and evaluation keys. In practice, however, participants prefer to generate their own long-term parameters in advance. López-Alt et al. therefore proposed a dynamic multiparty computation model in which every participant has its own long-term public/private key pair, and the MPC protocol is built on a multi-key FHE scheme.

The practicality of the above schemes depends on the efficiency of the underlying FHE scheme. To improve efficiency, Peter et al. proposed a new practical protocol based only on additively homomorphic encryption. They used a special additively homomorphic scheme called the BCP encryption scheme, in which a master key can decrypt ciphertexts encrypted under any public key. Furthermore, these authors assume the existence of two non-colluding cloud servers: one stores the master key of the underlying encryption scheme and the other stores all user-generated ciphertexts. After all relevant users have uploaded their data, the two servers interact to perform the necessary computation.

The FHE-based cloud-assisted MPC framework is simpler than the garbled circuit-based framework. However, the practicality of implementations following this approach depends on the efficiency of the FHE scheme, and how to construct a practical FHE scheme unfortunately remains an open question. If the goal is not a general-purpose protocol, though, some specific functionalities only require a somewhat homomorphic encryption scheme, and a practical protocol can still be built for them.

Cloud-assisted MPC protocols based on garbled circuits do not require inefficient public-key cryptographic tools. However, such protocols still have two drawbacks. First, a cloud-assisted MPC protocol based on garbled circuits is secure only under the non-collusion assumption, whereas the security achieved with homomorphic encryption holds even when any set of corrupt parties colludes. Second, when executing a protocol based on garbled circuits, at least one user (not the cloud server) must perform a computation whose cost is linear in the circuit size, whereas in protocols based on homomorphic encryption the overhead of any participant other than the cloud server depends only on the length of that party's inputs and outputs, which minimizes its local computation and communication costs.

3.4 Appendix: Secure multi-party computation: interpretation of theory, practice and application

Reference article: Sixty-three, "Secure multi-party computation: theory, practice and application".

4. Homomorphic encryption and secure multi-party computation

4.1 Fully Homomorphic Algorithm (FHE)

  • Partially homomorphic encryption (PHE) can be used to assist MPC: in SPDZ or ABY it can generate the multiplication triples used as auxiliary material without an independent third party.
  • Partially homomorphic encryption can be used to build oblivious transfer, the most basic component of MPC under the malicious model.
  • Partially homomorphic schemes can be used together with MPC to carry out privacy-preserving neural network training.
  • My overall impression is that, when there is no independent third party, PHE is an indispensable foundation for MPC. In addition, when the available bandwidth is large (larger than what MPC with the same functionality needs) and latency requirements are relaxed, using FHE for outsourced computation requires maintaining far less system state than MPC.
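The role of multiplication triples mentioned in the list above can be sketched as follows. This is a minimal two-party illustration of Beaver-style multiplication on additive secret shares, not the SPDZ or ABY implementation. The modulus `Q` and all function names are assumptions, and `make_triple` is a trusted stand-in for the offline phase, which is the step where homomorphic encryption would replace the third party.

```python
import secrets

Q = 2**61 - 1  # prime modulus for the share arithmetic (illustrative choice)


def share(x: int):
    """Split x into two additive shares modulo Q."""
    r = secrets.randbelow(Q)
    return r, (x - r) % Q


def reconstruct(s0: int, s1: int) -> int:
    return (s0 + s1) % Q


def make_triple():
    """Stand-in for the offline phase: shares of random a, b and of c = a*b."""
    a, b = secrets.randbelow(Q), secrets.randbelow(Q)
    return share(a), share(b), share(a * b % Q)


def beaver_multiply(x_sh, y_sh, triple):
    """Return shares of x*y given shares of x, y and one unused triple."""
    a_sh, b_sh, c_sh = triple
    # Both parties open d = x - a and e = y - b; these reveal nothing about
    # x and y because a and b are uniformly random masks.
    d = reconstruct((x_sh[0] - a_sh[0]) % Q, (x_sh[1] - a_sh[1]) % Q)
    e = reconstruct((y_sh[0] - b_sh[0]) % Q, (y_sh[1] - b_sh[1]) % Q)
    # x*y = c + d*b + e*a + d*e; only party 0 adds the public term d*e.
    z0 = (c_sh[0] + d * b_sh[0] + e * a_sh[0] + d * e) % Q
    z1 = (c_sh[1] + d * b_sh[1] + e * a_sh[1]) % Q
    return z0, z1
```

Each triple can be used for one multiplication only, which is why the offline phase has to produce them in bulk.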

4.2 Homomorphic encryption is a lower-level cryptographic primitive and is used in infrastructure construction in many fields.

4.2.1 Blockchain

To protect the privacy of on-chain information while still allowing blockchain nodes to compute on it, the data can be homomorphically encrypted and the computation converted into homomorphic operations, so that nodes can compute on ciphertexts without knowing the plaintext data.

4.2.2 Outsourced computing

In traditional outsourced computing (such as cloud storage and cloud computing solutions), users have to trust the cloud service provider not to steal or leak their data. A cloud computing model based on homomorphic encryption can resolve this tension at its root.

4.2.3 Federated learning

In federated learning, homomorphic encryption is mainly used in the interactive computation of parameters during joint modeling, so that the prediction model can be built jointly. Currently, the homomorphic encryption algorithm most commonly used in federated learning is Paillier, an additively (partially) homomorphic encryption scheme.

Federated learning and blockchain are relatively new application fields, while outsourced computing has long been favored by cloud computing service providers. What the three fields above show is that homomorphic encryption sits at a relatively low level. Being a low-level primitive is exactly what makes it possible to build secure multi-party computation on top of it. For example, somewhat homomorphic encryption (SHE) is used to generate multiplication triples in SPDZ, and FHE is used directly to construct PSI (itself an application of secure multi-party computation).
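As a rough illustration of how an additively homomorphic scheme lets an aggregator sum encrypted parameters, here is a toy Paillier sketch in Python. The two primes are well-known Mersenne primes picked only so that the example runs; they are far too small and too structured for real use. The function names and the encoding of updates as small non-negative integers are assumptions, not part of any federated learning framework.

```python
import math
import secrets


def keygen(p: int = 2**31 - 1, q: int = 2**61 - 1):
    """Toy key generation with generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                                # inverse of lam mod n
    return n, (lam, mu)


def encrypt(n: int, m: int) -> int:
    """Enc(m) = (1 + m*n) * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) * pow(r, n, n2) % n2


def decrypt(n: int, priv, c: int) -> int:
    lam, mu = priv
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n


def add_ciphertexts(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % (n * n)


def aggregate(n: int, ciphertexts) -> int:
    """What an aggregator without the private key can compute."""
    total = encrypt(n, 0)
    for c in ciphertexts:
        total = add_ciphertexts(n, total, c)
    return total
```

In this sketch the aggregator only ever sees ciphertexts, and the key holder decrypts a single value, the sum. Real deployments additionally need fixed-point encoding of model parameters and a decision about who holds the private key.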

4.2.4 Secure multi-party computation and homomorphic encryption

MPC is a family of secure computing protocols composed of multiple sub-protocols. Its goal is distributed secure computation: multiple parties cooperate to compute an objective function without revealing their own private data. Conceptually, MPC and HE are not interchangeable. MPC studies secure computation among multiple parties, while homomorphic encryption studies how to compute on ciphertexts. There is by now a large body of results on multi-key homomorphic encryption, which belong to both homomorphic encryption and secure multi-party computation.

Given the above, we cannot simply say that homomorphic encryption has an advantage that lets it replace MPC. In real application scenarios with multiple centers or nodes, the first thing to consider is data privacy, that is, whether the data is forbidden from leaving each party's own security zone. If so, secure multi-party computation should be considered first. If part of the computation has to be outsourced during the cooperative process, homomorphic encryption has to be considered. For example, in federated learning the computation that follows the parameter exchange can be treated as a typical outsourcing case, and Paillier is used to handle it.

4.3 Homomorphic encryption and secure multi-party computation have their own advantages and disadvantages

Homomorphic encryption requires no interaction (or very little), so its communication overhead is relatively small, but its computational overhead is relatively large. MPC is the opposite: because it requires more rounds of interaction, its communication overhead is relatively large, but its computational overhead is relatively small. In terms of security, homomorphic encryption schemes are proven secure under notions such as CPA and CCA1 security, while the security of MPC is classified by adversary model into honest-but-curious (semi-honest) security and malicious security.

4.4 Disadvantages of secure multi-party computation

  1. Trust issue: consider the "semi-honest" and "malicious" models. Semi-honest security requires the participants to "do no evil", that is, to execute the protocol honestly. This requirement is acceptable for business (to-B) users, because if one party misbehaves and steals data, the worst case is that the business cooperation breaks up. For consumer (to-C) users, however, either the customer accepts the risk of information leakage, or the large vendors that collect user data must adopt safer and more efficient "malicious model" protocols in order to "prove their innocence".
  2. Communication volume: with Yao's garbled circuits, performing one AES encryption requires transmitting more than 1M of garbled table data (not counting the OT traffic), while it may encrypt only 16 bytes of data, which is quite uneconomical.

The above two problems do not exist with homomorphic encryption.

4.5 Protocol security

The essence of homomorphic encryption is to exploit specific (homomorphic) properties of certain operations. These very special properties enable uses beyond plain encryption, typically an ongoing interactive protocol or encryption process involving multiple parties. In some cases such a multi-party protocol can be replaced by other solutions.

Secure multi-party computation focuses on protocol-level design to guarantee the security of the protocol, whereas homomorphic encryption relies on properties of the algorithm itself to guarantee the effectiveness of the encryption method. The two are not strictly parallel or opposing alternatives, but concepts at different levels.

4.6 Advantages of Homomorphic Encryption

Homomorphic encryption is an encryption scheme that allows computation to be carried out on encrypted data. In essence it is still an encryption scheme that protects the confidentiality of information, with the additional ability to perform homomorphic operations. Research on homomorphic encryption took off in 2009, when Gentry's construction of the first fully homomorphic scheme marked a major breakthrough.

Compared with secure multi-party computation, homomorphic encryption is closer to a cryptographic primitive, while secure computation is a higher-level protocol. Building protocols from cryptographic primitives is natural, so homomorphic operations and homomorphic encryption have always been tools in secure computation research.

For specific computing tasks, a specially designed secure computation protocol may perform well. For the computation of general functions, however, homomorphic encryption has a large efficiency advantage over secure computation protocols. Even though fully homomorphic encryption is still not efficient enough for everyday use, for the computation of arbitrary functions, as in cloud computing, its interaction logic is far simpler and its communication bandwidth and communication latency are far lower than those of secure computation.

5. Privacy computing in autonomous driving: companies, scenarios, and content

Intelligent connected vehicles are at the forefront of intelligent transportation and are an important piece of new infrastructure to be vigorously developed under China's "14th Five-Year Plan". They form a technical system that builds on new-generation information and communication technology, big data, and artificial intelligence to achieve all-round network connectivity and intelligent management inside the vehicle, between vehicles and cloud platforms, between vehicles, between vehicles and roads, and between vehicles and people.

5.1 Companies involved

Companies that circulate data on the Internet of Vehicles include chip manufacturers, sensor and other component manufacturers, terminal equipment manufacturers, system integrators, platform operators, data service providers, service providers, etc. Together with users and regulators of the Internet of Vehicles, they form a huge Internet of Vehicles data ecosystem. Although the relevant parties have different roles and interests, they are interrelated and interact with each other to jointly affect the data security of the Internet of Vehicles. In the process of data transfer, privacy protection or information security plays a vital role.

The main work on personal information protection covers the protection mechanisms and related technical requirements for users of the Internet of Vehicles (intelligent connected vehicles). It clarifies the scenarios, rules, and technical methods for protecting users' sensitive data and personal information, including standards for anonymization, de-identification, data desensitization, and abnormal behavior identification.

5.2 Main scenarios

Personal security risk level - (Scenario 1)

Internet of Vehicles data is distinctive in that it contains a large amount of personal information belonging to car owners or passengers, such as vehicle trajectory data, in-vehicle phone data, image data from in-car cameras, voice interaction data from in-vehicle assistants, and in-vehicle payment data. Leakage of this data creates risks to personal privacy.

At the same time, because the Internet of Vehicles supports remote control, intercepted or tampered vehicle control data directly affects driving safety. The information systems of many automobile manufacturers, such as Mercedes-Benz and BMW, have been compromised by hackers. The Jeep Cherokee information security incident in 2015 resulted in the recall of 1.4 million vehicles and became a landmark event that changed the safety design specifications of the automotive industry.

Social security risk level - (Scenario 2)

As the Internet of Vehicles gradually merges with intelligent transportation, its data security issues will extend further into society. For example, linking vehicle data with traffic lights is the most common application of the Internet of Vehicles in intelligent transportation. If a vehicle maliciously sends false information to the intelligent transportation system, it can cause large-scale traffic jams. If such attack points are widely distributed and concentrated in time, the entire urban transportation system may be paralyzed, creating a major social security risk.

National security risk level - (Scenario 3)

There are at least three types of data in the Internet of Vehicles that are potentially relevant to national security:

The first category is latitude and longitude data. Intelligent connected vehicle equipment collects latitude and longitude data while the vehicle is driving. Once enough of this data is aggregated, it can be used for surveying and mapping, and a leak would pose a potential threat to national security.

The second category is image data from vehicle cameras. Intelligent connected vehicles are increasingly equipped with 360-degree cameras, and the data they collect makes security and confidentiality work around restricted areas and organizations more challenging.

The third category is vehicle remote control data. If this data is exploited by criminals, remotely controlled intelligent connected vehicles may become a tool for committing crimes.

Ensuring the data security of the Internet of Vehicles is needed not only to protect personal privacy but also to maintain social and national security. Because Internet of Vehicles systems are inherently diverse in applications and technologies and complex in their processes, securing their data is not a single-link, single-role, purely technical task. It requires understanding the ecosystem and making joint efforts across technical support, institutional construction, and other aspects.

5.3 Main content

On February 10, 2020, eleven departments, namely the National Development and Reform Commission, the Cyberspace Administration of China, the Ministry of Science and Technology, the Ministry of Industry and Information Technology, the Ministry of Public Security, the Ministry of Finance, the Ministry of Natural Resources, the Ministry of Housing and Urban-Rural Development, the Ministry of Transport, the Ministry of Commerce, and the State Administration for Market Regulation, jointly released the "Smart Vehicle Innovation and Development Strategy", proposing that "by 2025, the technological innovation, industrial ecosystem, infrastructure, regulations and standards, product supervision, and network security systems for China-standard smart vehicles will be basically formed. Smart vehicles capable of conditional autonomous driving will reach large-scale production, and highly autonomous smart vehicles will be brought to market for use in specific environments." The construction of the Internet of Vehicles is under way, and there are currently many gaps in the relevant standards and systems that remain to be filled.

For standards, please refer to the "Internet of Vehicles (Intelligent Connected Vehicles) Network Security Standards".

6. Technical specifications for financial applications of multi-party secure computing: practical applications of secure multi-party computation

For specific details, please read the financial industry standard "Technical Specifications for Multi-Party Secure Computing Financial Applications".

6.1 General requirements

The overall requirements for MPC financial applications include three parts: basic requirements, security requirements and performance requirements.

  • Basic requirements: requirements for data input, algorithm input, collaborative computation, result output, and scheduling management, aimed respectively at data providers, algorithm providers, computing parties, result users, and schedulers.
  • Security requirements: protocol security, private data security, authentication and authorization, cryptographic security, communication security, evidence storage and logging, etc.
  • Performance requirements: performance indicators such as computation latency, throughput, and computation accuracy are specified for MPC financial applications.

6.2 Security parameters

Security parameters: A set of parameters used to measure the security strength or cracking difficulty of a multi-party secure computing protocol.

MPC security parameters mainly include dishonesty thresholds, statistical security parameters, and computational security parameters.

  • The dishonesty threshold is the maximum number of dishonest participants allowed to collude in the multi-party secure computing protocol. When this number is less than half of the number of participants, the protocol is said to have an honest majority; otherwise, it is said to have a dishonest majority;
  • The statistical security parameter is an integer l: the probability distribution of the computation factors generated from the input data is statistically indistinguishable from the distribution of computation factors randomly simulated without knowledge of the input data (the statistical distance is at most 2^{-l});
  • The computational security parameter is an integer k: the computational complexity for a polynomial-time attacker to break the multi-party secure computing protocol is O(2^k).

Computation factor: Data generated based on multi-party secure calculation of input data.

Computation factors include input factors, output factors, and intermediate factors. Input factors are the data that the computing party can use for subsequent computation after the data provider has performed the data input process. Output factors are the data returned to the result user after the computing party has performed the computation, from which the final result is recovered. Intermediate factors are the data generated by the computing party during intermediate computation.

6.3 The performance of secure multi-party computing has met the needs of practical applications

Although secure multi-party computation can in theory solve any computing problem with a generic framework, directly converting an arbitrary computation into circuit-level secure computation is inefficient and cannot meet the needs of practical applications. Over the past few decades, researchers have therefore done a great deal of work to bring secure multi-party computation into practical use.

In general, in some specific calculations or applications, the performance of secure multi-party computation is currently sufficient to meet the needs of practical applications.

For example, secure multi-party computing protocols designed by researchers allow 10 data owners, each holding 4 million data points, to complete a linear regression computation within 5 seconds in a local area network environment, and allow 4 data owners to train a logistic regression model on the MNIST data set (represented as a matrix of 1000 rows and 784 columns) within 5 days in a wide area network environment.

Optimization and practice of garbled circuits

Over the past few decades, researchers have done a great deal of work to optimize garbled circuit protocols and bring them into practical use. Many optimization techniques have been proposed, such as the free-XOR technique, which eliminates the cost of XOR gates, and row reduction, which reduces the number of ciphertexts in each encrypted truth table from 4 to 3 or 2. Building on these optimizations, researchers have also developed early high-level software frameworks for garbled circuit-based secure multi-party computation, such as ObliVM written in Java and Obliv-C written in C. With the garbled circuits implemented by these frameworks, secure multi-party computation has achieved quite good performance in some applications. For example, one such framework allows two parties to compute the Hamming distance between two vectors of dimension 160 in 507 milliseconds, multiply two 64-bit integers in 6.29 milliseconds, and compute a Count-Min Sketch in 20.77 seconds. In addition, some scholars have designed and implemented highly compressed garbled circuit constructions based on hardware description languages. Building garbled circuits directly from the hardware level greatly reduces their size, which in turn greatly reduces the communication overhead and overall running time of secure multi-party computation protocols that use them. For example, researchers found that building garbled circuits from hardware description languages can reduce the number of circuit gates required for 1024-bit integer multiplication by 67%.
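The free-XOR technique mentioned above can be sketched in a few lines of Python. The idea is that the generator picks one secret global offset and makes the two labels of every wire differ by exactly that offset, so an XOR gate needs no encrypted table at all. The label length and function names below are illustrative assumptions.

```python
import secrets

LABEL_LEN = 16  # bytes per wire label (illustrative choice)


def xor(x: bytes, y: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(x, y))


def new_wire(delta: bytes):
    """Return (label for 0, label for 1); the two labels differ by the global offset."""
    zero = secrets.token_bytes(LABEL_LEN)
    return zero, xor(zero, delta)


def garble_xor(wire_a, wire_b, delta: bytes):
    """Generator side: output labels are derived, and no table is produced."""
    out_zero = xor(wire_a[0], wire_b[0])
    return out_zero, xor(out_zero, delta)


def evaluate_xor(label_a: bytes, label_b: bytes) -> bytes:
    """Evaluator side: XOR the two input labels it holds, no decryption needed."""
    return xor(label_a, label_b)
```

If the evaluator holds the labels for bits x and y, XORing them cancels or keeps the offset exactly as x XOR y requires, so it ends up with the correct output label without any communication for that gate.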

Protocol design and practice based on hybrid technologies

Secure multi-party computation based on garbled circuits has advantages in logical operations (such as comparisons), while secure multi-party computation based on secret sharing has advantages in arithmetic operations (addition and multiplication).

Based on this observation, researchers in recent years have proposed hybrid secure multi-party computing protocols that integrate garbled circuits and secret sharing. The basic idea is to split the computation of a function: additions and multiplications are implemented securely with secret sharing, and the remaining operations are implemented securely with garbled circuits. The input/output formats of the secret sharing and garbled circuit parts must also be made compatible. With targeted format conversion methods, the advantages of both techniques can be fully exploited to build an efficient hybrid protocol. Some scholars have designed MiniONN, a hybrid secure multi-party computing protocol for neural networks based on secret sharing and garbled circuits, which can evaluate a neural network with two hidden layers (128 neurons per layer) in 0.2 seconds. In addition, since additively homomorphic encryption also supports addition and multiplication by constants efficiently, some scholars have combined additively homomorphic encryption with garbled circuits to design efficient hybrid protocols. For example, in 2018, some scholars implemented a spam filter based on a Naive Bayes classifier that can classify and filter an email in 300 microseconds.

Optimization and practice for nonlinear function computation

Some applications involve nonlinear functions such as logarithms and exponentials. Secure computation of these nonlinear functions has not yet been solved efficiently and remains a challenge. In recent years, some scholars have proposed approximating nonlinear functions with piecewise low-order polynomials. In this way, the secure computation of a nonlinear function is transformed into the secure computation of low-order polynomials. Since low-order polynomials can be securely evaluated with efficient additions and multiplications, the nonlinear function can be securely computed efficiently as well. For example, in 2018, some scholars approximated the Tanh function with polynomials over 14 segments, or with second-order polynomials over 6 segments, to achieve secure multi-party computation for neural networks.
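The following plaintext Python sketch shows what a 6-segment second-order approximation of Tanh can look like. The segment boundaries (unit-width pieces on [-3, 3], clamped to ±1 outside) and the interpolation through three points per segment are assumptions chosen for illustration; the source does not give the fitting method or breakpoints used in the cited work.

```python
import math


def fit_quadratic(f, lo: float, hi: float):
    """Newton-form quadratic through f at lo, the midpoint, and hi."""
    mid = (lo + hi) / 2
    y0, y1, y2 = f(lo), f(mid), f(hi)
    d1 = (y1 - y0) / (mid - lo)           # first divided difference
    d12 = (y2 - y1) / (hi - mid)
    d2 = (d12 - d1) / (hi - lo)           # second divided difference
    return y0, d1, d2, lo, mid


# Six unit-width segments covering [-3, 3] (assumed breakpoints).
PIECES = [(-3 + i, -2 + i, fit_quadratic(math.tanh, -3 + i, -2 + i)) for i in range(6)]


def approx_tanh(x: float) -> float:
    """Piecewise quadratic approximation; saturates outside [-3, 3]."""
    if x <= -3.0:
        return -1.0
    if x >= 3.0:
        return 1.0
    for lo, hi, (y0, d1, d2, a, m) in PIECES:
        if lo <= x <= hi:
            # Only additions and multiplications by public constants plus one
            # multiplication of x-dependent terms: cheap on secret shares.
            return y0 + (x - a) * (d1 + (x - m) * d2)
    raise AssertionError("unreachable: segments cover [-3, 3]")


def max_error(samples: int = 6001) -> float:
    xs = [-6 + 12 * i / (samples - 1) for i in range(samples)]
    return max(abs(approx_tanh(x) - math.tanh(x)) for x in xs)
```

With these assumed segments the maximum error on [-6, 6] stays below 0.02. In an actual MPC protocol, choosing the segment requires a secure comparison on the shared input, which is the kind of logical operation that garbled circuits handle well, while the polynomial itself is evaluated on secret shares.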

Large-scale secure multi-party computation based on cloud server assistance

Secure multi-party computation requires the parties to interact with each other, so scalability is a major challenge. In applications involving a large number of users, such as crowd sensing, recommendation systems, and machine learning model training, the number of participants may reach hundreds, thousands, or even tens of thousands. Running secure multi-party computation directly among all of them is unrealistic. For this reason, researchers have in recent years proposed outsourced secure multi-party computing assisted by cloud servers, which turns a large-scale secure multi-party computing problem into one between a small number of cloud servers. These works usually assume two cloud servers. Users send their data in protected form (encrypted with homomorphic encryption or split with secret sharing) to the cloud servers, and the two servers then run a secure protocol between themselves to obtain the output. For example, in 2017, some scholars designed and implemented a privacy-preserving linear regression training system based on this framework, which could complete model training on data from millions of users in 233 seconds. In 2018, some scholars designed and implemented a privacy-preserving truth discovery system for crowd sensing, which can infer true values from unreliable sensing data about more than 100 entities reported by more than 200 users in 36 seconds.

Outlook

After decades of development, secure multi-party computation has made encouraging progress in practical applications, and in some scenarios it already achieves performance that meets real needs. With the rapid development of cloud computing, artificial intelligence, and the Internet of Things, and with growing public awareness of privacy protection, secure multi-party computing will find broader application in the future. Currently, secure multi-party computation performs well under the semi-honest adversary model, but achieving security under the malicious adversary model comes at a large performance cost. Efficiency under the malicious adversary model is therefore a major challenge. Recently, some scholars proposed secure multi-party computation under the publicly verifiable covert model. The main idea is that every action of each participant is automatically accompanied by a signature-like mechanism that lets the other participants keep evidence. If some participant misbehaves, the others can detect the malicious behavior with a certain probability and publish the behavior together with the signature, causing the cheater to suffer a loss of reputation. Because reputation matters, setting this probability at around 50% is enough to deter rational parties from cheating. Secure multi-party computation under the publicly verifiable covert model can achieve performance close to that under the semi-honest model.
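The two-server framework described above can be sketched for the simplest possible task, summing user values, using additive secret sharing. The `Server` class, the modulus `Q`, and the function names are assumptions for illustration; the cited systems compute far richer functions (regression, truth discovery) and rely on the two servers not colluding.

```python
import secrets

Q = 2**61 - 1  # modulus for the share arithmetic (illustrative choice)


def share(value: int):
    """Split a user's value into one share per server."""
    r = secrets.randbelow(Q)
    return r, (value - r) % Q


class Server:
    """Each server only ever sees uniformly random-looking shares."""

    def __init__(self) -> None:
        self.total = 0

    def receive(self, s: int) -> None:
        # Adding shares locally yields a share of the sum of all inputs.
        self.total = (self.total + s) % Q


def aggregate(user_values):
    server0, server1 = Server(), Server()
    for v in user_values:
        s0, s1 = share(v)
        server0.receive(s0)   # each user talks to the servers once, then goes offline
        server1.receive(s1)
    # Only the combination of both servers' totals reveals the aggregate.
    return (server0.total + server1.total) % Q
```

Each user's cost is independent of the number of other users, which is the scalability benefit of this model: all interaction-heavy work happens between the two servers.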

Origin blog.csdn.net/ab6326795/article/details/134737446