Chapter 4 Core Issues of Distributed Systems

1. The Issue of Consistency
1. Definition and Importance
Definition: given a series of operations on multiple service nodes in a distributed system, and under the protection of an agreed protocol, make the nodes reach "some degree" of agreement on the processing results.
Note: consistency is not about whether the result is correct, but about whether the state the system presents to the outside world is uniform; for example, all nodes being in a failed state is also a form of consistency.
2. Issues and Challenges
Network communication between nodes is unreliable: messages may be delayed, arrive out of order, or be corrupted. Node processing times cannot be guaranteed, results may be erroneous, and nodes themselves may crash. Synchronous calls simplify the design, but severely reduce the scalability of a distributed system.
3. Consistency Requirements
Termination: a consistent result is reached within finite time
Agreement: the final decision of different nodes is the same
Validity: the decided result must be a proposal made by some node
4. Constrained Consistency
Sequential consistency
Linearizability (linear consistency)
 
2. Consensus Algorithms
Consistency usually refers to the data state that multiple replicas in a distributed system present externally. As mentioned above, sequential consistency and linearizability describe the ability of multiple nodes to maintain a common data state. Consensus, by contrast, describes the process by which multiple nodes in a distributed system reach agreement on some state. Consistency describes the resulting state; consensus is the means of reaching it.
 
1. Issues and Challenges
For example, the communication network may be interrupted, nodes may fail, and malicious nodes may even deliberately forge messages to disrupt the system's normal operation.
Non-Byzantine errors: also called "crash failures", i.e. a node fails (stops responding) but does not forge information
Byzantine errors: a node responds maliciously with forged information
2. Common Algorithms
Depending on whether they tolerate only non-Byzantine (crash) failures or also Byzantine failures, consensus algorithms are divided into Crash Fault Tolerant (CFT) algorithms and Byzantine Fault Tolerant (BFT) algorithms.
3. Theoretical Limits
Even with reliable network communication, the consensus problem for a scalable (asynchronous) distributed system has, in general, no guaranteed solution: the theoretical lower bound for a general solution is that no such solution exists.
 
3. FLP Impossibility Principle
1. Definition
FLP impossibility principle: in a minimal asynchronous model where the network is reliable but node failures are allowed (even just one), there is no deterministic consensus algorithm that solves the consistency problem ("No completely asynchronous consensus protocol can tolerate even a single unannounced process death").
2. Correct Understanding
The FLP principle states that a purely asynchronous system that allows node failures cannot guarantee that consensus completes within a finite amount of time.
 
4. CAP Principle
1. Definition
CAP principle: a distributed computing system cannot simultaneously guarantee all three of the following properties: Consistency, Availability, and Partition tolerance; a design must therefore weaken the guarantee of one of them.
2. Application Scenarios
Since the three CAP properties cannot all be guaranteed at once, system designs must weaken support for one of them, leading to three classes of application scenarios:
Weakening consistency
Weakening availability
Weakening partition tolerance
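The trade-off during a network partition can be sketched with a toy two-replica store; the class names and partition flag are illustrative only, assuming Python:

```python
# Sketch: how a two-replica store can react to a network partition,
# trading consistency against availability (all names illustrative).

class Replica:
    def __init__(self):
        self.data = {}

class CPStore:
    """Prefers consistency: refuses writes when replicas cannot sync."""
    def __init__(self):
        self.a, self.b = Replica(), Replica()
        self.partitioned = False

    def write(self, key, value):
        if self.partitioned:
            # Not available during a partition, but never inconsistent.
            raise RuntimeError("unavailable: cannot replicate during partition")
        self.a.data[key] = value
        self.b.data[key] = value

class APStore:
    """Prefers availability: accepts writes locally, replicas may diverge."""
    def __init__(self):
        self.a, self.b = Replica(), Replica()
        self.partitioned = False

    def write(self, key, value):
        self.a.data[key] = value          # always accept on the local replica
        if not self.partitioned:
            self.b.data[key] = value      # replicate only when reachable
```

During a partition the CP store rejects writes (weakening availability), while the AP store accepts them and lets the replicas diverge (weakening consistency).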
 
5. ACID Principle
The ACID principle describes the consistency requirements a distributed database must satisfy, at the cost of availability.
The ACID properties are as follows:
Atomicity: each operation is atomic; it either succeeds completely or is not executed at all;
Consistency: the database moves between consistent states, with no intermediate state visible;
Isolation: concurrent operations do not affect one another;
Durability: state changes are persistent and will not be lost.
A principle at the opposite end from ACID is BASE (Basically Available, Soft state, Eventual consistency), which relaxes the consistency constraint (settling for eventual consistency) in exchange for availability.
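Atomicity and consistency can be illustrated with a toy in-memory "database"; the account names, the overdraft rule, and the snapshot-based rollback are all illustrative, assuming Python:

```python
# Sketch: atomicity for a toy in-memory store. A transfer either
# commits fully or rolls back; no intermediate state survives.

class Accounts:
    def __init__(self, balances):
        self.balances = dict(balances)

    def transfer(self, src, dst, amount):
        snapshot = dict(self.balances)    # kept for rollback (atomicity)
        try:
            self.balances[src] -= amount
            if self.balances[src] < 0:    # consistency rule: no overdraft
                raise ValueError("insufficient funds")
            self.balances[dst] += amount
        except Exception:
            self.balances = snapshot      # restore the pre-transaction state
            raise
```

A failed transfer raises and leaves the balances exactly as they were before the transaction began.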
 
6. Paxos Algorithm and Raft Algorithm
1. Paxos Algorithm
Single proposer + multiple acceptors: only one node makes proposals and the other nodes vote; the proposal either passes or fails.
Multiple proposers + single acceptor: one node is designated as the acceptor. Consensus is again easy to reach: the acceptor receives multiple proposals, selects the first one as the resolution, and announces it to the proposers.
Multiple proposers + multiple acceptors: the general case, solved with a two-phase protocol:
Two-phase commit: a prepare phase and a commit (accept) phase
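The two phases above can be sketched for single-decree Paxos from the proposer's side; the class and function names are illustrative, and a real implementation needs durable state, retries, and globally unique proposal numbers:

```python
# Sketch of single-decree Paxos. Phase 1 (prepare) gathers promises
# from a majority; Phase 2 (accept) asks the majority to accept a value.

class Acceptor:
    def __init__(self):
        self.promised = -1        # highest proposal number promised
        self.accepted = None      # (number, value) last accepted, if any

    def prepare(self, n):
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, None

    def accept(self, n, value):
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False

def propose(acceptors, n, value):
    # Phase 1: gather promises from a majority of acceptors.
    promises = [a.prepare(n) for a in acceptors]
    granted = [acc for ok, acc in promises if ok]
    if len(granted) <= len(acceptors) // 2:
        return None               # no majority promised; proposal fails
    # If any acceptor already accepted a value, that value must be kept.
    prior = [acc for acc in granted if acc is not None]
    if prior:
        value = max(prior)[1]     # value with the highest proposal number
    # Phase 2: ask the majority to accept the (possibly inherited) value.
    votes = sum(a.accept(n, value) for a in acceptors)
    return value if votes > len(acceptors) // 2 else None
```

Note how a later proposer with a higher number still ends up re-proposing the already-chosen value, which is what makes the decision stable.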
2. Raft Algorithm
The Raft algorithm defines three roles: Leader, Candidate, and Follower. Before any decisions are made, a global leader is elected, which simplifies subsequent decision-making. The Leader role is critical: it alone determines when log entries are committed, and logs replicate in only one direction, from the Leader to the Followers.
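The one-way log replication and majority commit can be sketched as follows; elections, terms, and retransmission are deliberately omitted, and the class names are illustrative:

```python
# Sketch of Raft-style log replication: only the Leader appends,
# Followers copy its entries, and an entry commits once a majority
# of the cluster (leader included) holds it.

class Node:
    def __init__(self):
        self.log = []

class Follower(Node):
    def __init__(self, reachable=True):
        super().__init__()
        self.reachable = reachable

    def append_entries(self, entry):
        # A partitioned follower cannot acknowledge replication.
        if self.reachable:
            self.log.append(entry)
            return True
        return False

class Leader(Node):
    def __init__(self, followers):
        super().__init__()
        self.followers = followers
        self.commit_index = -1          # index of the last committed entry

    def append(self, entry):
        self.log.append(entry)
        acks = 1                        # the leader's own copy counts
        acks += sum(f.append_entries(entry) for f in self.followers)
        if acks > (len(self.followers) + 1) // 2:
            self.commit_index = len(self.log) - 1
        return self.commit_index
```

In a three-node cluster an entry commits even with one unreachable follower, since two of three replicas hold it.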
 
7. Byzantine Problem and Algorithms
The difficulty of the Byzantine problem is that multiple proposals may exist in the system at any time (because making a proposal is very cheap), and completing the final confirmation of consistency is very hard and easily disrupted.
The design of Bitcoin's blockchain network introduced an innovative probabilistic algorithm based on PoW (Proof of Work), improving both of these aspects.
First, it limits the number of proposals that appear across the whole network in a given period (by raising the cost of making a proposal); second, it relaxes the requirement for final confirmation of consistency, with everyone agreeing to confirm and extend along the longest known chain. Final confirmation in the system thus exists only in a probabilistic sense. As a result, even a deliberate attempt to do damage carries a corresponding economic cost (controlling more than half of the system's total computing power).
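The cost-raising mechanism can be sketched as a hash puzzle: a proposal is only valid with a nonce whose hash falls below a target. The difficulty value below is illustrative only (Bitcoin adjusts it dynamically):

```python
# Sketch of the PoW idea: finding a valid nonce requires brute-force
# hashing, so producing proposals (blocks) costs real computation,
# while verifying a proof takes a single hash.

import hashlib

def proof_of_work(data: bytes, difficulty_bits: int = 16) -> int:
    target = 2 ** (256 - difficulty_bits)   # smaller target = more work
    nonce = 0
    while True:
        digest = hashlib.sha256(data + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce                    # found a valid proof
        nonce += 1
```

Each extra difficulty bit doubles the expected number of hash attempts, which is how the network limits the rate of proposals.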
 
8. Reliability Metrics
1. The "number of nines" metric: availability is commonly expressed as a number of nines; for example, 99.9% ("three nines") allows about 8.8 hours of downtime per year, while 99.999% ("five nines") allows only about 5 minutes.
2. Two core times: Mean Time Between Failures (MTBF) and Mean Time to Repair (MTTR)
3. Improving Reliability
The reliability achievable with a single point is ultimately limited. To improve further, the single point must be eliminated: multiple nodes collectively take over the work of the original single point through master-slave, multi-active, and similar modes. This improves the overall reliability of the service in a probabilistic sense, and is an important use of distributed systems.
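The relationship between the two core times and the gain from eliminating the single point can be sketched with the standard formulas (assuming independent replica failures, which real deployments only approximate):

```python
# Sketch: availability from MTBF/MTTR, and the effect of redundancy.
# With n independent replicas, the service is down only when all
# replicas are down at the same time.

def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the service is up: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

def redundant_availability(single: float, n: int) -> float:
    """Availability of n independent replicas: 1 - (1 - a)^n."""
    return 1 - (1 - single) ** n
```

For example, a single node with MTBF of 1000 hours and MTTR of 1 hour is about 99.9% available; three such replicas together reach far more nines, which is the probabilistic improvement described above.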
 
 
