Two-Phase Commit for Distributed Transactions

A review of CAP in distributed systems

  • Consistency: a distributed system often keeps multiple copies of the same data. Consistency describes whether the content and organization of the data are the same across all of these copies
  • Availability: describes the service the system provides to its users. A system is available if it returns the result the user expects within a time the user can tolerate
  • Partition Tolerance: a distributed system usually consists of multiple nodes. Because the network is unreliable, communication failures can split the cluster into isolated groups of nodes; this is called a network partition. Partition tolerance requires that the system keep providing service even while a network partition exists. CAP states that during a partition, the system cannot guarantee consistency and availability at the same time

Two-Phase Commit (2PC)

Basic principle

When a database is sharded, a transaction that touches several shards must commit on all of them or on none of them. This requires a third party to manage the transaction, called the Transaction Manager (TM). The two or more participating databases are called Resource Managers (RMs).

  • Phase 1 (prepare): the TM tells every RM to pre-commit. Each RM executes its SQL and writes its undo/redo logs but does not commit, then reports the result back to the TM

  • Phase 2 (commit): if every RM reports success, the TM tells each RM to commit the transaction and release its resources, and each RM reports the commit result back to the TM. If any RM fails, the TM tells all RMs to roll back
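The two phases above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a real TM: the classes, the `will_prepare` flag and the shard names `order_db` / `stock_db` are all hypothetical, and there is no networking, logging or locking. It only shows the decision rule: commit everywhere if every RM succeeds in phase 1, otherwise roll back everywhere.

```python
from dataclasses import dataclass


@dataclass
class ResourceManager:
    """Hypothetical in-memory RM; `will_prepare` simulates whether its SQL succeeds."""
    name: str
    will_prepare: bool = True
    state: str = "init"  # init -> prepared/aborted -> committed/rolled_back

    def prepare(self) -> bool:
        # Phase 1: run the SQL and write undo/redo logs, but do not commit yet.
        self.state = "prepared" if self.will_prepare else "aborted"
        return self.will_prepare

    def commit(self) -> None:
        # Phase 2: make the prepared changes permanent and release resources.
        self.state = "committed"

    def rollback(self) -> None:
        # Phase 2: undo the prepared changes and release resources.
        self.state = "rolled_back"


def two_phase_commit(rms: list[ResourceManager]) -> str:
    """Hypothetical TM: commit only if every RM succeeds in phase 1."""
    # Phase 1: ask every RM to pre-commit and collect the results.
    results = [rm.prepare() for rm in rms]
    # Phase 2: commit everywhere if all succeeded, otherwise roll back everywhere.
    if all(results):
        for rm in rms:
            rm.commit()
        return "committed"
    for rm in rms:
        rm.rollback()
    return "rolled_back"


if __name__ == "__main__":
    shards = [ResourceManager("order_db"), ResourceManager("stock_db", will_prepare=False)]
    print(two_phase_commit(shards))  # rolled_back
```

Because no RM commits until all of them have prepared, one failed RM is enough to roll back the whole transaction; the price is that the RMs which did prepare keep their resources until phase 2 arrives.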

Disadvantages

Single point of failure: every distributed transaction depends on the TM. If the TM goes down, none of the distributed transactions that rely on it can commit.

Synchronous blocking: each RM keeps its resources locked from phase 1 until it receives the phase-2 decision, so other transactions that need those resources must wait.

Data inconsistency: suppose that in phase 2 a network failure prevents some RM from receiving the commit/rollback notification. That RM keeps holding its resources, while the RMs that did receive a commit have already committed, so the shards no longer agree.
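The following sketch illustrates this failure. It is a hypothetical simulation, not real networking: two RMs named `order_db` and `stock_db` have both prepared successfully, and the phase-2 commit message to `stock_db` is simply dropped.

```python
from dataclasses import dataclass


@dataclass
class Shard:
    """Hypothetical RM that has already prepared one write in phase 1."""
    name: str
    state: str = "prepared"  # both shards voted yes in phase 1
    locked: bool = True      # resources stay locked from phase 1 onward

    def commit(self) -> None:
        self.state = "committed"
        self.locked = False


def send_commit(shards: list[Shard], unreachable: set[str]) -> None:
    """Phase-2 broadcast; messages to names in `unreachable` are silently lost."""
    for shard in shards:
        if shard.name in unreachable:
            continue  # simulated network failure: this RM never hears the decision
        shard.commit()


shards = [Shard("order_db"), Shard("stock_db")]
send_commit(shards, unreachable={"stock_db"})
for s in shards:
    print(s.name, s.state, "locked" if s.locked else "unlocked")
# order_db committed unlocked
# stock_db prepared locked
```

`order_db` has committed and released its resources, while `stock_db` is still only prepared and keeps its locks, waiting for a decision that never arrives.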

Optimizations

  • Timeout mechanism
    • TM: if it does not receive responses from all RMs within the specified time, it stops waiting and sends a rollback message to every RM
    • RM: if the phase-2 message from the TM does not arrive within the specified time, the RM cannot safely roll back on its own: the TM may already have told the other RMs to commit, and rolling back would then leave the data inconsistent
  • Ask the RMs in advance whether they can proceed, which amounts to three-phase commit
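The TM side of the timeout rule can be sketched with Python's standard `concurrent.futures` module. The `slow_vote` function, which stands in for an RM's phase-1 reply, and the delay values are hypothetical. Only the TM is modeled: as noted above, an RM that times out in phase 2 cannot safely decide on its own.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def slow_vote(delay: float) -> bool:
    """Hypothetical RM phase-1 call that replies 'success' after `delay` seconds."""
    time.sleep(delay)
    return True


def collect_votes(delays: list[float], timeout: float) -> str:
    """TM side: roll back unless every RM answers successfully within `timeout`."""
    pool = ThreadPoolExecutor(max_workers=len(delays))
    futures = [pool.submit(slow_vote, d) for d in delays]
    done, not_done = wait(futures, timeout=timeout)
    pool.shutdown(wait=False)  # stop waiting; late replies are ignored
    if not_done or not all(f.result() for f in done):
        return "rollback"      # a missing or failed reply aborts the transaction
    return "commit"


if __name__ == "__main__":
    print(collect_votes([0.01, 0.02], timeout=1.0))  # commit
    print(collect_votes([0.01, 0.5], timeout=0.1))   # rollback
```

Treating a missing reply as a failure is safe for the TM, because in phase 1 nothing has been committed yet.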

Three-Phase Commit (3PC)

3PC essentially adds one round before two-phase commit: the TM first sends an inquiry to every RM to check whether each one responds normally. If any RM does not, the remaining phases are skipped. Combined with the timeout mechanism, this keeps the problems caused by the shortcomings of 2PC to a minimum.
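Below is a minimal sketch of this extra inquiry round on top of the two-phase flow. In the 3PC literature the three rounds are usually called CanCommit, PreCommit and DoCommit; the `Participant` class and its `reachable` / `can_prepare` flags are hypothetical, and the timeouts that a real 3PC implementation relies on are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical RM; `reachable` and `can_prepare` simulate its condition."""
    name: str
    reachable: bool = True
    can_prepare: bool = True
    state: str = "init"

    def can_commit(self) -> bool:
        # Round 1 (inquiry): cheap check, no SQL executed and no resources locked.
        return self.reachable

    def prepare(self) -> bool:
        # Round 2: same as 2PC phase 1 (run SQL, write undo/redo, hold resources).
        self.state = "prepared" if self.can_prepare else "aborted"
        return self.can_prepare

    def finish(self, commit: bool) -> None:
        # Round 3: same as 2PC phase 2.
        self.state = "committed" if commit else "rolled_back"


def three_phase_commit(ps: list[Participant]) -> str:
    # Round 1: if any RM does not answer normally, skip the rest; nothing is locked yet.
    if not all(p.can_commit() for p in ps):
        return "skipped"
    # Rounds 2 and 3 follow the two-phase flow.
    ok = all([p.prepare() for p in ps])
    for p in ps:
        p.finish(ok)
    return "committed" if ok else "rolled_back"


if __name__ == "__main__":
    group = [Participant("order_db"), Participant("stock_db", reachable=False)]
    print(three_phase_commit(group))  # skipped
```

The inquiry only catches RMs that are already unreachable before any resources are locked; an RM can still fail between the inquiry and the later rounds.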

However, a successful inquiry does not guarantee that nothing will go wrong afterwards: a node or the network can still fail in the later phases, so the problems still exist.

In practice, the two-phase model is used far more often, mainly for efficiency reasons, since 3PC adds an extra round trip. We also usually rely on ready-made solutions such as Seata, tx-lcn, or TCC rather than implementing the two-phase or three-phase model ourselves. Message queues such as RocketMQ can also be used to achieve this.


Origin blog.csdn.net/qq_38238041/article/details/111318307