Distributed Transaction Solutions: Frequently Asked Interview Questions

1 CAP theorem

1.1 Concept

CAP Theory in Distributed Systems

  • Consistency: whether the data on multiple nodes in a distributed environment is strongly consistent

  • Availability: the distributed service is always available. When a user sends a request, the service returns a result within a bounded time

  • Partition tolerance: specifically, tolerance of network partitions

For a shared-data system, at most two of the three CAP properties can be satisfied at the same time; there is no way to have all three.

  • Every combination of two properties has its applicable scenarios

  • The real system should be a mixture of ACID and BASE

  • Different types of business can and should be treated differently

  • Among the three, partition tolerance is indispensable.

Conclusion: In a distributed system, the most important thing is to meet business needs, rather than pursuing abstract and absolute system characteristics.

1.2 Middleware examples

  • AP preferred by default, weakening C
    Cassandra, Dynamo, etc.

  • CP preferred by default, weakening A
    HBase, MongoDB, etc.

2 BASE theory

Main idea

Basically Available

When a distributed system fails, it is allowed to lose part of its availability in order to keep the core available.

Soft State

A distributed system is allowed to have intermediate states, as long as they do not affect the overall availability of the system.

Eventual Consistency

After a certain period of time, all replicas of the data in the distributed system eventually reach a consistent state.

Consistency Model

Consistency models for data can be divided into the following three categories:

  • Strong consistency
    After data is successfully updated, all replicas hold the same data at any moment. This is generally implemented synchronously

  • Weak consistency
    After data is successfully updated, the system does not promise that the latest written value can be read immediately, nor does it promise how long it will take before it can be read

  • Eventual consistency
    A form of weak consistency. After data is successfully updated, the system does not promise to return the latest written value immediately, but it guarantees that it will eventually return the value of the last update

The strong, weak, and eventual consistency of data in a distributed system can be analyzed with the Quorum NRW algorithm.
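As a rough illustration of the NRW rule: with N replicas, a write that waits for W acknowledgements, and a read that consults R replicas, the commonly cited condition W + R > N means every read set overlaps every write set, so a read always reaches at least one replica holding the latest acknowledged write. The function name below is hypothetical, and the sketch only classifies a configuration; it does not implement replication.

```python
def classify_quorum(n: int, w: int, r: int) -> str:
    """Classify a replica configuration with the NRW rule of thumb."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("W and R must be between 1 and N")
    # Read and write sets are forced to overlap in at least one replica,
    # so a read always contacts a replica with the latest acknowledged write.
    if w + r > n:
        return "strong"
    # Otherwise a read may miss the newest write until replicas converge.
    return "weak/eventual"
```

For example, N=3, W=2, R=2 is classified as strong, while N=3, W=1, R=1 favors latency and availability at the cost of possibly stale reads.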

Whenever distributed systems come up, distributed transactions are sure to follow. Knowing nothing about distributed transactions will put you in a difficult spot. At the very least you need to know which solutions are available, roughly how each one works, and what the advantages and disadvantages of each are.

Distributed systems are now standard interview material, and so are the distributed transactions that come with them.
Any system needs transactions, and once the system is distributed, those transactions are necessarily distributed transactions.
Whether or not you have implemented one yourself, you should at least understand which schemes exist and what pitfalls each one may have, for example the network problems of the TCC scheme and the consistency problems of the XA scheme.

  • Transactions in monolithic systems

  • Transactions in Distributed Systems

Several Solutions for Distributed Transactions

  • Asynchronous data reconciliation
    Alipay and WeChat Pay proactively query payment status and reconcile bill statements

  • Reliable message (MQ)-based solution
    Asynchronous scenarios; highly general; highly scalable

  • TCC programmatic solution
    Yanxuan, Alibaba, and Ant Financial's self-packaged DTX

3 XA scheme

This is the two-phase commit transaction scheme.
It requires support from the database vendor; Java components include Atomikos, among others.

3.1 Understanding the scheme

A transaction manager coordinates the transactions of multiple databases (the resource managers):

  1. The transaction manager first asks each database whether it is ready to commit (the pre-commit).

  2. If every database replies OK, meaning the pre-commit succeeded, the transaction is formally committed and the operation is carried out in each database. If any database replies that it is not OK, the transaction is rolled back. If a failure occurs during the commit itself, it is handled by retrying on failure, log analysis, and manual retry.
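The two steps above can be sketched roughly as follows. This is a minimal in-memory illustration, not the XA or JTA API: `Participant`, `two_phase_commit`, and the `can_commit` flag are hypothetical names, and the retry and manual recovery for failures during the commit phase are left out.

```python
class Participant:
    """Hypothetical resource manager standing in for one database."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote OK only if the local work can be committed.
        self.state = "prepared" if self.can_commit else "refused"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def two_phase_commit(participants):
    """Return True if every participant committed, False if all rolled back."""
    # Phase 1: the transaction manager collects a vote from every database.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only on a unanimous OK, otherwise roll everyone back.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

A single refusing participant is enough to roll back all of them, which is what makes the outcome atomic across databases.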

3.2 Applicable scenarios

It is better suited to distributed transactions that span multiple databases within a monolithic application. Because it relies heavily on the database layer to handle complex transactions, it is very inefficient and unsuitable for high-concurrency scenarios.

If you want to try it out, you can build it on Spring + JTA.

Internet companies basically do not use it, because a system that operates across multiple databases violates the rules! With today's microservices, a large system is split into dozens or even hundreds of services, and the usual rule is that each service may operate only on its own database.

If you need to work with the database of another service, you are not allowed to connect to it directly.
You must call that service's interface instead.

4 TCC scheme

4.1 Three stages

  • Try
    Check the resources of each service, and lock or reserve them in advance

  • Confirm
    Perform the actual operation in each service

  • Cancel
    If the business method of any service fails, compensation is required: roll back the business logic that has already executed successfully

4.2 Case of inter-bank transfer

If the distributed transaction involving two banks is implemented with the TCC scheme, the idea is as follows:

  • Try stage
    First freeze the funds in the two bank accounts so that they cannot be operated on

  • Confirm stage
    Execute the actual transfer: deduct the funds from the account at bank A and add them to the account at bank B

  • Cancel stage
    If the operation at either bank fails, roll back to compensate.
    For example, if the account at bank A has been debited but crediting the account at bank B fails, the funds must be added back to the account at bank A.
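The transfer case can be sketched in a few lines. This is a simplified in-memory illustration under stated assumptions: `Account`, `TryFailed`, `transfer`, and the `accepts_deposits` flag are hypothetical, both accounts live in one process, and failures during Confirm (which a real implementation must retry) are not modeled.

```python
class TryFailed(Exception):
    """Raised when a participant cannot reserve what the transfer needs."""


class Account:
    def __init__(self, balance, accepts_deposits=True):
        self.balance = balance
        self.frozen = 0  # outgoing funds reserved by a successful Try
        self.accepts_deposits = accepts_deposits

    def try_debit(self, amount):
        # Try: freeze the funds so nothing else can spend them.
        if self.balance < amount:
            raise TryFailed("insufficient funds")
        self.balance -= amount
        self.frozen += amount

    def confirm_debit(self, amount):
        # Confirm: the frozen funds actually leave the account.
        self.frozen -= amount

    def cancel_debit(self, amount):
        # Cancel: release the frozen funds back to the balance.
        self.frozen -= amount
        self.balance += amount

    def try_credit(self, amount):
        # Try: check that the account is able to receive funds.
        if not self.accepts_deposits:
            raise TryFailed("account cannot receive funds")

    def confirm_credit(self, amount):
        self.balance += amount


def transfer(src, dst, amount):
    """Return True if the transfer was confirmed, False if it was cancelled."""
    try:
        src.try_debit(amount)
    except TryFailed:
        return False  # nothing was reserved, so nothing to cancel
    try:
        dst.try_credit(amount)
    except TryFailed:
        src.cancel_debit(amount)  # compensate the Try that succeeded
        return False
    src.confirm_debit(amount)
    dst.confirm_credit(amount)
    return True
```

Note that every Try needs a hand-written Cancel counterpart, which is where the large amount of compensation code comes from.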

To be honest, this scheme is rarely used, but it does have its scenarios.
Because rolling back the transaction depends heavily on compensation code that you write yourself, the amount of compensation code becomes huge, which is very painful!

For example, in money-related scenarios such as payments and trades, we generally use TCC to strictly guarantee that a distributed transaction either succeeds completely or is rolled back automatically, so that the funds are always correct!

4.3 Applicable scenarios

Use it only when your consistency requirement is extremely high, in the core of the core scenarios of your system!
A common example is a funds scenario. There you can use the TCC scheme: write a lot of business logic yourself, check whether each step in the transaction is OK, and execute the compensation/rollback code if it is not.

It also works best when the business execution time is relatively short.

But to be honest, avoid it where you can. Writing the rollback or compensation logic by hand is really painful, and the business code becomes hard to maintain.

4.4 Scheme Diagram

5 Local message table

This set of ideas originated at eBay.

5.1 Introduction

  • System A performs its operation in a local transaction and, in the same transaction, inserts a row into its message table

  • System A then sends this message to MQ

  • After receiving the message, system B inserts a row into its own local message table and performs its other business operations in a single transaction. If the message has already been processed, the transaction is rolled back at this point, which ensures that a message is never processed twice

  • After system B executes successfully, it updates the status in its own local message table and in system A's message table

  • If system B fails to process the message, the status in the message table is not updated. System A scans its own message table periodically, and any unprocessed messages are sent to MQ again for B to process again
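The flow above can be sketched with two SQLite databases standing in for systems A and B. Everything here is an assumption made for illustration: the table names (`orders`, `outbox`, `inbox`, `stock_changes`), the function names, and the fact that the MQ hop is replaced by a direct function call.

```python
import sqlite3


def setup_a(conn):
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
    conn.execute(
        "CREATE TABLE outbox (msg_id TEXT PRIMARY KEY, payload TEXT, status TEXT)"
    )


def setup_b(conn):
    conn.execute("CREATE TABLE inbox (msg_id TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE stock_changes (id INTEGER PRIMARY KEY, item TEXT)")


def place_order(conn_a, msg_id, item):
    # System A: the business row and the message row commit or roll back together.
    with conn_a:
        conn_a.execute("INSERT INTO orders (item) VALUES (?)", (item,))
        conn_a.execute(
            "INSERT INTO outbox VALUES (?, ?, 'pending')", (msg_id, item)
        )


def pending_messages(conn_a):
    # System A's periodic scan: anything still pending gets sent (again).
    return conn_a.execute(
        "SELECT msg_id, payload FROM outbox WHERE status = 'pending'"
    ).fetchall()


def consume(conn_b, msg_id, payload):
    # System B: the primary key on msg_id makes a duplicate delivery fail,
    # and the surrounding transaction is rolled back.
    try:
        with conn_b:
            conn_b.execute("INSERT INTO inbox VALUES (?)", (msg_id,))
            conn_b.execute(
                "INSERT INTO stock_changes (item) VALUES (?)", (payload,)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # already processed earlier


def ack(conn_a, msg_id):
    # After B succeeds, the status in A's message table is updated.
    with conn_a:
        conn_a.execute(
            "UPDATE outbox SET status = 'done' WHERE msg_id = ?", (msg_id,)
        )
```

Because A resends until it sees an acknowledgement, B may receive the same message more than once; the `inbox` table is what keeps the redelivery harmless.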

5.2 Advantages

This scheme guarantees eventual consistency.
Even if B's transaction fails, A keeps resending the message until B succeeds.

5.3 Defects

The biggest problem is that it relies heavily on database message tables to manage transactions. This makes it weak under high concurrency and hard to scale, so it is rarely used in practice.

  • Local message table scheme

6 Reliable Message Eventual Consistency Scheme

Skip the local message table entirely and implement the transaction directly on top of MQ. For example, Alibaba's RocketMQ supports transactional messages!

6.1 Introduction

  • System A first sends a prepared message to MQ. If sending the prepared message fails, the operation is cancelled outright and nothing is executed

  • If the message is sent successfully, system A executes its local transaction. On success it tells MQ to confirm the message; on failure it tells MQ to roll the message back

  • Once the message is confirmed, system B receives it and executes its own local transaction

  • MQ periodically polls all prepared messages and calls back your interface to ask: did the local transaction for this message fail? For every message that was never confirmed, should it be retried or rolled back?
    Here you can check the database to see whether the local transaction was executed. If it was rolled back, roll the message back as well. This covers the case where the local transaction succeeded but sending the confirmation failed

  • What if system B's transaction fails?
    Retry automatically until it succeeds. If it really cannot succeed, then for important funds-related business roll back: for example, after system B rolls back locally, find a way to notify system A to roll back as well, or raise an alert so that rollback and compensation are done manually
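The producer side of this flow can be sketched with a toy in-memory broker. This is not the RocketMQ API: `Broker`, `produce`, `check_prepared`, and the `local_ok` / `confirm_lost` flags are hypothetical names used only to show the prepared, confirm, rollback, and check-back steps, and the "database" is just a set of message ids whose local transaction committed.

```python
class Broker:
    """Hypothetical in-memory stand-in for an MQ with prepared messages."""

    def __init__(self):
        self.prepared = {}   # msg_id -> payload, not yet visible to system B
        self.delivered = []  # payloads that system B can consume

    def send_prepared(self, msg_id, payload):
        self.prepared[msg_id] = payload

    def commit(self, msg_id):
        self.delivered.append(self.prepared.pop(msg_id))

    def rollback(self, msg_id):
        self.prepared.pop(msg_id, None)

    def check_prepared(self, local_tx_committed):
        # Periodic check-back: ask the producer what happened to every
        # prepared message that was never confirmed or rolled back.
        for msg_id in list(self.prepared):
            if local_tx_committed(msg_id):
                self.commit(msg_id)
            else:
                self.rollback(msg_id)


def produce(broker, db, msg_id, payload, local_ok=True, confirm_lost=False):
    broker.send_prepared(msg_id, payload)
    if not local_ok:
        # Local transaction failed: tell the broker to drop the message.
        broker.rollback(msg_id)
        return
    db.add(msg_id)  # local transaction committed
    if not confirm_lost:
        broker.commit(msg_id)
    # If the confirmation is lost, the message stays prepared until the
    # broker's check-back finds the committed local transaction.
```

The check-back is what closes the gap between "local transaction committed" and "confirmation delivered": the broker resolves the message from the producer's own database state.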

This approach is quite reasonable, and most Internet companies in China currently do it this way. You can either rely on RocketMQ's support, or build a similar set of logic yourself on top of something like ActiveMQ or RabbitMQ. In short, this is the idea.

  • Reliable Message Eventual Consistency Scheme

7 Best Effort Notification Scheme

7.1 Introduction

  • After system A's local transaction has executed, it sends a message to MQ

  • A dedicated best-effort notification service consumes the message from MQ, records it in a database or puts it in an in-memory queue, and then calls the interface of system B

  • If system B executes successfully, that is the end of it; if it fails, the best-effort notification service calls system B again at intervals, up to N times, and finally gives up if it still fails

  • Schematic diagram of best effort notification scheme
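The retry loop at the heart of the notification service might look like the sketch below. The function name and parameters are hypothetical, the call to system B is passed in as a plain callable, and the waiting time between attempts is left out to keep the example short.

```python
def notify_with_retries(call_system_b, message, max_attempts=3):
    """Return the attempt number that succeeded, or None after giving up."""
    for attempt in range(1, max_attempts + 1):
        try:
            call_system_b(message)
            return attempt
        except Exception:
            # A real service would wait before the next attempt.
            continue
    # N attempts failed: best effort has been made, so give up.
    return None
```

Unlike the reliable message scheme, nothing here guarantees that system B ever succeeds; after `max_attempts` failures the message is simply abandoned (or left for manual handling).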

8 Summary

If an interviewer really asks, you can answer like this: in scenarios with especially strict requirements we use TCC to guarantee strong consistency; in other scenarios we implement distributed transactions on top of Alibaba's RocketMQ.

For scenarios with strict requirements on funds, you cannot go wrong by saying you use the TCC scheme.
For a general distributed transaction scenario, such as calling the inventory service to update stock after an order is inserted, the inventory data is not as sensitive as funds, so you can use the reliable message eventual consistency scheme.

Versions before RocketMQ 3.2.6 can follow the approach above; later versions changed the interface, which is not covered here.

In practice, adopting one of these schemes for any distributed transaction makes your code roughly 10 times more complicated. In many cases system A calls systems B, C, and D and we do not use distributed transactions at all. If a call fails, we simply log the exception.

There are only a few bugs per month, and many of them are functional or user-experience bugs. How many actually involve data? Two or three a month? If you want the system to guarantee automatically that the data is 100% correct and you implement dozens of distributed transactions, the code becomes too complicated and the performance too poor: system throughput drops significantly.

For 99% of distributed interface calls, do not use distributed transactions. Just monitor (send emails and text messages), record logs (complete logs whenever an error occurs), locate and troubleshoot problems quickly, and repair the data afterwards.
Every month or every few months, the small amount of data corrupted by code bugs is repaired manually with a one-off program you write yourself: some records may need to be added, some deleted, and some fields modified.

The cost is tens or even hundreds of times lower than building 50 distributed transactions.

It is a trade-off. Distributed transactions always come at a cost: the code becomes very complicated, development takes longer, performance and throughput drop, and the system becomes more complex, more fragile, and more prone to bugs. The benefit is that, done well, TCC and the reliable message eventual consistency scheme can guarantee 100% that your data does not go wrong.

For the 1%, 0.1%, or 0.01% of the business that involves funds, trades, and orders, we use distributed transaction schemes to provide guarantees. For membership points, coupons, product information, and the like, there is really no need.

Origin blog.csdn.net/l688899886/article/details/126610913