Key problems to watch for when using a message queue

When message queues are used in a project, several key issues need attention:

  • Message ordering
  • Message duplication
  • Transactional messages

First, ordered messages

An ordered message is one that is consumed in the same order in which it was sent. For example, an order produces three messages: order created, order paid, and order completed. They only make sense when consumed in that order. Messages belonging to different orders, however, can be consumed in parallel. Consider the following example:

Suppose a producer sends two messages, M1 and M2. What must be done to guarantee their order? The first idea that comes to mind might look like this:

 
 

This ensures that M1 reaches the MQ Server before M2 (the producer sends M2 only after M1 has been sent successfully). Under the first-in, first-out principle, M1 is then consumed before M2, so the order of the messages is preserved.

This model guarantees message order only in theory. In a real deployment you may run into the following problems:

Whenever a message travels from one server to another, network latency is unavoidable. As shown above, if delivering M1 takes longer than delivering M2, M2 is consumed first, so order is still not guaranteed. Even if M1 and M2 reach the consumer side at the same time, the relative load of consumer 1 and consumer 2 is unknown, so M2 may still be consumed before M1.

How can this be solved? Send M1 and M2 to the same consumer, and send M2 only after the consumer has acknowledged M1 successfully.

You may already see another question: if M1 is sent to consumer 1 and consumer 1 does not respond, should the server go on to send M2, or resend M1? To make sure the message is consumed, the server will generally resend M1 to another consumer, consumer 2, as shown below.

This model does ensure strict ordering, but a careful reader will still find a problem. There are two reasons why consumer 1 might not respond to the server. Either M1 never arrived (it was lost in transit), or consumer 1 did consume M1 and sent its response, but the MQ Server never received it. In the second case, resending M1 causes M1 to be consumed twice. This introduces the second problem, message duplication, which is discussed in detail later.
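The failure mode above can be reproduced in a toy, single-process simulation of the "send the next message only after the previous one is acknowledged" scheme. The sketch assumes a server that resends an unacknowledged message to the next consumer. All names (StopAndWaitDemo, Consumer, deliver, ackLost) are hypothetical and are not a RocketMQ API. Consumer 1 processes M1, but its acknowledgement is lost, so M1 is resent to consumer 2 and ends up consumed twice.

```java
import java.util.ArrayList;
import java.util.List;

// Toy simulation (hypothetical names, not a RocketMQ API) of strict ordering by
// stop-and-wait: the server sends the next message only after the previous one is acked.
class StopAndWaitDemo {

    // A consumer that records what it processed. If ackLost is true, it processes the
    // message, but its acknowledgement never reaches the server (the second failure case).
    static final class Consumer {
        final String name;
        final boolean ackLost;

        Consumer(String name, boolean ackLost) {
            this.name = name;
            this.ackLost = ackLost;
        }

        /** Runs the business logic and returns true only if the server receives an ack. */
        boolean deliver(String msg, List<String> processedLog) {
            processedLog.add(name + ":" + msg); // the message IS processed...
            return !ackLost;                    // ...but the ack may be lost on the way back
        }
    }

    /** Sends messages in order; an unacked message is resent to the next consumer. */
    static List<String> run(List<String> messages, List<Consumer> consumers) {
        List<String> processedLog = new ArrayList<>();
        int c = 0;
        for (String msg : messages) {
            // Stop-and-wait: do not move on until this message is acked.
            // (If every consumer loses its ack this loops forever; a real server caps retries.)
            while (!consumers.get(c).deliver(msg, processedLog)) {
                c = (c + 1) % consumers.size(); // no ack: retransmit to another consumer
            }
        }
        return processedLog;
    }

    public static void main(String[] args) {
        List<Consumer> consumers = List.of(new Consumer("c1", true), new Consumer("c2", false));
        System.out.println(run(List.of("M1", "M2"), consumers)); // prints [c1:M1, c2:M1, c2:M2]
    }
}
```

Order is preserved (M1 before M2), but M1 appears twice in the log. That is exactly the duplication problem discussed in the next section.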

Looking back at the ordering problem: strictly ordered messages are easy to understand and can be handled in the simple way described here. In summary, the simple and feasible way to achieve strict message ordering is:

Guarantee a one-to-one relationship between Producer - MQServer - Consumer.

Although this design is simple, it has some very serious problems, such as:

  1. Parallelism becomes the bottleneck of the messaging system (insufficient throughput).
  2. More exception handling is needed. As soon as something goes wrong on the consumer side, the whole processing flow is blocked, and extra effort has to go into clearing the blockage.

From this analysis of the ordering problem, we can draw two conclusions:

  1. Applications that do not care about ordering are in fact very common.
  2. A queue that does not preserve global order does not mean the messages are out of order.

Therefore, message order should be guaranteed at the business level, not by relying solely on the messaging system. Finally, let's look at how RocketMQ sends ordered messages, from the source-code point of view. RocketMQ decides which queue a message is sent to (its load-balancing policy) by polling all queues. In the following example, messages with the same order number are sent, one after another, to the same queue:

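// RocketMQ uses the algorithm implemented in a MessageQueueSelector to decide which queue a message is sent to
// By default RocketMQ provides two MessageQueueSelector implementations: random and hash
// You can also implement your own MessageQueueSelector to decide, according to your business, how messages are distributed across queues
SendResult sendResult = producer.send(msg, new MessageQueueSelector() {
    @Override
    public MessageQueue select(List<MessageQueue> mqs, Message msg, Object arg) {
        // arg is the orderId passed as the last argument of send()
        Integer id = (Integer) arg;
        // The same orderId always maps to the same index, and therefore to the same queue
        int index = id % mqs.size();
        return mqs.get(index);
    }
}, orderId);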

After the routing information is obtained, a queue is chosen with the MessageQueueSelector algorithm, so messages with the same orderId always go to the same queue.
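The guarantee this gives can be checked without a broker. The sketch below is a plain-Java model, not RocketMQ code: each "queue" is a FIFO list, and selectQueue mirrors the id % mqs.size() rule above. It uses Math.floorMod, an adjustment added here so that a negative key cannot produce a negative index. Messages of one order always land in the same queue in send order, while different orders may spread across queues and be consumed in parallel.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of hash-based queue selection (not the RocketMQ client).
class OrderedQueueDemo {
    record Msg(int orderId, String step) {}

    // Same rule as the selector above; floorMod keeps the index non-negative for negative ids.
    static int selectQueue(int orderId, int queueCount) {
        return Math.floorMod(orderId, queueCount);
    }

    // Appends each message to the tail (FIFO) of the queue chosen for its orderId.
    static List<List<Msg>> dispatch(List<Msg> msgs, int queueCount) {
        List<List<Msg>> queues = new ArrayList<>();
        for (int i = 0; i < queueCount; i++) {
            queues.add(new ArrayList<>());
        }
        for (Msg m : msgs) {
            queues.get(selectQueue(m.orderId(), queueCount)).add(m);
        }
        return queues;
    }

    public static void main(String[] args) {
        List<Msg> sent = List.of(
                new Msg(1001, "create"), new Msg(1002, "create"),
                new Msg(1001, "pay"), new Msg(1002, "pay"), new Msg(1001, "complete"));
        List<List<Msg>> queues = dispatch(sent, 4);
        // 1001 % 4 == 1: all of order 1001 is in queue 1, in the order create, pay, complete
        System.out.println(queues.get(1));
    }
}
```

Note that this only guarantees order inside one queue. If the number of queues changes (for example, a broker is added), the same orderId may map to a different queue, so ordering across that change is not guaranteed by this rule alone.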

// Simplified excerpt of the producer's selector-based send path
private SendResult send()  {
    // Fetch the topic's routing information
    TopicPublishInfo topicPublishInfo = this.tryToFindTopicPublishInfo(msg.getTopic());
    if (topicPublishInfo != null && topicPublishInfo.ok()) {
        MessageQueue mq = null;
        // Choose a queue using our selector algorithm
        // Here arg = orderId
        mq = selector.select(topicPublishInfo.getMessageQueueList(), msg, arg);
        if (mq != null) {
            return this.sendKernelImpl(msg, mq, communicationMode, sendCallback, timeout);
        }
    }
    // (error handling omitted in this excerpt)
}

Second, duplicate messages

Solving the ordering problem above introduced a new problem: duplicate messages. So how does RocketMQ solve message duplication? It simply doesn't.

The root cause of duplicate messages is that the network is unreliable. As long as data is exchanged over a network, this problem cannot be avoided, so the solution is to work around it. The question then becomes: what should the consumer side do if it receives the same message twice?

  1. Make the consumer's message-processing logic idempotent.
  2. Give every message a unique ID, and make sure that successful processing and the entry in the deduplication log table are recorded together.

Approach 1 is easy to understand: as long as processing is idempotent, the end result is the same no matter how many duplicates arrive. Approach 2 uses a log table to record the IDs of messages that have been processed successfully. If a newly arrived message's ID is already in the log table, the message is not processed again.
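Approach 2 fits in a few lines. The sketch assumes each message carries a unique ID (the msgId parameter is hypothetical) and uses an in-memory set as a stand-in for the deduplication log table. In a real system the ID would be written in the same database transaction as the business update, so the two cannot get out of sync.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of approach 2: skip messages whose unique ID is already in a dedup "log table".
// The in-memory Set stands in for a database table (an assumption for illustration).
class DedupConsumer {
    private final Set<String> processedIds = new HashSet<>();
    private long balance = 0;

    /** Credits the amount once per message ID; returns false for a duplicate. */
    synchronized boolean onMessage(String msgId, long amount) {
        if (processedIds.contains(msgId)) {
            return false; // already processed: ignore the duplicate delivery
        }
        balance += amount;       // business update, not idempotent on its own
        processedIds.add(msgId); // real system: same DB transaction as the update
        return true;
    }

    synchronized long balance() {
        return balance;
    }

    public static void main(String[] args) {
        DedupConsumer smith = new DedupConsumer();
        smith.onMessage("tx-42", 100);
        smith.onMessage("tx-42", 100); // redelivered, e.g. after a lost ack
        System.out.println(smith.balance()); // prints 100
    }
}
```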

Approach 1 clearly has to be implemented on the consumer side. It is not something the messaging system can provide. Approach 2 could be implemented either by the messaging system or by the business side. Under normal conditions, the probability of duplicate messages is actually very small. If the messaging system implemented deduplication, it would inevitably hurt the system's throughput and high availability. It is therefore best for the business side to handle duplicates itself, and this is why RocketMQ does not solve the duplicate-message problem.

RocketMQ does not guarantee that messages are never duplicated. If your business strictly requires that no message is processed twice, you must deduplicate on the business side.

Third, transactional messages

In addition to ordinary and ordered messages, RocketMQ also supports transactional messages. First, let's look at what transactional messages are and why they are needed. Take a money transfer as an example: Bob transfers 100 to Smith.

In a single-machine environment, the transaction looks roughly like this:

(Figure: transfer transaction in a single-machine environment)

Once the user base grows past a certain point, Bob's and Smith's account balances are no longer on the same server, and the process above becomes:

(Figure: transfer transaction in a clustered environment)

You will find that the same transfer takes several times longer in a clustered environment, which is clearly unacceptable. How can this be avoided?

Large transaction = small transactions + asynchrony

Split a large transaction into several small transactions that are executed asynchronously. This brings the efficiency of a cross-machine transaction roughly in line with that of a single machine. The transfer can be split into the following two small transactions:

(Figure: small transactions + asynchronous messaging)


In the figure, executing the local transaction (debiting Bob's account) and sending the asynchronous message must succeed or fail together. If the debit succeeds, the message must be sent; if the debit fails, the message must not be sent. The question is: should the debit happen first, or should the message be sent first?

First, consider sending the message first. The flow is roughly as follows:

(Figure: transactional messages, send the message first)

The problem: if the message is sent successfully but the debit fails, the consumer side still consumes the message and adds money to Smith's account.

Since sending the message first doesn't work, try debiting first. The flow is roughly as follows:

(Figure: transactional messages, debit first)

This has a similar problem: if the debit succeeds but sending the message fails, Bob's money is deducted, but Smith's account is never credited.

There are many possible ways to solve this. For example, put the message send directly inside Bob's debit transaction: if sending fails, throw an exception and roll back the transaction. This approach also follows the principle that the messaging system itself does not need to solve the problem.

Note: if you use Spring to manage transactions, you can put the message-sending logic inside the local transaction. If sending fails, an exception is thrown, and Spring catches it and rolls back the transaction. This keeps the local transaction and the message send atomic.

RocketMQ supports transactional messages. Let's see how RocketMQ implements them.

(Figure: how RocketMQ sends a transactional message)

In the first phase, RocketMQ sends a Prepared message and obtains the message's address. In the second phase, the local transaction is executed. In the third phase, the message is located through the address obtained in the first phase, and its status is updated.

A careful reader will spot a problem: what if sending the confirmation message fails? RocketMQ periodically scans the transactional messages in the message cluster. When it finds a Prepared message, it asks the sender (producer) to confirm: was Bob's money actually debited or not? Should the message be rolled back, or should the confirmation be sent? RocketMQ decides whether to roll back or commit according to the policy set by the sender. This ensures that the message send and the local transaction succeed or fail together.

Next, let's look at how the RocketMQ source handles transactional messages. Here is the client-side code that sends a transactional message (for the complete code, see com.alibaba.rocketmq.example.transaction.TransactionProducer in the rocketmq-example project):

// ============================= Preparation for sending a transactional message =============================
// Unresolved transactions: the MQ server checks back with the client.
// As described above, when RocketMQ finds a Prepared message, it decides the transaction according to the policy implemented by this listener
TransactionCheckListener transactionCheckListener = new TransactionCheckListenerImpl();
// Create the producer for transactional messages
TransactionMQProducer producer = new TransactionMQProducer("groupName");
// Register the transaction check handler
producer.setTransactionCheckListener(transactionCheckListener);
// Local transaction logic; in our example, this checks Bob's account and debits it
TransactionExecuterImpl tranExecuter = new TransactionExecuterImpl();
producer.start();
// Build the message (constructor arguments omitted)
Message msg = new Message(......);
// Send the message
SendResult sendResult = producer.sendMessageInTransaction(msg, tranExecuter, null);
producer.shutdown();

Next, look at the source of the sendMessageInTransaction method. It has three phases: send the Prepared message, execute the local transaction, and send the confirmation message.

The endTransaction method sends a request to the broker (MQ server) to update the transactional message's final status:

  1. Locate the Prepared message using sendResult, which contains the transaction ID of the message.
  2. Update the message's final status according to localTransaction.

If the endTransaction method fails, the data never reaches the broker, and the transactional message's status is not updated. The broker runs a check-back thread that periodically (every minute by default) scans the file that stores each transaction's state. Messages that are already committed or rolled back are skipped. For a message still in the prepared state, the broker sends a CheckTransaction request to the Producer. The Producer handles this callback in DefaultMQProducerImpl.checkTransactionState(), which calls the transaction-check policy we configured to decide whether to commit or roll back the transaction. Finally, it calls endTransactionOneway so that the broker updates the message's final status.
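The whole protocol (prepare, local transaction, commit or rollback, periodic check-back) can be modelled in one process. The sketch below is a hypothetical in-memory broker, not RocketMQ's implementation. The methods halfSend, end, and checkBack are invented names, and the Function passed to checkBack plays the role of the TransactionCheckListener. Prepared messages stay invisible to consumers, and the check-back pass skips decided entries and asks the producer only about prepared ones. Treating an unknown transaction as rolled back is a simplification made by this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// In-memory model of a transactional-message broker (hypothetical, not RocketMQ code).
class TxBrokerDemo {
    enum State { PREPARED, COMMITTED, ROLLED_BACK }

    private final Map<String, State> states = new LinkedHashMap<>(); // transaction state "file"
    private final Map<String, String> bodies = new LinkedHashMap<>();
    private final List<String> visibleToConsumers = new ArrayList<>();

    // Phase 1: store a Prepared (half) message; consumers cannot see it yet.
    void halfSend(String txId, String body) {
        states.put(txId, State.PREPARED);
        bodies.put(txId, body);
    }

    // Phase 3: the producer reports the local-transaction outcome (this call may be lost).
    void end(String txId, boolean commit) {
        if (states.get(txId) != State.PREPARED) {
            return; // unknown or already decided: nothing to do
        }
        states.put(txId, commit ? State.COMMITTED : State.ROLLED_BACK);
        if (commit) {
            visibleToConsumers.add(bodies.get(txId));
        }
    }

    // Periodic check-back: skip committed/rolled-back entries, ask about prepared ones.
    void checkBack(Function<String, Boolean> checker) {
        for (String txId : new ArrayList<>(states.keySet())) {
            if (states.get(txId) == State.PREPARED) {
                end(txId, checker.apply(txId));
            }
        }
    }

    List<String> visible() {
        return visibleToConsumers;
    }

    public static void main(String[] args) {
        TxBrokerDemo broker = new TxBrokerDemo();
        Map<String, Boolean> debitCommitted = new LinkedHashMap<>(); // producer's local record

        broker.halfSend("tx-1", "credit Smith 100"); // phase 1: Prepared message
        debitCommitted.put("tx-1", true);            // phase 2: Bob's debit commits locally
        // Phase 3 is lost (e.g. a network error), so the message stays PREPARED.

        broker.checkBack(txId -> debitCommitted.getOrDefault(txId, false));
        System.out.println(broker.visible()); // prints [credit Smith 100]
    }
}
```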

Back to the transfer example: once Bob's balance has been debited and the message has been sent successfully, Smith's side starts consuming the message. Two problems can arise here: consumption can time out, or consumption can fail. The way to handle a timeout is to keep retrying until the consumer successfully consumes the message. Duplicate messages may appear along the way, and they are handled as described earlier.

(Figure: consuming transactional messages)

This essentially solves the consumer-side timeout problem, but what if consumption fails? Alibaba's answer is to resolve it manually. Think about it: following the transaction's flow, if crediting Smith's account fails for some reason, the whole process has to be rolled back. If the messaging system had to implement this rollback, its complexity would increase dramatically and it would be prone to bugs. The probability of hitting such a bug would likely be far higher than the probability of consumption failing in the first place. This is why RocketMQ currently does not solve this problem. When designing and implementing a messaging system, we need to weigh whether it is worth paying such a high price for a problem that rarely occurs. That is worth keeping in mind whenever we tackle a hard problem.

Fourth, how the Producer sends messages

The Producer achieves sender-side load balancing by polling all queues of a topic, as shown below:

First, let's analyze the RocketMQ client code that sends a message:

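// Create the Producer
DefaultMQProducer producer = new DefaultMQProducer("ProducerGroupName");
// Initialize the Producer; this is needed only once in the whole application lifecycle
producer.start();
// Build the Message
Message msg = new Message("TopicTest1",// topic
                        "TagA",// tag: labels the message to distinguish a category of messages; may be null
                        "OrderID188",// key: custom key, can be used for deduplication; may be null
                        ("Hello MetaQ").getBytes());// body: message content
// Send the message and get the result
SendResult sendResult = producer.send(msg);
// Release resources: close network connections and unregister this producer
producer.shutdown();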

Throughout the application lifecycle, the producer must call start() once to initialize. Initialization mainly does the following:

  1. If no namesrv address is specified, the address is looked up automatically.
  2. Start scheduled tasks: update the namesrv address, update topic routing information from namesrv, clean up brokers that have gone down, send heartbeats to all brokers, ...
  3. Start the load-balancing service.

After initialization, the producer can start sending messages. The main message-sending code is as follows:

// Simplified excerpt
private SendResult sendDefaultImpl(Message msg,......) {
    // Check that the Producer is in the RUNNING state
    this.makeSureStateOK();
    // Validate msg: not null, topic and body not empty, body not too long
    Validators.checkMessage(msg, this.defaultMQProducer);
    // Fetch the topic's routing information
    TopicPublishInfo topicPublishInfo = this.tryToFindTopicPublishInfo(msg.getTopic());
    // Pick one message queue from the routing information
    MessageQueue mq = topicPublishInfo.selectOneMessageQueue(lastBrokerName);
    // Send the message to that queue
    SendResult sendResult = this.sendKernelImpl(msg, mq, communicationMode, sendCallback, timeout);
    return sendResult;
}

Two methods in this code deserve attention: tryToFindTopicPublishInfo and selectOneMessageQueue. As mentioned, producer initialization starts scheduled tasks that fetch routing information and update the local cache. tryToFindTopicPublishInfo first looks up the topic's routing information in the cache; if it is not found, it queries namesrv itself. selectOneMessageQueue returns queues in round-robin order to achieve load balancing.
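Round-robin selection that also avoids the broker which just failed can be sketched as follows. This is a simplified stand-in that assumes a queue is identified by a broker name and a queue ID. The class TopicRoute and its record Queue are hypothetical, and RocketMQ's own TopicPublishInfo.selectOneMessageQueue may differ in detail.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified round-robin queue selection (hypothetical class): prefer a queue that is
// not on lastBrokerName, the broker the previous attempt failed on.
class TopicRoute {
    record Queue(String brokerName, int queueId) {}

    private final List<Queue> queues;
    private final AtomicInteger counter = new AtomicInteger();

    TopicRoute(List<Queue> queues) {
        this.queues = List.copyOf(queues);
    }

    Queue selectOneMessageQueue(String lastBrokerName) {
        for (int i = 0; i < queues.size(); i++) {
            // floorMod keeps the index valid even after the counter overflows
            Queue q = queues.get(Math.floorMod(counter.getAndIncrement(), queues.size()));
            if (lastBrokerName == null || !q.brokerName().equals(lastBrokerName)) {
                return q;
            }
        }
        // Every queue is on the last-failed broker: fall back to plain round-robin.
        return queues.get(Math.floorMod(counter.getAndIncrement(), queues.size()));
    }

    public static void main(String[] args) {
        TopicRoute route = new TopicRoute(List.of(
                new Queue("broker-a", 0), new Queue("broker-a", 1), new Queue("broker-b", 0)));
        System.out.println(route.selectOneMessageQueue(null));       // next queue in rotation
        System.out.println(route.selectOneMessageQueue("broker-a")); // skips broker-a's queues
    }
}
```

The shared counter is what spreads load evenly. The lastBrokerName check matters mainly on retries, so that a second attempt does not go straight back to a broker that just failed.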

If the Producer fails to send a message, it retries automatically. The retry policy is:

  1. The number of retries is less than retryTimesWhenSendFailed (configurable).
  2. The total elapsed time (including all n retries) is less than sendMsgTimeout (passed in when sending the message).
  3. When both conditions above hold, the Producer picks a different queue and sends the message there.

===============

There are other solutions for transactional messages as well. The following is reposted from another article:

 

Any discussion of distributed transactions brings up the classic "account transfer" problem. Two accounts live in two different DBs, or in two different subsystems. Account A is debited and account B is credited. How do we guarantee atomicity?

The general approach is to achieve "eventual consistency" through messaging middleware. System A debits the account and then sends a message to the middleware; system B receives the message and credits the account.

But there is a problem: should A update the DB first and then send the message, or send the message first and then update the DB?

Suppose the DB update is done first and succeeds, but sending the message fails because of a network failure, and retries fail too. What then?
Suppose the message is sent first and succeeds, but the DB update fails. The message has already gone out and cannot be recalled. What then?

So the conclusion is: as long as sending the message and updating the DB are not atomic, there is a problem whichever comes first.

So how do we solve this?

Wrong solution 0

Some may think: I can put the "send message" network call and the DB update inside one transaction, and if sending fails, the DB update is rolled back automatically. Doesn't that make the two operations atomic?

This solution looks right but is actually wrong, for two reasons:

(1) The network's Two Generals' Problem: when sending fails, the sender cannot tell whether the messaging middleware really did not receive the message, or received it but failed to return the response in time.

If the message was in fact received but the sender believes it was not, the sender rolls back the DB update. The result is that account A is not debited, but account B is credited.

(2) Putting a network call inside a DB transaction can turn it into a long-running transaction because of network latency. In serious cases, this blocks the entire DB, which is very risky.

Based on this analysis, the solution is in fact wrong.

Solution 1 - implement it on the business side

Assume the messaging middleware does not provide a "transactional message" feature, for example because you are using Kafka. How do you solve the problem then?

The solution is as follows:
(1) On the producer side, create a message table, and perform the DB update and the insert into the message table in the same DB transaction.

(2) Run a background process that continuously moves messages from the message table to the messaging middleware, retrying whenever sending fails. Duplicate messages are allowed, but messages are never lost and their order is not disturbed.

(3) On the consumer side, create a deduplication table. Record each processed message in the deduplication table, which makes the business idempotent. But this also raises an atomicity question: how do we make consuming the message and inserting it into the deduplication table atomic?

What if consumption succeeds but the insert into the deduplication table fails? This was discussed in the first article of the Kafka source-code analysis series, on the exactly-once problem.

With these three steps, we have basically solved the atomicity of the two operations: updating the DB and sending the network message.

However, this solution has a drawback. You need to design a message table in the DB and run a background task that constantly scans the local messages. This couples message-handling logic with the business logic and puts an extra burden on the business side.
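Steps (1) and (2) of Solution 1 are often called the transactional outbox pattern. The sketch below uses an in-memory "database" in which synchronized methods stand in for a DB transaction. The names Outbox, debitAndEnqueue, and relayOnce are hypothetical. It shows the two properties the text relies on. First, the balance update and the message-table insert commit together. Second, a row leaves the table only after a successful publish, so failures are retried on the next pass, which can produce duplicates that the consumer-side deduplication table in step (3) has to absorb.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// In-memory sketch of Solution 1 (transactional outbox). Synchronized methods stand in
// for a DB transaction; a real system would use a single SQL transaction instead.
class Outbox {
    private long bobBalance;
    private final Map<String, String> pending = new LinkedHashMap<>(); // message table, insertion order

    Outbox(long initialBalance) {
        this.bobBalance = initialBalance;
    }

    // Step (1): debit and insert into the message table in the same "transaction".
    synchronized void debitAndEnqueue(String msgId, long amount) {
        if (bobBalance < amount) {
            throw new IllegalStateException("insufficient funds"); // neither change is applied
        }
        bobBalance -= amount;
        pending.put(msgId, "credit Smith " + amount);
    }

    // Step (2): one pass of the background relay. publish(msgId, body) returns true on success.
    // A row is deleted only after a successful publish; on failure, the pass stops to keep order.
    synchronized List<String> relayOnce(BiPredicate<String, String> publish) {
        List<String> published = new ArrayList<>();
        var it = pending.entrySet().iterator();
        while (it.hasNext()) {
            var e = it.next();
            if (!publish.test(e.getKey(), e.getValue())) {
                break; // leave this and later rows for the next pass
            }
            published.add(e.getKey());
            it.remove();
        }
        return published;
    }

    synchronized long bobBalance() {
        return bobBalance;
    }

    synchronized int pendingCount() {
        return pending.size();
    }
}
```

If the process crashes after publishing a message but before deleting its row, the next pass publishes that message again. That is why step (3) still needs the deduplication table.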

Solution 2 - RocketMQ transactional messages

To solve this problem without coupling it to the business logic, RocketMQ introduced the concept of "transactional messages".

Specifically, sending a message is split into two phases: a Prepare phase and a confirmation phase.

Concretely, the original two steps become three:
(1) Send a Prepared message
(2) Update the DB
(3) Depending on whether the DB update succeeded or failed, Confirm or Cancel the Prepared message.

Some may ask: what if the first two steps succeed but the last one fails? This is where RocketMQ's key feature comes in. RocketMQ periodically (every minute by default) scans all Prepared messages and asks the sender whether each one should be confirmed (sent out) or cancelled.

In practice, the sender defines a checkListener, and RocketMQ calls back this listener to carry out the scheme described above.

Summary: compared with Solution 1, RocketMQ's biggest change is that the work of "scanning the message table" is no longer done by the business side but by the messaging middleware.

The message table itself, however, is still not eliminated. Because the messaging middleware asks the sender whether the transaction succeeded, the sender still needs a "disguised local message table" that records the execution state of each transaction.

Manual intervention

Some may say: whether Solution 1 or Solution 2 is used, the message is guaranteed to reach the queue, but what if consumption fails on the consumer side?

If consumption fails, it is retried. What if it keeps failing? Should the whole process be rolled back automatically?

The answer is manual intervention. From an engineering-practice point of view, automatically rolling back the entire process is enormously expensive. It is complex to implement and also introduces new problems. For example, what happens if the automatic rollback itself fails?

For such a very low-probability case, handling it manually is more reliable and simpler than building a highly complex automatic rollback system.

 


Original article: blog.csdn.net/qq_39399966/article/details/103382111