How does RocketMQ make sure messages are not lost? (An answer to floor the interviewer)

 

Interviewers often like to ask: how does RocketMQ make sure messages are not lost?

My reaction: is that all? I had to laugh.

1. The message sending process

We divide the message flow into the following three parts, each of which may lose data.

  • Production stage: the Producer sends messages to the Broker over the network. The message can be lost in transit, for example when the network is unreachable or the request times out.
  • Storage stage: the Broker first puts the message in memory and then persists it to disk according to its flush strategy. If the Broker goes down abnormally after receiving the message from the Producer, while the message is still only in memory, the message is lost.
  • Consumption stage: a failed consumption is effectively a form of message loss.

2. Producer production stage

The Producer sends the message to the Broker over the network, and the message can be lost in transit, for example when the network is unreachable or the request times out.

1. Solution one

1.1 Description

There are three ways to send: synchronous, asynchronous, and one-way. Synchronous sending is the most reliable: the call blocks until the Broker responds, and if the send does not succeed we do not get a successful SendResult (an exception is thrown or the status is not SEND_OK). With asynchronous sending, the callback tells us whether the send succeeded. One-way (OneWay) sending is the least reliable: it does not wait for any result, so we cannot tell whether the message actually arrived.

1.2 Source code

/**
 * {@link org.apache.rocketmq.client.producer.DefaultMQProducer}
 */

// Synchronous send: blocks until the Broker responds
public SendResult send(Message msg) throws MQClientException, RemotingException, MQBrokerException, InterruptedException { /* body omitted */ }

// Asynchronous send: sendCallback is invoked with the result
public void send(Message msg, SendCallback sendCallback) throws MQClientException, RemotingException, InterruptedException { /* body omitted */ }

// One-way send: does not care about the result, the least reliable
public void sendOneway(Message msg) throws MQClientException, RemotingException, InterruptedException { /* body omitted */ }
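To make the difference in reliability concrete, here is a minimal stdlib-only sketch of what each mode tells the caller. It is not the RocketMQ client: BrokerLink, Outcome and SendModesSketch are hypothetical names, and a CompletableFuture stands in for SendCallback.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for the network plus Broker; NOT a RocketMQ type.
interface BrokerLink {
    // Returns true if the Broker acknowledged that it stored the message.
    boolean deliver(String body);
}

// Hypothetical outcome type; RocketMQ's real SendResult carries much more.
enum Outcome { OK, FAILED }

class SendModesSketch {
    // Synchronous: the caller blocks and learns the outcome before moving on.
    static Outcome sendSync(BrokerLink link, String body) {
        return link.deliver(body) ? Outcome.OK : Outcome.FAILED;
    }

    // Asynchronous: the caller continues immediately; the outcome arrives
    // later through the future (playing the role of a SendCallback).
    static CompletableFuture<Outcome> sendAsync(BrokerLink link, String body) {
        return CompletableFuture.supplyAsync(() -> sendSync(link, body));
    }

    // One-way: fire and forget. The outcome is discarded, so the caller
    // cannot tell whether the message arrived.
    static void sendOneway(BrokerLink link, String body) {
        link.deliver(body);
    }
}
```

With a link that always fails, sendSync returns FAILED and sendAsync completes with FAILED, so the caller can react (retry, log, compensate). sendOneway returns normally either way, which is why one-way sending cannot guarantee delivery.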

2. Solution two

2.1 Description

If sending a message fails or times out, the Producer retries automatically. By default a synchronous send is attempted 3 times in total (1 attempt plus 2 retries). The retry count can be changed through the API, for example to 10:

producer.setRetryTimesWhenSendFailed(10);

2.2 Source code

/**
 * {@link org.apache.rocketmq.client.producer.DefaultMQProducer#sendDefaultImpl(Message, CommunicationMode, SendCallback, long)}
 */

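// Number of attempts. this.defaultMQProducer.getRetryTimesWhenSendFailed() defaults to 2,
// so a synchronous send makes up to 3 attempts by default; other modes make only 1 attempt here
int timesTotal = communicationMode == CommunicationMode.SYNC ? 1 + this.defaultMQProducer.getRetryTimesWhenSendFailed() : 1;
int times = 0;
for (; times < timesTotal; times++) {
    // The Broker used by the previous attempt (null on the first attempt)
    String lastBrokerName = null == mq ? null : mq.getBrokerName();
    // Pick the message queue to send to, preferring a Broker other than lastBrokerName
    MessageQueue mqSelected = this.selectOneMessageQueue(topicPublishInfo, lastBrokerName);
    if (mqSelected != null) {
        mq = mqSelected;
        try {
            // The actual send logic lives in sendKernelImpl
            sendResult = this.sendKernelImpl(msg, mq, communicationMode, sendCallback, topicPublishInfo, timeout - costTime);
            switch (communicationMode) {
                case ASYNC:
                    return null;
                case ONEWAY:
                    return null;
                case SYNC:
                    // Status is not SEND_OK and retrying another Broker is enabled:
                    // continue, i.e. go around the for loop again and retry
                    if (sendResult.getSendStatus() != SendStatus.SEND_OK) {
                        if (this.defaultMQProducer.isRetryAnotherBrokerWhenNotStoreOK()) {
                            continue;
                        }
                    }
                    // Otherwise return the send result to the caller
                    return sendResult;
                default:
                    break;
            }
        } catch (RemotingException e) {
            continue;
        } catch (...) { // other exception types, elided in this excerpt
            continue;
        }
    }
}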

Description:

This is only the core of the sending logic, not the full code. Key points:

  • A synchronous send makes 1 + this.defaultMQProducer.getRetryTimesWhenSendFailed() attempts; the other modes make only 1 attempt in this loop (asynchronous sends have their own retry setting, retryTimesWhenSendAsyncFailed).
  • this.defaultMQProducer.getRetryTimesWhenSendFailed() defaults to 2 and can be set manually with producer.setRetryTimesWhenSendFailed(10);
  • sendKernelImpl is what actually sends the message.
  • For a synchronous send whose status is not SEND_OK, the loop continues (retries) only if retryAnotherBrokerWhenNotStoreOK is enabled, which is false by default; otherwise the result is returned to the caller.
  • If the send succeeds, the result is returned to the caller.
  • If sending throws an exception, it is caught and the loop continues with the next attempt.
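The loop can be reduced to a stdlib-only model to check the attempt counts. Mode, RetryLoopSketch and the attemptSucceeds predicate are hypothetical; each failed attempt stands for a thrown exception, which the real loop always retries, and timeouts and queue selection are left out.

```java
import java.util.function.IntPredicate;

// Hypothetical communication modes mirroring the names in the excerpt.
enum Mode { SYNC, ASYNC, ONEWAY }

class RetryLoopSketch {
    /**
     * Simplified model of the send loop. attemptSucceeds.test(i) says whether
     * attempt i (0-based) succeeds; a failure stands for an exception.
     * Returns the number of attempts made, or -1 if every attempt failed.
     */
    static int send(Mode mode, int retryTimesWhenSendFailed, IntPredicate attemptSucceeds) {
        // Only synchronous sends retry in this loop: 1 + retries attempts, otherwise 1.
        int timesTotal = mode == Mode.SYNC ? 1 + retryTimesWhenSendFailed : 1;
        for (int times = 0; times < timesTotal; times++) {
            if (attemptSucceeds.test(times)) {
                return times + 1; // success: hand the result back to the caller
            }
            // failure: go around the loop and try again
        }
        return -1; // all attempts failed; the caller learns the send did not succeed
    }
}
```

For example, with the default of 2 retries, send(Mode.SYNC, 2, i -> i >= 2) succeeds on the third attempt and returns 3, while the same predicate with Mode.ASYNC returns -1 because this loop gives asynchronous sends only one attempt.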

3. Solution three

3.1 Description

Suppose a Broker goes down. Production environments generally run more than one Master/Slave group, so other Master nodes keep serving, sending is not affected, and our messages can still be delivered. For example, if the Broker fails just as a message is being sent to it, the Producer gets a failure response (or an exception) and retries automatically. On the retry it avoids the Broker used by the failed attempt, and the dead Broker is eventually removed from the routing information, so the Producer sends the message to another Broker.
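This relies on the retry landing on a different Broker. As far as I know, when the latency-fault feature is off, TopicPublishInfo.selectOneMessageQueue(lastBrokerName) walks the queues round-robin and skips those on the Broker used by the previous attempt. The sketch below models that idea with hypothetical names (QueueRef, FailoverSketch); it is not the client's code.

```java
import java.util.List;

// Hypothetical minimal queue descriptor (the real MessageQueue also has a topic).
final class QueueRef {
    final String brokerName;
    final int queueId;

    QueueRef(String brokerName, int queueId) {
        this.brokerName = brokerName;
        this.queueId = queueId;
    }
}

class FailoverSketch {
    private int index = 0; // round-robin cursor shared across calls

    /**
     * Picks the next queue round-robin, skipping queues on lastBrokerName
     * (the Broker the previous attempt used). Falls back to plain
     * round-robin if every queue lives on that Broker.
     */
    QueueRef select(List<QueueRef> queues, String lastBrokerName) {
        for (int i = 0; i < queues.size(); i++) {
            QueueRef q = queues.get(Math.floorMod(index++, queues.size()));
            if (lastBrokerName == null || !q.brokerName.equals(lastBrokerName)) {
                return q;
            }
        }
        return queues.get(Math.floorMod(index++, queues.size()));
    }
}
```

With queues on broker-a and broker-b, a retry after a failure on broker-a is routed to broker-b; if only one Broker exists, the fallback still returns a queue so the retry is attempted rather than dropped.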

4. Summary

How does the Producer make sure messages are delivered during the sending phase?

If a send fails, it is retried automatically. Even if all N retries fail, the client learns that the message was not sent successfully, so it can compensate on its own without blindly disrupting the main business logic. And if a Broker goes down, other Brokers continue to provide service, so the system stays highly available.

In a few words: synchronous sending + automatic retries + multiple Master nodes.

3. Broker storage stage

The Broker first puts the message in memory and then persists it to disk according to its flush strategy. If the Broker goes down abnormally after receiving the message from the Producer, while the message is still only in memory, the message is lost.

1. Solution one

RocketMQ persists messages in one of two modes: synchronous flushing and asynchronous flushing. Asynchronous flushing is the default: after receiving a message, the Broker writes it to memory and immediately tells the Producer that the message was received and stored, so the Producer can continue with its business logic; a background thread then persists the data to disk. If the Broker goes down before the message reaches the disk, the message is lost. With synchronous flushing, the Broker does not notify the Producer after writing the message to memory; it notifies the Producer only after the message has been persisted to disk. This guarantees the message is not lost, but performance is lower than with asynchronous flushing. Which one to choose depends on the business scenario.

To switch to synchronous flushing (the default is asynchronous), configure the Broker as follows:

## Default is ASYNC_FLUSH; set SYNC_FLUSH for synchronous flushing. Choose based on the business: synchronous flushing is certainly less efficient than asynchronous flushing.
flushDiskType=SYNC_FLUSH

The corresponding Java enum is:

package org.apache.rocketmq.store.config;

public enum FlushDiskType {
    // Synchronous flushing
    SYNC_FLUSH,
    // Asynchronous flushing (default)
    ASYNC_FLUSH
}
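The difference between the two modes is where the acknowledgement happens relative to the disk flush. The sketch below is a hypothetical mini commit log built on java.nio's FileChannel, not RocketMQ's CommitLog: FlushSketch and FlushMode are illustrative names, and FileChannel.force(false) plays the role of the flush.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical mini commit log showing when a Producer could be acked. Not RocketMQ code.
class FlushSketch implements AutoCloseable {
    enum FlushMode { SYNC_FLUSH, ASYNC_FLUSH }

    private final FileChannel channel;
    private final FlushMode mode;

    FlushSketch(Path file, FlushMode mode) throws IOException {
        this.channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        this.mode = mode;
    }

    /** Appends a message; returning from this method is the moment the Producer would be acked. */
    void append(String body) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap((body + "\n").getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            channel.write(buf); // lands in the OS page cache, not necessarily on disk yet
        }
        if (mode == FlushMode.SYNC_FLUSH) {
            channel.force(false); // block until the data reaches the storage device
        }
        // ASYNC_FLUSH: return (ack) now; a background task calls flush() later, so a
        // machine crash or power loss before that call can lose this message.
    }

    /** What an asynchronous flush thread would call periodically. */
    void flush() throws IOException {
        channel.force(false);
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

In SYNC_FLUSH mode append() returns only after force(false); in ASYNC_FLUSH mode it returns as soon as the bytes are in the page cache, and the time until the next flush() is exactly the window in which a machine failure loses acknowledged messages.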

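The flush thread runs a loop like the one below. This excerpt is the run() method of CommitLog's GroupCommitService, which backs synchronous flushing: waitForRunning(10) waits at most 10 milliseconds (not 10 seconds) before flushing the pending requests in one batch. Asynchronous flushing is handled by a separate service, FlushRealTimeService, which by default flushes every 500 ms (the flushIntervalCommitLog setting).

/*
 * {@link org.apache.rocketmq.store.CommitLog.GroupCommitService#run()}
 */

while (!this.isStopped()) {
    try {
        // Wait up to 10 ms, or until woken by a new flush request
        this.waitForRunning(10);
        // Flush to disk and notify the waiting requests
        this.doCommit();
    } catch (Exception e) {
        CommitLog.log.warn(this.getServiceName() + " service has exception. ", e);
    }
}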
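In RocketMQ's CommitLog, the GroupCommitService (used for synchronous flushing) does not flush once per message. As I understand it, writers enqueue a flush request and wait, and the service thread wakes up at most every 10 ms (or as soon as it is notified), swaps the request list, flushes once and completes all waiting requests. The sketch below models that batching with hypothetical names; a counter stands in for the actual fsync.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical model of group commit; not RocketMQ's GroupCommitService.
class GroupCommitSketch {
    private List<CompletableFuture<Boolean>> pending = new ArrayList<>();
    // Only the single service thread that calls doCommit() writes this counter.
    private volatile int flushCount = 0;

    /** Called by a writer after appending a message; it waits on the future before acking. */
    synchronized CompletableFuture<Boolean> requestFlush() {
        CompletableFuture<Boolean> f = new CompletableFuture<>();
        pending.add(f);
        return f;
    }

    /** One pass of the service loop: take the whole batch, flush once, release everyone. */
    void doCommit() {
        List<CompletableFuture<Boolean>> batch;
        synchronized (this) {
            batch = pending;              // swap lists so writers can keep enqueuing
            pending = new ArrayList<>();
        }
        if (batch.isEmpty()) {
            return;                       // nothing to flush this round
        }
        flushCount++;                     // stands in for one fsync covering the batch
        for (CompletableFuture<Boolean> f : batch) {
            f.complete(true);             // each writer may now ack its Producer
        }
    }

    int flushCount() {
        return flushCount;
    }
}
```

Three writers that request a flush before the next doCommit() are all released by a single flush, which is one way synchronous flushing keeps throughput acceptable under load.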

2. Solution two

Cluster deployment, master-slave mode, high availability.

Even if the Broker uses the synchronous flushing strategy, if its disk fails after the flush, every message on that disk is lost. And even with 1 Master and 1 Slave, if the Master's disk fails after flushing but before the message has been synchronized to the Slave, isn't that game over too? That's right!

Therefore, we can configure the Broker to notify the Producer that the message is OK not as soon as the Master has flushed it, but only after both the Master and the Slave have it.

## Default is ASYNC_MASTER
brokerRole=SYNC_MASTER
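The effect of brokerRole on the acknowledgement can be sketched as follows. This is a hypothetical model, not Broker code: Role, Ack, Replica and MasterSketch are illustrative names, though the Ack values borrow names from RocketMQ's SendStatus (SEND_OK, FLUSH_SLAVE_TIMEOUT, SLAVE_NOT_AVAILABLE).

```java
// Hypothetical model of when a Master acknowledges the Producer; not RocketMQ code.
enum Role { ASYNC_MASTER, SYNC_MASTER }

// Status names borrowed from RocketMQ's SendStatus values for readability.
enum Ack { SEND_OK, FLUSH_SLAVE_TIMEOUT, SLAVE_NOT_AVAILABLE }

// Hypothetical replication link to the Slave.
interface Replica {
    // Returns true if the Slave confirms it has the message in time.
    boolean replicate(String body);
}

class MasterSketch {
    private final Role role;
    private final Replica slave; // null means no Slave is connected

    MasterSketch(Role role, Replica slave) {
        this.role = role;
        this.slave = slave;
    }

    Ack onMessage(String body) {
        // Local append and flush are omitted; assume they succeeded.
        if (role == Role.ASYNC_MASTER) {
            // Replication would run in the background; the Producer is acked immediately.
            return Ack.SEND_OK;
        }
        // SYNC_MASTER: report success only once the Slave has confirmed.
        if (slave == null) {
            return Ack.SLAVE_NOT_AVAILABLE;
        }
        return slave.replicate(body) ? Ack.SEND_OK : Ack.FLUSH_SLAVE_TIMEOUT;
    }
}
```

Under SYNC_MASTER a non-OK status tells a synchronous sender that the replication guarantee was not confirmed, so the application can decide whether to resend; under ASYNC_MASTER the Producer gets SEND_OK even if the Slave never receives the message.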

3. Summary

If you want to strictly ensure that no message is lost while the Broker stores it, the following configuration is required, although performance will be significantly lower than with the default configuration.

# Master node configuration
flushDiskType=SYNC_FLUSH
brokerRole=SYNC_MASTER

# Slave node configuration
brokerRole=SLAVE
flushDiskType=SYNC_FLUSH

The meaning of the above configuration is:

After the Producer sends a message to the Broker, the Master node first persists it to disk and then synchronizes it to the Slave node. Only after the Slave has received and stored the message does the Master report to the Producer that the message is OK.

4. Consumer consumption stage

Consumption failure is actually a variant of message loss.

1. Solution one

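The consumer first pulls messages to the local client, then runs the business logic, and only acknowledges (acks) after the business logic has completed. Only then is the message truly consumed; being pulled to the client does not count as consumed. For example:

import java.nio.charset.StandardCharsets;
import java.util.List;
import org.apache.rocketmq.client.consumer.listener.ConsumeConcurrentlyContext;
import org.apache.rocketmq.client.consumer.listener.ConsumeConcurrentlyStatus;
import org.apache.rocketmq.client.consumer.listener.MessageListenerConcurrently;
import org.apache.rocketmq.common.message.MessageExt;

consumer.registerMessageListener(new MessageListenerConcurrently() {
    @Override
    public ConsumeConcurrentlyStatus consumeMessage(List<MessageExt> msgs, ConsumeConcurrentlyContext consumeConcurrentlyContext) {
        for (MessageExt msg : msgs) {
            String str = new String(msg.getBody(), StandardCharsets.UTF_8);
            System.out.println(str);
        }
        // ack: CONSUME_SUCCESS tells the Broker the message is consumed only after all the logic
        // above has finished. If an exception occurs before this line, the message stays unconsumed.
        // Unlike Redis BLPOP, which removes the element as soon as it is popped, before our logic runs.
        return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
    }
});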
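The ack rule can be modeled without the client library. In RocketMQ's concurrent consumer, as far as I know, a listener that throws is treated like one that returned RECONSUME_LATER; the sketch below mirrors that with hypothetical names (ConsumeStatus, AckSketch) rather than the real ConsumeConcurrentlyStatus.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical mirror of ConsumeConcurrentlyStatus; not the RocketMQ enum.
enum ConsumeStatus { CONSUME_SUCCESS, RECONSUME_LATER }

class AckSketch {
    /**
     * Runs the business logic over a batch and decides the status. A thrown
     * exception becomes RECONSUME_LATER, so the batch is redelivered instead of lost.
     */
    static ConsumeStatus consume(List<String> batch, Consumer<String> businessLogic) {
        try {
            for (String msg : batch) {
                businessLogic.accept(msg);
            }
            return ConsumeStatus.CONSUME_SUCCESS; // ack only after all the work is done
        } catch (RuntimeException e) {
            return ConsumeStatus.RECONSUME_LATER;  // no ack: the message will be retried
        }
    }
}
```

Either way nothing is acknowledged until the whole batch has been processed, so a failure halfway leads to redelivery rather than loss. The flip side is that the same message may be processed more than once, so the business logic should tolerate duplicates.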

2. Solution two

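Failed consumption is retried automatically: if consuming a message fails and it is not acked, the message is redelivered later. By default (for concurrent consumers) a message is retried up to 16 times, a limit that can be changed on the consumer with setMaxReconsumeTimes, and the delay before each retry comes from the Broker's delay levels, configured as follows:

package org.apache.rocketmq.store.config;

/**
 * All options that can be configured on the Broker (excerpt)
 */
public class MessageStoreConfig {
    private String messageDelayLevel = "1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 20m 30m 1h 2h";
}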
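The delay before each retry is taken from this level list. As I understand it, the Broker schedules the k-th retry at delay level k + 2, so the first retry waits 10 s and the 16th waits 2 h. The sketch below (RetryScheduleSketch is a hypothetical name) parses the string and computes the schedule under that assumption.

```java
import java.util.ArrayList;
import java.util.List;

class RetryScheduleSketch {
    static final String MESSAGE_DELAY_LEVEL =
            "1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 20m 30m 1h 2h";

    /** Parses the level string into delays in seconds (index 0 = level 1). */
    static List<Long> parseLevels(String spec) {
        List<Long> out = new ArrayList<>();
        for (String token : spec.trim().split("\\s+")) {
            char unit = token.charAt(token.length() - 1);
            long n = Long.parseLong(token.substring(0, token.length() - 1));
            switch (unit) {
                case 's': out.add(n); break;
                case 'm': out.add(n * 60); break;
                case 'h': out.add(n * 3600); break;
                case 'd': out.add(n * 86400); break;
                default: throw new IllegalArgumentException("unknown unit in " + token);
            }
        }
        return out;
    }

    /** Delay in seconds before retry number k (1-based), assuming level = k + 2. */
    static long delayBeforeRetry(int k) {
        List<Long> levels = parseLevels(MESSAGE_DELAY_LEVEL);
        int level = k + 2; // assumption: 1st retry -> level 3 (10s)
        if (k < 1 || level > levels.size()) {
            throw new IllegalArgumentException("no delay level for retry " + k);
        }
        return levels.get(level - 1);
    }
}
```

Summing all 16 retries gives 17,140 seconds, so under this assumption a message that keeps failing is retried over roughly 4 h 46 min before the retries are exhausted.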

Source: blog.csdn.net/qq_33762302/article/details/114772362