Analysis of the Principles of Kafka

Features of Kafka

High throughput, low latency: Kafka can process hundreds of thousands of messages per second, with latency as low as a few milliseconds

Scalability: a Kafka cluster supports hot expansion, so brokers can be added without downtime

Durability and reliability: messages are persisted to local disk, and data replication is supported to prevent data loss

Fault tolerance: the cluster tolerates node failures (with n replicas, up to n-1 nodes may fail)

High concurrency: thousands of clients can read and write at the same time

 

Some important design ideas of Kafka

The following gives a broad overview of Kafka's main design ideas, so that readers can get familiar with its features in a short time. Each of these features is examined in detail later.

 

Consumer group: consumers can be organized into groups, and each message is delivered to only one consumer within a group. If a message needs to be consumed by several consumers, those consumers must belong to different groups.
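
As an illustration, here is a minimal consumer sketch using the modern Java client (which postdates the 0.8.x-era details discussed below); the broker address, group id, and topic name are placeholders:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class GroupConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder address
            props.put("group.id", "my-group"); // consumers sharing this id split the partitions
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
                while (true) {
                    // each record goes to exactly one consumer in "my-group"; a consumer
                    // with a different group.id would receive the same records too
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                    for (ConsumerRecord<String, String> record : records)
                        System.out.printf("partition=%d offset=%d value=%s%n",
                                record.partition(), record.offset(), record.value());
                }
            }
        }
    }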

Message status: in Kafka, consumption state is kept on the consumer side. The broker does not track which message has been consumed by whom; it only records an offset (pointing to the next message to be consumed in the partition). This means that if a consumer manages its offsets carelessly, a message on the broker may be consumed multiple times.
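
Continuing the sketch above, the "consumed multiple times" caveat is visible in how offsets are committed; if the consumer crashes after processing records but before committing, the broker redelivers them on restart (at-least-once delivery). The process() handler here is hypothetical:

    props.put("enable.auto.commit", "false"); // the consumer, not the broker, decides when the offset moves

    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        for (ConsumerRecord<String, String> record : records)
            process(record); // hypothetical application handler
        // commit only after processing: a crash between process() and commitSync()
        // leaves the offset unrecorded, so the same messages are served again
        consumer.commitSync();
    }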

Message persistence: Kafka persists messages to the local file system while maintaining extremely high efficiency.

Message validity period: Kafka retains messages for a long time so that consumers can consume them multiple times; the details of the retention policy are configurable.
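
For example, retention can be bounded by time or by size in $KAFKA_HOME/config/server.properties (the values below are illustrative; the time-based default is 7 days):

    # delete log segments older than 7 days...
    log.retention.hours=168
    # ...or once a partition's log exceeds 1 GiB, whichever comes first
    log.retention.bytes=1073741824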

Batch sending: Kafka supports sending in batches (message sets) to improve transfer efficiency.
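
With the modern Java producer, batching is tuned with two settings (the 0.8.x-era producer used batch.num.messages instead); the values below are illustrative:

    # maximum bytes accumulated per partition before a batch is sent (default 16384)
    batch.size=16384
    # wait up to 5 ms for more records to fill the batch before sending
    linger.ms=5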

Push-and-pull: Kafka combines push and pull: the Producer pushes messages to the broker, while the Consumer pulls messages from the broker, so the production and consumption of messages are asynchronous.

The relationship between brokers in a Kafka cluster: it is not a master-slave relationship; every broker has the same status in the cluster, and broker nodes can be added or removed at will.

Load balancing: Kafka provides a metadata API to manage load across brokers (as of Kafka 0.8.x; 0.7.x relied mainly on ZooKeeper for load balancing).

Synchronous and asynchronous: the Producer can push messages asynchronously, which greatly improves the throughput of a Kafka system (synchronous versus asynchronous sending is controlled by a parameter).
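
With the modern Java client, one send() call covers both styles, and the caller decides which behavior it gets (producer construction is omitted and org.apache.kafka.clients.producer.* imports are assumed; the topic and contents are placeholders):

    ProducerRecord<String, String> record = new ProducerRecord<>("my-topic", "key", "value");

    // asynchronous: send() returns a Future immediately and the callback fires
    // when the broker responds, so the caller never blocks
    producer.send(record, (m, e) -> {
        if (e != null) e.printStackTrace();
    });

    // synchronous: blocking on the Future trades throughput for an immediate
    // success-or-failure answer (get() throws if the send failed)
    RecordMetadata meta = producer.send(record).get();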

Partitioning mechanism (partition): Kafka brokers support message partitioning. The Producer can decide which partition a message is sent to; within a partition, messages are kept in the order the Producer sent them. A topic can have multiple partitions, and the number of partitions is configurable. Partitioning is of great significance, as later sections will gradually show.
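
The ProducerRecord constructors of the Java client expose exactly this choice (topic and values are placeholders):

    // explicit: the Producer itself picks partition 0
    producer.send(new ProducerRecord<>("my-topic", 0, "key", "value"));

    // keyed: the default partitioner hashes the key, so records with the same
    // key always land in the same partition and keep their send order
    producer.send(new ProducerRecord<>("my-topic", "user-42", "value"));

    // no key or partition: the partitioner spreads records across partitions
    producer.send(new ProducerRecord<>("my-topic", "value"));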

Offline data loading: thanks to its support for scalable data persistence, Kafka is also very well suited to loading data into Hadoop or a data warehouse.

Plug-in support: an active community has developed many plug-ins that extend Kafka's functionality, such as integrations with Storm, Hadoop, and Flume.

 

 

Kafka HA design analysis

 

How to distribute all Replicas evenly across the cluster

In order to balance load better, Kafka tries to distribute all Partitions evenly across the entire cluster. A typical deployment has more Partitions per Topic than there are Brokers. At the same time, to improve fault tolerance, Replicas of the same Partition should be spread across different machines as much as possible: if all Replicas of a Partition sat on the same Broker, then once that Broker went down, none of the Partition's Replicas could work, and HA would not be achieved. Furthermore, if a Broker goes down, the load it carried should be distributed evenly over all the surviving Brokers.

 

Kafka allocates Replicas with the following algorithm (a code sketch follows the list):

 

Sort all Brokers (assuming a total of n Brokers) and Partitions to be allocated

Assign the i-th Partition to the (i mod n)-th Broker

Assign the j-th Replica of the i-th Partition to the ((i + j) mod n)-th Broker
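
A sketch of this simplified scheme (the real implementation also adds a random starting offset so that different Topics do not pile up on the same Brokers):

    import java.util.ArrayList;
    import java.util.List;

    // Brokers are numbered 0..n-1. Element j of each inner list is the Broker
    // holding Replica j of Partition i; j == 0 is the preferred Leader.
    static List<List<Integer>> assignReplicas(int partitions, int n, int replicationFactor) {
        List<List<Integer>> assignment = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            List<Integer> replicas = new ArrayList<>();
            for (int j = 0; j < replicationFactor; j++) {
                replicas.add((i + j) % n); // Replica 0 of Partition i lands on Broker i mod n
            }
            assignment.add(replicas);
        }
        return assignment;
    }

For 4 Partitions, 3 Brokers, and 2 Replicas this yields [0, 1], [1, 2], [2, 0], [0, 1]: Leaders rotate across the Brokers, and no Partition keeps both of its Replicas on one machine.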

 

 

Data Replication

 

Kafka's Data Replication needs to solve the following problems:

 

How to propagate a message

How many Replicas must have received a message before an ACK is sent to the Producer

How to deal with a Replica that stops working

How to deal with the recovery of a failed Replica

Propagate message

When the Producer publishes a message to a Partition, it first finds the Leader of that Partition through ZooKeeper. Then, no matter what the Replication Factor of the Topic is (that is, how many Replicas the Partition has), the Producer sends the message only to the Partition's Leader. The Leader writes the message to its local log, and each Follower pulls data from the Leader, so the order of data stored on a Follower is consistent with the Leader's. After a Follower receives the message and writes it to its log, it sends an ACK to the Leader. Once the Leader has received ACKs from all Replicas in the ISR, the message is considered committed; the Leader then advances the HW (high watermark) and sends an ACK to the Producer.
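
On the Producer side, how much of this acknowledgement chain to wait for is a single configurable setting; in the modern Java client it is called acks (the 0.8.x producer exposed the same choice as request.required.acks):

    # acks=0    fire-and-forget: do not wait for any broker ACK
    # acks=1    the send completes once the Leader has written the message to its log
    # acks=all  wait until every Replica in the ISR has ACKed, matching the
    #           committed semantics described above
    acks=all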

 

To improve performance, each Follower sends an ACK to the Leader as soon as it receives the data, rather than waiting until the data is written to its log. Therefore, for a committed message, Kafka can only guarantee that it is stored in the memory of multiple Replicas; it cannot guarantee that the message is persisted to disk, and so cannot fully guarantee that the message will still be consumable after a failure. Considering how rare such scenarios are, this approach strikes a good balance between performance and data durability. Future releases of Kafka may offer stronger durability.

 

Consumers also read messages from the Leader, and only committed messages (messages with an offset lower than the HW) are exposed to Consumers.

 

The Leader keeps track of the list of Replicas that stay in sync with it; this list is called the ISR (in-sync Replicas). If a Follower goes down or falls too far behind, the Leader removes it from the ISR. "Falling too far behind" means either that the number of messages the Follower has replicated lags the Leader by more than a threshold (configured through replica.lag.max.messages in $KAFKA_HOME/config/server.properties, default 4000), or that the Follower has not sent a fetch request to the Leader within a certain period (configured through replica.lag.time.max.ms in the same file, default 10000).
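
The two thresholds as they would appear in $KAFKA_HOME/config/server.properties, shown at their defaults (note that replica.lag.max.messages was removed in later Kafka versions, leaving only the time-based check):

    # drop a Follower from the ISR if it lags the Leader by more than 4000 messages
    replica.lag.max.messages=4000
    # ...or if it has sent no fetch request for 10 seconds
    replica.lag.time.max.ms=10000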

 

Kafka dynamically maintains the ISR (in-sync Replicas) set in ZooKeeper. Every Replica in the ISR has caught up with the Leader, and only members of the ISR can be elected Leader. In this model, with f+1 Replicas, a Partition can tolerate the failure of f Replicas without losing any committed message, which is very favorable for most usage scenarios. In fact, to tolerate f failures, Majority Vote and the ISR approach wait for the same number of Replicas to acknowledge before committing, but the total number of Replicas the ISR approach requires is almost half that of Majority Vote: for example, to survive two failures, Majority Vote needs 5 Replicas, while the ISR approach needs only 3.

 

 
