Kafka Partitioning Mechanism and Code Examples

In Kafka, a topic is a logical concept and a partition is a physical one. Both are transparent to the user: the producer only cares which topic it publishes messages to, and the consumer only cares which topic it subscribes to.

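Without partitions, all of a topic's messages would be concentrated on a single server. The storage capacity of that one node would quickly become a bottleneck, and so would throughput when reading from or writing to the topic.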

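For this reason, Kafka lets the producer assign a key to each message it produces. When the message is sent, a partitioning rule applied to that key selects the partition in which the message will be stored. If the partitioning rule is chosen sensibly, messages are distributed evenly across the partitions, which gives both load balancing and horizontal scalability. On the consumer side, a single consumer group can also consume from multiple partitions concurrently using multiple threads.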

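The partitioning rule is in fact a class that implements the kafka.producer.Partitioner interface. You can write your own implementation to match your business rules, for example by using a hash algorithm to decide how keys map to partitions.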

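In the class below, for example, we first take the hashCode of the key and then compute it modulo the number of partitions (numPartitions, taken from the topic configuration). The result is the partition in which the message is stored, which spreads the data evenly.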

The implementation of the custom Partitioner is as follows:

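package com.david.test.kafka.partition;

import kafka.producer.Partitioner;

/**
 * Created by david on 17-3-27.
 *
 * Maps a String key to a partition: key.hashCode() modulo the partition count.
 */
public class SimplePartitioner implements Partitioner {
    @Override
    public int partition(Object key, int numPartitions) {
        // Keys are expected to be Strings (the producer uses StringEncoder).
        String k = (String) key;

        // Take the remainder first and the absolute value second:
        // Math.abs(Integer.MIN_VALUE) is still negative, so applying abs
        // before the modulo could yield a negative partition index.
        return Math.abs(k.hashCode() % numPartitions);
    }
}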
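
When a partition index is derived from hashCode, the order in which Math.abs and % are applied matters. The following is a minimal standalone sketch, not part of the Kafka API: the class name AbsModuloOrder, the two helper methods and the partition count of 3 are made up for illustration. It compares the two orderings on the worst-case hash value Integer.MIN_VALUE.

```java
public class AbsModuloOrder {
    // Absolute value first, then remainder. Math.abs(Integer.MIN_VALUE)
    // overflows and stays negative, so the result can be negative too.
    static int absThenMod(int hash, int numPartitions) {
        return Math.abs(hash) % numPartitions;
    }

    // Remainder first, then absolute value. The remainder lies strictly
    // between -numPartitions and numPartitions, so abs is always safe.
    static int modThenAbs(int hash, int numPartitions) {
        return Math.abs(hash % numPartitions);
    }

    public static void main(String[] args) {
        int hash = Integer.MIN_VALUE; // worst-case hash code, chosen for illustration
        int numPartitions = 3;        // hypothetical partition count

        System.out.println("abs then mod: " + absThenMod(hash, numPartitions)); // -2
        System.out.println("mod then abs: " + modThenAbs(hash, numPartitions)); // 2
    }
}
```

For every other hash value the two orderings return the same index, so switching between them does not move existing keys to different partitions.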

The kafka-topics.sh script in the bin directory takes the following command-line parameter for setting the number of partitions:

--partitions <Integer: # of partitions> The number of partitions for the topic 
                                          being created or altered (WARNING:   
                                          If partitions are increased for a    
                                          topic that has a key, the partition  
                                          logic or ordering of the messages    
                                          will be affected

In addition, the num.partitions parameter in the broker configuration file server-XXXX.properties sets the global default number of partitions. A value given on the command line or in code overrides this global default.

Note that when choosing the number of partitions for a topic, it is best to use an integer multiple of the number of brokers, so that the topic's partitions can be spread evenly across the whole Kafka cluster. Suppose the Kafka cluster consists of 4 brokers. The following figure is an example:

Now create a topic "test_diy_partition" with 4 partitions. The 4 partitions will then be spread across the brokers, one on each:

./kafka-topics.sh \
--create \
--zookeeper localhost:2181,localhost:2182,localhost:2183 \
--replication-factor 1 \
--partitions 4 \
--topic test_diy_partition

In this way, all partitions are evenly distributed across the cluster. If only 3 partitions are specified when creating the topic, one broker will hold no partition of that topic.
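
The arithmetic behind this advice can be sketched in a few lines. The example below assumes plain round-robin placement starting at broker 0 and a replication factor of 1. This is a simplification for illustration: Kafka's actual assignment logic also handles replicas and may start at a different broker. The class and method names are hypothetical.

```java
import java.util.Arrays;

public class PartitionSpread {
    // Counts how many partitions each broker would hold if partitions were
    // placed round-robin, starting at broker 0 (simplified illustration).
    static int[] partitionsPerBroker(int numPartitions, int numBrokers) {
        int[] counts = new int[numBrokers];
        for (int p = 0; p < numPartitions; p++) {
            counts[p % numBrokers]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 4 partitions on 4 brokers: one partition per broker.
        System.out.println(Arrays.toString(partitionsPerBroker(4, 4))); // [1, 1, 1, 1]
        // 3 partitions on 4 brokers: the last broker holds nothing.
        System.out.println(Arrays.toString(partitionsPerBroker(3, 4))); // [1, 1, 1, 0]
        // 6 partitions on 4 brokers: not a multiple, so the load is uneven.
        System.out.println(Arrays.toString(partitionsPerBroker(6, 4))); // [2, 2, 1, 1]
    }
}
```

Only partition counts that are a multiple of the broker count give every broker the same share under this model.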

Producing messages with a custom partitioning rule

Now use a producer example (PartitionerProducer) to send messages to the topic test_diy_partition. The producer uses the SimplePartitioner above as its partitioning rule. It sends 11 messages in total, indexed 0 to 10. The key of each message is "key"+index and the message content is "key"+index+"--value"+index, for example key0--value0, key1--value1, ..., key10--value10.

package com.david.test.kafka.partition;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

import java.util.Properties;

/**
 * Created by david on 17-3-27.
 *
 * The producer uses the custom hash partitioner to write data evenly
 * across the partitions on the Kafka brokers.
 */
public class PartitionerProducer {

    public static void main(String[] args) {
        Properties props = new Properties();
        // Encode message keys and values as strings.
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        // Brokers the producer contacts to fetch topic metadata.
        props.put("metadata.broker.list", "localhost:9091,localhost:9092");
        // Route every message through the custom partitioner.
        props.put("partitioner.class", "com.david.test.kafka.partition.SimplePartitioner");

        Producer<String, String> producer = new Producer<String, String>(new ProducerConfig(props));

        String topic = "test_diy_partition";
        // Send 11 messages: key0--value0 through key10--value10.
        for (int i = 0; i <= 10; i++) {
            String k = "key" + i;
            String v = k + "--value" + i;
            producer.send(new KeyedMessage<String, String>(topic, k, v));
        }
        producer.close();
    }

}

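In theory, when the producer sends a message it follows the SimplePartitioner rule: it takes the hashCode of the key (key0, for example) and computes it modulo the number of partitions (4) to obtain the partition index: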

hashcode("key0") % 4 = 1

hashcode("key1") % 4 = 2

hashcode("key2") % 4 = 3

hashcode("key3") % 4 = 0

     ...

Each message is then sent to the corresponding partition.
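
The placement of all 11 keys can be checked without a running cluster, because String.hashCode is defined by the Java language specification and is therefore the same on every JVM. The sketch below is a standalone illustration: the class KeyPlacement and its methods are hypothetical names and it does not call any Kafka API. It only repeats the hash-modulo arithmetic for key0 through key10 with 4 partitions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class KeyPlacement {
    // Same arithmetic as a hash-modulo partitioner: the key's hashCode
    // modulo the partition count, made non-negative.
    static int partitionFor(String key, int numPartitions) {
        return Math.abs(key.hashCode() % numPartitions);
    }

    // Groups the keys "key0" .. "key<count-1>" by the partition they map to.
    static Map<Integer, List<String>> place(int count, int numPartitions) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (int i = 0; i < count; i++) {
            String key = "key" + i;
            byPartition
                    .computeIfAbsent(partitionFor(key, numPartitions), p -> new ArrayList<>())
                    .add(key);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        // Prints:
        // partition 0: [key3, key7]
        // partition 1: [key0, key4, key8]
        // partition 2: [key1, key5, key9, key10]
        // partition 3: [key2, key6]
        place(11, 4).forEach((partition, keys) ->
                System.out.println("partition " + partition + ": " + keys));
    }
}
```

With only 11 keys the split is 2, 3, 4 and 2 messages per partition. The distribution evens out as the number of distinct keys grows.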

Consuming data from the custom partitions

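The consumer code below is used for verification. As it consumes data, it prints the partition each message came from along with the message content: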

package com.david.test.kafka.partition;

import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

/**
 * Created by david on 17-3-27.
 *
 * The consumer polls the data stored in the partitions on the Kafka brokers.
 */
public class PartitionerConsumer {
    public static void main(String[] args) {
        String topic = "test_diy_partition";
        ConsumerConnector consumer =
                Consumer.createJavaConsumerConnector(createConsumerConfig());

        // Request a single stream (one consuming thread) for the topic.
        Map<String, Integer> topicMap = new HashMap<String, Integer>();
        topicMap.put(topic, 1);

        Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap =
                consumer.createMessageStreams(topicMap);

        KafkaStream<byte[], byte[]> stream = consumerMap.get(topic).get(0);

        ConsumerIterator<byte[], byte[]> it = stream.iterator();

        // hasNext() blocks until a message arrives, so this loop runs until the process is stopped.
        while (it.hasNext()) {
            MessageAndMetadata<byte[], byte[]> mam = it.next();
            // The Chinese text reads: "Consumer consumed data: [partition number: [N], stored message: [M]]".
            System.out.println("消费者消费数据: 【分区号: [" + mam.partition() + "], 存储的消息: ["
                    + new String(mam.message(), StandardCharsets.UTF_8) + "] 】");
        }
    }

    private static ConsumerConfig createConsumerConfig() {
        Properties props = new Properties();
        props.put("group.id", "test_diy_part_group");
        props.put("zookeeper.connect", "localhost:2181,localhost:2182,localhost:2183");
        props.put("zookeeper.session.timeout.ms", "400");
        props.put("zookeeper.sync.time.ms", "200");
        props.put("auto.commit.interval.ms", "1000");
        // Start from the earliest available offset when the group has no committed offset.
        props.put("auto.offset.reset", "smallest");

        return new ConsumerConfig(props);
    }
}

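Printing the results confirms the hash-based partition placement predicted earlier. Each output line reads "Consumer consumed data: [partition number: [N], stored message: [M]]":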

消费者消费数据: 【分区号: [2], 存储的消息: [key1--value1]
消费者消费数据: 【分区号: [2], 存储的消息: [key5--value5]
消费者消费数据: 【分区号: [2], 存储的消息: [key9--value9]
消费者消费数据: 【分区号: [2], 存储的消息: [key10--value10]
消费者消费数据: 【分区号: [0], 存储的消息: [key3--value3]
消费者消费数据: 【分区号: [0], 存储的消息: [key7--value7]
消费者消费数据: 【分区号: [1], 存储的消息: [key0--value0]
消费者消费数据: 【分区号: [1], 存储的消息: [key4--value4]
消费者消费数据: 【分区号: [1], 存储的消息: [key8--value8]
消费者消费数据: 【分区号: [3], 存储的消息: [key2--value2]
消费者消费数据: 【分区号: [3], 存储的消息: [key6--value6]
