Hand-rolling a simple KafkaClient with Netty

Hand-rolling a ZkClient with Netty

Hand-rolling a RedisClient with Netty

The previous two posts described how to hand-roll a ZkClient and a RedisClient with Netty. This post describes how to hand-roll a Kafka client, that is, the Kafka producer and consumer.

A rant about kafka's API design

As we all know, the Kafka client has been through one rewrite. Before 0.8, the producer and consumer were developed in Scala; later, for various reasons, that code was abandoned, and around version 0.9 the Kafka client was rewritten in Java.

Although the Java client is still widely used and doesn't have any major performance problems, after a few days of digging into the Kafka client APIs, I keep feeling that one day the client will be completely rewritten yet again, because the APIs are just too messy:

1. Multiple-version problem.
   Every API has several versions, and each API currently uses a different one.
   For example, with kafka-client 1.0.0 against broker version 2.3.0:
   METADATA (fetch topic metadata) has 1 version; the version currently in use is 1.
   PRODUCE (produce messages) has 6 versions; the version currently in use is 6.
   FETCH (fetch messages) has 5 versions; the version currently in use is 5.

2. The packet data structures are hugely complex.
   When we implement the produce packet later, you will see that it is nested 6 levels deep, i.e. it has 6 sub-structures.

That's my little rant above. Of course, it may simply be that my own level is insufficient, so I've failed to understand the design intent behind these APIs~~

kafka message format

Kafka has kept optimizing itself, so its message format has changed over time.

In the book <<Apache Kafka实战>> by Hu Xi (based on Kafka 1.0.0), the author introduces the three message formats that have existed so far: V0, V1 and V2. V0 and V1 have long been phased out because of various drawbacks; new versions of Kafka all use the V2 message format. The implementation in this post targets Kafka 2.3.0 and has been tested with V2-formatted messages.

So only the V2 message format is presented here.

Before introducing Kafka's message format, there is one more concept to understand: variable length. A conventional length field uses either 4 bytes or 8 bytes; either way, the number of bytes the field occupies is fixed. Kafka's v2 messages are different: borrowing from the Zig-zag encoding scheme, they can use fields of different lengths to represent different values.
Simply put, it works like this:

0 is used to represent 0
1 is used to represent -1
2 is used to represent 1
3 is used to represent -2
4 is used to represent 2
.....

The benefit is that numbers with small absolute values can be represented with fewer bytes, instead of every number occupying 4 or 8 bytes, which saves a great deal of space.

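To make "variable length" concrete, here is a minimal sketch of the zig-zag + varint scheme in Java, assuming Netty's ByteBuf (the helper names writeVarint/readVarint are mine, not necessarily the repo's):

import io.netty.buffer.ByteBuf;

// Minimal sketch of zig-zag + varint: small absolute values map to small
// unsigned numbers (0->0, -1->1, 1->2, -2->3, 2->4 ...), which are then
// written 7 bits per byte with a continuation bit.
public final class VarintSketch {

    /** Zig-zag encode a signed int, then write it 7 bits at a time. */
    public static void writeVarint(int value, ByteBuf buf) {
        int v = (value << 1) ^ (value >> 31);     // zig-zag encode
        while ((v & 0xffffff80) != 0) {           // more than 7 bits remaining?
            buf.writeByte((v & 0x7f) | 0x80);     // 7 data bits + continuation bit
            v >>>= 7;
        }
        buf.writeByte(v);                         // last byte, no continuation bit
    }

    /** Read a varint and zig-zag decode it back to a signed int. */
    public static int readVarint(ByteBuf buf) {
        int v = 0;
        int shift = 0;
        byte b;
        do {
            b = buf.readByte();
            v |= (b & 0x7f) << shift;             // accumulate 7 bits at a time
            shift += 7;
        } while ((b & 0x80) != 0);                // continuation bit set -> keep reading
        return (v >>> 1) ^ -(v & 1);              // zig-zag decode
    }
}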

Once you've grasped the v2 "variable length" concept, you can look at the following diagram of the Kafka message format (screenshot from the <<Apache Kafka实战>> book):

[figure: kafka v2 message format]
Let's go through these fields one by one:

  1. Total message length. As the name implies, the total length of this message, represented with Zig-zag encoding.
  2. Attributes. One byte (8 bits); the low 3 bits indicate the compression type, and the high 5 bits are reserved and unused.
     Since my implementation doesn't use compression, this field is always 0.
  3. Timestamp delta. Also Zig-zag encoded. The delta is relative to the timestamp of the first message in the message batch.
     Message batches are introduced shortly.
  4. Offset delta. Similar in meaning to the timestamp delta.
  5. Key length. Every Kafka message can carry a key; this is the number of bytes in the key.
  6. Key. The key of the Kafka message.
  7. Value size. Similar in meaning to the key length.
  8. Value. The content of the Kafka message.
  9. Header count. Kafka messages can also carry headers.
  10. Headers. The Kafka headers.

Feeling a bit lost at the third and fourth fields? Don't worry, keep reading and it will become clear.

Kafka does not send messages one at a time; it gathers multiple messages together and sends them in one go. This is the Kafka message batch.

And once a message batch has been sent to the broker, the broker does not unpack it either: the batch is handed intact to consumers and stored intact in the log file.

So understanding the message batch is essential for implementing both message production and consumption.

The format of a message batch is shown below:

[figure: kafka v2 message batch format]

Feeling a bit overwhelmed by so many fields cropping up at once? There's no way around it; let's look at them one by one.

First, the "messages" at the end are messages in the v2 format described above. There can be x of them, where x is the value of the second-to-last field, "message count".

The remaining fields:

1. Base offset
   The offset of the first message in the trailing "messages".
2. Length
   The length of the rest of the packet, i.e. "total length of the message batch" - 8 bytes (the base offset field) - 4 bytes (the length field itself).
3. Partition leader epoch
   Hardcoded to -1 in my implementation.
4. Version number
   This is the magic. We use V2 here, so it is 2.
5. CRC
   The CRC code over all of the following fields.
6. Attributes
   Same meaning as the attributes in the message format above.
7. Max offset delta
   The "offset delta" value of the last message.
8. Base timestamp
   The timestamp of the first message.
9. Max timestamp
   The timestamp of the last message.
10. The last three fields, pid, epoch and seq, are all transaction-related; we don't use them here, so they are all hardcoded to -1.

The "message" and "message batch" here are defined in my code as the beans KafkaMsgRecordV2 and KafkaMsgRecordBatch. If the text and figures above aren't clear, you can read along with the code for a deeper understanding; the GitHub address is at the end of this post.
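For orientation, here is a stripped-down sketch of what those two beans might hold, based purely on the field lists above (the field names are my guesses; the real definitions are in the repo):

import java.util.List;

// Hypothetical field layout, mirroring the v2 format described above.
// The real KafkaMsgRecordV2 / KafkaMsgRecordBatch in the repo may differ.
class KafkaMsgRecordV2 {
    // the total length, timestampDelta, offsetDelta, key length and value size
    // are all written as zig-zag varints when serialized
    byte attributes;       // no compression used here, so always 0
    long timestampDelta;   // relative to the batch's base timestamp
    int offsetDelta;       // relative to the batch's base offset
    byte[] key;            // may be null
    byte[] value;          // the message payload
    // headers omitted: this implementation doesn't use them
}

class KafkaMsgRecordBatch {
    long baseOffset;           // offset of the first message (8 bytes)
    int length;                // total batch length - 8 (baseOffset) - 4 (this field)
    int partitionLeaderEpoch;  // hardcoded to -1 here
    byte magic = 2;            // version number: v2
    int crc;                   // CRC over everything after this field
    short attributes;          // same meaning as the message-level attributes
    int maxOffsetDelta;        // offset delta of the last message
    long baseTimestamp;        // timestamp of the first message
    long maxTimestamp;         // timestamp of the last message
    long pid = -1;             // transaction-related, unused here
    short epoch = -1;          // transaction-related, unused here
    int seq = -1;              // transaction-related, unused here
    List<KafkaMsgRecordV2> records;
}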

If you've understood everything up to this point, great. But don't celebrate too early: as mentioned above, Kafka's produce request nests six layers of data structures, and these are only two of them. Four more layers are still waiting for us. Fortunately, those four layers are comparatively simple; the hardest part is already behind us.

requestHeader and responseHeader

Every Kafka API request carries a request header, and every API response body carries a response header. The requestHeader and responseHeader are shown below:

[figure: requestHeader format]

[figure: responseHeader format]

The response header is relatively simple: it contains just a correlationId. This id is sent by the client to the server and returned by the server intact; it plays the same role as zookeeper's xid.

Let's take a look at the requestHeader.

apiKey and apiVersion

public enum ApiKeys {
    /**
     * produce messages
     */
    PRODUCE(0, "Produce", (short) 5),

    /**
     * fetch messages
     */
    FETCH(1, "Fetch", (short) 6),

    /**
     * fetch metadata
     */
    METADATA(3, "Metadata", (short) 1);

    public final short id;

    public final String name;

    public short apiVersion;

    ApiKeys(int id, String name, short apiVersion) {
        this.id = (short) id;
        this.name = name;
        this.apiVersion = apiVersion;
    }
}

The id field in the code is the apiKey, and apiVersion is the apiVersion in the corresponding header. As ranted about at the beginning, each API uses a different version. In this implementation I have only implemented three APIs, but Kafka in fact provides dozens of them.

correlationId

The correlationId plays the same role as the xid in the ZkClient: it is mainly used to associate a request with its response. Kafka's response packet will contain this field.
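As an illustration of how a client can use this id (my own sketch, not necessarily how the repo does it): keep a map from correlationId to a pending future, and complete the future when a response carrying the same id comes back.

import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustration only: one way to correlate async responses with requests.
class PendingRequests {
    private final AtomicInteger nextCorrelationId = new AtomicInteger();
    private final Map<Integer, CompletableFuture<byte[]>> pending = new ConcurrentHashMap<>();

    /** Called before sending: allocate an id and register a future for it. */
    int register(CompletableFuture<byte[]> future) {
        int id = nextCorrelationId.incrementAndGet();
        pending.put(id, future);
        return id;
    }

    /** Called when a response arrives: the broker echoes the id back intact. */
    void complete(int correlationId, byte[] responseBody) {
        CompletableFuture<byte[]> f = pending.remove(correlationId);
        if (f != null) {
            f.complete(responseBody);
        }
    }
}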

clientIdLen and clientId

Both Kafka producers and consumers need to specify a clientId. In the official client, if we don't specify one, a clientId is generated automatically.

Finally, it is worth mentioning that clientIdLen is represented with two bytes. All string lengths in Kafka are represented with 2 bytes, which is different from Zookeeper.
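Putting the header fields together, serializing a requestHeader into a Netty ByteBuf might look roughly like this (a sketch based on the layout above, reusing the ApiKeys enum; the repo's KafkaRequestHeader.serializable() is the authoritative version):

import java.nio.charset.StandardCharsets;
import io.netty.buffer.ByteBuf;

// Sketch: write the four requestHeader fields in order.
class RequestHeaderWriter {
    static void writeRequestHeader(ByteBuf buf, ApiKeys api, int correlationId, String clientId) {
        buf.writeShort(api.id);                 // apiKey: 2 bytes
        buf.writeShort(api.apiVersion);         // apiVersion: 2 bytes
        buf.writeInt(correlationId);            // correlationId: 4 bytes
        byte[] clientIdBytes = clientId.getBytes(StandardCharsets.UTF_8);
        buf.writeShort(clientIdBytes.length);   // clientIdLen: kafka strings use a 2-byte length
        buf.writeBytes(clientIdBytes);          // clientId bytes
    }
}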

Producers

The producer logic is implemented in KafkaClient's send method:

  public ProduceResponse send(KafkaProduceConfig config, String topic, String key, String val)

As mentioned above, the producer's request packet is nested six layers deep, specifically as follows:

  1. ProduceRequest extends KafkaRequestHeader and holds a TopicProduceData object
  2. TopicProduceData holds a PartitionData object
  3. PartitionData holds a Record object
  4. Record holds a KafkaMsgRecordBatch object
  5. KafkaMsgRecordBatch holds KafkaMsgRecordV2 objects

As you can see, it is really a layer-by-layer wrapping of "broker info" => "topic info" => "partition info" => "record info" => "message batch" => "message".

I won't illustrate every field of the packet here; interested readers can follow the code, starting from the serialization, to understand Kafka's producer protocol. The logic is roughly as follows:


- ProduceRequest.serializable()
- KafkaRequestHeader.serializable()
    - TopicProduceData.serializable()
        - PartitionData.serializable()
            - Record.serializable()
               - KafkaMsgRecordBatch.serializable()
                   - KafkaMsgRecordV2.serializable()

After the chain of serializable() calls above, a ProduceRequest object is finally converted into a ByteBuf and sent to the Kafka broker, and a message has been successfully produced.
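The pattern behind that call chain is straightforward: each layer writes its own fields into the ByteBuf and then delegates to the object it holds. A schematic sketch (the interface and the field details are my own simplification, not the repo's actual classes):

import java.nio.charset.StandardCharsets;
import io.netty.buffer.ByteBuf;

// Schematic only: every layer implements the same idea -- write your own
// fields, then delegate to the object you hold.
interface Serializable2Buf {
    void serializable(ByteBuf buf);
}

class TopicProduceData implements Serializable2Buf {
    private final String topic;
    private final Serializable2Buf partitionData; // next layer down

    TopicProduceData(String topic, Serializable2Buf partitionData) {
        this.topic = topic;
        this.partitionData = partitionData;
    }

    @Override
    public void serializable(ByteBuf buf) {
        byte[] topicBytes = topic.getBytes(StandardCharsets.UTF_8);
        buf.writeShort(topicBytes.length);   // 2-byte string length, as noted earlier
        buf.writeBytes(topicBytes);
        partitionData.serializable(buf);     // PartitionData -> Record -> batch -> message
    }
}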

Consumers

The consumer logic is implemented in KafkaClient's poll method:

public Map<Integer, List<ConsumerRecord>> poll(KafkaConsumerConfig consumerConfig, String topic, int partition, long fetchOffset)

Compared with the producer, the consumer's request packet is relatively simple. It is likewise a wrapping process of "broker info" => "topic info" => "partition info".

As follows:

1. FetchRequest extends KafkaRequestHeader and holds a FetchTopicRequest object
2. FetchTopicRequest holds a FetchTopicPartitionRequest object


However, the consumer's response body is considerably more complex than the producer's.

As mentioned above, the producer sends the broker "message batches", and the broker does not parse them into individual messages; it saves them intact to the log and hands them untouched to consumers. So the work of parsing the messages naturally falls on the consumer's shoulders.

For details, see the KafkaClient#parseResp() method.
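As a rough idea of what that parsing involves, here is a simplified sketch of decoding a single v2 message from a ByteBuf, following the field order described earlier. readVarint is the helper from the zig-zag sketch and readVarlong is its 64-bit analogue; the ConsumerRecord constructor used here is hypothetical, and headers and error handling are omitted.

import io.netty.buffer.ByteBuf;

// Simplified sketch: decode one v2 message (field order as described above).
ConsumerRecord parseRecord(ByteBuf buf, long baseOffset, long baseTimestamp) {
    int totalLength = readVarint(buf);       // 1. total message length (read to advance the buffer)
    byte attributes = buf.readByte();        // 2. attributes, always 0 here (no compression)
    long timestampDelta = readVarlong(buf);  // 3. delta vs. the batch's base timestamp
    int offsetDelta = readVarint(buf);       // 4. delta vs. the batch's base offset

    int keyLen = readVarint(buf);            // 5. key length, -1 means no key
    byte[] key = keyLen < 0 ? null : new byte[keyLen];
    if (key != null) buf.readBytes(key);     // 6. key

    int valueLen = readVarint(buf);          // 7. value size
    byte[] value = valueLen < 0 ? null : new byte[valueLen];
    if (value != null) buf.readBytes(value); // 8. value

    int headerCount = readVarint(buf);       // 9. header count (10. headers skipped here)

    return new ConsumerRecord(baseOffset + offsetDelta,
            baseTimestamp + timestampDelta, key, value);
}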

Running the code

As with the ZkClient and RedisClient before, a kafkaClientTest is provided here as well, to make it easy to try things out and debug.

The tests cover several scenarios:

  1. Produce messages in kafkaClientTest, and consume them with Kafka's built-in kafka-console-consumer.sh

Producing messages:

    private final static String host = "localhost";
    private final static int port = 9092;
    private final static String topic = "testTopic1";

    @Test
    public void testProducer(){
        KafkaClient kafkaClient = new KafkaClient("producer-111", host, port);
        KafkaProduceConfig kafkaConfig = new KafkaProduceConfig();
        // Note: when ack is set to 0, the broker does not respond with any data,
        // but the message has in fact been delivered to the broker
        short ack = -1;
        kafkaConfig.setAck(ack);
        kafkaConfig.setTimeout(30000);
        ProduceResponse response = kafkaClient.send(kafkaConfig, topic, "testKey", "helloWorld1113");
        assert ack == 0 || response != null;
        System.out.println(new Gson().toJson(response));
    }

You can see the message being consumed in the console:

lhhMacBook-Air:bin$ sh kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic testTopic1
helloWorld1113
  2. Produce messages in kafkaClientTest (as in scenario 1), then consume them in kafkaClientTest as well:
    private final static String host = "localhost";
    private final static int port = 9092;
    private final static String topic = "testTopic1";

    @Test
    public void testConsumer(){
        // If this topic does not exist on the broker yet, consuming directly may fail;
        // you can fetch the metadata first, or produce a message first
        // testMetaData();
        // testProducer();
        KafkaClient kafkaClient = new KafkaClient("consumer-111", host, port);
        KafkaConsumerConfig consumerConfig = new KafkaConsumerConfig();
        consumerConfig.setMaxBytes(Integer.MAX_VALUE);
        consumerConfig.setMaxWaitTime(30000);
        consumerConfig.setMinBytes(1);
        Map<Integer, List<ConsumerRecord>> response = kafkaClient.poll(consumerConfig, topic, 0, 0L);
        assert response != null && response.size() > 0;
        Set<Map.Entry<Integer, List<ConsumerRecord>>> entrySet = response.entrySet();
        for(Map.Entry<Integer, List<ConsumerRecord>> entry : entrySet){
            Integer partition = entry.getKey();
            System.out.println("Data for partition " + partition + ":");
            for(ConsumerRecord consumerRecord : entry.getValue()){
                System.out.println(new Gson().toJson(consumerRecord));
            }
        }
    }

The console prints the messages we just produced (including earlier test messages), indicating that consumption succeeded:

Data for partition 0:
{"offset":0,"timeStamp":1573896186007,"key":"testKey","val":"helloWorld"}
{"offset":1,"timeStamp":1573896202787,"key":"testKey","val":"helloWorld"}
{"offset":2,"timeStamp":1573896309808,"key":"testKey","val":"helloWorld111"}
{"offset":3,"timeStamp":1573899639313,"key":"testKey","val":"helloWorld1113"}
{"offset":4,"timeStamp":1574011584095,"key":"testKey","val":"helloWorld1113"}
  3. Produce messages with kafka-console-producer.sh, and consume them in kafkaClientTest:

Producing messages:

lhhMacBook-Air:bin$ sh kafka-console-producer.sh --broker-list localhost:9092 --topic testTopic222
>hi
>h

The consumer prints the output, showing that consumption succeeded:

Data for partition 0:
{"offset":0,"timeStamp":1574012251856,"val":"hi"}
{"offset":1,"timeStamp":1574012270368,"val":"h"}

Source

Finally, here is the GitHub source address:

github.com/NorthWard/a…

Interested readers are welcome to take a look, so we can learn and improve together.

Origin juejin.im/post/5ddb5605e51d4523551669b3