Consumer mechanism

1 Introduction

In Kafka, the consumer is considerably more complex than the producer. The producer has no concept of a producer group and does not need to track offsets; the consumer, by contrast, has a lot to take care of: it must distribute consumption across instances (consumer groups), and it must track consumption offsets to avoid consuming data repeatedly or missing data. These requirements place high demands on the design of both the consumer and the server. This chapter describes the mechanisms behind the consumer.

2 Usage example

First, let's look at a simple demo of how the consumer client is used.

 

 

It consists of roughly four steps:

  • configure a Properties object with the consumer settings;
  • create a KafkaConsumer object;
  • subscribe to the desired topic list;
  • call the consumer's poll method to pull the subscribed messages.

The first two steps merely create a consumer object at the bottom layer, and the third step only records information about the subscribed topics. The actual consumption happens in the fourth step, inside the poll method, so the poll model is very important for understanding the consumer design.
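A minimal sketch of those four steps against the standard kafka-clients Java API (the broker address, group.id, and topic name here are placeholders, and running it requires a reachable broker):

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerDemo {
    public static void main(String[] args) {
        // Step 1: configure Properties with the consumer settings
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        props.put("group.id", "demo-group");                // placeholder group.id
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        // Step 2: create the KafkaConsumer object
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Step 3: subscribe to the topic list (only recorded locally at this point)
            consumer.subscribe(Collections.singletonList("demo-topic"));

            // Step 4: poll — this is where consumption actually happens
            while (true) {
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofMillis(1000));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d, key=%s, value=%s%n",
                                      record.offset(), record.key(), record.value());
                }
            }
        }
    }
}
```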

 

3 The consumer's two subscription models

In Kafka, under normal circumstances, consumers sharing the same group.id will not consume the same partition; that is, at any given time a partition can only be consumed by one consumer within the same group.id. This mechanism guarantees two important Kafka properties:

  • 1. throughput can be improved by adding partitions and consumers;
  • 2. the same message will not be consumed more than once within a group.
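To make the "one partition, one consumer per group" rule concrete, here is a toy model (plain Java, not Kafka code) of how a range-style assignor might spread a topic's partitions over the group members; note that every partition lands on exactly one consumer:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RangeAssignSketch {
    // Assign numPartitions partitions of one topic across the given consumers,
    // range-style: earlier consumers get one extra partition when the count
    // does not divide evenly.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int per = numPartitions / consumers.size();
        int extra = numPartitions % consumers.size();
        int next = 0;
        for (int i = 0; i < consumers.size(); i++) {
            int count = per + (i < extra ? 1 : 0);
            List<Integer> parts = new ArrayList<>();
            for (int j = 0; j < count; j++) parts.add(next++);
            result.put(consumers.get(i), parts);
        }
        return result;
    }

    public static void main(String[] args) {
        // 5 partitions, 2 consumers: {c0=[0, 1, 2], c1=[3, 4]} — no partition is shared
        System.out.println(assign(List.of("c0", "c1"), 5));
    }
}
```

Adding partitions (and consumers to cover them) widens the map, which is exactly how throughput scales.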

In the KafkaConsumer class (the official API), clients can specify the topic-partitions to consume in two ways: assign and subscribe. See the source below for the specifics.

The two interfaces seem to do the same thing, but there are subtle differences that can confuse first-time users; the differences between the two are described in detail below.

In the new consumer, the subscribe model is called the subscription model. KafkaConsumer provides three subscribe APIs, as follows:

 

 

These three APIs all subscribe at the topic level; the topic's partitions are assigned dynamically. This is dynamic group management, and it cannot be mixed with manual partition management. A group rebalance is triggered when any of the following events is observed:

  • the subscribed topic list changes;
  • a topic is created or deleted;
  • a consumer instance in the group dies;
  • a new consumer instance joins the group.

In this mode, when KafkaConsumer's pollOnce method is called, the first step is to join the group and obtain the topic-partitions assigned to it.

Here is what the consumer does when subscribe() is called, covering both cases: subscription by topic list and subscription by pattern.

1. Topic-list subscription

  • update the subscription field recorded in SubscriptionState (the list of subscribed topics) and set the SubscriptionType to AUTO_TOPICS;
  • update the topic list (the topics variable) in Metadata, and request a metadata update;

2. Pattern subscription

  • update the subscribedPattern field recorded in SubscriptionState to the given pattern, and set the SubscriptionType to AUTO_PATTERN;
  • set needMetadataForAllTopics to true in Metadata, i.e. a metadata request will refresh the metadata of all topics, then request a metadata update;
  • call the coordinator.updatePatternSubscription() method, which walks the metadata of all topics, finds every topic that matches the pattern, and updates the subscriptions in SubscriptionState and the topics in Metadata;
  • call addMetadataListener() in ConsumerCoordinator to register a listener on Metadata; on every metadata update the method from the previous step is called again, but a rebalance is triggered only when the locally cached topic list differs from the topic list that should now be subscribed.

Otherwise the two modes are basically the same, except that in pattern mode every topic-metadata update fetches the global topic list; if a newly matching topic is discovered, it is subscribed immediately. Everything else, including group management and topic-partition assignment, is identical.
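The pattern path can be pictured with plain java.util.regex (a sketch of the idea behind updatePatternSubscription, not Kafka's actual code): each metadata refresh re-filters the full topic list, so a newly created matching topic is picked up on the next refresh:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class PatternSubscriptionSketch {
    // Re-derive the subscribed topic list from the full cluster topic list,
    // the way AUTO_PATTERN mode does on every metadata update.
    static List<String> matchingTopics(Pattern pattern, List<String> allTopics) {
        return allTopics.stream()
                        .filter(t -> pattern.matcher(t).matches())
                        .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("order-.*");
        // Before the new topic exists:
        System.out.println(matchingTopics(p, List.of("order-eu", "audit")));
        // After "order-us" is created, the next metadata update picks it up;
        // the changed list is what triggers the rebalance:
        System.out.println(matchingTopics(p, List.of("order-eu", "audit", "order-us")));
    }
}
```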

Now let's look at the assign() model. When a list of topic-partitions is assigned manually by calling assign(), the consumer does not use the group management mechanism; that is, when consumer group membership changes or topic metadata changes, no rebalance is triggered. For example, when partitions are added to a topic, the consumer will not notice; the user must handle this appropriately. This is the approach Apache Flink uses.

 

 

In assign mode, i.e. when the mode is neither AUTO_TOPICS nor AUTO_PATTERN, calling the consumer instance's poll method does not send join-group/sync-group/heartbeat requests to the GroupCoordinator. In other words, the GroupCoordinator knows nothing about such consumer instances and does not track whether their members are alive, so the user has to manage this themselves. Offsets can still be committed in this mode, however.
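A minimal assign-mode sketch against the kafka-clients API (topic, partitions, and broker address are placeholders); offsets can still be committed, but nobody rebalances for you:

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class AssignDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // Must not collide with a group.id used in subscribe mode,
        // otherwise offset commits will be rejected.
        props.put("group.id", "assign-only-group");
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Manually pick the partitions: no join-group/sync-group/heartbeat is sent.
            consumer.assign(Arrays.asList(new TopicPartition("demo-topic", 0),
                                          new TopicPartition("demo-topic", 1)));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
            System.out.println("fetched " + records.count() + " records");
            // Committing offsets still works in assign mode.
            consumer.commitSync();
        }
    }
}
```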

A brief summary:

| mode | difference | in common |
|------|------------|-----------|
| subscribe() | uses Kafka's group management; rebalance is performed automatically | offsets can be saved in Kafka |
| assign() | the user handles group membership and partition changes themselves | offsets can also be committed, but try to keep the group.id unique; if it collides with a group.id used in subscribe mode, the offset-commit request will be rejected |

 

4 The consumer poll model

 

 

The consumer's poll method mainly does the following:

  • check that the consumer has subscribed to the corresponding topic-partitions;
  • call the pollOnce() method to get the corresponding records;
  • before returning the fetched records, send the next fetch request, so that the user's next call does not block inside pollOnce();
  • if no usable records become available within the given timeout, return empty data.

The process can be expressed with the following pseudocode:

poll(timeout) {

    estimate the remaining time from the poll(timeout) argument

    while (there is time remaining)

      pull the messages already fetched by the Fetcher

      if (the messages are not empty)

         create the next fetch request

         send it immediately, then return the messages

      end   // if ends

      recompute the remaining time

    end  // while ends

    return empty

}
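The timeout loop above can be put in runnable shape with a stand-in for pollOnce (a toy model, not the real Fetcher):

```java
import java.util.Iterator;
import java.util.List;

public class PollLoopSketch {
    // Stand-in for pollOnce(): each call returns the next pre-baked batch;
    // empty batches simulate rounds where the Fetcher had nothing ready.
    private final Iterator<List<String>> batches;

    PollLoopSketch(List<List<String>> batches) {
        this.batches = batches.iterator();
    }

    private List<String> pollOnce() {
        return batches.hasNext() ? batches.next() : List.of();
    }

    // Mirrors poll(timeout): keep calling pollOnce() until records arrive
    // or the deadline passes, then return the (possibly empty) records.
    List<String> poll(long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        do {
            List<String> records = pollOnce();
            if (!records.isEmpty()) {
                // the real code would send the next fetch request here before returning
                return records;
            }
        } while (System.currentTimeMillis() < deadline);
        return List.of();
    }

    public static void main(String[] args) {
        PollLoopSketch c = new PollLoopSketch(List.of(List.of(), List.of("m1", "m2")));
        // First round is empty, second round returns [m1, m2]
        System.out.println(c.poll(100));
    }
}
```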

As you can see, the real implementation of poll lives in the pollOnce method; poll obtains its available data through pollOnce. Let's look at the pollOnce() source directly:

 

 

pollOnce can be broken down into six steps, whose roles are as follows:

  • coordinator.poll(): obtains the GroupCoordinator's address, establishes the TCP connection, and sends join-group and sync-group requests; only after this has the consumer truly joined a group, at which point it receives the list of topic-partitions it is to consume. If auto-commit is enabled, the commit also happens in this step. For a newly created group, the group state moves Empty -> PreparingRebalance -> AwaitSync -> Stable;
  • updateFetchPositions(): the previous step obtained the topic-partition list this consumer instance subscribes to; this step updates their fetch-position offsets so that fetching can start;
  • fetcher.fetchedRecords(): returns the fetched records and advances the fetch-position offset; the committed offset is only updated on an offset commit (with auto-commit, that is done in step 1);
  • fetcher.sendFetches(): for every subscribed topic-partition with no outstanding fetch request, send a fetch request; the requests are actually sent per node, so topic-partitions whose leader is the same node are merged into one request;
  • client.poll(): calls the interface provided by the underlying NetworkClient to send the corresponding requests;
  • coordinator.needRejoin(): if the topic-partition list assigned to the current instance has changed, the consumer group needs to rebalance.

 

5 How a consumer joins a consumer group

First, a brief introduction to the GroupCoordinator role (a later article will cover it in depth). The GroupCoordinator is a service that runs on the Kafka brokers; every broker starts one at runtime. Which broker's GroupCoordinator a particular consumer interacts with depends on the __consumer_offsets topic, which must be introduced first. __consumer_offsets is a topic Kafka uses internally to store each group's consumption state; by default it has 50 partitions, each with three replicas, as shown below (only 30 partitions are listed):

 

 

The GroupCoordinator is responsible for consumer-group membership management and offset management. Every consumer group has a corresponding GroupCoordinator, and which one it is depends on the hash of the group.id: a value is computed as abs(GroupId.hashCode()) % NumPartitions (where NumPartitions is the number of partitions of __consumer_offsets, 50 by default). That value identifies one partition of __consumer_offsets, and the leader of that partition is the node hosting the GroupCoordinator this group interacts with.
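The partition lookup is easy to reproduce in plain Java; note that Kafka's abs masks the sign bit rather than using Math.abs, which would misbehave on Integer.MIN_VALUE:

```java
public class CoordinatorPartitionSketch {
    // abs(groupId.hashCode()) % numPartitions, with a Kafka-style abs:
    // masking the sign bit never yields a negative value, unlike
    // Math.abs(Integer.MIN_VALUE).
    static int coordinatorPartition(String groupId, int numPartitions) {
        return (groupId.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // With the default 50 partitions of __consumer_offsets, the leader of
        // this partition hosts the GroupCoordinator for the group.
        System.out.println(coordinatorPartition("demo-group", 50));
    }
}
```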

6 Consumer offset commit

6.1 Client-side commit request handling

There are two commit mechanisms: synchronous commit and asynchronous commit.

 

 

 

 

In the synchronous implementation, the client.poll() call blocks and does not return until the request completes or times out.

 

 

All asynchronous commits eventually call the doCommitOffsetsAsync method, implemented as follows:

 

 

With an asynchronous commit you can register a callback; when the request succeeds or fails, the ConsumerCoordinator wakes the callback via the invokeCompletedOffsetCommitCallbacks() method.
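In API terms, the two mechanisms look like this (a sketch against the kafka-clients API; `consumer` stands for an already-subscribed KafkaConsumer instance):

```java
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class CommitSketch {
    static void commitBothWays(KafkaConsumer<String, String> consumer) {
        // Synchronous: blocks (client.poll() underneath) until the request
        // completes, times out, or fails with a non-retriable error.
        consumer.commitSync();

        // Asynchronous: returns immediately; the callback is invoked later,
        // from invokeCompletedOffsetCommitCallbacks() during a subsequent poll().
        consumer.commitAsync((offsets, exception) -> {
            if (exception != null) {
                System.err.println("commit failed: " + exception);
            } else {
                System.out.println("committed: " + offsets);
            }
        });
    }
}
```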

 

6.2 Server-side commit-offset request handling

When the Kafka server receives an OffsetCommit request from a client, it handles it as shown below; the logic is implemented in kafka.coordinator.GroupCoordinator.

 

 

 

The processing goes as follows:

  • if the group does not exist yet (the groupManager has no information about it) and the generation is -1 (which is normally the case), create a new GroupMetadata with the group state Empty;
  • now that the group exists, call doCommitOffsets() to commit the offset;
  • if the request comes from assign mode and the corresponding group's state is Empty (generationId < 0 && group.is(Empty)), record the offset;
  • if the request comes from assign mode but the group's state is not Empty (!group.has(memberId)), the group is already active; a group in assign mode can never be in the active state, so the group.id used in assign mode must be the same as one used in subscribe mode, and in that case the offset-commit request from assign mode is rejected.
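The branch logic can be modeled as a tiny pure-Java function (a toy restatement of the cases above, not broker code):

```java
public class OffsetCommitDecisionSketch {
    enum Result { CREATE_GROUP_AND_STORE, STORE, REJECT }

    // generationId < 0 marks an assign-mode (non-group-managed) committer.
    static Result handleCommit(boolean groupExists, int generationId,
                               boolean groupIsEmpty, boolean memberKnown) {
        if (!groupExists) {
            // Unknown group with generation -1: create GroupMetadata in Empty state.
            return generationId < 0 ? Result.CREATE_GROUP_AND_STORE : Result.REJECT;
        }
        if (generationId < 0 && groupIsEmpty) {
            // assign-mode commit against an inactive group: accept it.
            return Result.STORE;
        }
        if (!memberKnown) {
            // assign-mode commit, but the group is active: the group.id must be
            // shared with a subscribe-mode group, so the commit is rejected.
            return Result.REJECT;
        }
        return Result.STORE;  // normal subscribe-mode commit
    }

    public static void main(String[] args) {
        System.out.println(handleCommit(false, -1, true, false));  // CREATE_GROUP_AND_STORE
        System.out.println(handleCommit(true, -1, false, false));  // REJECT
    }
}
```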

 

7 Summary

Overall consumer flow

 

 

References:

http://matt33.com/2017/11/11/consumer-pollonce/

https://blog.csdn.net/zhanyuanlin/article/details/76269308

https://juejin.im/post/5c0bd405e51d45524146e05c

 

Origin www.cnblogs.com/zhy-heaven/p/10993961.html