9. Kafka Series: Design Philosophy (Part 7) - Quotas

4.9 Quotas

A Kafka cluster can enforce quotas on requests to control the broker resources used by clients. Kafka brokers can enforce two types of client quotas for each group of clients sharing a quota:

1. Network bandwidth quotas define byte-rate thresholds (since 0.9)
2. Request rate quotas define CPU utilization thresholds as a percentage of network and I/O threads (since 0.11)

Why are quotas necessary?

Producers and consumers can produce or consume very high volumes of data, or generate requests at a very high rate, and thus monopolize broker resources, saturate the network, and effectively mount a denial-of-service (DoS) attack on other clients and on the brokers themselves. Quotas protect against these problems. They matter most in large multi-tenant clusters, where a small set of badly behaved clients can degrade the experience of the well-behaved ones. When Kafka is run as a service, quotas also make it possible to enforce API limits according to an agreed-upon contract.

Client groups

The identity of a Kafka client is the user principal, which represents an authenticated user in a secure cluster. In a cluster that supports unauthenticated clients, the user principal is a grouping of unauthenticated users chosen by the broker using a configurable PrincipalBuilder. The client-id is a logical grouping of clients with a meaningful name chosen by the client application. The tuple (user, client-id) defines a secure logical group of clients that share both user principal and client-id.

Quotas can be applied to (user, client-id) groups, user groups, or client-id groups. For a given connection, the most specific quota that matches the connection is applied. All connections in a quota group share the quota configured for that group. For example, if (user="test-user", client-id="test-client") has a produce quota of 10 MB/sec, that 10 MB/sec is shared across all producer instances of user "test-user" with the client-id "test-client".

Quota Configuration

Quota configuration may be defined for (user, client-id), user, and client-id groups. The default quota can be overridden at any quota level that needs a higher (or lower) quota. The mechanism is similar to the per-topic log config overrides. User and (user, client-id) quota overrides are written to ZooKeeper under /config/users, and client-id quota overrides are written under /config/clients. All brokers read these overrides, and they take effect immediately, so quotas can be changed without a rolling restart of the entire cluster. Default quotas for each group can also be updated dynamically using the same mechanism.

The order of precedence for quota configuration is:

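  1. /config/users/<user>/clients/<client-id>
  2. /config/users/<user>/clients/<default>
  3. /config/users/<user>
  4. /config/users/<default>/clients/<client-id>
  5. /config/users/<default>/clients/<default>
  6. /config/users/<default>
  7. /config/clients/<client-id>
  8. /config/clients/<default>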
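
The precedence order can be read as a first-match lookup from the most specific entry to the least specific. The sketch below illustrates that reading. It assumes the eight levels follow the pattern used in the Kafka documentation, with `<default>` standing for the default entity at each level. The `overrides` dictionary, the `candidate_paths` and `resolve_quota` helpers, and the sample byte rates are hypothetical stand-ins for whatever the brokers actually read from the configuration store, not a description of the broker's implementation.

```python
from typing import Dict, List, Optional, Tuple

DEFAULT = "<default>"  # placeholder for the default entity (assumption)

QuotaConfig = Dict[str, float]


def candidate_paths(user: str, client_id: str) -> List[str]:
    """Return the config paths to check for one connection, most specific first."""
    return [
        f"/config/users/{user}/clients/{client_id}",
        f"/config/users/{user}/clients/{DEFAULT}",
        f"/config/users/{user}",
        f"/config/users/{DEFAULT}/clients/{client_id}",
        f"/config/users/{DEFAULT}/clients/{DEFAULT}",
        f"/config/users/{DEFAULT}",
        f"/config/clients/{client_id}",
        f"/config/clients/{DEFAULT}",
    ]


def resolve_quota(
    overrides: Dict[str, QuotaConfig], user: str, client_id: str
) -> Optional[Tuple[str, QuotaConfig]]:
    """Return (path, config) for the first matching level, or None if nothing matches."""
    for path in candidate_paths(user, client_id):
        if path in overrides:
            return path, overrides[path]
    return None


if __name__ == "__main__":
    # Hypothetical overrides; producer_byte_rate is in bytes/sec.
    overrides = {
        "/config/users/<default>": {"producer_byte_rate": 1_048_576},
        "/config/users/test-user": {"producer_byte_rate": 5_242_880},
        "/config/users/test-user/clients/test-client": {"producer_byte_rate": 10_485_760},
    }
    print(resolve_quota(overrides, "test-user", "test-client"))
    print(resolve_quota(overrides, "test-user", "another-client"))
    print(resolve_quota(overrides, "some-other-user", "another-client"))
```

In this sketch a connection from "test-user" with an unlisted client-id falls back to the user-level entry, and a connection from any other user falls back to the default user entry.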

Network Bandwidth Quotas

Network bandwidth quotas are defined as the byte-rate threshold for each group of clients sharing a quota. By default, each unique client group receives a fixed quota in bytes/sec as configured by the cluster. This quota is defined on a per-broker basis: each group of clients can publish or fetch at most X bytes/sec per broker before its clients are throttled.
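
Because the threshold applies per broker, a client group can be throttled on one broker while its total traffic across the cluster is still modest. The following is a minimal sketch of that consequence. The function name `over_quota_brokers`, the broker names, and the byte rates are made up for illustration.

```python
from typing import Dict, List


def over_quota_brokers(per_broker_quota: float, rate_by_broker: Dict[str, float]) -> List[str]:
    """Return the brokers on which a client group exceeds its per-broker byte rate."""
    return sorted(
        broker for broker, rate in rate_by_broker.items() if rate > per_broker_quota
    )


if __name__ == "__main__":
    quota = 10_000_000  # hypothetical quota: 10 MB/sec per broker
    # Hypothetical traffic: most partition leaders happen to live on broker-1.
    rates = {"broker-1": 12_000_000, "broker-2": 3_000_000, "broker-3": 3_000_000}
    print(over_quota_brokers(quota, rates))
    # The aggregate rate is well below quota * number of brokers,
    # yet the group would still be throttled on broker-1.
    print(sum(rates.values()), quota * len(rates))
```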

Request Rate Quotas

Request rate quotas are defined as the percentage of time a client can use on the request handler I/O threads and network threads of each broker within a quota window. A quota of n% represents n% of one thread, so the quota is measured against a total capacity of ((num.io.threads + num.network.threads) * 100)%. Each group of clients may use up to n% in total across all I/O and network threads in a quota window before being throttled. Since the number of I/O and network threads is typically based on the number of cores available on the broker host, request rate quotas represent the total percentage of CPU that may be used by each group of clients sharing the quota.
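
The capacity formula is easy to misread, so here is a small worked calculation. The thread counts (8 I/O threads, 3 network threads) and the 200% quota are illustrative values chosen for the example, and the helper names are hypothetical.

```python
def total_capacity_percent(num_io_threads: int, num_network_threads: int) -> int:
    """Total request-handling capacity of one broker, where 100% equals one thread."""
    return (num_io_threads + num_network_threads) * 100


def share_of_broker(quota_percent: float, num_io_threads: int, num_network_threads: int) -> float:
    """Fraction of the broker's total thread time that a quota of quota_percent allows."""
    return quota_percent / total_capacity_percent(num_io_threads, num_network_threads)


if __name__ == "__main__":
    capacity = total_capacity_percent(num_io_threads=8, num_network_threads=3)
    print(capacity)  # (8 + 3) * 100 = 1100
    # A 200% quota is two threads' worth of time out of eleven threads.
    print(round(share_of_broker(200, 8, 3), 4))
```

Under these assumptions a quota of 200% lets a client group keep the equivalent of two threads fully busy, which is 2/11 of the broker's request-handling capacity.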

Enforcement

By default, each unique client group receives a fixed quota as configured by the cluster. This quota is defined on a per-broker basis: each client can use this quota on each broker before it gets throttled. We decided that defining quotas per broker is much better than having a fixed cluster-wide bandwidth per client, because a cluster-wide quota would require a mechanism to share client quota usage among all the brokers. That can be harder to get right than the quota implementation itself!

How does a broker react when it detects a quota violation? In our solution, the broker first computes the delay needed to bring the violating client back under its quota and immediately returns a response that carries this delay. For a fetch request, the response contains no data. The broker then mutes the channel to the client and stops processing requests from that client until the delay is over. On receiving a response with a non-zero delay, the Kafka client also refrains from sending further requests to the broker during the delay. Requests from a throttled client are therefore effectively blocked from both sides. Even with older client implementations that do not respect the delay returned by the broker, the back pressure the broker applies by muting its socket channel still throttles badly behaved clients: a client that sends further requests on the throttled channel receives responses only after the delay is over.
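
One simple way to reason about "the delay needed to bring the violating client under its quota" is the following. If a client achieved an observed rate O over a window of length W and the quota is T, then stretching the same amount of work over W + X brings the rate to O * W / (W + X). Setting that equal to T gives X = (O - T) / T * W. The sketch below implements this formula as an illustration only. The function name and the sample numbers are hypothetical, and the broker's actual computation may differ in detail.

```python
def throttle_delay_ms(observed_rate: float, quota: float, window_ms: float) -> float:
    """Delay that brings the average rate down to the quota.

    Solves observed_rate * window_ms / (window_ms + delay) == quota for delay.
    Returns 0.0 when the client is already within its quota.
    """
    if quota <= 0:
        raise ValueError("quota must be positive")
    if observed_rate <= quota:
        return 0.0
    return (observed_rate - quota) / quota * window_ms


if __name__ == "__main__":
    # Hypothetical numbers: 15 MB/sec observed against a 10 MB/sec quota,
    # measured over a 10-second window.
    delay = throttle_delay_ms(observed_rate=15.0, quota=10.0, window_ms=10_000.0)
    print(delay)
    # Sanity check: the same bytes spread over window + delay match the quota.
    print(15.0 * 10_000.0 / (10_000.0 + delay))
```

With these numbers the delay is half the window length, and spreading the same bytes over the window plus the delay yields exactly the quota rate.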

Byte rate and thread utilization are measured over multiple small windows (e.g. 30 windows of 1 second each) so that quota violations are detected and corrected quickly. Large measurement windows (e.g. 10 windows of 30 seconds each) typically lead to large bursts of traffic followed by long delays, which makes for a poor user experience.
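
To make the idea of measuring over many small windows concrete, here is a minimal sketch of a windowed rate tracker. It keeps one running total per fixed-size window, drops windows older than the configured span, and reports the average rate over that span. The class name `WindowedRate`, its parameters, and the choice to divide by the full span are assumptions of this sketch, not a description of the broker's metrics code. It also assumes timestamps are passed in non-decreasing order.

```python
from collections import deque
from typing import Deque, List


class WindowedRate:
    """Average rate over the last num_windows windows of window_seconds each."""

    def __init__(self, num_windows: int = 30, window_seconds: float = 1.0) -> None:
        self.num_windows = num_windows
        self.window_seconds = window_seconds
        # Each entry is [window_index, total_amount_recorded_in_that_window].
        self._samples: Deque[List[float]] = deque()

    def _expire(self, current_index: int) -> None:
        """Drop windows that have fallen out of the measured span."""
        oldest_allowed = current_index - self.num_windows + 1
        while self._samples and self._samples[0][0] < oldest_allowed:
            self._samples.popleft()

    def record(self, now: float, amount: float) -> None:
        """Add amount (bytes or thread time) observed at time now, in seconds."""
        index = int(now // self.window_seconds)
        if self._samples and self._samples[-1][0] == index:
            self._samples[-1][1] += amount
        else:
            self._samples.append([index, amount])
        self._expire(index)

    def rate(self, now: float) -> float:
        """Average amount per second over the full measured span."""
        self._expire(int(now // self.window_seconds))
        total = sum(amount for _, amount in self._samples)
        return total / (self.num_windows * self.window_seconds)


if __name__ == "__main__":
    tracker = WindowedRate(num_windows=3, window_seconds=1.0)
    tracker.record(0.0, 30.0)
    tracker.record(1.5, 30.0)
    print(tracker.rate(2.0))   # both samples are still inside the 3-second span
    print(tracker.rate(3.2))   # the sample from window 0 has expired
```

With short windows an old burst ages out quickly, so a client that has slowed down is released from throttling sooner than it would be with a few long windows.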

Reposted from blog.csdn.net/SJshenjian/article/details/130353047