Table of Contents
- 1. Summary of common commands
  - 1) kafka-topics.sh: common commands for managing topics (create, delete, list, describe, alter)
  - 2) kafka-consumer-groups.sh: common commands for working with consumer groups
  - 3) kafka-consumer-offset-checker.sh: common commands for checking offset information (note: deprecated since 0.9; kafka-consumer-groups.sh should be used instead)
  - 4) kafka-configs.sh: common commands for adding/modifying Kafka cluster configuration
  - 5) kafka-reassign-partitions.sh: common commands for rebalancing partition load across brokers
- 2. Kafka configuration walkthrough (overview of all configuration options)
- 3. Kafka performance monitoring with Eagle (optional; a complete write-up will follow in a separate post)
1. Summary of common commands
- Start Kafka
# Start the Kafka broker as a daemon; drop the -daemon flag to run in the foreground (the broker then exits when the terminal is closed)
$ bin/kafka-server-start.sh -daemon config/server.properties
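To confirm the broker actually came up, a quick sanity check (assuming a JDK with jps on the PATH and the default log directory) is:
# the broker should appear as a Kafka process
$ jps -l | grep -i kafka
# or tail the server log and look for a "started (kafka.server.KafkaServer)" line
$ tail -n 20 logs/server.log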
1) kafka-topics.sh: common commands for managing topics (create, delete, list, describe, alter)
- Create a topic named "op_log" with 3 partitions and a replication factor of 1
$ bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic op_log
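On Kafka 2.2 and later the topic scripts can talk to the brokers directly instead of ZooKeeper; assuming such a version, the equivalent command is roughly:
$ bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 3 --topic op_log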
- List the topics managed by the given ZooKeeper ensemble
$ bin/kafka-topics.sh --list --zookeeper localhost:2181
- Show the details of a topic, including partition count, replica count and ISR information
$ bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic op_log
- Change the number of partitions of a topic. Note: this command can only increase the partition count, never decrease it; if the requested count is smaller than the current one, it fails with:
Error while executing topic command : The number of partitions for a topic can only be increased
$ bin/kafka-topics.sh --alter --zookeeper localhost:2181 --topic op_log1 --partitions 4
- Delete the topic named "op_log". Note: if delete.topic.enable is not set to true, running this command does not remove the topic immediately; the topic is only marked for deletion, and the background deletion thread only removes it once deletion is enabled.
$ bin/kafka-topics.sh --delete --zookeeper localhost:2181 --topic op_log
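For the deletion to actually take effect, enable it on the brokers (a one-line change in server.properties, followed by a broker restart):
# config/server.properties
delete.topic.enable=true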
2) kafka-consumer-groups.sh: common commands for working with consumer groups
- List all consumer groups (consumption state maintained in ZooKeeper)
$ bin/kafka-consumer-groups.sh --zookeeper localhost:2181 --list
- List all consumer groups (consumption state maintained in Kafka)
$ bin/kafka-consumer-groups.sh --new-consumer --bootstrap-server localhost:9092 --list
- Show the detailed consumption status of a group, including current progress, message backlog (lag), etc. (consumption state maintained in ZooKeeper)
$ bin/kafka-consumer-groups.sh --zookeeper localhost:2181 --group console-consumer-1291 --describe
- Show the detailed consumption status of a group, including current progress, message backlog (lag), etc. (consumption state maintained in Kafka)
$ bin/kafka-consumer-groups.sh --new-consumer --bootstrap-server localhost:9092 --group mygroup --describe
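On recent versions the --describe output contains roughly the columns below (the exact layout varies by version); LAG is simply LOG-END-OFFSET minus CURRENT-OFFSET, so a lag of 0 means the group is fully caught up:
GROUP  TOPIC  PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST  CLIENT-ID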
3) kafka-consumer-offset-checker.sh: common commands for checking offset information (note: deprecated since 0.9; kafka-consumer-groups.sh should be used instead)
- Show the offsets, owners and related information for a group
$ bin/kafka-consumer-offset-checker.sh --zookeeper localhost:2181 --topic mytopic --group console-consumer-1291
- Show the offsets of a group together with broker host/IP information (--broker-info)
$ bin/kafka-consumer-offset-checker.sh --zookeeper localhost:2181 --topic mytopic --group console-consumer-1291 --broker-info
4) kafka-configs.sh: common commands for adding/modifying configuration of a Kafka cluster
- Add configuration overrides to an entity, where the entity type can be topics, clients, etc. The example below adds two properties, max.message.bytes and flush.messages, to a topic.
$ bin/kafka-configs.sh --zookeeper localhost:2181 --alter --entity-type topics --entity-name mytopic --add-config 'max.message.bytes=50000000,flush.messages=5'
- Remove a configuration override from an entity (topics, clients, ...). The example below removes the flush.messages property from the topic.
$ bin/kafka-configs.sh --zookeeper localhost:2181 --alter --entity-type topics --entity-name mytopic --delete-config 'flush.messages'
- Show the configuration overrides of a topic.
$ bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name mytopic --describe
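The same script also works for other entity types; as a hedged example (entity name assumed to be the broker id), the per-broker overrides of broker 0 could be inspected with:
$ bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type brokers --entity-name 0 --describe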
5) kafka-reassign-partitions.sh: common commands for rebalancing partition load across brokers
- Generate a partition move plan; --broker-list specifies the brokers to move partitions onto, and --topics-to-move-json-file points to a JSON file listing the topics to move. An example of that file follows the command.
$ bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --broker-list "0" --topics-to-move-json-file ~/json/op_log_topic-to-move.json --generate
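The topics-to-move file uses the standard format shown below (file path taken from the command above):
$ cat > ~/json/op_log_topic-to-move.json <<'EOF'
{
  "version": 1,
  "topics": [
    { "topic": "op_log" }
  ]
}
EOF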
- Execute the partition move plan; --reassignment-json-file points to the desired replica assignment after repartitioning.
$ bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file ~/json/op_log_reassignment.json --execute
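The reassignment file is normally taken from the "Proposed partition reassignment configuration" printed by --generate; its shape is roughly as follows (partition and replica numbers here are purely illustrative):
{
  "version": 1,
  "partitions": [
    { "topic": "op_log", "partition": 0, "replicas": [0] },
    { "topic": "op_log", "partition": 1, "replicas": [0] },
    { "topic": "op_log", "partition": 2, "replicas": [0] }
  ]
}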
- Check the progress of the current repartitioning
$ bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file ~/json/op_log_reassignment.json --verify
2. Kafka configuration walkthrough (overview of all configuration options)
1) Broker Config: broker-side (server) configuration (a consolidated example follows at the end of this list)
- zookeeper.connect (can be set directly in server.properties)
The ZooKeeper connection string; Kafka stores its cluster metadata in the specified ZooKeeper ensemble.
- advertised.port
The port registered in ZooKeeper for clients to connect to. In IaaS environments the broker registers its internal address by default; this setting, together with advertised.host.name, exposes an externally reachable address and port instead. Since 0.10.x both parameters are deprecated in favor of advertised.listeners.
- auto.create.topics.enable
Automatically create topics; defaults to true. When a message is sent to a topic that does not exist yet, the broker creates it on the fly using num.partitions (partition count) and default.replication.factor (replica count, default 1). This is usually turned off, because auto-created topics complicate topic management; creating topics explicitly with kafka-topics.sh is the recommended approach. Since 0.10.x an admin client package is also available for creating topics programmatically.
- auto.leader.rebalance.enable
Automatic leader rebalancing; defaults to true. A background thread periodically checks partition leadership and triggers a new leader election when certain conditions are met. In production it is often left disabled, because switching leaders brings no performance gain by itself.
- background.threads
Number of background threads, default 10. Used for background housekeeping tasks; normally does not need to be changed.
- broker.id (can be set directly in server.properties)
Unique identifier of a broker, used to tell brokers apart. Kafka's health check for a broker essentially comes down to whether the corresponding id still exists under /brokers/ids in ZooKeeper.
- compression.type (can be set directly in server.properties)
Compression type. Accepted values include gzip, snappy and lz4; it can also be left unset, in which case the broker keeps whatever compression format the producer used.
- delete.topic.enable
Enable topic deletion. If disabled, topics cannot be deleted through the admin tools.
- leader.imbalance.check.interval.seconds
Interval at which the leader-imbalance check (automatic leader rebalancing) runs.
- leader.imbalance.per.broker.percentage
Allowed leader imbalance ratio per broker; if a broker exceeds this ratio, a new leader election is triggered.
- log.dir / log.dirs (can be set directly in server.properties)
Directories where Kafka stores its log (message) data. If log.dirs is not set, the directory given by log.dir is used.
- log.flush.interval.messages (can be set directly in server.properties)
Number of messages a partition accumulates before a flush to disk is forced. For example, with a value of 10000, the partition's data is written out to disk once 10000 messages have accumulated.
- log.flush.interval.ms (can be set directly in server.properties)
Maximum time data stays in memory before being flushed to disk. If unset, the value of log.flush.scheduler.interval.ms is used.
- log.retention.bytes
Maximum size the topic's log may grow to before old data is removed; defaults to -1, meaning data is never deleted based on size.
- log.retention.hours
Maximum time the topic's data is retained, in hours.
- log.retention.minutes
Maximum retention time in minutes; if unset, log.retention.hours is used.
- log.retention.ms
Maximum retention time in milliseconds; if unset, log.retention.minutes is used.
- log.roll.hours
How often a new segment is rolled, in hours. Kafka stores data in segments; when this period elapses, a new segment is created to hold subsequent data.
- log.segment.bytes
Segment size. When a segment reaches this size, a new segment is created.
- message.max.bytes
Largest message size the broker will accept for a topic. Note that the corresponding producer and consumer settings must be kept consistent with this value.
- min.insync.replicas
Minimum number of in-sync replicas. When the producer sets request.required.acks to -1 (acks=all), at least this many replicas of the topic must acknowledge the write; otherwise the producer receives an exception.
- num.io.threads
Number of threads used for I/O (request handling, including disk I/O).
- num.network.threads
Number of threads handling network requests.
- num.recovery.threads.per.data.dir
Number of threads per data directory used for log recovery.
- num.replica.fetchers
Number of threads used to replicate data from leaders.
- offset.metadata.max.bytes
Maximum size of the metadata a consumer may attach to an offset commit.
- offsets.commit.timeout.ms
Timeout for an offset commit to complete.
- offsets.topic.replication.factor
Replication factor of the internal offsets topic. A higher value is recommended for better availability.
- port
Port on which the broker accepts client connections.
- request.timeout.ms
Maximum time the broker may take to answer a request; if exceeded, an error is returned to the client.
- zookeeper.connection.timeout.ms
Timeout for establishing a connection to ZooKeeper; if unset, the value of zookeeper.session.timeout.ms is used.
- connections.max.idle.ms
Idle connection timeout; connections that stay idle longer than this are closed.
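Pulling the most frequently tuned of these together, a minimal sketch of an override block in server.properties might look as follows (values are illustrative only; the full stock file is reproduced in section 5 below):
broker.id=0
listeners=PLAINTEXT://:9092
log.dirs=/data/kafka-logs
num.partitions=3
default.replication.factor=2
min.insync.replicas=2
log.retention.hours=168
delete.topic.enable=true
auto.create.topics.enable=false
zookeeper.connect=localhost:2181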
2) Producer Config: producer-side configuration
- bootstrap.servers
Broker list, i.e. the brokers that receive the produced messages.
- key.serializer
Serialization class for message keys.
- value.serializer
Serialization class for message values.
- acks
Number of acknowledgments required before a send is considered complete. Typical values are 0, 1 and all: with 0 the send is considered done as soon as the message leaves the producer; with 1 the leader partition must have received and written the data; with all, both the leader and the follower partitions must have received and persisted the data.
- buffer.memory
Client-side buffer size: produced records accumulate in this buffer and are sent to the brokers in batches once enough data has built up.
- compression.type
Compression type applied before messages are sent; the options are none, gzip, snappy and lz4. If unspecified, messages are not compressed.
- retries
Number of retries after a failed send. When set greater than 0, messages whose send fails due to transient problems such as network errors are resent up to this many times.
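These options can be tried out without writing any code via the console producer; a hedged example (--producer-property passes arbitrary producer configs through to the client):
$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic op_log \
    --producer-property acks=all \
    --producer-property compression.type=snappy \
    --producer-property retries=3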
3) Consumer Config: consumer-side configuration
- bootstrap.servers
Broker list, i.e. the brokers the consumer pulls messages from.
- key.deserializer
Deserialization class for message keys; must match the type used for serialization on the producer side.
- value.deserializer
Deserialization class for message values; must match the type used for serialization on the producer side.
- fetch.min.bytes
Minimum amount of data the server returns for a fetch request.
- group.id
Identifier of the consumer group the consumer belongs to.
- heartbeat.interval.ms
Heartbeat interval. The consumer periodically sends heartbeats to the group coordinator; when a rebalance happens on the server side, the coordinator notifies the consumers through these heartbeats. The value must be smaller than session.timeout.ms, typically about one third of it.
- max.partition.fetch.bytes
Maximum amount of data returned per partition per fetch.
- session.timeout.ms
Time after which a consumer is considered failed: if the consumer has not responded within this interval, it is assumed to be dead.
- auto.offset.reset
Where to start consuming when Kafka has no initial offset for the group or the committed offset no longer exists on the server. earliest: start from the beginning; latest: start from the newest data.
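As with the producer, these settings can be exercised from the console consumer; a hedged example (--consumer-property passes arbitrary consumer configs, and --from-beginning corresponds to auto.offset.reset=earliest for a group with no committed offsets):
$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic op_log \
    --group mygroup --from-beginning \
    --consumer-property fetch.min.bytes=1024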
4) Kafka Connect Config: Kafka Connect related configuration
- group.id
Unique identifier of the worker group, analogous to a consumer group id.
- internal.key.converter
Converter type for internal keys.
- internal.value.converter
Converter type for internal values.
- key.converter
Converter type applied to message keys received by the server.
- value.converter
Converter type applied to message values received by the server.
- bootstrap.servers
Broker list.
- heartbeat.interval.ms
Heartbeat interval; same meaning as the consumer setting of the same name.
- session.timeout.ms
Session timeout; same meaning as the consumer setting of the same name.
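A quick way to see these settings in action is the standalone Connect mode, which ships with sample worker and connector property files; a hedged example of starting it (file names as shipped in the 2.x distribution):
$ bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties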
5) The configuration files themselves (Kafka version 2.11-2.3.0)
(1) server.properties
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# see kafka.server.KafkaConfig for additional details and defaults
############################# Server Basics #############################
# The id of the broker. This must be set to a unique integer for each broker.
# 1. Each broker needs a distinct id
broker.id=0
############################# Socket Server Settings #############################
# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
# FORMAT:
# listeners = listener_name://host_name:port
# EXAMPLE:
# listeners = PLAINTEXT://your.host.name:9092
# 2. The listener address the broker exposes to clients
#listeners=PLAINTEXT://:9092
# Hostname and port the broker will advertise to producers and consumers. If not set,
# it uses the value for "listeners" if configured. Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
#advertised.listeners=PLAINTEXT://your.host.name:9092
# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
# The number of threads that the server uses for receiving requests from the network and sending responses to the network
# Default is 3 threads for network I/O
num.network.threads=3
# The number of threads that the server uses for processing requests, which may include disk I/O
num.io.threads=8
# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400
# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400
# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600
############################# Log Basics #############################
# A comma separated list of directories under which to store log files
# Directory where the message log files are stored
log.dirs=/tmp/kafka-logs
# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1
# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1
############################# Internal Topic Settings #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended for to ensure availability such as 3.
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
############################# Log Flush Policy #############################
# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
# 1. Durability: Unflushed data may be lost if you are not using replication.
# 2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
# 3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.
# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000
# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000
############################# Log Retention Policy #############################
# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.
# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168
# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000
############################# Zookeeper #############################
# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=localhost:2181
# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000
############################# Group Coordinator Settings #############################
# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
group.initial.rebalance.delay.ms=0
(2) consumer.properties
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# see org.apache.kafka.clients.consumer.ConsumerConfig for more details
# list of brokers used for bootstrapping knowledge about the rest of the cluster
# format: host1:port1,host2:port2 ...
bootstrap.servers=localhost:9092
# consumer group id
group.id=test-consumer-group
# What to do when there is no initial offset in Kafka or if the current
# offset does not exist any more on the server: latest, earliest, none
#auto.offset.reset=
(3) producer.properties
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# see org.apache.kafka.clients.producer.ProducerConfig for more details
############################# Producer Basics #############################
# list of brokers used for bootstrapping knowledge about the rest of the cluster
# format: host1:port1,host2:port2 ...
bootstrap.servers=localhost:9092
# specify the compression codec for all data generated: none, gzip, snappy, lz4, zstd
compression.type=none
# name of the partitioner class for partitioning events; default partition spreads data randomly
#partitioner.class=
# the maximum amount of time the client will wait for the response of a request
#request.timeout.ms=
# how long `KafkaProducer.send` and `KafkaProducer.partitionsFor` will block for
#max.block.ms=
# the producer will wait for up to the given delay to allow other records to be sent so that the sends can be batched together
#linger.ms=
# the maximum size of a request in bytes
#max.request.size=
# the default batch size in bytes when batching multiple records sent to a partition
#batch.size=
# the total bytes of memory the producer can use to buffer records waiting to be sent to the server
#buffer.memory=
3. Kafka performance monitoring with Eagle (optional; a complete write-up will follow in a separate post)
0) Enable the JMX monitoring port
(1) Foreground start:
JMX_PORT=9997 bin/kafka-server-start.sh config/server.properties
(2) Daemon (background) start:
export JMX_PORT=9997
bin/kafka-server-start.sh -daemon config/server.properties
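To confirm that JMX is actually listening (port number taken from the examples above), one option is:
$ ss -lnt | grep 9997    # or: netstat -lnt | grep 9997
# a JMX client such as jconsole can then connect to <broker-host>:9997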
1) Modify the kafka-server-start.sh script
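The usual change, as commonly shown in Eagle setup guides (treat the exact placement as an assumption and adapt it to your copy of the script), is to export JMX_PORT next to the existing heap-options block so every start picks it up:
# bin/kafka-server-start.sh, inside the existing KAFKA_HEAP_OPTS block
if [ "x$KAFKA_HEAP_OPTS" = "x" ]; then
    export KAFKA_HEAP_OPTS="-Xmx1G -Xms1G"
    export JMX_PORT="9997"    # added line: fixed JMX port for monitoring
fi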
- Distribute the modified script to the other nodes so the edit does not have to be repeated on every machine (xsync is a custom rsync-based distribution script; the author has not written it here, so the copies still have to be done by hand)
xsync bin/kafka-server-start.sh
2) Install kafka-eagle-1.3.7
tar -zxvf XXXX
cd XXXX
cd bin/
chmod 777 ke.sh
3) Edit the configuration file system-config.properties
1. Adjust the Kafka cluster list as needed
2. Store offsets in Kafka only, not in ZooKeeper
3. metrics.charts=true enables the chart data
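A minimal sketch of the relevant entries, assuming a single cluster and the property names used by Eagle 1.3.x (verify them against the file shipped with your version):
kafka.eagle.zk.cluster.alias=cluster1
cluster1.zk.list=localhost:2181
# store consumer offsets in Kafka rather than in ZooKeeper
cluster1.kafka.eagle.offset.storage=kafka
# enable chart data
kafka.eagle.metrics.charts=true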