Kafka MirrorMaker setup notes

I happened to use Kafka MirrorMaker to synchronize data between two sites; the following are my personal notes (unfinished).

Enabling JMX

To monitor Kafka metrics via JMX, the JMX interface must be enabled, which can be done by setting an environment variable. My own approach is to add the following configuration to the kafka-server-start.sh script file:

...
if [ "x$KAFKA_HEAP_OPTS" = "x" ]; then
    export KAFKA_HEAP_OPTS="-Xmx6G -Xms6G"
    # Add the JMX_PORT configuration
    export JMX_PORT="9999"
...
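
After restarting the broker, it is worth verifying that the JMX port is actually listening. A quick check (assuming the port 9999 configured above and a Linux host with ss available) is:

# Verify the broker process is listening on the JMX port set above
ss -tlnp | grep 9999

# The metrics can then be browsed graphically, e.g. with jconsole:
# jconsole <broker_host>:9999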

Consumer Configuration

# Source cluster to consume from
bootstrap.servers=<kafka1host_from>:<port>,<kafka2host_from>:<port>,<kafka3host_from>:<port>
# Consumer group ID
group.id=mm-test-consumer-group
# Where to start mirroring: only data produced after MirrorMaker starts (latest), or data that existed before the mirror (earliest)
auto.offset.reset=earliest
# Consumer heartbeat interval; default is 3000 ms. Since this is a remote mirror, set it to 30 seconds
heartbeat.interval.ms=30000
# Consumer session timeout; default is 10000 ms. Since this is a remote mirror, set it to 100 seconds
session.timeout.ms=100000
# Change the partition assignment strategy; the default is range, which has some advantages but can lead to unbalanced assignments, especially when mirroring many topics and partitions
partition.assignment.strategy=org.apache.kafka.clients.consumer.RoundRobinAssignor
# Maximum number of records returned by a single poll(); default is 500
max.poll.records=20000
# TCP receive buffer size when reading data; default is 65536 (64 KiB)
receive.buffer.bytes=4194304
# Maximum amount of data fetched per partition; default is 1048576
max.partition.fetch.bytes=10485760
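
Once MirrorMaker is running, the mirror consumer group configured above can be inspected on the source cluster to check its lag. A possible check, assuming the standard Kafka CLI tools:

# Describe the mirror consumer group on the source cluster to check lag
bin/kafka-consumer-groups.sh --bootstrap-server <kafka1host_from>:<port> \
    --describe --group mm-test-consumer-group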

Producer Configuration

# Destination cluster to produce to
bootstrap.servers=<kafka1host_to>:<port>,<kafka2host_to>:<port>,<kafka3host_to>:<port>
# Enable compression
compression.type=snappy
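
Before starting MirrorMaker, connectivity to the destination cluster can be smoke-tested with the console producer. This is only a sketch; the topic name mm-smoke-test is an example, not part of the original setup:

# Send a single test message to the destination cluster
echo "hello" | bin/kafka-console-producer.sh \
    --broker-list <kafka1host_to>:<port> --topic mm-smoke-test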

Running the command

nohup bin/kafka-mirror-maker.sh --consumer.config config/consumer.properties --num.streams 2 --producer.config config/producer.properties --whitelist '*.test|*.temp' >/var/log/mirrormaker/mirrormaker.log 2>&1 &

Description:

  1. --num.streams: each stream is a consumer; all consumers share a single producer.
  2. --whitelist: the whitelist of topics to be synchronized. Multiple topics can be joined with '|', and Java-style regular expressions are also accepted; correspondingly, there is a blacklist option.
  3. It is recommended to write the log file under /var/log/ and manage it periodically with logrotate, so that logs stay available for troubleshooting without filling the entire disk (see the logrotate sketch after this list).
  4. MirrorMaker is generally deployed alongside the destination cluster, because a reliable producer (writing data into the destination cluster) matters more: a consumer that cannot connect to its cluster is much safer than a producer that cannot connect to its cluster.
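
For point 3, a minimal logrotate sketch for the log path used above could look like the following (placed, for example, in /etc/logrotate.d/mirrormaker; the retention values are assumptions). copytruncate is used because MirrorMaker keeps the file handle opened by the nohup redirect and would not reopen a rotated file:

/var/log/mirrormaker/mirrormaker.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    copytruncate
}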

Configuration to keep data from being lost

For the consumer

enable.auto.commit=false

For the producer

max.in.flight.requests.per.connection=1
retries=Integer.MAX_VALUE

max.block.ms=Long.MAX_VALUE
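
In an actual producer.properties file these have to be written as literal numbers (Integer.MAX_VALUE = 2147483647, Long.MAX_VALUE = 9223372036854775807). A sketch combining them with the settings from the producer configuration above:

bootstrap.servers=<kafka1host_to>:<port>,<kafka2host_to>:<port>,<kafka3host_to>:<port>
compression.type=snappy
# Only one in-flight request per connection, so retries cannot reorder messages
max.in.flight.requests.per.connection=1
# Retry (practically) forever: Integer.MAX_VALUE
retries=2147483647
# Block send() instead of dropping records when the buffer is full: Long.MAX_VALUE
max.block.ms=9223372036854775807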

For MirrorMaker

Add the parameter

--abort.on.send.failure
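
Putting it together, the launch command from above with this flag added explicitly (it takes a true/false value and defaults to true) might look like:

nohup bin/kafka-mirror-maker.sh --consumer.config config/consumer.properties \
    --producer.config config/producer.properties --num.streams 2 \
    --whitelist '*.test|*.temp' --abort.on.send.failure true \
    >/var/log/mirrormaker/mirrormaker.log 2>&1 &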

The effect on MirrorMaker:

  • Mirror maker will only send one request to a broker at any given point.
  • If any exception is caught in mirror maker thread, mirror maker will try to commit the acked offsets then exit immediately.
  • For a RetriableException in the producer, the producer will retry indefinitely. If the retries do not succeed, the entire mirror maker will eventually block because the producer buffer is full.
  • For a non-retriable exception, if --abort.on.send.failure is specified, the mirror maker stops. Otherwise the producer callback records the message that was not successfully sent but lets the mirror maker move on; in that case the message will be lost in the target cluster.

As the last point states, if any such error occurs the MirrorMaker process will be killed. Users are therefore recommended to run a watchdog process such as supervisord to restart a killed MirrorMaker process.
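
A minimal supervisord program section for such a watchdog might look like this; the /opt/kafka paths and the program name mirrormaker are assumptions, not taken from the original setup (note that nohup and the shell redirect are not needed under supervisord):

[program:mirrormaker]
command=/opt/kafka/bin/kafka-mirror-maker.sh --consumer.config /opt/kafka/config/consumer.properties --producer.config /opt/kafka/config/producer.properties --num.streams 2 --whitelist '*.test|*.temp' --abort.on.send.failure true
directory=/opt/kafka
autostart=true
autorestart=true
stdout_logfile=/var/log/mirrormaker/mirrormaker.log
redirect_stderr=true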

Summary of issues encountered

Switching Java versions

When multiple Java versions are installed, it is sometimes necessary to switch between them while debugging. This can be done with the update-alternatives command, specifically:

# update-alternatives --config java
...
# update-alternatives --config javac

Then choose the Java version you need from the list.
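
The active version can then be confirmed with:

# Check which Java version is now active
java -version
javac -version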

Kafka not consuming data

After Kafka MirrorMaker is running, the target cluster cannot consume the mirrored data (apart from Kafka's own kafka-console-consumer…

Origin: blog.51cto.com/huanghai/2424406