1. Kafka installation and simple use

Chapter 1 Kafka Overview
1.1 Definition

Traditional definition of Kafka: Kafka is a distributed message queue (Message Queue) based on the publish/subscribe model, mainly used for real-time processing of big data.
The latest definition of Kafka: Kafka is an open-source distributed event streaming platform (Event Streaming Platform) used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.


1.2 Message Queue
The more common message queue products in enterprises today are Kafka, ActiveMQ, RabbitMQ, and RocketMQ.
In big data scenarios Kafka is the message queue of choice, while ActiveMQ, RabbitMQ, and RocketMQ are used mainly in JavaEE development.

1.2.1 Application scenarios of traditional message queues

The main application scenarios of traditional message queues are buffering/peak shaving, decoupling, and asynchronous communication.

Buffering/peak shaving: helps control and optimize the rate at which data flows through the system, smoothing out the mismatch between how fast messages are produced and how fast they are consumed.

Decoupling: allows you to extend or modify the processing on either side independently, as long as both sides observe the same interface constraints.

Asynchronous communication: allows a user to put a message into the queue without processing it immediately, and to process it later when needed.

1.2.2 Two modes of message queue

1) Point-to-point mode
• The consumer actively pulls data, and a message is removed from the queue after it has been received.
2) Publish/subscribe mode
• There can be multiple topics (e.g. browse, like, favorite, comment)
• After a consumer consumes data, the data is not deleted
• Consumers are independent of each other, and each of them can consume the data

1.3 Kafka infrastructure
(1) Producer: the message producer, i.e. the client that sends messages to the Kafka broker.
(2) Consumer: the message consumer, i.e. the client that fetches messages from the Kafka broker.
(3) Consumer Group (CG): a consumer group consists of multiple consumers. Each consumer in the group is responsible for consuming data from different partitions; a partition can be consumed by only one consumer within a group. Consumer groups do not affect each other. Every consumer belongs to some consumer group, i.e. the consumer group is the logical subscriber.
(4) Broker: a Kafka server is a broker. A cluster consists of multiple brokers, and a single broker can hold multiple topics.
(5) Topic: can be understood as a queue; producers and consumers are both oriented to a topic.
(6) Partition: for scalability, a very large topic can be spread across multiple brokers (i.e. servers). A topic can be divided into multiple partitions, each of which is an ordered queue.
(7) Replica: each partition of a topic has several replicas, one Leader and several Followers.
(8) Leader: the "master" among the replicas of a partition. Producers send data to the Leader, and consumers consume data from the Leader.
(9) Follower: a "slave" among the replicas of a partition. It synchronizes data from the Leader in real time to stay consistent with the Leader; when the Leader fails, one of the Followers becomes the new Leader.
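These roles can be observed directly on a running cluster: after the quick start in Chapter 2, describing a topic shows, for every partition, which broker is the Leader and which brokers hold the Follower replicas. A minimal sketch, assuming the three-broker cluster and the topic first from section 2.2:

# For each partition, the output lists the Leader broker id, the full replica
# list (Replicas) and the in-sync replicas (Isr).
[hadoop102 kafka]$ bin/kafka-topics.sh --bootstrap-server hadoop102:9092 --describe --topic first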

Chapter 2 Kafka Quick Start

2.1 Installation and deployment
2.1.1 Cluster planning
Cluster plan: hadoop102, hadoop103 and hadoop104 each run one Zookeeper node and one Kafka broker.
2.1.2 Cluster deployment
0) Official download address: http://kafka.apache.org/downloads.html
1) Unzip the installation package

[hadoop102 software]$ tar -zxvf kafka_2.12-3.0.0.tgz -C /opt/module/

2) Modify the decompressed file name

[hadoop102 module]$ mv kafka_2.12-3.0.0/ kafka

3) Enter the /opt/module/kafka directory and modify the configuration file

[hadoop102 kafka]$ cd config/
[hadoop102 config]$ vim server.properties

Enter the following:

#Globally unique id of the broker; it must not be duplicated and must be an integer.
broker.id=0
#Number of threads handling network requests
num.network.threads=3
#Number of threads handling disk I/O
num.io.threads=8
#Send buffer size of the socket
socket.send.buffer.bytes=102400
#Receive buffer size of the socket
socket.receive.buffer.bytes=102400
#Maximum size of a socket request
socket.request.max.bytes=104857600
#Path where Kafka stores its run-time logs (the data). The directory does not need to be created in advance; Kafka creates it automatically. Multiple disk paths can be configured, separated by ","
log.dirs=/opt/module/kafka/datas
#Number of partitions per topic on this broker
num.partitions=1
#Number of threads used to recover and clean up the data under each data directory
num.recovery.threads.per.data.dir=1
#Replication factor of the internal offsets topic; set to 1 here
offsets.topic.replication.factor=1
#Maximum time a segment file is retained; data older than this is deleted
log.retention.hours=168
#Maximum size of each segment file, 1 GB by default
log.segment.bytes=1073741824
#How often to check whether data has expired; every 5 minutes by default
log.retention.check.interval.ms=300000
#Zookeeper cluster connection string (a /kafka chroot is created under the zk root to keep things tidy)
zookeeper.connect=hadoop102:2181,hadoop103:2181,hadoop104:2181/kafka
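The trailing /kafka in zookeeper.connect is a Zookeeper chroot: all Kafka metadata is stored under the /kafka znode instead of the Zookeeper root. As an optional check once the brokers are running (a sketch; the Zookeeper installation path /opt/module/zookeeper is an assumption of this example), you can browse that znode with the Zookeeper CLI:

# Open the Zookeeper shell, then list the znodes Kafka has created under /kafka
[hadoop102 zookeeper]$ bin/zkCli.sh -server hadoop102:2181
ls /kafka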

4) Distribute installation packages

[hadoop102 module]$ xsync kafka/
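xsync is the cluster-distribution helper script used throughout this course. If it is not available, the same step can be done by hand, for example with scp (a sketch using the paths from this guide):

# Copy the kafka directory to the other two nodes manually
[hadoop102 module]$ scp -r /opt/module/kafka hadoop103:/opt/module/
[hadoop102 module]$ scp -r /opt/module/kafka hadoop104:/opt/module/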

5) On hadoop103 and hadoop104, modify broker.id in the configuration file /opt/module/kafka/config/server.properties to broker.id=1 and broker.id=2 respectively.
Note: broker.id must not be duplicated; it must be unique across the entire cluster.

[hadoop103 module]$ vim kafka/config/server.properties
Modify:
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=1
[hadoop104 module]$ vim kafka/config/server.properties
Modify:
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=2
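Equivalently, the two files can be edited non-interactively from hadoop102 (a sketch using sed over ssh):

# Set broker.id on hadoop103 and hadoop104 without opening an editor
[hadoop102 module]$ ssh hadoop103 "sed -i 's/^broker.id=.*/broker.id=1/' /opt/module/kafka/config/server.properties"
[hadoop102 module]$ ssh hadoop104 "sed -i 's/^broker.id=.*/broker.id=2/' /opt/module/kafka/config/server.properties"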

6) Configure environment variables
(1) Add kafka environment variable configuration in the /etc/profile.d/my_env.sh file

 sudo vim /etc/profile.d/my_env.sh

Add the following:
#KAFKA_HOME
export KAFKA_HOME=/opt/module/kafka
export PATH=$PATH:$KAFKA_HOME/bin

(2) Refresh the environment variables.

[hadoop102 module]$ source /etc/profile

(3) Distribute environment variable files to other nodes and source them.

[hadoop102 module]$ sudo /home/atguigu/bin/xsync /etc/profile.d/my_env.sh
[hadoop103 module]$ source /etc/profile
[hadoop104 module]$ source /etc/profile
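An optional sanity check that the variable took effect on each node (a sketch):

# Should print /opt/module/kafka and a path under $KAFKA_HOME/bin respectively
[hadoop102 module]$ echo $KAFKA_HOME
[hadoop102 module]$ which kafka-server-start.sh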

7) Start the cluster
(1) First start the Zookeeper cluster, and then start Kafka.
[hadoop102 kafka]$ zk.sh start
(2) Start Kafka on hadoop102, hadoop103, and hadoop104 nodes in sequence.

[hadoop102 kafka]$ bin/kafka-server-start.sh -daemon config/server.properties
[hadoop103 kafka]$ bin/kafka-server-start.sh -daemon config/server.properties
[hadoop104 kafka]$ bin/kafka-server-start.sh -daemon config/server.properties

Note: the path given for the configuration file must correctly locate server.properties (here it is relative to the kafka directory from which the command is run).

8) Shut down the cluster

[hadoop102 kafka]$ bin/kafka-server-stop.sh
[hadoop103 kafka]$ bin/kafka-server-stop.sh
[hadoop104 kafka]$ bin/kafka-server-stop.sh

2.1.3 Cluster startup and shutdown script
1) Create the kf.sh script file in the /home/atguigu/bin directory

[hadoop102 bin]$ vim kf.sh

The script is as follows:

#! /bin/bash

case $1 in
"start"){
    for i in hadoop102 hadoop103 hadoop104
    do
        echo " -------- starting Kafka on $i --------"
        ssh $i "/opt/module/kafka/bin/kafka-server-start.sh -daemon /opt/module/kafka/config/server.properties"
    done
};;
"stop"){
    for i in hadoop102 hadoop103 hadoop104
    do
        echo " -------- stopping Kafka on $i --------"
        ssh $i "/opt/module/kafka/bin/kafka-server-stop.sh"
    done
};;
esac

2) Add execution permissions

[hadoop102 bin]$ chmod +x kf.sh

3) Start cluster command

[hadoop102 ~]$ kf.sh start

4) Stop cluster command

[hadoop102 ~]$ kf.sh stop

Note: when stopping the Kafka cluster, be sure to wait until all Kafka broker processes have exited before stopping the Zookeeper cluster. The Zookeeper cluster records information about the Kafka cluster; if Zookeeper is stopped first, the Kafka brokers can no longer coordinate their shutdown, and the Kafka processes have to be killed manually.
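A simple way to confirm that every broker has actually exited before touching Zookeeper (a sketch using jps, which ships with the JDK; the Kafka broker appears in jps output as a process named Kafka):

# Nothing should be printed for a node once its broker has fully stopped
[hadoop102 ~]$ for i in hadoop102 hadoop103 hadoop104; do echo "== $i =="; ssh $i "jps | grep Kafka"; done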

2.2 Kafka command line operation
2.2.1 Topic command line operation
1) View the parameters of the topic operation command

[hadoop102 kafka]$ bin/kafka-topics.sh

2) View all topics in the current server

[hadoop102 kafka]$ bin/kafka-topics.sh --bootstrap-server hadoop102:9092 --list

3) Create a topic named first

[hadoop102 kafka]$ bin/kafka-topics.sh --bootstrap-server hadoop102:9092 --create --partitions 1 --replication-factor 3 --topic first

Option description:
--topic defines the topic name
--replication-factor defines the replication factor (number of replicas)
--partitions defines the number of partitions
4) View the details of the first topic

[hadoop102 kafka]$ bin/kafka-topics.sh --bootstrap-server hadoop102:9092 --describe --topic first

5) Modify the number of partitions (note: the number of partitions can only be increased, not reduced)

[hadoop102 kafka]$ bin/kafka-topics.sh --bootstrap-server hadoop102:9092 --alter --topic first --partitions 3
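Conversely, asking --alter for a smaller partition count is rejected by the broker, so a command like the following fails with an error (shown only to illustrate the restriction):

# Attempting to shrink the topic back to 1 partition is refused; partitions can only grow
[hadoop102 kafka]$ bin/kafka-topics.sh --bootstrap-server hadoop102:9092 --alter --topic first --partitions 1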

6) Check the details of the first topic again

[hadoop102 kafka]$ bin/kafka-topics.sh --bootstrap-server hadoop102:9092 --describe --topic first

7) Delete topic

[hadoop102 kafka]$ bin/kafka-topics.sh --bootstrap-server hadoop102:9092 --delete --topic first

2.2.2 Producer command line operation
1) View the parameters of the producer operation command

[hadoop102 kafka]$ bin/kafka-console-producer.sh

2) Send message

[hadoop102 kafka]$ bin/kafka-console-producer.sh --bootstrap-server hadoop102:9092 --topic first
>hello world
>atguigu atguigu

2.2.3 Consumer command line operation
1) View the parameters of the consumer operation command

[hadoop102 kafka]$ bin/kafka-console-consumer.sh

2) Consume messages
(1) Consume the data in the first topic.

[hadoop102 kafka]$ bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic first

(2) Read all the data in the topic (including historical data).

[hadoop102 kafka]$ bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --from-beginning --topic first
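(3) To see the consumer-group behaviour described in section 1.3 from the command line, several consumers can be started with the same --group name (a sketch; the group name test-group is arbitrary):

# Run this in two terminals: the partitions of "first" are divided between the two
# group members, so each message is consumed by only one member of the group.
[hadoop102 kafka]$ bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic first --group test-group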
