Flink 1.17.0 integrates Kafka: a word-count example


Foreword

Flink is an important component for real-time computing. This post shows how to integrate it with Kafka through a small example: messages are produced to Kafka as comma-separated words, and Flink counts the number of occurrences of each word.
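Before wiring up Flink, the core logic can be sketched in plain Java, independent of any framework (the class name here is hypothetical, for illustration only):

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the target logic: split a comma-separated
// line and count the occurrences of each word.
public class WordCountSketch {
    public static Map<String, Integer> count(String line) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted, for stable output
        for (String word : line.split(",")) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("aaa,bbb,aaa")); // prints {aaa=2, bbb=1}
    }
}
```

The Flink job later in this post does the same thing, except the input arrives as a stream of Kafka messages instead of a single string.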


1. Kafka environment preparation

1.1 Start Kafka

The Kafka version used here is 3.2.0; for deployment steps, refer to my earlier post on Kafka deployment.

cd kafka_2.13-3.2.0
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties

After starting, check that the Java processes are running (e.g. with `jps`) before moving on to the next step.

1.2 Create a new topic

Create a new topic dedicated to Flink consumption:

bin/kafka-topics.sh --create --topic flinkTest --bootstrap-server 192.168.184.129:9092

1.3 Test that production and consumption work

Producer:

bin/kafka-console-producer.sh --topic flinkTest --bootstrap-server 192.168.184.129:9092

Consumer:

bin/kafka-console-consumer.sh --topic flinkTest --from-beginning --bootstrap-server 192.168.184.129:9092

1.4 Test production and consumption

Enter aaa on the producer side and check whether the consumer receives it. Once the consumer prints aaa, consumption has succeeded and the Kafka environment is ready.

2. Flink integrates Kafka

2.1 Modify the pom file

Before modifying the pom file, take a look at the guidance on the official website; we use the DataStream API here.
The Flink 1.17.0 official documentation lists the versions of the core dependencies and of the Kafka connector required to consume from Kafka.

The complete pom is as follows:

<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.wh.flink</groupId>
    <artifactId>flink</artifactId>
    <version>1.0-SNAPSHOT</version>

    <name>flink</name>
    <!-- FIXME change it to the project's website -->
    <url>http://www.example.com</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <flink.version>1.17.1</flink.version>
    </properties>

    <dependencies>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <!-- Flink core dependencies -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>${flink.version}</version>
            <!--<scope>provided</scope>-->
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java</artifactId>
            <version>${flink.version}</version>
            <!--<scope>provided</scope>-->
        </dependency>


        <!-- Legacy Flink Kafka connector dependency (not needed here) -->
<!--        <dependency>-->
<!--            <groupId>org.apache.flink</groupId>-->
<!--            <artifactId>flink-connector-kafka-0.11_2.11</artifactId>-->
<!--            <version>${flink.version}</version>-->
<!--        </dependency>-->
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.11</version>
            <scope>test</scope>
        </dependency>

        <!-- Dependencies required when developing Flink with Scala -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-scala_2.12</artifactId>
            <version>${flink.version}</version>
            <!--<scope>provided</scope>-->
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-scala_2.12</artifactId>
            <version>${flink.version}</version>
            <!--<scope>provided</scope>-->
        </dependency>

        <!--<dependency>-->
        <!--<groupId>org.scala-lang</groupId>-->
        <!--<artifactId>scala-library</artifactId>-->
        <!--<version>2.11.12</version>-->
        <!--</dependency>-->

        <!-- log4j and slf4j packages; comment these out if you don't want logs on the console -->
        <!--<dependency>-->
        <!--<groupId>org.slf4j</groupId>-->
        <!--<artifactId>slf4j-log4j12</artifactId>-->
        <!--<version>1.7.25</version>-->
        <!--<scope>test</scope>-->
        <!--</dependency>-->
        <!--<dependency>-->
        <!--<groupId>log4j</groupId>-->
        <!--<artifactId>log4j</artifactId>-->
        <!--<version>1.2.17</version>-->
        <!--</dependency>-->
        <!--<dependency>-->
        <!--<groupId>org.slf4j</groupId>-->
        <!--<artifactId>slf4j-api</artifactId>-->
        <!--<version>1.7.25</version>-->
        <!--</dependency>-->
        <!--<dependency>-->
        <!--<groupId>org.slf4j</groupId>-->
        <!--<artifactId>slf4j-nop</artifactId>-->
        <!--<version>1.7.25</version>-->
        <!--<scope>test</scope>-->
        <!--</dependency>-->
        <!--<dependency>-->
        <!--<groupId>org.slf4j</groupId>-->
        <!--<artifactId>slf4j-simple</artifactId>-->
        <!--<version>1.7.5</version>-->
        <!--</dependency>-->



    </dependencies>

    <build>
        <plugins>

            <!-- When a Maven project contains both Java and Scala code, configure maven-scala-plugin so both are compiled and packaged together -->
<!--            <plugin>-->
<!--                <groupId>org.scala-tools</groupId>-->
<!--                <artifactId>maven-scala-plugin</artifactId>-->
<!--                <version>2.15.2</version>-->
<!--                <executions>-->
<!--                    <execution>-->
<!--                        <goals>-->
<!--                            <goal>compile</goal>-->
<!--                            <goal>testCompile</goal>-->
<!--                        </goals>-->
<!--                    </execution>-->
<!--                </executions>-->
<!--            </plugin>-->

            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.4</version>
                <configuration>
                    <!-- Setting this to false removes the "-jar-with-dependencies" suffix from e.g. MySpark-1.0-SNAPSHOT-jar-with-dependencies.jar -->
                    <!--<appendAssemblyId>false</appendAssemblyId>-->
                    <archive>
                        <manifest>
                            <mainClass>com.hadoop.demo.service.flinkDemo.FlinkDemo</mainClass>
                        </manifest>
                    </archive>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>

The project follows the standard Maven layout, with the FlinkDemo class under com.hadoop.demo.service.flinkDemo.

2.2 Code writing

package com.hadoop.demo.service.flinkDemo;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.functions.FlatMapIterator;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import java.util.Arrays;
import java.util.Iterator;

public class FlinkDemo {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Build the Kafka source
        KafkaSource<String> kfkSource = KafkaSource.<String>builder()
                .setBootstrapServers("192.168.184.129:9092")
                .setGroupId("flink")
                .setTopics("flinkTest")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();
        // Add the source to the Flink environment
        DataStreamSource<String> lines = env.fromSource(kfkSource, WatermarkStrategy.noWatermarks(), "kafka source");
        // Split each line on commas, then map every word to a (word, 1) tuple
        SingleOutputStreamOperator<Tuple2<String, Integer>> map = lines.flatMap(new FlatMapIterator<String, String>() {
            @Override
            public Iterator<String> flatMap(String s) throws Exception {
                return Arrays.asList(s.split(",")).iterator();
            }
        }).map(new MapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public Tuple2<String, Integer> map(String s) throws Exception {
                return new Tuple2<>(s, 1);
            }
        });

        // Key by the word (field 0) and sum the counts (field 1).
        // Note: keyBy(int) is deprecated; keyBy(t -> t.f0) is the preferred form.
        SingleOutputStreamOperator<Tuple2<String, Integer>> sum = map.keyBy(0).sum(1);
        sum.print();
        env.execute();
    }

}
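Note that `keyBy(0).sum(1)` does not wait for a final total: Flink emits an updated running count for every incoming record, which is why the same word appears several times in the console later. A plain-Java simulation of that per-record behavior (hypothetical class and method names, no Flink required):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simulates what keyBy(word).sum(count) does on a stream: for every
// incoming (word, 1) record, emit the word with its updated running total.
public class RunningSum {
    public static List<String> process(String... words) {
        Map<String, Integer> state = new HashMap<>(); // per-key state, like Flink's keyed state
        List<String> emitted = new ArrayList<>();
        for (String word : words) {
            int total = state.merge(word, 1, Integer::sum);
            emitted.add("(" + word + "," + total + ")"); // one output per input record
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Four occurrences of the same word produce four successive outputs
        System.out.println(process("jbs", "jbs", "jbs", "jbs"));
        // prints [(jbs,1), (jbs,2), (jbs,3), (jbs,4)]
    }
}
```

This matches the incremental output we will observe in section 3.3.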

2.3 Maven packaging

Click the package button, and make sure to build the jar with dependencies; otherwise the following error will appear at runtime:

NoClassDefFoundError: org/apache/flink/connector/kafka/source/KafkaSource

3. Test

3.1 Start the Hadoop cluster and the Flink cluster

If you don't know how to set up these two clusters, see my other article on integrating Hadoop with Flink.

./hadoop.sh start
./bin/yarn-session.sh --detached

3.2 Upload the jar package to the Flink cluster

After uploading the jar through the Flink web UI, fill in the main class name and click Submit.

3.3 Testing

After submitting, the job appears under Running Jobs. Click the running job, then its task, and open the task's output view to see what it prints.
Enter content on the Kafka producer side; here the word jbs is entered 4 times. In the output console you can see its count incremented four times in turn, from (jbs,1) up to (jbs,4), which shows that the statistics are working.


Summary

This is just a simple example of Flink consuming from Kafka. After consuming, the data can also be transformed further and written out through a sink; I will demonstrate that in a later post. If anything here is wrong, feel free to point it out.

Origin blog.csdn.net/qq_34526237/article/details/130968153