Flink: building the project pom and a simple WordCount program (Java)

Building the pom

It is strongly recommended to use the officially recommended approach and run the following command (you don't need to type it by hand; just change the Flink version number to yours. I used 1.9.1):

mvn archetype:generate \
-DarchetypeGroupId=org.apache.flink \
-DarchetypeArtifactId=flink-quickstart-java \
-DarchetypeVersion=1.9.1

Then follow the prompts and enter the groupId, artifactId, package and other information.
What are the benefits of creating the pom from this official archetype?

  1. The pom is already written for you, along with the logging configuration.
  2. It also comes with demo programs: an offline (batch) demo and a real-time (streaming) demo.

Write a WordCount program (without lambda)

First, open port 8888 with netcat on a node of the Flink cluster you configured, for example the master node:

nc -lk 8888

The code is as follows:

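import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class StreamWordCount {

    public static void main(String[] args) throws Exception {
        // 1. Create the execution environment for a Flink streaming program
        StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
        // 2. Use the environment to create an abstract data set, a DataStream, that reads lines from the socket
        DataStreamSource<String> dataStream = environment.socketTextStream("192.168.237.130", 8888);
        // 3. Call methods on the DataStream: transformations (optional) and a sink
        //    (required; it plays a role similar to a Spark action).
        //    Each transformation turns one DataStream into a new DataStream
        SingleOutputStreamOperator<String> dataStream2 = dataStream.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public void flatMap(String line, Collector<String> out) throws Exception {
                // Split the line into words
                String[] words = line.split(" ");
                for (String word : words) {
                    // Emit each word after splitting
                    out.collect(word);
                }
            }
        });
        // 4. Pair each word with the number 1, producing a new DataStream
        SingleOutputStreamOperator<Tuple2<String, Integer>> dataStream3 =
                dataStream2.map(new MapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public Tuple2<String, Integer> map(String word) throws Exception {
                        return Tuple2.of(word, 1);
                    }
                });
        // 5. Group and aggregate: keyBy the word (field 0 of Tuple2<String, Integer>),
        //    then keep a running sum of the count (field 1). The numbers are field indices.
        SingleOutputStreamOperator<Tuple2<String, Integer>> dataStream4 =
                dataStream3.keyBy(0).sum(1);
        // The transformations end here

        // 6. Call the sink: print every result record
        dataStream4.print();

        // 7. Start the job; nothing runs until execute() is called
        environment.execute("StreamWordCount");
    }
}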

Then type some words into the terminal running nc. The IDEA console prints a (word, count) record for each word as it arrives.
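The output can look surprising at first: keyBy(0).sum(1) does not wait for the input to end. It keeps a running total per word and emits an updated (word, count) tuple for every incoming word. The plain-Java model below is a sketch with no Flink dependency (the class RunningWordCountModel is hypothetical) that reproduces this emission order for a single parallel instance, using the same "(word,count)" text format that Tuple2.toString() produces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical plain-Java model of the StreamWordCount pipeline:
 * flatMap splits each line, map pairs each word with 1, and keyBy(0).sum(1)
 * keeps per-word state, emitting the updated total for EVERY incoming word.
 */
public class RunningWordCountModel {

    /** Returns, in order, the records the sum(1) operator would emit with parallelism 1. */
    public static List<String> emitted(List<String> lines) {
        Map<String, Integer> runningSums = new HashMap<>(); // per-key state kept by sum(1)
        List<String> records = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {           // same split as the flatMap
                int total = runningSums.merge(word, 1, Integer::sum); // (word, 1) added to the running sum
                records.add("(" + word + "," + total + ")");          // Tuple2.toString() format
            }
        }
        return records;
    }

    public static void main(String[] args) {
        for (String record : emitted(Arrays.asList("hello flink", "hello world"))) {
            System.out.println(record);
        }
    }
}
```

For the two lines "hello flink" and "hello world" it prints (hello,1), (flink,1), (hello,2), (world,1). In a real Flink run with parallelism above 1, records for different words may interleave differently, and print() prefixes each line with the index of the subtask that produced it (for example "3> ").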

Write a WordCount program (using lambda)

Code:

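import java.util.Arrays;

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class LambdaStreamWordCount {

    public static void main(String[] args) throws Exception {
        // Create the environment
        StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
        // Create the first DataStream from the socket
        DataStreamSource<String> dataStream = environment.socketTextStream("192.168.237.130", 8888);
        // Split each line and emit (word, 1) directly, merging the flatMap and map steps
        SingleOutputStreamOperator<Tuple2<String, Integer>> dataStream2 = dataStream
                .flatMap((String line, Collector<Tuple2<String, Integer>> out) -> {
                    Arrays.stream(line.split(" ")).forEach(word -> out.collect(Tuple2.of(word, 1)));
                })
                // With a lambda, Java erases the Collector's generic type, so Flink cannot
                // infer the output type: returns() must declare it explicitly
                .returns(Types.TUPLE(Types.STRING, Types.INT));

        // Group by the word (field 0) and keep a running sum of the count (field 1)
        SingleOutputStreamOperator<Tuple2<String, Integer>> sum = dataStream2.keyBy(0)
                .sum(1);

        sum.print();
        environment.execute("LambdaStreamWordCount");
    }
}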
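Both programs split lines with line.split(" "), which treats every single space as a separator. Two consecutive spaces therefore produce an empty token that gets counted as a "word", and tabs are not split at all. A stricter tokenizer is sketched below as an assumption; the class WordTokenizer is hypothetical and not part of the original programs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical helper: a stricter alternative to line.split(" ")
 * for use inside the flatMap.
 */
public class WordTokenizer {

    /** Splits on any run of whitespace (spaces, tabs) and drops empty tokens. */
    public static List<String> tokenize(String line) {
        return Arrays.stream(line.trim().split("\\s+"))
                .filter(word -> !word.isEmpty()) // a blank line would otherwise yield one "" token
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String line = "hello  flink\tstream ";
        // The naive split keeps an empty token between the two spaces
        System.out.println(Arrays.toString(line.split(" ")));
        System.out.println(tokenize(line));
    }
}
```

Inside the lambda version, the same idea would read Arrays.stream(line.trim().split("\\s+")).filter(w -> !w.isEmpty()).forEach(word -> out.collect(Tuple2.of(word, 1))).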

Packaging the program into a jar and running it from the web UI

  1. Change the line that creates the source so the IP and port come from the program arguments (the first argument is the IP, the second is the port):
DataStreamSource<String> dataStream = environment.socketTextStream(args[0], Integer.parseInt(args[1]));

No other code needs to change.

  2. Build the jar by running this command in the project directory (Maven writes the jar to the target directory):
mvn clean package

  3. Upload the jar to Flink through the web UI, then enter the fully qualified name of the entry class and the program arguments: the IP followed by the port (for example, 192.168.237.130 8888). For the detailed steps, see my other post,
    "Flink introduction, standalone cluster installation, and a simple test", in the part about starting a task from the web page. The difference here is that the full class name is not filled in automatically, so you need to type it yourself.
  4. Type some words into the terminal running nc -lk 8888.
  5. View the results. On a cluster, print() writes to the TaskManager's stdout, which can be opened from the Task Managers page of the web UI.
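The modified line socketTextStream(args[0], Integer.parseInt(args[1])) fails with an ArrayIndexOutOfBoundsException or a NumberFormatException when an argument is missing or the port is not a number, which is easy to do when typing arguments into the web UI. A small validation step can give a clearer error. The class SocketArgs below is a hypothetical sketch that keeps the original order (IP first, then port).

```java
/**
 * Hypothetical helper: validates the two program arguments (host, then port)
 * before they are passed to socketTextStream(host, port).
 */
public final class SocketArgs {
    public final String host;
    public final int port;

    private SocketArgs(String host, int port) {
        this.host = host;
        this.port = port;
    }

    /** Parses args[0] as the host and args[1] as the port, failing with a readable message otherwise. */
    public static SocketArgs parse(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException("Usage: StreamWordCount <host> <port>");
        }
        int port;
        try {
            port = Integer.parseInt(args[1]);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Port must be an integer, got: " + args[1], e);
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("Port out of range: " + port);
        }
        return new SocketArgs(args[0], port);
    }

    public static void main(String[] args) {
        SocketArgs parsed = parse(new String[] {"192.168.237.130", "8888"});
        System.out.println(parsed.host + ":" + parsed.port);
    }
}
```

In main() of either program this would be used as SocketArgs a = SocketArgs.parse(args); followed by environment.socketTextStream(a.host, a.port). Flink also ships org.apache.flink.api.java.utils.ParameterTool, whose fromArgs(args) reads named arguments such as --host and --port; the positional sketch above keeps the argument order used in this post instead.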


Source: blog.csdn.net/Zong_0915/article/details/107735743