Time and Watermark in Flink

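(1) Notions of Time

Stream processing in Flink involves three different notions of time:
Event time (EventTime): the time at which the event actually occurred at its source
Ingestion time (IngestionTime): the time at which the event entered Flink
Processing time (ProcessingTime): the time at which the event is actually processed by an operator

Of these three, event time is the one we care about most.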


(2) Watermark in Detail

(2.1) Watermark Illustrated

[Figures: Watermark illustrations]

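(2.2) What is a Watermark?

A Watermark is an extra time marker that travels along with the data; in other words, a Watermark is a timestamp. A Watermark with timestamp t declares that event time has reached t, so no more events with an event time at or before t are expected.

(2.3) How is the Watermark computed?

Watermark = maximum event time seen so far - maximum allowed delay (out-of-orderness)

Because the formula uses the maximum event time seen so far rather than the event time of the latest record, the Watermark only ever rises or stays the same; it never goes down.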
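
The formula can be sketched in plain Java without any Flink dependency. This is only an illustration: the class name `SimpleWatermarkTracker` and its methods are hypothetical, and timestamps are assumed to be in milliseconds.

```java
/** Hypothetical helper: watermark = (max event time seen so far) - (max allowed delay). */
final class SimpleWatermarkTracker {
    private final long maxDelayMs;
    private long maxTs;

    SimpleWatermarkTracker(long maxDelayMs) {
        this.maxDelayMs = maxDelayMs;
        // Start just high enough that (maxTs - maxDelayMs) cannot underflow before the first event.
        this.maxTs = Long.MIN_VALUE + maxDelayMs;
    }

    /** Records one event and returns the watermark after seeing it. */
    long onEvent(long eventTimeMs) {
        maxTs = Math.max(maxTs, eventTimeMs);
        return currentWatermark();
    }

    long currentWatermark() {
        return maxTs - maxDelayMs;
    }

    public static void main(String[] args) {
        SimpleWatermarkTracker tracker = new SimpleWatermarkTracker(2_000L);
        for (long ts : new long[] {1_000L, 5_000L, 3_000L, 9_000L}) {
            System.out.println("event=" + ts + " watermark=" + tracker.onEvent(ts));
        }
    }
}
```

With a 2-second delay, the events 1000, 5000, 3000, 9000 produce the watermarks -1000, 3000, 3000, 7000: the out-of-order event at 3000 leaves the watermark where it was.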

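(2.4) What is the Watermark for?

Previously, windows were triggered by system time. For example, the window [10:00:00, 10:00:10) fires as soon as the system clock reaches 10:00:10, so data that belongs to the window but arrives late can be lost. With Watermarks, a window is triggered by the Watermark instead of the system clock.

In other words, the Watermark is what triggers window computation.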

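(2.5) How does the Watermark trigger window computation?

A window fires when both of these conditions hold:

  1. The window contains data
  2. Watermark >= window end time

Note:
Substituting the Watermark formula into the trigger condition gives:

Watermark >= window end time
Watermark = maximum event time seen so far - maximum allowed delay (out-of-orderness)
maximum event time seen so far - maximum allowed delay >= window end time
maximum event time seen so far >= window end time + maximum allowed delay

That is, a window fires once an event arrives whose event time is at least the window end time plus the maximum allowed delay.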
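
A small plain-Java sketch of the trigger condition and its rearranged form. The class `WindowTriggerCheck` and its method names are hypothetical, and timestamps are assumed to be in milliseconds. Flink's own operators do their bookkeeping with inclusive maximum timestamps, a one-millisecond detail that is left out here.

```java
/** Hypothetical helper illustrating when a tumbling event-time window fires. */
final class WindowTriggerCheck {

    /** Start of the epoch-aligned tumbling window that contains the timestamp. */
    static long windowStart(long eventTimeMs, long windowSizeMs) {
        return eventTimeMs - Math.floorMod(eventTimeMs, windowSizeMs);
    }

    /** Original form: Watermark >= window end time. */
    static boolean firesAt(long watermarkMs, long windowEndMs) {
        return watermarkMs >= windowEndMs;
    }

    /** Rearranged form: max event time >= window end time + max allowed delay. */
    static boolean firesAtMaxEventTime(long maxEventTimeMs, long windowEndMs, long maxDelayMs) {
        return maxEventTimeMs >= windowEndMs + maxDelayMs;
    }

    public static void main(String[] args) {
        long size = 10_000L;
        long delay = 2_000L;
        long end = windowStart(3_000L, size) + size; // the window [0, 10000)
        for (long maxTs : new long[] {9_000L, 11_999L, 12_000L}) {
            long watermark = maxTs - delay;
            System.out.println("maxTs=" + maxTs
                    + " fires=" + firesAt(watermark, end)
                    + " rearranged=" + firesAtMaxEventTime(maxTs, end, delay));
        }
    }
}
```

Both forms always agree, because the second is just the first with the Watermark formula substituted in.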

Watermark API:

(3) Using EventTime and Watermark

Flink ships with two built-in Watermark generators:

  1. Monotonously Increasing Timestamps (timestamps only increase, which is equivalent to an allowed delay of 0)
WatermarkStrategy.<WaterSensor>forMonotonousTimestamps()
  2. Fixed Amount of Lateness (a fixed amount of delay is allowed)
WatermarkStrategy.<WaterSensor>forBoundedOutOfOrderness(Duration.ofSeconds(2))

(3.1) Testing the watermark mechanism with an event-time tumbling window

Code:

package com.aikfk.flink.datastream.bean;

/**
 * Water-level sensor: the bean that receives water-level readings.
 * <p>
 * id: sensor ID
 * ts: timestamp (in seconds)
 * vc: water level
 *
 * @author caizhengjie
 * @date 2021/3/20 9:19 PM
 */
public class WaterSensor {

    private String id;
    private Long ts;
    private Integer vc;

    // Public no-argument constructor, so that Flink can treat the class as a POJO
    public WaterSensor() {
    }

    public WaterSensor(String id, Long ts, Integer vc) {
        this.id = id;
        this.ts = ts;
        this.vc = vc;
    }

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public Long getTs() {
        return ts;
    }

    public void setTs(Long ts) {
        this.ts = ts;
    }

    public Integer getVc() {
        return vc;
    }

    public void setVc(Integer vc) {
        this.vc = vc;
    }

    @Override
    public String toString() {
        return "WaterSensor{" +
                "id='" + id + '\'' +
                ", ts=" + ts +
                ", vc=" + vc +
                '}';
    }
}

package com.aikfk.flink.datastream.watermark;

import com.aikfk.flink.datastream.bean.WaterSensor;
import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.datastream.WindowedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

import java.time.Duration;

/**
 * Tests the watermark mechanism with an event-time tumbling window.
 *
 * @author caizhengjie
 * @date 2021/3/20 9:21 PM
 */
public class EventTimeTumbling {

    public static void main(String[] args) throws Exception {

        // 1. Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        // 2. Read lines from the socket and convert them to WaterSensor beans
        SingleOutputStreamOperator<WaterSensor> waterSensorDS = env.socketTextStream("bigdata-pro-m07", 9999)
                .map(new MapFunction<String, WaterSensor>() {
                    @Override
                    public WaterSensor map(String s) throws Exception {
                        String[] split = s.split(",");
                        return new WaterSensor(split[0], Long.parseLong(split[1]), Integer.parseInt(split[2]));
                    }
                });

        // 3. Extract the timestamp field and generate watermarks
        SingleOutputStreamOperator<WaterSensor> waterSensorSingleOutputStreamOperator = waterSensorDS
                .assignTimestampsAndWatermarks(WatermarkStrategy
                        // Maximum allowed delay (out-of-orderness)
                        .<WaterSensor>forBoundedOutOfOrderness(Duration.ofSeconds(2))
                        // Use the ts field as the event time
                        .withTimestampAssigner(new SerializableTimestampAssigner<WaterSensor>() {
                            @Override
                            public long extractTimestamp(WaterSensor element, long recordTimestamp) {
                                // ts is in seconds; Flink expects milliseconds
                                return element.getTs() * 1000L;
                            }
                        }));

        // 4. Key by sensor id
        KeyedStream<WaterSensor, String> keyedStream = waterSensorSingleOutputStreamOperator.keyBy(WaterSensor::getId);

        // 5. Open a 5-second tumbling event-time window
        WindowedStream<WaterSensor, String, TimeWindow> window = keyedStream.window(TumblingEventTimeWindows.of(Time.seconds(5)));

        // 6. Sum vc within each window
        SingleOutputStreamOperator<WaterSensor> result = window.reduce(new ReduceFunction<WaterSensor>() {
            @Override
            public WaterSensor reduce(WaterSensor t1, WaterSensor t2) throws Exception {
                return new WaterSensor(t1.getId(), t1.getTs(), t1.getVc() + t2.getVc());
            }
        });

        // 7. Print the result
        result.print();

        // 8. Run the job
        env.execute();
    }
}

Test with in-order data:

ws_001,1577844001,1
ws_001,1577844002,1
ws_001,1577844003,1
ws_001,1577844005,1
ws_001,1577844006,1
ws_001,1577844009,1

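Output:

WaterSensor{id='ws_001', ts=1577844001, vc=3}

Explanation:
The tumbling window is 5 seconds wide in event time and is left-closed, right-open: [0,5), with times written as offsets from 1577844000. The events at 1 to 3 seconds fall into window [0,5), and the events at 5 and 6 seconds fall into [5,10). The maximum allowed delay is set to 2 seconds, so when the event at 9 seconds arrives the watermark becomes 7 seconds. Since wm >= 5 seconds, the end of the window, window [0,5) fires and the result is vc = 3. Any event at 7 seconds or later would have triggered it as well.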

Test with out-of-order data:

ws_001,1577844001,1
ws_001,1577844002,1
ws_001,1577844003,1
ws_001,1577844005,1
ws_001,1577844001,1
ws_001,1577844002,1
ws_001,1577844009,1

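Output:

WaterSensor{id='ws_001', ts=1577844001, vc=5}

Explanation:

The tumbling window is 5 seconds wide in event time and is left-closed, right-open: [0,5), with times written as offsets from 1577844000. The events at 1 to 3 seconds fall into window [0,5). Then the event at 5 seconds arrives and falls into [5,10). After that, events at 1 and 2 seconds arrive out of order. When the event at 5 seconds arrived, wm was only 3 seconds, which is below the end of the window, so [0,5) had not fired yet and the two out-of-order events still fall into [0,5). The maximum allowed delay is 2 seconds, so when the event at 9 seconds arrives the watermark becomes 7 seconds. Since wm >= 5 seconds, the end of the window, window [0,5) fires and the result is vc = 5.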
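
The two test runs above can be replayed with a plain-Java simulation. This is a simplified sketch for a single key, not Flink's implementation: the class `TumblingWatermarkSimulation` is hypothetical, every record is assumed to carry vc = 1, and event times are given in seconds as offsets from 1577844000 (itself a multiple of 5, so the window alignment is unchanged).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical single-key simulation of tumbling event-time windows driven by a watermark. */
final class TumblingWatermarkSimulation {

    /** Feeds the event times (vc = 1 each) and returns "[start,end) vc=N" for every window that fired. */
    static List<String> run(long[] eventTimesSec, long windowSizeSec, long maxDelaySec) {
        TreeMap<Long, Integer> sums = new TreeMap<>();   // window start -> running sum of vc
        List<String> fired = new ArrayList<>();
        long maxTs = Long.MIN_VALUE + maxDelaySec;       // avoids underflow in (maxTs - maxDelaySec)
        for (long ts : eventTimesSec) {
            long start = ts - Math.floorMod(ts, windowSizeSec);
            long watermarkBefore = maxTs - maxDelaySec;
            if (watermarkBefore >= start + windowSizeSec) {
                continue;                                // its window already fired: the record is dropped
            }
            sums.merge(start, 1, Integer::sum);
            maxTs = Math.max(maxTs, ts);
            long watermark = maxTs - maxDelaySec;
            // Fire and discard every window whose end time the watermark has reached.
            while (!sums.isEmpty() && watermark >= sums.firstKey() + windowSizeSec) {
                Map.Entry<Long, Integer> w = sums.pollFirstEntry();
                fired.add("[" + w.getKey() + "," + (w.getKey() + windowSizeSec) + ") vc=" + w.getValue());
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        System.out.println(run(new long[] {1, 2, 3, 5, 6, 9}, 5, 2));    // in-order data
        System.out.println(run(new long[] {1, 2, 3, 5, 1, 2, 9}, 5, 2)); // out-of-order data
    }
}
```

The in-order run fires [0,5) with vc=3 and the out-of-order run fires it with vc=5, matching the two results above. Window [5,10) never fires in either run because the watermark only reaches 7.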

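(3.2) Testing allowed lateness (allowedLateness) and side output (sideOutput) with an event-time tumbling window

What if data still arrives late even after watermarks are in place? Flink windows can also accept late data.

When the watermark triggers the window, Flink computes and emits the current result but does not close the window yet. From then on, each late record triggers another computation of the window it belongs to (an incremental update).

So when is the window really closed? When the watermark reaches window end time + allowed lateness.

.window(TumblingEventTimeWindows.of(Time.seconds(5)))
.allowedLateness(Time.seconds(3))

Note: allowed lateness only applies to event-time windows.

Even with allowed lateness, the window is eventually closed. What about data that arrives after that? Flink provides side outputs to handle data that arrives after the window has closed.

Code: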

package com.aikfk.flink.datastream.watermark;

import com.aikfk.flink.datastream.bean.WaterSensor;
import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.datastream.WindowedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.OutputTag;

import java.time.Duration;

/**
 * Tests allowed lateness and the late-data side output with an event-time tumbling window.
 *
 * @author caizhengjie
 * @date 2021/3/20 9:21 PM
 */
public class LateAndSideOutPut {

    public static void main(String[] args) throws Exception {

        // 1. Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        // 2. Read lines from the socket and convert them to WaterSensor beans
        SingleOutputStreamOperator<WaterSensor> waterSensorDS = env.socketTextStream("bigdata-pro-m07", 9999)
                .map(new MapFunction<String, WaterSensor>() {
                    @Override
                    public WaterSensor map(String s) throws Exception {
                        String[] split = s.split(",");
                        return new WaterSensor(split[0], Long.parseLong(split[1]), Integer.parseInt(split[2]));
                    }
                });

        // 3. Extract the timestamp field and generate watermarks
        SingleOutputStreamOperator<WaterSensor> waterSensorSingleOutputStreamOperator = waterSensorDS
                .assignTimestampsAndWatermarks(WatermarkStrategy
                        // Maximum allowed delay (out-of-orderness)
                        .<WaterSensor>forBoundedOutOfOrderness(Duration.ofSeconds(2))
                        // Use the ts field as the event time
                        .withTimestampAssigner(new SerializableTimestampAssigner<WaterSensor>() {
                            @Override
                            public long extractTimestamp(WaterSensor element, long recordTimestamp) {
                                // ts is in seconds; Flink expects milliseconds
                                return element.getTs() * 1000L;
                            }
                        }));

        // 4. Key by sensor id
        KeyedStream<WaterSensor, String> keyedStream = waterSensorSingleOutputStreamOperator.keyBy(WaterSensor::getId);

        // 5. Open the window, allow late data, and send data that is too late to a side output.
        //    The tag is an anonymous subclass so that the element type is retained; it is defined once and reused.
        OutputTag<WaterSensor> lateTag = new OutputTag<WaterSensor>("Side") {
        };
        WindowedStream<WaterSensor, String, TimeWindow> window = keyedStream.window(TumblingEventTimeWindows.of(Time.seconds(5)))
                .allowedLateness(Time.seconds(2))
                .sideOutputLateData(lateTag);

        // 6. Sum vc within each window
        SingleOutputStreamOperator<WaterSensor> result = window.reduce(new ReduceFunction<WaterSensor>() {
            @Override
            public WaterSensor reduce(WaterSensor t1, WaterSensor t2) throws Exception {
                return new WaterSensor(t1.getId(), t1.getTs(), t1.getVc() + t2.getVc());
            }
        });
        DataStream<WaterSensor> sideOutput = result.getSideOutput(lateTag);

        // 7. Print the main result and the side output
        result.print();
        sideOutput.print("Side");

        // 8. Run the job
        env.execute();
    }
}

Test data:

ws_001,1577844001,1
ws_001,1577844002,1
ws_001,1577844003,1
ws_001,1577844008,1
ws_001,1577844001,1
ws_001,1577844002,1
ws_001,1577844003,1
ws_001,1577844009,1
ws_001,1577844001,1
ws_001,1577844002,1

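Output:

WaterSensor{id='ws_001', ts=1577844001, vc=3}
WaterSensor{id='ws_001', ts=1577844001, vc=4}
WaterSensor{id='ws_001', ts=1577844001, vc=5}
WaterSensor{id='ws_001', ts=1577844001, vc=6}
Side> WaterSensor{id='ws_001', ts=1577844001, vc=1}
Side> WaterSensor{id='ws_001', ts=1577844002, vc=1}

Explanation:

The tumbling window is 5 seconds wide in event time and is left-closed, right-open: [0,5), with times written as offsets from 1577844000. The events at 1 to 3 seconds fall into window [0,5). Then the event at 8 seconds arrives. The maximum allowed delay is 2 seconds, so wm = 6 seconds, which is past the end of the window; the window fires and the event at 8 seconds produces vc=3. Because allowed lateness (allowedLateness) is set to 2 seconds, the window is not closed at this point; it stays open until wm = 7 seconds. The late events at 1, 2 and 3 seconds that follow still fall into window [0,5), and each of them triggers another computation of that window (an incremental update), giving vc=4, 5 and 6. When the event at 9 seconds arrives, the watermark becomes 7 seconds, so wm >= window end time + allowed lateness and the window is closed. The events at 1 and 2 seconds that arrive afterwards can no longer enter window [0,5); they are emitted through the side output, which handles data that arrives after the window has closed.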
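
The run above can be replayed with a small plain-Java simulation of a single window. This is a simplified sketch, not Flink's implementation: the class `AllowedLatenessSimulation` and its labels `main` and `side` are hypothetical, every record is assumed to carry vc = 1, and times are in seconds as offsets from 1577844000.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simulation of one event-time window with allowed lateness and a late-data side output. */
final class AllowedLatenessSimulation {

    static List<String> run(long[] eventTimesSec, long windowStartSec, long windowEndSec,
                            long maxDelaySec, long allowedLatenessSec) {
        List<String> out = new ArrayList<>();
        int sum = 0;                                   // running sum of vc in the tracked window
        boolean fired = false;                         // has the window produced its first result?
        long maxTs = Long.MIN_VALUE + maxDelaySec;     // avoids underflow in (maxTs - maxDelaySec)
        for (long ts : eventTimesSec) {
            long watermarkBefore = maxTs - maxDelaySec;
            boolean inWindow = ts >= windowStartSec && ts < windowEndSec;
            if (inWindow) {
                if (watermarkBefore >= windowEndSec + allowedLatenessSec) {
                    out.add("side ts=" + ts);          // window closed: route to the side output
                } else {
                    sum++;
                    if (fired) {
                        out.add("main vc=" + sum);     // late but allowed: the window fires again
                    }
                }
            }
            maxTs = Math.max(maxTs, ts);
            long watermark = maxTs - maxDelaySec;
            if (!fired && sum > 0 && watermark >= windowEndSec) {
                fired = true;
                out.add("main vc=" + sum);             // first firing, triggered by the watermark
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[] events = {1, 2, 3, 8, 1, 2, 3, 9, 1, 2};
        run(events, 0, 5, 2, 2).forEach(System.out::println);
    }
}
```

For the test data this prints four `main` lines (vc=3 to vc=6) followed by two `side` lines, the same sequence as the job output above.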


(3.3) Testing the watermark mechanism with an event-time sliding window

Code:

package com.aikfk.flink.datastream.watermark;

import com.aikfk.flink.datastream.bean.WaterSensor;
import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.datastream.WindowedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

import java.time.Duration;

/**
 * Tests the watermark mechanism with an event-time sliding window.
 *
 * @author caizhengjie
 * @date 2021/3/20 9:21 PM
 */
public class EventTimeSliding {

    public static void main(String[] args) throws Exception {

        // 1. Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        // 2. Read lines from the socket and convert them to WaterSensor beans
        SingleOutputStreamOperator<WaterSensor> waterSensorDS = env.socketTextStream("bigdata-pro-m07", 9999)
                .map(new MapFunction<String, WaterSensor>() {
                    @Override
                    public WaterSensor map(String s) throws Exception {
                        String[] split = s.split(",");
                        return new WaterSensor(split[0], Long.parseLong(split[1]), Integer.parseInt(split[2]));
                    }
                });

        // 3. Extract the timestamp field and generate watermarks
        SingleOutputStreamOperator<WaterSensor> waterSensorSingleOutputStreamOperator = waterSensorDS
                .assignTimestampsAndWatermarks(WatermarkStrategy
                        // Maximum allowed delay (out-of-orderness)
                        .<WaterSensor>forBoundedOutOfOrderness(Duration.ofSeconds(2))
                        // Use the ts field as the event time
                        .withTimestampAssigner(new SerializableTimestampAssigner<WaterSensor>() {
                            @Override
                            public long extractTimestamp(WaterSensor element, long recordTimestamp) {
                                // ts is in seconds; Flink expects milliseconds
                                return element.getTs() * 1000L;
                            }
                        }));

        // 4. Key by sensor id
        KeyedStream<WaterSensor, String> keyedStream = waterSensorSingleOutputStreamOperator.keyBy(WaterSensor::getId);

        // 5. Open a sliding event-time window: size 6 seconds, slide 2 seconds
        WindowedStream<WaterSensor, String, TimeWindow> window = keyedStream.window(SlidingEventTimeWindows.of(Time.seconds(6), Time.seconds(2)));

        // 6. Sum vc within each window
        SingleOutputStreamOperator<WaterSensor> result = window.reduce(new ReduceFunction<WaterSensor>() {
            @Override
            public WaterSensor reduce(WaterSensor t1, WaterSensor t2) throws Exception {
                return new WaterSensor(t1.getId(), t1.getTs(), t1.getVc() + t2.getVc());
            }
        });

        // 7. Print the result
        result.print();

        // 8. Run the job
        env.execute();
    }
}

Test data:

ws_001,1577844001,1
ws_001,1577844008,1
ws_001,1577844012,1

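Output:

WaterSensor{id='ws_001', ts=1577844001, vc=1}
WaterSensor{id='ws_001', ts=1577844001, vc=1}
WaterSensor{id='ws_001', ts=1577844001, vc=1}
WaterSensor{id='ws_001', ts=1577844008, vc=1}

Explanation:
The sliding window in the program is 6 seconds wide with a slide of 2 seconds, and times are written as offsets from 1577844000. The event at 1 second belongs to three windows: [-4,2), [-2,4) and [0,6). When the event at 8 seconds arrives, wm is 6 seconds, which is >= the end of window [0,6) and therefore also past the ends of the two earlier windows. All three windows fire, so three results are printed at once. The event at 8 seconds itself belongs to the windows [4,10), [6,12) and [8,14). To get one more result, input an event at 12 seconds: wm becomes 10 seconds, which is >= the end of window [4,10), so that window fires and produces one result.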
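
The window membership used in this explanation can be computed with a few lines of plain Java. The class `SlidingWindowAssignment` and its methods are hypothetical; the sketch assumes epoch-aligned windows with no offset and takes times in seconds as offsets from 1577844000.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that lists the sliding windows an event time belongs to. */
final class SlidingWindowAssignment {

    /** Returns every [start, end) window of the given size and slide that contains ts, latest start first. */
    static List<long[]> windowsFor(long ts, long size, long slide) {
        List<long[]> windows = new ArrayList<>();
        long lastStart = ts - Math.floorMod(ts, slide);   // latest window start that is <= ts
        for (long start = lastStart; start > ts - size; start -= slide) {
            windows.add(new long[] {start, start + size});
        }
        return windows;
    }

    /** Formats windows as "[start,end)" separated by spaces. */
    static String format(List<long[]> windows) {
        StringBuilder sb = new StringBuilder();
        for (long[] w : windows) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append('[').append(w[0]).append(',').append(w[1]).append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (long ts : new long[] {1, 8, 12}) {
            System.out.println(ts + " -> " + format(windowsFor(ts, 6, 2)));
        }
    }
}
```

With size 6 and slide 2 each event belongs to size / slide = 3 windows, which is why a single record at 1 second produces three results.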

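(3.4) Testing the watermark mechanism with an event-time session window

Session gap: a session window fires once the watermark reaches the event time of the last record in the session plus the gap. The boundary is inclusive: a record whose event time is exactly one gap after the previous record still joins the same session.

Code: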

package com.aikfk.flink.datastream.watermark;

import com.aikfk.flink.datastream.bean.WaterSensor;
import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.datastream.WindowedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

import java.time.Duration;

/**
 * Tests the watermark mechanism with an event-time session window.
 *
 * @author caizhengjie
 * @date 2021/3/20 9:21 PM
 */
public class EventTimeSession {

    public static void main(String[] args) throws Exception {

        // 1. Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        // 2. Read lines from the socket and convert them to WaterSensor beans
        SingleOutputStreamOperator<WaterSensor> waterSensorDS = env.socketTextStream("bigdata-pro-m07", 9999)
                .map(new MapFunction<String, WaterSensor>() {
                    @Override
                    public WaterSensor map(String s) throws Exception {
                        String[] split = s.split(",");
                        return new WaterSensor(split[0], Long.parseLong(split[1]), Integer.parseInt(split[2]));
                    }
                });

        // 3. Extract the timestamp field and generate watermarks
        SingleOutputStreamOperator<WaterSensor> waterSensorSingleOutputStreamOperator = waterSensorDS
                .assignTimestampsAndWatermarks(WatermarkStrategy
                        // Maximum allowed delay (out-of-orderness)
                        .<WaterSensor>forBoundedOutOfOrderness(Duration.ofSeconds(2))
                        // Use the ts field as the event time
                        .withTimestampAssigner(new SerializableTimestampAssigner<WaterSensor>() {
                            @Override
                            public long extractTimestamp(WaterSensor element, long recordTimestamp) {
                                // ts is in seconds; Flink expects milliseconds
                                return element.getTs() * 1000L;
                            }
                        }));

        // 4. Key by sensor id
        KeyedStream<WaterSensor, String> keyedStream = waterSensorSingleOutputStreamOperator.keyBy(WaterSensor::getId);

        // 5. Open a session window with a 5-second gap; the session fires once the
        //    watermark reaches (event time of the last record + gap), boundary inclusive
        WindowedStream<WaterSensor, String, TimeWindow> window = keyedStream.window(EventTimeSessionWindows.withGap(Time.seconds(5)));

        // 6. Sum vc within each window
        SingleOutputStreamOperator<WaterSensor> result = window.reduce(new ReduceFunction<WaterSensor>() {
            @Override
            public WaterSensor reduce(WaterSensor t1, WaterSensor t2) throws Exception {
                return new WaterSensor(t1.getId(), t1.getTs(), t1.getVc() + t2.getVc());
            }
        });

        // 7. Print the result
        result.print();

        // 8. Run the job
        env.execute();
    }
}

Test data:

ws_001,1577844002,1
ws_001,1577844007,1
ws_001,1577844014,1

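Output:

WaterSensor{id='ws_001', ts=1577844002, vc=2}

Explanation:
The session gap in the program is 5 seconds, and times are written as offsets from 1577844000. The first event is at 2 seconds and the second at 7 seconds. Neither triggers the window, because a session only fires when the watermark >= event time of the previous record + gap (5 seconds); the event at 7 seconds is exactly one gap after the first, so it joins the same session. When the event at 14 seconds arrives, wm is 12 seconds >= 7 + 5, so the window fires and produces one result that aggregates the two records (vc = 2).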
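
The grouping in this explanation can be sketched in plain Java. The class `SessionWindowSketch` is hypothetical, and the inclusive boundary (`ts <= end`) is an assumption chosen to match the test result above, where the events at 2 and 7 seconds land in the same session with a 5-second gap. Times are in seconds as offsets from 1577844000.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that groups event times into session windows for a single key. */
final class SessionWindowSketch {

    /** Returns "[start,end) count=N" per session, where end = last event time + gap. */
    static List<String> sessions(long[] eventTimesSec, long gapSec) {
        List<String> result = new ArrayList<>();
        if (eventTimesSec.length == 0) {
            return result;
        }
        long[] sorted = eventTimesSec.clone();
        Arrays.sort(sorted);
        long start = sorted[0];
        long end = sorted[0] + gapSec;
        int count = 1;
        for (int i = 1; i < sorted.length; i++) {
            long ts = sorted[i];
            if (ts <= end) {                 // inclusive: exactly one gap later still joins the session
                end = ts + gapSec;
                count++;
            } else {
                result.add("[" + start + "," + end + ") count=" + count);
                start = ts;
                end = ts + gapSec;
                count = 1;
            }
        }
        result.add("[" + start + "," + end + ") count=" + count);
        return result;
    }

    /** A session fires once the watermark reaches its end (last event time + gap). */
    static boolean fires(long watermarkSec, long sessionEndSec) {
        return watermarkSec >= sessionEndSec;
    }

    public static void main(String[] args) {
        System.out.println(sessions(new long[] {2, 7, 14}, 5));
        System.out.println(fires(14 - 2, 12)); // watermark after the event at 14 s, with a 2 s delay
    }
}
```

For the test data the sketch yields the sessions [2,12) with two records and [14,19) with one. Only the first has fired by the time the watermark is 12, which matches the single result line above.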

(4) Custom WatermarkStrategy

There are two styles of Watermark generation: periodic and punctuated.
Both are written by implementing the WatermarkGenerator interface.

(4.1) Periodic

package com.aikfk.flink.datastream.watermark;

import com.aikfk.flink.datastream.bean.WaterSensor;
import org.apache.flink.api.common.eventtime.*;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.datastream.WindowedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

/**
 * Tests a custom periodic watermark generator with an event-time tumbling window.
 *
 * @author caizhengjie
 * @date 2021/3/20 9:21 PM
 */
public class EventTimeTumblingCustomerPeriod {

    public static void main(String[] args) throws Exception {

        // 1. Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        // 2. Read lines from the socket and convert them to WaterSensor beans
        SingleOutputStreamOperator<WaterSensor> waterSensorDS = env.socketTextStream("bigdata-pro-m07", 9999)
                .map(new MapFunction<String, WaterSensor>() {
                    @Override
                    public WaterSensor map(String s) throws Exception {
                        String[] split = s.split(",");
                        return new WaterSensor(split[0], Long.parseLong(split[1]), Integer.parseInt(split[2]));
                    }
                });

        // 3. Extract the timestamp field and generate watermarks with the custom generator
        SingleOutputStreamOperator<WaterSensor> waterSensorSingleOutputStreamOperator = waterSensorDS
                .assignTimestampsAndWatermarks(new WatermarkStrategy<WaterSensor>() {
                    @Override
                    public WatermarkGenerator<WaterSensor> createWatermarkGenerator(WatermarkGeneratorSupplier.Context context) {
                        return new MyPeriod(2000L);
                    }
                }.withTimestampAssigner(new SerializableTimestampAssigner<WaterSensor>() {
                    @Override
                    public long extractTimestamp(WaterSensor element, long recordTimestamp) {
                        // ts is in seconds; Flink expects milliseconds
                        return element.getTs() * 1000L;
                    }
                }));

        // 4. Key by sensor id
        KeyedStream<WaterSensor, String> keyedStream = waterSensorSingleOutputStreamOperator.keyBy(WaterSensor::getId);

        // 5. Open a 5-second tumbling event-time window
        WindowedStream<WaterSensor, String, TimeWindow> window = keyedStream.window(TumblingEventTimeWindows.of(Time.seconds(5)));

        // 6. Sum vc within each window
        SingleOutputStreamOperator<WaterSensor> result = window.reduce(new ReduceFunction<WaterSensor>() {
            @Override
            public WaterSensor reduce(WaterSensor t1, WaterSensor t2) throws Exception {
                return new WaterSensor(t1.getId(), t1.getTs(), t1.getVc() + t2.getVc());
            }
        });

        // 7. Print the result
        result.print();

        // 8. Run the job
        env.execute();
    }

    /**
     * Custom periodic Watermark generator
     */
    public static class MyPeriod implements WatermarkGenerator<WaterSensor> {

        // Largest event timestamp seen so far, in ms
        private long maxTs;

        // Maximum allowed delay, in ms
        private final long maxDelay;

        public MyPeriod(long maxDelay) {
            this.maxDelay = maxDelay;
            // Start far enough above Long.MIN_VALUE that (maxTs - maxDelay) cannot underflow
            this.maxTs = Long.MIN_VALUE + maxDelay + 1;
        }

        // Called once for every element; only tracks the timestamp the Watermark is derived from
        @Override
        public void onEvent(WaterSensor event, long eventTimestamp, WatermarkOutput output) {
            // A new element arrived: keep the largest timestamp
            System.out.println("Updating max event timestamp");
            maxTs = Math.max(eventTimestamp, maxTs);
        }

        // Called periodically to emit the Watermark; the default interval is 200 ms
        @Override
        public void onPeriodicEmit(WatermarkOutput output) {
            // Emitting (maxTs - maxDelay) is like slowing Flink's event-time clock down by the maximum delay
            System.out.println("Emitting watermark: " + (maxTs - maxDelay));
            output.emitWatermark(new Watermark(maxTs - maxDelay));
        }
    }
}

(4.2) Punctuated

package com.aikfk.flink.datastream.watermark;

import com.aikfk.flink.datastream.bean.WaterSensor;
import org.apache.flink.api.common.eventtime.*;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.datastream.WindowedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

/**
 * Tests a custom punctuated watermark generator with an event-time tumbling window.
 *
 * @author caizhengjie
 * @date 2021/3/20 9:21 PM
 */
public class EventTimeTumblingCustomerPunt {

    public static void main(String[] args) throws Exception {

        // 1. Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        // 2. Read lines from the socket and convert them to WaterSensor beans
        SingleOutputStreamOperator<WaterSensor> waterSensorDS = env.socketTextStream("bigdata-pro-m07", 9999)
                .map(new MapFunction<String, WaterSensor>() {
                    @Override
                    public WaterSensor map(String s) throws Exception {
                        String[] split = s.split(",");
                        return new WaterSensor(split[0], Long.parseLong(split[1]), Integer.parseInt(split[2]));
                    }
                });

        // 3. Extract the timestamp field and generate watermarks with the custom generator
        SingleOutputStreamOperator<WaterSensor> waterSensorSingleOutputStreamOperator = waterSensorDS
                .assignTimestampsAndWatermarks(new WatermarkStrategy<WaterSensor>() {
                    @Override
                    public WatermarkGenerator<WaterSensor> createWatermarkGenerator(WatermarkGeneratorSupplier.Context context) {
                        return new MyPunt(2000L);
                    }
                }.withTimestampAssigner(new SerializableTimestampAssigner<WaterSensor>() {
                    @Override
                    public long extractTimestamp(WaterSensor element, long recordTimestamp) {
                        // ts is in seconds; Flink expects milliseconds
                        return element.getTs() * 1000L;
                    }
                }));

        // 4. Key by sensor id
        KeyedStream<WaterSensor, String> keyedStream = waterSensorSingleOutputStreamOperator.keyBy(WaterSensor::getId);

        // 5. Open a 5-second tumbling event-time window
        WindowedStream<WaterSensor, String, TimeWindow> window = keyedStream.window(TumblingEventTimeWindows.of(Time.seconds(5)));

        // 6. Sum vc within each window
        SingleOutputStreamOperator<WaterSensor> result = window.reduce(new ReduceFunction<WaterSensor>() {
            @Override
            public WaterSensor reduce(WaterSensor t1, WaterSensor t2) throws Exception {
                return new WaterSensor(t1.getId(), t1.getTs(), t1.getVc() + t2.getVc());
            }
        });

        // 7. Print the result
        result.print();

        // 8. Run the job
        env.execute();
    }

    /**
     * Custom punctuated Watermark generator
     */
    public static class MyPunt implements WatermarkGenerator<WaterSensor> {

        // Largest event timestamp seen so far, in ms
        private long maxTs;

        // Maximum allowed delay, in ms
        private final long maxDelay;

        public MyPunt(long maxDelay) {
            this.maxDelay = maxDelay;
            // Start far enough above Long.MIN_VALUE that (maxTs - maxDelay) cannot underflow
            this.maxTs = Long.MIN_VALUE + maxDelay + 1;
        }

        // Called for every element; the Watermark is emitted right here
        @Override
        public void onEvent(WaterSensor event, long eventTimestamp, WatermarkOutput output) {
            System.out.println("Updating max event timestamp");
            maxTs = Math.max(eventTimestamp, maxTs);
            output.emitWatermark(new Watermark(maxTs - maxDelay));
        }

        // Called periodically; intentionally empty for a punctuated generator
        @Override
        public void onPeriodicEmit(WatermarkOutput output) {
        }
    }
}

Test data:

ws_001,1577844001,1
ws_001,1577844002,1
ws_001,1577844012,1

Output:

Updating max event timestamp
Updating max event timestamp
Updating max event timestamp
WaterSensor{id='ws_001', ts=1577844001, vc=2}

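(5) Watermark propagation with multiple parallel subtasks

How Watermarks propagate:

  1. Watermarks are broadcast to all downstream subtasks
  2. The Watermark of a subtask is the minimum of the Watermarks it has received from all of its upstream subtasks
  3. If the Watermark has not advanced, nothing is sent downstream. Note that Watermark generation itself is unaffected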

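Summary: with multiple parallel subtasks, the Watermark passed downstream is always the smallest one. This is the wooden bucket principle: the shortest stave decides how much water the bucket holds.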
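
The minimum rule can be sketched in plain Java as a model of one downstream subtask. This is not Flink's internal code: the class `MinWatermarkCombiner` and its method are hypothetical, and each upstream subtask is modeled as a numbered input channel.

```java
import java.util.Arrays;

/** Hypothetical model of a subtask that combines the watermarks of several upstream channels. */
final class MinWatermarkCombiner {
    private final long[] channelWatermarks;
    private long lastEmitted = Long.MIN_VALUE;

    MinWatermarkCombiner(int channels) {
        if (channels <= 0) {
            throw new IllegalArgumentException("at least one input channel is required");
        }
        channelWatermarks = new long[channels];
        Arrays.fill(channelWatermarks, Long.MIN_VALUE);   // no watermark received yet
    }

    /**
     * Records a watermark from one channel.
     * Returns the combined watermark if it advanced, or null if nothing is forwarded downstream.
     */
    Long onWatermark(int channel, long watermark) {
        channelWatermarks[channel] = Math.max(channelWatermarks[channel], watermark);
        long min = Arrays.stream(channelWatermarks).min().getAsLong();
        if (min > lastEmitted) {
            lastEmitted = min;
            return min;
        }
        return null;
    }

    public static void main(String[] args) {
        MinWatermarkCombiner combiner = new MinWatermarkCombiner(2);
        System.out.println(combiner.onWatermark(0, 5L)); // channel 1 has sent nothing yet
        System.out.println(combiner.onWatermark(1, 3L)); // the minimum is now 3
        System.out.println(combiner.onWatermark(1, 8L)); // the minimum moves up to 5
        System.out.println(combiner.onWatermark(1, 9L)); // the minimum is still 5: nothing forwarded
    }
}
```

The slowest channel holds the combined watermark back: until channel 0 advances past 5, further progress on channel 1 changes nothing downstream.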



Reposted from blog.csdn.net/weixin_45366499/article/details/114783695