I. The AggregateFunction Concept
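Flink's AggregateFunction computes a result incrementally: each incoming element is folded into an intermediate accumulator state, so the window never has to buffer all of its elements. The interface is more flexible than ReduceFunction, at the cost of a somewhat more involved implementation, because the input type and the output type may differ. It is usually combined with a WindowFunction.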
II. Worked Example: every 3 seconds, count the log records of each base station over the last 5 seconds
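A 5-second window that slides every 3 seconds overlaps its neighbour by 2 seconds, so a record is counted in either one or two windows. The sketch below illustrates that assignment in plain Scala without Flink. It assumes window starts are aligned to multiples of the slide (zero offset) and non-negative millisecond timestamps; `SlidingWindowSketch` and `windowsFor` are hypothetical names, not Flink API.

```scala
object SlidingWindowSketch {
  // Returns (start, end) for every sliding window that contains `timestamp`.
  // All values are in milliseconds; `end` is exclusive.
  // Assumes a zero offset and a non-negative timestamp.
  def windowsFor(timestamp: Long, size: Long, slide: Long): List[(Long, Long)] = {
    // Start of the latest window that begins at or before the timestamp
    val lastStart = timestamp - (timestamp % slide)
    Iterator.iterate(lastStart)(_ - slide)
      .takeWhile(start => start > timestamp - size) // window must still cover the timestamp
      .map(start => (start, start + size))
      .toList
  }

  def main(args: Array[String]): Unit = {
    // A record at t = 7 s with size 5 s and slide 3 s
    println(windowsFor(7000L, 5000L, 3000L))
  }
}
```

With these settings a record at 7 s falls into the windows starting at 6 s and 3 s, while a record at 2 s falls only into the window starting at 0 s.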
1. Define the log record class
case class Log(sid:String,var callOut:String, var callIn:String, callType:String, callTime:Long, duration:Long)
2. Implement the job
import org.apache.flink.api.common.functions.AggregateFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.function.WindowFunction
import org.apache.flink.streaming.api.windowing.assigners.SlidingProcessingTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector
/**
 * Incremental aggregation function
 */
object TestAggregatFunctionByWindow {
  // Every 3 seconds, count the log records of each base station over the last 5 seconds
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Source: one comma-separated log record per line
    val stream = env.socketTextStream("flink101", 8888)
      .map(line => {
        val arr = line.split(",")
        Log(arr(0).trim, arr(1).trim, arr(2).trim, arr(3).trim, arr(4).trim.toLong, arr(5).trim.toLong)
      })
    stream.map(log => (log.sid, 1))
      .keyBy(_._1)
      // Window size 5 s, slide 3 s
      .window(SlidingProcessingTimeWindows.of(Time.seconds(5), Time.seconds(3)))
      .aggregate(new MyAggregateFunction, new MyWindowFunction)
      .print() // Sink added so that the per-window counts are actually emitted
    env.execute("TestAggregatFunctionByWindow")
  }
}
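// Pro: a simple way to aggregate incrementally. Con: it cannot emit the key.
// add is called once for every incoming element
// getResult is called once when the window fires
class MyAggregateFunction extends AggregateFunction[(String, Int), Long, Long] {
  override def createAccumulator(): Long = 0L // Initialize the accumulator
  override def add(in: (String, Int), acc: Long): Long = acc + in._2 // Fold one element into the accumulator
  override def getResult(acc: Long): Long = acc // Derive the result from the accumulator
  override def merge(acc: Long, acc1: Long): Long = acc + acc1 // Combine two accumulators (needed for merging windows such as session windows)
}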
// Used to emit the key. Its input is the output of the AggregateFunction:
// when the window fires, getResult is called on the AggregateFunction first, then apply is called with that result
/** WindowFunction
 * @param IN The type of the input value.
 * @param OUT The type of the output value.
 * @param KEY The type of the key.
 * @param W The type of the window.
 */
class MyWindowFunction extends WindowFunction[Long, (String, Long), String, TimeWindow] {
  override def apply(key: String, window: TimeWindow, input: Iterable[Long], out: Collector[(String, Long)]): Unit = {
    out.collect((key, input.iterator.next())) // The iterator holds exactly one value, the pre-aggregated count
  }
}
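The map step in main assumes that every socket line contains six well-formed fields. A line with a missing field or a non-numeric timestamp would throw and fail the job. A minimal sketch of a more defensive parser follows, assuming malformed lines should simply be dropped. `LogParser` is a hypothetical helper, and the `Log` class is repeated (without the `var` modifiers) only so the snippet compiles on its own.

```scala
import scala.util.Try

// Same fields as the Log class defined earlier, repeated so the snippet is self-contained
case class Log(sid: String, callOut: String, callIn: String, callType: String, callTime: Long, duration: Long)

object LogParser {
  // Hypothetical helper: returns None instead of throwing on malformed input
  def parse(line: String): Option[Log] = {
    val arr = line.split(",").map(_.trim)
    if (arr.length != 6) None // wrong number of fields
    else Try(Log(arr(0), arr(1), arr(2), arr(3), arr(4).toLong, arr(5).toLong)).toOption // None if a number fails to parse
  }
}
```

In the job this could presumably replace the map with something like `flatMap(line => LogParser.parse(line).toList)`, so that bad lines are skipped rather than crashing the stream.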
III. Summary
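The AggregateFunction interface declares four methods that must be implemented: createAccumulator() creates the initial accumulator, add() defines how an element is added to the accumulator, getResult() defines how the result is computed from the accumulator, and merge() defines how two accumulators are combined.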
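To show the four methods with input, accumulator, and output types that all differ, here is a minimal sketch that averages call durations. `AvgDurationFunction` and the `(sid, duration)` input tuple are assumptions made for illustration and are not part of the job above. The demo drives the function by hand, which only approximates what Flink does inside a window, and it needs flink-core on the classpath for the interface.

```scala
import org.apache.flink.api.common.functions.AggregateFunction

// Hypothetical example: average call duration.
// IN = (sid, duration), ACC = (sum, count), OUT = average as Double
class AvgDurationFunction extends AggregateFunction[(String, Long), (Long, Long), Double] {
  // Start with a zero sum and a zero count
  override def createAccumulator(): (Long, Long) = (0L, 0L)
  // Add one duration to the running sum and bump the count
  override def add(in: (String, Long), acc: (Long, Long)): (Long, Long) = (acc._1 + in._2, acc._2 + 1L)
  // Compute the average; an empty accumulator yields 0.0
  override def getResult(acc: (Long, Long)): Double =
    if (acc._2 == 0L) 0.0 else acc._1.toDouble / acc._2
  // Combine two partial accumulators
  override def merge(a: (Long, Long), b: (Long, Long)): (Long, Long) = (a._1 + b._1, a._2 + b._2)
}

object AvgDurationDemo {
  def main(args: Array[String]): Unit = {
    val f = new AvgDurationFunction
    val records = Seq(("sid_1", 10L), ("sid_1", 20L), ("sid_1", 60L))
    // Call add once per element, then getResult once at the end
    val acc = records.foldLeft(f.createAccumulator())((a, in) => f.add(in, a))
    println(f.getResult(acc))
  }
}
```

Keeping the sum and the count in the accumulator, rather than a running average, is what makes merge a simple element-wise addition.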