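1. RDD deduplication
I had always used distinct for deduplication, but distinct compares whole elements, so it only helps when the entire record is the deduplication key (for example, a single-field tuple).
To deduplicate by one field instead, group the records by that key and keep only one record per group: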
filteredStartupLogDStream = filteredStartupLogDStream
  .map(log => (log.uid, log))   // key each log by uid
  .groupByKey
  .flatMap {
    // sort each group by ts and keep only the first (earliest) record
    case (_, logIt) => logIt.toList.sortBy(_.ts).take(1)
  }
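Alternatively, take the record with the smallest ts directly. minBy returns a single record rather than a collection, so it belongs in map instead of flatMap:
  .map { case (_, logIt) => logIt.minBy(_.ts) }   // keep the earliest record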
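The same group-then-pick-one idea can be tried without a cluster. Below is a minimal sketch on plain Scala collections, not the original job: the StartupLog case class, its area field, and the DedupSketch object are hypothetical names introduced for illustration (only uid and ts appear in the snippet above).

```scala
// Hypothetical record type; only uid and ts come from the original snippet.
final case class StartupLog(uid: String, ts: Long, area: String)

object DedupSketch {
  // Group by uid, keep the record with the smallest ts in each group,
  // then sort by uid so the result order is deterministic.
  def dedupByUid(logs: Seq[StartupLog]): Seq[StartupLog] =
    logs.groupBy(_.uid).values.map(_.minBy(_.ts)).toSeq.sortBy(_.uid)

  def main(args: Array[String]): Unit = {
    val logs = Seq(
      StartupLog("u1", 30L, "bj"),
      StartupLog("u1", 10L, "sh"),
      StartupLog("u2", 20L, "gz")
    )
    // distinct compares whole records, so both u1 rows are kept.
    println(logs.distinct.size)
    // Grouping by uid leaves one record per user.
    dedupByUid(logs).foreach(println)
  }
}
```

On a pair RDD or pair DStream, reduceByKey((a, b) => if (a.ts <= b.ts) a else b) would likely give the same result without materializing each group as a list, which may matter when some keys have many records.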