Straight to the code. Note that the textFileStream data source has no receiver (it monitors a directory for new files instead).
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object TextFileStreamWordCountApp {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("TextFileStreamWordCountApp").setMaster("local[2]")
    val ssc = new StreamingContext(sparkConf, Seconds(10))

    // Monitor the directory for newly created files (no receiver needed)
    val lines = ssc.textFileStream("C:\\wc")
    //val lines = ssc.socketTextStream("hadoop", 9999)

    lines.flatMap(_.split(",")).map((_, 1)).reduceByKey(_ + _).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
Key Spark Streaming methods:
1. transform — applies an arbitrary RDD-to-RDD operation to each batch of a DStream, returning a new DStream (useful e.g. for joining each batch against a static RDD)
2. updateStateByKey — accumulates results across batches, maintaining running state per key (requires a checkpoint directory)
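A minimal sketch of both methods, reusing a local StreamingContext like the one above. The socket host/port, checkpoint path, and the "spam" blacklist are placeholders for illustration, not part of the original code:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StatefulWordCountApp {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("StatefulWordCountApp").setMaster("local[2]")
    val ssc = new StreamingContext(sparkConf, Seconds(10))
    // updateStateByKey needs a checkpoint directory to persist state
    ssc.checkpoint("C:\\checkpoint") // placeholder path

    val lines = ssc.socketTextStream("hadoop", 9999)

    // transform: apply an arbitrary RDD-to-RDD operation per batch,
    // e.g. drop "blacklisted" words by joining with a static RDD
    val blacklist = ssc.sparkContext.parallelize(Seq("spam")).map((_, true))
    val cleaned = lines.flatMap(_.split(",")).map((_, 1)).transform { rdd =>
      rdd.leftOuterJoin(blacklist)
        .filter { case (_, (_, banned)) => banned.isEmpty }
        .map { case (word, (count, _)) => (word, count) }
    }

    // updateStateByKey: add each batch's counts to the running total per key
    val updateFunc = (newValues: Seq[Int], running: Option[Int]) =>
      Some(newValues.sum + running.getOrElse(0))
    cleaned.updateStateByKey[Int](updateFunc).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The update function receives all new values for a key in the current batch plus the previous state, and returns the new state; returning None would drop the key from the state.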
Output Operations on DStreams:
1. saveAsTextFiles — tends to produce too many small files (one output directory per batch)
2. saveAsHadoopFiles
3. foreachRDD — the method you must use when writing results to a relational database
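A hedged sketch of the usual foreachRDD pattern for writing to a relational database. The helper name, JDBC URL, table, and credentials are all placeholders; the key point is that the connection is opened per partition on the executor:

```scala
import java.sql.DriverManager
import org.apache.spark.streaming.dstream.DStream

// Hypothetical helper: write each batch of (word, count) pairs to MySQL.
def saveToMysql(wordCounts: DStream[(String, Int)]): Unit = {
  wordCounts.foreachRDD { rdd =>
    rdd.foreachPartition { partition =>
      // Open one connection per partition, on the executor: a JDBC
      // Connection is not serializable, so it cannot be created on
      // the driver and shipped inside the closure.
      val conn = DriverManager.getConnection(
        "jdbc:mysql://localhost:3306/test", "user", "password") // placeholders
      val stmt = conn.prepareStatement(
        "INSERT INTO wordcount(word, cnt) VALUES (?, ?)")
      partition.foreach { case (word, count) =>
        stmt.setString(1, word)
        stmt.setInt(2, count)
        stmt.executeUpdate()
      }
      stmt.close()
      conn.close()
    }
  }
}
```

Opening the connection inside foreachPartition rather than per record avoids both the serialization error and the overhead of one connection per element.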