Spark Streaming-02

Straight to the code. Note that the textFileStream data source has no receiver, so unlike socketTextStream it does not occupy a core just to ingest data.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object TextFileStreamWordCountApp {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf()
      .setAppName("TextFileStreamWordCountApp")
      .setMaster("local[2]")
    val ssc = new StreamingContext(sparkConf, Seconds(10))

    // textFileStream monitors a directory and picks up files moved into it
    // after the stream starts; there is no receiver
    val lines = ssc.textFileStream("C:\\wc")
    //val lines = ssc.socketTextStream("hadoop", 9999)

    lines.flatMap(_.split(",")).map((_, 1)).reduceByKey(_ + _).print()

    ssc.start()
    ssc.awaitTermination()
  }
}

Key Spark Streaming methods:

1. transform: exposes each batch of a DStream as an RDD so that arbitrary RDD-to-RDD operations (for example, a join against a static RDD) can be applied, returning a new DStream; see the first sketch after this list.

2. updateStateByKey: accumulates per-key results across batches as running state, which requires a checkpoint directory; see the second sketch after this list.
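
A minimal sketch of transform, assuming a socket source on hadoop:9999 and a hypothetical static blacklist RDD; inside transform the batch is an ordinary RDD, so RDD-only operations such as leftOuterJoin become available:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object TransformApp {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("TransformApp").setMaster("local[2]")
    val ssc = new StreamingContext(sparkConf, Seconds(10))

    // Hypothetical static blacklist kept as a pair RDD
    val blacklist = ssc.sparkContext.parallelize(Seq("spam", "ad")).map((_, true))

    val lines = ssc.socketTextStream("hadoop", 9999)

    // transform hands us each batch as an RDD, so we can join it
    // against the static blacklist and drop the flagged words
    val cleaned = lines.flatMap(_.split(",")).map((_, 1)).transform { rdd =>
      rdd.leftOuterJoin(blacklist)
        .filter { case (_, (_, flagged)) => flagged.isEmpty }
        .map { case (word, (count, _)) => (word, count) }
    }

    cleaned.reduceByKey(_ + _).print()

    ssc.start()
    ssc.awaitTermination()
  }
}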
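
And a minimal sketch of updateStateByKey for a running word count, assuming the same socket source; the checkpoint path is illustrative, but some checkpoint directory must be set or the job will fail at startup:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object UpdateStateWordCountApp {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("UpdateStateWordCountApp").setMaster("local[2]")
    val ssc = new StreamingContext(sparkConf, Seconds(10))
    // Required for updateStateByKey: state is persisted here (path is illustrative)
    ssc.checkpoint("C:\\checkpoint")

    // newValues holds this batch's counts for a key;
    // runningCount is the total accumulated so far
    val updateFunc: (Seq[Int], Option[Int]) => Option[Int] =
      (newValues, runningCount) => Some(newValues.sum + runningCount.getOrElse(0))

    ssc.socketTextStream("hadoop", 9999)
      .flatMap(_.split(","))
      .map((_, 1))
      .updateStateByKey(updateFunc)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}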


Output Operations on DStreams

1. saveAsTextFiles: writes one directory of text files per batch interval, which easily leads to too many small files.

2. saveAsHadoopFiles: writes each batch out as Hadoop files (available on pair DStreams).

3. foreachRDD: the method you must use when writing results to a relational database; see the sketch below.

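A minimal sketch of foreachRDD writing word counts to MySQL over JDBC; the URL, credentials, and wordcount table are placeholders. The connection is opened inside foreachPartition so it is created on the executors rather than serialized from the driver:

import java.sql.DriverManager

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ForeachRDDApp {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("ForeachRDDApp").setMaster("local[2]")
    val ssc = new StreamingContext(sparkConf, Seconds(10))

    val counts = ssc.socketTextStream("hadoop", 9999)
      .flatMap(_.split(","))
      .map((_, 1))
      .reduceByKey(_ + _)

    counts.foreachRDD { rdd =>
      rdd.foreachPartition { partition =>
        // Open the connection on the executor; URL, table, and
        // credentials below are placeholders
        val conn = DriverManager.getConnection(
          "jdbc:mysql://hadoop:3306/test", "root", "root")
        val stmt = conn.prepareStatement(
          "INSERT INTO wordcount(word, cnt) VALUES (?, ?)")
        partition.foreach { case (word, cnt) =>
          stmt.setString(1, word)
          stmt.setInt(2, cnt)
          stmt.executeUpdate()
        }
        stmt.close()
        conn.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

This opens one connection per partition per batch; in production a connection pool would normally be used instead.
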

Reposted from blog.csdn.net/qq_15300683/article/details/80215667