Using Accumulators (Counters) and Writing Out Dirty Data

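The input file contains the following records (the code below splits each line on tabs into URL, timestamp and response):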

https://segmentfault.com/q/1010000000318379 [2018-1202:00] 50
http://ruozedata.com/teacher.html 201802:00 65
http://ruozedata.com/student.html 201802:00 56
https://www.cnblogs.com/MOBIN/p/5384543.html [2018-12-12 22:00:00] 40
https://www.cnblogs.com/huxiuqian/p/10152166.html 201802:00 4
https://www.cnblogs.com/littleorange7/p/10152286.html [2018-12-12 22:00:00] 7
http://ruozedata.com/advanced.html [2018-12-14 22:02:00] j
https://www.baidu.com/baidu?tn=monline_3_dg&ie=utf-8&wd=%E6%9C%89%E9%81%93%E7%BF%BB%E8%AF%91 [2018-1202:00] 5
https://blog.csdn.net/maybe_fly/article/details/77979867 201802:00 h
https://blog.csdn.net/bitcarmanlee/article/details/75949268 [2018-12-13 22:02:00] 40
https://blog.csdn.net/tswisdom/article/details/79882308 [2018-12-13 22:02:00] 30

The dirty-data output file contains:

(http://ruozedata.com/advanced.html,0)
(https://blog.csdn.net/maybe_fly/article/details/77979867,0)

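The IDEA console prints the two accumulator values, the total record count followed by the dirty record count. For the input above these are 11 and 2.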

Code implementation

import java.io.PrintWriter
import org.apache.spark.{SparkConf, SparkContext}

object AccumulatorApp {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[2]").setAppName("AccumulatorApp")
    val sc = new SparkContext(sparkConf)
    val accum1 = sc.longAccumulator("totalaccumulator1") // counts every record
    val accum2 = sc.longAccumulator("erroraccumulator")  // counts dirty records
    val lines = sc.textFile("file:///C:\\Users\\HJ\\Desktop/secondhomework.txt") // read the input file
    val templine = lines.map(x => {
      val temp = x.split("\t")
      val domain = temp(0)
      var response = 0L
      try {
        accum1.add(1L)
        response = temp(2).toLong
      } catch {
        // A missing or non-numeric third field marks the record as dirty; response stays 0.
        case e: Exception => accum2.add(1L)
      }
      (domain, response)
    })
    // Dirty records are the ones whose response is still 0
    // (so a genuine response of 0 would also end up in this file).
    // At first I thought an RDD[(String, Long)] could not be saved with saveAsTextFile,
    // so I mapped the second element to a String. After commenting that out it turned out
    // to work as is. I was probably too tired from staying up late when I wrote it.
    templine.filter(_._2 == 0) //.map(x => (x._1, x._2.toString))
      .saveAsTextFile("file:///C:\\Users\\HJ\\Desktop/text.txt")
    val errortxt = new PrintWriter("C:/Users/HJ/Desktop/errortxt.txt") // stores the dirty record count
    val tatoltxt = new PrintWriter("C:/Users/HJ/Desktop/tatoltxt.txt") // stores the total record count
    // Read the accumulators only after the action above has run.
    println(accum1.value)
    println(accum2.value)
    errortxt.println(accum2.value)
    tatoltxt.println(accum1.value)
    errortxt.close()
    tatoltxt.close()
    sc.stop() // always stop the SparkContext at the end
  }
}
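The code above identifies dirty records by their response still being 0, so a record with a real response of 0 could not be told apart from a dirty one. A minimal sketch of one way to keep the two apart is to return an Either from the parsing step. The names DirtyRecordSketch and parse, and the example.com record, are hypothetical and not part of the original code; the sketch is plain Scala and leaves the Spark wiring out.

```scala
import scala.util.Try

object DirtyRecordSketch {
  // Left(url) marks a dirty record; Right((url, response)) is a clean one.
  def parse(line: String): Either[String, (String, Long)] = {
    val fields = line.split("\t")
    val url = fields(0)
    // drop(2).headOption is None when the third column is missing;
    // Try(...).toOption is None when it is not a number.
    fields.drop(2).headOption.flatMap(s => Try(s.toLong).toOption) match {
      case Some(response) => Right((url, response))
      case None           => Left(url)
    }
  }

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      "http://ruozedata.com/teacher.html\t201802:00\t65",
      "http://ruozedata.com/advanced.html\t[2018-12-14 22:02:00]\tj",
      "http://example.com/zero.html\t201802:00\t0" // hypothetical record with a valid 0
    )
    sample.map(parse).foreach(println)
  }
}
```

In a Spark job the same function could be applied inside map, with the dirty accumulator incremented for each Left and the dirty-data file built by filtering on isLeft instead of comparing the response with 0.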
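Note: an accumulator is only updated once an action has been triggered. More interestingly, if several actions run on the same uncached RDD, the value is multiplied accordingly (doubled for two actions), because each action recomputes the map step and runs add again.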
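The behaviour in the note can be reproduced with a small sketch. It assumes the same local[2] setup as the code above; the object name, the accumulator names and the 11-element sample RDD are made up for illustration. The second method caches the RDD before the first action, which is one common way to avoid recomputing the map step.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object AccumulatorRecountSketch {
  // Accumulator value before any action, after one action, and after two actions.
  def uncachedCounts(sc: SparkContext): (Long, Long, Long) = {
    val seen = sc.longAccumulator("seen") // hypothetical accumulator name
    val mapped = sc.parallelize(1 to 11).map { x => seen.add(1L); x }
    val before = seen.sum   // map is lazy, so nothing has run yet
    mapped.count()          // first action runs the map step
    val afterOne = seen.sum
    mapped.count()          // second action recomputes the lineage, add runs again
    val afterTwo = seen.sum
    (before, afterOne, afterTwo)
  }

  // Same pipeline, but cached before the first action.
  def cachedCount(sc: SparkContext): Long = {
    val seen = sc.longAccumulator("seenCached") // hypothetical accumulator name
    val cached = sc.parallelize(1 to 11).map { x => seen.add(1L); x }.cache()
    cached.count()
    cached.count() // served from the cache while the partitions are still held in memory
    seen.sum
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("AccumulatorRecountSketch")
    val sc = new SparkContext(conf)
    println(uncachedCounts(sc))
    println(cachedCount(sc))
    sc.stop()
  }
}
```

In a local run without task retries, the uncached pipeline should report 0, 11 and 22, and the cached one 11. Caching is not a guarantee: updates made inside transformations can still be applied more than once if tasks or stages are re-executed, whereas Spark applies each task's update exactly once only for accumulators updated inside actions such as foreach.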


Reposted from blog.csdn.net/qq_42694416/article/details/85602140