Implementing Blacklist Filtering with Spark Streaming (Scala)

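Blacklist filtering is done by left outer joining the DStream built from the access log with the RDD built from the blacklist, then dropping every record whose IP has a match in the blacklist.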

Step-by-step walkthrough:

Access log entries (format: date,ip):

20190102,192.168.10.101
20190102,192.168.10.102
20190102,192.168.10.103

Convert the access log into a DStream keyed by IP
   ==>  (192.168.10.101: 20190102,192.168.10.101)(192.168.10.102: 20190102,192.168.10.102)(192.168.10.103: 20190102,192.168.10.103)
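
The keying step can be tried out in plain Scala, without Spark. The snippet below is a minimal sketch, assuming every log line has the form `date,ip`; the names `KeyByIp` and `keyByIp` are hypothetical and do not appear in the original code. Unlike a bare `split(",")(1)`, it returns `None` for a malformed line instead of throwing an exception.

```scala
object KeyByIp {
  // Hypothetical helper: turn "date,ip" into (ip, originalLine).
  // Returns None unless the line has exactly two non-empty fields.
  def keyByIp(line: String): Option[(String, String)] =
    line.split(",").map(_.trim) match {
      case Array(date, ip) if date.nonEmpty && ip.nonEmpty => Some((ip, line))
      case _                                               => None
    }

  def main(args: Array[String]): Unit = {
    val lines = Seq(
      "20190102,192.168.10.101",
      "20190102,192.168.10.102",
      "garbage" // malformed: no comma, so it is dropped by flatMap below
    )
    lines.flatMap(keyByIp).foreach(println)
  }
}
```

In the streaming job, one way to apply the same idea would be `loglines.flatMap(line => KeyByIp.keyByIp(line).toList)` in place of the `map` call, so that malformed lines are skipped rather than failing the batch.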

Blacklist (assume the following IPs are already known to be blacklisted):
192.168.10.101
192.168.10.102

Convert the blacklist into an RDD of (ip, true) pairs
   ==>  (192.168.10.101: true)(192.168.10.102: true)

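Left outer join the DStream with the RDD. An IP with no match in the blacklist gets None on the right side, which is treated as false; x marks a record that is filtered out:
(192.168.10.101: [<20190102,192.168.10.101>, <true>])  x
(192.168.10.102: [<20190102,192.168.10.102>, <true>])  x
(192.168.10.103: [<20190102,192.168.10.103>, <false>])  ==> kept; take tuple element 1 (the original log line)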
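
The join-and-filter logic above can be checked on ordinary Scala collections before wiring it into Spark. This is only a sketch: `LeftJoinSketch`, `leftOuterJoin` and `filterBlacklisted` are hypothetical names, the helper imitates the shape of an RDD left outer join result rather than calling Spark, and it assumes each IP appears at most once in the blacklist.

```scala
object LeftJoinSketch {
  // Hypothetical stand-in for a left outer join on plain collections.
  // Assumes each key occurs at most once on the right side.
  def leftOuterJoin[K, V, W](left: Seq[(K, V)], right: Map[K, W]): Seq[(K, (V, Option[W]))] =
    left.map { case (k, v) => (k, (v, right.get(k))) }

  // Keep only the log lines whose IP has no blacklist match.
  def filterBlacklisted(logs: Seq[(String, String)], blacklist: Map[String, Boolean]): Seq[String] =
    leftOuterJoin(logs, blacklist)
      .filter { case (_, (_, flag)) => !flag.getOrElse(false) } // None means "not blacklisted"
      .map { case (_, (line, _)) => line }                      // tuple element 1: the log line

  def main(args: Array[String]): Unit = {
    val logs = Seq(
      "192.168.10.101" -> "20190102,192.168.10.101",
      "192.168.10.102" -> "20190102,192.168.10.102",
      "192.168.10.103" -> "20190102,192.168.10.103"
    )
    val blacklist = Map("192.168.10.101" -> true, "192.168.10.102" -> true)

    // prints 20190102,192.168.10.103
    filterBlacklisted(logs, blacklist).foreach(println)
  }
}
```

Only the record for 192.168.10.103 survives, matching the walkthrough.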

The Scala implementation is as follows:

package com.fyy.spark.streaming

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
  * @Title: TransformApp
  * @ProjectName SparkStreamingProject
  * @Description: Blacklist filtering
  * @author fanyanyan
  */
object TransformApp {
  def main(args: Array[String]): Unit = {

    val sparkConf = new SparkConf().setAppName("TransformApp").setMaster("local[*]")
    /**
      * Creating a StreamingContext takes two arguments: the SparkConf and the batch interval
      */
    val ssc = new StreamingContext(sparkConf, Seconds(5))
    /**
      * Build the blacklist RDD
      * (192.168.10.101: true)(192.168.10.102: true)
      */
    val blacks = List("192.168.10.101", "192.168.10.102")
    val blacksRDD = ssc.sparkContext.parallelize(blacks).map(x => (x, true))


    /**
      * Read the access log as a DStream, one "date,ip" line per record
      * (a real production system would need some extra processing here)
      * 20190102,192.168.10.101
      * 20190102,192.168.10.102
      * 20190102,192.168.10.103
      */
    val loglines = ssc.socketTextStream("01.server.bd", 6666)

    /**
      * 1) Key each log line by its IP to form a pair DStream
      * (192.168.10.101: 20190102,192.168.10.101)
      * (192.168.10.102: 20190102,192.168.10.102)
      * (192.168.10.103: 20190102,192.168.10.103)
      *
      * 2) Left outer join each batch with the blacklist RDD (x = filtered out)
      * (192.168.10.101: [<20190102,192.168.10.101>, <true>])  x
      * (192.168.10.102: [<20190102,192.168.10.102>, <true>])  x
      * (192.168.10.103: [<20190102,192.168.10.103>, <false>])  ==> kept; take tuple element 1
      */
    val logs = loglines.map(x => (x.split(",")(1), x)).transform(rdd => {
      rdd.leftOuterJoin(blacksRDD)
        // no blacklist match yields None, which counts as "not blacklisted"
        .filter(x => !x._2._2.getOrElse(false))
        // keep only the original log line
        .map(x => x._2._1)
    })
    
    // Print the filtered result
    logs.print()

    ssc.start()
    ssc.awaitTermination()

  }

}

Reposted from blog.csdn.net/adayan_2015/article/details/88417850