Spark Case Studies in Practice, Part 3

I. Simple log analysis

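1. Given the log records below, extract the status level of each line, count how often each level occurs, and sort the counts in ascending order.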

INFO This is a message with content
INFO This is some other content

INFO Here are more messages
WARN This is a warning

ERROR Something bad happened
WARN More details on the bad thing
INFO back to normal messages

2. The code is as follows:

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

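/**
  * 1. A simple log analysis program.
  * 2. Reads lines from a text file and counts how many times each status level appears.
  */
object EasyLogAnalyze {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("EasyLogAnalyze").setMaster("local")
    val sc = new SparkContext(conf)

    // Blank lines are counted in an accumulator. A plain var updated inside a task
    // is not sent back to the driver on a cluster. Read blankLines.value on the
    // driver after an action if the count is needed.
    val blankLines = sc.longAccumulator("blankLines")

    // Read the input file whose path is given in args(0)
    val text: RDD[String] = sc.textFile(args(0))
    text.foreach(println)

    /*
    1. line is the parameter
    2. the body in {} is the processing logic
     */
    val res1: RDD[(String, Int)] = text.map(line => {
      var symbol: String = null
      if (line.nonEmpty) {
        // The status level is the text before the first space;
        // takeWhile also copes with a line that contains no space
        symbol = line.takeWhile(_ != ' ')
      } else {
        blankLines.add(1L)
      }
      (symbol, 1) // blank lines keep the null key
    })

    // Sum the 1s per status level, then sort by count in ascending order
    val res2: RDD[(String, Int)] = res1.reduceByKey(_ + _)
    res2.sortBy(_._2).foreach(println)

    sc.stop()
  }
}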

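After echoing the input lines, the program prints the counts as follows (the null key corresponds to the two blank lines):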

(ERROR,1)
(null,2)
(WARN,2)
(INFO,4)
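
The same map, reduce-by-key, and sort steps can be tried without a cluster. The sketch below is only an illustration on plain Scala collections, not the Spark program itself: the names LogCountSketch, level, countLevels, and sample are hypothetical, and Option replaces the null key used above. The order of the two entries with count 2 is not guaranteed here.

```scala
object LogCountSketch {
  // The log lines from the example above, including the two blank lines
  val sample: Seq[String] = Seq(
    "INFO This is a message with content",
    "INFO This is some other content",
    "",
    "INFO Here are more messages",
    "WARN This is a warning",
    "",
    "ERROR Something bad happened",
    "WARN More details on the bad thing",
    "INFO back to normal messages"
  )

  // Hypothetical helper: text before the first space, or None for a blank line
  def level(line: String): Option[String] =
    if (line.trim.isEmpty) None else Some(line.takeWhile(_ != ' '))

  // Pair every line with 1, sum the 1s per key, sort ascending by count
  def countLevels(lines: Seq[String]): Seq[(Option[String], Int)] =
    lines
      .map(line => (level(line), 1))
      .groupBy(_._1)
      .map { case (key, pairs) => (key, pairs.map(_._2).sum) }
      .toSeq
      .sortBy(_._2)

  def main(args: Array[String]): Unit =
    countLevels(sample).foreach(println)
}
```

Here groupBy followed by a sum plays the role that reduceByKey plays on an RDD, and None stands in for the null key.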

Extracting the line-processing logic into a function of its own, and then passing that function to map, gives:

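import org.apache.spark.rdd.RDD
import org.apache.spark.util.LongAccumulator
import org.apache.spark.{SparkConf, SparkContext}

/**
  * 1. A simple log analysis program.
  * 2. Reads lines from a text file and counts how many times each status level appears.
  */
object EasyLogAnalyze {
  /*
  1. Processes a single line of text.
  2. process takes one String; map applies it to every element of the RDD[String].
  3. Blank lines are counted in an accumulator, because a plain var updated
     inside a task is not sent back to the driver on a cluster.
   */
  def process(line: String, blankLines: LongAccumulator): (String, Int) = {
    var symbol: String = null
    if (line.nonEmpty) {
      // The status level is the text before the first space;
      // takeWhile also copes with a line that contains no space
      symbol = line.takeWhile(_ != ' ')
    } else {
      blankLines.add(1L)
    }
    (symbol, 1) // blank lines keep the null key
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("EasyLogAnalyze").setMaster("local")
    val sc = new SparkContext(conf)

    // Read blankLines.value on the driver after an action if the count is needed
    val blankLines: LongAccumulator = sc.longAccumulator("blankLines")

    // Read the input file whose path is given in args(0)
    val text: RDD[String] = sc.textFile(args(0))
    text.foreach(println)

    // map calls process once for each line of the input
    val res1: RDD[(String, Int)] = text.map((line: String) => process(line, blankLines))

    // Sum the 1s per status level, then sort by count in ascending order
    val res2: RDD[(String, Int)] = res1.reduceByKey(_ + _)
    res2.sortBy(_._2).foreach(println)

    sc.stop()
  }
}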
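
As a side note on passing a function to map: in Scala a method name can usually be handed to map directly, without wrapping it in a lambda. The sketch below shows this on a plain Seq rather than an RDD. PassFunctionSketch and its members are hypothetical names, and this process is a simplified stand-in that leaves out the blank-line counter.

```scala
object PassFunctionSketch {
  // Simplified stand-in for the process method above (no blank-line counter)
  def process(line: String): (String, Int) = {
    val symbol = if (line.nonEmpty) line.takeWhile(_ != ' ') else null
    (symbol, 1)
  }

  val lines: Seq[String] = Seq("INFO a", "", "WARN b")

  // Explicit lambda that forwards each line to process
  val viaLambda: Seq[(String, Int)] = lines.map((line: String) => process(line))

  // Method name passed directly; the compiler turns it into a function value
  val viaMethod: Seq[(String, Int)] = lines.map(process)

  // The method can also be stored as a function value and reused
  val processFn: String => (String, Int) = process

  def main(args: Array[String]): Unit =
    println(viaLambda == viaMethod)
}
```

Both forms produce the same pairs, so the choice between them is a matter of style.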

Reposted from blog.csdn.net/liu16659/article/details/81254911