Spark RDD Exercises (Computing Student Scores)

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/lv_yishi/article/details/83958059

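The data is given as follows (fields are space-separated; in the sex column 男 means male and 女 means female):

classID name age sex subject score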
12 张三 25 男 chinese 50
12 张三 25 男 math 60
12 张三 25 男 english 70
12 李四 20 男 chinese 50
12 李四 20 男 math 50
12 李四 20 男 english 50
12 王芳 19 女 chinese 70
12 王芳 19 女 math 70
12 王芳 19 女 english 70
13 张大三 25 男 chinese 60
13 张大三 25 男 math 60
13 张大三 25 男 english 70
13 李大四 20 男 chinese 50
13 李大四 20 男 math 60
13 李大四 20 男 english 50
13 王小芳 19 女 chinese 70
13 王小芳 19 女 math 80
13 王小芳 19 女 english 70

Requirements:

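1. How many students took the exam in total?
1.1 How many students under 20 took the exam?
1.2 How many students aged exactly 20 took the exam?
1.3 How many students over 20 took the exam?

2. How many boys took the exam?
2.1 How many girls took the exam?

3. How many students in class 12 took the exam?
3.1 How many students in class 13 took the exam?

4. What is the average score in Chinese?
4.1 What is the average score in math?
4.2 What is the average score in English?

5. What is each student's average score?

6. What is the average score of class 12?
6.1 What is the average total score of the boys in class 12?
6.2 What is the average total score of the girls in class 12?
6.3 Likewise, compute the same figures for class 13.

7. What is the highest Chinese score in the whole school?
7.1 What is the lowest Chinese score in class 12?
7.2 What is the highest math score in class 13?

8. How many girls in class 12 have a total score above 150?

9. What is the average score of the students whose total score is above 150, whose math score is at least 70, and who are at least 20 years old?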
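
Before looking at the individual solutions, it may help to see how one line of the data can be parsed into a typed record so that fields are read by name rather than by index. This is only a minimal sketch and not the approach used in the solutions that follow. The names `Score`, `ScoreParsing`, `parse`, and `countStudents` are hypothetical, and the sketch assumes that every line has exactly six space-separated fields and that a student is identified by class ID plus name.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical record type for one line of the data set.
case class Score(classId: Int, name: String, age: Int, sex: String, subject: String, score: Int)

object ScoreParsing {
  // Assumption: six space-separated fields per line, as in the sample data.
  def parse(line: String): Score = {
    val f = line.split(" ")
    Score(f(0).toInt, f(1), f(2).toInt, f(3), f(4), f(5).toInt)
  }

  // Question 1: each student appears once per subject, so deduplicate on (class, name).
  def countStudents(scores: RDD[Score]): Long =
    scores.map(s => (s.classId, s.name)).distinct().count()

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ScoreParsing").setMaster("local[*]"))
    try {
      // Three rows taken from the data above: two for 张三, one for 王小芳.
      val sample = Seq(
        "12 张三 25 男 chinese 50",
        "12 张三 25 男 math 60",
        "13 王小芳 19 女 math 80")
      val scores = sc.parallelize(sample).map(parse)
      println(countStudents(scores)) // 2 distinct students in this sample
    } finally sc.stop()
  }
}
```

Deduplicating on the (class, name) pair rather than on the name alone keeps two students with the same name in different classes apart, which the sample data happens not to contain.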

Solutions written by two people are collected here. Some of them overlap, but the point is to compare the approaches.

Method 1:

The questions are answered one by one below:
1. How many students took the exam in total?
val file = sc.textFile("file:///jar/score")
val name = file.map(x => {val line = x.split(" ");line(0) + "," + line(1)})
val numPeo = name.distinct.count()


1.1 How many students under 20 took the exam?
val file = sc.textFile("file:///jar/score")
val age = file.map(x => {val line = x.split(" ");line(0) + "," + line(1) + "," + line(2)})
val numPeo = age.distinct.filter(_.split(",")(2).toInt<20).count()

1.2 How many students aged exactly 20 took the exam?
val file = sc.textFile("file:///jar/score")
val age = file.map(x => {val line = x.split(" ");line(0) + "," + line(1) + "," + line(2)})
val numPeo = age.distinct.filter(_.split(",")(2).toInt == 20).count()

1.3 How many students over 20 took the exam?
val file = sc.textFile("file:///jar/score")
val age = file.map(x => {val line = x.split(" ");line(0) + "," + line(1) + "," + line(2)})
val numPeo = age.distinct.filter(_.split(",")(2).toInt > 20).count()

2. How many boys took the exam?
val file = sc.textFile("file:///jar/score")
val sex = file.map(x => {val line = x.split(" ");line(0) + "," + line(1) + "," + line(3)})
val numPeo = sex.distinct.filter(_.split(",")(2) == "男").count()

2.1 How many girls took the exam?
val file = sc.textFile("file:///jar/score")
val sex = file.map(x => {val line = x.split(" ");line(0) + "," + line(1) + "," + line(3)})
val numPeo = sex.distinct.filter(_.split(",")(2) == "女").count()

3. How many students in class 12 took the exam?
val file = sc.textFile("file:///jar/score")
val classNum = file.map(x => {val line = x.split(" ");line(0) + "," + line(1) })
val numPeo = classNum.distinct.filter(_.split(",")(0).toInt == 12).count()
sc.makeRDD(Array(numPeo)).saveAsTextFile("file:///jar/result/class12numPeo")

3.1 How many students in class 13 took the exam?
val file = sc.textFile("file:///jar/score")
val classNum = file.map(x => {val line = x.split(" ");line(0) + "," + line(1) })
val numPeo = classNum.distinct.filter(_.split(",")(0).toInt == 13).count()
sc.makeRDD(Array(numPeo)).saveAsTextFile("file:///jar/result/class13numPeo")
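
Questions 3 and 3.1 run the same pipeline once per class. As an alternative, the per-class counts can be obtained in a single pass. The following is a minimal sketch under the assumption that a student is identified by class ID plus name; the names `StudentsPerClass` and `perClass` are hypothetical, while `distinct` and `countByValue` are standard RDD operations.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object StudentsPerClass {
  // Returns classID -> number of distinct students in that class.
  def perClass(lines: RDD[String]): scala.collection.Map[String, Long] =
    lines
      .map { line => val f = line.split(" "); (f(0), f(1)) } // (classID, name)
      .distinct()     // one entry per student instead of one per subject
      .map(_._1)      // keep only the class ID
      .countByValue() // action: brings the per-class counts to the driver

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("StudentsPerClass").setMaster("local[*]"))
    try {
      val lines = sc.parallelize(Seq(
        "12 张三 25 男 chinese 50",
        "12 张三 25 男 math 60",
        "12 李四 20 男 math 50",
        "13 王小芳 19 女 math 80"))
      // Sort on the driver so the printed order is stable.
      perClass(lines).toSeq.sortBy(_._1).foreach(println)
    } finally sc.stop()
  }
}
```

Because `countByValue` returns a local map, this only suits a small number of distinct keys such as class IDs.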

4. What is the average score in Chinese?
val chineseLine = file.map(x => {val line = x.split(" "); line(4)+ "," + line(5)})
val chineseGennal = chineseLine.filter(_.split(",")(0) == "chinese")
val chineseLength = chineseGennal.count.toInt//6
val chineseSum = chineseGennal.map(_.split(",")(1).toInt).reduce(_ + _)//350
// Convert to Double before dividing: Int / Int would truncate the result to 58
val chineseAvg = chineseSum.toDouble/chineseLength//about 58.33
sc.makeRDD(Array(chineseAvg))
			.saveAsTextFile("file:///jar/result/chineseAvg")

4.1 What is the average score in math?
val mathLine = file.map(x => {val line = x.split(" "); line(4)+ "," + line(5)})
val mathGennal = mathLine.filter(_.split(",")(0) == "math")
val mathLength = mathGennal.count.toInt
val mathSum = mathGennal.map(_.split(",")(1).toInt).reduce(_ + _)
// Convert to Double before dividing so the fraction is not truncated
val mathAvg = mathSum.toDouble/mathLength
sc.makeRDD(Array(mathAvg))
			.saveAsTextFile("file:///jar/result/mathAvg")

4.2 What is the average score in English?
val englishLine = file.map(x => {val line = x.split(" "); line(4)+ "," + line(5)})
val englishGennal = englishLine.filter(_.split(",")(0) == "english")
val englishLength = englishGennal.count.toInt
val englishSum = englishGennal.map(_.split(",")(1).toInt).reduce(_ + _)
// Convert to Double before dividing so the fraction is not truncated
val englishAvg = englishSum.toDouble/englishLength
sc.makeRDD(Array(englishAvg))
			.saveAsTextFile("file:///jar/result/englishAvg")
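
The three per-subject blocks above repeat the same filter, sum, and count steps. One point to watch in any average over `Int` scores is that `Int / Int` truncates in Scala. The following minimal sketch computes the averages of all subjects in one pass and divides as `Double`; the names `SubjectAverages` and `averages` are hypothetical, and the input is assumed to be already reduced to (subject, score) pairs.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SubjectAverages {
  // Input: (subject, score) pairs. Output: (subject, average score as Double).
  def averages(pairs: RDD[(String, Int)]): RDD[(String, Double)] =
    pairs
      .mapValues(score => (score, 1)) // (running sum, running count)
      .reduceByKey { case ((s1, c1), (s2, c2)) => (s1 + s2, c1 + c2) }
      .mapValues { case (sum, count) => sum.toDouble / count } // avoid Int truncation

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SubjectAverages").setMaster("local[*]"))
    try {
      // The Chinese and math scores from the data above.
      val pairs = sc.parallelize(
        Seq(50, 50, 70, 60, 50, 70).map(s => ("chinese", s)) ++
        Seq(60, 50, 70, 60, 60, 80).map(s => ("math", s)))
      averages(pairs).collect().sortBy(_._1).foreach { case (subject, avg) =>
        println(f"$subject $avg%.2f") // chinese 58.33, then math 63.33
      }
    } finally sc.stop()
  }
}
```

Carrying a (sum, count) pair through `reduceByKey` avoids running a separate `count` job per subject.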

5. What is each student's average score?
val scoreLine = file.map(x => {val line = x.split(" "); (line(0)+","+line(1),line(5).toInt)})
// Sum the scores and count the subjects per student, then divide as Double
val perScore = scoreLine.map(a => (a._1,(a._2,1)))
			.reduceByKey((a,b) => (a._1+b._1,a._2+b._2))
			.map(y => (y._1,y._2._1.toDouble/y._2._2))
perScore.saveAsTextFile("file:///jar/result/perScore")

6. What is the average score of class 12?
val classScore12 = file.map(x => {val line = x.split(" ");
 (line(0),line(5).toInt)}).filter(a =>(a._1 == "12"))
classScore12.map(a => (a._1,(a._2,1)))
		    .reduceByKey((a,b) => (a._1+b._1,a._2+b._2))
		    .map(y => (y._1,y._2._1.toDouble/y._2._2))//(12,60.0)
		    .saveAsTextFile("file:///jar/result/perClass12")

6.1 What is the average total score of the boys in class 12?
val BoyclassScore12 = file.map(x => {val line = x.split(" ");
    (line(0) + "," + line(3) + "," + line(5).toInt)})
    .filter(_.split(",")(0) == "12").filter(_.split(",")(1)=="男")
val BoyclassScore12Num = BoyclassScore12.count//6
val BoyclassScore12Sum= BoyclassScore12.map(y => {val row = y.split(",");row(2).toInt}).reduce(_+_)//330
val BoyperClass12 = BoyclassScore12Sum.toDouble/BoyclassScore12Num//55.0


6.2 What is the average total score of the girls in class 12?
val GirlclassScore12 = file.map(x => {val line = x.split(" "); 
      (line(0) + "," + line(3) + "," + line(5).toInt)})
      .filter(_.split(",")(0) == "12").filter(_.split(",")(1)=="女")
val GirlclassScore12Num = GirlclassScore12.count//3
val GirlclassScore12Sum= GirlclassScore12.map(y => {val row = y.split(",");row(2).toInt}).reduce(_+_)//210
val GirlperClass12 = GirlclassScore12Sum.toDouble/GirlclassScore12Num//70.0

6.3.0 What is the average score of class 13?
val classScore13 = file.map(x => {val line = x.split(" "); 
     (line(0),line(5).toInt)}).filter(a =>(a._1 == "13"))
val perClass13 = classScore13.map(a => (a._1,(a._2,1)))
     .reduceByKey((a,b) => (a._1+b._1,a._2+b._2))
     .map(y => (y._1,y._2._1.toDouble/y._2._2))//13, about 63.33

6.3.1 What is the average total score of the boys in class 13?
val BoyclassScore13 = file.map(x => {val line = x.split(" "); 
    (line(0) + "," + line(3) + "," + line(5).toInt)})
   .filter(_.split(",")(0) == "13").filter(_.split(",")(1)=="男")
val BoyclassScore13Num = BoyclassScore13.count//6
val BoyclassScore13Sum= BoyclassScore13.map(y => {val row = 
     y.split(",");row(2).toInt}).reduce(_+_)//350
val BoyperClass13 = BoyclassScore13Sum.toDouble/BoyclassScore13Num//about 58.33

6.3.2 What is the average total score of the girls in class 13?
val GirlclassScore13 = file.map(x => {val line = x.split(" "); (line(0) + "," + 
    line(3) + "," + line(5).toInt)})
   .filter(_.split(",")(0) == "13").filter(_.split(",")(1)=="女")
val GirlclassScore13Num = GirlclassScore13.count//3
val GirlclassScore13Sum= GirlclassScore13.map(y => {val row = y.split(",");row(2).toInt}).reduce(_+_)//220
val GirlperClass13 = GirlclassScore13Sum.toDouble/GirlclassScore13Num//about 73.33

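7. What is the highest Chinese score in the whole school?
val chineseLine = file.map(x => {val line = x.split(" "); 
    line(4)+ "," + line(5)})
// max compares the strings lexicographically, which is only safe here because every score has two digits
val chineseMax = chineseLine.distinct
	.filter(_.split(",")(0) == "chinese").max//chineseMax: String = chinese,70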


7.1 What is the lowest Chinese score in class 12?
val chineseLine12 = file.map(x => {val line = x.split(" "); 
     line(0)+ "," + line(4)+ "," + line(5)})
val chineseMin12 = chineseLine12.distinct
	.filter(_.split(",")(0).toInt == 12)
	.filter(_.split(",")(1) == "chinese")
	.min//chineseMin12: String = 12,chinese,50
// min returns a String rather than an RDD, so wrap it in an RDD before saving
sc.makeRDD(Array(chineseMin12)).saveAsTextFile("file:///jar/result/chineseMin12")

// Alternative for question 7: compare the scores as Int instead of as strings
val chineseMax = file.map(x => {val line = x.split(" "); (line(4),line(5).toInt)})
sc.makeRDD(Array(chineseMax.filter( a=>(a._1.equals("chinese")))
	.map(a => (a._2)).max))
    .saveAsTextFile("file:///jar/result/chineseMax")

7.2 What is the highest math score in class 13?
val mathLine13 = file.map(x => {val line = x.split(" "); 
     line(0)+ "," + line(4)+ "," + line(5)})
val mathMax13 = mathLine13.distinct.filter(_.split(",")(0).toInt == 13).filter(_.split(",")(1) == "math").max//mathMax13: String = 13,math,80

val mathLine13 = file.map(x => {val line = x.split(" "); 
     (line(0)+ "," + line(4),line(5).toInt)})
val mathMax13 = mathLine13.filter(a => (a._1.split(",")(1).equals("math")) && (a._1.split(",")(0).equals("13"))).max//mathMax13: (String, Int) = (13,math,80)

8. How many girls in class 12 have a total score above 150?
val sumScore12Line = file.map(x => {val line = x.split(" "); (line(0)+","+line(1)+","+line(3),line(5).toInt)})
val sumScore12Dayu150 = sumScore12Line.reduceByKey(_+_).filter(a => (a._2>150 && 
 a._1.split(",")(0).equals("12") && a._1.split(",")(2).equals("女"))).count

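9. What is the average score of the students whose total score is above 150, whose math score is at least 70, and who are at least 19 years old? (The solutions use 19 rather than the 20 stated in the requirements; in this data no student aged 20 or over has a math score of at least 70.)
val complex1 = file.map(x => {val line = x.split(" "); (line(0)+","+line(1)+","+line(3),line(5).toInt)})
// Keep only the rows of students aged 19 or over before building the per-subject key
val complex2 = file.filter(_.split(" ")(2).toInt >= 19).map(x => {val line = x.split(" "); (line(0)+","+line(1)+","+line(3)+","+line(4),line(5).toInt)})
// Keep the students whose total is above 150 and compute their average score
val com1 = complex1.map(a => (a._1, (a._2, 1))).reduceByKey((a,b) => (a._1+b._1,a._2+b._2)).filter(a => (a._2._1>150))
     .map(t => (t._1,t._2._1.toDouble/t._2._2))
// Keep the students whose math score is at least 70 (age was already filtered above)
val com2 = complex2.filter(a => {val line = a._1.split(","); line(3).equals("math") && a._2>=70})
				   .map(a => {val line2 = a._1.split(","); (line2(0)+","+line2(1)+","+line2(2),a._2.toInt)})

// Join on class,name,sex and keep the average from com1
(com1).join(com2).map(a =>(a._1,a._2._1))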

Method 2:



import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SparkT2 {
  case class Person(classID:Int,name:String,age:Int, sex:String,keMu:String, score:Int)
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SparkWordCount").setMaster("local")
    // Create the SparkContext, which is used to submit jobs
    //map(x => Person(x(0).toInt, x(1), x(2).toInt))
    val sc = new SparkContext(conf)
    val rdd1: RDD[Array[String]] = sc.textFile("D:\\spark/ksdata.txt").map(_.split(" "))

    val rdd2: RDD[(String, String, String, String, String, String)] = rdd1.map(x => (x(0),x(1),x(2),x(3),x(4),x(5)))
   //T1
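    /*
      1. How many students took the exam in total?
      1.1 How many students under 20 took the exam?
      1.2 How many students aged exactly 20 took the exam?
      1.3 How many students over 20 took the exam?
      */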
    val rdd3: Long = rdd2.groupBy(_._2).count()
    println(rdd3)
    //T1.1 to T1.3
    val rdd4 = rdd2.filter(_._3.toInt<20).groupBy(_._2).count()
    println(rdd4)
    val rdd5= rdd2.filter(_._3.toInt == 20).groupBy(_._2).count()
    println(rdd5)
    val rdd6: RDD[(String, String, String, String, String, String)] = rdd2.filter(_._3.toInt>20)
    rdd6.foreach(println)
    val res6 = rdd6.groupBy(_._2).count()
    println(res6)
    /*
      2. How many boys took the exam?
      2.1 How many girls took the exam?
      */
    val rdd7 = rdd2.filter(_._4.equals("男")).groupBy(_._2).count()
    println(rdd7)
    val rdd8 = rdd2.filter(_._4.equals("女")).groupBy(_._2).count()
    println(rdd8)

    /*
      3. How many students in class 12 took the exam?
      3.1 How many students in class 13 took the exam?
    */
    val rdd9 = rdd2.filter(_._1.toInt == 12).groupBy(_._2).count()
    println(rdd9)
    val rdd10 = rdd2.filter(_._1.toInt == 13).groupBy(_._2).count()
    println(rdd10)

   
    /*
      4. What is the average score in Chinese?
      4.1 What is the average score in math?
      4.2 What is the average score in English?
    */
    val rdd9_2: Long = rdd2.filter(_._5.equals("chinese")).count()
    val rdd9_1: Int = rdd2.filter(_._5.equals("chinese")).map(x => x._6.toInt).reduce(_+_)
    // Divide as Double so the average is not truncated
    val avgChinese = rdd9_1.toDouble/rdd9_2
    println(avgChinese)
    val rdd10_2: Long = rdd2.filter(_._5.equals("english")).count()
    val rdd10_1: Int = rdd2.filter(_._5.equals("english")).map(x => x._6.toInt).reduce(_+_)
    val avgEnglish = rdd10_1.toDouble/rdd10_2
    println(avgEnglish)
    val rdd11_2: Long = rdd2.filter(_._5.equals("math")).count()
    val rdd11_1: Int = rdd2.filter(_._5.equals("math")).map(x => x._6.toInt).reduce(_+_)
    val avgMath = rdd11_1.toDouble/rdd11_2
    println(avgMath)



    // 5. What is each student's average score?
    val rdd12_1: RDD[(String, Iterable[(String, String, String, String, String, String)])] = rdd2.groupBy(_._2)
    // Number of subjects (3), used as the divisor for every student
    val rdd12_2 = rdd2.groupBy(_._5).count()
    println(rdd12_2)
    val rdd12: RDD[(String, Double)] = rdd12_1.mapValues(x => x.map(s => s._6.toInt).reduce(_+_).toDouble/rdd12_2)
    rdd12.foreach(println)



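    /*
      6. What is the average score of class 12?
      6.1 What is the average total score of the boys in class 12?
      6.2 What is the average total score of the girls in class 12?
      6.3 Likewise, compute the same figures for class 13.
    */
    //T6: class total divided by (number of students * number of subjects)
    val rdd13_1: Long = rdd2.filter(_._1.toInt==12).groupBy(_._2).count()
    val rdd13_2 = rdd2.groupBy(_._5).count()
    val rdd13_3: RDD[(String, Double)] = rdd2.filter(_._1.toInt==12).groupBy(_._1).mapValues(x => x.map(s => s._6.toInt).reduce(_+_).toDouble/(rdd13_1*rdd13_2))
    rdd13_3.foreach(println)
    //T6.1: total score of the boys in class 12 divided by the number of boys
    val r_man_1 = rdd2.filter(_._1.toInt==12).filter(_._4.equals("男")).groupBy(_._2).count()
    val r_man_2: Double = rdd2.filter(_._1.toInt==12).filter(_._4.equals("男")).map(s => s._6.toInt).reduce(_+_).toDouble/r_man_1
    println(r_man_2)
    // The remaining parts are computed the same way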

    /*
      7. What is the highest Chinese score in the whole school?
      7.1 What is the lowest Chinese score in class 12?
      7.2 What is the highest math score in class 13?
    */
    //T7
    val rddt7: Array[(Int, String)] = rdd2.filter(_._5.equals("chinese")).map(x => (x._6.toInt,x._2)).top(1)
    rddt7.foreach(print)

    //T7.1: takeOrdered(1) returns the smallest (score, name) pair
    val chineseMin12 = rdd2.filter(_._1.toInt==12).filter(_._5.equals("chinese")).map(x => (x._6.toInt,x._2)).takeOrdered(1)
    chineseMin12.foreach(println)

    //T7.2
    val r_math = rdd2.filter(_._1.toInt==13).filter(_._5.equals("math")).map(x => (x._6.toInt,x._2)).top(1)
    r_math.foreach(println)

    // 8. How many girls in class 12 have a total score above 150?
    // filter (not map) is needed so that only totals above 150 are counted
    val rddt8: Long = rdd2.filter(x => x._1.toInt==12 && x._4.equals("女")).groupBy(_._2).mapValues(x => x.map(x => x._6.toInt).reduce(_+_)).filter(_._2>150).count()
    println(rddt8)

    //----------- 9. Average score of the students whose total is above 150, whose math score is at least 70, and who are at least 19 years old --------------

    // class  subject  score  sex  name  age
    val quan = rdd1.map(a => (a(0),a(4),a(5).toInt,a(3),a(1),a(2).toInt))

    // Students with math >= 70 and age >= 19, as (name, 1)
    val mathsore1= quan.filter(_._2.equals("math")).filter(_._3>=70).filter(_._6>=19).map(x=>{
      val a=1;
      (x._5,a)
    })
    println(mathsore1.collect().toBuffer)
    //ArrayBuffer((王芳,1), (王小芳,1))
    // Every score row as (name, score)
    val quan1= quan.map(x=>{
      (x._5,x._3)
    })
    // The join keeps only the rows of the students selected above
    val joined=quan1.join(mathsore1)
    println(joined.collect().toBuffer)
    //ArrayBuffer((王芳,(70,1)), (王芳,(70,1)), (王芳,(70,1)), (王小芳,(70,1)), (王小芳,(80,1)), (王小芳,(70,1)))

    // Sum scores and row counts per student, keep totals above 150, then divide as Double
    val avgScores=joined.reduceByKey((a,b) => (a._1+b._1,a._2+b._2)).filter(_._2._1>150).map(x=>{
      val a = x._2._1.toDouble/x._2._2
      val d = a.formatted("%.2f")
      (x._1,d)
    })
    avgScores.foreach(println)
    //(王芳,70.00)
    //(王小芳,73.33)
  }
}

Method 2's answer to question 9, which uses join, is fairly concise.
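
The join idea can also be arranged so that each side is reduced to one row per student before joining, which keeps the joined data small. The following is a minimal sketch of that variant, not the author's code. The names `Question9` and `averages` are hypothetical, rows are assumed to be (name, age, subject, score) tuples, names are assumed to be unique across classes (as Method 2 also assumes), and the age bound is passed in as a parameter.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object Question9 {
  // rows: (name, age, subject, score). Returns (name, average score as Double).
  def averages(rows: RDD[(String, Int, String, Int)], minAge: Int): RDD[(String, Double)] = {
    // One row per student: (total score, number of subjects), totals above 150 only.
    val totals = rows
      .map { case (name, _, _, score) => (name, (score, 1)) }
      .reduceByKey { case ((s1, c1), (s2, c2)) => (s1 + s2, c1 + c2) }
      .filter { case (_, (total, _)) => total > 150 }
    // One row per student whose math score is at least 70 and who meets the age bound.
    val qualified = rows
      .filter { case (_, age, subject, score) => subject == "math" && score >= 70 && age >= minAge }
      .map { case (name, _, _, _) => (name, 1) }
      .distinct()
    // The inner join keeps only students present on both sides.
    totals.join(qualified).mapValues { case ((total, count), _) => total.toDouble / count }
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("Question9").setMaster("local[*]"))
    try {
      val rows = sc.parallelize(Seq(
        ("王小芳", 19, "chinese", 70), ("王小芳", 19, "math", 80), ("王小芳", 19, "english", 70),
        ("李四", 20, "chinese", 50), ("李四", 20, "math", 50), ("李四", 20, "english", 50)))
      averages(rows, minAge = 19).collect().foreach { case (name, avg) =>
        println(f"$name $avg%.2f") // only 王小芳 qualifies: 220 / 3 = 73.33
      }
    } finally sc.stop()
  }
}
```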

Method 3 is attached as well. It is rather verbose, so a quick read-through is enough:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object classDemo {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf()

    sparkConf.setAppName("Demo1")
      .setMaster("local[*]")

    val sc: SparkContext = new SparkContext(sparkConf)

    val lines: RDD[String] = sc.textFile("G:\\测试文档\\calssDemo\\text1.txt")
    // (1) Count how many students took the exam
    // Group by name
    val name1: RDD[(String, String)] = lines.map(x => {
      val a = x.split(" ")
      (a(1), a(2))

    })
    val name2: RDD[(String, Iterable[(String, String)])] = name1.groupBy(_._1)
    val counts = name2.count()
    // println(counts)  //6

    // Number of students over 20
    val sex1 = lines.map(x=>{
      val age = x.split(" ")
      (age(1),age(2).toInt)
    })
    // println(sex1.distinct().filter(_._2> 20).count())

    // Number of students under 20
    val sex2 = lines.map(x=>{
      val age = x.split(" ")
      (age(1),age(2).toInt)
    })
    // println(sex2.distinct().filter(_._2< 20).count())
    // Number of students aged exactly 20
    val sex3 = lines.map(x=>{
      val age = x.split(" ")
      (age(1),age(2).toInt)
    })
    //  println(sex3.distinct().filter(_._2 == 20).count())


    // Question 2: count the boys and girls by grouping
    val sex= lines.map(x => {
      val a = x.split(" ")
      val id1=a(0) // class
      val id = a(1)// student
      val sexs = a(3)// sex
      (id1,id, sexs)
    }).collect()
    // Group by sex
    val group1: Map[String, Array[(String, String, String)]] = sex.distinct.groupBy(_._3)
    // Count the students in each group
    val sexnum: Map[String, Int] = group1.mapValues(_.length)
    //println(sexnum.toBuffer) //ArrayBuffer((男,4), (女,2))


    // Question 3: count how many students in each class took the exam
    val group2: Map[String, Array[(String, String, String)]] = sex.distinct.groupBy(_._1)
    val sexnum2= group2.mapValues(_.length)
    //println(sexnum2.toBuffer)//ArrayBuffer((12,3), (13,3))

    // Question 4: average score of each subject
    val danke: RDD[(String, Int)] = lines.map(x => {
      val a = x.split(" ")
      (a(4), a(5).toInt)
    })
    val danke1: RDD[(String, Iterable[(String, Int)])] = danke.groupBy(_._1)
    val danke2: RDD[(String, String)] = danke1.mapValues(x => {
      var a: Int = 0
      var b: Double = 0
      for (i <- x) {
        a += (i._2)
        b += 1
      }
      val c: Double = (a / b)
      c.formatted("%.2f") // formatted is called on a Double but returns a String
    })
    //println(danke2.collect().toBuffer)
    // ArrayBuffer((math,63.33), (english,63.33), (chinese,58.33))

    // Question 5:
    // (1) Group by name
    val name3: RDD[(String, String, String, String)] = lines.map(x => {
      val a = x.split(" ")
      (a(0), a(1), a(5), a(3))
    })
    val name4: RDD[(String, Iterable[(String, String, String, String)])] = name3.groupBy(_._2)
    // println(name4.collect().toBuffer)
    //ArrayBuffer((张大三,CompactBuffer((13,张大三,60,男), (13,张大三,60,男), (13,张大三,70,男))),
    // (李大四,CompactBuffer((13,李大四,50,男), (13,李大四,60,男), (13,李大四,50,男))),
    // (王芳,CompactBuffer((12,王芳,70,女), (12,王芳,70,女), (12,王芳,70,女))),
    // (张三,CompactBuffer((12,张三,50,男), (12,张三,60,男), (12,张三,70,男))),
    // (王小芳,CompactBuffer((13,王小芳,70,女), (13,王小芳,80,女), (13,王小芳,70,女))),
    // (李四,CompactBuffer((12,李四,50,男), (12,李四,50,男), (12,李四,50,男))))

    val score: RDD[(String, String)] = name4.mapValues(x => {
      var a: Int = 0
      var b: Double = 0
      for (i <- x) {
        a += (i._3).toInt
        b += 1
      }
      val c: Double = (a / b)
      c.formatted("%.2f") // formatted is called on a Double but returns a String

    })
    //println(score.collect().toBuffer)
    //ArrayBuffer((张大三,63.33), (李大四,53.33), (王芳,70.00), (张三,60.00), (王小芳,73.33), (李四,50.00))


    // Question 6: average score per class
    val score2: RDD[(String, List[String])] = name4.mapValues(x => {
      var a: Int = 0
      var b: Double = 0
      var ids = ""
      for (i <- x) {
        a += (i._3).toInt
        b += 1
        ids = i._1
      }
      val c: Double = (a / b)
      val d = c.formatted("%.2f")
      val empt = List()
      val list2 = d :: empt
      val list3 = ids :: list2
      //(d,ids)
      list3

    })
    // println(score2.collect().toBuffer)
    //ArrayBuffer((张大三,List(13, 63.33)), (李大四,List(13, 53.33)),
    // (王芳,List(12, 70.00)), (张三,List(12, 60.00)), (王小芳,List(13, 73.33)), (李四,List(12, 50.00)))


    val value1: RDD[(String, String)] = score2.map(x => {
      val toArray: Array[String] = x._2.toArray
      (toArray(0), toArray(1))
    })
    // println(value1.collect().toBuffer)
    //ArrayBuffer((13,63.33), (13,53.33), (12,70.00), (12,60.00), (13,73.33), (12,50.00))
    val id_score: RDD[(String, Iterable[(String, String)])] = value1.groupBy(_._1)
    //println(id_score.collect().toBuffer)
    //ArrayBuffer((13,CompactBuffer((13,63.33), (13,53.33), (13,73.33))), (12,CompactBuffer((12,70.00), (12,60.00), (12,50.00))))
    val id_score1: RDD[(String, String)] = id_score.mapValues(x => {
      var a: Double = 0
      var b: Double = 0
      for (i <- x) {
        a += (i._2).toDouble
        b += 1
      }
      val c: Double = (a / b)
      c.formatted("%.2f") // formatted is called on a Double but returns a String
    })

    //println(id_score1.collect().toBuffer)
    //ArrayBuffer((13,63.33), (12,60.00))



    // 6.1 What is the average total score of the boys in class 12?
    // 6.2 What is the average total score of the girls in class 12?
    //-----------------------------------------------------------------------

    val sex_score1: RDD[(String, List[String])] = name4.mapValues(x => {

      var a: Int = 0
      var b: Double = 0
      var ids = ""
      var ids1 = ""
      for (i <- x) {
        a += (i._3).toInt
        b += 1
        ids = i._1
        ids1 = i._4
      }
      val c: Double = (a / b)
      val d = c.formatted("%.2f")
      val empt = List()
      val list2 = d :: empt
      val list3 = ids :: list2
      val list4 = ids1 :: list3
      //(d,ids)
      list4

    })
    //println(sex_score1.collect().toBuffer)
    //ArrayBuffer((张大三,List(男, 13, 63.33)), (李大四,List(男, 13, 53.33)), (王芳,List(女, 12, 70.00)),
    // (张三,List(男, 12, 60.00)), (王小芳,List(女, 13, 73.33)), (李四,List(男, 12, 50.00)))

    val sex_score2: RDD[(String, String, String)] = sex_score1.map(x => {
      val array: Array[String] = x._2.toArray
      (array(0), array(1),array(2))
    })

    val sex_score3: RDD[(String, Iterable[(String, String, String)])] = sex_score2.groupBy(_._2)

    /* //: RDD[(String, List[(String, Int, Double)])]
     val acb= sex_score3.mapValues(x=>(x.map(t=>(t._1,t._2.toInt,t._3.toDouble))).toList).mapValues(x=>{

       val n1 = x.map(x1=>(x1._1,x1._2,x1._3,1))
       val n2= n1.filter(_._1.equals("男")).reduce(_._3+_._3)//.reduce(_._3+_._3)
       val v2= n1.filter(_._1.equals("女")).reduce((a,b)=>(a._3+b._3))
       (n2,v2)
     })
     println(acb.collect().toBuffer)*/

    val sex_score4: RDD[(String, (String, String))] = sex_score3.mapValues(x=>{
      var a=0 // number of boys
      var a0=0 // number of girls
      var score:Double=0
      var score1:Double=0
      for(i<-x){

        if(i._1.equals("男")) {
          score += i._3.toDouble
          a+=1
        }
        else {
          score1 += i._3.toDouble
          a0+=1
        }
      }
      val a1="n"+(score/a).toString
      val a2="v"+(score1/a0).toString
      (a1,a2)
    })

    //println(sex_score4.collect().toBuffer)
    //ArrayBuffer((13,(n58.33,v73.33)), (12,(n55.0,v70.0)))

    //println("---------------7----------------")

    // class  subject  score  sex  name  age

   /* val quan=lines.map(x=>{
      val a = x.split(" ")
      (a(0),a(4),a(5).toInt,a(3),a(1),a(2).toInt)
    })

    val kemu= quan.groupBy(_._2)

    // println(kemu.collect().toBuffer)
    //ArrayBuffer((math,CompactBuffer((12,math,60), (12,math,50), (12,math,70), (13,math,60), (13,math,60), (13,math,80))),
    // (english,CompactBuffer((12,english,70), (12,english,50), (12,english,70), (13,english,70), (13,english,50), (13,english,70)))

    //T7
    val rddt7: Array[(Int, String)] = quan.filter(_._2.equals("chinese")).map(x => (x._3,x._2)).top(1)
    //ArrayBuffer((70,chinese))
    //println(rddt7.toBuffer)

    //T7.1: the subject is field _2, so filter on _2 rather than _1
    val r_man_2 = quan.filter(_._1.toInt==12).filter(_._2.equals("chinese")).map(x => (x._3,x._2)).takeOrdered(1)
    // r_man_2.foreach(println)

    //T7.2
    val r_math = quan.filter(_._1.toInt==13).filter(_._2.equals("math")).map(x => (x._3,x._2)).top(1)
    // r_math.foreach(println)


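    //----------- 8. How many girls in class 12 have a total score above 150? -----------------
    // Key by name (_5) so that the totals are per student rather than per class
    val soer: RDD[(String, Int)] = quan.filter(_._4.equals("女")).filter(_._1.equals("12")).map(x=>{
      (x._5,x._3)
    })
    val soer1=soer.reduceByKey(_+_).filter(a=>(a._2>150)).count()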
    // println(soer1)
*/
    //----------- 9. Average score of the students whose total is above 150, whose math score is at least 70, and who are at least 19 years old --------------

    // class  subject  score  sex  name  age

    val quan=lines.map(x=>{
      val a = x.split(" ")
      (a(0),a(4),a(5).toInt,a(3),a(1),a(2).toInt)
    })

    // Students with math >= 70 and age >= 19, as (name, 1)
    val mathsore1= quan.filter(_._2.equals("math")).filter(_._3>=70).filter(_._6>=19).map(x=>{
      val a=1;
      (x._5,a)
    })
    println(mathsore1.collect().toBuffer)
    //ArrayBuffer((王芳,1), (王小芳,1))
    // Every score row as (name, score)
    val quan1= quan.map(x=>{
      (x._5,x._3)
    })
    // The join keeps only the rows of the students selected above
    val rdd3=quan1.join(mathsore1)
    println(rdd3.collect().toBuffer)
    //ArrayBuffer((王芳,(70,1)), (王芳,(70,1)), (王芳,(70,1)), (王小芳,(70,1)), (王小芳,(80,1)), (王小芳,(70,1)))

    // Sum scores and row counts per student, keep totals above 150, then divide as Double
    val rdd4=rdd3.reduceByKey((a,b) => (a._1+b._1,a._2+b._2)).filter(_._2._1>150).map(x=>{
      val a = x._2._1.toDouble/x._2._2
      val d = a.formatted("%.2f")
      (x._1,d)
    })
    rdd4.foreach(println)
    //(王芳,70.00)
    //(王小芳,73.33)

    /*
    // Average score of the students whose total is above 150, whose math score is at least 70, and who are at least 20 years old
    // name, age, subject, score
    val res1 = tuples.map(x=>(x._2,x._3,x._5,x._6)).filter(_._2>=20)
    val res2 = res1.map(x=>(x._1,x._4)).reduceByKey(_+_).filter(_._2>150)
    val res3 = res1.filter(_._3.equals("math")).filter(_._4>=70).map(x=>(x._1))
    val list = res3.collect().toList
      for (i <- list) {
         println(res2.filter(_._1.equals(i)).collect().toList)
      }

     */

    //println("-----------------done--------------------")
    sc.stop()

  }
}


Reposted from blog.csdn.net/lv_yishi/article/details/83958059