Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/lv_yishi/article/details/83958059
The given data is as follows. Each line is one space-separated record: class ID, name, age, sex (男 = male, 女 = female), subject, score.
12 张三 25 男 chinese 50
12 张三 25 男 math 60
12 张三 25 男 english 70
12 李四 20 男 chinese 50
12 李四 20 男 math 50
12 李四 20 男 english 50
12 王芳 19 女 chinese 70
12 王芳 19 女 math 70
12 王芳 19 女 english 70
13 张大三 25 男 chinese 60
13 张大三 25 男 math 60
13 张大三 25 男 english 70
13 李大四 20 男 chinese 50
13 李大四 20 男 math 60
13 李大四 20 男 english 50
13 王小芳 19 女 chinese 70
13 王小芳 19 女 math 80
13 王小芳 19 女 english 70
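Requirements:
1. How many students took the exam in total?
2. How many boys took the exam?
3. How many students from class 12 took the exam?
4. What is the average score for Chinese?
5. What is each student's average score?
6. What is the average score of class 12?
7. What is the highest Chinese score in the whole school?
8. How many girls in class 12 have a total score above 150?
9. What is the average score of students whose total score is above 150, whose math score is at least 70, and who are at least 20 years old?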
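Solutions written by two people are collected here. Some of them overlap, but the point is to compare the approaches.
Method 1:
The requirements are as follows:
1. How many students took the exam in total?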
val file = sc.textFile("file:///jar/score")
val name = file.map(x => {val line = x.split(" ");line(0) + "," + line(1)})
val numPeo = name.distinct.count()
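The snippet above identifies a student by class ID plus name and counts the distinct pairs. As a minimal sketch of the same idea with typed fields instead of string concatenation, the example below parses each line into a case class first. The names `Record`, `ParseSketch`, `parse`, and `countStudents` are illustrative assumptions and do not appear in the original post; the sample rows are a small subset of the data above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical record type; the field names are illustrative only
case class Record(classId: Int, name: String, age: Int, sex: String, subject: String, score: Int)

object ParseSketch {
  // Turn one space-separated line into a Record
  def parse(line: String): Record = {
    val f = line.trim.split("\\s+")
    Record(f(0).toInt, f(1), f(2).toInt, f(3), f(4), f(5).toInt)
  }

  // A student is identified by (class ID, name); count the distinct pairs
  def countStudents(records: RDD[Record]): Long =
    records.map(r => (r.classId, r.name)).distinct().count()

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ParseSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    // In-memory sample instead of sc.textFile, so the sketch needs no input file
    val lines = sc.parallelize(Seq(
      "12 张三 25 男 chinese 50",
      "12 张三 25 男 math 60",
      "12 王芳 19 女 chinese 70",
      "13 王小芳 19 女 math 80"))
    println(countStudents(lines.map(parse))) // prints 3
    sc.stop()
  }
}
```

Parsing once and reusing the typed RDD would also avoid repeating `split(" ")` in every later question.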
1.1 How many students younger than 20 took the exam?
val file = sc.textFile("file:///jar/score")
val age = file.map(x => {val line = x.split(" ");line(0) + "," + line(1) + "," + line(2)})
val numPeo = age.distinct.filter(_.split(",")(2).toInt<20).count()
1.2 How many students aged exactly 20 took the exam?
val file = sc.textFile("file:///jar/score")
val age = file.map(x => {val line = x.split(" ");line(0) + "," + line(1) + "," + line(2)})
val numPeo = age.distinct.filter(_.split(",")(2).toInt == 20).count()
1.3 How many students older than 20 took the exam?
val file = sc.textFile("file:///jar/score")
val age = file.map(x => {val line = x.split(" ");line(0) + "," + line(1) + "," + line(2)})
val numPeo = age.distinct.filter(_.split(",")(2).toInt > 20).count()
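Questions 1.1 to 1.3 repeat the same pipeline with a different comparison. A possible way to answer all three in one pass is to label each distinct student with an age bucket and call `countByValue`. This is only a sketch: the object name `AgeBuckets`, the helper names, and the bucket labels are assumptions made for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object AgeBuckets {
  // Label an age relative to 20 (labels are illustrative)
  def bucket(age: Int): String =
    if (age < 20) "under20" else if (age == 20) "equal20" else "over20"

  // Count distinct (class, name, age) triples per bucket
  def countByBucket(lines: RDD[String]): scala.collection.Map[String, Long] =
    lines.map(_.split(" "))
      .map(f => (f(0), f(1), f(2).toInt))
      .distinct()
      .map { case (_, _, age) => bucket(age) }
      .countByValue()

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("AgeBuckets").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val lines = sc.parallelize(Seq(
      "12 张三 25 男 chinese 50",
      "12 张三 25 男 math 60",
      "12 李四 20 男 chinese 50",
      "12 王芳 19 女 chinese 70",
      "13 王小芳 19 女 math 80"))
    // Sort by label so the printed order is stable
    countByBucket(lines).toSeq.sorted.foreach(println)
    sc.stop()
  }
}
```

`countByValue` returns the result to the driver as a local map, which is fine here because there are only three possible labels.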
2. How many boys took the exam?
val file = sc.textFile("file:///jar/score")
val sex = file.map(x => {val line = x.split(" ");line(0) + "," + line(1) + "," + line(3)})
val numPeo = sex.distinct.filter(_.split(",")(2) == "男").count()
2.1 How many girls took the exam?
val file = sc.textFile("file:///jar/score")
val sex = file.map(x => {val line = x.split(" ");line(0) + "," + line(1) + "," + line(3)})
val numPeo = sex.distinct.filter(_.split(",")(2) == "女").count()
3. How many students from class 12 took the exam?
val file = sc.textFile("file:///jar/score")
val classNum = file.map(x => {val line = x.split(" ");line(0) + "," + line(1) })
val numPeo = classNum.distinct.filter(_.split(",")(0).toInt == 12).count()
sc.makeRDD(Array(numPeo)).saveAsTextFile("file:///jar/result/class12numPeo")
3.1 How many students from class 13 took the exam?
val file = sc.textFile("file:///jar/score")
val classNum = file.map(x => {val line = x.split(" ");line(0) + "," + line(1) })
val numPeo = classNum.distinct.filter(_.split(",")(0).toInt == 13).count()
sc.makeRDD(Array(numPeo)).saveAsTextFile("file:///jar/result/class13numPeo")
4. What is the average score for Chinese?
val chineseLine = file.map(x => {val line = x.split(" "); line(4)+ "," + line(5)})
val chineseGennal = chineseLine.filter(_.split(",")(0) == "chinese")
val chineseLength = chineseGennal.count.toInt//6
val chineseSum = chineseGennal.map(_.split(",")(1).toInt).reduce(_ + _)//350
val chineseAvg = chineseSum/chineseLength//58
sc.makeRDD(Array(chineseGennal.map(_.split(",")(1).toInt)
.reduce(_ + _)/chineseGennal.count.toInt))
.saveAsTextFile("file:///jar/result/chineseAvg")
4.1 What is the average score for math?
val mathLine = file.map(x => {val line = x.split(" "); line(4)+ "," + line(5)})
val mathGennal = mathLine.filter(_.split(",")(0) == "math")
val mathLength = mathGennal.count.toInt
val mathSum = mathGennal.map(_.split(",")(1).toInt).reduce(_ + _)
val mathAvg = mathSum/mathLength
sc.makeRDD(Array(mathGennal.map(_.split(",")(1).toInt)
.reduce(_ + _)/mathGennal.count.toInt))
.saveAsTextFile("file:///jar/result/mathAvg")
4.2 What is the average score for English?
val englishLine = file.map(x => {val line = x.split(" "); line(4)+ "," + line(5)})
val englishGennal = englishLine.filter(_.split(",")(0) == "english")
val englishLength = englishGennal.count.toInt
val englishSum = englishGennal.map(_.split(",")(1).toInt).reduce(_ + _)
val englishAvg = englishSum/englishLength
sc.makeRDD(Array(englishGennal.map(_.split(",")(1).toInt)
.reduce(_ + _)/englishGennal.count.toInt))
.saveAsTextFile("file:///jar/result/englishAvg")
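The three snippets for questions 4, 4.1, and 4.2 filter one subject at a time and divide two `Int` values, so 350/6 is reported as 58 rather than 58.33. A minimal sketch that computes every subject's average in one pass with `reduceByKey`, dividing as `Double`, could look like this. The object and method names are assumptions for illustration, and the sample holds only the six Chinese rows plus two math rows from the data above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SubjectAverages {
  // (subject, average score), accumulated as (sum, count) per subject
  def averages(lines: RDD[String]): RDD[(String, Double)] =
    lines.map(_.split(" "))
      .map(f => (f(4), (f(5).toInt, 1)))
      .reduceByKey { case ((s1, n1), (s2, n2)) => (s1 + s2, n1 + n2) }
      .mapValues { case (sum, n) => sum.toDouble / n } // Double division keeps the fraction

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SubjectAverages").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val lines = sc.parallelize(Seq(
      "12 张三 25 男 chinese 50", "12 李四 20 男 chinese 50", "12 王芳 19 女 chinese 70",
      "13 张大三 25 男 chinese 60", "13 李大四 20 男 chinese 50", "13 王小芳 19 女 chinese 70",
      "12 张三 25 男 math 60", "13 王小芳 19 女 math 80"))
    averages(lines).collect().sortBy(_._1).foreach {
      case (subject, avg) => println(f"$subject%s $avg%.2f")
    }
    sc.stop()
  }
}
```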
5. What is each student's average score?
val scoreLine = file.map(x => {val line = x.split(" "); (line(0)+","+line(1),line(5).toInt)})
val perScore = scoreLine.map(a => (a._1,(a._2,1)))
.reduceByKey((a,b) => (a._1+b._1,a._2+b._2))
.map(y => (y._1,y._2._1/y._2._2))
.saveAsTextFile("file:///jar/result/perScore")
6. What is the average score of class 12?
val classScore12 = file.map(x => {val line = x.split(" ");
(line(0),line(5).toInt)}).filter(a =>(a._1 == "12"))
classScore12.map(a => (a._1,(a._2,1)))
.reduceByKey((a,b) => (a._1+b._1,a._2+b._2))
.map(y => (y._1,y._2._1/y._2._2))//12,60
.saveAsTextFile("file:///jar/result/perClass12")
6.1 What is the average score of the boys in class 12?
val BoyclassScore12 = file.map(x => {val line = x.split(" ");
(line(0) + "," + line(3) + "," + line(5).toInt)})
.filter(_.split(",")(0) == "12").filter(_.split(",")(1)=="男")
val BoyclassScore12Num = BoyclassScore12.count//6
val BoyclassScore12Sum= BoyclassScore12.map(y => {val row = y.split(",");row(2).toInt}).reduce(_+_)//330
val BoyperClass12 = BoyclassScore12Sum/BoyclassScore12Num//55
6.2 What is the average score of the girls in class 12?
val GirlclassScore12 = file.map(x => {val line = x.split(" ");
(line(0) + "," + line(3) + "," + line(5).toInt)})
.filter(_.split(",")(0) == "12").filter(_.split(",")(1)=="女")
val GirlclassScore12Num = GirlclassScore12.count//3
val GirlclassScore12Sum= GirlclassScore12.map(y => {val row = y.split(",");row(2).toInt}).reduce(_+_)//210
val GirlperClass12 = GirlclassScore12Sum/GirlclassScore12Num//70
6.3.0 What is the average score of class 13?
val classScore13 = file.map(x => {val line = x.split(" ");
(line(0),line(5).toInt)}).filter(a =>(a._1 == "13"))
val perClass13 = classScore13.map(a => (a._1,(a._2,1)))
.reduceByKey((a,b) => (a._1+b._1,a._2+b._2))
.map(y => (y._1,y._2._1/y._2._2))//13,63
6.3.1 What is the average score of the boys in class 13?
val BoyclassScore13 = file.map(x => {val line = x.split(" ");
(line(0) + "," + line(3) + "," + line(5).toInt)})
.filter(_.split(",")(0) == "13").filter(_.split(",")(1)=="男")
val BoyclassScore13Num = BoyclassScore13.count//6
val BoyclassScore13Sum= BoyclassScore13.map(y => {val row =
y.split(",");row(2).toInt}).reduce(_+_)//350
val BoyperClass13 = BoyclassScore13Sum/BoyclassScore13Num//58
6.3.2 What is the average score of the girls in class 13?
val GirlclassScore13 = file.map(x => {val line = x.split(" "); (line(0) + "," +
line(3) + "," + line(5).toInt)})
.filter(_.split(",")(0) == "13").filter(_.split(",")(1)=="女")
val GirlclassScore13Num = GirlclassScore13.count//3
val GirlclassScore13Sum= GirlclassScore13.map(y => {val row = y.split(",");row(2).toInt}).reduce(_+_)//220
val GirlperClass13 = GirlclassScore13Sum/GirlclassScore13Num//73
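Questions 6.1 to 6.3.2 apply the same filter-count-sum pattern to each class and sex in turn. One way to sketch all four answers at once is to key each score by the pair (class, sex) and average per key. As in the snippets above, the average is taken over individual exam scores. The names `ClassSexAverages` and `averages` are illustrative assumptions, and the sample contains only the class 12 rows.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ClassSexAverages {
  // ((class, sex), average score over all exam rows in that group)
  def averages(lines: RDD[String]): RDD[((String, String), Double)] =
    lines.map(_.split(" "))
      .map(f => ((f(0), f(3)), (f(5).toInt, 1)))
      .reduceByKey { case ((s1, n1), (s2, n2)) => (s1 + s2, n1 + n2) }
      .mapValues { case (sum, n) => sum.toDouble / n }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ClassSexAverages").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val lines = sc.parallelize(Seq(
      "12 张三 25 男 chinese 50", "12 张三 25 男 math 60", "12 张三 25 男 english 70",
      "12 李四 20 男 chinese 50", "12 李四 20 男 math 50", "12 李四 20 男 english 50",
      "12 王芳 19 女 chinese 70", "12 王芳 19 女 math 70", "12 王芳 19 女 english 70"))
    // For this sample: (12,男) averages 330/6 = 55.0 and (12,女) averages 210/3 = 70.0
    averages(lines).collect().foreach(println)
    sc.stop()
  }
}
```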
7. What is the highest Chinese score in the whole school?
val chineseLine = file.map(x => {val line = x.split(" ");
line(4)+ "," + line(5)})
val chineseMax = chineseLine.distinct
.filter(_.split(",")(0) == "chinese").max
7.1 What is the lowest Chinese score in class 12?
val chineseLine12 = file.map(x => {val line = x.split(" ");
line(0)+ "," + line(4)+ "," + line(5)})
// Compares whole strings; correct here only because every score has two digits
val chineseMin12 = chineseLine12.distinct
.filter(_.split(",")(0).toInt == 12)
.filter(_.split(",")(1) == "chinese")
.min
// Alternative: compare the scores as Int and save the result
val chineseScore12 = file.map(x => {val line = x.split(" "); (line(0),line(4),line(5).toInt)})
sc.makeRDD(Array(chineseScore12.filter(a => a._1.equals("12") && a._2.equals("chinese"))
.map(a => (a._3)).min))
.saveAsTextFile("file:///jar/result/chineseMin12")
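Several answers to question 7 call `max` or `min` on strings such as `chinese,70`. Strings are ordered lexicographically, so this only gives the right answer while all scores have the same number of digits. The plain-Scala sketch below illustrates the difference with made-up rows (the scores 100 and 9 are not in the post's data); to my understanding `RDD.max` relies on the same implicit `Ordering[String]`, so the same caveat applies there. The object and method names are assumptions.

```scala
object MaxCompare {
  // Maximum by string comparison, as in the snippets above
  def maxAsString(rows: Seq[String]): String = rows.max

  // Maximum after parsing the score field as Int
  def maxAsInt(rows: Seq[String]): Int = rows.map(_.split(",")(1).toInt).max

  def main(args: Array[String]): Unit = {
    val rows = Seq("chinese,70", "chinese,100", "chinese,9")
    println(maxAsString(rows)) // prints chinese,9 because '9' > '7' > '1'
    println(maxAsInt(rows))    // prints 100
  }
}
```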
7.2 What is the highest math score in class 13?
val mathLine13 = file.map(x => {val line = x.split(" ");
line(0)+ "," + line(4)+ "," + line(5)})
val mathMax13 = mathLine13.distinct.filter(_.split(",")(0).toInt == 13).filter(_.split(",")(1) == "math").max//mathMax13: String = 13,math,80
val mathLine13 = file.map(x => {val line = x.split(" ");
(line(0)+ "," + line(4),line(5).toInt)})
val mathMax13 = mathLine13.filter(a => (a._1.split(",")(1).equals("math")) && (a._1.split(",")(0).equals("13"))).max//mathMax13: (String, Int) = (13,math,80)
8. How many girls in class 12 have a total score above 150?
val sumScore12Line = file.map(x => {val line = x.split(" "); (line(0)+","+line(1)+","+line(3),line(5).toInt)})
val sumScore12Dayu150 = sumScore12Line.reduceByKey(_+_).filter(a => (a._2>150 &&
a._1.split(",")(0).equals("12") && a._1.split(",")(2).equals("女"))).count
9. What is the average score of students whose total score is above 150, whose math score is at least 70, and who are at least 19 years old?
val complex1 = file.map(x => {val line = x.split(" "); (line(0)+","+line(1)+","+line(3),line(5).toInt)})
// The key of complex2 also carries the subject and the age so that both can be filtered on
val complex2 = file.map(x => {val line = x.split(" "); (line(0)+","+line(1)+","+line(3)+","+line(4)+","+line(2),line(5).toInt)})
// Keep students whose total score is above 150 and compute their average score
val com1 = complex1.map(a => (a._1, (a._2, 1))).reduceByKey((a,b) => (a._1+b._1,a._2+b._2)).filter(a => (a._2._1>150))
.map(t => (t._1,t._2._1/t._2._2))
// Keep students whose math score is at least 70 and whose age is at least 19
val com2 = complex2.filter(a => {val line = a._1.split(","); line(3).equals("math") && a._2>=70 && line(4).toInt>=19})
.map(a => {val line2 = a._1.split(","); (line2(0)+","+line2(1)+","+line2(2),a._2.toInt)})
(com1).join(com2).map(a =>(a._1,a._2._1))
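Question 9 combines three conditions on the same student, so it can also be sketched without a join: aggregate each student's total, exam count, and math score in one `reduceByKey`, then filter. This is only one possible formulation under the assumption that each student has at most one math row. The names `Question9Sketch`, `qualifyingAverages`, and the `minAge` parameter are illustrative and not part of the original post.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object Question9Sketch {
  // Returns ("class,name", average) for students with total > 150,
  // math score >= 70 and age >= minAge
  def qualifyingAverages(lines: RDD[String], minAge: Int): RDD[(String, Double)] =
    lines.map(_.split(" "))
      .map { f =>
        val score = f(5).toInt
        val mathScore = if (f(4) == "math") score else 0
        // key: (class, name, age); value: (total, exam count, math score)
        ((f(0), f(1), f(2).toInt), (score, 1, mathScore))
      }
      .reduceByKey { case ((s1, n1, m1), (s2, n2, m2)) => (s1 + s2, n1 + n2, math.max(m1, m2)) }
      .filter { case ((_, _, age), (total, _, mathScore)) =>
        total > 150 && mathScore >= 70 && age >= minAge
      }
      .map { case ((classId, name, _), (total, n, _)) => (classId + "," + name, total.toDouble / n) }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Question9Sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val lines = sc.parallelize(Seq(
      "12 张三 25 男 chinese 50", "12 张三 25 男 math 60", "12 张三 25 男 english 70",
      "12 王芳 19 女 chinese 70", "12 王芳 19 女 math 70", "12 王芳 19 女 english 70",
      "13 王小芳 19 女 chinese 70", "13 王小芳 19 女 math 80", "13 王小芳 19 女 english 70"))
    qualifyingAverages(lines, 19).collect().sortBy(_._1).foreach(println)
    sc.stop()
  }
}
```

With `minAge = 20`, as the requirement list states, no student in this sample qualifies, because the only students with a math score of at least 70 are 19 years old.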
Method 2:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
object SparkT2 {
case class Person(classID:Int,name:String,age:Int, sex:String,keMu:String, score:Int)
def main(args: Array[String]): Unit = {
val conf = new SparkConf().setAppName("SparkWordCount").setMaster("local")
// Create the SparkContext used to submit jobs
//map(x => Person(x(0).toInt, x(1), x(2).toInt))
val sc = new SparkContext(conf)
val rdd1: RDD[Array[String]] = sc.textFile("D:\\spark/ksdata.txt").map(_.split(" "))
val rdd2: RDD[(String, String, String, String, String, String)] = rdd1.map(x => (x(0),x(1),x(2),x(3),x(4),x(5)))
//T1
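/*
1. How many students took the exam in total?
1.1 How many students younger than 20 took the exam?
1.2 How many students aged exactly 20 took the exam?
1.3 How many students older than 20 took the exam?
*/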
val rdd3: Long = rdd2.groupBy(_._2).count()
println(rdd3)
//T1.1 to T1.3
val rdd4 = rdd2.filter(_._3.toInt<20).groupBy(_._2).count()
println(rdd4)
val rdd5= rdd2.filter(_._3.toInt == 20).groupBy(_._2).count()
println(rdd5)
val rdd6: RDD[(String, String, String, String, String, String)] = rdd2.filter(_._3.toInt>20)
rdd6.foreach(println)
val res6 = rdd6.groupBy(_._2).count()
println(res6)
/*
2. How many boys took the exam?
2.1 How many girls took the exam?
*/
val rdd7 = rdd2.filter(_._4.equals("男")).groupBy(_._2).count()
println(rdd7)
val rdd8 = rdd2.filter(_._4.equals("女")).groupBy(_._2).count()
println(rdd8)
/*
3. How many students from class 12 took the exam?
3.1 How many students from class 13 took the exam?
*/
val rdd9 = rdd2.filter(_._1.toInt == 12).groupBy(_._2).count()
println(rdd9)
val rdd10 = rdd2.filter(_._1.toInt == 13).groupBy(_._2).count()
println(rdd10)
/*
4. What is the average score for Chinese?
4.1 What is the average score for math?
4.2 What is the average score for English?
*/
// Results use new names because rdd9 and rdd10 are already defined above
val rdd9_2: Long = rdd2.filter(_._5.equals("chinese")).count()
val rdd9_1: Int = rdd2.filter(_._5.equals("chinese")).map(x => x._6.toInt).reduce(_+_)
val avgChinese = rdd9_1/rdd9_2
println(avgChinese)
val rdd10_2: Long = rdd2.filter(_._5.equals("english")).count()
val rdd10_1: Int = rdd2.filter(_._5.equals("english")).map(x => x._6.toInt).reduce(_+_)
val avgEnglish = rdd10_1/rdd10_2
println(avgEnglish)
val rdd11_2: Long = rdd2.filter(_._5.equals("math")).count()
val rdd11_1: Int = rdd2.filter(_._5.equals("math")).map(x => x._6.toInt).reduce(_+_)
val rdd11 = rdd11_1/rdd11_2
println(rdd11)
// 5. What is each student's average score?
val rdd12_1: RDD[(String, Iterable[(String, String, String, String, String, String)])] = rdd2.groupBy(_._2)
val rdd12_2 = rdd2.groupBy(_._5).count()
println(rdd12_2)
val rdd12: RDD[(String, Long)] = rdd12_1.mapValues(x => x.map(s => s._6.toInt).reduce(_+_)/rdd12_2)
rdd12.foreach(println)
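/*
6. What is the average score of class 12?
6.1 What is the average score of the boys in class 12?
6.2 What is the average score of the girls in class 12?
6.3 The same questions for class 13
*/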
//T6.1
val rdd13_1: Long = rdd2.filter(_._1.toInt==12).groupBy(_._2).count()
val rdd13_2 = rdd2.groupBy(_._5).count()
val rdd13_3: RDD[(String, Long)] = rdd2.filter(_._1.toInt==12).groupBy(_._1).mapValues(x => x.map(s => s._6.toInt).reduce(_+_)/(rdd13_1*rdd13_2))
rdd13_3.foreach(println)
//T6.2
val r_man_1 = rdd2.filter(_._1.toInt==12).filter(_._4.equals("男")).groupBy(_._2).count()
val r_man_2: Long = rdd2.filter(_._1.toInt==12).filter(_._4.equals("男")).map(s => s._6.toInt).reduce(_+_)/r_man_1
println(r_man_2)
// The remaining cases are analogous
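/*
7. What is the highest Chinese score in the whole school?
7.1 What is the lowest Chinese score in class 12?
7.2 What is the highest math score in class 13?
*/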
//T7.1
val rddt7: Array[(Int, String)] = rdd2.filter(_._5.equals("chinese")).map(x => (x._6.toInt,x._2)).top(1)
rddt7.foreach(println)
//T7.2
// (score, name) pairs; takeOrdered(1) returns the smallest, i.e. the lowest score
val r_chinese_min = rdd2.filter(_._1.toInt==12).filter(_._5.equals("chinese")).map(x => (x._6.toInt,x._2)).takeOrdered(1)
r_chinese_min.foreach(println)
//T7.3
val r_math = rdd2.filter(_._1.toInt==13).filter(_._5.equals("math")).map(x => (x._6.toInt,x._2)).top(1)
r_math.foreach(println)
// 8. How many girls in class 12 have a total score above 150?
val rddt8: Long = rdd2.filter(x => x._1.toInt==12 && x._4.equals("女")).groupBy(_._2).mapValues(x => x.map(x => x._6.toInt).reduce(_+_)).filter(_._2>150).count()
println(rddt8)
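// What is the average score of students whose total score is above 150, whose math score is at least 70, and who are at least 20 years old?
//-----------9. What is the average score of students whose total score is above 150, whose math score is at least 70, and who are at least 19 years old?--------------
// (class, subject, score, sex, name, age)
// rdd1 already holds the split fields of each line
val quan=rdd1.map(a=>{
(a(0),a(4),a(5).toInt,a(3),a(1),a(2).toInt)
})
// (class, subject, score, sex, name, age)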
val mathsore1= quan.filter(_._2.equals("math")).filter(_._3>=70).filter(_._6>=19).map(x=>{
val a=1;
(x._5,a)
})
println(mathsore1.collect().toBuffer)
//ArrayBuffer((王芳,1), (王小芳,1))
val quan1= quan.map(x=>{
(x._5,x._3)
})
// New names are used here because rdd3 and rdd4 are already defined above
val joined=quan1.join(mathsore1)
println(joined.collect().toBuffer)
//ArrayBuffer((王芳,(70,1)), (王芳,(70,1)), (王芳,(70,1)), (王小芳,(70,1)), (王小芳,(80,1)), (王小芳,(70,1)))
val avgByName=joined.reduceByKey((a,b) => (a._1+b._1,a._2+b._2)).filter(_._2._1>150).map(x=>{
// Convert before dividing so the fraction is not truncated
val a = x._2._1.toDouble/x._2._2
val d = a.formatted("%.2f")
(x._1,d)
})
avgByName.foreach(println)
//(王芳,70.00)
//(王小芳,73.33)
}
}
Using join for question 9 in Method 2 is fairly concise.
Method 3 is attached as well. It is rather verbose, so a quick skim is enough.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
object classDemo {
def main(args: Array[String]): Unit = {
val sparkConf = new SparkConf()
sparkConf.setAppName("Demo1")
.setMaster("local[*]")
val sc: SparkContext = new SparkContext(sparkConf)
val lines: RDD[String] = sc.textFile("G:\\测试文档\\calssDemo\\text1.txt")
// (1) Count how many students took the exam in total
// Group by name
val name1: RDD[(String, String)] = lines.map(x => {
val a = x.split(" ")
(a(1), a(2))
})
val name2: RDD[(String, Iterable[(String, String)])] = name1.groupBy(_._1)
val counts = name2.count()
// println(counts) //6
// Number of students older than 20
val sex1 = lines.map(x=>{
val age = x.split(" ")
(age(1),age(2).toInt)
})
// println(sex1.distinct().filter(_._2> 20).count())
// Number of students younger than 20
val sex2 = lines.map(x=>{
val age = x.split(" ")
(age(1),age(2).toInt)
})
// println(sex2.distinct().filter(_._2< 20).count())
// Number of students aged exactly 20
val sex3 = lines.map(x=>{
val age = x.split(" ")
(age(1),age(2).toInt)
})
// println(sex3.distinct().filter(_._2 == 20).count())
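// Question 2: count boys and girls by grouping
val sex= lines.map(x => {
val a = x.split(" ")
val id1=a(0) // class
val id = a(1) // student
val sexs = a(3) // sex
(id1,id, sexs)
}).collect() // RDD.toArray() is not available in current Spark versions
// Group by sex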
val group1: Map[String, Array[(String, String, String)]] = sex.distinct.groupBy(_._3)
// Count the members of each group
val sexnum: Map[String, Int] = group1.mapValues(_.length)
//println(sexnum.toBuffer) //ArrayBuffer((男,4), (女,2))
// Question 3: count how many students in each class took the exam
val group2: Map[String, Array[(String, String, String)]] = sex.distinct.groupBy(_._1)
val sexnum2= group2.mapValues(_.length)
//println(sexnum2.toBuffer)//ArrayBuffer((12,3), (13,3))
// Question 4: average score for each subject
val danke: RDD[(String, Int)] = lines.map(x => {
val a = x.split(" ")
(a(4), a(5).toInt)
})
val danke1: RDD[(String, Iterable[(String, Int)])] = danke.groupBy(_._1)
val danke2: RDD[(String, String)] = danke1.mapValues(x => {
var a: Int = 0
var b: Double = 0
for (i <- x) {
a += (i._2)
b += 1
}
val c: Double = (a / b)
c.formatted("%.2f") // formatted is called on a Double but returns a String
})
//println(danke2.collect().toBuffer)
// ArrayBuffer((math,63.33), (english,63.33), (chinese,58.33))
// Question 5:
// (1) Group by name
val name3: RDD[(String, String, String, String)] = lines.map(x => {
val a = x.split(" ")
(a(0), a(1), a(5), a(3))
})
val name4: RDD[(String, Iterable[(String, String, String, String)])] = name3.groupBy(_._2)
// println(name4.collect().toBuffer)
//ArrayBuffer((张大三,CompactBuffer((13,张大三,60,男), (13,张大三,60,男), (13,张大三,70,男))),
// (李大四,CompactBuffer((13,李大四,50,男), (13,李大四,60,男), (13,李大四,50,男))),
// (王芳,CompactBuffer((12,王芳,70,女), (12,王芳,70,女), (12,王芳,70,女))),
// (张三,CompactBuffer((12,张三,50,男), (12,张三,60,男), (12,张三,70,男))),
// (王小芳,CompactBuffer((13,王小芳,70,女), (13,王小芳,80,女), (13,王小芳,70,女))),
// (李四,CompactBuffer((12,李四,50,男), (12,李四,50,男), (12,李四,50,男))))
val score: RDD[(String, String)] = name4.mapValues(x => {
var a: Int = 0
var b: Double = 0
for (i <- x) {
a += (i._3).toInt
b += 1
}
val c: Double = (a / b)
c.formatted("%.2f") // formatted is called on a Double but returns a String
})
//println(score.collect().toBuffer)
//ArrayBuffer((张大三,63.33), (李大四,53.33), (王芳,70.00), (张三,60.00), (王小芳,73.33), (李四,50.00))
// Question 6: average score per class
val score2: RDD[(String, List[String])] = name4.mapValues(x => {
var a: Int = 0
var b: Double = 0
var ids = ""
for (i <- x) {
a += (i._3).toInt
b += 1
ids = i._1
}
val c: Double = (a / b)
val d = c.formatted("%.2f")
val empt = List()
val list2 = d :: empt
val list3 = ids :: list2
//(d,ids)
list3
})
// println(score2.collect().toBuffer)
//ArrayBuffer((张大三,List(13, 63.33)), (李大四,List(13, 53.33)),
// (王芳,List(12, 70.00)), (张三,List(12, 60.00)), (王小芳,List(13, 73.33)), (李四,List(12, 50.00)))
val value1: RDD[(String, String)] = score2.map(x => {
val toArray: Array[String] = x._2.toArray
(toArray(0), toArray(1))
})
// println(value1.collect().toBuffer)
//ArrayBuffer((13,63.33), (13,53.33), (12,70.00), (12,60.00), (13,73.33), (12,50.00))
val id_score: RDD[(String, Iterable[(String, String)])] = value1.groupBy(_._1)
//println(id_score.collect().toBuffer)
//ArrayBuffer((13,CompactBuffer((13,63.33), (13,53.33), (13,73.33))), (12,CompactBuffer((12,70.00), (12,60.00), (12,50.00))))
val id_score1: RDD[(String, String)] = id_score.mapValues(x => {
var a: Double = 0
var b: Double = 0
for (i <- x) {
a += (i._2).toDouble
b += 1
}
val c: Double = (a / b)
c.formatted("%.2f") // formatted is called on a Double but returns a String
})
//println(id_score1.collect().toBuffer)
//ArrayBuffer((13,63.33), (12,60.00))
// 6.1 What is the average score of the boys in class 12?
// 6.2 What is the average score of the girls in class 12?
//-----------------------------------------------------------------------
val sex_score1: RDD[(String, List[String])] = name4.mapValues(x => {
var a: Int = 0
var b: Double = 0
var ids = ""
var ids1 = ""
for (i <- x) {
a += (i._3).toInt
b += 1
ids = i._1
ids1 = i._4
}
val c: Double = (a / b)
val d = c.formatted("%.2f")
val empt = List()
val list2 = d :: empt
val list3 = ids :: list2
val list4 = ids1 :: list3
//(d,ids)
list4
})
//println(sex_score1.collect().toBuffer)
//ArrayBuffer((张大三,List(男, 13, 63.33)), (李大四,List(男, 13, 53.33)), (王芳,List(女, 12, 70.00)),
// (张三,List(男, 12, 60.00)), (王小芳,List(女, 13, 73.33)), (李四,List(男, 12, 50.00)))
val sex_score2: RDD[(String, String, String)] = sex_score1.map(x => {
val array: Array[String] = x._2.toArray
(array(0), array(1),array(2))
})
val sex_score3: RDD[(String, Iterable[(String, String, String)])] = sex_score2.groupBy(_._2)
/* //: RDD[(String, List[(String, Int, Double)])]
val acb= sex_score3.mapValues(x=>(x.map(t=>(t._1,t._2.toInt,t._3.toDouble))).toList).mapValues(x=>{
val n1 = x.map(x1=>(x1._1,x1._2,x1._3,1))
val n2= n1.filter(_._1.equals("男")).reduce(_._3+_._3)//.reduce(_._3+_._3)
val v2= n1.filter(_._1.equals("女")).reduce((a,b)=>(a._3+b._3))
(n2,v2)
})
println(acb.collect().toBuffer)*/
val sex_score4: RDD[(String, (String, String))] = sex_score3.mapValues(x=>{
var a=0 // number of boys
var a0=0 // number of girls
var score:Double=0
var score1:Double=0
for(i<-x){
if(i._1.equals("男")) {
score += i._3.toDouble
a+=1
}
else {
score1 += i._3.toDouble
a0+=1
}
}
val a1="n"+(score/a).toString
val a2="v"+(score1/a0).toString
(a1,a2)
})
//println(sex_score4.collect().toBuffer)
//ArrayBuffer((13,(n58.33,v73.33)), (12,(n55.0,v70.0)))
//println("---------------7----------------")
// (class, subject, score, sex, name, age)
/* val quan=lines.map(x=>{
val a = x.split(" ")
(a(0),a(4),a(5).toInt,a(3),a(1),a(2).toInt)
})
val kemu= quan.groupBy(_._2)
// println(kemu.collect().toBuffer)
//ArrayBuffer((math,CompactBuffer((12,math,60), (12,math,50), (12,math,70), (13,math,60), (13,math,60), (13,math,80))),
// (english,CompactBuffer((12,english,70), (12,english,50), (12,english,70), (13,english,70), (13,english,50), (13,english,70)))
//T7.1
val rddt7: Array[(Int, String)] = quan.filter(_._2.equals("chinese")).map(x => (x._3,x._2)).top(1)
//ArrayBuffer((70,chinese))
//println(rddt7.toBuffer)
//T7.2
val r_man_2 = quan.filter(_._1.toInt==12).filter(_._2.equals("chinese")).map(x => (x._3,x._2)).takeOrdered(1)
// r_man_2.foreach(println)
//T7.3
val r_math = quan.filter(_._1.toInt==13).filter(_._2.equals("math")).map(x => (x._3,x._2)).top(1)
// r_math.foreach(println)
//-----------8. How many girls in class 12 have a total score above 150?-----------------
val soer: RDD[(String, Int)] = quan.filter(_._4.equals("女")).filter(_._1.equals("12")).map(x=>{
(x._5,x._3)
})
val soer1=soer.reduceByKey(_+_).filter(a=>(a._2>150)).count()
// println(soer1)
*/
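//-----------9. What is the average score of students whose total score is above 150, whose math score is at least 70, and who are at least 19 years old?--------------
// (class, subject, score, sex, name, age)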
val quan=lines.map(x=>{
val a = x.split(" ")
(a(0),a(4),a(5).toInt,a(3),a(1),a(2).toInt)
})
// (class, subject, score, sex, name, age)
val mathsore1= quan.filter(_._2.equals("math")).filter(_._3>=70).filter(_._6>=19).map(x=>{
val a=1;
(x._5,a)
})
println(mathsore1.collect().toBuffer)
//ArrayBuffer((王芳,1), (王小芳,1))
val quan1= quan.map(x=>{
(x._5,x._3)
})
val rdd3=quan1.join(mathsore1)
println(rdd3.collect().toBuffer)
//ArrayBuffer((王芳,(70,1)), (王芳,(70,1)), (王芳,(70,1)), (王小芳,(70,1)), (王小芳,(80,1)), (王小芳,(70,1)))
val rdd4=rdd3.reduceByKey((a,b) => (a._1+b._1,a._2+b._2)).filter(_._2._1>150).map(x=>{
// Convert before dividing so the fraction is not truncated
val a = x._2._1.toDouble/x._2._2
val d = a.formatted("%.2f")
(x._1,d)
})
rdd4.foreach(println)
//(王芳,70.00)
//(王小芳,73.33)
/*
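// What is the average score of students whose total score is above 150, whose math score is at least 70, and who are at least 20 years old?
// (name, age, subject, score)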
val res1 = tuples.map(x=>(x._2,x._3,x._5,x._6)).filter(_._2>=20)
val res2 = res1.map(x=>(x._1,x._4)).reduceByKey(_+_).filter(_._2>150)
val res3 = res1.filter(_._3.equals("math")).filter(_._4>=70).map(x=>(x._1))
val list = res3.collect().toList
for (i <- list) {
println(res2.filter(_._1.equals(i)).collect().toList)
}
*/
//println("-----------------done--------------------")
sc.stop()
}
}