Spark Operators: RDD Action Operations (2) - take, top, takeOrdered

take

def take(num: Int): Array[T]

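take returns the first num elements of the RDD, that is, the elements at indices 0 through num-1. No sorting is performed; the elements come back in the order in which they are stored in the RDD.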

scala> var rdd1 = sc.makeRDD(Seq(10, 4, 2, 12, 3))

rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[40] at makeRDD at <console>:21

scala> rdd1.take(1)

res0: Array[Int] = Array(10)

scala> rdd1.take(2)

res1: Array[Int] = Array(10, 4)
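To make the "no sorting" point concrete, here is a minimal sketch of a standalone program. It assumes spark-core is on the classpath and runs against a local master; the object name TakeSketch, the helper firstElements, the app name, and the choice of two partitions are illustrative and not part of the original post. It prints the partition layout with glom and then shows that take follows that layout rather than the value order.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TakeSketch {
  // Hypothetical helper: returns the first `num` elements in partition order; nothing is sorted.
  def firstElements(sc: SparkContext, data: Seq[Int], num: Int): Array[Int] =
    sc.makeRDD(data, 2).take(num)

  def main(args: Array[String]): Unit = {
    // Assumption: a local master with two threads is enough for this illustration.
    val conf = new SparkConf().setAppName("take-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val data = Seq(10, 4, 2, 12, 3)
      // glom() turns each partition into one array, exposing the layout take() walks through.
      sc.makeRDD(data, 2).glom().collect()
        .foreach(part => println(part.mkString("[", ", ", "]")))
      println(firstElements(sc, data, 3).mkString(", ")) // 10, 4, 2
      println(firstElements(sc, data, 100).length)       // 5: num may exceed the RDD size
    } finally {
      sc.stop()
    }
  }
}
```

If a sorted result is wanted, top or takeOrdered is the better fit, since take only reflects how the data happens to be laid out.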

top

def top(num: Int)(implicit ord: Ordering[T]): Array[T]

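top returns the first num elements of the RDD according to the default ordering (descending) or an ordering you specify. With the default ordering, that means the num largest elements, largest first.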

scala> var rdd1 = sc.makeRDD(Seq(10, 4, 2, 12, 3))

rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[40] at makeRDD at <console>:21

scala> rdd1.top(1)

res2: Array[Int] = Array(12)

scala> rdd1.top(2)

res3: Array[Int] = Array(12, 10)

// Specify a custom ordering

scala> implicit val myOrd = implicitly[Ordering[Int]].reverse

myOrd: scala.math.Ordering[Int] = scala.math.Ordering$$anon$4@767499ef

scala> rdd1.top(1)

res4: Array[Int] = Array(2)

scala> rdd1.top(2)

res5: Array[Int] = Array(2, 3)
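Defining an implicit val changes the ordering for every later call in the same scope. As an alternative, the ordering can be passed explicitly in the second parameter list of top. The following is a minimal sketch, assuming spark-core on the classpath and a local master; the object name TopSketch, the helper names, and the sample key/value pairs are illustrative and do not come from the original post.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object TopSketch {
  // Default ordering: the n largest elements, largest first.
  def largest(rdd: RDD[Int], n: Int): Array[Int] = rdd.top(n)

  // Ordering passed explicitly; no implicit val is defined, so other calls are unaffected.
  def smallest(rdd: RDD[Int], n: Int): Array[Int] = rdd.top(n)(Ordering[Int].reverse)

  // Rank key/value pairs by the value only.
  def topByValue(rdd: RDD[(String, Int)], n: Int): Array[(String, Int)] =
    rdd.top(n)(Ordering.by[(String, Int), Int](_._2))

  def main(args: Array[String]): Unit = {
    // Assumption: a local master with two threads is enough for this illustration.
    val conf = new SparkConf().setAppName("top-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.makeRDD(Seq(10, 4, 2, 12, 3))
      println(largest(rdd, 2).mkString(", "))  // 12, 10
      println(smallest(rdd, 2).mkString(", ")) // 2, 3

      val pairs = sc.makeRDD(Seq(("a", 3), ("b", 7), ("c", 5)))
      println(topByValue(pairs, 2).mkString(", ")) // (b,7), (c,5)
    } finally {
      sc.stop()
    }
  }
}
```

Passing the ordering at the call site keeps the choice visible where it matters and avoids surprising results from an implicit that is still in scope.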

takeOrdered

def takeOrdered(num: Int)(implicit ord: Ordering[T]): Array[T]

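takeOrdered is similar to top, except that it returns elements in the opposite order: with the default ordering it yields the num smallest elements in ascending order. Note that the session below continues from the top example, so the reversed implicit myOrd is still in scope. That is why top returns the smallest elements here and takeOrdered returns the largest.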

scala> var rdd1 = sc.makeRDD(Seq(10, 4, 2, 12, 3))

rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[40] at makeRDD at <console>:21

scala> rdd1.top(1)

res4: Array[Int] = Array(2)

scala> rdd1.top(2)

res5: Array[Int] = Array(2, 3)

scala> rdd1.takeOrdered(1)

res6: Array[Int] = Array(12)

scala> rdd1.takeOrdered(2)

res7: Array[Int] = Array(12, 10)
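For comparison, here is a minimal sketch of takeOrdered with no custom implicit ordering in scope, so the default ascending Ordering[Int] applies. It assumes spark-core on the classpath and a local master; the object name TakeOrderedSketch and the helper names are illustrative and not part of the original post. It also checks that takeOrdered with a reversed ordering gives the same result as top with the default one.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object TakeOrderedSketch {
  // Default ordering: the n smallest elements, smallest first.
  def smallest(rdd: RDD[Int], n: Int): Array[Int] = rdd.takeOrdered(n)

  // Reversed ordering passed explicitly: the n largest elements, largest first.
  def largestViaTakeOrdered(rdd: RDD[Int], n: Int): Array[Int] =
    rdd.takeOrdered(n)(Ordering[Int].reverse)

  def main(args: Array[String]): Unit = {
    // Assumption: a local master with two threads is enough for this illustration.
    val conf = new SparkConf().setAppName("take-ordered-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.makeRDD(Seq(10, 4, 2, 12, 3))
      println(smallest(rdd, 2).mkString(", "))              // 2, 3
      println(rdd.top(2).mkString(", "))                    // 12, 10
      println(largestViaTakeOrdered(rdd, 2).mkString(", ")) // 12, 10
    } finally {
      sc.stop()
    }
  }
}
```

In other words, the two operations mirror each other: whichever ordering is in effect, top and takeOrdered pick from opposite ends of it.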

Reposted from blog.csdn.net/yrg5101/article/details/88915052