1. About the combineByKey operator
Function: per-key grouped counting and custom summation.
Features: operates on (key, value) pair data.
Implementation steps (a minimal sketch of the three callbacks follows this list):
1. Initialize the data to be processed and perform any required conversion operations.
2. Check whether a key is being processed for the first time: on first sight, create its combiner; otherwise merge the value into the existing combiner within the partition according to the custom logic.
3. Merge the per-partition results for each key and return the final result.
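The three steps map directly onto the three functions that combineByKey takes. Below is a minimal sketch, assuming a hypothetical (word, 1) dataset that is not from the original post; the callback roles createCombiner, mergeValue and mergeCombiners follow the Spark RDD API:

import org.apache.spark.sql.SparkSession

object CombineByKeySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("CombineByKeySketch").master("local[2]").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical (key, value) data: each word paired with a count of 1
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 1), ("a", 1)))

    val counts = pairs.combineByKey(
      (v: Int) => v,                 // step 1: createCombiner, runs the first time a key appears in a partition
      (c: Int, v: Int) => c + v,     // step 2: mergeValue, folds another value into the partition-local combiner
      (c1: Int, c2: Int) => c1 + c2  // step 3: mergeCombiners, merges combiners from different partitions
    ).collect()

    counts.foreach(println) // prints (a,2) and (b,1), in some order
    spark.stop()
  }
}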
2. combineByKey operator code in practice
package big.data.analyse.scala.arithmetic

import org.apache.spark.sql.SparkSession

/**
  * Created by zhen on 2019/9/7.
  */
object CombineByKey {
  def main(args: Array[String]) {
    val spark = SparkSession.builder().appName("CombineByKey").master("local[2]").getOrCreate()
    val sc = spark.sparkContext
    sc.setLogLevel("error")

    val initialScores = Array((("hadoop", "R"), 1), (("hadoop", "java"), 1),
      (("Spark", "Scala"), 1), (("Spark", "R"), 1), (("Spark", "Java"), 1))

    val d1 = sc.parallelize(initialScores)

    // Re-key each ((name, lang), count) record as (name, (lang, count))
    val result = d1.map(x => (x._1._1, (x._1._2, x._2))).combineByKey(
      (v: (String, Int)) => v, // initialization: runs the first time a key is seen in a partition
      (c: (String, Int), v: (String, Int)) => (c._1 + "," + v._1, c._2 + v._2), // in-partition merge for keys already seen
      (c1: (String, Int), c2: (String, Int)) => (c1._1 + "," + c2._1, c1._2 + c2._2)) // merge the per-partition results
      .collect()

    result.foreach(println)
  }
}
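The map call reshapes each ((name, lang), count) record so that the framework name becomes the key; combineByKey then concatenates the language strings and sums the counts per key. Because the initialization function here is the identity, the same result could also be obtained with reduceByKey; combineByKey becomes necessary when the accumulator type differs from the value type.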
3. combineByKey operator execution result
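Running the code above with master local[2] should produce output along the lines of:

(Spark,(Scala,R,Java,3))
(hadoop,(R,java,2))

The sums are deterministic, but the line order and the order of the concatenated language names depend on how the records are distributed across partitions.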