Preface
When wrapping processing functions around RDDs and DataFrames in Spark, type conversions come up all the time. This post records some of the common ones.
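All the snippets below assume an existing sparkSession. A minimal setup might look like this (the appName and master values are placeholders for local testing):
import org.apache.spark.sql.SparkSession
val sparkSession = SparkSession.builder()
  .appName("type-conversions") // hypothetical app name
  .master("local[*]")          // local mode for testing
  .getOrCreate()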
Array => Row
import org.apache.spark.sql.{Row, RowFactory}
val arr = Array("aa/2/cc/10", "xx/3/nn/30", "xx/3/nn/20")
val row = Row.fromSeq(arr)
// Java-style alternative; the array must be expanded into varargs,
// otherwise the whole array becomes a single field:
// val row = RowFactory.create(arr: _*)
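A quick check on the resulting Row:
println(row.length)       // 3
println(row.getString(0)) // aa/2/cc/10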
Row => Array
val a: Array[Any] = row.toSeq.toArray
Sometimes you need a concrete element type T, such as String; in that case, convert each value first:
val a: Array[String] = row.toSeq.map(m => m.toString).toArray
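If every field already has the target runtime type, a cast avoids the toString round trip (a sketch; this throws ClassCastException if any field is not actually a String):
val b: Array[String] = row.toSeq.map(_.asInstanceOf[String]).toArray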
Tuple => Array
val tuple = ((20201022,5060180989186180L,"[12, 15)"),288556)
tuple.productIterator.toArray
The same approach works for any type T that extends Product, such as a case class.
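For example, with a hypothetical case class (case classes extend Product):
case class Point(x: Int, y: Int) // hypothetical example class
val fields: Array[Any] = Point(1, 2).productIterator.toArray // Array(1, 2)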
Array => RDD
val rdd = sparkSession.sparkContext.parallelize(Array(tuple))
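parallelize preserves the element type, so rdd here is RDD[((Int, Long, String), Int)]; collecting it back returns the original tuple:
rdd.collect().foreach(println) // ((20201022,5060180989186180,[12, 15)),288556)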
RDD => DataFrame
// define a case class
case class Person(name: String, age: Int)
One option is to pass an RDD[Row] together with an explicit schema:
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
val rdd = sparkSession.sparkContext.parallelize(Array(("tom", 1), ("luna", 2)))
  .map(row => Row(row._1, row._2))
// build the schema
val schema = StructType(Array(
  StructField("name", StringType, true),
  StructField("age", IntegerType, true)
))
val df = sparkSession.createDataFrame(rdd, schema)
Alternatively, import sparkSession.implicits._ and convert to a DataFrame directly via the implicit toDF():
import sparkSession.implicits._
val df = sparkSession.sparkContext.parallelize(Array(("tom", 1), ("luna", 2)))
  .map(row => Person(row._1, row._2)).toDF()
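Either way, a quick look at the result:
df.printSchema() // name: string, age: integer
df.show()        // two rows: (tom, 1) and (luna, 2)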
DataFrame => RDD
val rdd1 = df.rdd
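Note that df.rdd returns an RDD[Row]. To recover typed objects, map over the rows or go through a typed Dataset (a sketch reusing the Person class above; the Dataset path needs sparkSession.implicits._ in scope for the encoder):
val people = df.rdd.map(r => Person(r.getAs[String]("name"), r.getAs[Int]("age")))
// or, via a typed Dataset:
val people2 = df.as[Person].rdd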