Copyright notice: this is an original article; please cite the source when reposting: https://blog.csdn.net/xianpanjia4616/article/details/84945043
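UDF (User Defined Function): a user-defined function in Spark SQL. It is used in much the same way as Spark SQL's built-in functions; when the built-in functions cannot meet a requirement, users write their own function to fit the business need. UDF usage in Hive is covered separately.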
Here is a simple demo of using a UDF in Spark SQL:
package spark

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

/**
  * A simple example of using a UDF in Spark.
  */
object sparkUDF {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("Spark SQL UDF Example")
      .master("local[4]")
      .getOrCreate()
    import spark.sql
    val sc = spark.sparkContext

    // Build a single-column DataFrame of names and expose it as the temp view "test".
    val names = Array("jason", "jim", "jam", "jj")
    val nameRDD = sc.parallelize(names, 10)
    val nameRowRDD = nameRDD.map(name => Row(name))
    val structType = StructType(Array(StructField("name", StringType, true)))
    val namesDF = spark.createDataFrame(nameRowRDD, structType)
    namesDF.createTempView("test")

    // Register a UDF named "jason" that appends " hello jason" to its input.
    spark.udf.register("jason", (str: String) => {
      str + " hello jason"
    })

    // Call the UDF from SQL just like a built-in function.
    sql("select name,jason(name) as jason from test").show()
  }
}
Running it produces the following output:
+-----+-----------------+
| name|            jason|
+-----+-----------------+
|jason|jason hello jason|
|  jim|  jim hello jason|
|  jam|  jam hello jason|
|   jj|   jj hello jason|
+-----+-----------------+
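The demo above registers the UDF by name and calls it from a SQL string. As a rough sketch of an alternative, the same logic can also be wrapped with `org.apache.spark.sql.functions.udf` and applied through the DataFrame API. The object name `SparkUdfDataFrameSketch` and the helper `greet` are made up for this illustration and are not part of the original demo. The null check is also an addition: for a `String` argument, a null name would otherwise be concatenated as the text "null".

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical object name, used only for this sketch.
object SparkUdfDataFrameSketch {

  // Plain Scala function holding the UDF logic (hypothetical helper).
  // Returns null for null input instead of building "null hello jason".
  def greet(str: String): String =
    if (str == null) null else str + " hello jason"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("Spark SQL UDF DataFrame Sketch")
      .master("local[4]")
      .getOrCreate()
    import spark.implicits._

    // Same sample names as the demo, built directly as a one-column DataFrame.
    val namesDF = Seq("jason", "jim", "jam", "jj").toDF("name")

    // Wrap the function as a column-level UDF; no registration by name is needed.
    val greetUdf = udf((s: String) => greet(s))

    // Apply the UDF to the "name" column and alias the result as "jason".
    namesDF.select(col("name"), greetUdf(col("name")).as("jason")).show()

    spark.stop()
  }
}
```

Keeping the logic in an ordinary function like `greet` means it can be checked without starting a Spark session, while `spark.udf.register` remains the way to go when the function has to be callable from SQL text.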
If anything here is wrong, corrections are welcome. If you have any questions, you can join the QQ group: 340297350. Thanks.