Reading MySQL Data into a Spark DataFrame

Spark can load data over JDBC (MySQL is used as the example here) into a DataFrame in three ways.

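    import java.util.Properties
    import org.apache.spark.sql.{DataFrameReader, SaveMode, SparkSession}

    // Shuffles (aggregations, joins) use 200 partitions by default; the value can be overridden here
    val spark = SparkSession.builder().master("local").appName("schema")
      .config("spark.sql.shuffle.partitions", 1).getOrCreate()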
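
The shuffle partition count does not have to be fixed when the session is built; it is a runtime SQL setting and can be changed later through spark.conf. A minimal sketch, assuming a local session. The object name, the app name and the value 4 are made up for illustration:

```scala
import org.apache.spark.sql.SparkSession

object ShufflePartitionsSketch {
  // Builds a local session with a single shuffle partition, as in the snippet above
  def session(): SparkSession =
    SparkSession.builder().master("local").appName("shufflePartitions")
      .config("spark.sql.shuffle.partitions", "1").getOrCreate()

  def main(args: Array[String]): Unit = {
    val spark = session()
    // Change the setting for this session after it has been created
    spark.conf.set("spark.sql.shuffle.partitions", "4")
    // Read the current value back
    println(spark.conf.get("spark.sql.shuffle.partitions"))
    spark.stop()
  }
}
```

For small local experiments a low value keeps Spark from scheduling many near-empty tasks; on a cluster the right value depends on the data volume.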

Method 1

    val properties = new Properties()
    // Set the user name and password
    properties.setProperty("user", "root")
    properties.setProperty("password", "123")
    // Read the person table from MySQL
    val personDF = spark.read.jdbc("jdbc:mysql://192.168.16.11:3306/spark", "person", properties)
    // For a multi-table join query, wrap the query in parentheses and give it an alias
    // val personDF = spark.read.jdbc("jdbc:mysql://192.168.16.11:3306/spark",
    //   "(select person.id,person.name,person.age,score.score from person,score where person.id=score.id) T", properties)
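
The alias is required because Spark places the table argument inside a FROM clause, and MySQL rejects a derived table that has no name. A small sketch of that wrapping step; asDerivedTable is a hypothetical helper written for this example, not part of Spark:

```scala
object JdbcSubquery {
  /** Wraps a SELECT statement so it can be passed where a table name is expected.
    * Hypothetical helper: drops a trailing semicolon, adds parentheses and an alias. */
  def asDerivedTable(sql: String, alias: String): String = {
    val trimmed = sql.trim.stripSuffix(";").trim
    s"($trimmed) $alias"
  }
}
```

The returned string can then be passed as the second argument of spark.read.jdbc, in place of "person".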

Method 2

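Put the connection URL of the MySQL database, the JDBC driver class, the user name and password, and the table to read into a Map, then pass the Map to options().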

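    val map = Map[String, String](
      "url" -> "jdbc:mysql://192.168.16.11:3306/spark",
      // Driver class of MySQL Connector/J 5.x; Connector/J 8.x names it com.mysql.cj.jdbc.Driver
      "driver" -> "com.mysql.jdbc.Driver",
      "user" -> "root",
      "password" -> "123",
      "dbtable" -> "score"
    )
    // Nothing is read until load() is called
    val dataFrame = spark.read.format("jdbc").options(map).load()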

Method 3

Set each connection property with its own .option() call.

    val reader: DataFrameReader = spark.read.format("jdbc")
      .option("url", "jdbc:mysql://192.168.16.11:3306/spark")
      .option("driver", "com.mysql.jdbc.Driver")
      .option("user", "root")
      .option("password", "123")
      .option("dbtable", "score")
    val dataFrame = reader.load()
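
When only the result of a query is needed rather than a whole table, Spark 2.4 and later also accept a query option in place of dbtable (the two cannot be set together). A sketch under that version assumption, reusing the connection settings above. The object name, the method name and the threshold 60 are made up for illustration:

```scala
import org.apache.spark.sql.{DataFrameReader, SparkSession}

object QueryOptionSketch {
  // Returns a reader configured with a query instead of a table name.
  // The reader only collects options; MySQL is contacted when load() is called.
  def highScoreReader(spark: SparkSession): DataFrameReader =
    spark.read.format("jdbc")
      .option("url", "jdbc:mysql://192.168.16.11:3306/spark")
      .option("driver", "com.mysql.jdbc.Driver")
      .option("user", "root")
      .option("password", "123")
      // Assumes Spark 2.4+; no parentheses or alias are needed with this option
      .option("query", "select id, score from score where score >= 60")
}
```

Calling QueryOptionSketch.highScoreReader(spark).load() would then return only the matching rows, with the filter running inside MySQL.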

Register the two DataFrames above as temporary views and join them with SQL:

    // personDF comes from Method 1; dataFrame holds the score table from Method 2 or 3
    personDF.createOrReplaceTempView("person")
    dataFrame.createOrReplaceTempView("score")
    val result = spark.sql("select person.id, person.name, person.age, score.score from person, score where person.id = score.id")
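
The same join can also be written with the DataFrame API instead of SQL. A minimal sketch that uses small in-memory DataFrames with made-up rows in place of the MySQL tables, so it runs without a database:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JoinSketch {
  // Inner join of person and score on the shared id column
  def joined(spark: SparkSession): DataFrame = {
    import spark.implicits._
    // Made-up rows standing in for the person and score tables
    val personDF = Seq((1, "zhangsan", 18), (2, "lisi", 19), (3, "wangwu", 20))
      .toDF("id", "name", "age")
    val scoreDF = Seq((1, 90), (2, 80)).toDF("id", "score")
    // Joining on the column name keeps a single id column in the output
    personDF.join(scoreDF, "id").select("id", "name", "age", "score")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("joinSketch").getOrCreate()
    joined(spark).show()
    spark.stop()
  }
}
```

Because the join is an inner join, the person with id 3 has no matching score and does not appear in the result.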

Save the joined result to a MySQL table:

    // SaveMode.Append adds the rows to the result table if it already exists
    result.write.mode(SaveMode.Append).jdbc("jdbc:mysql://192.168.16.111:3306/spark", "result", properties)

Reposted from blog.csdn.net/qq_36299025/article/details/97817989