df.write.mode(SaveMode.Append).jdbc(url, table, connectionProperties)
This raises the following exception:
Hint: You will need to rewrite or cast the expression.
Position: 388 Call getNextException to see other errors in the batch.
at org.postgresql.jdbc.BatchResultHandler.handleError(BatchResultHandler.java:148)
at org.postgresql.jdbc.PgStatement.executeBatch(PgStatement.java:777)
at org.postgresql.jdbc.PgPreparedStatement.executeBatch(PgPreparedStatement.java:1563)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:215)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:277)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:276)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1869)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1869)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.postgresql.util.PSQLException: ERROR: column "extdata" is of type jsonb but expression is of type character
Hint: You will need to rewrite or cast the expression.
Position: 388
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2477)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2190)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:300)
at org.postgresql.jdbc.PgStatement.executeBatch(PgStatement.java:774)
... 14 more
References:
https://github.com/apache/spark/pull/8948
https://github.com/apache/spark/commit/1be25157449335dca70ba37720a172efa1f90714
Reproducing in spark-shell:
spark-shell --jars $(echo /home/chenf/ExhaustEmission/DataMiningAnalysis-0.1-SNAPSHOT/target/jars/*.jar | tr ' ' ',')
import org.apache.spark.sql.{DataFrame, SQLContext, SaveMode}
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
case class Test(id:String,extdata:String)
val testData = sc.parallelize(List(Test("1", """{"id":"1","data":"data1"}""")))
val df = testData.toDF
val connectionProperties = new java.util.Properties()
connectionProperties.put("driver", "org.postgresql.Driver")
connectionProperties.put("user", "test")
connectionProperties.put("password", "test")
df.write.mode(SaveMode.Append).jdbc("jdbc:postgresql://localhost:5432/dev_test", "tb_test", connectionProperties)
Not supported: the same exception is thrown. Spark's JDBC writer maps StringType to a character column type and binds the value with setString, and PostgreSQL will not implicitly cast character data to jsonb in a prepared statement.
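One workaround documented for the PostgreSQL JDBC driver (not tried in the test above, so treat it as an assumption here) is the `stringtype=unspecified` connection parameter, which makes the driver send string parameters as untyped values so the server can infer jsonb:

```scala
// Assumption, not verified in this test: with stringtype=unspecified the driver
// sends setString parameters with an unspecified type, letting the PostgreSQL
// server coerce the JSON text into the jsonb column.
connectionProperties.put("stringtype", "unspecified")
df.write.mode(SaveMode.Append).jdbc("jdbc:postgresql://localhost:5432/dev_test", "tb_test", connectionProperties)
```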
In the end I wrote my own SQL batch insert instead.
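A minimal sketch of such a manual batch insert (assumptions: the `tb_test` table, URL, and credentials from the test above, and a DataFrame with `id` and `extdata` as its first two string columns). The key difference from Spark's generated INSERT is the explicit `?::jsonb` cast, which tells PostgreSQL to convert the bound string into jsonb:

```scala
import java.sql.DriverManager

// Sketch only: opens one connection per partition and batches the inserts.
// The ?::jsonb cast is what avoids the "expression is of type character" error.
df.foreachPartition { rows =>
  val conn = DriverManager.getConnection(
    "jdbc:postgresql://localhost:5432/dev_test", "test", "test")
  try {
    conn.setAutoCommit(false)
    val stmt = conn.prepareStatement(
      "INSERT INTO tb_test (id, extdata) VALUES (?, ?::jsonb)")
    try {
      rows.foreach { row =>
        stmt.setString(1, row.getString(0)) // id
        stmt.setString(2, row.getString(1)) // JSON text, cast server-side
        stmt.addBatch()
      }
      stmt.executeBatch()
      conn.commit()
    } finally {
      stmt.close()
    }
  } finally {
    conn.close()
  }
}
```

For large partitions one would also flush the batch every few thousand rows rather than accumulating the whole partition in one batch.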