Creating a partitioned Hive table with Spark SQL

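Today I needed to import data from a text file into a Hive database. Because the table is expected to grow fairly large, I designed it as a partitioned table, partitioned by area and date. I ran into some problems along the way, so I am recording them here for later reference.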
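To make the partitioning scheme concrete, here is a minimal sketch in plain Python (no Spark needed) of how a single record maps to an (area, obudate) partition. It mirrors the `'gz'` constant and the `SUBSTR(OBUTIME,1,10)` expression used in the scripts in this post. The function names `partition_for` and `partition_dir` are hypothetical, the sample timestamp is made up, and the code assumes OBUTIME looks like `YYYY-MM-DD HH:MM:SS`.

```python
# Minimal sketch: how one record maps to an (area, obudate) partition.
# ASSUMPTION: OBUTIME looks like "YYYY-MM-DD HH:MM:SS"; the sample value is made up.


def partition_for(obutime, area="gz"):
    """Return the (area, obudate) pair a record would be written to."""
    # SQL SUBSTR(OBUTIME, 1, 10) is 1-based, so it equals Python's obutime[0:10].
    obudate = obutime[:10]
    return area, obudate


def partition_dir(area, obudate):
    """Hive-style partition directory, relative to the table's location."""
    return "area={}/obudate={}".format(area, obudate)


area, obudate = partition_for("2017-01-05 08:30:00")
print(partition_dir(area, obudate))
```

With this layout, a query that filters on area and obudate only has to read the matching directories, which is the reason for partitioning a large table in the first place.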

 spark.sql("use oracledb")
 spark.sql("CREATE TABLE IF NOT EXISTS " + tablename + " (OBUID STRING, BUS_ID STRING,REVTIME STRING,OBUTIME STRING,LONGITUDE STRING,LATITUDE STRING,\
 GPSKEY STRING,DIRECTION STRING,SPEED STRING,RUNNING_NO STRING,DATA_SERIAL STRING,GPS_MILEAGE STRING,SATELLITE_COUNT STRING,ROUTE_CODE STRING,SERVICE STRING)\
  PARTITIONED BY(AREA STRING,OBUDATE STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ")
 
 spark.sql("set  hive.exec.dynamic.partition.mode = nonstrict")
 spark.sql("set  hive.exec.dynamic.partition = true")

 # print("Database created")
 if addoroverwrite:
     # Append
     spark.sql("INSERT INTO TABLE " + tablename + " PARTITION(AREA,OBUDATE) SELECT OBUID,BUS_ID, REVTIME, OBUTIME,LONGITUDE ,LATITUDE,GPSKEY,DIRECTION,SPEED,\
                 RUNNING_NO,DATA_SERIAL,GPS_MILEAGE, SATELLITE_COUNT ,ROUTE_CODE,SERVICE,'gz' AS AREA,SUBSTR(OBUTIME,1,10) AS OBUDATE FROM " + tablename + "_tmp")
Running the script produced the following error:

Partition spec {area=, obudate=, AREA=gz, OBUDATE=2017-01-} contains non-partition columns;
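One plausible reading of this message (my assumption, not something I verified in the Spark source) is that the partition column names recorded for the table are lowercase, while the names in the PARTITION clause and the SELECT aliases are uppercase, and the two sets are compared case-sensitively. The sketch below is plain Python and is not Spark's actual code; `merge_spec` and `non_partition_columns` are hypothetical helpers that only reproduce the shape of the spec printed in the error.

```python
# Illustration only: this is NOT Spark's implementation.
# ASSUMPTION: the table's partition columns are stored in lowercase,
# while the INSERT statement supplies uppercase names.

table_partition_cols = ["area", "obudate"]
# Values copied verbatim from the error message, truncation included.
statement_spec = {"AREA": "gz", "OBUDATE": "2017-01-"}


def merge_spec(partition_cols, spec):
    """Start from empty values for the table's columns, then add the statement's."""
    merged = {col: "" for col in partition_cols}
    merged.update(spec)  # dict keys are case-sensitive, so "AREA" != "area"
    return merged


def non_partition_columns(partition_cols, spec):
    """Keys in the spec that do not match a partition column exactly."""
    return [key for key in spec if key not in partition_cols]


merged = merge_spec(table_partition_cols, statement_spec)
print(merged)  # four keys: area, obudate, AREA, OBUDATE
print(non_partition_columns(table_partition_cols, merged))

# Lowercasing the names in the statement makes the keys line up again.
lowered = {key.lower(): value for key, value in statement_spec.items()}
fixed = merge_spec(table_partition_cols, lowered)
print(non_partition_columns(table_partition_cols, fixed))
```

The merged dictionary has the same four keys as the spec in the error message, which is why lowercasing the partition names in the statement is a natural thing to try.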

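After searching on Baidu, I found reports of a case-sensitivity bug affecting partitioned tables, so I changed the partition column names in the script to lowercase, and it then ran successfully. The revised script: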

 spark.sql("use oracledb")
 spark.sql("CREATE TABLE IF NOT EXISTS " + tablename + " (OBUID STRING, BUS_ID STRING,REVTIME STRING,OBUTIME STRING,LONGITUDE STRING,LATITUDE STRING,\
 GPSKEY STRING,DIRECTION STRING,SPEED STRING,RUNNING_NO STRING,DATA_SERIAL STRING,GPS_MILEAGE STRING,SATELLITE_COUNT STRING,ROUTE_CODE STRING,SERVICE STRING)\
  PARTITIONED BY(area STRING,obudate STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ")
 # Set parameters
 # hive > set  hive.exec.dynamic.partition.mode = nonstrict;
 # hive > set  hive.exec.dynamic.partition = true;
 spark.sql("set  hive.exec.dynamic.partition.mode = nonstrict")
 spark.sql("set  hive.exec.dynamic.partition = true")

 # print("Database created")
 if addoroverwrite:
     # Append
     spark.sql("INSERT INTO TABLE " + tablename + " PARTITION(area,obudate) SELECT OBUID,BUS_ID, REVTIME, OBUTIME,LONGITUDE ,LATITUDE,GPSKEY,DIRECTION,SPEED,\
                 RUNNING_NO,DATA_SERIAL,GPS_MILEAGE, SATELLITE_COUNT ,ROUTE_CODE,SERVICE,'gz' AS area ,SUBSTR(OBUTIME,1,10) AS obudate FROM " + tablename + "_tmp")
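The revised script only shows the append branch of `addoroverwrite`. As a rough sketch, the statement could be built by a small helper that covers both branches and always lowercases the partition column names, so the case problem cannot come back by accident. The helper `build_insert_sql`, its parameters and the sample table name `gps_data` are hypothetical, and the `INSERT OVERWRITE TABLE` branch is my assumption about what the other branch would do; it does not appear in the original script.

```python
# Sketch only: build_insert_sql and the table name "gps_data" are hypothetical.
DATA_COLUMNS = [
    "OBUID", "BUS_ID", "REVTIME", "OBUTIME", "LONGITUDE", "LATITUDE",
    "GPSKEY", "DIRECTION", "SPEED", "RUNNING_NO", "DATA_SERIAL",
    "GPS_MILEAGE", "SATELLITE_COUNT", "ROUTE_CODE", "SERVICE",
]


def build_insert_sql(tablename, append=True, area="gz",
                     partition_cols=("area", "obudate")):
    """Build the dynamic-partition INSERT from <tablename>_tmp into <tablename>."""
    # Force lowercase so the names match the lowercase partition columns.
    area_col, date_col = (col.lower() for col in partition_cols)
    # ASSUMPTION: the non-append branch overwrites the target partitions.
    verb = "INSERT INTO TABLE" if append else "INSERT OVERWRITE TABLE"
    return (
        "{verb} {table} PARTITION({area_col},{date_col}) "
        "SELECT {cols}, '{area}' AS {area_col}, "
        "SUBSTR(OBUTIME,1,10) AS {date_col} FROM {table}_tmp"
    ).format(verb=verb, table=tablename, area_col=area_col,
             date_col=date_col, cols=", ".join(DATA_COLUMNS), area=area)


# Uppercase names are passed on purpose to show that they get normalized.
sql = build_insert_sql("gps_data", append=False,
                       partition_cols=("AREA", "OBUDATE"))
print(sql)
```

In the real script the resulting string would be passed to `spark.sql(sql)`, after the two `hive.exec.dynamic.partition` settings have been applied.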


Reposted from blog.csdn.net/qq_39160721/article/details/80651256