Common Problems When Using Hive

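1) Out-of-memory errors

Map phase
Solution: this is usually caused by a MapJoin. Set hive.auto.convert.join = false so that the join falls back to a reduce-side Common Join.
Shuffle phase
Solution: increase the number of reducers (set mapreduce.job.reduces=xxx), or lower the maximum percentage of the in-memory shuffle buffer that a single segment may occupy (set mapreduce.reduce.shuffle.memory.limit.percent=0.10; the default is 0.25).
Reduce phase
Solution: increase the number of reducers (set mapreduce.job.reduces=xxx). If the data is skewed, adding reducers alone does not help; see "Hive优化方法.ppt" (Hive optimization methods) for data-skew optimizations.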
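
The shuffle and reduce fixes above can be illustrated with a small Python model. It assumes the Hadoop 2.x+ shuffle behaviour:

- The in-memory shuffle buffer is the reducer heap times mapreduce.reduce.shuffle.input.buffer.percent (default 0.70).
- A single map-output segment stays in memory only if it is smaller than that buffer times mapreduce.reduce.shuffle.memory.limit.percent (default 0.25). Larger segments go to disk.

It also models Hive's reducer estimate when mapreduce.job.reduces is left at -1. The estimate is the input size divided by hive.exec.reducers.bytes.per.reducer, capped at hive.exec.reducers.max (recent defaults are 256000000 and 1009). The helper names are illustrative, and version-specific caps are ignored.

```python
import math


def shuffle_memory_limits(reducer_heap_bytes: int,
                          input_buffer_percent: float = 0.70,
                          memory_limit_percent: float = 0.25) -> tuple[int, int]:
    """Illustrative helper: (in-memory shuffle buffer, largest in-memory segment) in bytes.

    Simplified model of the MapReduce shuffle; version-specific caps are ignored.
    """
    buffer_bytes = int(reducer_heap_bytes * input_buffer_percent)
    max_segment_bytes = int(buffer_bytes * memory_limit_percent)
    return buffer_bytes, max_segment_bytes


def estimate_reducers(input_bytes: int,
                      bytes_per_reducer: int = 256_000_000,
                      max_reducers: int = 1009) -> int:
    """Illustrative helper: Hive-style estimate used when mapreduce.job.reduces is -1."""
    reducers = math.ceil(input_bytes / bytes_per_reducer)
    return min(max_reducers, max(1, reducers))


if __name__ == "__main__":
    heap = 4 * 1024 ** 3  # assumed 4 GiB reducer heap
    for pct in (0.25, 0.10):
        buf, seg = shuffle_memory_limits(heap, memory_limit_percent=pct)
        print(f"memory.limit.percent={pct}: buffer={buf / 2**20:.0f} MiB, "
              f"max in-memory segment={seg / 2**20:.0f} MiB")
    print(estimate_reducers(100 * 1024 ** 3))  # 100 GiB of input
```

With the assumed 4 GiB heap, lowering memory.limit.percent from 0.25 to 0.10 shrinks the largest segment kept in memory from about 717 MiB to about 287 MiB. More segments are then fetched to disk instead of the heap. For 100 GiB of input, the estimate comes to 420 reducers.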

2) A dynamic-partition HQL statement fails with an error similar to the following:

org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to deserialize reduce input key from x1x128x0x0x19x1x255 with properties {columns=reducesinkkey0,reducesinkkey1,reducesinkkey2,reducesinkkey3, serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, serialization.sort.order=++++++, columns.types=int,int,int,int,string,bigint}
Solution: set hive.optimize.sort.dynamic.partition=false.

3) Hive creates too many files, with an error similar to the following:

total number of created files now is 100130, which exceeds 100000. Killing the job
Solution: increase hive.exec.max.created.files (default 100000).
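
When this limit is hit by a dynamic-partition insert, a rough worst-case count helps. Each task may write one file for every partition it receives rows for, so the total can approach tasks times partitions. The Python sketch below assumes that worst case and the default limit of 100000. The helper names are hypothetical. Hive itself counts the files actually created at run time, not an estimate.

```python
MAX_CREATED_FILES = 100_000  # default value of hive.exec.max.created.files


def worst_case_files(num_tasks: int, partitions_per_task: int) -> int:
    """Hypothetical helper: upper bound when every task writes every partition."""
    return num_tasks * partitions_per_task


def check_created_files(num_tasks: int, partitions_per_task: int,
                        limit: int = MAX_CREATED_FILES) -> int:
    """Mimic Hive's limit check, using the estimate instead of the real file count."""
    files = worst_case_files(num_tasks, partitions_per_task)
    if files > limit:
        raise RuntimeError(
            f"total number of created files now is {files}, "
            f"which exceeds {limit}. Killing the job")
    return files


if __name__ == "__main__":
    print(check_created_files(100, 500))  # 50000 files: within the default limit
    try:
        check_created_files(200, 501)     # 100200 files: over the limit
    except RuntimeError as err:
        print(err)
```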

4) Variable substitution in Hive is nested too deeply, with the following error:

FAILED: IllegalStateException Variable substitution depth too large: 40
Solution: increase hive.variable.substitute.depth (default 40).
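
The error appears when expanding ${...} references does not settle within the allowed number of passes. A chain of variables that reference one another, or a variable that references itself, exhausts the limit.

The Python model below assumes the scheme works in passes: Hive rescans the expression and replaces known variables until a pass changes nothing. hive.variable.substitute.depth bounds the number of passes. This is an illustrative sketch, not Hive's source, and the helper name is made up.

```python
import re

# ${name} references: no '}', '$' or space allowed inside the braces.
VAR_PATTERN = re.compile(r"\$\{([^}$ ]+)\}")


def substitute(expr: str, variables: dict[str, str], depth: int = 40) -> str:
    """Illustrative helper: expand ${name} one pass at a time, at most depth + 1 passes."""
    current = expr
    for _ in range(depth + 1):
        found = False

        def replace(match: re.Match) -> str:
            nonlocal found
            name = match.group(1)
            if name in variables:
                found = True
                return variables[name]
            return match.group(0)  # unknown variables are left untouched

        current = VAR_PATTERN.sub(replace, current)
        if not found:  # nothing was replaced in this pass: expansion is complete
            return current
    raise ValueError(f"Variable substitution depth too large: {depth}")


if __name__ == "__main__":
    chain = {"a": "${b}", "b": "${c}", "c": "2024-01-01"}
    print(substitute("select * from t where dt='${a}'", chain))
    # → select * from t where dt='2024-01-01'
    try:
        substitute("${x}", {"x": "${x}"})  # a self-reference never settles
    except ValueError as err:
        print(err)  # → Variable substitution depth too large: 40
```

A chain of three nested references needs a depth of at least 3. A self-reference fails at any depth, so in that case raising hive.variable.substitute.depth will not help.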

5) select * runs as an MR job instead of a fetch task

Solution: set hive.fetch.task.conversion=minimal.

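6) When the partition column of a partitioned table is a date type, use to_unix_timestamp in the partition filter. With unix_timestamp, Hive still scans the whole table.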

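7) Job name too long, with an error similar to the following:

Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$PathComponentTooLongException): The maximum path component name limit of ...Could not find status of job:job_id...
This is caused by an overly long job name. The job itself usually succeeds, but its logs cannot be viewed and the historyserver reports the job as not found.
Solution: set a short job name with set mapreduce.job.name=XXX;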
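
As a rough pre-check, a job name can be measured the way it might appear inside a file name. The Python sketch below rests on two assumptions:

- The name is percent-encoded as UTF-8, so each non-ASCII character, such as a Chinese character, becomes 9 characters.
- The HDFS limit is the default dfs.namenode.fs-limits.max-component-length of 255.

The real history file name also carries the job ID, user and timestamps, so the usable budget is smaller than 255. Both helpers and the budget parameter are hypothetical.

```python
from urllib.parse import quote

HDFS_MAX_COMPONENT_LENGTH = 255  # default dfs.namenode.fs-limits.max-component-length


def encoded_length(job_name: str) -> int:
    """Hypothetical helper: length after UTF-8 percent-encoding (assumed encoding)."""
    return len(quote(job_name, safe=""))


def job_name_fits(job_name: str, budget: int = HDFS_MAX_COMPONENT_LENGTH) -> bool:
    """Hypothetical check: does the encoded name alone fit in one path component?"""
    return encoded_length(job_name) <= budget


if __name__ == "__main__":
    print(encoded_length("daily_report"))  # ASCII letters and '_' are kept as-is
    print(encoded_length("日报"))           # each character becomes %XX%XX%XX
    print(job_name_fits("统计" * 20))       # 40 characters encode to 360
```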

8) Using dynamic partitions fails with the following error:

Failed with exception Number of dynamic partitions created is 1191, which is more than 1000. To solve this try to set hive.exec.max.dynamic.partitions to at least 1191
The parameter hive.exec.max.dynamic.partitions caps the total number of dynamic partitions that may be created; its default is 1000.
Solution: increase hive.exec.max.dynamic.partitions.
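
Before running the insert, it can help to count how many distinct partitions the source rows will produce and compare that with the limit. The Python sketch below makes these assumptions:

- Rows are available as dicts.
- The limit is the default of 1000 mentioned above.

The helper names are hypothetical. In practice the same count would be taken with a query over the source table.

```python
from typing import Iterable, Mapping, Sequence

MAX_DYNAMIC_PARTITIONS = 1000  # default hive.exec.max.dynamic.partitions


def count_dynamic_partitions(rows: Iterable[Mapping[str, object]],
                             partition_cols: Sequence[str]) -> int:
    """Hypothetical helper: distinct partition-value tuples the insert would create."""
    return len({tuple(row[col] for col in partition_cols) for row in rows})


def check_dynamic_partitions(rows: Iterable[Mapping[str, object]],
                             partition_cols: Sequence[str],
                             limit: int = MAX_DYNAMIC_PARTITIONS) -> int:
    """Raise with a message shaped like Hive's when the count exceeds the limit."""
    n = count_dynamic_partitions(rows, partition_cols)
    if n > limit:
        raise ValueError(
            f"Number of dynamic partitions created is {n}, which is more than {limit}. "
            f"To solve this try to set hive.exec.max.dynamic.partitions to at least {n}")
    return n


if __name__ == "__main__":
    rows = [{"dt": f"2024-01-{d:02d}", "region": r}
            for d in range(1, 32) for r in ("north", "south")]
    print(check_dynamic_partitions(rows, ["dt", "region"]))  # 31 days x 2 regions
```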

9) Inserting data into an RCFile table fails with "Caused by: java.lang.UnsupportedOperationException: Currently the writer can only accept BytesRefArrayWritable"
Solution: change the SerDe of the RCFile table:
alter table table_name set serde 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe';

10) A job fails with "Caused by: java.io.IOException: Split metadata size exceeded 10000000. Aborting job XXX"
The job.splitmetainfo file exceeds the default limit of 10000000 bytes (about 10 MB).
Solution: increase mapreduce.jobtracker.split.metainfo.maxsize.