Causes of and solutions for leftover hive-staging files

When SQL such as select or insert overwrite is submitted to Hive through spark-sql, hive-sql, hue, or similar clients, a directory is created to temporarily hold the results. With insert overwrite, for example, the results are first written to this directory and then, once the task completes, copied into the Hive table.
For how the location of this directory is chosen, see https://blog.csdn.net/zhoudetiankong/article/details/51800887. That article notes that the location can be changed; the relevant configuration is reproduced below:
Default configuration:
<property>
    <name>hive.exec.stagingdir</name>
    <value>.hive-staging</value>
</property>

After modification:
    <property>
         <name>hive.exec.stagingdir</name>
         <value>/tmp/hive/.hive-staging</value>
    </property>

In testing, the change took effect for Hive but not for Spark SQL, which is probably a Spark SQL bug: https://issues.apache.org/jira/browse/SPARK-1837


Hive-staging files are not deleted automatically in two cases: (1) the task fails abnormally during execution; (2) the connection or session is kept open for a long time.
So it is still well worth setting hive.exec.stagingdir to a dedicated directory. If it is not set, the staging directories are created under the table's own output directory, where these junk files are hard to spot. See:
https://www.aboutyun.com//forum.php/?mod=viewthread&tid=20657&extra=page%3D1&page=1&
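To illustrate how leftover staging directories might be tracked down, here is a minimal Python sketch. It is not from the original post. It assumes the directory listing comes from `hdfs dfs -ls -R <path>`, whose lines have the form `permissions replication owner group size date time path`, and that staging directory names start with the default `.hive-staging` prefix. The function name `find_stale_staging_dirs` and the 7-day threshold are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta

# Assumption: staging directories use the default ".hive-staging" name prefix.
STAGING_PREFIX = ".hive-staging"


def find_stale_staging_dirs(ls_lines, now, max_age=timedelta(days=7)):
    """Return paths of staging directories last modified more than max_age ago.

    ls_lines: lines in the format printed by `hdfs dfs -ls -R`.
    now:      reference time as a datetime.
    """
    stale = []
    for line in ls_lines:
        # Split into at most 8 fields so that the path stays in one piece.
        fields = line.split(None, 7)
        # Skip headers, malformed lines, and anything that is not a directory.
        if len(fields) < 8 or not fields[0].startswith("d"):
            continue
        date_str, time_str, path = fields[5], fields[6], fields[7]
        # Only the last path component decides whether this is a staging dir.
        name = path.rstrip("/").rsplit("/", 1)[-1]
        if not name.startswith(STAGING_PREFIX):
            continue
        modified = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M")
        if now - modified > max_age:
            stale.append(path)
    return stale
```

The function only reports candidates and deletes nothing. A long-lived session may still be using an old staging directory, so the result should be reviewed before anything is removed.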

 


Origin www.cnblogs.com/ucarinc/p/11831280.html