Problem scenario
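The cluster previously had Kerberos enabled, but all of the related configuration was later removed for convenience. Since then, running even a simple SQL statement in Hive, such as select a from b where a.t ='1';, fails with the following error: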
Application application_1581349098902_0008 failed 2 times due to AM Container for appattempt_1581349098902_0008_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://namenode01:8088/proxy/application_1581349098902_0008/Then, click on links to logs of each attempt.
Diagnostics: Not able to initialize app directories in any of the configured local directories for app application_1581349098902_0008
Failing this attempt. Failing the application.
Environment
CDH 5.15.1
Cause
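This is a permissions problem: the job fails while YARN is still initializing it. When Kerberos is removed, CDH does not reset the permissions of directories that already exist. If YARN was used while Kerberos was enabled, the per-user directories it created are left behind with their Kerberos-era ownership, and new jobs fail once Kerberos is removed. Before Kerberos was enabled, the directory /yarn/nm/usercache/test was owned by yarn:yarn; with Kerberos enabled it became test:yarn (test being the user who submits the job). After Kerberos is removed, this leftover ownership is incompatible with what YARN expects, so YARN cannot initialize the application directories (the "Not able to initialize app directories" message in the diagnostics above) and the job fails.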
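To confirm that stale ownership is really the cause before deleting anything, a check along the following lines may help. It is a minimal sketch, assuming GNU stat (standard on Linux hosts) and assuming that, with Kerberos removed, every per-user directory under usercache should be owned by yarn, matching the yarn:yarn ownership described above. The function name find_foreign_owned is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list per-user directories under a NodeManager usercache
# whose owner is not the expected account.
# Assumptions: GNU stat (Linux), and that without Kerberos the expected owner
# is "yarn", matching the yarn:yarn ownership described in the text.

# find_foreign_owned USERCACHE_DIR [EXPECTED_OWNER]
# Prints "<dir>:<owner>" for each subdirectory whose owner does not match.
find_foreign_owned() {
  local usercache="$1" expected="${2:-yarn}"
  local dir owner
  for dir in "$usercache"/*/; do
    [ -d "$dir" ] || continue        # glob matched nothing: empty usercache
    owner=$(stat -c '%U' "$dir")     # owner user name (GNU stat format)
    if [ "$owner" != "$expected" ]; then
      printf '%s:%s\n' "${dir%/}" "$owner"
    fi
  done
}

# Example, on an affected NodeManager (repeat for each local dir):
#   find_foreign_owned /yarn/nm/usercache yarn
```

In the scenario above, running it against /yarn/nm/usercache should flag the test directory together with its owner; an empty result suggests the ownership is already consistent and the failure has some other cause.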
Solution
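- Find the NodeManager nodes.
- Look up the value of yarn.nodemanager.local-dirs.
- On each NodeManager, go to every directory listed in yarn.nodemanager.local-dirs and delete the contents of its usercache directory, for example: rm -rf /data01/yarn/nm/usercache/*
- Restart the YARN service.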
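The cleanup steps above can be scripted per NodeManager roughly as follows. These are hypothetical helpers, not a Cloudera tool: get_local_dirs pulls yarn.nodemanager.local-dirs out of a yarn-site.xml using a plain text match (it assumes the usual layout where the <value> element follows the matching <name>, and is not a real XML parser), and clean_usercache only reports what it would delete unless --force is given. Which yarn-site.xml to read is also an assumption: on CDH the Cloudera Manager agent generates the NodeManager's effective copy under /var/run/cloudera-scm-agent/process/, or the value can simply be copied from Cloudera Manager. Deleting usercache while containers are running would break them, so this is presumably safest with no jobs running, followed by the YARN restart.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of CDH) for the usercache cleanup steps.

# get_local_dirs YARN_SITE_XML
# Prints the raw value of yarn.nodemanager.local-dirs (comma-separated).
# Assumption: the <value> element comes after the matching <name> element,
# as in Cloudera-generated files; this is a text match, not an XML parser.
get_local_dirs() {
  awk '
    /<name>yarn\.nodemanager\.local-dirs<\/name>/ { found = 1 }
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit
    }
  ' "$1"
}

# clean_usercache COMMA_SEPARATED_DIRS [--force]
# Without --force, only reports which usercache directories would be cleared.
clean_usercache() {
  local dirs="$1" mode="${2:-}"
  local IFS=','                      # split the local-dirs list on commas
  local d
  for d in $dirs; do
    if [ ! -d "$d/usercache" ]; then
      echo "skip: $d/usercache not found" >&2
      continue
    fi
    if [ "$mode" = "--force" ]; then
      # Same effect as the manual step: rm -rf <local-dir>/usercache/*
      rm -rf -- "$d"/usercache/*
      echo "cleaned: $d/usercache"
    else
      echo "would clean: $d/usercache"
    fi
  done
}

# Example (as root on each NodeManager; the path is a placeholder):
#   dirs=$(get_local_dirs /path/to/nodemanager/yarn-site.xml)
#   clean_usercache "$dirs"           # dry run
#   clean_usercache "$dirs" --force   # delete, then restart YARN
```

The dry-run default is a deliberate choice: rm -rf across several data disks is easy to get wrong, so it is worth seeing the exact directories first.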