Scenario
Spark 2.4.5 with a Hive 2.3.5 client.
Running a Spark job that queries a Hive table throws a NoClassDefFoundError.
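The job was launched the usual spark-submit way, along these lines (class, jar, and master are all placeholders, not the actual job):

# submit the Spark job that queries Hive; every name here is hypothetical
$SPARK_HOME/bin/spark-submit \
  --class com.example.SomeJob \
  --master yarn \
  some-job.jar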
Exception log
javax.jdo.JDOFatalInternalException: Unexpected exception caught.
at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1193)
...
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:79)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6891)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:164)
...
at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:332)
at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:312)
at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:288)
...
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
at com.xxx.xxx.xxx.xxx(xxx.java:43)
at com.xxx.xxx.xxx.main(xxx.java:29)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
NestedThrowablesStackTrace:
java.lang.reflect.InvocationTargetException
...
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:659)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:431)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:79)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6891)
...
at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:248)
at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:231)
at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:388)
at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:332)
at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:312)
at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:288)
...
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3369)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
at xxx.xxx.xxx.xxx.xxx(xxx.java:43)
at xxx.xx.xxx.xxx.main(xxx.java:29)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.derby.jdbc.AutoloadedDriver40
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at java.sql.DriverManager.isDriverAllowed(DriverManager.java:556)
at java.sql.DriverManager.getConnection(DriverManager.java:661)
at java.sql.DriverManager.getConnection(DriverManager.java:208)
... 101 more
Exception analysis
1. Read the exception stack.
NoClassDefFoundError: Could not initialize class org.apache.derby.jdbc.AutoloadedDriver40
This is a class initialization failure (note: it is not a ClassNotFoundException): the JVM found the class, but an earlier attempt to initialize it failed. The class ships in the derby*.jar artifacts.
2. Search for the relevant jars (commands below).
find / -name "derby*.jar" shows the jar under both the JDK and the Hive installation, so no jar is actually missing. A ClassNotFoundException would point to a missing jar; a NoClassDefFoundError like this one most likely means conflicting versions on the classpath.
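A minimal sketch of the search; the comparison paths are illustrative, substitute whatever the find actually prints on your machine:

# list every derby jar on the box (quote the pattern so the shell does not expand it)
find / -name "derby*.jar" 2>/dev/null

# compare the copies that turn up, e.g. the JDK's bundled Derby vs. Hive's
md5sum /usr/java/jdk1.8.0/db/lib/derby.jar /opt/hive-2.3.5/lib/derby-*.jar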
3. Check that Hive itself is healthy (see the check below).
Running the hive CLI directly works, so the Hive installation and its metastore connection are fine.
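For example, a quick sanity check from the Hive CLI (the database name is a placeholder):

# if these trivial statements succeed, Hive's own classpath and metastore connection are healthy
hive -e "show databases;"
hive -e "use some_db; show tables;"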
4. Go back to the stack trace to see which line of business code triggered the exception.
The frame com.xxx.xxx.xxx.xxx(xxx.java:43) corresponds to sparkSession.sql("use " + dbname);
5. Reproduce the call by hand in spark-shell (spark is the session that spark-shell predefines; dbname is a placeholder):
import spark.sql
sql("use dbname")
This throws a similar exception, so we can conclude that the jars the Spark job depends on are the problem.
6. Inspect the jars under SPARK_HOME (see below).
The Hive jars bundled with Spark are version 1.2.2, while the Hive actually deployed in the environment is 2.3.5.
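One way to see the bundled client version (the exact jar names depend on the Spark build):

# the version suffix in the hive jar names shows which Hive client Spark was built against
ls $SPARK_HOME/jars | grep -i hive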
Resolution
1. Replace the Hive jars under SPARK_HOME with the 2.3.5 ones (a sketch of the swap follows).
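Roughly what we did; the Hive path is illustrative, and the originals are set aside first:

# back up the bundled 1.2.x hive jars before touching anything
mkdir -p $SPARK_HOME/jars.bak
mv $SPARK_HOME/jars/hive-*.jar $SPARK_HOME/jars.bak/

# copy in the jars from the Hive 2.3.5 installation
cp /opt/hive-2.3.5/lib/hive-*.jar $SPARK_HOME/jars/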
2. Verify with spark-sql (see below): the original exception is gone, but a new one appears: java.lang.NoSuchFieldError: HIVE_STATS_JDBC_TIMEOUT.
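The verification can be as simple as re-running the failing statement through the spark-sql CLI (database name is a placeholder):

$SPARK_HOME/bin/spark-sql -e "use some_db"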
3. Research shows that Spark 2.4.5 is built against Hive 1.2.x. Hive 2.x removed the HIVE_STATS_JDBC_TIMEOUT configuration constant, but Spark's spark-sql code still references it, hence the NoSuchFieldError.
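If you have the Spark 2.4.5 source checked out, the lingering reference can be confirmed with a grep (the checkout path is an assumption):

# find where Spark's Hive integration still uses the constant removed in Hive 2.x
grep -rn "HIVE_STATS_JDBC_TIMEOUT" ~/spark-2.4.5/sql/hive/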
4. So how do we reconcile the Spark and Hive versions? Two options:
(1) Patch the Spark source to remove the HIVE_STATS_JDBC_TIMEOUT reference and rebuild the spark-hive jar (a build sketch follows this list).
(2) Keep Spark on its bundled, older Hive 1.2.x client and let that client talk to the newer Hive deployment.
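For option (1), the rebuild would look roughly like this, assuming a patched Spark 2.4.5 source tree (the path is an assumption):

cd ~/spark-2.4.5
# build just the sql/hive module plus its upstream modules (-am = also make)
./build/mvn -pl sql/hive -am -Phive -DskipTests package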
5. We went with option (2): install a Hive 1.2.x client on the servers running Spark, and sync its hive-site.xml into both the Spark and Hadoop configuration directories (see below).
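Concretely, the sync is just copying the client's hive-site.xml into place; the paths are illustrative, and the file is assumed to carry a hive.metastore.uris entry pointing at the running metastore service:

cp /opt/hive-1.2.2/conf/hive-site.xml $SPARK_HOME/conf/
cp /opt/hive-1.2.2/conf/hive-site.xml $HADOOP_HOME/etc/hadoop/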
Done. Verification passed.