Problems encountered with a Spark cluster

1.SparkContext: Error initializing SparkContext

18/10/29 15:55:39 ERROR SparkContext: Error initializing SparkContext.
java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 16 retries! Consider explicitly setting the appropriate port for the service 'sparkDriver' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.

This is a runtime error on the client side. Cause: the hostname was changed, or the machine was not restarted after the change, so the 'sparkDriver' service cannot bind to an address that resolves for the current hostname.
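A quick way to verify this is to check that the current hostname still resolves to a local address. Below is a minimal sketch of the check and of an explicit workaround (the hostname node01, the IP 192.168.56.101, and the spark-submit arguments are placeholders; spark.driver.bindAddress requires Spark 2.1+):

# Check that the current hostname maps to a reachable local IP
hostname
cat /etc/hosts
# the name printed by `hostname` should appear in /etc/hosts, e.g.
# 192.168.56.101  node01

# Alternatively, pin the driver address explicitly instead of relying on hostname lookup
spark-submit \
  --conf spark.driver.host=192.168.56.101 \
  --conf spark.driver.bindAddress=192.168.56.101 \
  ...

If the mapping is missing, adding it to /etc/hosts (or reverting the hostname) and restarting is usually enough.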

2.Spark on YARN

Problem: this occurred when running a Spark job on a YARN cluster.

ERROR client.TransportClient: Failed to send RPC 6600979308376699964 to /192.168.56.103:56283: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException

ERROR spark.SparkContext: Error initializing SparkContext.
java.lang.IllegalStateException: Spark context stopped while waiting for backend

ERROR cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Sending RequestExecutors(0,0,Map()) to AM was unsuccessful
java.io.IOException: Failed to send RPC 6600979308376699964 to /192.168.56.103:56283: java.nio.channels.ClosedChannelException

Caused by: java.nio.channels.ClosedChannelException

Exception in thread "main" java.lang.IllegalStateException: Spark context stopped while waiting for backend

ERROR util.Utils: Uncaught exception in thread Yarn application state monitor
org.apache.spark.SparkException: Exception thrown in awaitResult

Caused by: java.io.IOException: Failed to send RPC 6600979308376699964 to /192.168.56.103:56283: java.nio.channels.ClosedChannelException

Caused by: java.nio.channels.ClosedChannelException

This problem is caused by insufficient memory: a container exceeds YARN's physical/virtual memory limits, the NodeManager kills it, and the RPC channel is closed. Add the following to yarn-site.xml to disable the memory checks:

<property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
</property>

<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
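Disabling the checks is the simplest workaround. A common alternative, if you want to keep the checks, is to raise the virtual-to-physical memory ratio instead (a sketch; 4 is just an example value, the default is 2.1):

<property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4</value>
</property>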

Note: restart the cluster for the change to take effect!!!

3.Spark cluster fails to start

Problem description: JAVA_HOME is already configured in /etc/profile, and source /etc/profile has been run. When the Spark cluster is started, every worker node reports the following error:

JAVA_HOME is not set

The start scripts launch the workers over non-interactive SSH, which does not source /etc/profile, so JAVA_HOME also has to be set in spark-env.sh:
export JAVA_HOME=/opt/java/jdk1.8.0_151
Then restart the Spark cluster.
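Note that each node reads its own conf/spark-env.sh, so the setting must be present on every worker, not only on the master. A minimal sketch for distributing it, assuming the workers are named node02, node03, node04 and Spark is installed under /opt/spark (both are placeholders):

# copy the updated spark-env.sh to every worker node
for host in node02 node03 node04; do
  scp /opt/spark/conf/spark-env.sh ${host}:/opt/spark/conf/
done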

4.Both masters in a Spark HA cluster are in standby state

Problem description: the cluster configuration files were copied from other servers, but the hostnames on my local VMs differ from those on the servers. After starting the Spark HA cluster, both masters stay in standby state. The master startup logs show the following errors:

INFO zookeeper.ZooKeeper: Initiating client connection, connectString=node02:2181,node03:2181,node04:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@3a7078dc
ERROR imps.CuratorFrameworkImpl: Background exception was not retry-able or retry gave up
java.net.UnknownHostException: node02: Name or service not known

ERROR curator.ConnectionState: Connection timed out for connection string (node02:2181,node03:2181,node04:2181) and timeout (15000) / elapsed (15326)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss

ERROR curator.ConnectionState: Connection timed out for connection string (node02:2181,node03:2181,node04:2181) and timeout (15000) / elapsed (35375)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss

ERROR netty.Inbox: Ignoring error
java.net.UnknownHostException: node02: Name or service not known

Because the hostnames are different and the SPARK_DAEMON_JAVA_OPTS setting in spark-env.sh was never updated, the Spark HA cluster is pointing at a ZooKeeper connect string (node02:2181,node03:2181,node04:2181) that does not resolve on these machines. Changing the zookeeper url in SPARK_DAEMON_JAVA_OPTS to your own ZooKeeper cluster addresses fixes it.
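For reference, the entry in spark-env.sh looks roughly like this (a sketch; the ZooKeeper hostnames and the /spark znode directory are placeholders to replace with your own):

export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=zk01:2181,zk02:2181,zk03:2181 -Dspark.deploy.zookeeper.dir=/spark"

After changing it, restart both masters so they register with the correct ZooKeeper ensemble.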

Reposted from blog.csdn.net/love__guo/article/details/84141070