WARN YarnClusterScheduler: not a…

After solving the problem of PySpark not being able to find "python":
【New problem】
Looking at the container logs under the application list in the web UI on port 8042 of the target node, I found the old problem of no resources being allocated:
17/02/26 22:33:11 WARN YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

【I have run into this problem before when programming in an IDE】
Back then the language was Java and the IDE was NetBeans; the solution was to add the following settings:
SparkConf conf = new SparkConf().setAppName("WordCount");
conf.setMaster("spark://192.168.136.134:7077");   // standalone master URL
conf.set("spark.executor.memory", "512m");        // explicitly request 512 MB per executor
After specifying the amount of memory, the problem was solved.
【But this approach has no effect when the PySpark job is submitted through Oozie, and I don't know why】
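In PySpark the same setting looks roughly like this (a minimal sketch of the idea, not the actual job script; when the job is submitted in yarn-cluster mode the master is set by the submitter rather than in code):

    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName("WordCount")
    conf.set("spark.executor.memory", "512m")   # explicitly request 512 MB per executor, as in the Java example
    sc = SparkContext(conf=conf)
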
【Attempt 1】Add the
--executor-memory 512m
option to <spark-opts> in workflow.xml. 【Result】No effect; it still complained that resources could not be allocated.
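For reference, a minimal sketch of where this option sits in the Oozie spark action (element names follow the spark-action schema; the action name and the omitted elements such as <job-tracker> and <name-node> are placeholders, not the actual workflow.xml):

    <spark xmlns="uri:oozie:spark-action:0.1">
        <!-- <job-tracker>, <name-node>, <configuration> omitted -->
        <master>yarn-cluster</master>
        <name>spark1</name>
        <jar>spark1.py</jar>
        <spark-opts>--executor-memory 512m</spark-opts>
    </spark>
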

【Disaster】At this point I wanted to change the memory parameter requested by the Spark submission, so first I tried to kill the job that was demanding too many resources:
yarn application -kill application_1488177038506_0002
As soon as I hit Enter, the target compute node big2 crashed and rebooted,
and the big1 node went straight to a black screen.
After running the command to shut down Spark, I found that the workers on both nodes were no longer running:
no worker to stop

【Attempt 2】Could it be that YARN simply does not have enough resources? So I brute-force increased YARN's resources by adding the following to yarn-site.xml:
(PS: this node's actual VMware configuration is 4 GB of memory and 2 physical cores with 2 virtual cores each, i.e. 4 cores in total)
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>4096</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>2</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-vcores</name>
        <value>2</value>
    </property>
【Result】The problem was solved.
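Assuming YARN was restarted so the NodeManagers pick up the new values, one quick way to confirm what the ResourceManager now advertises is its REST API (a hypothetical check; the default web port 8088 is assumed, while the hostname bigmaster comes from the log below):

    # Python 3; /ws/v1/cluster/metrics is the standard YARN ResourceManager REST endpoint
    import json
    from urllib.request import urlopen

    resp = urlopen("http://bigmaster:8088/ws/v1/cluster/metrics").read()
    metrics = json.loads(resp.decode("utf-8"))["clusterMetrics"]
    print(metrics["availableMB"], metrics["availableVirtualCores"])
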


【Analysis】YARN apparently works out a container's memory requirement by adding its own overhead on top of what the Spark job launched from the command line asks for: the log below shows each executor container being requested at 896 MB, i.e. the 512 MB asked for plus a 384 MB overhead, and the driver/ApplicationMaster needs a container as well. So whether the 512m request can be satisfied has nothing to do with the memory configured in spark-env.sh; it is governed by the per-node capacity configured in each compute node's yarn-site.xml.
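As a rough sanity check of those numbers (the overhead rule is Spark 1.6's default spark.yarn.executor.memoryOverhead, assumed here rather than taken from this cluster's config):

    # Sketch of where the 896 MB executor container size in the log comes from.
    # Spark 1.6 default: spark.yarn.executor.memoryOverhead = max(0.10 * executorMemory, 384)
    def executor_container_mb(executor_memory_mb):
        overhead = max(int(executor_memory_mb * 0.10), 384)
        return executor_memory_mb + overhead

    print(executor_container_mb(512))  # 896, matching "each with 1 cores and 896 MB memory including 384 MB overhead"
    # YARN then rounds each request up to a multiple of yarn.scheduler.minimum-allocation-mb (1024 MB by default).
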

【Detail】Earlier, for Spark on YARN, I had put spark-yarn-shuffle.jar at:
$HADOOP_HOME/lib/spark-yarn-shuffle.jar (note: it was not copied into share, but it still works)

Appendix (log from when resources could not be allocated):
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hadoopTmp/nm-local-dir/filecache/146/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/02/26 23:03:25 INFO ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
17/02/26 23:03:26 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1488178815770_0002_000001
17/02/26 23:03:26 INFO SecurityManager: Changing view acls to: hadoop
17/02/26 23:03:26 INFO SecurityManager: Changing modify acls to: hadoop
17/02/26 23:03:26 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
17/02/26 23:03:26 INFO ApplicationMaster: Starting the user application in a separate Thread
17/02/26 23:03:26 INFO ApplicationMaster: Waiting for spark context initialization
17/02/26 23:03:26 INFO ApplicationMaster: Waiting for spark context initialization ... 
17/02/26 23:03:36 INFO ApplicationMaster: Waiting for spark context initialization ... 
17/02/26 23:03:42 INFO SparkContext: Running Spark version 1.6.1
17/02/26 23:03:42 INFO SecurityManager: Changing view acls to: hadoop
17/02/26 23:03:42 INFO SecurityManager: Changing modify acls to: hadoop
17/02/26 23:03:42 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
17/02/26 23:03:43 INFO Utils: Successfully started service 'sparkDriver' on port 42645.
17/02/26 23:03:43 INFO Slf4jLogger: Slf4jLogger started
17/02/26 23:03:43 INFO Remoting: Starting remoting
17/02/26 23:03:43 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:45629]
17/02/26 23:03:43 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 45629.
17/02/26 23:03:43 INFO SparkEnv: Registering MapOutputTracker
17/02/26 23:03:44 INFO SparkEnv: Registering BlockManagerMaster
17/02/26 23:03:44 INFO DiskBlockManager: Created local directory at /home/hadoop/hadoopTmp/nm-local-dir/usercache/hadoop/appcache/application_1488178815770_0002/blockmgr-5ae09c86-2439-4eab-b195-392baed84a9c
17/02/26 23:03:44 INFO MemoryStore: MemoryStore started with capacity 457.9 MB
17/02/26 23:03:44 INFO SparkEnv: Registering OutputCommitCoordinator
17/02/26 23:03:44 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
17/02/26 23:03:44 INFO Utils: Successfully started service 'SparkUI' on port 38152.
17/02/26 23:03:44 INFO SparkUI: Started SparkUI at http://192.168.136.136:38152
17/02/26 23:03:44 INFO YarnClusterScheduler: Created YarnClusterScheduler
17/02/26 23:03:44 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 46068.
17/02/26 23:03:44 INFO NettyBlockTransferService: Server created on 46068
17/02/26 23:03:44 INFO BlockManagerMaster: Trying to register BlockManager
17/02/26 23:03:44 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.136.136:46068 with 457.9 MB RAM, BlockManagerId(driver, 192.168.136.136, 46068)
17/02/26 23:03:44 INFO BlockManagerMaster: Registered BlockManager
17/02/26 23:03:44 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark://[email protected]:42645)
17/02/26 23:03:44 INFO RMProxy: Connecting to ResourceManager at bigmaster/192.168.136.134:8030
17/02/26 23:03:45 INFO YarnRMClient: Registering the ApplicationMaster
17/02/26 23:03:45 INFO YarnAllocator: Will request 2 executor containers, each with 1 cores and 896 MB memory including 384 MB overhead
17/02/26 23:03:45 INFO YarnAllocator: Container request (host: Any, capability: <memory:896, vCores:1>)
17/02/26 23:03:45 INFO YarnAllocator: Container request (host: Any, capability: <memory:896, vCores:1>)
17/02/26 23:03:45 INFO ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
17/02/26 23:04:14 INFO YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
17/02/26 23:04:14 INFO YarnClusterScheduler: YarnClusterScheduler.postStartHook done
17/02/26 23:04:15 INFO SparkContext: Starting job: reduce at spark1.py:16
17/02/26 23:04:15 INFO DAGScheduler: Got job 0 (reduce at spark1.py:16) with 2 output partitions
17/02/26 23:04:15 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at spark1.py:16)
17/02/26 23:04:15 INFO DAGScheduler: Parents of final stage: List()
17/02/26 23:04:15 INFO DAGScheduler: Missing parents: List()
17/02/26 23:04:15 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[1] at reduce at spark1.py:16), which has no missing parents
17/02/26 23:04:15 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 5.2 KB, free 5.2 KB)
17/02/26 23:04:15 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.2 KB, free 8.4 KB)
17/02/26 23:04:15 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.136.136:46068 (size: 3.2 KB, free: 457.9 MB)
17/02/26 23:04:15 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
17/02/26 23:04:15 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (PythonRDD[1] at reduce at spark1.py:16)
17/02/26 23:04:15 INFO YarnClusterScheduler: Adding task set 0.0 with 2 tasks
17/02/26 23:04:30 WARN YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/02/26 23:04:45 WARN YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/02/26 23:05:00 WARN YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/02/26 23:05:15 WARN YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/02/26 23:05:30 WARN YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/02/26 23:05:45 WARN YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/02/26 23:06:00 WARN YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/02/26 23:06:15 WARN YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Reposted from blog.csdn.net/u010770993/article/details/70312484