YARN Background and Overview (XXX on YARN)
In the Hadoop (MapReduce) 1.x era, the architecture had well-known problems: the JobTracker was a single point of failure, it carried a heavy load and was hard to scale, and as clusters grew and clients multiplied it could no longer keep up. On top of that, when other big-data frameworks (such as Spark) each ran on their own cluster, those clusters could not share resources, overall utilization stayed low (with multiple separate clusters it is hard to keep them all balanced and continuously busy), and operating many clusters became a burden. YARN was born against this background: it lets all frameworks run on a single platform where clusters can share resources and improve utilization.
The architecture diagram is shown below (MapReduce itself will be covered in a later article):
YARN was introduced in Hadoop 2.x as a unified runtime platform (a resource-scheduling platform). Before it, each big-data framework ran only in its own environment; with YARN they all work in the same shop, which makes resource coordination and sharing much simpler, as shown below:
YARN Architecture
First, a diagram of the YARN architecture:
- ResourceManager (RM)
1. Serves the whole cluster (an active/standby pair, but only one serves at any moment); responsible for unified management and scheduling of cluster resources
2. Handles client requests: submitting jobs, killing jobs
3. Monitors the NMs; if an NM goes down, informs the affected AMs which of their tasks were running on it
- NodeManager (NM)
1. A cluster has many NMs; each one manages and allocates the resources of its own node
2. Periodically reports its node's resource usage and health to the RM
3. Executes commands from the RM and from AMs
- ApplicationMaster (AM)
1. One per application (e.g. a Spark or MapReduce job)
2. Manages its application: requests resources from the RM on the application's behalf, assigns them to the application's tasks, and starts or stops those tasks
- Container
1. A container that encapsulates resources such as CPU and memory
2. An abstraction of a task's runtime environment
- Client
Issues job commands (submit, query, kill)
YARN Execution Flow
First, a diagram walking through YARN's end-to-end execution flow. In outline: (1) the client submits an application to the RM; (2) the RM allocates a container on an NM and launches the application's AM in it; (3) the AM registers with the RM and requests containers for its tasks; (4) the NMs launch those task containers; (5) the tasks report progress to the AM, and when the application finishes the AM deregisters from the RM and releases its resources.
Setting Up a YARN Environment
See my earlier post on building a distributed Hadoop cluster on Alibaba Cloud servers; I will publish it once the notes are organized.
Submitting a Job to YARN
Example
The bundled example jar lives at (hadoop_home is your own Hadoop path):
/hadoop_home/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar
Run hadoop jar xx.jar, then follow the prompts and supply arguments to start a computation.
For example, hadoop jar xx.jar pi 2 3 estimates pi; the trailing arguments are nMaps (number of map tasks) and nSamples (samples per map).
As shown below:
After it runs, open Hadoop's job UI: a newly created job moves from RUNNING to FINISHED, or to FAILED if it fails:
If you hit the error Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster:
Run hadoop classpath to print Hadoop's classpath.
Then vim yarn-site.xml and add the configuration below (note that the printed path is colon-separated; remember to replace the colons with commas):
<configuration>
<property>
<name>yarn.application.classpath</name>
<value>paste the Hadoop classpath printed above, with commas as separators</value>
</property>
</configuration>
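The colon-to-comma conversion is easy to get wrong by hand. As a minimal sketch (the sample path below is hypothetical; substitute your own hadoop classpath output), the transformation is just a character replacement:

```java
public class ClasspathToYarnValue {
    // Convert the colon-separated output of `hadoop classpath`
    // into the comma-separated form yarn.application.classpath expects.
    public static String toYarnClasspath(String hadoopClasspath) {
        return hadoopClasspath.trim().replace(':', ',');
    }

    public static void main(String[] args) {
        // Hypothetical sample; on a real node, pass in the output of `hadoop classpath`.
        String sample = "/opt/hadoop/etc/hadoop:/opt/hadoop/share/hadoop/common/*";
        // prints /opt/hadoop/etc/hadoop,/opt/hadoop/share/hadoop/common/*
        System.out.println(toYarnClasspath(sample));
    }
}
```

Equivalently, on the shell you can pipe the output through a character translation before pasting it into yarn-site.xml.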
Run log
Starting Job
2019-05-04 22:34:51,217 INFO client.RMProxy: Connecting to ResourceManager at master/10.151.64.57:8032
2019-05-04 22:34:51,750 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1556980436145_0001
2019-05-04 22:34:52,286 INFO input.FileInputFormat: Total input files to process : 2
2019-05-04 22:34:53,363 INFO mapreduce.JobSubmitter: number of splits:2
2019-05-04 22:34:53,612 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1556980436145_0001
2019-05-04 22:34:53,613 INFO mapreduce.JobSubmitter: Executing with tokens: []
2019-05-04 22:34:53,825 INFO conf.Configuration: resource-types.xml not found
2019-05-04 22:34:53,825 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2019-05-04 22:34:54,093 INFO impl.YarnClientImpl: Submitted application application_1556980436145_0001
2019-05-04 22:34:54,131 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1556980436145_0001/
2019-05-04 22:34:54,132 INFO mapreduce.Job: Running job: job_1556980436145_0001
2019-05-04 22:35:03,428 INFO mapreduce.Job: Job job_1556980436145_0001 running in uber mode : false
2019-05-04 22:35:03,429 INFO mapreduce.Job: map 0% reduce 0%
2019-05-04 22:35:13,775 INFO mapreduce.Job: map 100% reduce 0%
2019-05-04 22:35:20,812 INFO mapreduce.Job: map 100% reduce 100%
2019-05-04 22:35:21,829 INFO mapreduce.Job: Job job_1556980436145_0001 completed successfully
2019-05-04 22:35:21,946 INFO mapreduce.Job: Counters: 53
File System Counters
FILE: Number of bytes read=50
FILE: Number of bytes written=649437
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=522
HDFS: Number of bytes written=215
HDFS: Number of read operations=13
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=15019
Total time spent by all reduces in occupied slots (ms)=4844
Total time spent by all map tasks (ms)=15019
Total time spent by all reduce tasks (ms)=4844
Total vcore-milliseconds taken by all map tasks=15019
Total vcore-milliseconds taken by all reduce tasks=4844
Total megabyte-milliseconds taken by all map tasks=15379456
Total megabyte-milliseconds taken by all reduce tasks=4960256
Map-Reduce Framework
Map input records=2
Map output records=4
Map output bytes=36
Map output materialized bytes=56
Input split bytes=286
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=56
Reduce input records=4
Reduce output records=0
Spilled Records=8
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=394
CPU time spent (ms)=2200
Physical memory (bytes) snapshot=787853312
Virtual memory (bytes) snapshot=8417177600
Total committed heap usage (bytes)=655360000
Peak Map Physical memory (bytes)=294035456
Peak Map Virtual memory (bytes)=2808172544
Peak Reduce Physical memory (bytes)=209944576
Peak Reduce Virtual memory (bytes)=2808176640
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=236
File Output Format Counters
Bytes Written=97
Job Finished in 30.851 seconds
Estimated value of Pi is 4.00000000000000000000
From the output above we can see clearly that we submitted a pi-estimation job that used 2 map tasks (number of splits:2), and the final estimate came out as 4.000000. With only 2 x 3 = 6 sample points, such a crude estimate is expected; larger nMaps and nSamples values converge toward 3.14159.
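To see why a handful of samples can yield exactly 4.0: the example estimates pi by counting how many sampled points in the unit square fall inside the quarter circle, then scaling by 4; if all 6 points happen to land inside, the estimate is 4 x 6/6 = 4.0. The Hadoop job actually uses a quasi-random (Halton) sequence distributed across map tasks, but the estimator is the same idea. A minimal single-machine sketch with plain pseudo-random sampling (not the Hadoop implementation):

```java
import java.util.Random;

public class PiEstimate {
    // Estimate pi by sampling points uniformly in the unit square and
    // counting how many fall inside the quarter circle of radius 1.
    public static double estimate(long numSamples, long seed) {
        Random rng = new Random(seed);
        long inside = 0;
        for (long i = 0; i < numSamples; i++) {
            double x = rng.nextDouble();
            double y = rng.nextDouble();
            if (x * x + y * y <= 1.0) {
                inside++;
            }
        }
        // The quarter circle occupies pi/4 of the unit square,
        // so pi ~= 4 * (points inside) / (total points).
        return 4.0 * inside / numSamples;
    }

    public static void main(String[] args) {
        System.out.println(estimate(1_000_000L, 42L));
    }
}
```

With a million samples the estimate lands close to 3.14; with only 6, values as far off as 4.0 are entirely plausible, which is exactly what the job above produced.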
And with that, a YARN job has run to completion.