Big Data Series (1): Hadoop Ecosystem Basics, Continued: YARN

Copyright notice: this is an original article by the author and may not be reproduced without permission. https://blog.csdn.net/zl592886931/article/details/89792780

YARN: Background and Overview (XXX on YARN)

In the Hadoop (MapReduce) 1.x era, the architecture made the JobTracker a single point of failure and a heavily loaded node that was hard to scale. As clusters grew and clients multiplied, the JobTracker could no longer keep up. In addition, other big-data frameworks (such as Spark) ran on their own dedicated clusters, so resources could not be shared between clusters, overall utilization was low (with many separate clusters it is hard to keep all of them evenly and continuously busy), and operating that many clusters became difficult. Against this background YARN was born: it lets all of these frameworks run on one platform so that they can share resources and utilization improves.
The architecture diagram is shown below (MapReduce will be covered in a later article):
[Figure: Hadoop 1.x (MapReduce) architecture]
YARN was introduced in Hadoop 2.x as a unified operating platform (a resource-scheduling platform). Before it, each big-data framework ran only on its own environment; with YARN they all work in the same "workshop", which makes resource coordination and sharing much simpler, as shown below:
[Figure: multiple frameworks sharing resources on YARN]

YARN Architecture

Let's start with a YARN architecture diagram:
[Figure: YARN architecture]

  • ResourceManager (RM):
    1. Serves the entire cluster (one active and one standby, but only one serves at any given time) and is responsible for unified management and scheduling of cluster resources
    2. Handles client requests: submitting jobs, killing jobs
    3. Monitors the NMs; if an NM goes down, it tells the affected AMs which of their tasks were running on that NM
  • NodeManager (NM):
    1. A cluster has many NMs; each manages the resources of its own node and how they are used
    2. Periodically reports its node's resource usage and health to the RM
    3. Handles commands from the RM and the AMs
  • ApplicationMaster (AM):
    1. One per application (e.g., a Spark application or an MR job)
    2. Manages the application: requests resources from the RM on its behalf, assigns them to the application's internal tasks, and starts or stops those tasks
  • Container:
    1. A container that encapsulates resources such as CPU and memory
    2. An abstraction of the runtime environment in which a task runs
  • Client
    Issues job commands (submit, view status, kill); see the command sketch after this list
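
To make these roles concrete, the standard YARN CLI covers most of the client-side operations. A minimal sketch follows; the application id is only a placeholder (it is the one that appears in the job log further down), and fetching logs assumes log aggregation is enabled.

# List the NodeManagers registered with the ResourceManager
yarn node -list

# List applications known to the ResourceManager
yarn application -list

# Check the state of one application (the id is a placeholder)
yarn application -status application_1556980436145_0001

# Kill a running application
yarn application -kill application_1556980436145_0001

# Fetch aggregated container logs after the application finishes
# (requires log aggregation to be enabled)
yarn logs -applicationId application_1556980436145_0001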

YARN Execution Flow

Let's start with a diagram of YARN's end-to-end execution flow. In brief: the client submits an application to the RM; the RM allocates a container on an NM and launches the application's AM in it; the AM registers with the RM and requests further containers for its tasks; the NMs launch those containers and run the tasks; when all tasks are done, the AM deregisters from the RM and the resources are released.
[Figure: YARN execution flow]
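
If you prefer to watch these state transitions from the shell instead of the web UI, a small polling loop like the one below works; the application id is a placeholder, and the exact field names printed by -status may vary slightly between Hadoop versions.

# Poll the application's state every 2 seconds
# (expect NEW/ACCEPTED -> RUNNING -> FINISHED)
while true; do
    yarn application -status application_1556980436145_0001 | grep -i 'state'
    sleep 2
done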

Setting Up a YARN Environment

For setting up a distributed Hadoop environment, see the author's post on building a Hadoop cluster on Alibaba Cloud servers; it will be published once the material has been organized.

Submitting a Job to Run on YARN

Example

The bundled example jar is located at (hadoop_home is your own Hadoop installation path):

/hadoop_home/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar

Run hadoop jar xx.jar, then follow the prompts to supply the parameters and start the computation.

For example, hadoop jar xx.jar pi 2 3 (estimates pi; the trailing parameters are nMaps and nSamples).
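
Written out with the jar path given above (assuming the default directory layout under hadoop_home), the commands look like this:

# Run without arguments to list the bundled example programs (pi, wordcount, grep, ...)
hadoop jar /hadoop_home/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar

# Estimate pi with 2 map tasks and 3 samples per map
hadoop jar /hadoop_home/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar pi 2 3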

As shown below:
[Figure: console output of the pi example job]
After it runs, open the Hadoop application web UI: a newly created application goes from NEW to RUNNING to FINISHED, or to FAILED if it fails:
[Figure: YARN web UI showing the application's state]

If you encounter the following error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster

Run hadoop classpath to print Hadoop's classpath, then open yarn-site.xml (vim yarn-site.xml) and add the configuration below (note that the printed path is colon-separated; remember to change the colons to commas):

<configuration>
    <property>
        <name>yarn.application.classpath</name>
        <value>paste the Hadoop classpath printed above, with the entries separated by commas</value>
    </property>
</configuration>
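
A quick way to produce the comma-separated value is to pipe the output of hadoop classpath through tr, which simply replaces every colon with a comma:

# Print Hadoop's classpath with commas instead of colons,
# ready to paste into yarn.application.classpath
hadoop classpath | tr ':' ','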

Job log

Starting Job
2019-05-04 22:34:51,217 INFO client.RMProxy: Connecting to ResourceManager at master/10.151.64.57:8032
2019-05-04 22:34:51,750 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1556980436145_0001
2019-05-04 22:34:52,286 INFO input.FileInputFormat: Total input files to process : 2
2019-05-04 22:34:53,363 INFO mapreduce.JobSubmitter: number of splits:2
2019-05-04 22:34:53,612 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1556980436145_0001
2019-05-04 22:34:53,613 INFO mapreduce.JobSubmitter: Executing with tokens: []
2019-05-04 22:34:53,825 INFO conf.Configuration: resource-types.xml not found
2019-05-04 22:34:53,825 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2019-05-04 22:34:54,093 INFO impl.YarnClientImpl: Submitted application application_1556980436145_0001
2019-05-04 22:34:54,131 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1556980436145_0001/
2019-05-04 22:34:54,132 INFO mapreduce.Job: Running job: job_1556980436145_0001
2019-05-04 22:35:03,428 INFO mapreduce.Job: Job job_1556980436145_0001 running in uber mode : false
2019-05-04 22:35:03,429 INFO mapreduce.Job:  map 0% reduce 0%
2019-05-04 22:35:13,775 INFO mapreduce.Job:  map 100% reduce 0%
2019-05-04 22:35:20,812 INFO mapreduce.Job:  map 100% reduce 100%
2019-05-04 22:35:21,829 INFO mapreduce.Job: Job job_1556980436145_0001 completed successfully
2019-05-04 22:35:21,946 INFO mapreduce.Job: Counters: 53
	File System Counters
		FILE: Number of bytes read=50
		FILE: Number of bytes written=649437
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=522
		HDFS: Number of bytes written=215
		HDFS: Number of read operations=13
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=3
	Job Counters 
		Launched map tasks=2
		Launched reduce tasks=1
		Data-local map tasks=2
		Total time spent by all maps in occupied slots (ms)=15019
		Total time spent by all reduces in occupied slots (ms)=4844
		Total time spent by all map tasks (ms)=15019
		Total time spent by all reduce tasks (ms)=4844
		Total vcore-milliseconds taken by all map tasks=15019
		Total vcore-milliseconds taken by all reduce tasks=4844
		Total megabyte-milliseconds taken by all map tasks=15379456
		Total megabyte-milliseconds taken by all reduce tasks=4960256
	Map-Reduce Framework
		Map input records=2
		Map output records=4
		Map output bytes=36
		Map output materialized bytes=56
		Input split bytes=286
		Combine input records=0
		Combine output records=0
		Reduce input groups=2
		Reduce shuffle bytes=56
		Reduce input records=4
		Reduce output records=0
		Spilled Records=8
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=394
		CPU time spent (ms)=2200
		Physical memory (bytes) snapshot=787853312
		Virtual memory (bytes) snapshot=8417177600
		Total committed heap usage (bytes)=655360000
		Peak Map Physical memory (bytes)=294035456
		Peak Map Virtual memory (bytes)=2808172544
		Peak Reduce Physical memory (bytes)=209944576
		Peak Reduce Virtual memory (bytes)=2808176640
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=236
	File Output Format Counters 
		Bytes Written=97
Job Finished in 30.851 seconds
Estimated value of Pi is 4.00000000000000000000

From the output above we can clearly see that the job we submitted to estimate pi used 2 map tasks (number of splits: 2) and produced a final estimate of 4.0000...
With that, a YARN job has run from submission to completion.
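
As a side note, the pi example estimates the value by sampling points, so with only 2 maps and 3 samples per map the result (4.0) is very coarse. Increasing the parameters gives a closer estimate at the cost of more map tasks and a longer run, for example:

# More maps and more samples per map give a better estimate of pi
hadoop jar /hadoop_home/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar pi 10 1000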
