Spark Cluster Setup (Spark on YARN Mode)

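Setting up Spark on YARN is straightforward: Spark only needs to be installed on one node of the YARN cluster.

That node acts as the client that submits Spark applications to the YARN cluster. Spark's own Master and Worker daemons do not need to be started, because a Spark on YARN deployment does not depend on a Standalone cluster.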

I: Download the Scala installation package

Download URL:
https://www.scala-lang.org/download/2.12.7.html


Run the following commands to install Scala. First create the installation directory:

mkdir -p /home/hadoop/scala

Extract the scala-2.12.7.tgz package into the scala installation directory:

tar -zxvf  ~/tools/scala-2.12.7.tgz  -C  /home/hadoop/scala/

Configure the Scala environment variables by adding the following lines to ~/.bash_profile:

vi ~/.bash_profile

# scala

export SCALA_HOME=/home/hadoop/scala/scala-2.12.7

export PATH=$PATH:$SCALA_HOME/bin


Save the file and reload it:

source ~/.bash_profile
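Editing ~/.bash_profile by hand works, but repeating the step leaves duplicate export lines. Below is a minimal sketch of an idempotent alternative. The function name add_scala_env is hypothetical (not part of Scala or of the original steps), and it takes the profile path as an argument so it can be tried on a scratch file first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the Scala exports to a profile file only once.
# Usage: add_scala_env <profile-file> <scala-home>
add_scala_env() {
  local profile="$1" scala_home="$2"
  touch "$profile"
  # Do nothing if a SCALA_HOME export is already present.
  if grep -q '^export SCALA_HOME=' "$profile"; then
    return 0
  fi
  {
    echo '# scala'
    echo "export SCALA_HOME=$scala_home"
    # Single quotes keep $PATH and $SCALA_HOME literal in the file.
    echo 'export PATH=$PATH:$SCALA_HOME/bin'
  } >> "$profile"
}
```

For this guide the call would be add_scala_env ~/.bash_profile /home/hadoop/scala/scala-2.12.7, followed by source ~/.bash_profile.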

From any directory, run: scala -version

Scala code runner version 2.12.7 -- Copyright 2002-2018, LAMP/EPFL and Lightbend, Inc.

Run scala from any directory to enter the interactive shell (REPL):

scala> val str:String="yanghong"

str: String = yanghong

 

II: Download Spark

Download URL:

http://spark.apache.org/downloads.html

Under "Choose a Spark release", select the version you want; this guide uses 2.4.5.

Under "Choose a package type", select the package pre-built for Apache Hadoop 2.7 and later.

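Click the .tgz link next to "Download Spark" to download the package. Note that this pre-built package bundles its own Scala runtime (Scala 2.11 for Spark 2.4.5, hence the _2.11 suffix of the examples jar used later), so spark-submit does not rely on the Scala 2.12.7 installed above.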


Run the following commands to install Spark. First create the installation directory:

mkdir -p /home/hadoop/spark

Extract spark-2.4.5-bin-hadoop2.7.tgz into the ~/spark installation directory:

tar -zxvf spark-2.4.5-bin-hadoop2.7.tgz -C /home/hadoop/spark/

 

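Edit spark-env.sh in the conf directory of the Spark installation (if it does not exist yet, copy it from conf/spark-env.sh.template) and add the following settings: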

export HADOOP_HOME=/home/hadoop/hadoop-ha/hadoop/hadoop-2.8.5

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
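Spark ships only conf/spark-env.sh.template, so spark-env.sh usually has to be created before it can be edited. The following is a minimal sketch that creates it and appends the two exports once. The function name configure_spark_env is hypothetical, and the Spark installation directory is passed explicitly as an assumption, since the steps above never set SPARK_HOME.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create conf/spark-env.sh from the bundled template
# (if needed) and add the Hadoop settings used by Spark on YARN.
# Usage: configure_spark_env <spark-install-dir> <hadoop-home>
configure_spark_env() {
  local spark_home="$1" hadoop_home="$2"
  local env_file="$spark_home/conf/spark-env.sh"
  # Create spark-env.sh from the template on the first run.
  if [ ! -f "$env_file" ]; then
    cp "$spark_home/conf/spark-env.sh.template" "$env_file" || return 1
  fi
  # Append the settings only if they are not already present.
  if ! grep -q '^export HADOOP_CONF_DIR=' "$env_file"; then
    {
      echo "export HADOOP_HOME=$hadoop_home"
      # Single quotes keep $HADOOP_HOME literal in the file.
      echo 'export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop'
    } >> "$env_file"
  fi
}
```

With the paths in this guide: configure_spark_env /home/hadoop/spark/spark-2.4.5-bin-hadoop2.7 /home/hadoop/hadoop-ha/hadoop/hadoop-2.8.5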

 

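With this configuration in place, Spark applications can be run. For example, to run the Pi estimation example (SparkPi) that ships with Spark in Spark on YARN cluster mode (Hadoop HDFS and YARN must already be running), run the following from the Spark installation directory /home/hadoop/spark/spark-2.4.5-bin-hadoop2.7: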

bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
/home/hadoop/spark/spark-2.4.5-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4.5.jar
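With --master yarn, spark-submit finds the ResourceManager through the client-side Hadoop configuration in HADOOP_CONF_DIR. The following is a minimal pre-flight sketch that checks the configuration directory contains the expected files before submitting. The function name check_hadoop_conf is hypothetical, and it only checks that the files exist, not that HDFS and YARN are actually running.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: confirm a Hadoop configuration directory
# holds the client-side files Spark on YARN reads.
# Usage: check_hadoop_conf <hadoop-conf-dir>
check_hadoop_conf() {
  local conf_dir="$1"
  local f
  for f in core-site.xml hdfs-site.xml yarn-site.xml; do
    if [ ! -f "$conf_dir/$f" ]; then
      echo "missing: $conf_dir/$f" >&2
      return 1
    fi
  done
  echo "ok: $conf_dir"
}
```

In this guide the directory is /home/hadoop/hadoop-ha/hadoop/hadoop-2.8.5/etc/hadoop, so check_hadoop_conf can be chained with && in front of the spark-submit command. Whether the HDFS and YARN daemons are up is a separate check, for example with jps on each node.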

The log output is as follows:

20/04/08 14:05:36 INFO yarn.Client: Submitting application application_1586223718291_0006 to ResourceManager

20/04/08 14:05:37 INFO impl.YarnClientImpl: Submitted application application_1586223718291_0006

20/04/08 14:05:38 INFO yarn.Client: Application report for application_1586223718291_0006 (state: ACCEPTED)

20/04/08 14:05:38 INFO yarn.Client:

 client token: N/A

 diagnostics: AM container is launched, waiting for AM container to Register with RM

 ApplicationMaster host: N/A

 ApplicationMaster RPC port: -1

 queue: default

 start time: 1586325937194

 final status: UNDEFINED

 tracking URL: http://centoshadoop1:8088/proxy/application_1586223718291_0006/

 user: hadoop

(state: ACCEPTED)

20/04/08 14:05:58 INFO yarn.Client: Application report for application_1586223718291_0006 (state: ACCEPTED)

20/04/08 14:05:59 INFO yarn.Client: Application report for application_1586223718291_0006

20/04/08 14:06:05 INFO yarn.Client:

 client token: N/A

 diagnostics: N/A

 ApplicationMaster host: centoshadoop3

 ApplicationMaster RPC port: 33564

 queue: default

 start time: 1586325937194

 final status: UNDEFINED

 tracking URL: http://centoshadoop1:8088/proxy/application_1586223718291_0006/

 user: hadoop

20/04/08 14:06:06 INFO yarn.Client: Application report for application_1586223718291_0006 (state: RUNNING)

20/04/08 14:06:07 INFO yarn.Client: Application report for application_1586223718291_0006 (state: RUNNING)

20/04/08 14:06:26 INFO yarn.Client: Application report for application_1586223718291_0006 (state: RUNNING)

20/04/08 14:06:27 INFO yarn.Client: Application report for application_1586223718291_0006 (state: FINISHED)

20/04/08 14:06:27 INFO yarn.Client:

 client token: N/A

 diagnostics: N/A

 ApplicationMaster host: centoshadoop3

 ApplicationMaster RPC port: 33564

 queue: default

 start time: 1586325937194

 final status: SUCCEEDED

 tracking URL: http://centoshadoop1:8088/proxy/application_1586223718291_0006/

 user: hadoop

20/04/08 14:06:27 INFO util.ShutdownHookManager: Shutdown hook called

20/04/08 14:06:27 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-299a795c-6432-47b3-86ec-c571ed324c58

20/04/08 14:06:27 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-6b89e823-ba7f-49b2-ae7b-2ead7a75691e

 

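Note how final status changes: it starts as UNDEFINED and ends as SUCCEEDED, which means the application completed successfully. Details of the run can be viewed in the web UI at the tracking URL printed in the log.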
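In cluster mode the driver runs inside the ApplicationMaster container on the cluster, so the "Pi is roughly ..." line that SparkPi prints does not appear in the client log above; it goes to the driver container's stdout. The following is a minimal sketch, assuming the client output was saved with bin/spark-submit ... 2>&1 | tee submit.log (spark-submit writes its log to stderr by default), that pulls the application ID and the last final status out of that saved log. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that read a saved spark-submit client log on stdin.

# Print the first YARN application ID found, e.g. application_1586223718291_0006.
app_id_from_log() {
  grep -o 'application_[0-9][0-9]*_[0-9][0-9]*' | head -n 1
}

# Print the value of the last "final status:" line (UNDEFINED, SUCCEEDED, ...).
final_status_from_log() {
  grep 'final status:' | tail -n 1 | awk '{print $NF}'
}
```

For example, app_id=$(app_id_from_log < submit.log) followed by yarn logs -applicationId "$app_id" | grep 'Pi is roughly' should show the driver's result, assuming YARN log aggregation is enabled on the cluster.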


Reposted from blog.csdn.net/u014635374/article/details/105386795