Installing Spark 2.4.0 on a Hadoop 2.9.0 Cluster

Copyright notice: this is an original article by the blogger, released under the CC 4.0 BY-SA license. Please include the original source link and this notice when reposting.
Original link: https://blog.csdn.net/daoxu_hjl/article/details/86437162

(Repost) Reference: https://www.cnblogs.com/NextNight/p/6703362.html --> a nice writeup

Existing environment:

A Hadoop 2.9.0 cluster is already installed (see earlier posts on this blog for the installation process).

Operations on the master node:

1 Install Scala

1.1 Download the package

Note: Starting version 2.0, Spark is built with Scala 2.11 by default.
Scala 2.10 users should download the Spark source package and build
with Scala 2.10 support.

Spark has specific Scala version requirements: for Spark 2.4.0, downloading Scala 2.11 is sufficient.

Download page: https://www.scala-lang.org/download/2.11.12.html
Choose the tgz archive near the bottom of the page.
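
If you prefer to fetch the archive from the command line, a wget along these lines should work (the exact URL is an assumption based on the download page; confirm it there first):
cd /opt/nfs_share/software
wget https://downloads.lightbend.com/scala/2.11.12/scala-2.11.12.tgz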


1.2 Upload and install

Upload to: /opt/nfs_share/software
Create the scala directory: mkdir -p /opt/scala
Extract:
cd /opt/scala
tar -zxvf /opt/nfs_share/software/scala-2.11.12.tgz

1.3 Configure environment variables

vim ~/.bash_profile
# append at the end of the file
#scala environment
export SCALA_HOME=/opt/scala/scala-2.11.12
export PATH=$PATH:$SCALA_HOME/bin

Apply immediately: source ~/.bash_profile
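
A quick check that the new variables are visible in the current shell:
echo $SCALA_HOME
which scala    # should resolve to /opt/scala/scala-2.11.12/bin/scala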

1.4 Verify

scala -version
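
If everything is in place this prints the Scala version; for 2.11.12 the output is roughly:
Scala code runner version 2.11.12 -- Copyright 2002-2017, LAMP/EPFL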


2 Install Spark

2.1 Download the package

Version requirements: http://spark.apache.org/docs/latest/index.html
Spark runs on Java 8+, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark 2.4.0
uses Scala 2.11. You will need to use a compatible Scala version
(2.11.x).

Note that support for Java 7, Python 2.6 and old Hadoop versions before 2.6.5 were removed as of Spark 2.2.0.
Support for Scala 2.10 was removed as of 2.3.0.
Official download page: http://spark.apache.org/downloads.html

Mirror: https://mirrors.tuna.tsinghua.edu.cn/apache/spark/

Download: spark-2.4.0-bin-without-hadoop.tgz --> the "without hadoop" build is compatible with any Hadoop version (Hadoop is supplied separately via the classpath).
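
To download it directly on the server, a wget against the mirror along these lines should work (the spark-2.4.0/ path segment is an assumption about the mirror layout; older releases get moved to the archive, so browse the mirror to confirm):
cd /opt/nfs_share/software
wget https://mirrors.tuna.tsinghua.edu.cn/apache/spark/spark-2.4.0/spark-2.4.0-bin-without-hadoop.tgz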

2.2 Upload and install

Upload to: /opt/nfs_share/software
Create the install directory: mkdir -p /opt/spark
Extract:
cd /opt/spark
tar -zxvf /opt/nfs_share/software/spark-2.4.0-bin-without-hadoop.tgz

2.3 Configure environment variables

vim ~/.bash_profile
# append:
#spark environment 
export SPARK_HOME=/opt/spark/spark-2.4.0-bin-without-hadoop
export PATH=$PATH:$SPARK_HOME/bin

Because some script names under $SPARK_HOME/sbin (e.g. start-all.sh, stop-all.sh) clash with Hadoop's scripts of the same names, $SPARK_HOME/sbin is deliberately not added to PATH here.

source ~/.bash_profile
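
Since $SPARK_HOME/sbin stays off the PATH, the cluster management scripts are called with their full path instead, e.g.:
$SPARK_HOME/sbin/start-all.sh    # Spark's start-all.sh, not Hadoop's
as done in section 4 below.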

2.4 Edit the Spark configuration files

2.4.1 spark-env.sh:

This file sets the environment Spark runs tasks in; adjust the values to match your own machines. When configuring memory and core counts, take care not to exceed what the VMs actually have, and pay particular attention to options that have default values: review and change them as needed.
cd $SPARK_HOME/conf
cp spark-env.sh.template spark-env.sh
vim spark-env.sh
export SPARK_DIST_CLASSPATH=$(/opt/hadoop/hadoop-2.9.0/bin/hadoop classpath)
#----config
SPARK_LOCAL_DIRS=/opt/spark/spark-2.4.0-bin-without-hadoop/local # Spark scratch ("local") directory
SPARK_MASTER_IP=hdp-01 # master node IP or hostname
SPARK_MASTER_WEBUI_PORT=8085 # master web UI port
#export SPARK_MASTER_OPTS="-Dspark.deploy.defaultCores=4" # default cores for applications such as spark-shell
SPARK_WORKER_CORES=1 # CPU cores per worker
SPARK_WORKER_MEMORY=512m # memory per worker
SPARK_WORKER_DIR=/opt/spark/spark-2.4.0-bin-without-hadoop/worker # worker work directory
SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.appDataTtl=604800" # enable automatic worker cleanup and set the app-data TTL
SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=hdfs://hdp-01:9000/spark/history" # history server UI port, number of retained applications, and the HDFS event-log directory
SPARK_LOG_DIR=/opt/spark/spark-2.4.0-bin-without-hadoop/logs # Spark log directory
JAVA_HOME=/opt/java/jdk1.8.0_191 # Java installation path
SCALA_HOME=/opt/scala/scala-2.11.12 # Scala installation path
HADOOP_HOME=/opt/hadoop/hadoop-2.9.0/lib/native # Hadoop lib path
HADOOP_CONF_DIR=/opt/hadoop/hadoop-2.9.0/etc/hadoop/ # Hadoop configuration directory
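
SPARK_DIST_CLASSPATH is what lets the "without hadoop" build find the Hadoop jars. You can inspect what it expands to by running the same command by hand; it should print a long colon-separated list of Hadoop directories and jars:
/opt/hadoop/hadoop-2.9.0/bin/hadoop classpath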

2.4.2 spark-defaults.conf

cp spark-defaults.conf.template spark-defaults.conf
vim spark-defaults.conf
# settings
spark.master                     spark://hdp-01:7077
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://hdp-01:9000/spark/history
spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.driver.memory              524m
spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
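
spark.eventLog.dir (and spark.history.fs.logDirectory in spark-env.sh above) point at an HDFS directory that Spark does not create by itself; with HDFS already running, create it first:
hdfs dfs -mkdir -p /spark/history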

2.4.3 slaves

Configure the worker nodes:

cp slaves.template slaves
vim slaves
# worker hosts
hdp-01
hdp-03
hdp-04
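
start-all.sh reaches every host listed in slaves over SSH, so the passwordless SSH already set up for the Hadoop cluster must cover these hosts as well. A quick check from the master (each command should print the remote hostname without asking for a password):
ssh hadoop@hdp-03 hostname
ssh hadoop@hdp-04 hostname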

3 Distribute the files to the other nodes

The scala directory:

scp -r /opt/scala/ hadoop@hdp-03:/opt/
scp -r /opt/scala/ hadoop@hdp-04:/opt/

The spark directory:

scp -r /opt/spark/ hadoop@hdp-03:/opt/
scp -r /opt/spark/ hadoop@hdp-04:/opt/

Distribute the environment variable file:

scp ~/.bash_profile hadoop@hdp-03:~/.bash_profile
scp ~/.bash_profile hadoop@hdp-04:~/.bash_profile 

Log in to each node in turn and reload the profile:

ssh hdp-03
source ~/.bash_profile
exit
ssh hdp-04
source ~/.bash_profile
exit
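
To confirm the copies and the profile landed correctly, the checks can also be run remotely (sourcing the profile explicitly, because a non-interactive ssh session does not read ~/.bash_profile):
ssh hdp-03 'source ~/.bash_profile; scala -version; ls /opt/spark'
ssh hdp-04 'source ~/.bash_profile; scala -version; ls /opt/spark'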

4 Start the services

Start:

cd $SPARK_HOME
sbin/start-all.sh 

Check the services

Check in the browser:
Visit 192.168.1.126:8085 (8085 is the SPARK_MASTER_WEBUI_PORT set in spark-env.sh and can be changed). If the master page loads and lists the workers, the cluster started successfully.
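
Besides the web UI, jps gives a quick process-level check: hdp-01 should show a Master (plus a Worker, since hdp-01 is also listed in slaves), and hdp-03/hdp-04 should each show a Worker:
jps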

Run the bundled SparkPi example to compute an approximation of pi:
./bin/run-example SparkPi 2>&1 | grep "Pi is roughly"
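
Note that start-all.sh only starts the master and the workers. The history server configured via SPARK_HISTORY_OPTS above has to be started separately (from $SPARK_HOME), after which its UI is available on port 18080:
sbin/start-history-server.sh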
