Spark Installation 3: Spark Cluster Deployment

We use version 2.3.0, because that is the version running in the company's production environment.

1. Download and Install

cd /opt
wget https://archive.apache.org/dist/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz
tar -xzvf spark-2.3.0-bin-hadoop2.7.tgz
rm -rf spark-2.3.0-bin-hadoop2.7.tgz
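
If you want to verify the download, do it before the rm step above. Apache publishes a SHA-512 checksum next to the release; a minimal sketch (the .sha512 file name follows Apache's usual layout and is an assumption; compare the two digests by eye):

wget https://archive.apache.org/dist/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz.sha512
# print the published digest and the digest of the downloaded file
cat spark-2.3.0-bin-hadoop2.7.tgz.sha512
sha512sum spark-2.3.0-bin-hadoop2.7.tgz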


2. Configuration Files
Compared with Hadoop, Spark has relatively few configuration files and settings. However, Spark has five run modes, and each mode comes with its own configuration and behaviour, so the key to deploying Spark is understanding these five run modes in depth.
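
In 2.3.0 those modes are commonly listed as local, standalone, YARN, Mesos and Kubernetes (Kubernetes support is new in 2.3.0), and the mode is ultimately selected through the --master URL passed to spark-submit. A quick local-mode smoke test can be run straight from the unpacked directory (assuming java is already on the PATH; run-example ships with Spark):

# runs the bundled SparkPi example in local mode, no cluster needed
/opt/spark-2.3.0-bin-hadoop2.7/bin/run-example SparkPi 10

# --master values for the other modes (angle-bracket parts are placeholders):
#   spark://pangu10:7077                 standalone cluster (built in this article)
#   yarn                                 Hadoop YARN
#   mesos://<mesos-master>:5050          Mesos
#   k8s://https://<k8s-apiserver>:6443   Kubernetes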

The configuration files live under $SPARK_HOME/conf; three of them need to be edited.

1. spark-env.sh

cp spark-env.sh.template spark-env.sh
vi spark-env.sh

Add the following:

export JAVA_HOME=/opt/jdk1.8.0_181
export HADOOP_CONF_DIR=/opt/hadoop-2.7.6/etc/hadoop
export YARN_CONF_DIR=/opt/hadoop-2.7.6/etc/hadoop
export SPARK_HOME=/opt/spark-2.3.0-bin-hadoop2.7
export SPARK_MASTER_HOST=pangu10
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=30 -Dspark.history.fs.logDirectory=hdfs://pangu10:9000/spark/log"
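
These variables are read by the daemon scripts in $SPARK_HOME/sbin. As a quick sanity check you can start the master on its own and confirm it binds to pangu10; a sketch, where the exact log file name depends on the user and host:

/opt/spark-2.3.0-bin-hadoop2.7/sbin/start-master.sh
# the log shows the bound address and the web UI port (8080 by default)
tail -n 50 /opt/spark-2.3.0-bin-hadoop2.7/logs/spark-*-org.apache.spark.deploy.master.Master-*.out
# stop it again; the full cluster is started at the end of this article
/opt/spark-2.3.0-bin-hadoop2.7/sbin/stop-master.sh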

2. slaves

cp slaves.template slaves
vi slaves

Add the following:

pangu10
pangu11
pangu12

Note: in YARN mode, once the slaves file has been configured for Hadoop, Spark does not need its own slaves file.
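
For reference, in YARN mode no Spark daemons run on the workers at all; the job is handed to the ResourceManager that HADOOP_CONF_DIR/YARN_CONF_DIR point at. A sketch (the example jar name matches the Scala 2.11 build bundled with this distribution and should be checked under $SPARK_HOME/examples/jars):

spark-submit --master yarn --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.3.0.jar 100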

3. spark-defaults.conf
The History Server lets you inspect how Spark applications executed, based on the event logs configured below.

cp spark-defaults.conf.template spark-defaults.conf 
vi spark-defaults.conf

Add the following:

spark.master spark://pangu10:7077
spark.eventLog.enabled true
spark.eventLog.dir hdfs://pangu10:9000/spark/log
spark.history.fs.logDirectory hdfs://pangu10:9000/spark/log

3. Create the Spark Log Directory on HDFS

The event log directory referenced in the configuration above must exist on HDFS:

hadoop fs -mkdir /spark
hadoop fs -mkdir /spark/log
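
At this point all three configuration files are in place on pangu10. For the standalone cluster the same installation, configuration included, has to exist on the workers as well; a minimal sketch, assuming passwordless SSH and the same /opt layout on every node:

scp -r /opt/spark-2.3.0-bin-hadoop2.7 pangu11:/opt/
scp -r /opt/spark-2.3.0-bin-hadoop2.7 pangu12:/opt/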

4. Environment Variables

Add the following to /etc/profile:

export JAVA_HOME=/opt/jdk1.8.0_181
export SCALA_HOME=/opt/scala-2.12.6
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=/opt/hadoop-2.7.6
export SPARK_HOME=/opt/spark-2.3.0-bin-hadoop2.7

export PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin:$HADOOP_HOME/bin:$SPARK_HOME/bin
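
After sourcing the profile, the standalone cluster can be brought up and tested. A sketch, assuming passwordless SSH from pangu10 to every host in conf/slaves (the start scripts rely on it); check the example jar name under $SPARK_HOME/examples/jars:

source /etc/profile
spark-submit --version        # confirm the 2.3.0 binaries are on the PATH

# start the master on pangu10 plus one worker per host listed in conf/slaves,
# and the History Server configured earlier
$SPARK_HOME/sbin/start-all.sh
$SPARK_HOME/sbin/start-history-server.sh
jps                           # expect Master and Worker processes on pangu10

# run a test job against the cluster; it appears on http://pangu10:8080 while
# running and on the History Server at http://pangu10:18080 after it finishes
spark-submit --class org.apache.spark.examples.SparkPi \
  --master spark://pangu10:7077 \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.3.0.jar 100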


Reposted from www.cnblogs.com/Netsharp/p/9781155.html