http://spark.apache.org/
1. Download the package and upload it to the server
spark-2.2.0-bin-hadoop2.7.tgz
Extract it: tar -zxvf spark-2.2.0-bin-hadoop2.7.tgz
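The extraction step can be tried anywhere on a small stand-in archive; the paths below are throwaway examples, not the real install location:

```shell
# Build a tiny stand-in .tgz (the real file is spark-2.2.0-bin-hadoop2.7.tgz)
mkdir -p /tmp/spark-demo/pkg/spark-2.2.0-bin-hadoop2.7
echo "demo" > /tmp/spark-demo/pkg/spark-2.2.0-bin-hadoop2.7/RELEASE
tar -zcf /tmp/spark-demo/spark.tgz -C /tmp/spark-demo/pkg spark-2.2.0-bin-hadoop2.7

# Extract it: -z gzip, -x extract, -v verbose, -f archive file
mkdir -p /tmp/spark-demo/out
tar -zxvf /tmp/spark-demo/spark.tgz -C /tmp/spark-demo/out
```

On the real server the same `tar -zxvf` command is run in the directory where the uploaded .tgz sits.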
2. Prepare four machines
bigdata01, bigdata02, bigdata03, bigdata04
Master: bigdata01, bigdata02 (only bigdata01 is configured as Master in this walkthrough)
Worker: bigdata01, bigdata02, bigdata03, bigdata04
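All four hostnames must resolve on every node. A sketch of the /etc/hosts entries; the IP addresses here are placeholders for illustration (only 192.168.111.x appears elsewhere in these notes):

```
# /etc/hosts on every node -- IPs are illustrative, use your own
192.168.111.101   bigdata01
192.168.111.102   bigdata02
192.168.111.103   bigdata03
192.168.111.104   bigdata04
```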
3. Edit the configuration files
They live in /root/training/spark-2.2.0-bin-hadoop2.7/conf
3.1 Edit spark-env.sh (basic daemon settings)
mv spark-env.sh.template spark-env.sh
vim spark-env.sh
We use standalone mode; the settings below go under the template section headed "Options for the daemons used in the standalone deploy mode".
export JAVA_HOME=/root/training/jdk1.8.0_144 (in vim, the command :r!which java can insert the java path)
export SPARK_MASTER_HOST=bigdata01
export SPARK_MASTER_PORT=7077
3.2 Edit slaves -- the nodes that actually run tasks (Workers)
mv slaves.template slaves
vim slaves
bigdata01
bigdata02
bigdata03
bigdata04
3.3 Copy the directory to the other machines
for i in {2..4};
do scp -r /root/training/spark-2.2.0-bin-hadoop2.7/ bigdata0$i:$PWD ;
done
The same loop as a one-liner:
for i in {2..4};do scp -r /root/training/spark-2.2.0-bin-hadoop2.7/ bigdata0$i:$PWD ; done
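Before copying, the loop can be dry-run by swapping scp for echo, which confirms the target hosts expand as expected without needing the cluster:

```shell
# Dry run of the copy loop: print each scp command instead of running it
for i in {2..4}; do
  echo "scp -r /root/training/spark-2.2.0-bin-hadoop2.7/ bigdata0$i:\$PWD"
done
```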
4. Start the cluster. For real deployments it is better to use the individual scripts (start-master.sh and start-slave.sh); since this is just a simple setup, we start everything with start-all.sh.
If passwordless SSH login is not configured yet, set it up first; otherwise you will be prompted for a password for every node being started.
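Passwordless login can be set up roughly as follows; the ssh-copy-id lines are echoed here as a dry run, since they need the live hosts:

```shell
# One-time key generation (run once on the machine that starts the
# cluster, accepting the defaults):
#   ssh-keygen -t rsa
# Then push the public key to every node. echo = dry run; remove the
# echo to actually execute the copy.
for i in {1..4}; do
  echo "ssh-copy-id root@bigdata0$i"
done
```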
cd /root/training/spark-2.2.0-bin-hadoop2.7
sbin/start-all.sh
jps
Only bigdata01 runs both a Master and a Worker; every other machine runs only a Worker.
5. View the Spark cluster in a browser
http://bigdata01:8080/ (netty)
URL: spark://bigdata01:7077
REST URL: spark://bigdata01:6066 (cluster mode)
Alive Workers: 4
Cores in use: 4 Total, 0 Used (the number of cores/threads across the workers)
Memory in use: 4.0 GB Total, 0.0 B Used
Applications: 0 Running, 0 Completed
Drivers: 0 Running, 0 Completed
Status: ALIVE
192.168.111.103:44524
44524 is the port this Worker uses to communicate with the Master
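A quick way to check the ports listed above from another machine, using bash's built-in /dev/tcp; the hostname and ports are the ones from this setup:

```shell
# Probe the master's ports: 8080 (web UI), 7077 (master RPC),
# 6066 (REST, cluster mode). "closed or unreachable" is also what
# you will see if bigdata01 does not resolve from this machine.
for p in 8080 7077 6066; do
  if timeout 2 bash -c "echo > /dev/tcp/bigdata01/$p" 2>/dev/null; then
    echo "port $p: open"
  else
    echo "port $p: closed or unreachable"
  fi
done
```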