Installing Spark 2.4.3 with Docker

Preliminary notes

Before installing HBase, Hadoop was installed, because HBase stores its data in HDFS.
Spark is also related to Hadoop, but note that Spark only uses Hadoop's libraries; it does not depend on a running Hadoop deployment, so there is no need to install Hadoop. Spark itself only requires a JDK.
Spark has four cluster modes: standalone, Mesos, YARN, and Kubernetes.
Given the data volume here, the simplest mode, standalone, is used.

Download

https://www.apache.org/dyn/closer.lua/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz
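
To fetch it from the command line instead of the browser, a sketch using wget; the archive.apache.org URL is an assumption (older releases are moved off the regular mirrors, so the closer.lua link above may redirect there):

# download and unpack the pre-built Spark 2.4.3 for Hadoop 2.7
wget https://archive.apache.org/dist/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz
tar -xzf spark-2.4.3-bin-hadoop2.7.tgz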

Docker base image

FROM ubuntu:16.04
# use a local apt mirror and set the container timezone to Asia/Shanghai
COPY sources.list /etc/apt/
RUN apt update
RUN apt install -y vim tzdata
RUN rm /etc/localtime && ln -s /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
ENV TZ="Asia/Shanghai"

# bundle the JDK; Spark only needs Java, not a Hadoop installation
WORKDIR /
COPY jdk1.8.0_171 /jdk1.8.0_171
ENV JAVA_HOME=/jdk1.8.0_171
ENV PATH=$PATH:/jdk1.8.0_171/bin
RUN ln -s /jdk1.8.0_171/bin/java /usr/bin/java
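
The sources.list copied into the image is not shown in the post; a minimal sketch, assuming the Aliyun mirror for Ubuntu 16.04 (xenial):

# /etc/apt/sources.list pointing at a domestic mirror (assumption; any xenial mirror works)
deb http://mirrors.aliyun.com/ubuntu/ xenial main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ xenial-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ xenial-security main restricted universe multiverse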

Install Spark

# same Dockerfile, continued: copy the extracted Spark distribution into the image
WORKDIR /spark
COPY spark-2.4.3-bin-hadoop2.7 .
ENV SPARK_HOME=/spark
ENV PATH=$PATH:/spark/bin
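
The two Dockerfile fragments above belong to one Dockerfile. With jdk1.8.0_171, spark-2.4.3-bin-hadoop2.7 and sources.list next to it in the build context, the image used in the run commands below can be built roughly like this (tag taken from those commands):

# build the image referenced later as sjfxspark:v1
docker build -t sjfxspark:v1 .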

Configure the Spark ports

mkdir -p /home/mo/sjfx-spark-data
cp -r spark-2.4.3-bin-hadoop2.7/conf /home/mo/sjfx-spark-data/config
mv /home/mo/sjfx-spark-data/config/spark-env.sh.template /home/mo/sjfx-spark-data/config/spark-env.sh

Edit spark-env.sh and add:

export SPARK_MASTER_PORT=5030
export SPARK_MASTER_WEBUI_PORT=5040
export SPARK_WORKER_PORT=5031
export SPARK_WORKER_WEBUI_PORT=5041
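
Since the containers will run with --net=host, on a host with more than one network interface it can also help to pin the master's bind address; this line is an optional addition, not part of the original config:

# optional: bind the master explicitly to the host IP used below
export SPARK_MASTER_HOST=192.168.1.26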

Start the master

#!/bin/sh
# remove any previous master container, then start a fresh one with host networking
docker stop sjfxspark-master
docker rm sjfxspark-master
docker run -d --name sjfxspark-master --net=host \
  -v /home/mo/sjfx-spark-data/config:/spark/conf  \
  -v /home/mo/sjfx-spark-data/logs:/spark/logs  \
  -v /home/mo/sjfx-spark-data/work:/spark/work  \
  sjfxspark:v1 sh -c "/spark/sbin/start-master.sh && tail -f /dev/null"

Check whether the master web UI is up: http://192.168.1.26:5040
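
If the UI is not reachable, a quick check from the host, assuming the container name and volume mounts above (the exact log file name may differ):

docker logs sjfxspark-master                             # prints where start-master.sh wrote its log
tail -n 50 /home/mo/sjfx-spark-data/logs/*Master*.out    # master log, mounted from /spark/logs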

Start the slave

#!/bin/sh
# remove any previous worker container, then start it and register it with the master
docker stop sjfxspark-slave
docker rm sjfxspark-slave
docker run -d --name sjfxspark-slave --net=host \
  -v /home/mo/sjfx-spark-data/config:/spark/conf  \
  -v /home/mo/sjfx-spark-data/logs:/spark/logs  \
  -v /home/mo/sjfx-spark-data/work:/spark/work  \
  sjfxspark:v1 sh -c "/spark/sbin/start-slave.sh spark://192.168.1.26:5030 && tail -f /dev/null"

Check the worker web UI: http://192.168.1.26:5041/
Then open the master web UI again; the worker is now listed there.
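
For a command-line check instead of the UI, the worker log (written to the same mounted logs directory) should contain a registration message; a sketch, assuming the mounts above:

grep "Successfully registered with master" /home/mo/sjfx-spark-data/logs/*Worker*.out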


Test

./spark-2.4.3-bin-hadoop2.7/bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://192.168.1.26:5030 ./spark-2.4.3-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4.3.jar 100

The terminal output should include:
2019-06-06 11:34:56 INFO DAGScheduler:54 - Job 0 finished: reduce at SparkPi.scala:38, took 3.886408 s
Pi is roughly 3.1414487141448713
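
As an extra interactive check (not from the original post), spark-shell can be pointed at the same master; the sum below is just an illustrative job:

./spark-2.4.3-bin-hadoop2.7/bin/spark-shell --master spark://192.168.1.26:5030
scala> sc.parallelize(1 to 1000).sum()
res0: Double = 500500.0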

Reposted from: https://www.jianshu.com/p/970b12b23eca
