Spark 02 Installation and Configuration (Environment Setup), Compilation

Copyright notice: this is an original article by the author and may not be reproduced without permission. https://blog.csdn.net/lihaogn/article/details/82110344

1 Environment Setup

1) Download and extract the software package

  • Option 1: download the prebuilt binary tarball and extract it directly
  • Option 2: download the source package, compile it yourself, and extract the result
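For the first option, the extraction step might look like this (a sketch only; the download directory ~/software and install directory ~/app are assumptions, not from the original post):

```shell
# Sketch, assuming the prebuilt tarball was already downloaded to ~/software
# and ~/app is where extracted software lives (both paths are assumptions).
tar -zxvf ~/software/spark-2.1.0-bin-2.6.0-cdh5.7.0.tgz -C ~/app/
```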

2) Configure environment variables
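As a sketch, the environment variables could be set like this in ~/.bash_profile (the install path is an assumption; adjust it to wherever you extracted the tarball):

```shell
# Assumed install location (not specified in the original post).
export SPARK_HOME=$HOME/app/spark-2.1.0-bin-2.6.0-cdh5.7.0
# Put spark-shell (bin/) and the cluster start/stop scripts (sbin/) on the PATH.
export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
```

After sourcing the file, `spark-shell` and `start-all.sh` resolve without full paths.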

1.1 Local mode

1) Start spark-shell

spark-shell --master local[2]

1.2 Standalone mode
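To confirm local mode works, a small job can be piped into the shell non-interactively (a sketch; it assumes spark-shell is on the PATH, and uses the SparkContext `sc` that spark-shell creates automatically):

```shell
# Smoke test: sum the integers 1..100 on 2 local cores.
# The REPL should report a Double result of 5050.0 before exiting.
echo 'sc.parallelize(1 to 100).sum' | spark-shell --master local[2]
```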

The architecture of Spark Standalone mode is very similar to Hadoop HDFS/YARN: 1 master + n workers

1) Edit spark-2.1.0-bin-2.6.0-cdh5.7.0/conf/spark-env.sh

# Append at the end of the file
SPARK_MASTER_HOST=hadoop000
SPARK_WORKER_CORES=2
SPARK_WORKER_MEMORY=2g
SPARK_WORKER_INSTANCES=1
# What these settings mean
SPARK_MASTER_HOST, to bind the master to a different IP address or hostname
SPARK_WORKER_CORES, to set the number of cores to use on this machine
SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
SPARK_WORKER_INSTANCES, to set the number of worker processes per node

2) Start the cluster and connect to it

spark-2.1.0-bin-2.6.0-cdh5.7.0/sbin/start-all.sh
spark-shell --master spark://hadoop000:7077
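After start-all.sh, the daemons can be verified with jps, and the master serves a web UI (port 8080 by default; the hostname follows the SPARK_MASTER_HOST set in spark-env.sh above):

```shell
# With SPARK_WORKER_INSTANCES=1, this should list one Master and one Worker JVM.
jps | grep -E 'Master|Worker'
# The standalone master's web UI defaults to port 8080.
curl -s http://hadoop000:8080 | head
```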

2 Compilation

1) Prerequisites

Software requirements:

The Maven-based build is the build of reference for Apache Spark. Building Spark using Maven requires Maven 3.3.9 or newer and Java 8+. Note that support for Java 7 was removed as of Spark 2.2.0.

Setup:

You’ll need to configure Maven to use more memory than usual by setting MAVEN_OPTS:
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"

The following software must be installed, with environment variables configured:

  • JDK 8+
  • Maven 3.3.9+
  • hadoop-2.6.0-cdh5.7.0.tar.gz
  • Scala 2.11.8

2) Compile

Edit spark-2.2.0/dev/make-distribution.sh: comment out the version-detection block shown below and hard-code the versions for your build underneath it (this skips several slow `mvn help:evaluate` invocations every time the script runs).

# VERSION=$("$MVN" help:evaluate -Dexpression=project.version $@ 2>/dev/null | grep -v "INFO" | tail -n 1)
# SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version $@ 2>/dev/null\
#     | grep -v "INFO"\
#     | tail -n 1)
# SPARK_HADOOP_VERSION=$("$MVN" help:evaluate -Dexpression=hadoop.version $@ 2>/dev/null\
#     | grep -v "INFO"\
#     | tail -n 1)
# SPARK_HIVE=$("$MVN" help:evaluate -Dexpression=project.activeProfiles -pl sql/hive $@ 2>/dev/null\
#     | grep -v "INFO"\
#     | fgrep --count "<id>hive</id>";\
#     # Reset exit status to 0, otherwise the script stops here if the last grep finds nothing\
#     # because we use "set -o pipefail"
#     echo -n)
VERSION=2.2.0
SCALA_VERSION=2.11
SPARK_HADOOP_VERSION=2.6.0-cdh5.7.0
SPARK_HIVE=1

Edit spark-2.2.0/pom.xml

Add the following entries inside the `<repositories>` section (the cloudera repository is needed to resolve the CDH-flavored Hadoop artifacts; the aliyun mirror is optional and only speeds up dependency downloads):

<repository>
  <id>cloudera</id>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>

<repository>
  <id>alimaven</id>
  <name>aliyun maven</name>
  <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
</repository>

The build command:

./dev/make-distribution.sh --name 2.6.0-cdh5.7.0 --tgz -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.7.0 -Phive -Phive-thriftserver -Pyarn

Wait for the build to finish.
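On success, make-distribution.sh writes a tarball into the source root, named from the hard-coded VERSION and the --name flag; it can then be extracted and used exactly like a prebuilt package (the ~/app target directory is an assumption):

```shell
# Expected artifact, given VERSION=2.2.0 and --name 2.6.0-cdh5.7.0.
ls spark-2.2.0-bin-2.6.0-cdh5.7.0.tgz
# Extract it like the prebuilt tarball (target directory is an assumption).
tar -zxvf spark-2.2.0-bin-2.6.0-cdh5.7.0.tgz -C ~/app/
```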
