Compile Spark 2.2.1 from source against Hadoop 2.6.0-cdh5.7.0

1: Download the source code





2: Decompress spark-2.2.1.tgz
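Steps 1 and 2 could be scripted roughly as follows. This is only a sketch: the download location (the Apache release archive) is an assumption, since the post does not say where the tarball came from, and the helper names `spark_source_url` and `fetch_spark_source` are made up for illustration.

```shell
# Assumption: the source tarball is fetched from the Apache release archive.
spark_source_url() {
  # $1 is the Spark version, e.g. 2.2.1
  echo "https://archive.apache.org/dist/spark/spark-$1/spark-$1.tgz"
}

# Download the source tarball (if it is not already present) and unpack it
# into the current directory.
fetch_spark_source() {
  version="$1"
  tarball="spark-${version}.tgz"
  [ -f "$tarball" ] || wget "$(spark_source_url "$version")" || return 1
  tar -xzf "$tarball"
}

echo "Source URL: $(spark_source_url 2.2.1)"
```

Calling `fetch_spark_source 2.2.1 && cd spark-2.2.1` would then leave you inside the unpacked source tree.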

3: Configure the environment:


The Spark build requires Maven 3.3.9 or newer and JDK 1.8 or newer.

My environment:

jdk 1.8.0

maven 3.3.9

scala 2.11
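A quick way to confirm the Maven requirement before starting is sketched below. The helper `version_ge` is hypothetical and assumes a `sort` that supports `-V` (version sort), as GNU coreutils does. If `mvn` is not on the PATH the script only reports that.

```shell
# Succeeds when version $1 is greater than or equal to version $2.
# Relies on the -V (version sort) option of sort.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report whether the installed Maven meets the 3.3.9 minimum.
if command -v mvn >/dev/null 2>&1; then
  mvn_version="$(mvn -version 2>/dev/null | awk '/^Apache Maven/ {print $3; exit}')"
  if version_ge "$mvn_version" "3.3.9"; then
    echo "Maven $mvn_version is new enough"
  else
    echo "Maven $mvn_version is older than 3.3.9"
  fi
else
  echo "mvn not found on PATH"
fi
```

The same `version_ge` check could be applied to the output of `java -version` for the JDK requirement.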

4: Enter the spark source directory and modify pom.xml

Add a repository that hosts the CDH artifacts, inside the <repositories> element:

      <repository>
        <id>cloudera-releases</id>
        <name>cdh</name>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
      </repository>
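Before starting the long build, it may be worth checking that the edit was saved. The sketch below just greps pom.xml for the Cloudera URL. `has_cloudera_repo` is a hypothetical helper, and the check assumes it is run from the Spark source directory.

```shell
# Succeeds if the given pom.xml mentions the Cloudera repository URL.
has_cloudera_repo() {
  grep -q 'repository.cloudera.com/artifactory/cloudera-repos' "$1"
}

# Only check when a pom.xml exists in the current directory.
if [ -f pom.xml ]; then
  if has_cloudera_repo pom.xml; then
    echo "Cloudera repository found in pom.xml"
  else
    echo "Cloudera repository missing from pom.xml"
  fi
fi
```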
   
      
      
     


5: Execute compilation


mvn  -PCDH -Phive -Phive-thriftserver -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.7.0 -DskipTests clean package
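The same build can be wrapped in a small function so the Hadoop version is passed in once. This is a sketch, not part of the original steps: `build_spark` is a hypothetical name, and the `MAVEN_OPTS` line is an addition taken from the memory settings that the Spark build documentation suggests for Maven builds. The Maven flags themselves are copied unchanged from the command above.

```shell
# Memory settings suggested by the Spark build documentation for Maven builds
# (an addition here; the original command does not set them).
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"

# Run the same Maven build as above, with the Hadoop version as a parameter.
build_spark() {
  hadoop_version="$1"
  mvn -PCDH -Phive -Phive-thriftserver -Pyarn -Phadoop-2.6 \
    "-Dhadoop.version=${hadoop_version}" -DskipTests clean package
}
```

From the Spark source directory, `build_spark 2.6.0-cdh5.7.0` would then start the compilation.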




6: Build the distribution package:


First, hard-code the Spark version, Scala version, Hadoop version, and Hive flag in dev/make-distribution.sh:

VERSION=2.2.1
SCALA_VERSION=2.11    
SPARK_HADOOP_VERSION=2.6.0-cdh5.7.0
SPARK_HIVE=1
(Hard-coding these values speeds up packaging, because the script otherwise runs Maven to look each one up.)

Enter the spark-2.2.1 directory and execute:
./dev/make-distribution.sh --name 2.6.0-cdh5.7.0 --tgz -Phive -Phive-thriftserver -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.7.0

After a long wait, the package is built successfully.
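With `--tgz`, make-distribution.sh names the archive spark-<VERSION>-bin-<NAME>.tgz, so the values used here should give spark-2.2.1-bin-2.6.0-cdh5.7.0.tgz in the Spark source directory. A minimal check, assuming it is run from that directory (`dist_tarball_name` is a hypothetical helper):

```shell
# Name of the archive produced by dev/make-distribution.sh with --tgz:
# spark-<VERSION>-bin-<NAME>.tgz
dist_tarball_name() {
  echo "spark-$1-bin-$2.tgz"
}

tarball="$(dist_tarball_name 2.2.1 2.6.0-cdh5.7.0)"
if [ -f "$tarball" ]; then
  # List the first few entries to confirm the archive is readable.
  tar -tzf "$tarball" | head -n 5
else
  echo "Expected archive not found: $tarball"
fi
```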



Origin blog.csdn.net/cxkaa502401673/article/details/79035869