1: Download the source code
2: Decompress spark-2.2.1.tgz
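Steps 1 and 2 can be sketched as a small shell helper. The download URL below is an assumption (the Apache release archive); the original notes do not say where the source was fetched from, and `fetch_spark_source` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Assumed download location: the Apache release archive (not stated in the notes).
SPARK_VERSION=2.2.1
SPARK_TGZ="spark-${SPARK_VERSION}.tgz"
SPARK_URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${SPARK_TGZ}"

# Hypothetical helper: download the source tarball, then unpack it
# into the current directory.
fetch_spark_source() {
  wget "$SPARK_URL" && tar -xzf "$SPARK_TGZ"
}

echo "Run fetch_spark_source to download $SPARK_URL"
```

Calling `fetch_spark_source` performs the download and leaves the unpacked spark-2.2.1 directory next to the tarball.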
3: Configure the environment:
Building Spark 2.2.1 requires Maven 3.3.9 or newer and JDK 1.8 or newer.
My environment:
jdk1.8.0
maven 3.3.9
scala 2.11
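A quick way to check the installed tools against these minimums is sketched below. `version_ge` is a hypothetical helper, and it assumes a `sort` that supports `-V` (GNU coreutils); the checks are skipped when a tool is not on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the JDK version line if java is installed.
if command -v java >/dev/null 2>&1; then
  java -version 2>&1 | head -n 1
fi

# Compare the installed Maven version with the 3.3.9 minimum.
if command -v mvn >/dev/null 2>&1; then
  mvn_version="$(mvn -version 2>/dev/null | awk '/^Apache Maven/ {print $3; exit}')"
  if version_ge "$mvn_version" "3.3.9"; then
    echo "Maven $mvn_version is new enough"
  else
    echo "Maven $mvn_version is older than 3.3.9"
  fi
fi
```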
4: Enter the Spark source directory and modify pom.xml
Add the Cloudera repository, which hosts the CDH artifacts, inside the <repositories> section:
<repository>
<id>cloudera-releases</id>
<name>cdh</name>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
5: Execute compilation
mvn -PCDH -Phive -Phive-thriftserver -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.7.0 -DskipTests clean package
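Before running the Maven command above, it may help to give Maven more memory, since the Spark build can otherwise run out of heap. The values below are the ones suggested in the Spark build documentation; treat them as a starting point rather than a requirement.

```shell
# Memory settings suggested by the Spark build documentation;
# adjust them to the memory available on your machine.
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
echo "MAVEN_OPTS=$MAVEN_OPTS"
```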
6: Compile and package:
In dev/make-distribution.sh, hard-code the Spark version, Scala version, and Hadoop version, and enable Hive:
VERSION=2.2.1
SCALA_VERSION=2.11
SPARK_HADOOP_VERSION=2.6.0-cdh5.7.0
SPARK_HIVE=1
(Hard-coding these values speeds up packaging, because the script otherwise starts Maven once per value to look it up.)
Enter the spark-2.2.1 directory and execute:
./dev/make-distribution.sh --name 2.6.0-cdh5.7.0 --tgz -Phive -Phive-thriftserver -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.7.0
The packaging succeeds after a long wait.
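To confirm the result, the sketch below looks for the tarball in the spark-2.2.1 directory. It assumes the script's usual naming scheme, spark-<VERSION>-bin-<NAME>.tgz, with the values used above; if the file is missing it only prints a message.

```shell
#!/usr/bin/env bash
# Values taken from the steps above.
VERSION=2.2.1
NAME=2.6.0-cdh5.7.0
# Assumed naming scheme: spark-<VERSION>-bin-<NAME>.tgz
TARBALL="spark-${VERSION}-bin-${NAME}.tgz"

if [ -f "$TARBALL" ]; then
  # List the first few entries to confirm the archive is readable.
  tar -tzf "$TARBALL" | head -n 5
else
  echo "$TARBALL not found in $(pwd)"
fi
```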