Apache Hadoop 3.2.2 and Spark 3.0.0 Environment Setup

Contents

Basic environment

JDK installation

Download and extract JDK 8

Set environment variables

Reload the environment configuration

Hadoop installation

Download and extract Hadoop 3.2.2

Set environment variables

Reload the environment configuration

Set Hadoop's JAVA_HOME

Hadoop core configuration files

HDFS user settings in start-dfs.sh and stop-dfs.sh

YARN user settings in start-yarn.sh and stop-yarn.sh

Passwordless SSH login

Start Hadoop

Check processes with jps

HDFS and cluster web UIs

Stop Hadoop

Spark environment setup

Scala installation

Spark installation

Spark example program

Complete environment configuration


Basic environment

OS: CentOS 8

Hostname: www.boonya.cn

vi /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4 www.boonya.cn boonya.cn
::1   localhost localhost.localdomain localhost6 localhost6.localdomain6
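Note that this hosts file resolves www.boonya.cn to the loopback address 127.0.0.1; the Spark worker log further below warns about exactly this and falls back to the LAN IP. If other machines need to reach this host by name, map the hostname to the LAN address instead, for example (192.168.0.120 is the interface address reported in that log; substitute your own):

192.168.0.120   www.boonya.cn boonya.cn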


 

JDK installation

Download and extract JDK 8

cd /usr/local

wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u141-b15/336fa29ff2bb4ef291e347e091f7f4a7/jdk-8u141-linux-x64.tar.gz"

tar -zxvf jdk-8u141-linux-x64.tar.gz

mv jdk1.8.0_141 jdk

Set environment variables (append the following to /etc/profile):

##Java home
export JAVA_HOME=/usr/local/jdk
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

Reload the environment configuration

source /etc/profile
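A quick sanity check that the JDK is now on the PATH (the version string should match the archive extracted above):

java -version
# java version "1.8.0_141"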

Hadoop installation

This article installs Hadoop in pseudo-distributed (single-node) mode as an example.

Download and extract Hadoop 3.2.2

cd /usr/local

wget https://mirrors.bfsu.edu.cn/apache/hadoop/common/hadoop-3.2.2/hadoop-3.2.2.tar.gz

tar -zxvf hadoop-3.2.2.tar.gz

Set environment variables (append to /etc/profile):

#hadoop home
export HADOOP_HOME=/usr/local/hadoop-3.2.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Reload the environment configuration

source /etc/profile
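To confirm that the Hadoop binaries resolve correctly:

hadoop version
# Hadoop 3.2.2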

Set Hadoop's JAVA_HOME

#hadoop-env.sh (under $HADOOP_HOME/etc/hadoop/)
vi /usr/local/hadoop-3.2.2/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr/local/jdk

Hadoop core configuration files

The following files are under $HADOOP_HOME/etc/hadoop/; each snippet goes inside the file's <configuration> element.

#core-site.xml
        <!-- Address of the HDFS master (NameNode) -->
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://www.boonya.cn:9000</value>
        </property>
        <!-- Storage directory for files Hadoop generates at runtime -->
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/usr/local/hadoop-3.2.2/tmp</value>
        </property>

#hdfs-site.xml
        <!-- Number of HDFS block replicas -->
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>

#mapred-site.xml
        <!-- Run MapReduce on YARN -->
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>

#yarn-site.xml
        <!-- Address of the YARN master (ResourceManager) -->
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>www.boonya.cn</value>
        </property>
        <!-- Shuffle service through which reducers fetch map output -->
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>

HDFS user settings in start-dfs.sh and stop-dfs.sh

When the daemons run as root, add the following at the top of sbin/start-dfs.sh and sbin/stop-dfs.sh:

HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

YARN user settings in start-yarn.sh and stop-yarn.sh

Likewise, add the following at the top of sbin/start-yarn.sh and sbin/stop-yarn.sh (the HDFS_DATANODE_SECURE_USER=yarn line that circulates in tutorials is a copy-paste remnant of the HDFS snippet and is not needed here):

YARN_RESOURCEMANAGER_USER=root
YARN_NODEMANAGER_USER=root

Passwordless SSH login

Hadoop's start scripts log in to each node over SSH, so key-based login is required even on a single machine:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

chmod 0600 ~/.ssh/authorized_keys
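To verify, both of the following should print ok without prompting for a password (accept the host-key fingerprint on the first connection):

ssh localhost 'echo ok'
ssh www.boonya.cn 'echo ok'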

Start Hadoop
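A minimal start sequence, assuming the configuration above (the format step initializes HDFS metadata and must be run only once, before the very first start):

hdfs namenode -format
start-dfs.sh
start-yarn.sh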

Check processes with jps
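On a healthy pseudo-distributed node, jps should list the five Hadoop daemons alongside itself (PIDs will differ):

jps
# NameNode
# DataNode
# SecondaryNameNode
# ResourceManager
# NodeManager
# Jps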

HDFS and cluster web UIs

In Hadoop 3.x the HDFS web UI is served on port 9870 (older releases used 50070).

Note: the corresponding ports must be opened in the firewall (9870 for HDFS, 8088 for the YARN cluster management UI), otherwise the UIs cannot be reached from outside; reload the firewall after changing the rules.

As for why 8088 can still be unreachable from outside: with the /etc/hosts entry above, www.boonya.cn resolves to 127.0.0.1, so the ResourceManager binds its web UI to the loopback interface, which other machines cannot see; mapping the hostname to the LAN IP fixes this.
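For example, with firewalld on CentOS 8 (assuming the default public zone), the ports can be opened like this:

firewall-cmd --zone=public --permanent --add-port=9870/tcp
firewall-cmd --zone=public --permanent --add-port=8088/tcp
firewall-cmd --reload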

Stop Hadoop
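Shutdown mirrors the start sequence:

stop-yarn.sh
stop-dfs.sh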

Spark environment setup

Spark depends on Scala, so install Scala first. The Spark 3.0.0 binary below is built against Scala 2.12, matching the 2.12.13 release installed here.

Scala installation

All Scala releases: https://www.scala-lang.org/download/all.html

Version 2.12.13 is used here:

cd /usr/local

wget https://downloads.lightbend.com/scala/2.12.13/scala-2.12.13.tgz

tar -zxvf scala-2.12.13.tgz

Set environment variables (append to /etc/profile):

##scala home
export SCALA_HOME=/usr/local/scala-2.12.13
export PATH=.:$SCALA_HOME/bin:$PATH

Reload the environment configuration

source /etc/profile
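To confirm the Scala installation:

scala -version
# Scala code runner version 2.12.13 ...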

Spark installation

All Spark releases: https://archive.apache.org/dist/spark/

Download and extract Spark (still under /usr/local):

wget https://archive.apache.org/dist/spark/spark-3.0.0/spark-3.0.0-bin-hadoop3.2.tgz

tar -zxvf spark-3.0.0-bin-hadoop3.2.tgz

Set environment variables (append to /etc/profile):

#spark home
export SPARK_HOME=/usr/local/spark-3.0.0-bin-hadoop3.2
export PATH=$PATH:$SPARK_HOME/bin

Set the launch mode

The configuration templates live in $SPARK_HOME/conf:

$ cd /usr/local/spark-3.0.0-bin-hadoop3.2/conf
$ mv spark-defaults.conf.template spark-defaults.conf
$ mv slaves.template slaves
$ mv spark-env.sh.template spark-env.sh

#Edit spark-defaults.conf to enable yarn mode
spark.master     yarn
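For spark.master yarn to actually work, Spark must also know where the Hadoop configuration lives; spark-submit refuses to run against YARN unless HADOOP_CONF_DIR or YARN_CONF_DIR is set. A minimal addition to spark-env.sh, matching the installation paths above:

export HADOOP_CONF_DIR=/usr/local/hadoop-3.2.2/etc/hadoop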

Set Spark's JAVA_HOME by editing sbin/spark-config.sh:

export JAVA_HOME=/usr/local/jdk

Start Spark with sbin/start-all.sh (note that this launches Spark's own standalone master and worker, as the log below shows):

[root@www sbin]# start-all.sh 
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark-3.0.0-bin-hadoop3.2/logs/spark-root-org.apache.spark.deploy.master.Master-1-www.boonya.cn.out
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark-3.0.0-bin-hadoop3.2/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-www.boonya.cn.out
[root@www sbin]# tail -f -n 200 /usr/local/spark-3.0.0-bin-hadoop3.2/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-www.boonya.cn.out
Spark Command: /usr/local/jdk/bin/java -cp /usr/local/spark-3.0.0-bin-hadoop3.2/conf/:/usr/local/spark-3.0.0-bin-hadoop3.2/jars/* -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://localhost:7077
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/02/17 14:04:18 INFO Worker: Started daemon with process name: …@www.boonya.cn
21/02/17 14:04:18 INFO SignalUtils: Registered signal handler for TERM
21/02/17 14:04:18 INFO SignalUtils: Registered signal handler for HUP
21/02/17 14:04:18 INFO SignalUtils: Registered signal handler for INT
21/02/17 14:04:18 WARN Utils: Your hostname, www.boonya.cn resolves to a loopback address: 127.0.0.1; using 192.168.0.120 instead (on interface enp0s3)
21/02/17 14:04:18 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
21/02/17 14:04:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/02/17 14:04:20 INFO SecurityManager: Changing view acls to: root
21/02/17 14:04:20 INFO SecurityManager: Changing modify acls to: root
21/02/17 14:04:20 INFO SecurityManager: Changing view acls groups to: 
21/02/17 14:04:20 INFO SecurityManager: Changing modify acls groups to: 
21/02/17 14:04:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
21/02/17 14:04:22 INFO Utils: Successfully started service 'sparkWorker' on port 45731.
21/02/17 14:04:23 INFO Worker: Starting Spark worker 192.168.0.120:45731 with 1 cores, 1024.0 MiB RAM
21/02/17 14:04:23 INFO Worker: Running Spark version 3.0.0
21/02/17 14:04:23 INFO Worker: Spark home: /usr/local/spark-3.0.0-bin-hadoop3.2
21/02/17 14:04:23 INFO ResourceUtils: ==============================================================
21/02/17 14:04:23 INFO ResourceUtils: Resources for spark.worker:

21/02/17 14:04:23 INFO ResourceUtils: ==============================================================
21/02/17 14:04:23 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
21/02/17 14:04:23 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://192.168.0.120:8081
21/02/17 14:04:23 INFO Worker: Connecting to master localhost:7077...
21/02/17 14:04:24 INFO TransportClientFactory: Successfully created connection to localhost/127.0.0.1:7077 after 194 ms (0 ms spent in bootstraps)
21/02/17 14:04:24 INFO Worker: Successfully registered with master spark://localhost:7077

Spark worker UI: http://192.168.0.120:8081 (reported in the log above); the standalone master UI defaults to port 8080.

Spark example program

GitHub Spark project: https://github.com/open-micro-services/springcloud/tree/master/demo-projects/sb-spark

Once the example program starts cleanly it logs startup completion; if it fails to start, adjust its parameter configuration.
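Independently of that project, the SparkPi example bundled with the distribution makes a quick smoke test; a sketch assuming the spark-3.0.0-bin-hadoop3.2 layout installed above:

spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  /usr/local/spark-3.0.0-bin-hadoop3.2/examples/jars/spark-examples_2.12-3.0.0.jar 100

# on success the driver output contains a line like "Pi is roughly 3.14..."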

While an application is running, its integrated Spark UI is served on port 4040.

Complete environment configuration

The full set of /etc/profile additions:

##Java home
export JAVA_HOME=/usr/local/jdk
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

#hadoop home
export HADOOP_HOME=/usr/local/hadoop-3.2.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

##scala home
export SCALA_HOME=/usr/local/scala-2.12.13
export PATH=.:$SCALA_HOME/bin:$PATH

#spark home
export SPARK_HOME=/usr/local/spark-3.0.0-bin-hadoop3.2
export PATH=$PATH:$SPARK_HOME/bin


Reprinted from blog.csdn.net/boonya/article/details/113833831