Hadoop & Spark Cluster Setup

1. Preparation

1.1 Download and install the JDK

1.2 Download from the official sites:

scala-2.10.4.tgz (a Scala version compatible with Spark)

hadoop-2.6.0.tar.gz

spark-1.6.0-bin-hadoop2.6.tgz

1.3 Prepare three virtual machines

OS: CentOS 6.3

Addresses: 172.16.100.01, 172.16.100.02, 172.16.100.03. Create a new user on each machine:

useradd cluster

passwd cluster

Edit /etc/hosts on all three machines and add the following entries:

[root@master home]# vim /etc/hosts
172.16.100.01 master
172.16.100.02 slave1
172.16.100.03 slave2
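
A quick optional check from the master that the new names resolve:

    ping -c 1 slave1
    ping -c 1 slave2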


2. Set up passwordless SSH login

ssh-keygen -t rsa  (press Enter at every prompt; the randomart output indicates success)

Under /home/cluster/.ssh, two new files appear:

    Private key: id_rsa

    Public key: id_rsa.pub

Put the contents of each machine's public key id_rsa.pub into authorized_keys:

In the /home/cluster/.ssh directory, run:

    cat id_rsa.pub >> authorized_keys

Copy authorized_keys to the other two machines and run the same command there; in the end, the authorized_keys file containing all three machines' public keys must be present on all three machines.

Fix the permissions of authorized_keys on all three machines: chmod 644 authorized_keys

Test SSH connectivity between the machines (test every direction, this is important):

# ssh 172.16.100.02
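
As a rough sketch of the whole key exchange (run on master as the cluster user, assuming ssh-keygen has already been run on all three machines; password prompts are expected until the exchange is finished):

    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys                # master's own key
    ssh slave1 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys     # slave1's key (output is appended locally)
    ssh slave2 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys     # slave2's key
    scp ~/.ssh/authorized_keys slave1:~/.ssh/
    scp ~/.ssh/authorized_keys slave2:~/.ssh/
    chmod 644 ~/.ssh/authorized_keys                               # repeat the chmod on each machine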

3. Hadoop cluster setup

Configure on the master host first.

1) Extract the downloaded hadoop-2.6.0.tar.gz into the hadoop directory

2) Create the directories:

mkdir -p /home/cluster/hadoop/{pids,storage}
mkdir -p /home/cluster/hadoop/storage/{hdfs,tmp}
mkdir -p /home/cluster/hadoop/storage/hdfs/{name,data}

3) Configure the environment variables: vim /etc/profile  (you can also edit the current user's /home/cluster/.bashrc instead)

export HADOOP_HOME=/home/cluster/hadoop/hadoop-2.6.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source /etc/profile
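
After JAVA_HOME is also set (step 8 below covers hadoop-env.sh), a quick sanity check that the variables took effect:

    hadoop version    # should report Hadoop 2.6.0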

4) Edit the configuration file core-site.xml

<configuration>  
        <property>  
                <name>hadoop.tmp.dir</name>  
                <value>file:/home/cluster/hadoop/storage/tmp</value>  
        </property>  
        <property>  
                <name>fs.defaultFS</name>  
                <value>hdfs://master:9000</value>  
        </property>  
        <property>  
                <name>io.file.buffer.size</name>  
                <value>131072</value>  
        </property>
        <property>
                <name>hadoop.proxyuser.spark.hosts</name>
                <value>*</value>
        </property>
        <property>
                <name>hadoop.proxyuser.spark.groups</name>
                <value>*</value>
        </property>
        <property>
                <name>hadoop.native.lib</name>
                <value>true</value>
        </property>
</configuration>
5) Edit the configuration file hdfs-site.xml
<configuration>
        <property>
                <name>dfs.namenode.secondary.http-address</name>
                <value>master:9001</value>
        </property>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>file:/home/cluster/hadoop/storage/hdfs/name</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>file:/home/cluster/hadoop/storage/hdfs/data</value>
        </property>
        <property>
                <name>dfs.replication</name>
                <value>2</value>
        </property>
        <property>
                <name>dfs.webhdfs.enabled</name>
                <value>true</value>
        </property>
</configuration>
6) Edit the configuration file mapred-site.xml
<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
                <final>true</final>
        </property>
        <property>
                <name>mapreduce.jobtracker.http.address</name>
                <value>master:50030</value>
        </property>
        <property>
                <name>mapreduce.jobhistory.address</name>
                <value>master:10020</value>
        </property>
        <property>
                <name>mapreduce.jobhistory.webapp.address</name>
                <value>master:19888</value>
        </property>
        <property>
                <name>mapred.job.tracker</name>
                <value>http://master:9001</value>
        </property>
</configuration>
7) Edit the configuration file yarn-site.xml
<configuration>
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>master</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
                <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
        <property>
                <name>yarn.resourcemanager.address</name>
                <value>master:8032</value>
        </property>
        <property>
                <name>yarn.resourcemanager.scheduler.address</name>
                <value>master:8030</value>
        </property>
        <property>
                <name>yarn.resourcemanager.resource-tracker.address</name>
                <value>master:8031</value>
        </property>
        <property>
                <name>yarn.resourcemanager.admin.address</name>
                <value>master:8033</value>
        </property>
        <property>
                <name>yarn.resourcemanager.webapp.address</name>
                <value>master:8088</value>
        </property>
</configuration>
8) Edit hadoop-env.sh and yarn-env.sh and add the following environment variable at the top (do not skip this, it is required):

    export JAVA_HOME=<your JDK path>
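
For example, if the JDK were installed under /usr/java/jdk1.7.0_79 (a hypothetical path, substitute your own):

    export JAVA_HOME=/usr/java/jdk1.7.0_79    # hypothetical path; point this at your actual JDK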

9) Configure the slaves file:

master
slave1
slave2
10) Copy the entire hadoop directory to the other two machines with scp:
scp -r hadoop 172.16.100.02:/home/cluster
scp -r hadoop 172.16.100.03:/home/cluster
Also configure the environment variables there and turn off the firewall: service iptables stop

11) Verification

Run the following in the hadoop directory:
bin/hdfs namenode -format   (run only once)
sbin/start-dfs.sh     # start HDFS
sbin/stop-dfs.sh      # stop HDFS
sbin/start-all.sh or sbin/stop-all.sh  (start/stop everything)
Verify with the jps command
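
An optional smoke test once the daemons are up (the directory name is just an example):

    bin/hdfs dfsadmin -report          # the datanodes should be listed as live
    bin/hdfs dfs -mkdir -p /user/cluster
    bin/hdfs dfs -ls /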


HDFS web UI: http://master:50070

4. Scala installation

1) Extraction
Extract the downloaded scala-2.10.4.tgz into /home/cluster/scala
2) Add the environment variables and run source /etc/profile to make them take effect
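
A minimal sketch of the entries for /etc/profile, assuming scala-2.10.4 was unpacked directly under /home/cluster/scala (the exact path is an assumption):

    export SCALA_HOME=/home/cluster/scala/scala-2.10.4
    export PATH=$PATH:$SCALA_HOME/bin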

3) Verify by simply typing scala; the REPL should start

4) Repeat the same steps on the other two machines

5. Spark installation

1) Extraction

Extract the downloaded spark-1.6.0-bin-hadoop2.6.tgz into /home/cluster/spark (with tar)

Rename the directory: mv spark-1.6.0-bin-hadoop2.6 spark

2) Add the environment variables
export SPARK_HOME=/home/cluster/spark/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

3) In the /home/cluster/spark/spark/conf directory, edit spark-env.sh

mv spark-env.sh.template spark-env.sh

Add the environment variables:
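
A minimal sketch of typical entries; the values below are assumptions based on the paths used earlier in this guide, adjust them to your layout:

    export JAVA_HOME=/usr/java/jdk1.7.0_79                               # hypothetical JDK path
    export SCALA_HOME=/home/cluster/scala/scala-2.10.4
    export HADOOP_CONF_DIR=/home/cluster/hadoop/hadoop-2.6.0/etc/hadoop
    export SPARK_MASTER_IP=master
    export SPARK_WORKER_MEMORY=1g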



4) mv log4j.properties.template log4j.properties

5) mv slaves.template slaves

Edit its contents:

master
slave1
slave2

6) Adjust the permissions of the relevant directories, otherwise Spark will not start:

chmod -R 777 spark

7) Repeat the same steps on the other two machines

8) Start and stop the Spark cluster

/home/cluster/spark/spark/sbin/start-all.sh

/home/cluster/spark/spark/sbin/stop-all.sh

Check the started processes with the jps command

Monitoring page: http://<master ip>:8080
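
As an optional end-to-end check (assuming the standalone master is listening on its default port 7077), open a shell against the cluster and run a tiny job:

    bin/spark-shell --master spark://master:7077
    scala> sc.parallelize(1 to 100).sum()    // should return 5050.0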

Reposted from blog.csdn.net/weixin_41555736/article/details/80605534