CDH4.4-MRV1 HA Installation Manual

 

This guide took real effort to write; please credit the source when republishing (http://shihlei.iteye.com/blog/2066627)!

1. Overview

 

    Our team runs CDH4 with MRV1 as the job runtime, and online material on setting up a CDH4.4 HDFS + MRV1 HA environment is very scarce. I attempted the setup myself and recorded the process here as 《Hadoop_CDH4.4.0_MRV1_CDH4.2.2_安装手册_v0.2》.

 

2. Planning

 

Environment:

Component | Version                                                                        | Notes
JRE       | java version "1.7.0_25"; Java(TM) SE Runtime Environment (build 1.7.0_25-b15) |
Hadoop    | hadoop-2.0.0-cdh4.4.0.tar.gz                                                   | Main distribution (http://archive.cloudera.com/cdh4/cdh/4/)
MRV1      | mr1-2.0.0-mr1-cdh4.2.2.tar.gz                                                  | MRV1 package; corresponds to hadoop-2.0.0-mr1-cdh4.2.2
Zookeeper | zookeeper-3.4.5.tar.gz                                                         | Coordination service for automatic NameNode/JobTracker HA failover

 

Hosts:

IP       | Host                        | Deployed modules                 | Processes
8.8.8.11 | Hadoop-NN-01                | NameNode, JobTracker             | NameNode, DFSZKFailoverController, JobTrackerHADaemon, MRZKFailoverController
8.8.8.12 | Hadoop-NN-02                | NameNode, JobTracker             | NameNode, DFSZKFailoverController, JobTrackerHADaemon, MRZKFailoverController
8.8.8.13 | Hadoop-DN-01 / Zookeeper-01 | DataNode, TaskTracker, Zookeeper | DataNode, TaskTracker, JournalNode, QuorumPeerMain
8.8.8.14 | Hadoop-DN-02 / Zookeeper-02 | DataNode, TaskTracker, Zookeeper | DataNode, TaskTracker, JournalNode, QuorumPeerMain
8.8.8.15 | Hadoop-DN-03 / Zookeeper-03 | DataNode, TaskTracker, Zookeeper | DataNode, TaskTracker, JournalNode, QuorumPeerMain

 

Notes:

  • NameNode
  • JobTracker
  • DFSZKFC: DFS Zookeeper Failover Controller; activates the standby NameNode
  • MRZKFC: MR Zookeeper Failover Controller; activates the standby JobTracker
  • DataNode
  • TaskTracker
  • JournalNode: shared-editlog service for the NameNodes (if NFS is used for sharing instead, this process and all of its related configuration can be dropped)
  • QuorumPeerMain: the main Zookeeper process

Directories:

Name                  | Path
$HADOOP_HOME (MRV1)   | /home/puppet/hadoop/cdh4.2.2/hadoop-2.0.0-mr1-cdh4.2.2
MRV1 Data             | $HADOOP_HOME/data
MRV1 Log              | $HADOOP_HOME/logs
$HADOOP_PREFIX (HDFS) | /home/puppet/hadoop/cdh4.4.0/hadoop-2.0.0-cdh4.4.0
HDFS Data             | $HADOOP_PREFIX/data
HDFS Log              | $HADOOP_PREFIX/logs

 

3. Detailed Installation

 

(1) Environment Preparation

 

1) Turn off the firewall: service iptables stop

 

2) Install the JRE: omitted here.

 

3) Install Zookeeper (see the attachment).

 

4) Configure the hosts file (as root): vi /etc/hosts

Contents:

8.8.8.11 Hadoop-NN-01

8.8.8.12 Hadoop-NN-02

8.8.8.13 Hadoop-DN-01 Zookeeper-01

8.8.8.14 Hadoop-DN-02 Zookeeper-02

8.8.8.15 Hadoop-DN-03 Zookeeper-03

 
5) Set up SSH mutual trust: ssh-keygen
 Distribute the public key:

ssh-copy-id -i ~/.ssh/id_rsa.pub puppet@Hadoop-NN-02

ssh-copy-id -i ~/.ssh/id_rsa.pub puppet@Hadoop-DN-01

ssh-copy-id -i ~/.ssh/id_rsa.pub puppet@Hadoop-DN-02

ssh-copy-id -i ~/.ssh/id_rsa.pub puppet@Hadoop-DN-03
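
A quick way to confirm the trust relationships took hold (a sketch using the hosts planned above):

for h in Hadoop-NN-02 Hadoop-DN-01 Hadoop-DN-02 Hadoop-DN-03; do
    ssh puppet@$h hostname    # should print the remote host name with no password prompt
done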

 

6) Configure environment variables: vi ~/.bashrc

Contents:

#Hadoop CDH4

export HADOOP_HOME=/home/puppet/hadoop/cdh4.2.2/hadoop-2.0.0-mr1-cdh4.2.2

export HADOOP_PREFIX=/home/puppet/hadoop/cdh4.4.0/hadoop-2.0.0-cdh4.4.0

export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin:$HADOOP_HOME/bin
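
Reload the file and sanity-check the result (a quick sketch; the version string comes from the CDH4.4.0 tarball above, which is first on PATH):

source ~/.bashrc
hadoop version    # should report 2.0.0-cdh4.4.0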

 

(2) HDFS

1) Unpack: tar -xvf hadoop-2.0.0-cdh4.4.0.tar.gz

 

2) Configure hadoop-env.sh: vi $HADOOP_PREFIX/etc/hadoop/hadoop-env.sh

Add: export JAVA_HOME=/usr/java/jdk1.7.0_25

 

3) Configure core-site.xml: vi $HADOOP_PREFIX/etc/hadoop/core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->
<configuration>
	<!-- Basic directory settings -->
	<property>
		<!-- Default FS; under HA this is the logical hdfs nameservice name -->
		<name>fs.defaultFS</name>
		<value>hdfs://mycluster</value>
	</property>
	<property>
		<!-- Temporary directory -->
		<name>hadoop.tmp.dir</name>
		<value>/home/puppet/hadoop/cdh4.4.0/hadoop-2.0.0-cdh4.4.0/data/tmp</value>
	</property>

	<!--============================== Trash ======================================= -->
	<property>
		<!-- How often the checkpointer running on the NameNode rolls the Current trash folder into a checkpoint; default 0, meaning the fs.trash.interval value is used -->
		<name>fs.trash.checkpoint.interval</name>
		<value>0</value>
	</property>
	<property>
		<!-- How many minutes a trash checkpoint is kept before deletion; the server-side setting overrides the client's; default 0, meaning trash is disabled -->
		<name>fs.trash.interval</name>
		<value>1440</value>
	</property>
</configuration>
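
With fs.trash.interval set to 1440 minutes, a removed file lingers in its owner's .Trash for a day before the checkpoint is purged. A quick way to see this once HDFS is up (step 10 below); the file names here are made up:

hadoop fs -put /etc/hosts /tmp/trash-test
hadoop fs -rm /tmp/trash-test                    # reported as a move to trash, not a delete
hadoop fs -ls /user/puppet/.Trash/Current/tmp    # the file is still recoverable here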

 

4) Configure hdfs-site.xml: vi $HADOOP_PREFIX/etc/hadoop/hdfs-site.xml

 

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
	<!-- Data directories -->
	<property>
		<!-- Where the NameNode stores metadata and edit logs -->
		<name>dfs.name.dir</name>
		<value>/home/puppet/hadoop/cdh4.4.0/hadoop-2.0.0-cdh4.4.0/data/dfs/name</value>
	</property>
	<property>
		<name>dfs.name.edits.dir</name>
		<value>/home/puppet/hadoop/cdh4.4.0/hadoop-2.0.0-cdh4.4.0/data/editlog</value>
	</property>
	<property>
		<!-- Where DataNodes store blocks -->
		<name>dfs.data.dir</name>
		<value>/home/puppet/hadoop/cdh4.4.0/hadoop-2.0.0-cdh4.4.0/data/dfs/dn</value>
	</property>
	<property>
		<!-- Block replication factor -->
		<name>dfs.replication</name>
		<value>1</value>
	</property>

	<!--=============================== HDFS HA ======================================= -->
	<!-- MR1 knows this setting as fs.default.name -->
	<property>
		<name>fs.default.name</name>
		<value>hdfs://mycluster</value>
	</property>
	<!-- Logical nameservice name -->
	<property>
		<name>dfs.nameservices</name>
		<value>mycluster</value>
	</property>
	<property>
		<!-- NameNode IDs; this release supports at most two NameNodes -->
		<name>dfs.ha.namenodes.mycluster</name>
		<value>nn1,nn2</value>
	</property>
	<!-- HDFS HA: dfs.namenode.rpc-address.[nameservice ID] RPC address -->
	<property>
		<name>dfs.namenode.rpc-address.mycluster.nn1</name>
		<value>Hadoop-NN-01:8020</value>
	</property>
	<property>
		<name>dfs.namenode.rpc-address.mycluster.nn2</name>
		<value>Hadoop-NN-02:8020</value>
	</property>
	<!-- HDFS HA: dfs.namenode.http-address.[nameservice ID] HTTP address -->
	<property>
		<name>dfs.namenode.http-address.mycluster.nn1</name>
		<value>Hadoop-NN-01:50070</value>
	</property>
	<property>
		<name>dfs.namenode.http-address.mycluster.nn2</name>
		<value>Hadoop-NN-02:50070</value>
	</property>

	<!--================== NameNode automatic failover via ZKFC and Zookeeper ====================== -->
	<!-- Enable automatic failover backed by Zookeeper and the ZKFC processes, which watch for a dead NameNode -->
	<property>
		<name>dfs.ha.automatic-failover.enabled</name>
		<value>true</value>
	</property>
	<property>
		<name>ha.zookeeper.quorum</name>
		<value>Zookeeper-01:2181,Zookeeper-02:2181,Zookeeper-03:2181</value>
	</property>
	<property>
		<!-- ZooKeeper session timeout, in milliseconds -->
		<name>ha.zookeeper.session-timeout.ms</name>
		<value>2000</value>
	</property>

	<!--================== NameNode fencing =============================================== -->
	<!-- After a failover, keeps the deposed NameNode from serving, which would leave two active services -->
	<property>
		<name>dfs.ha.fencing.methods</name>
		<value>sshfence</value>
	</property>
	<property>
		<name>dfs.ha.fencing.ssh.private-key-files</name>
		<!-- The original left this value empty; it presumably matches the key configured for JobTracker fencing in mapred-site.xml -->
		<value>/home/puppet/.ssh/id_rsa</value>
	</property>

	<!--================== NameNode editlog sharing ============================================ -->
	<!-- Guarantees edits survive a failover -->
	<property>
		<name>dfs.journalnode.http-address</name>
		<value>0.0.0.0:8480</value>
	</property>
	<property>
		<name>dfs.journalnode.rpc-address</name>
		<value>0.0.0.0:8485</value>
	</property>
	<property>
		<!-- JournalNode servers used by the QuorumJournalManager to store the editlog -->
		<!-- Format: qjournal://<host1:port1>;<host2:port2>;<host3:port3>/<journalId>; the port matches dfs.journalnode.rpc-address -->
		<name>dfs.namenode.shared.edits.dir</name>
		<value>qjournal://Hadoop-DN-01:8485;Hadoop-DN-02:8485;Hadoop-DN-03:8485/mycluster</value>
	</property>
	<property>
		<!-- Where JournalNodes store their data -->
		<name>dfs.journalnode.edits.dir</name>
		<value>/home/puppet/hadoop/cdh4.4.0/hadoop-2.0.0-cdh4.4.0/data/dfs/jn</value>
	</property>

	<!--================== Client failover ============================================ -->
	<property>
		<!-- Strategy DataNodes and clients use to find the active NameNode -->
		<name>dfs.client.failover.proxy.provider.mycluster</name>
		<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
	</property>

</configuration>
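
The local paths referenced above (name, editlog, data and journal directories) can be created ahead of time on the relevant nodes; a sketch, noting that Hadoop creates most of these itself during format and first start:

mkdir -p /home/puppet/hadoop/cdh4.4.0/hadoop-2.0.0-cdh4.4.0/data/{tmp,editlog,dfs/name,dfs/dn,dfs/jn}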

 

5) Configure slaves: vi $HADOOP_PREFIX/etc/hadoop/slaves

 

Hadoop-DN-01

Hadoop-DN-02

Hadoop-DN-03

 

6) Distribute the build:

 

scp -r /home/puppet/hadoop/cdh4.4.0/hadoop-2.0.0-cdh4.4.0 puppet@Hadoop-NN-02:/home/puppet/hadoop/cdh4.4.0/

scp -r /home/puppet/hadoop/cdh4.4.0/hadoop-2.0.0-cdh4.4.0 puppet@Hadoop-DN-01:/home/puppet/hadoop/cdh4.4.0/

scp -r /home/puppet/hadoop/cdh4.4.0/hadoop-2.0.0-cdh4.4.0 puppet@Hadoop-DN-02:/home/puppet/hadoop/cdh4.4.0/

scp -r /home/puppet/hadoop/cdh4.4.0/hadoop-2.0.0-cdh4.4.0 puppet@Hadoop-DN-03:/home/puppet/hadoop/cdh4.4.0/
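
Equivalently, a short loop keeps the host list in one place (same hosts and paths as above):

for h in Hadoop-NN-02 Hadoop-DN-01 Hadoop-DN-02 Hadoop-DN-03; do
    scp -r /home/puppet/hadoop/cdh4.4.0/hadoop-2.0.0-cdh4.4.0 puppet@$h:/home/puppet/hadoop/cdh4.4.0/
done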

 

7) Start the JournalNodes:

On each JournalNode host (Hadoop-DN-01, Hadoop-DN-02, Hadoop-DN-03): hadoop-daemon.sh start journalnode

 

8) Format the NameNode:

On Hadoop-NN-01: hdfs namenode -format
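
The original does not say how Hadoop-NN-02 obtains its initial namespace. The usual CDH4 sequence (an assumption on my part, not in the source) is to start the freshly formatted NameNode on Hadoop-NN-01 and then bootstrap the standby:

hadoop-daemon.sh start namenode    # on Hadoop-NN-01 (assumed step)
hdfs namenode -bootstrapStandby    # on Hadoop-NN-02: pulls the formatted namespace from nn1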

 

9) Initialize the ZKFC znode:

On Hadoop-NN-01: hdfs zkfc -formatZK

 

10) Start HDFS

Run (on Hadoop-NN-01): /home/puppet/hadoop/cdh4.4.0/hadoop-2.0.0-cdh4.4.0/sbin/start-dfs.sh
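
Once start-dfs.sh returns, each NameNode can be asked for its HA state (nn1/nn2 are the IDs from hdfs-site.xml):

hdfs haadmin -getServiceState nn1    # expected: active
hdfs haadmin -getServiceState nn2    # expected: standby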

 

11) Verify:

Processes:

a) NameNode

 

[puppet@BigData-01 ~]$ jps

4001 NameNode

4290 DFSZKFailoverController

4415 Jps

 

b) DataNode

 

[puppet@BigData-03 ~]$ jps

25918 QuorumPeerMain

19217 JournalNode

19143 DataNode

19351 Jps

 

Web UI:

a) Active NameNode: http://Hadoop-NN-01:50070 (screenshot omitted)

b) Standby NameNode: http://Hadoop-NN-02:50070 (screenshot omitted)
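
A simple automatic-failover check (a sketch; the PID placeholder below is hypothetical, take it from jps): kill the active NameNode and watch the standby take over.

On Hadoop-NN-01 (currently active):

jps | grep NameNode                  # note the NameNode pid
kill -9 <namenode-pid>               # <namenode-pid> is a placeholder

On Hadoop-NN-02, a few seconds later:

hdfs haadmin -getServiceState nn2    # should now report: active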



 

(3) MapReduce

1) Unpack: tar -xvf mr1-2.0.0-mr1-cdh4.2.2.tar.gz

 

2) Configure hadoop-env.sh: vi $HADOOP_HOME/conf/hadoop-env.sh

Add: export JAVA_HOME=/usr/java/jdk1.7.0_25

 

3) Configure mapred-site.xml: vi $HADOOP_HOME/conf/mapred-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
	<!-- Data directories -->
	<property>
		<!-- Where TaskTrackers keep temporary data and intermediate map output; a JBOD layout is recommended -->
		<name>mapred.local.dir</name>
		<value>/home/puppet/hadoop/cdh4.2.2/hadoop-2.0.0-mr1-cdh4.2.2/data/mapred/local</value>
	</property>

	<!--=============================== JobTracker HA ======================================= -->
	<property>
		<!-- Logical JobTracker name, used here for the HA configuration -->
		<name>mapred.job.tracker</name>
		<value>logicaljt</value>
	</property>

	<property>
		<!-- Whether to recover jobs that were running on the last active JobTracker; default false, must be true for HA -->
		<name>mapred.jobtracker.restart.recover</name>
		<value>true</value>
	</property>
	<property>
		<!-- Whether job status is persisted to HDFS; default false, must be true for HA -->
		<name>mapred.job.tracker.persist.jobstatus.active</name>
		<value>true</value>
	</property>
	<property>
		<!-- How many hours persisted job status is kept on HDFS; default 0, must be > 0 for HA -->
		<name>mapred.job.tracker.persist.jobstatus.hours</name>
		<value>1</value>
	</property>
	<property>
		<!-- HDFS location for persisted job status; must exist and be owned by the mapred user -->
		<name>mapred.job.tracker.persist.jobstatus.dir</name>
		<value>/home/puppet/hadoop/cdh4.2.2/hadoop-2.0.0-mr1-cdh4.2.2/data/jobsInfo</value>
	</property>

	<property>
		<name>mapred.jobtrackers.logicaljt</name>
		<value>jt1,jt2</value>
	</property>

	<!-- JobTracker HA: mapred.jobtracker.rpc-address.[nameservice ID] RPC address -->
	<property>
		<name>mapred.jobtracker.rpc-address.logicaljt.jt1</name>
		<value>Hadoop-NN-01:8021</value>
	</property>
	<property>
		<name>mapred.jobtracker.rpc-address.logicaljt.jt2</name>
		<value>Hadoop-NN-02:8021</value>
	</property>

	<!-- JobTracker HA: mapred.job.tracker.http.address.[nameservice ID] HTTP address -->
	<property>
		<name>mapred.job.tracker.http.address.logicaljt.jt1</name>
		<value>Hadoop-NN-01:50030</value>
	</property>
	<property>
		<name>mapred.job.tracker.http.address.logicaljt.jt2</name>
		<value>Hadoop-NN-02:50030</value>
	</property>

	<!-- JobTracker HA: mapred.ha.jobtracker.rpc-address.[nameservice ID] HA daemon RPC address -->
	<property>
		<name>mapred.ha.jobtracker.rpc-address.logicaljt.jt1</name>
		<value>Hadoop-NN-01:8023</value>
	</property>
	<property>
		<name>mapred.ha.jobtracker.rpc-address.logicaljt.jt2</name>
		<value>Hadoop-NN-02:8023</value>
	</property>

	<!-- JobTracker HA: mapred.ha.jobtracker.http-redirect-address.[nameservice ID] HTTP redirect address -->
	<property>
		<name>mapred.ha.jobtracker.http-redirect-address.logicaljt.jt1</name>
		<value>Hadoop-NN-01:50030</value>
	</property>
	<property>
		<name>mapred.ha.jobtracker.http-redirect-address.logicaljt.jt2</name>
		<value>Hadoop-NN-02:50030</value>
	</property>

	<!--================== TaskTracker / client settings ====================== -->
	<property>
		<!-- Strategy TaskTrackers and clients use to find the active JobTracker -->
		<name>mapred.client.failover.proxy.provider.logicaljt</name>
		<value>org.apache.hadoop.mapred.ConfiguredFailoverProxyProvider</value>
	</property>
	<property>
		<!-- Maximum number of failover attempts for TaskTrackers and clients -->
		<name>mapred.client.failover.max.attempts</name>
		<value>15</value>
	</property>

	<property>
		<!-- Wait before the first failover attempt -->
		<name>mapred.client.failover.sleep.base.millis</name>
		<value>500</value>
	</property>

	<property>
		<!-- Maximum wait between two failover attempts -->
		<name>mapred.client.failover.sleep.max.millis</name>
		<value>1500</value>
	</property>

	<property>
		<!-- Connection retries between failover attempts -->
		<name>mapred.client.failover.connection.retries</name>
		<value>0</value>
	</property>

	<property>
		<!-- Connection retries on timeout between failover attempts -->
		<name>mapred.client.failover.connection.retries.on.timeouts</name>
		<value>0</value>
	</property>

	<!--================== JobTracker automatic failover via ZKFC and Zookeeper ====================== -->
	<!-- Enable automatic failover backed by Zookeeper and the MRZKFC processes, which watch for a dead JobTracker -->
	<property>
		<name>mapred.ha.automatic-failover.enabled</name>
		<value>true</value>
	</property>

	<property>
		<name>mapred.ha.zkfc.port</name>
		<value>8018</value>
	</property>
	<property>
		<name>ha.zookeeper.quorum</name>
		<value>Zookeeper-01:2181,Zookeeper-02:2181,Zookeeper-03:2181</value>
	</property>
	<property>
		<!-- ZooKeeper session timeout, in milliseconds -->
		<name>ha.zookeeper.session-timeout.ms</name>
		<value>2000</value>
	</property>

	<!--================== JobTracker fencing =============================================== -->
	<!-- After a failover, keeps the deposed JobTracker from serving, which would leave two active services -->
	<property>
		<name>mapred.ha.fencing.methods</name>
		<value>sshfence</value>
	</property>
	<property>
		<name>mapred.ha.fencing.ssh.private-key-files</name>
		<value>/home/puppet/.ssh/id_rsa</value>
	</property>

	<!--================== TaskTracker HTTP =============================================== -->
	<property>
		<name>mapreduce.tasktracker.http.address</name>
		<value>0.0.0.0:50033</value>
	</property>
</configuration>
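
mapred.local.dir above must be available on every TaskTracker node; a sketch of preparing it up front, using the path configured above:

mkdir -p /home/puppet/hadoop/cdh4.2.2/hadoop-2.0.0-mr1-cdh4.2.2/data/mapred/local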

 

4) Bring over core-site.xml and hdfs-site.xml:

Copy $HADOOP_PREFIX/etc/hadoop/core-site.xml and $HADOOP_PREFIX/etc/hadoop/hdfs-site.xml into $HADOOP_HOME/conf.

 

5) Configure slaves: vi $HADOOP_HOME/conf/slaves

 

Hadoop-DN-01

Hadoop-DN-02

Hadoop-DN-03

 

6) Distribute the build:

 

scp -r /home/puppet/hadoop/cdh4.2.2/hadoop-2.0.0-mr1-cdh4.2.2 puppet@Hadoop-NN-02:/home/puppet/hadoop/cdh4.2.2/

scp -r /home/puppet/hadoop/cdh4.2.2/hadoop-2.0.0-mr1-cdh4.2.2 puppet@Hadoop-DN-01:/home/puppet/hadoop/cdh4.2.2/

scp -r /home/puppet/hadoop/cdh4.2.2/hadoop-2.0.0-mr1-cdh4.2.2 puppet@Hadoop-DN-02:/home/puppet/hadoop/cdh4.2.2/

scp -r /home/puppet/hadoop/cdh4.2.2/hadoop-2.0.0-mr1-cdh4.2.2 puppet@Hadoop-DN-03:/home/puppet/hadoop/cdh4.2.2/

 

7) Initialize the MRZKFC znode:

On Hadoop-NN-01:

Command: $HADOOP_HOME/bin/hadoop mrzkfc -formatZK

 

8) Start MRV1:

The commands below are run from $HADOOP_HOME.

On Hadoop-NN-01 and Hadoop-NN-02, start the JobTracker HA daemon: bin/hadoop-daemon.sh start jobtrackerha

On Hadoop-NN-01 and Hadoop-NN-02, start the MR ZKFC: bin/hadoop-daemon.sh start mrzkfc

On Hadoop-NN-01, start all TaskTrackers: bin/hadoop-daemons.sh start tasktracker
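
CDH4's JobTracker HA ships an mrhaadmin tool mirroring hdfs haadmin; assuming the jt1/jt2 IDs configured above, the HA state can be queried like this (a sketch; verify against your build):

bin/hadoop mrhaadmin -getServiceState jt1    # expected: active
bin/hadoop mrhaadmin -getServiceState jt2    # expected: standby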

 

9) Verify

 

Processes:

1) JobTracker: Hadoop-NN-01, Hadoop-NN-02

 

[puppet@BigData-01 hadoop-2.0.0-mr1-cdh4.2.2]$ jps

27071 Jps

27051 MRZKFailoverController

26968 JobTrackerHADaemon

24707 NameNode

24993 DFSZKFailoverController

 

2) TaskTracker: Hadoop-DN-01, Hadoop-DN-02, Hadoop-DN-03

 

[puppet@BigData-03 bin]$ jps

26497 JournalNode

25918 QuorumPeerMain

27173 TaskTracker

27218 Jps

26423 DataNode
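
A final smoke test is to run a bundled example against the logical JobTracker; a sketch, assuming the examples jar carries the usual version-stamped name for this MRV1 build:

cd $HADOOP_HOME
bin/hadoop jar hadoop-examples-2.0.0-mr1-cdh4.2.2.jar pi 2 10    # should print an estimate of Pi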

 

Web UI: JobTracker at http://Hadoop-NN-01:50030 and http://Hadoop-NN-02:50030 (screenshots omitted).
