Hadoop 2.2 Installation Notes

Installation environment:

Hyper-V 2008 R2, RHEL 5.5, Hadoop 2.2

 

Plan:

Build a cluster with one master and two slaves for distributed storage and MapReduce computation:

Master (namenode + resourcemanager): 16.158.49.120, h1.dssdev

Slave 1 (datanode + nodemanager): 16.158.49.121, h2.dssdev

Slave 2 (datanode + nodemanager): 16.158.49.123, h3.dssdev

 

Steps:

  • Configure the proxy server:
vim /etc/profile

>http_proxy=proxy.houston.hp.com:8080
>https_proxy=proxy.houston.hp.com:8080
>ftp_proxy=proxy.houston.hp.com:8080
>no_proxy=127.0.0.1,localhost
>export http_proxy https_proxy ftp_proxy no_proxy

source /etc/profile

 

  • Create a hadoop account on h1.dssdev:
useradd hadoop
passwd hadoop

 

  • Edit the hosts file (otherwise Hadoop logs "metrics.MetricsUtil: Unable to obtain hostName"):
vim /etc/hosts
>16.158.49.120	h1.dssdev h1
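For the cluster described in the plan, every node's /etc/hosts typically needs entries for all three machines. The slave lines below follow the same pattern as the master entry and are an assumption (adjust the short aliases to your own naming):

>16.158.49.121	h2.dssdev h2
>16.158.49.123	h3.dssdev h3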

 

  • Uninstall the GCJ-based JDK that ships with the system and install the official JDK

Check the bundled JDK:

rpm -qa | grep gcj

You should see output like:

libgcj-4.1.2-44.el5
java-1.4.2-gcj-compat-1.4.2.0-40jpp.115

Use rpm -e --nodeps to remove the packages found above:

rpm -e --nodeps java-1.4.2-gcj-compat-1.4.2.0-40jpp.115

Download JDK 1.6 and install it (do not install only the JRE; the JRE does not include the jps tool):

http://www.oracle.com/technetwork/java/javasebusiness/downloads/java-archive-downloads-javase6-419409.html#jdk-6u45-oth-JPR
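A minimal install sketch, assuming the RPM self-extracting installer was downloaded as jdk-6u45-linux-x64-rpm.bin (the exact filename depends on your architecture and the package you pick):

chmod +x jdk-6u45-linux-x64-rpm.bin
sudo ./jdk-6u45-linux-x64-rpm.bin                # installs under /usr/java/jdk1.6.0_45
/usr/java/jdk1.6.0_45/bin/java -version          # verify the Oracle JDK is installed
/usr/java/jdk1.6.0_45/bin/jps                    # jps ships with the JDK, not the JRE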

 

  • Download the Hadoop 2.2.0 binary release

http://www.apache.org/dyn/closer.cgi/hadoop/common/

Change the tarball's owner and permissions:

sudo chown hadoop:hadoop hadoop-2.2.0.tar.gz
sudo chmod 775 hadoop-2.2.0.tar.gz

Extract it to /usr/local/hadoop (a minimal sketch follows), then set the environment variables:
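A minimal extraction sketch, assuming the tarball sits in the current directory and /usr/local/hadoop is to be owned by the hadoop user:

sudo mkdir -p /usr/local/hadoop
sudo chown hadoop:hadoop /usr/local/hadoop
tar -xzf hadoop-2.2.0.tar.gz -C /usr/local/hadoop        # yields /usr/local/hadoop/hadoop-2.2.0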

vim /etc/profile

>export JAVA_HOME=/usr/java/jdk1.6.0_45
>export HADOOP_HOME=/usr/local/hadoop/hadoop-2.2.0
>export HADOOP_DEV_HOME=$HADOOP_HOME
>export HADOOP_MAPRED_HOME=${HADOOP_DEV_HOME}
>export HADOOP_COMMON_HOME=${HADOOP_DEV_HOME}
>export HADOOP_HDFS_HOME=${HADOOP_DEV_HOME}
>export HADOOP_PREFIX=${HADOOP_DEV_HOME}
>export YARN_HOME=${HADOOP_DEV_HOME}
>export HADOOP_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
>export HDFS_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
>export YARN_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
>export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib
>export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_DEV_HOME/bin:$HADOOP_DEV_HOME/sbin

source /etc/profile
  • Configure the Hadoop node
Create /usr/local/hadoop/tmp, /usr/local/hadoop/hdfs/name, and /usr/local/hadoop/hdfs/data (a minimal sketch of creating these directories follows the slaves file below), then edit the configuration files under $HADOOP_CONF_DIR. core-site.xml:
<configuration>
	<property>
		<name>fs.default.name</name>
		<value>hdfs://h1.dssdev:9000</value>
	</property>
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/usr/local/hadoop/tmp</value>
	</property>
	<property>
		<name>hadoop.proxyuser.root.hosts</name>
		<value>16.158.49.120</value>
	</property>
	<property>
		<name>hadoop.proxyuser.root.groups</name>
		<value>*</value>
	</property>
</configuration>
 hdfs-site.xml:
<configuration>
	<property>
		<name>dfs.replication</name>
		<value>2</value>
	</property>
	<property>
		<name>dfs.namenode.name.dir</name>
		<value>file:/usr/local/hadoop/hdfs/name</value>
		<final>true</final>
	</property>
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>file:/usr/local/hadoop/hdfs/data</value>
		<final>true</final>
	</property>
	<property>
		<name>dfs.federation.nameservice.id</name>
		<value>ns1</value>
	</property>
        <property>
                <name>dfs.namenode.backup.address.ns1</name>
                <value>16.158.49.120:50100</value>
        </property>
        <property>
                <name>dfs.namenode.backup.http-address.ns1</name>
                <value>16.158.49.120:50105</value>
        </property>
        <property>
                <name>dfs.federation.nameservices</name>
                <value>ns1</value>
        </property>
        <property>
                <name>dfs.namenode.rpc-address.ns1</name>
                <value>16.158.49.120:9000</value>
        </property>
        <property>
                <name>dfs.namenode.rpc-address.ns2</name>
                <value>16.158.49.120:9000</value>
        </property>
        <property>
                <name>dfs.namenode.http-address.ns1</name>
                <value>16.158.49.120:23001</value>
        </property>
        <property>
                <name>dfs.namenode.http-address.ns2</name>
                <value>16.158.49.120:13001</value>
        </property>
        <property>
                <name>dfs.namenode.secondary.http-address.ns1</name>
                <value>16.158.49.120:23003</value>
        </property>
        <property>
                <name>dfs.namenode.secondary.http-address.ns2</name>
                <value>16.158.49.120:23003</value>
        </property>
</configuration>
 mapred-site.xml:
<configuration>
	<property>
		<name>mapred.job.tracker</name>
		<value>h1.dssdev:9001</value>
	</property>
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
</configuration>
 yarn-site.xml:
<configuration>
        <property>
                <name>yarn.resourcemanager.address</name>
                <value>16.158.49.120:18040</value>
        </property>
        <property>
                <name>yarn.resourcemanager.scheduler.address</name>
                <value>16.158.49.120:18030</value>
        </property>
        <property>
                <name>yarn.resourcemanager.webapp.address</name>
                <value>16.158.49.120:18088</value>
        </property>
        <property>
                <name>yarn.resourcemanager.resource-tracker.address</name>
                <value>16.158.49.120:18025</value>
        </property>
        <property>
                <name>yarn.resourcemanager.admin.address</name>
                <value>16.158.49.120:18141</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
</configuration>
 master:
16.158.49.120
 slaves:
16.158.49.121
16.158.49.123
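A minimal sketch of creating the local directories referenced at the start of this step (run as the hadoop user; assumes /usr/local/hadoop is writable by that user):

mkdir -p /usr/local/hadoop/tmp
mkdir -p /usr/local/hadoop/hdfs/name
mkdir -p /usr/local/hadoop/hdfs/data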

~~~~~~ Repeat all of the steps above on each of the other nodes ~~~~~~
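If the JDK, the hadoop account, and the /etc/profile changes are already in place on h2.dssdev and h3.dssdev, the configured Hadoop tree itself can simply be copied instead of re-editing every file by hand. A hedged sketch, assuming /usr/local/hadoop already exists on the slaves and is writable by the hadoop user (you will be prompted for passwords until the SSH step below is done):

scp -r /usr/local/hadoop/hadoop-2.2.0 hadoop@h2.dssdev:/usr/local/hadoop/
scp -r /usr/local/hadoop/hadoop-2.2.0 hadoop@h3.dssdev:/usr/local/hadoop/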

  • Set up passwordless SSH from the master to the slaves

How passwordless SSH works, in brief:
Generate a key pair (one public key, one private key) on h1.dssdev and copy the public key to all slaves (h2.dssdev and h3.dssdev). When the master connects to a slave over SSH, the slave generates a random number, encrypts it with the master's public key, and sends it to the master. The master decrypts it with its private key and sends the result back; once the slave confirms the decrypted value, it allows the master to connect without a password.

1. Run ssh-keygen -t rsa and just press Enter at every prompt; then inspect the newly generated key pair: cd ~/.ssh and run ll

2. Append id_rsa.pub to the authorized keys: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

3. Fix the permissions: chmod 600 ~/.ssh/authorized_keys

4. Make sure /etc/ssh/sshd_config contains the following lines (check with cat /etc/ssh/sshd_config):

RSAAuthentication yes

PubkeyAuthentication yes

AuthorizedKeysFile      .ssh/authorized_keys

If you had to modify the file, restart the SSH service afterwards so the change takes effect: service sshd restart

5. Copy the public key to every slave machine, e.g. scp ~/.ssh/id_rsa.pub hadoop@h2.dssdev:~/ (repeat for h3.dssdev); answer yes to the host-key prompt, then enter the slave machine's password

6. On the slave, create the .ssh directory: mkdir ~/.ssh, then chmod 700 ~/.ssh (skip this if the directory already exists)

7. Append the key to the authorized_keys file: cat ~/id_rsa.pub >> ~/.ssh/authorized_keys, then chmod 600 ~/.ssh/authorized_keys

8. Repeat step 4 (this time on the slave)

9. Verify: on the master run ssh h2.dssdev; if the hostname in the prompt changes from h1 to h2, it worked. Finally, delete the copied public key on the slave: rm ~/id_rsa.pub
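On systems that ship ssh-copy-id (part of the openssh-clients package on RHEL), steps 5-7 can usually be collapsed into one command per slave. A sketch, assuming the hadoop account exists on every node (you may still want to double-check the permissions from steps 3 and 6):

ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@h2.dssdev
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@h3.dssdev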

 

  • Start the cluster:

On the master:
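One step the original notes leave implicit: HDFS must be formatted once before the very first start. A sketch, assuming the install paths used above:

/usr/local/hadoop/hadoop-2.2.0/bin/hdfs namenode -format

Then start the daemons: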

 

cd /usr/local/hadoop/hadoop-2.2.0/sbin

./start-dfs.sh
or
./hadoop-daemon.sh start namenode
./hadoop-daemons.sh start datanode

./start-yarn.sh
or
./yarn-daemon.sh start resourcemanager
./yarn-daemons.sh start nodemanager

./mr-jobhistory-daemon.sh start historyserver
 

Run jps on the master:

 

29043 JobHistoryServer
2902 Jps
28625 NameNode
28761 ResourceManager
Run jps on a slave:

 

2869 NodeManager
24817 Jps
2710 DataNode
 
  • Verify
mkdir -p /usr/local/hadoop/hadoop-2.2.0/input

cat > input/file.txt        # type the two lines below, then press Ctrl-D to finish
This is one line
This is another one

cd bin
./hdfs dfs -mkdir -p /user/input
./hdfs dfs -copyFromLocal /usr/local/hadoop/hadoop-2.2.0/input/file.txt /user/input/
./hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar grep /user/input /user/output 'i'

./hdfs dfs -cat /user/output/*
  • Web interfaces

1. http://master:50070/dfshealth.jsp
2. http://master:18088/cluster
3. http://master:19888/jobhistory

Reposted from dakeiskind.iteye.com/blog/2042136