Hadoop pseudo-distributed mode simulates a distributed Hadoop environment on a single machine. The components to set up are:
- HDFS: the NameNode and DataNode
- YARN: the container framework that runs MapReduce, consisting of the ResourceManager and NodeManager
Preparation
$ sudo apt-get install ssh 【openssh is already installed and ssh works, so this step can be skipped】
$ sudo apt-get install rsync 【this also appears to be preinstalled】
Passwordless SSH login
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
Check that it works
$ ssh localhost //if it succeeds, you can log in directly without a password
Starting HDFS
Configure HDFS
【etc/hadoop/core-site.xml】
<configuration>
<!-- Configure the HDFS NameNode -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://191.8.2.45:9000</value>
</property>
<!-- By default Hadoop keeps its DFS data (both namenode and datanode) under /tmp/hadoop-<username>; in this example the default namenode directory would be /tmp/hadoop-wei/dfs/name -->
<!-- Here we point it at /home/wei/hadoop/hadoop-2.9.0/tmp instead, so the namenode path becomes /home/wei/hadoop/hadoop-2.9.0/tmp/dfs/name -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/wei/hadoop/hadoop-2.9.0/tmp</value>
</property>
</configuration>
【etc/hadoop/hdfs-site.xml】
<configuration>
<!-- Replication factor: the default is 3; since this is a single machine, set it to 1. -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
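Instead of deriving both storage locations from hadoop.tmp.dir, the NameNode and DataNode directories can also be set individually in hdfs-site.xml. A sketch of the extra properties, using this example's paths (place them inside the <configuration> element):

```xml
<!-- Optional: set the storage directories explicitly rather than relying on
     hadoop.tmp.dir; the paths below are this example's, adjust as needed. -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/home/wei/hadoop/hadoop-2.9.0/tmp/dfs/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/home/wei/hadoop/hadoop-2.9.0/tmp/dfs/data</value>
</property>
```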
Format the NameNode
$ hdfs namenode -format
18/05/17 16:56:36 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = gsta005/191.8.2.45
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.9.0
STARTUP_MSG: classpath = /home/wei/hadoop/hadoop-2.9.0/etc/hadoop:... ... :/home/wei/hadoop/hadoop-2.9.0/contrib/capacity-scheduler/*.jar
STARTUP_MSG: build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r 756ebc8394e473ac25feac05fa493f6d612e6c50; compiled by 'arsuresh' on 2017-11-13T23:15Z
STARTUP_MSG: java = 1.8.0_66
************************************************************/
... ...
18/05/17 16:56:37 INFO common.Storage: Storage directory /home/wei/hadoop/hadoop-2.9.0/tmp/dfs/name has been successfully formatted.
18/05/17 16:56:37 INFO namenode.FSImageFormatProtobuf: Saving image file /home/wei/hadoop/hadoop-2.9.0/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
18/05/17 16:56:37 INFO namenode.FSImageFormatProtobuf: Image file /home/wei/hadoop/hadoop-2.9.0/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 321 bytes saved in 0 seconds.
18/05/17 16:56:37 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
18/05/17 16:56:37 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at gsta005/191.8.2.45
************************************************************/
Start the NameNode and DataNode daemons
$ start-dfs.sh
Starting namenodes on [gsta005]
gsta005: starting namenode, logging to /home/gsta/wei/hadoop/hadoop-2.9.0/logs/hadoop-gsta-namenode-gsta005.out
localhost: starting datanode, logging to /home/gsta/wei/hadoop/hadoop-2.9.0/logs/hadoop-gsta-datanode-gsta005.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/gsta/wei/hadoop/hadoop-2.9.0/logs/hadoop-gsta-secondarynamenode-gsta005.out
If startup fails with Error: JAVA_HOME is not set and could not be found., set it in etc/hadoop/hadoop-env.sh:
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/home/wei/jdk1.8.0_66
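If you are unsure where the JDK lives, one way to discover the path is to resolve the real location of the java binary. A sketch, assuming `java` is on the PATH and GNU `readlink -f` is available:

```shell
# Print an export line suitable for hadoop-env.sh by resolving the symlink
# chain of the java binary; prints nothing if java is not on the PATH.
if command -v java >/dev/null 2>&1; then
  JH=$(dirname "$(dirname "$(readlink -f "$(command -v java)")")")
  echo "export JAVA_HOME=$JH"
fi
```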
Confirm in a browser: http://191.8.2.45:50070/
Stop the NameNode and DataNode daemons: $ stop-dfs.sh
Starting YARN
YARN configuration
【etc/hadoop/mapred-site.xml】Create this file with cp mapred-site.xml.template mapred-site.xml
<configuration>
<!-- Set the framework that runs MapReduce -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
【etc/hadoop/yarn-site.xml】
<configuration>
<!-- Configure how the NodeManager executes tasks -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Start YARN
$ start-yarn.sh
Verify
$ jps
15526 DataNode
16023 ResourceManager
15383 NameNode
16552 Jps
15802 SecondaryNameNode
16172 NodeManager
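As a quick sanity check, the jps listing above can be verified with a small helper (hypothetical, not part of Hadoop) that confirms all five daemons of a pseudo-distributed setup are present:

```shell
# check_daemons: read a jps listing on stdin and report whether the five
# daemons expected in pseudo-distributed mode are all present.
check_daemons() {
  out=$(cat)
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # -w avoids "NameNode" matching inside "SecondaryNameNode"
    printf '%s\n' "$out" | grep -qw "$d" || { echo "missing: $d"; return 1; }
  done
  echo "all daemons running"
}
```

Use it as `jps | check_daemons`; it prints the first missing daemon, or "all daemons running" when everything is up.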
Visit the YARN web page:
http://191.8.2.45:8088