These are my notes from learning to set up Hadoop, drawn from various public tutorials. They start from a very low baseline, beginning with Linux basics and ending with a working Hadoop installation on a PC: truly from scratch.
Thanks to all the teachers, known and unknown to me, who helped along the way.
38. Hadoop cluster configuration, part 2:
[root@hadoop01 hadoop-2.7.1]# ll ./etc/hadoop/mapred-site.xml.template
-rw-r--r--. 1 10021 10021 758 Jun 29 2015 ./etc/hadoop/mapred-site.xml.template
[root@hadoop01 hadoop-2.7.1]# mv ./etc/hadoop/mapred-site.xml.template ./etc/hadoop/mapred-site.xml
The fourth configuration file: vi ./etc/hadoop/mapred-site.xml
<configuration>
<!--specify the framework MapReduce runs on-->
【MapReduce runs on top of YARN】
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<final>true</final>
</property>
<!--communication address of the history service-->
【mapreduce.jobhistory.address: internal communication address of the job history service】
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop01:10020</value>
【default port: 10020】
</property>
<!--web UI address of the history service-->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop01:19888</value>
</property>
</configuration>
The fifth configuration file: vi ./etc/hadoop/yarn-site.xml
<configuration>
<!--hostname of the node where the RM runs-->
【(RM: ResourceManager) the node on which the ResourceManager starts】
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop01</value>
【because the ResourceManager is planned for hadoop01】
</property>
<!--specify the MR shuffle service-->
【(MR: MapReduce) without this, running MapReduce below fails with an error】
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!--internal communication address of the RM-->
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoop01:8032</value>
</property>
<!--internal communication address of the RM scheduler-->
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoop01:8030</value>
</property>
<!--internal communication address of the RM resource-tracker-->
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoop01:8031</value>
</property>
<!--internal communication address of RM admin-->
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hadoop01:8033</value>
【the 803x addresses are YARN's internal communication addresses】
</property>
<!--web UI monitoring address of the RM-->
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoop01:8088</value>
</property>
</configuration>
The sixth configuration file: vi ./etc/hadoop/slaves
【slaves: this file is how the master finds its workers】
[root@hadoop01 hadoop-2.7.1]# vi ./etc/hadoop/slaves
Delete: localhost
Enter:
hadoop01
hadoop02
hadoop03
Hands-on configuration:
Enter the Hadoop directory:
[root@hadoop01 ~]# cd $HADOOP_HOME
[root@hadoop01 hadoop-2.7.1]#
Configure the six relevant files:
File 1: vi ./etc/hadoop/hadoop-env.sh
[root@hadoop01 hadoop-2.7.1]# vi ./etc/hadoop/hadoop-env.sh
Already configured earlier: export JAVA_HOME=/usr/local/jdk1.8.0_144/
File 2: vi ./etc/hadoop/core-site.xml
[root@hadoop01 hadoop-2.7.1]# vi ./etc/hadoop/core-site.xml (three properties configured:)
<configuration>
<!--configure the HDFS filesystem namespace-->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop01:9001</value>
</property>
<!--configure the buffer size for HDFS operations-->
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
</property>
<!--configure the temporary data directory-->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/bigdata/tmp</value>
</property>
</configuration>
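The fs.defaultFS value above is the URI clients use to reach the NameNode's RPC endpoint. A small sketch, using plain shell parameter expansion and the value configured above, of how the host and port split out:

```shell
# fs.defaultFS from core-site.xml above: hdfs://<namenode-host>:<rpc-port>
fsdefault='hdfs://hadoop01:9001'

hostport=${fsdefault#hdfs://}   # strip the scheme -> hadoop01:9001
host=${hostport%:*}             # part before the colon
port=${hostport#*:}             # part after the colon

echo "NameNode host: $host, RPC port: $port"
```

Every client machine that talks to this cluster resolves hadoop01 and connects to that port, which is why the /etc/hosts mappings added later matter.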
File 3: vi ./etc/hadoop/hdfs-site.xml
[root@hadoop01 hadoop-2.7.1]# vi ./etc/hadoop/hdfs-site.xml
<configuration>
<!--configure the replication factor-->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!--block size-->
<property>
<name>dfs.block.size</name>
<value>134217728</value>
</property>
<!--where HDFS metadata is stored-->
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoopdata/dfs/name</value>
</property>
<!--where HDFS data blocks are stored-->
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoopdata/dfs/data</value>
</property>
<!--HDFS checkpoint directory-->
<property>
<name>fs.checkpoint.dir</name>
<value>/home/hadoopdata/checkpoint/dfs/cname</value>
</property>
<!--web UI address of the HDFS NameNode-->
<property>
<name>dfs.http.address</name>
<value>hadoop01:50070</value>
</property>
<!--web UI address of the HDFS SecondaryNameNode (SNN)-->
<property>
<name>dfs.secondary.http.address</name>
<value>hadoop01:50090</value>
</property>
<!--whether to enable WebHDFS access-->
<property>
<name>dfs.webhdfs.enabled</name>
<value>false</value>
</property>
<!--whether to enable HDFS permissions (ACLs)-->
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
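The dfs.block.size value above is in bytes; 134217728 is exactly 128 MB, which can be checked with shell arithmetic:

```shell
# 128 MB expressed in bytes, matching the dfs.block.size value above
block_size=$((128 * 1024 * 1024))
echo "$block_size"   # 134217728
```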
File 4: vi ./etc/hadoop/mapred-site.xml
[root@hadoop01 hadoop-2.7.1]# ll ./etc/hadoop/mapred-site.xml.template
-rw-r--r--. 1 10021 10021 758 Jun 29 2015 ./etc/hadoop/mapred-site.xml.template
[root@hadoop01 hadoop-2.7.1]# mv ./etc/hadoop/mapred-site.xml.template ./etc/hadoop/mapred-site.xml
[root@hadoop01 hadoop-2.7.1]# vi ./etc/hadoop/mapred-site.xml
<configuration>
<!--specify the framework MapReduce runs on-->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<final>true</final>
</property>
<!--communication address of the history service-->
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop01:10020</value>
</property>
<!--web UI address of the history service-->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop01:19888</value>
</property>
</configuration>
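Hand-editing these XML files in vi makes it easy to leave a duplicated or unclosed &lt;property&gt; tag behind. A minimal sanity-check sketch (the helper name check_properties and the sample path /tmp/sample-site.xml are made up for illustration):

```shell
# Count opening vs closing <property> tags in a *-site.xml file;
# a mismatch usually means a copy/paste or vi editing mistake.
check_properties() {
  local open close
  open=$(grep -c '<property>' "$1")
  close=$(grep -c '</property>' "$1")
  if [ "$open" -eq "$close" ]; then
    echo "$1: balanced ($open properties)"
  else
    echo "$1: MISMATCH open=$open close=$close"
  fi
}

# Self-contained demo on a small sample file.
cat > /tmp/sample-site.xml <<'EOF'
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
EOF
check_properties /tmp/sample-site.xml
```

On a real installation you would point it at each file under ./etc/hadoop/ instead of the sample.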
File 5: vi ./etc/hadoop/yarn-site.xml
[root@hadoop01 hadoop-2.7.1]# vi ./etc/hadoop/yarn-site.xml
<configuration>
<!--hostname of the node where the ResourceManager runs-->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop01</value>
</property>
<!--specify the MapReduce shuffle service-->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!--internal communication address of the RM-->
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoop01:8032</value>
</property>
<!--internal communication address of the RM scheduler-->
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoop01:8030</value>
</property>
<!--internal communication address of the RM resource-tracker-->
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoop01:8031</value>
</property>
<!--internal communication address of RM admin-->
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hadoop01:8033</value>
</property>
<!--web UI monitoring address of the RM-->
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoop01:8088</value>
</property>
</configuration>
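All the RM endpoints above bind different ports on hadoop01, and each &lt;value&gt; is typed by hand, so a duplicated port is an easy mistake. A quick sketch that checks the port plan for duplicates:

```shell
# the RM ports configured above: address, scheduler, resource-tracker, admin, webapp
ports='8032 8030 8031 8033 8088'

total=$(echo "$ports" | wc -w)
unique=$(echo "$ports" | tr ' ' '\n' | sort -u | wc -l)

if [ "$total" -eq "$unique" ]; then
  echo "all ports unique"
else
  echo "duplicate port in plan!"
fi
```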
File 6: vi ./etc/hadoop/slaves
[root@hadoop01 hadoop-2.7.1]# vi ./etc/hadoop/slaves
Delete: localhost
Enter:
hadoop01
hadoop02
hadoop03
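The same slaves edit can also be done non-interactively instead of with vi. A sketch (written to /tmp here so as not to touch a real installation; the real file is ./etc/hadoop/slaves under $HADOOP_HOME):

```shell
# one worker hostname per line, replacing the default 'localhost'
printf '%s\n' hadoop01 hadoop02 hadoop03 > /tmp/slaves

cat /tmp/slaves
wc -l < /tmp/slaves   # 3 workers
```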
Once planning and configuration are done, distribute the files remotely to the other servers, i.e. the three planned servers, so that all three have identical configuration.
Remote distribution
Before distributing, hadoop02 and hadoop03 already have a Hadoop directory; delete it: rm -rf /usr/local/hadoop-2.7.1
[root@hadoop02 ~]# rm -rf /usr/local/hadoop-2.7.1/
[root@hadoop03 ~]# rm -rf /usr/local/hadoop-2.7.1/
After deletion, hadoop02 and hadoop03 no longer have anything Hadoop-related.
Run which hadoop on hadoop02 and hadoop03; both report that it is missing:
[root@hadoop02 ~]# cd /usr/local/
[root@hadoop02 ~]# ll
[root@hadoop02 local]# which hadoop
/usr/bin/which: no hadoop in (/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/jdk1.8.0_144//bin:/usr/local/hadoop-2.7.1//bin:/usr/local/hadoop-2.7.1//sbin::/root/bin)
[root@hadoop03 local]# which hadoop
/usr/bin/which: no hadoop in (/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/jdk1.8.0_144//bin:/usr/local/hadoop-2.7.1//bin:/usr/local/hadoop-2.7.1//sbin::/root/bin)
39. Starting and testing the Hadoop cluster:
Remote distribution: scp
Distribute remotely to the other servers:
From hadoop01, distribute to hadoop02 and hadoop03:
scp -r ../hadoop-2.7.1/ hadoop02:/usr/local
scp -r ../hadoop-2.7.1/ hadoop03:/usr/local
[root@hadoop01 hadoop-2.7.1]# scp -r ../hadoop-2.7.1/ hadoop02:/usr/local/
[root@hadoop01 hadoop-2.7.1]# scp -r ../hadoop-2.7.1/ hadoop03:/usr/local/
(..) means the hadoop-2.7.1 in the parent directory, distributed to /usr/local on the hadoop02 and hadoop03 machines.
At this point a cannot-resolve-hostname problem appeared and the connection was lost; the cause is a missing mapping:
[root@hadoop01 hadoop-2.7.1]# scp -r ../hadoop-2.7.1/ hadoop02:/usr/local/
The authenticity of host 'hadoop02 (192.168.216.112)' can't be established.
RSA key fingerprint is 04:ae:11:51:c3:ac:4b:0d:9b:78:3c:c0:58:8e:82:04.
Are you sure you want to continue connecting (yes/no)?
Host key verification failed.
lost connection
[root@hadoop01 hadoop-2.7.1]# scp -r ../hadoop-2.7.1/ hadoop02:/usr/local/
ssh: connect to host hadoop02 port 22: Connection refused (no mapping)
Add the mappings: vi /etc/hosts
[root@hadoop01 hadoop-2.7.1]# vi /etc/hosts
192.168.216.111 hadoop01 www.hadoop01.com
192.168.216.112 hadoop02 www.hadoop02.com   ← added
192.168.216.113 hadoop03 www.hadoop03.com   ← added
The added entries 192.168.216.112 hadoop02 www.hadoop02.com and 192.168.216.113 hadoop03 www.hadoop03.com mean: when you type hadoop02, it resolves to the .112 host; without them the name cannot be found.
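A sketch of checking that every planned hostname has a mapping before running scp (here grepping a local copy of the entries so the example is self-contained; on the real machines you would grep /etc/hosts itself):

```shell
# the /etc/hosts entries added above
hosts_entries='192.168.216.111 hadoop01 www.hadoop01.com
192.168.216.112 hadoop02 www.hadoop02.com
192.168.216.113 hadoop03 www.hadoop03.com'

missing=0
for h in hadoop01 hadoop02 hadoop03; do
  if echo "$hosts_entries" | grep -qw "$h"; then
    echo "$h: mapped"
  else
    echo "$h: NO MAPPING"
    missing=1
  fi
done
```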
After adding the mappings, and before distributing, delete the doc directory to speed up the transfer: rm -rf ./share/doc/
[root@hadoop01 ~]# cd /usr/local/
[root@hadoop01 local]# cd ./hadoop-2.7.1/
[root@hadoop01 hadoop-2.7.1]# ll ./share
total 8
drwxr-xr-x. 3 10021 10021 4096 Jun 29 2015 doc
drwxr-xr-x. 9 10021 10021 4096 Jun 29 2015 hadoop
doc is a documentation directory, a mixed bag; delete it so the transfer goes faster:
[root@hadoop01 hadoop-2.7.1]# rm -rf ./share/doc/
Note: distributing to the same machine a second time does not require typing yes again, only the password, because the host key is recorded only on the first connection.
After adding the mappings and deleting doc, distribute to hadoop02:
[root@hadoop01 hadoop-2.7.1]# scp -r ../hadoop-2.7.1/ hadoop02:/usr/local
It asks whether you really want to connect and trust the host; enter: yes, then the password: root
Distribute to hadoop03:
[root@hadoop01 hadoop-2.7.1]# scp -r ../hadoop-2.7.1/ hadoop03:/usr/local/
The authenticity of host 'hadoop03 (192.168.216.113)' can't be established.
RSA key fingerprint is 58:0e:71:78:09:8c:54:ed:43:16:e3:71:eb:5c:20:57.
Are you sure you want to continue connecting (yes/no)? yes
root@hadoop03's password:
Check on hadoop02 whether hadoop-2.7.1 is there:
[root@hadoop02 local]# ll
total 52
drwxrwxr-x. 9 1000 1000 4096 Apr 23 15:17 bashdb-4.4-0.93
………………
drwxr-xr-x. 9 root root 4096 Apr 25 10:43 hadoop-2.7.1
………………
drwxr-xr-x. 2 root root 4096 Sep 23 2011 src
[root@hadoop02 local]# which hadoop
/usr/local/hadoop-2.7.1/bin/hadoop
Check on hadoop03: which hadoop:
[root@hadoop03 local]# which hadoop
/usr/local/hadoop-2.7.1/bin/hadoop
The first which hadoop also printed this:
[root@hadoop03 local]# which hadoop
/usr/local/hadoop-2.7.1/bin/hadoop
You have new mail in /var/spool/mail/root
Remote distribution done ↑
The Hadoop cluster is planned, configured, and remotely distributed; next it can be started. Before starting, though, one more step is needed: the cluster has just been set up, and HDFS is a filesystem, so it must be formatted first.
Format on the NameNode server before starting. Formatting is needed only once; afterwards just start the cluster directly.
Be sure to format on hadoop01 (the NameNode). Format command: hadoop namenode -format
After formatting, start the NameNode, DataNode, ResourceManager, and NodeManager daemons.
Check the /home directory: for now there is no hadoopdata directory and no bigdata temporary directory:
[root@hadoop01 hadoop-2.7.1]# ll /home/
total 387564
drwx------. 5 aidon aidon 4096 Apr 22 00:49 aidon
drwxrwxr-x. 9 1000 1000 4096 Aug 5 2017 bashdb-4.4-0.93
-rw-r--r--. 1 root root 699632 Apr 23 11:50 bashdb-4.4-0.93.tar.bz2
-rw-r--r--. 1 root root 210606807 Apr 6 08:54 hadoop-2.7.1.tar.gz
drwxr-xr-x. 2 root root 4096 Apr 24 20:14 input
-rw-r--r--. 1 root root 185515842 Mar 29 23:03 jdk-8u144-linux-x64.tar.gz
drwxr-xr-x. 2 root root 4096 Apr 24 20:17 output
drwxr-xr-x. 2 root root 4096 Apr 24 18:28 shell
drwxr-xr-x. 3 root root 4096 Apr 21 17:14 test2
drwxr-xr-x. 2 root root 4096 Apr 21 17:16 test4
drwxr-xr-x. 2 root root 4096 Apr 21 17:16 test5
-rw-r--r--. 1 root root 441 Apr 21 23:45 test.tar
Format on hadoop01:
[root@hadoop01 hadoop-2.7.1]# hadoop namenode -format
18/04/25 20:50:27 INFO common.Storage: Storage directory /home/hadoopdata/dfs/name
has been successfully formatted.
This line appearing after formatting shows the format succeeded and guarantees the metadata directory is fresh ↑
Duplicate (clone) a new window: in the cloned window, /home now contains hadoopdata:
[root@hadoop01 ~]# ll /home/
total 387568
………………
drwxr-xr-x. 3 root root 4096 Apr 25 20:50 hadoopdata
………………
hadoopdata contains a dfs directory:
[root@hadoop01 ~]# ll /home/hadoopdata/
total 4
drwxr-xr-x. 3 root root 4096 Apr 25 20:50 dfs
dfs contains name, and name holds the metadata:
[root@hadoop01 ~]# ll /home/hadoopdata/dfs
total 4
drwxr-xr-x. 3 root root 4096 Apr 25 20:50 name
current under name is where the metadata lives:
[root@hadoop01 ~]# ll /home/hadoopdata/dfs/name/
total 4
drwxr-xr-x. 2 root root 4096 Apr 25 20:50 current
current holds the metadata itself (the fsimage):
[root@hadoop01 ~]# ll /home/hadoopdata/dfs/name/current/
total 16
-rw-r--r--. 1 root root 351 Apr 25 20:50 fsimage_0000000000000000000
-rw-r--r--. 1 root root 62 Apr 25 20:50 fsimage_0000000000000000000.md5
-rw-r--r--. 1 root root 2 Apr 25 20:50 seen_txid
-rw-r--r--. 1 root root 208 Apr 25 20:50 VERSION
Start the services:
After a successful format, with a fresh metadata directory generated, the cluster can be started normally (the NameNode, DataNode, ResourceManager, and NodeManager daemons).
Three ways to start:
1. Full start:
start-all.sh
2. Module start:
start-dfs.sh
start-yarn.sh
3. Per-daemon start (note: everything after start/stop must be all lowercase):
hadoop-daemon.sh start/stop namenode (start/stop the namenode)
hadoop-daemons.sh start/stop datanode (start/stop every datanode in the cluster)
yarn-daemon.sh start/stop resourcemanager
yarn-daemons.sh start/stop nodemanager
mr-jobhistory-daemon.sh start/stop historyserver
Where the start scripts live: ll ./sbin/
[root@hadoop01 hadoop-2.7.1]# ll ./sbin/
total 120
-rwxr-xr-x. 1 10021 10021 2752 Jun 29 2015 distribute-exclude.sh
-rwxr-xr-x. 1 10021 10021 6452 Jun 29 2015 hadoop-daemon.sh
-rwxr-xr-x. 1 10021 10021 1360 Jun 29 2015 hadoop-daemons.sh
-rwxr-xr-x. 1 10021 10021 1640 Jun 29 2015 hdfs-config.cmd
-rwxr-xr-x. 1 10021 10021 1427 Jun 29 2015 hdfs-config.sh
-rwxr-xr-x. 1 10021 10021 2291 Jun 29 2015 httpfs.sh
-rwxr-xr-x. 1 10021 10021 3128 Jun 29 2015 kms.sh
-rwxr-xr-x. 1 10021 10021 4080 Jun 29 2015 mr-jobhistory-daemon.sh
-rwxr-xr-x. 1 10021 10021 1648 Jun 29 2015 refresh-namenodes.sh
-rwxr-xr-x. 1 10021 10021 2145 Jun 29 2015 slaves.sh
-rwxr-xr-x. 1 10021 10021 1779 Jun 29 2015 start-all.cmd
-rwxr-xr-x. 1 10021 10021 1471 Jun 29 2015 start-all.sh
-rwxr-xr-x. 1 10021 10021 1128 Jun 29 2015 start-balancer.sh
-rwxr-xr-x. 1 10021 10021 1401 Jun 29 2015 start-dfs.cmd
-rwxr-xr-x. 1 10021 10021 3734 Jun 29 2015 start-dfs.sh
-rwxr-xr-x. 1 10021 10021 1357 Jun 29 2015 start-secure-dns.sh
-rwxr-xr-x. 1 10021 10021 1571 Jun 29 2015 start-yarn.cmd
-rwxr-xr-x. 1 10021 10021 1347 Jun 29 2015 start-yarn.sh
-rwxr-xr-x. 1 10021 10021 1770 Jun 29 2015 stop-all.cmd
-rwxr-xr-x. 1 10021 10021 1462 Jun 29 2015 stop-all.sh
-rwxr-xr-x. 1 10021 10021 1179 Jun 29 2015 stop-balancer.sh
-rwxr-xr-x. 1 10021 10021 1455 Jun 29 2015 stop-dfs.cmd
-rwxr-xr-x. 1 10021 10021 3206 Jun 29 2015 stop-dfs.sh
-rwxr-xr-x. 1 10021 10021 1340 Jun 29 2015 stop-secure-dns.sh
-rwxr-xr-x. 1 10021 10021 1642 Jun 29 2015 stop-yarn.cmd
-rwxr-xr-x. 1 10021 10021 1340 Jun 29 2015 stop-yarn.sh
-rwxr-xr-x. 1 10021 10021 4295 Jun 29 2015 yarn-daemon.sh
-rwxr-xr-x. 1 10021 10021 1353 Jun 29 2015 yarn-daemons.sh
Start the cluster by module:
At this point hadoop02 still has no hadoopdata directory under /home, because only hadoop01 holds the metadata. hadoop01, hadoop02, and hadoop03 all run a DataNode, which holds the actual data; the directory appears on hadoop02 and hadoop03 only once data is actually written.
Start on hadoop01: ./sbin/start-dfs.sh
[root@hadoop01 hadoop-2.7.1]# ./sbin/start-dfs.sh
Because passwordless SSH login has not been configured, you have to keep answering whether to accept each host, entering yes and the password (root):
18/04/26 09:14:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [hadoop01]
root@hadoop01's password:
hadoop01: namenode running as process 12853. Stop it first.
root@hadoop02's password: root@hadoop01's password: root@hadoop03's password:
hadoop02: datanode running as process 12790. Stop it first.
root
hadoop01: starting datanode, logging to /usr/local/hadoop-2.7.1/logs/hadoop-root-datanode-hadoop01.out
root
hadoop03: starting datanode, logging to /usr/local/hadoop-2.7.1/logs/hadoop-root-datanode-hadoop03.out
root
Starting secondary namenodes [hadoop01]
root@hadoop01's password:
hadoop01: secondarynamenode running as process 13052. Stop it first.
18/04/26 09:15:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Check the processes on hadoop01, hadoop02, and hadoop03 with jps:
Processes on hadoop01:
[root@hadoop01 hadoop-2.7.1]# jps
13376 DataNode
12853 NameNode
13612 Jps
13052 SecondaryNameNode
Processes on hadoop02:
[root@hadoop02 local]# jps
13027 Jps
12790 DataNode
Processes on hadoop03:
[root@hadoop03 local]# jps
12925 DataNode
12989 Jps
Testing (in several steps):
1. Check whether the processes started as planned
2. Check whether each module's web UI monitor is working:
http://192.168.216.111:50070 (for the web UI, the NameNode's IP is enough)
3. Upload and download a file (tests HDFS), and run a MapReduce job (tests the YARN cluster)
Testing the HDFS module:
Check whether the root of the HDFS filesystem contains anything; right now it does not:
[root@hadoop01 hadoop-2.7.1]# hdfs dfs -ls / (the hdfs dfs -ls / command will be covered later)
18/04/26 09:26:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Check which directories and files the local Linux system has under hadoop-2.7.1:
[root@hadoop01 hadoop-2.7.1]# ll
total 56
drwxr-xr-x. 2 10021 10021 4096 Jun 29 2015 bin
drwxr-xr-x. 3 10021 10021 4096 Apr 25 10:15 etc
drwxr-xr-x. 2 10021 10021 4096 Jun 29 2015 include
drwxr-xr-x. 3 10021 10021 4096 Jun 29 2015 lib
drwxr-xr-x. 2 10021 10021 4096 Jun 29 2015 libexec
-rw-r--r--. 1 10021 10021 15429 Jun 29 2015 LICENSE.txt
drwxr-xr-x. 2 root root 4096 Apr 26 09:15 logs
-rw-r--r--. 1 10021 10021 101 Jun 29 2015 NOTICE.txt
-rw-r--r--. 1 10021 10021 1366 Jun 29 2015 README.txt
drwxr-xr-x. 2 10021 10021 4096 Jun 29 2015 sbin
drwxr-xr-x. 3 10021 10021 4096 Apr 25 10:36 share
Upload; view in the browser at: http://192.168.216.111:50070/
Upload the local Linux file README.txt into the root of the HDFS filesystem, keeping its original name README.txt:
[root@hadoop01 hadoop-2.7.1]# hdfs dfs -put ./README.txt /
18/04/26 09:29:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Check the root of the HDFS filesystem again; now it is there:
[root@hadoop01 hadoop-2.7.1]# hdfs dfs -ls /
18/04/26 09:31:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r-- 3 root supergroup 1366 2018-04-26 09:30 /README.txt
In HDFS, read the README.txt uploaded from the local Linux system (note: HDFS paths start from the root; there are no relative paths, so do not prefix them with a dot):
[root@hadoop01 hadoop-2.7.1]# hdfs dfs -cat /README.txt
18/04/26 09:35:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
For the latest information about Hadoop, please visit our website at:
http://hadoop.apache.org/core/
and our wiki, at:
http://wiki.apache.org/hadoop/
This distribution includes cryptographic software. The country in
which you currently reside may have restrictions on the import,
possession, use, and/or re-export to another country, of
encryption software. BEFORE using any encryption software, please
check your country's laws, regulations and policies concerning the
import, possession, or use, and re-export of encryption software, to
see if this is permitted. See <http://www.wassenaar.org/> for more
information.
The U.S. Government Department of Commerce, Bureau of Industry and
Security (BIS), has classified this software as Export Commodity
Control Number (ECCN) 5D002.C.1, which includes information security
software using or performing cryptographic functions with asymmetric
algorithms. The form and manner of this Apache Software Foundation
distribution makes it eligible for export under the License Exception
ENC Technology Software Unrestricted (TSU) exception (see the BIS
Export Administration Regulations, Section 740.13) for both object
code and source code.
The following provides more details on the included cryptographic
software:
Hadoop Core uses the SSL libraries from the Jetty project written
by mortbay.org.
The file reads back successfully; the HDFS module of the cluster is set up correctly ↑
Testing the YARN module:
Start YARN: start-yarn.sh
[root@hadoop01 hadoop-2.7.1]# start-yarn.sh
jps on hadoop01:
[root@hadoop01 hadoop-2.7.1]# jps
13376 DataNode
14465 Jps
13939 ResourceManager
12853 NameNode
14229 NodeManager
13052 SecondaryNameNode
jps on hadoop02:
[root@hadoop02 local]# jps
12790 DataNode
13110 NodeManager
13257 Jps
jps on hadoop03:
[root@hadoop03 local]# jps
12925 DataNode
13214 Jps
13071 NodeManager
Web UI monitor, in the browser: http://192.168.216.111:8088
With YARN started, run a MapReduce job:
【running a job tests whether the started YARN actually works for the cluster】
yarn jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /README.txt /out/00
Use the bundled example jar to run a wordcount job (it counts how often each word appears in a file). When the cluster runs a job, the input must be data in the HDFS filesystem, and we just uploaded README.txt there; the output goes to /out/00.
[root@hadoop01 hadoop-2.7.1]# yarn jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /README.txt /out/00
18/04/26 09:57:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/04/26 09:57:43 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.168.216.111:8032
18/04/26 09:57:46 INFO input.FileInputFormat: Total input paths to process : 1
18/04/26 09:57:46 INFO mapreduce.JobSubmitter: number of splits:1
18/04/26 09:57:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1524706682045_0001
18/04/26 09:57:48 INFO impl.YarnClientImpl: Submitted application application_1524706682045_0001
18/04/26 09:57:48 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1524706682045_0001/
18/04/26 09:57:48 INFO mapreduce.Job: Running job: job_1524706682045_0001
18/04/26 09:58:16 INFO mapreduce.Job: Job job_1524706682045_0001 running in uber mode : false
18/04/26 09:58:16 INFO mapreduce.Job: map 0% reduce 0% (if this runs, the YARN cluster is also set up correctly)
18/04/26 09:58:40 INFO mapreduce.Job: map 100% reduce 0% (the map side is 100% done, waiting for the reduce side)
18/04/26 09:58:59 INFO mapreduce.Job: map 100% reduce 100% (once the reduce side finishes, the whole job is done)
………………
The root directory had no out before; check whether it was created:
[root@hadoop01 hadoop-2.7.1]# hdfs dfs -ls /out
18/04/26 10:11:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
drwxr-xr-x - root supergroup 0 2018-04-26 09:58 /out/00
Check what is inside out:
[root@hadoop01 hadoop-2.7.1]# hdfs dfs -ls /out/00
18/04/26 10:00:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 3 root supergroup 0 2018-04-26 09:58 /out/00/_SUCCESS
-rw-r--r-- 3 root supergroup 1306 2018-04-26 09:58 /out/00/part-r-00000
_SUCCESS: success marker file; part-r-00000: results file
Read the part-r-00000 results file:
[root@hadoop01 hadoop-2.7.1]# hdfs dfs -cat /out/00/part-r-00000
18/04/26 10:01:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
(BIS), 1
(ECCN) 1
(TSU) 1
(see 1
5D002.C.1, 1
740.13) 1
<http://www.wassenaar.org/> 1
Administration 1
Apache 1
BEFORE 1
BIS 1
Bureau 1
Commerce, 1
Commodity 1
Control 1
Core 1
Department 1
ENC 1
Exception 1
Export 2
For 1
Foundation 1
Government 1
Hadoop 1
Hadoop, 1
Industry 1
Jetty 1
License 1
Number 1
Regulations, 1
SSL 1
Section 1
Security 1
See 1
Software 2
Technology 1
The 4
This 1
U.S. 1
Unrestricted 1
about 1
algorithms. 1
and 6
and/or 1
another 1
any 1
as 1
asymmetric 1
at: 2
both 1
by 1
check 1
classified 1
code 1
code. 1
concerning 1
country 1
country's 1
country, 1
cryptographic 3
currently 1
details 1
distribution 2
eligible 1
encryption 3
exception 1
export 1
following 1
for 3
form 1
from 1
functions 1
has 1
have 1
http://hadoop.apache.org/core/ 1
http://wiki.apache.org/hadoop/ 1
if 1
import, 2
in 1
included 1
includes 2
information 2
information. 1
is 1
it 1
latest 1
laws, 1
libraries 1
makes 1
manner 1
may 1
more 2
mortbay.org. 1
object 1
of 5
on 2
or 2
our 2
performing 1
permitted. 1
please 2
policies 1
possession, 2
project 1
provides 1
re-export 2
regulations 1
reside 1
restrictions 1
security 1
see 1
software 2
software, 2
software. 2
software: 1
source 1
the 8
this 3
to 2
under 1
use, 2
uses 1
using 2
visit 1
website 1
which 2
wiki, 1
with 1
written 1
you 1
your 1
At this point the frequency of every word in the README.txt uploaded to the HDFS filesystem has been counted.
HDFS and YARN modules start, and the cluster tests OK ↑
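The counting wordcount performs can be sketched locally with standard shell tools on a tiny sample (illustrative only; the real job runs distributed across the YARN cluster):

```shell
sample='Export Commodity Export'

# split on whitespace, sort so duplicates are adjacent, count them,
# and print word<TAB>count like the part-r-00000 output above
echo "$sample" | tr -s ' ' '\n' | sort | uniq -c | awk '{printf "%s\t%s\n", $2, $1}'
# Commodity	1
# Export	2
```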