Big Data with Hadoop (Distributed Cluster Setup and HDFS Commands)
1. Distributed Cluster Setup
1.1 Cluster Deployment Plan
1.2 Building the Cluster
1.2.1 Create three new virtual machines and configure their networking (covered in the previous post)
1.2.2 Create three new hosts: hadoop111, hadoop112, hadoop113 (make sure the IPs are all different)
1.2.3 Connect all three to Xshell (note: match each IP to the correct hostname)
1.2.4 Install vim on all three machines:
yum install vim
1.2.5 Change the hostname on each machine (hadoop111, hadoop112, hadoop113), then reboot
vim /etc/hostname
reboot
1.2.6 Configure host mappings on all three machines (IP + hostname)
vim /etc/hosts
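A minimal /etc/hosts sketch, assuming the 192.168.221.x addresses used in the core-site.xml configuration later in this guide (adjust to your own IPs):

```text
192.168.221.111 hadoop111
192.168.221.112 hadoop112
192.168.221.113 hadoop113
```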
1.2.7 Set up passwordless SSH on one machine and copy the key to the other two
Run in the home directory:
ssh-keygen -t rsa
Copy the public key:
ssh-copy-id hadoop111
ssh-copy-id hadoop112
ssh-copy-id hadoop113
(all three public keys end up in the same authorized_keys file)
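The key-generation step can be sketched non-interactively as below. This writes into a throwaway directory so it will not touch your real ~/.ssh; the -N "" (empty passphrase) and -f (output path) flags are standard ssh-keygen options.

```shell
# Generate an RSA keypair without prompts, into a temporary directory.
# On the real hosts you would run plain `ssh-keygen -t rsa`, accept the
# defaults, and then push the public key with `ssh-copy-id <host>`.
tmp=$(mktemp -d)
ssh-keygen -t rsa -N "" -f "$tmp/id_rsa" -q
ls "$tmp"   # id_rsa (private key) and id_rsa.pub (public key)
```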
1.2.8 Create the hadoop and java directories on all three machines
cd /usr/local (enter the directory)
ll (list the directory contents)
mkdir hadoop
mkdir java (create the directories)
(You can also do all three at once: in Xshell choose Tools → Send key input to all sessions, and turn it off when finished.)
1.2.9 Configure Hadoop on the master
cd hadoop/ (switch to the hadoop directory)
ll (list the directory contents)
tar -zxvf hadoop-2.9.2.tar.gz (extract Hadoop)
rm -rf hadoop-2.9.2.tar.gz (delete the Hadoop tarball)
cd ../java (switch to the java directory)
tar -zxvf jdk-8u211-linux-x64.tar.gz (extract the JDK)
rm -rf jdk-8u211-linux-x64.tar.gz (delete the JDK tarball)
cd /usr/local/hadoop/hadoop-2.9.2/etc/hadoop/ (enter the Hadoop configuration directory)
hadoop-env.sh
vim hadoop-env.sh
Change the Java path on line 25:
export JAVA_HOME=/usr/local/java/jdk1.8.0_211/
core-site.xml
vim core-site.xml
<!-- Specify the filesystem schema (URI) Hadoop uses: the address of the HDFS master (NameNode) -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.221.111:9000</value>
</property>
<!-- Specify the storage directory for files Hadoop generates at runtime -->
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/hadoop-2.9.2/tmp</value>
</property>
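Note that these <property> blocks must sit inside the file's root <configuration> element, which core-site.xml already contains. A full sketch of the edited file (the IP matches the fs.defaultFS value above; adjust to your NameNode's address):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Address of the HDFS master (NameNode) -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.221.111:9000</value>
  </property>
  <!-- Storage directory for files Hadoop generates at runtime -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/hadoop-2.9.2/tmp</value>
  </property>
</configuration>
```

The same applies to hdfs-site.xml: its property blocks also go inside the <configuration> element.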
hdfs-site.xml
vim hdfs-site.xml
<!-- Specify the number of HDFS replicas -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop113:50090</value>
</property>
<!-- Disable permission checks so directories are writable -->
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
slaves (lists the worker nodes so they all start together)
vim slaves (edit it in the /etc/hadoop/ directory)
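A sketch of the slaves file for this three-node layout, one worker hostname per line (all three machines run a DataNode, per the jps output expected in step 1.2.15):

```text
hadoop111
hadoop112
hadoop113
```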
1.2.10 Configure environment variables on the master
vim /etc/profile (open the file and add the following paths)
export JAVA_HOME=/usr/local/java/jdk1.8.0_211
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.9.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Apply the configuration: source /etc/profile
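Why `source` is needed: edits to /etc/profile only affect new login shells unless the file is re-read into the current one. A minimal sketch with a throwaway file (the temporary path is for illustration only):

```shell
# Write an export into a temporary "profile" and load it into the current shell.
profile=$(mktemp)
echo 'export HADOOP_HOME=/usr/local/hadoop/hadoop-2.9.2' > "$profile"
. "$profile"          # `.` is the POSIX spelling of `source`
echo "$HADOOP_HOME"   # the variable is now visible in this shell
```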
1.2.11 Copy the master's Hadoop directory to the other two machines
(in the hadoop directory) /usr/local/hadoop
scp -r hadoop-2.9.2/ hadoop112:$PWD
scp -r hadoop-2.9.2/ hadoop113:$PWD
1.2.12 Copy the master's Java directory to the other two machines
(in the java directory) /usr/local/java
scp -r jdk1.8.0_211/ hadoop112:$PWD
scp -r jdk1.8.0_211/ hadoop113:$PWD
1.2.13 Copy the master's environment file to the other two machines (then apply it on each)
scp -r /etc/profile hadoop112:/etc/ (run from the home directory)
scp -r /etc/profile hadoop113:/etc/ (run from the home directory)
Then apply the environment variables on each machine: source /etc/profile
1.2.14 Format and start HDFS on the master
Format: hadoop namenode -format (run from the home directory)
Start: start-dfs.sh
1.2.15 Run jps on all three machines to verify
Master: NameNode, DataNode, Jps
Second machine: DataNode, Jps
Third machine: SecondaryNameNode, DataNode, Jps
1.2.16 In Chrome, open http://192.168.221.111:50070/dfshealth.html#tab-datanode
2. HDFS Overview and Commands
2.1 Background and Definition of HDFS
2.2 Advantages and Disadvantages of HDFS
2.3 HDFS Architecture
2.4 HDFS Block Size
2.5 HDFS Commands
2.5.1 hadoop fs
[root@hadoop111 hadoop]# hadoop fs
Usage: hadoop fs [generic options]
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
[-copyToLocal [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] [-h] [-v] [-t [<storage type>]] [-u] [-x] <path> ...]
[-cp [-f] [-p | -p[topax]] [-d] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] [-x] <path> ...]
[-expunge]
[-find <path> ... <expression> ...]
[-get [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getfacl [-R] <path>]
[-getfattr [-R] {-n name | -d} [-e en] <path>]
[-getmerge [-nl] [-skip-empty-file] <src> <localdst>]
[-help [cmd ...]]
[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
[-setfattr {-n name [-v value] | -x name} <path>]
[-setrep [-R] [-w] <rep> <path> ...]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]
[-truncate [-w] <length> <path> ...]
[-usage [cmd ...]]
Generic options supported are:
-conf <configuration file> specify an application configuration file
-D <property=value> define a value for a given property
-fs <file:///|hdfs://namenode:port> specify default filesystem URL to use, overrides 'fs.defaultFS' property from configurations.
-jt <local|resourcemanager:port> specify a ResourceManager
-files <file1,...> specify a comma-separated list of files to be copied to the map reduce cluster
-libjars <jar1,...> specify a comma-separated list of jar files to be included in the classpath
-archives <archive1,...> specify a comma-separated list of archives to be unarchived on the compute machines
The general command line syntax is:
command [genericOptions] [commandOptions]
2.5.2 -help: print the help for a command
[root@hadoop111 hadoop]# hadoop -help
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
CLASSNAME run the class named CLASSNAME
or
where COMMAND is one of:
fs run a generic filesystem user client
version print the version
jar <jar> run a jar file
note: please use "yarn jar" to launch
YARN applications, not this command.
checknative [-a|-h] check native hadoop and compression libraries availability
distcp <srcurl> <desturl> copy file or directories recursively
archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
classpath prints the class path needed to get the
Hadoop jar and the required libraries
credential interact with credential providers
daemonlog get/set the log level for each daemon
trace view and modify Hadoop tracing settings
Most commands print help when invoked w/o parameters.
2.5.3 -ls: list directory contents
[root@hadoop111 hadoop]# hadoop fs -ls /
2.5.4 -mkdir: create a directory on HDFS
[root@hadoop111 hadoop]# hadoop fs -mkdir -p /sanguo/shuguo
2.5.5 -moveFromLocal: cut and paste a local file to HDFS
[root@hadoop111 hadoop]# hadoop fs -moveFromLocal ./kongming.txt /sanguo/shuguo
2.5.6 -appendToFile: append a local file to the end of an existing HDFS file
[root@hadoop111 hadoop]# hadoop fs -appendToFile liubei.txt /sanguo/shuguo/kongming.txt
2.5.7 -cat: display file contents
[root@hadoop111 hadoop]# hadoop fs -cat /sanguo/shuguo/kongming.txt
2.5.8 -chgrp, -chmod, -chown: change group, permissions, and ownership; same usage as on a Linux filesystem
[root@hadoop111 hadoop]# hadoop fs -chmod 666 /sanguo/shuguo/kongming.txt
[root@hadoop111 hadoop]# hadoop fs -chown atguigu:atguigu /sanguo/shuguo/kongming.txt
2.5.9 -copyFromLocal: copy a file from the local filesystem to an HDFS path
[root@hadoop111 hadoop]# hadoop fs -copyFromLocal README.txt /
2.5.10 -copyToLocal: copy from HDFS to the local filesystem
[root@hadoop111 hadoop]# hadoop fs -copyToLocal /sanguo/shuguo/kongming.txt ./
2.5.11 -cp: copy from one HDFS path to another
[root@hadoop111 hadoop]# hadoop fs -cp /sanguo/shuguo/kongming.txt /zhuge.txt
2.5.12 -mv: move files within HDFS
[root@hadoop111 hadoop]# hadoop fs -mv /zhuge.txt /sanguo/shuguo/
2.5.13 -get: same as copyToLocal; download a file from HDFS to local
[root@hadoop111 hadoop]# hadoop fs -get /sanguo/shuguo/kongming.txt ./
2.5.14 -getmerge: download and merge multiple files; for example, the HDFS directory /user/atguigu/test contains the files log.1, log.2, log.3, …
[root@hadoop111 hadoop]# hadoop fs -getmerge /user/atguigu/test/* ./zaiyiqi.txt
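Conceptually, -getmerge behaves like downloading the files and concatenating them in order. A purely local analogue (the temporary files below are for illustration only):

```shell
# Build two small "log" files and merge them, mimicking what -getmerge
# produces on the local side.
tmp=$(mktemp -d)
printf 'line from log.1\n' > "$tmp/log.1"
printf 'line from log.2\n' > "$tmp/log.2"
cat "$tmp"/log.* > "$tmp/zaiyiqi.txt"   # glob order matches HDFS listing order
wc -l < "$tmp/zaiyiqi.txt"
```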
2.5.15 -put: same as copyFromLocal
[root@hadoop111 hadoop]# hadoop fs -put ./zaiyiqi.txt /user/atguigu/test/
2.5.16 -tail: show the end of a file
[root@hadoop111 hadoop]# hadoop fs -tail /sanguo/shuguo/kongming.txt
2.5.17 -rm: delete a file
[root@hadoop111 hadoop]# hadoop fs -rm /user/atguigu/test/jinlian2.txt
2.5.18 -rmdir: delete an empty directory
[root@hadoop111 hadoop]# hadoop fs -rmdir /test
2.5.19 -du: show the size of a directory
[root@hadoop111 hadoop]# hadoop fs -du -h /user/atguigu/test
2.5.20 -setrep: set the replication factor of a file in HDFS
[root@hadoop111 hadoop]# hadoop fs -setrep 10 /sanguo/shuguo/kongming.txt
The replication factor set here is only recorded in the NameNode's metadata; whether that many replicas actually exist depends on the number of DataNodes. With only 3 machines at the moment, there can be at most 3 replicas; the count can only reach 10 once the cluster grows to 10 nodes.
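The effective replica count described above is simply the minimum of the requested factor and the number of live DataNodes (numbers taken from the example: a factor of 10 requested on a 3-DataNode cluster):

```shell
# Actual replicas are capped by the number of live DataNodes.
requested=10
datanodes=3
effective=$(( requested < datanodes ? requested : datanodes ))
echo "$effective"   # stays at 3 until the cluster grows to 10 nodes
```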