Big Data Technology: Hadoop (Distributed Cluster Setup and HDFS Commands)

1. Distributed Cluster Setup

1.1 Cluster Deployment Plan

Node        Daemons
hadoop111   NameNode, DataNode
hadoop112   DataNode
hadoop113   SecondaryNameNode, DataNode

1.2 Building the Cluster
1.2.1 Create three new virtual machines and configure their networking (covered in the previous post)
1.2.2 Set them up as hadoop111, hadoop112, and hadoop113 (make sure their IPs are all different)
1.2.3 Connect all three in Xshell (note: match each IP to its hostname)
1.2.4 Install vim on all three:

          yum install vim

1.2.5 Change each machine's hostname (hadoop111, hadoop112, hadoop113) and reboot:

          vim /etc/hostname 
          reboot

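On CentOS 7 (assumed here, since yum is used above), the hostname can also be changed without editing the file, by running the matching command on each node:

          hostnamectl set-hostname hadoop111    (use hadoop112 / hadoop113 on the other two)
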
1.2.6 Configure the hosts mapping on all three machines (IP + hostname):

         vim /etc/hosts
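
A minimal sketch of the mapping entries, assuming the 192.168.221.x addresses that core-site.xml uses below:

         192.168.221.111 hadoop111
         192.168.221.112 hadoop112
         192.168.221.113 hadoop113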

1.2.7 On one machine, set up passwordless SSH and copy the key to the other two

         ssh-keygen -t rsa    (run in the home directory)
         Copy the public key to every node:
           ssh-copy-id hadoop111
           ssh-copy-id hadoop112
           ssh-copy-id hadoop113
        (all the keys end up in the same authorized_keys file)
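
To confirm passwordless login works, a quick check from the machine that generated the key:

           ssh hadoop112 hostname    (should print hadoop112 without asking for a password)
           ssh hadoop113 hostname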

(diagram: how passwordless SSH login works)
1.2.8 Create hadoop and java directories on all three machines

         cd /usr/local    (enter the directory)
         ll               (list directory contents)
         mkdir hadoop
         mkdir java       (create the directories)
        (to do all three machines at once, use Tools -> "Send key input to all sessions" in Xshell, and turn it off when done)
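
Equivalently, both directories can be created in one command using standard shell brace expansion:

         mkdir -p /usr/local/{hadoop,java}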

1.2.9 Configure Hadoop on the master (this assumes the Hadoop and JDK tarballs have already been uploaded into the directories created above)

         cd hadoop/    (switch to the hadoop directory)
         ll            (list directory contents)
         tar -zxvf hadoop-2.9.2.tar.gz    (extract Hadoop)
         rm -rf hadoop-2.9.2.tar.gz       (delete the Hadoop tarball)
         cd ../java                       (switch to the java directory)
         tar -zxvf jdk-8u211-linux-x64.tar.gz    (extract the JDK)
         rm -rf jdk-8u211-linux-x64.tar.gz       (delete the JDK tarball)
         cd /usr/local/hadoop/hadoop-2.9.2/etc/hadoop/    (enter Hadoop's configuration directory)

hadoop-env.sh

         vim hadoop-env.sh
         On line 25, change the Java path:
         export JAVA_HOME=/usr/local/java/jdk1.8.0_211/

core-site.xml

         vim core-site.xml

<!-- URI (schema) of the file system Hadoop uses: the address of HDFS's NameNode -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.221.111:9000</value>
</property>
<!-- Directory where Hadoop stores files generated at runtime -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/hadoop-2.9.2/tmp</value>
</property>

hdfs-site.xml

         vim hdfs-site.xml

<!-- Number of HDFS replicas -->
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
<!-- Address of the SecondaryNameNode -->
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop113:50090</value>
</property>
<!-- Disable permission checks so directories can be written to -->
<property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
</property>

slaves (so start-dfs.sh starts all the worker nodes together)

         vim slaves    (run in the /usr/local/hadoop/hadoop-2.9.2/etc/hadoop/ directory)
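
Since all three nodes run a DataNode (see 1.2.15), the slaves file lists all three hostnames, one per line:

         hadoop111
         hadoop112
         hadoop113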



1.2.10 Configure environment variables on the master

         vim /etc/profile    (open the vim editor and add the paths)
         export JAVA_HOME=/usr/local/java/jdk1.8.0_211
         export HADOOP_HOME=/usr/local/hadoop/hadoop-2.9.2
         export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
         Apply the changes: source /etc/profile
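
To confirm the variables took effect, a quick check (the reported versions follow the packages installed above):

         java -version       (should report 1.8.0_211)
         hadoop version      (should report 2.9.2)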

1.2.11 Copy the master's Hadoop installation to the other two machines

          (run from the hadoop directory, /usr/local/hadoop)
          scp -r hadoop-2.9.2/  hadoop112:$PWD
          scp -r hadoop-2.9.2/  hadoop113:$PWD

1.2.12 Copy the master's Java installation to the other two machines

          (run from the java directory, /usr/local/java)
          scp -r jdk1.8.0_211/  hadoop112:$PWD
          scp -r jdk1.8.0_211/  hadoop113:$PWD

1.2.13 Copy the master's environment variables to the other two machines (apply them on each)

         scp -r /etc/profile hadoop112:/etc/    (run on the master)
         scp -r /etc/profile hadoop113:/etc/    (run on the master)
         Then apply the variables by running source /etc/profile on hadoop112 and hadoop113

1.2.14 Format and start HDFS on the master

         Format: hadoop namenode -format    (run from the home directory)
         Start:  start-dfs.sh

1.2.15 Run jps on all three machines to verify:

         hadoop111 (master): NameNode, DataNode, Jps
         hadoop112:          DataNode, Jps
         hadoop113:          SecondaryNameNode, DataNode, Jps
1.2.16 Visit http://192.168.222.111:50070/dfshealth.html#tab-datanode in a browser (e.g. Chrome)

2. HDFS Overview and Commands

2.1 HDFS Background and Definition

HDFS (Hadoop Distributed File System) is a distributed file system designed to store very large files across many machines; it follows a write-once, read-many access model and serves as Hadoop's storage layer.
2.2 HDFS Advantages and Disadvantages

Advantages: high fault tolerance (blocks are replicated and lost replicas are re-created automatically), a good fit for large data sets, and the ability to run on inexpensive commodity hardware.
Disadvantages: high access latency (unsuitable for millisecond-level reads), inefficient storage of large numbers of small files, and no support for concurrent writers or random in-place modification (append only).
2.3 HDFS Architecture

HDFS uses a master/slave architecture: the NameNode manages the file system namespace and metadata, DataNodes store the actual data blocks, the SecondaryNameNode periodically merges the NameNode's edit log into the namespace image, and clients contact the NameNode for metadata and DataNodes for data.
2.4 HDFS Block Size

Files in HDFS are split into blocks. The default block size is 128 MB in Hadoop 2.x (64 MB in 1.x) and can be changed through the dfs.blocksize property.
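
A rough calculation behind that default, assuming the commonly cited figures of a ~10 ms disk seek time and a ~100 MB/s transfer rate: if seeking should cost about 1% of a block read, the transfer should take 10 ms / 1% = 1 s, and 1 s * 100 MB/s = 100 MB, which is rounded up to the nearest power of two, 128 MB.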
2.5 HDFS Commands
2.5.1 hadoop fs

[root@hadoop111 hadoop]# hadoop fs
Usage: hadoop fs [generic options]
	[-appendToFile <localsrc> ... <dst>]
	[-cat [-ignoreCrc] <src> ...]
	[-checksum <src> ...]
	[-chgrp [-R] GROUP PATH...]
	[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
	[-chown [-R] [OWNER][:[GROUP]] PATH...]
	[-copyFromLocal [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
	[-copyToLocal [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-count [-q] [-h] [-v] [-t [<storage type>]] [-u] [-x] <path> ...]
	[-cp [-f] [-p | -p[topax]] [-d] <src> ... <dst>]
	[-createSnapshot <snapshotDir> [<snapshotName>]]
	[-deleteSnapshot <snapshotDir> <snapshotName>]
	[-df [-h] [<path> ...]]
	[-du [-s] [-h] [-x] <path> ...]
	[-expunge]
	[-find <path> ... <expression> ...]
	[-get [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-getfacl [-R] <path>]
	[-getfattr [-R] {-n name | -d} [-e en] <path>]
	[-getmerge [-nl] [-skip-empty-file] <src> <localdst>]
	[-help [cmd ...]]
	[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]]
	[-mkdir [-p] <path> ...]
	[-moveFromLocal <localsrc> ... <dst>]
	[-moveToLocal <src> <localdst>]
	[-mv <src> ... <dst>]
	[-put [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
	[-renameSnapshot <snapshotDir> <oldName> <newName>]
	[-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...]
	[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
	[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
	[-setfattr {-n name [-v value] | -x name} <path>]
	[-setrep [-R] [-w] <rep> <path> ...]
	[-stat [format] <path> ...]
	[-tail [-f] <file>]
	[-test -[defsz] <path>]
	[-text [-ignoreCrc] <src> ...]
	[-touchz <path> ...]
	[-truncate [-w] <length> <path> ...]
	[-usage [cmd ...]]

Generic options supported are:
-conf <configuration file>        specify an application configuration file
-D <property=value>               define a value for a given property
-fs <file:///|hdfs://namenode:port> specify default filesystem URL to use, overrides 'fs.defaultFS' property from configurations.
-jt <local|resourcemanager:port>  specify a ResourceManager
-files <file1,...>                specify a comma-separated list of files to be copied to the map reduce cluster
-libjars <jar1,...>               specify a comma-separated list of jar files to be included in the classpath
-archives <archive1,...>          specify a comma-separated list of archives to be unarchived on the compute machines

The general command line syntax is:
command [genericOptions] [commandOptions]

2.5.2 -help: print the parameters of a command

[root@hadoop111 hadoop]# hadoop -help
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  CLASSNAME            run the class named CLASSNAME
 or
  where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
                       note: please use "yarn jar" to launch
                             YARN applications, not this command.
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings

Most commands print help when invoked w/o parameters.

2.5.3 -ls: list directory contents

       [root@hadoop111 hadoop]# hadoop fs -ls /

2.5.4 -mkdir: create a directory on HDFS

       [root@hadoop111 hadoop]# hadoop fs -mkdir -p /sanguo/shuguo

2.5.5 -moveFromLocal: cut and paste from the local file system to HDFS

       [root@hadoop111 hadoop]# hadoop fs  -moveFromLocal  ./kongming.txt  /sanguo/shuguo

2.5.6 -appendToFile: append a local file to the end of an existing HDFS file

       [root@hadoop111 hadoop]# hadoop fs -appendToFile liubei.txt /sanguo/shuguo/kongming.txt

2.5.7 -cat: display a file's contents

       [root@hadoop111 hadoop]# hadoop fs -cat /sanguo/shuguo/kongming.txt

2.5.8 -chgrp, -chmod, -chown: same usage as in the Linux file system; change a file's group, permissions, or owner

       [root@hadoop111 hadoop]# hadoop fs  -chmod  666  /sanguo/shuguo/kongming.txt
       [root@hadoop111 hadoop]# hadoop fs  -chown  atguigu:atguigu   /sanguo/shuguo/kongming.txt

2.5.9 -copyFromLocal: copy a file from the local file system to an HDFS path

       [root@hadoop111 hadoop]# hadoop fs -copyFromLocal README.txt /

2.5.10 -copyToLocal: copy from HDFS to the local file system

       [root@hadoop111 hadoop]# hadoop fs -copyToLocal /sanguo/shuguo/kongming.txt ./

2.5.11 -cp: copy from one HDFS path to another

       [root@hadoop111 hadoop]# hadoop fs -cp /sanguo/shuguo/kongming.txt /zhuge.txt

2.5.12 -mv: move files within HDFS

      [root@hadoop111 hadoop]# hadoop fs -mv /zhuge.txt /sanguo/shuguo/

2.5.13 -get: equivalent to copyToLocal; download a file from HDFS to the local file system

      [root@hadoop111 hadoop]# hadoop fs -get /sanguo/shuguo/kongming.txt ./

2.5.14 -getmerge: merge and download multiple files; for example, the HDFS directory /user/atguigu/test contains several files: log.1, log.2, log.3, ...

      [root@hadoop111 hadoop]# hadoop fs -getmerge /user/atguigu/test/* ./zaiyiqi.txt

2.5.15 -put: equivalent to copyFromLocal

      [root@hadoop111 hadoop]# hadoop fs -put ./zaiyiqi.txt /user/atguigu/test/

2.5.16 -tail: show the end of a file

      [root@hadoop111 hadoop]# hadoop fs -tail /sanguo/shuguo/kongming.txt

2.5.17 -rm: delete files

      [root@hadoop111 hadoop]# hadoop fs -rm /user/atguigu/test/jinlian2.txt

2.5.18 -rmdir: delete an empty directory

      [root@hadoop111 hadoop]# hadoop fs -rmdir /test

2.5.19 -du: summarize directory sizes

      [root@hadoop111 hadoop]# hadoop fs -du  -h /user/atguigu/test

2.5.20 -setrep: set the replication factor of a file in HDFS

      [root@hadoop111 hadoop]# hadoop fs -setrep 10 /sanguo/shuguo/kongming.txt
      The replication factor set here is only recorded in the NameNode's metadata; whether that many replicas actually exist depends on the number of DataNodes. Since there are currently only 3 machines, there can be at most 3 replicas; only when the cluster grows to 10 nodes can the replica count actually reach 10.
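
To inspect the factor recorded for a file, the -stat command's %r format specifier prints the replication factor (a quick sketch using the file from above):

      [root@hadoop111 hadoop]# hadoop fs -stat %r /sanguo/shuguo/kongming.txt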

Reposted from blog.csdn.net/weixin_45553177/article/details/104159594