Hadoop Study Notes: Installing Hadoop (Pseudo-Distributed)

Pseudo-distributed installation of Hadoop, following the official guide: http://hadoop.apache.org/docs/r2.7.5/hadoop-project-dist/hadoop-common/SingleCluster.html#Pseudo-Distributed_Operation

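Configuration files

# All of Hadoop's configuration files are listed here; both pseudo-distributed and fully distributed modes are set up through these files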
# $HADOOP_HOME/etc/hadoop/
├── capacity-scheduler.xml
├── configuration.xsl
├── container-executor.cfg
├── core-site.xml
├── hadoop-env.cmd
├── hadoop-env.sh
├── hadoop-metrics2.properties
├── hadoop-metrics.properties
├── hadoop-policy.xml
├── hdfs-site.xml
├── httpfs-env.sh
├── httpfs-log4j.properties
├── httpfs-signature.secret
├── httpfs-site.xml
├── kms-acls.xml
├── kms-env.sh
├── kms-log4j.properties
├── kms-site.xml
├── log4j.properties
├── mapred-env.cmd
├── mapred-env.sh
├── mapred-queues.xml.template
├── mapred-site.xml.template
├── slaves
├── ssl-client.xml.example
├── ssl-server.xml.example
├── yarn-env.sh
└── yarn-site.xml

Configuring HDFS

# etc/hadoop/core-site.xml
<configuration>
    <!-- Map the hostname to the host's IP address in /etc/hosts -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://v108.zlikun.com:9000</value>
    </property>
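    <!-- hadoop.tmp.dir is the base directory for Hadoop's file system data. It defaults to /tmp/hadoop-${user.name}; many Linux distributions clear /tmp on reboot, which would force a re-format, so a persistent directory should be configured -->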
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/var/hadoop/tmp</value>
    </property>
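    <!-- Disable file system permission checks; for non-production use only. Note: dfs.permissions is the deprecated name of dfs.permissions.enabled, an HDFS setting that normally belongs in hdfs-site.xml; the format log below still reports isPermissionEnabled = true -->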
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>

# etc/hadoop/hdfs-site.xml
<configuration>
    <!-- Number of block replicas; the default is 3. Setting it to 1 stores each block once, with no extra copies -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <!-- RPC address of the NameNode -->
    <property>
        <name>dfs.namenode.rpc-address</name>
        <value>v108.zlikun.com:9000</value>
    </property>
    <!-- Bind to all interfaces so that HDFS is reachable from other machines -->
    <property>
        <name>dfs.namenode.rpc-bind-host</name>
        <value>0.0.0.0</value>
    </property>
</configuration>

# Full lists of the properties available in the files above:
# http://hadoop.apache.org/docs/r2.7.5/hadoop-project-dist/hadoop-common/core-default.xml
# http://hadoop.apache.org/docs/r2.7.5/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
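
To double-check what was actually written to these files, a single property value can be pulled out with a few lines of shell. This is only a sketch: `get_prop` is a hypothetical helper, not part of Hadoop, and it assumes each `<name>` and `<value>` tag sits on its own line, as in the listings above. On a working installation, `bin/hdfs getconf -confKey fs.defaultFS` reports the value Hadoop itself resolves.

```shell
# get_prop FILE NAME: print the <value> that follows <name>NAME</name>.
# Hypothetical helper; assumes one tag per line (not a general XML parser).
get_prop() {
  awk -v name="$2" '
    /<name>/  { gsub(/.*<name>|<\/name>.*/, ""); current = $0; next }
    /<value>/ && current == name { gsub(/.*<value>|<\/value>.*/, ""); print; exit }
  ' "$1"
}

# Example: get_prop etc/hadoop/core-site.xml fs.defaultFS
```

An unknown property name prints nothing, which makes a missing setting easy to spot in a script.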

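# The start/stop scripts (sbin/start-dfs.sh and so on) log in to each node over SSH to start and stop the daemons there, so these logins must work without a password
# Set up passwordless SSH login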
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
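
Whether the key setup above really removed the password prompt can be checked non-interactively. The sketch below wraps `ssh` in a hypothetical helper, `can_ssh_without_password`; `BatchMode=yes` makes `ssh` fail rather than prompt, and the 5-second timeout is an arbitrary choice.

```shell
# can_ssh_without_password HOST: succeed only if key-based login works.
# BatchMode=yes makes ssh fail instead of prompting for a password.
can_ssh_without_password() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true 2>/dev/null
}

# Example: can_ssh_without_password localhost && echo "passwordless SSH OK"
```

In batch mode `ssh` also refuses to ask about an unknown host key, so the host key should be accepted once beforehand (for example by running `ssh localhost` interactively).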

# Format the NameNode. The format succeeded if the log contains the line `Storage directory /var/hadoop/tmp/dfs/name has been successfully formatted.`
$ bin/hdfs namenode -format
18/01/30 08:50:38 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = v108.zlikun.com/192.168.1.108
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.7.5
STARTUP_MSG:   classpath = /opt/hadoop/etc/hadoop:... ...:/opt/hadoop/contrib/capacity-scheduler/*.jar
STARTUP_MSG:   build = https://shv@git-wip-us.apache.org/repos/asf/hadoop.git -r 18065c2b6806ed4aa6a3187d77cbe21bb3dba075; compiled by 'kshvachk' on 2017-12-16T01:06Z
STARTUP_MSG:   java = 1.8.0_151
************************************************************/
18/01/30 08:50:38 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
18/01/30 08:50:38 INFO namenode.NameNode: createNameNode [-format]
Formatting using clusterid: CID-a8cec172-6f1b-4aa5-8f73-f99ad2bb29b2
18/01/30 08:50:38 INFO namenode.FSNamesystem: No KeyProvider found.
18/01/30 08:50:38 INFO namenode.FSNamesystem: fsLock is fair: true
18/01/30 08:50:38 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false
18/01/30 08:50:39 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
18/01/30 08:50:39 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
18/01/30 08:50:39 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
18/01/30 08:50:39 INFO blockmanagement.BlockManager: The block deletion will start around 2018 Jan 30 08:50:39
18/01/30 08:50:39 INFO util.GSet: Computing capacity for map BlocksMap
18/01/30 08:50:39 INFO util.GSet: VM type       = 64-bit
18/01/30 08:50:39 INFO util.GSet: 2.0% max memory 966.7 MB = 19.3 MB
18/01/30 08:50:39 INFO util.GSet: capacity      = 2^21 = 2097152 entries
18/01/30 08:50:39 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
18/01/30 08:50:39 INFO blockmanagement.BlockManager: defaultReplication         = 1
18/01/30 08:50:39 INFO blockmanagement.BlockManager: maxReplication             = 512
18/01/30 08:50:39 INFO blockmanagement.BlockManager: minReplication             = 1
18/01/30 08:50:39 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
18/01/30 08:50:39 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
18/01/30 08:50:39 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
18/01/30 08:50:39 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
18/01/30 08:50:39 INFO namenode.FSNamesystem: fsOwner             = root (auth:SIMPLE)
18/01/30 08:50:39 INFO namenode.FSNamesystem: supergroup          = supergroup
18/01/30 08:50:39 INFO namenode.FSNamesystem: isPermissionEnabled = true
18/01/30 08:50:39 INFO namenode.FSNamesystem: HA Enabled: false
18/01/30 08:50:39 INFO namenode.FSNamesystem: Append Enabled: true
18/01/30 08:50:39 INFO util.GSet: Computing capacity for map INodeMap
18/01/30 08:50:39 INFO util.GSet: VM type       = 64-bit
18/01/30 08:50:39 INFO util.GSet: 1.0% max memory 966.7 MB = 9.7 MB
18/01/30 08:50:39 INFO util.GSet: capacity      = 2^20 = 1048576 entries
18/01/30 08:50:39 INFO namenode.FSDirectory: ACLs enabled? false
18/01/30 08:50:39 INFO namenode.FSDirectory: XAttrs enabled? true
18/01/30 08:50:39 INFO namenode.FSDirectory: Maximum size of an xattr: 16384
18/01/30 08:50:39 INFO namenode.NameNode: Caching file names occuring more than 10 times
18/01/30 08:50:39 INFO util.GSet: Computing capacity for map cachedBlocks
18/01/30 08:50:39 INFO util.GSet: VM type       = 64-bit
18/01/30 08:50:39 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
18/01/30 08:50:39 INFO util.GSet: capacity      = 2^18 = 262144 entries
18/01/30 08:50:39 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
18/01/30 08:50:39 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
18/01/30 08:50:39 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
18/01/30 08:50:39 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
18/01/30 08:50:39 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
18/01/30 08:50:39 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
18/01/30 08:50:39 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
18/01/30 08:50:39 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
18/01/30 08:50:39 INFO util.GSet: Computing capacity for map NameNodeRetryCache
18/01/30 08:50:39 INFO util.GSet: VM type       = 64-bit
18/01/30 08:50:39 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
18/01/30 08:50:39 INFO util.GSet: capacity      = 2^15 = 32768 entries
18/01/30 08:50:39 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1709412250-192.168.1.108-1517320239405
18/01/30 08:50:39 INFO common.Storage: Storage directory /var/hadoop/tmp/dfs/name has been successfully formatted.
18/01/30 08:50:39 INFO namenode.FSImageFormatProtobuf: Saving image file /var/hadoop/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
18/01/30 08:50:39 INFO namenode.FSImageFormatProtobuf: Image file /var/hadoop/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 321 bytes saved in 0 seconds.
18/01/30 08:50:39 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
18/01/30 08:50:39 INFO util.ExitUtil: Exiting with status 0
18/01/30 08:50:39 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at v108.zlikun.com/192.168.1.108
************************************************************/

# Inspect the formatted directory
$ ls -l /var/hadoop/tmp/
total 0
drwxr-xr-x. 3 root root 18 Jan 30 08:50 dfs
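
Rather than scanning the long format log by eye, the success line can be searched for. A minimal sketch, assuming the output was saved to a file; `format_succeeded` and the file name `format.log` are hypothetical.

```shell
# format_succeeded LOGFILE: succeed if the log reports a formatted storage directory.
format_succeeded() {
  grep -q 'Storage directory .* has been successfully formatted' "$1"
}

# Example (on the Hadoop host; format.log is a hypothetical file name):
#   bin/hdfs namenode -format 2>&1 | tee format.log
#   format_succeeded format.log && echo "NameNode formatted"
```

The `2>&1` is there because Hadoop's console logging normally goes to standard error. Formatting discards any existing NameNode metadata, so it is meant for a fresh installation.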

Running HDFS

# Start HDFS; three processes should be running afterwards (in pseudo-distributed mode all daemons run on the same machine)
$ sbin/start-dfs.sh
$ jps
9091 DataNode
9242 SecondaryNameNode
8973 NameNode
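
The check on the `jps` listing can also be scripted. The sketch below reads `jps` output on standard input and prints every expected daemon that is absent; `missing_daemons` is a hypothetical helper, and it assumes the usual two-column `PID Name` output shown above.

```shell
# missing_daemons NAME...: read jps output on stdin, print each NAME not found.
# Compares the second column exactly, so NameNode does not match SecondaryNameNode.
missing_daemons() {
  jps_output=$(cat)
  for name in "$@"; do
    printf '%s\n' "$jps_output" |
      awk -v n="$name" '$2 == n { found = 1 } END { exit !found }' ||
      echo "$name"
  done
}

# Example: jps | missing_daemons NameNode DataNode SecondaryNameNode
```

Empty output means all listed daemons are running. If one is missing, its log file under the Hadoop logs directory is the first place to look.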

# The HDFS status page should now be reachable in a browser at:
# http://192.168.1.108:50070/

# If the page is unreachable, the firewall may be blocking port 50070; here the firewall is simply stopped (do not do this in production)
$ firewall-cmd --state
running
$ systemctl stop firewalld
$ firewall-cmd --state    
not running
# Also disable the firewall so that it does not start at boot
$ systemctl disable firewalld
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
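
A less drastic alternative to switching firewalld off is to open only the ports that are needed. The sketch below is a dry run: `print_open_port_cmds` is a hypothetical helper that only prints the `firewall-cmd` calls, so they can be reviewed before being run as root. The port list in the example (50070 for the HDFS web UI, 8088 for the YARN web UI) is taken from these notes; port 9000 would also be needed if HDFS clients connect from other machines.

```shell
# print_open_port_cmds PORT...: print (do not run) firewalld commands that
# open each TCP port permanently, followed by a reload.
print_open_port_cmds() {
  for port in "$@"; do
    printf 'firewall-cmd --permanent --add-port=%s/tcp\n' "$port"
  done
  printf 'firewall-cmd --reload\n'
}

# Example: review the output, then run it as root:
#   print_open_port_cmds 50070 8088 | sh
```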

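# Create a directory in HDFS (user home directories live under /user; the root account is used here, so the directory is /user/root)
$ bin/hdfs dfs -mkdir -p /user/root
# Upload a local file to HDFS (the relative path lang.txt resolves to /user/root/lang.txt)
$ bin/hdfs dfs -put input/lang.txt lang.txt
# List the uploaded file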
$ bin/hdfs dfs -ls /user/root
Found 1 items
-rw-r--r--   1 root supergroup         59 2018-01-30 09:01 /user/root/lang.txt

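# Run the word-count example; this time the input file is in HDFS. YARN is not configured yet, so the job runs in the local job runner (note the job_local... ID in the log)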
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.5.jar wordcount lang.txt output
18/01/30 09:06:01 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
18/01/30 09:06:01 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
18/01/30 09:06:01 INFO input.FileInputFormat: Total input paths to process : 1
18/01/30 09:06:01 INFO mapreduce.JobSubmitter: number of splits:1
18/01/30 09:06:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local147097953_0001
18/01/30 09:06:02 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
18/01/30 09:06:02 INFO mapreduce.Job: Running job: job_local147097953_0001
18/01/30 09:06:02 INFO mapred.LocalJobRunner: OutputCommitter set in config null
18/01/30 09:06:02 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/01/30 09:06:02 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
18/01/30 09:06:02 INFO mapred.LocalJobRunner: Waiting for map tasks
18/01/30 09:06:02 INFO mapred.LocalJobRunner: Starting task: attempt_local147097953_0001_m_000000_0
18/01/30 09:06:02 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/01/30 09:06:02 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/01/30 09:06:02 INFO mapred.MapTask: Processing split: hdfs://v108.zlikun.com:9000/user/root/lang.txt:0+59
18/01/30 09:06:02 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/01/30 09:06:02 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/01/30 09:06:02 INFO mapred.MapTask: soft limit at 83886080
18/01/30 09:06:02 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/01/30 09:06:02 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/01/30 09:06:02 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/01/30 09:06:02 INFO mapred.LocalJobRunner: 
18/01/30 09:06:02 INFO mapred.MapTask: Starting flush of map output
18/01/30 09:06:02 INFO mapred.MapTask: Spilling map output
18/01/30 09:06:02 INFO mapred.MapTask: bufstart = 0; bufend = 99; bufvoid = 104857600
18/01/30 09:06:02 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214360(104857440); length = 37/6553600
18/01/30 09:06:02 INFO mapred.MapTask: Finished spill 0
18/01/30 09:06:02 INFO mapred.Task: Task:attempt_local147097953_0001_m_000000_0 is done. And is in the process of committing
18/01/30 09:06:02 INFO mapred.LocalJobRunner: map
18/01/30 09:06:02 INFO mapred.Task: Task 'attempt_local147097953_0001_m_000000_0' done.
18/01/30 09:06:02 INFO mapred.Task: Final Counters for attempt_local147097953_0001_m_000000_0: Counters: 23
        File System Counters
                FILE: Number of bytes read=296004
                FILE: Number of bytes written=586165
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=59
                HDFS: Number of bytes written=0
                HDFS: Number of read operations=5
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=1
        Map-Reduce Framework
                Map input records=1
                Map output records=10
                Map output bytes=99
                Map output materialized bytes=92
                Input split bytes=111
                Combine input records=10
                Combine output records=7
                Spilled Records=7
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=18
                Total committed heap usage (bytes)=165744640
        File Input Format Counters 
                Bytes Read=59
18/01/30 09:06:02 INFO mapred.LocalJobRunner: Finishing task: attempt_local147097953_0001_m_000000_0
18/01/30 09:06:02 INFO mapred.LocalJobRunner: map task executor complete.
18/01/30 09:06:02 INFO mapred.LocalJobRunner: Waiting for reduce tasks
18/01/30 09:06:02 INFO mapred.LocalJobRunner: Starting task: attempt_local147097953_0001_r_000000_0
18/01/30 09:06:02 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/01/30 09:06:02 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/01/30 09:06:02 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@8fe6e1c
18/01/30 09:06:02 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=363285696, maxSingleShuffleLimit=90821424, mergeThreshold=239768576, ioSortFactor=10, memToMemMergeOutputsThreshold=10
18/01/30 09:06:02 INFO reduce.EventFetcher: attempt_local147097953_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
18/01/30 09:06:02 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local147097953_0001_m_000000_0 decomp: 88 len: 92 to MEMORY
18/01/30 09:06:02 INFO reduce.InMemoryMapOutput: Read 88 bytes from map-output for attempt_local147097953_0001_m_000000_0
18/01/30 09:06:02 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 88, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->88
18/01/30 09:06:02 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
18/01/30 09:06:02 INFO mapred.LocalJobRunner: 1 / 1 copied.
18/01/30 09:06:02 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
18/01/30 09:06:02 INFO mapred.Merger: Merging 1 sorted segments
18/01/30 09:06:02 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 79 bytes
18/01/30 09:06:02 INFO reduce.MergeManagerImpl: Merged 1 segments, 88 bytes to disk to satisfy reduce memory limit
18/01/30 09:06:02 INFO reduce.MergeManagerImpl: Merging 1 files, 92 bytes from disk
18/01/30 09:06:02 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
18/01/30 09:06:02 INFO mapred.Merger: Merging 1 sorted segments
18/01/30 09:06:02 WARN io.ReadaheadPool: Failed readahead on ifile
EBADF: Bad file descriptor
        at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native Method)
        at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
        at org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
        at org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
18/01/30 09:06:02 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 79 bytes
18/01/30 09:06:02 INFO mapred.LocalJobRunner: 1 / 1 copied.
18/01/30 09:06:02 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
18/01/30 09:06:02 INFO mapred.Task: Task:attempt_local147097953_0001_r_000000_0 is done. And is in the process of committing
18/01/30 09:06:02 INFO mapred.LocalJobRunner: 1 / 1 copied.
18/01/30 09:06:02 INFO mapred.Task: Task attempt_local147097953_0001_r_000000_0 is allowed to commit now
18/01/30 09:06:02 INFO output.FileOutputCommitter: Saved output of task 'attempt_local147097953_0001_r_000000_0' to hdfs://v108.zlikun.com:9000/user/root/output/_temporary/0/task_local147097953_0001_r_000000
18/01/30 09:06:02 INFO mapred.LocalJobRunner: reduce > reduce
18/01/30 09:06:02 INFO mapred.Task: Task 'attempt_local147097953_0001_r_000000_0' done.
18/01/30 09:06:02 INFO mapred.Task: Final Counters for attempt_local147097953_0001_r_000000_0: Counters: 29
        File System Counters
                FILE: Number of bytes read=296220
                FILE: Number of bytes written=586257
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=59
                HDFS: Number of bytes written=58
                HDFS: Number of read operations=8
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=3
        Map-Reduce Framework
                Combine input records=0
                Combine output records=0
                Reduce input groups=7
                Reduce shuffle bytes=92
                Reduce input records=7
                Reduce output records=7
                Spilled Records=7
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=5
                Total committed heap usage (bytes)=165744640
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Output Format Counters 
                Bytes Written=58
18/01/30 09:06:02 INFO mapred.LocalJobRunner: Finishing task: attempt_local147097953_0001_r_000000_0
18/01/30 09:06:02 INFO mapred.LocalJobRunner: reduce task executor complete.
18/01/30 09:06:03 INFO mapreduce.Job: Job job_local147097953_0001 running in uber mode : false
18/01/30 09:06:03 INFO mapreduce.Job:  map 100% reduce 100%
18/01/30 09:06:03 INFO mapreduce.Job: Job job_local147097953_0001 completed successfully
18/01/30 09:06:03 INFO mapreduce.Job: Counters: 35
        File System Counters
                FILE: Number of bytes read=592224
                FILE: Number of bytes written=1172422
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=118
                HDFS: Number of bytes written=58
                HDFS: Number of read operations=13
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=4
        Map-Reduce Framework
                Map input records=1
                Map output records=10
                Map output bytes=99
                Map output materialized bytes=92
                Input split bytes=111
                Combine input records=10
                Combine output records=7
                Reduce input groups=7
                Reduce shuffle bytes=92
                Reduce input records=7
                Reduce output records=7
                Spilled Records=14
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=23
                Total committed heap usage (bytes)=331489280
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=59
        File Output Format Counters 
                Bytes Written=58

# View the results
$ bin/hdfs dfs -cat output/*
erlang  1
golang  1
java    3
javascript      1
lua     1
ruby    1
rust    2
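
As a sanity check, the same counts can be reproduced locally with standard tools. The pipeline below is a sketch of what the word-count example computes (split on whitespace, count each word, sort by word); `wordcount_local` is a hypothetical helper and is not how Hadoop implements it.

```shell
# wordcount_local: read text on stdin, print "word<TAB>count" sorted by word.
wordcount_local() {
  tr -s '[:space:]' '\n' | sed '/^$/d' | LC_ALL=C sort | uniq -c |
    awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Example: wordcount_local < input/lang.txt
```

`LC_ALL=C` forces byte-order sorting, which keeps the ordering independent of the local locale settings. The result can be compared with the `hdfs dfs -cat output/*` listing above.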

# Stop HDFS
$ sbin/stop-dfs.sh 
Stopping namenodes on [v108.zlikun.com]
v108.zlikun.com: stopping namenode
localhost: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode

Configuring YARN

# Copy mapred-site.xml.template to mapred-site.xml
# etc/hadoop/mapred-site.xml: have MapReduce jobs scheduled by YARN
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

# etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- In Hadoop 2.7.5 the default is 8192 MB of memory; lowered to 4096 MB here (must not be less than 1024 MB) -->
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>4096</value>
    </property>
    <!-- In Hadoop 2.7.5 the default is 8 vcores; set to 2 here because this virtual machine was given 2 cores -->
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>2</value>
    </property>
    <!-- Enable log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <!-- Log retention time in seconds; 604800 s = 7 days -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
    <!-- Log directory; note that the aggregated logs are stored in HDFS -->
    <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/tmp/logs</value>
    </property>
</configuration>

# Full lists of the properties available in the files above:
# http://hadoop.apache.org/docs/r2.7.5/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
# http://hadoop.apache.org/docs/r2.7.5/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

Running YARN

# Start HDFS and YARN
$ sbin/start-dfs.sh
$ sbin/start-yarn.sh

# Check the processes (there should be five)
$ jps
10977 NameNode
11523 NodeManager
11101 DataNode
11262 SecondaryNameNode
11407 ResourceManager

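# Likewise, YARN's status page can be viewed in a browser:
# http://192.168.1.108:8088/cluster

# Delete the previously generated output (a MapReduce job's output directory must not already exist) before re-running the word-count example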
$ bin/hdfs dfs -rm -r output
18/01/30 09:10:18 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted output

# Re-run the word-count example; this time it is scheduled by YARN
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.5.jar wordcount lang.txt output
18/01/30 10:18:28 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/01/30 10:18:29 INFO input.FileInputFormat: Total input paths to process : 1
18/01/30 10:18:29 INFO mapreduce.JobSubmitter: number of splits:1
18/01/30 10:18:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1517321368440_0002
18/01/30 10:18:29 INFO impl.YarnClientImpl: Submitted application application_1517321368440_0002
18/01/30 10:18:29 INFO mapreduce.Job: The url to track the job: http://v108.zlikun.com:8088/proxy/application_1517321368440_0002/
18/01/30 10:18:29 INFO mapreduce.Job: Running job: job_1517321368440_0002
18/01/30 10:18:38 INFO mapreduce.Job: Job job_1517321368440_0002 running in uber mode : false
18/01/30 10:18:38 INFO mapreduce.Job:  map 0% reduce 0%
18/01/30 10:18:44 INFO mapreduce.Job:  map 100% reduce 0%
18/01/30 10:18:49 INFO mapreduce.Job:  map 100% reduce 100%
18/01/30 10:18:50 INFO mapreduce.Job: Job job_1517321368440_0002 completed successfully
18/01/30 10:18:50 INFO mapreduce.Job: Counters: 49
        File System Counters
                ... ...
        Job Counters 
                ... ...
        Map-Reduce Framework
                Map input records=1
                Map output records=10
                Map output bytes=99
                Map output materialized bytes=92
                Input split bytes=111
                Combine input records=10
                Combine output records=7
                Reduce input groups=7
                Reduce shuffle bytes=92
                Reduce input records=7
                Reduce output records=7
                Spilled Records=14
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=146
                CPU time spent (ms)=1900
                Physical memory (bytes) snapshot=328867840
                Virtual memory (bytes) snapshot=4159520768
                Total committed heap usage (bytes)=219676672
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=59
        File Output Format Counters 
                Bytes Written=58
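
With log aggregation enabled, the container logs of a finished job can be fetched with `bin/yarn logs -applicationId <id>`. The sketch below pulls the application ID out of saved job client output; `extract_app_id` and the file name `job.log` are hypothetical.

```shell
# extract_app_id: read job client output on stdin, print the first YARN application ID.
extract_app_id() {
  grep -o 'application_[0-9]*_[0-9]*' | head -n 1
}

# Example (job.log is a hypothetical capture of the job client output):
#   app_id=$(extract_app_id < job.log)
#   bin/yarn logs -applicationId "$app_id"
```

The same ID appears in the tracking URL printed by the job client, so it can also be copied from there by hand.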

UI preview

Screenshot: HDFS administration UI
Screenshot: HDFS file browser
Screenshot: YARN monitoring UI

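Since this post is essentially a set of notes, it contains many comments, and some of them may be wrong. Readers are welcome to point out mistakes in the comments section so that I can correct them.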

Reposted from my.oschina.net/zhanglikun/blog/1615849