Hadoop Installation Notes (1)

I. Hadoop Basics

1. Pseudo-Distributed Mode (Single Node)

1.1 Configure environment variables for CentOS 7's default JDK 1.7

[root@master1 ~]# vim /etc/profile.d/java.sh
export JAVA_HOME=/usr

JAVA_HOME can be set to /usr here because the OpenJDK launcher is installed as /usr/bin/java, so $JAVA_HOME/bin/java resolves correctly.

[root@master1 ~]# source /etc/profile.d/java.sh

Install the jdk-devel package (it provides JDK tools such as jps, used below):
[root@master1 ~]# yum install java-1.7.0-openjdk-devel.x86_64
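
To verify the JDK and its tools are in place (the exact build string will vary):
[root@master1 ~]# java -version
[root@master1 ~]# javac -version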

1.2 Create the Hadoop directory and extract Hadoop into it

[root@master1 ~]# mkdir /bdapps
[root@master1 ~]# tar xf hadoop-2.6.2.tar.gz -C /bdapps/

[root@master1 ~]# cd /bdapps/
Create a symlink:
[root@master1 bdapps]# ln -sv hadoop-2.6.2 hadoop
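
To confirm the symlink resolves to the release directory:
[root@master1 bdapps]# ls -ld hadoop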

1.3 Set Hadoop environment variables

[root@master1 hadoop]# vim /etc/profile.d/hadoop.sh

export HADOOP_PREFIX=/bdapps/hadoop
export PATH=$PATH:${HADOOP_PREFIX}/bin:${HADOOP_PREFIX}/sbin
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}

Reload the file:
[root@master1 ~]# source /etc/profile.d/hadoop.sh
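
As a quick sanity check that the PATH now includes the Hadoop binaries:
[root@master1 ~]# hadoop version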

1.4 Create the users and directories for running the Hadoop processes

Create the group:
[root@master1 ~]# groupadd hadoop
Create the users and assign them to the hadoop group:
[root@master1 ~]# useradd -g hadoop yarn
[root@master1 ~]# useradd -g hadoop hdfs
[root@master1 ~]# useradd -g hadoop mapred

Create the data directories:
[root@master1 ~]# mkdir -pv /data/hadoop/hdfs/{nn,snn,dn}
Set ownership on the data directories:
[root@master1 ~]# chown -R hdfs:hadoop /data/hadoop/hdfs
[root@master1 ~]# ll /data/hadoop/hdfs
total 0
drwxr-xr-x 2 hdfs hadoop 6 Apr 19 08:44 dn
drwxr-xr-x 2 hdfs hadoop 6 Apr 19 08:44 nn
drwxr-xr-x 2 hdfs hadoop 6 Apr 19 08:44 snn

Create the log directory and give the group write access (inside the installation directory):
[root@master1 ~]# cd /bdapps/hadoop
[root@master1 hadoop]# mkdir logs
[root@master1 hadoop]# chmod g+w logs/
[root@master1 hadoop]# chown -R yarn:hadoop logs
[root@master1 hadoop]# ll | grep log
drwxrwxr-x 2 yarn  hadoop     6 Apr 19 08:47 logs

Change the owner and group of the installation directory's contents:
[root@master1 hadoop]# chown -R yarn:hadoop ./*

1.5 Configure Hadoop

Configure the NameNode address (the default filesystem):
[root@master1 hadoop]# pwd
/bdapps/hadoop/etc/hadoop
[root@master1 hadoop]# vim core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:8020</value>
        <final>true</final>
    </property>
</configuration>
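
fs.defaultFS names the NameNode's RPC endpoint (8020 is the conventional NameNode RPC port), and <final>true</final> keeps job configurations from overriding it. Once saved, Hadoop should echo the configured value back:
[root@master1 hadoop]# hdfs getconf -confKey fs.defaultFS
hdfs://localhost:8020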

Configure HDFS-related properties:
[root@master1 hadoop]# vim hdfs-site.xml 

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///data/hadoop/hdfs/nn</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///data/hadoop/hdfs/dn</value>
    </property>
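    <!-- fs.checkpoint.dir and fs.checkpoint.edits.dir below are the legacy
         (Hadoop 1.x) property names; Hadoop 2.x prefers
         dfs.namenode.checkpoint.dir and dfs.namenode.checkpoint.edits.dir,
         though the old names are still honored. -->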
    <property>
        <name>fs.checkpoint.dir</name>
        <value>file:///data/hadoop/hdfs/snn</value>
    </property>
    <property>
        <name>fs.checkpoint.edits.dir</name>
        <value>file:///data/hadoop/hdfs/snn</value>
    </property>
</configuration>

Configure mapred (MapReduce). Setting mapreduce.framework.name to yarn submits MapReduce jobs to the YARN cluster instead of running them in the local job runner:
[root@master1 hadoop]# cp mapred-site.xml.template mapred-site.xml
[root@master1 hadoop]# vim mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

Configure YARN:
[root@master1 hadoop]# vim yarn-site.xml 

<configuration>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>localhost:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>localhost:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>localhost:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>localhost:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>10.201.106.131:8088</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    </property>
</configuration>

1.6 Define the slave nodes. In pseudo-distributed mode the default slave is the node itself, so nothing needs to change (on a real cluster you would list one slave hostname per line):

[root@master1 hadoop]# cat slaves 
localhost

1.7 Format HDFS

Switch to the hdfs user:
[root@master1 ~]# su - hdfs

View the hdfs command's help:
[hdfs@master1 ~]$ hdfs --help

Format the NameNode:
[hdfs@master1 ~]$ hdfs namenode -format
Inspect the result:
[hdfs@master1 ~]$ ls /data/hadoop/hdfs/nn/current/
fsimage_0000000000000000000      seen_txid
fsimage_0000000000000000000.md5  VERSION
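
The VERSION file records the newly generated namespaceID and clusterID; DataNodes that register later must carry the same clusterID:
[hdfs@master1 ~]$ cat /data/hadoop/hdfs/nn/current/VERSION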

1.8 Start Hadoop

1.8.1 Start the HDFS daemons

Start the daemons as the hdfs user.
Start the NameNode:
[hdfs@master1 ~]$ hadoop-daemon.sh start namenode

Check the Java processes:
[hdfs@master1 ~]$ jps
9127 NameNode
9220 Jps
Show detailed Java process information:
[hdfs@master1 ~]$ jps -v

Start the SecondaryNameNode:
[hdfs@master1 ~]$ hadoop-daemon.sh start secondarynamenode

Start the DataNode:
[hdfs@master1 ~]$ hadoop-daemon.sh start datanode
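
All three HDFS daemons should now appear in the jps output (PIDs will vary):
[hdfs@master1 ~]$ jps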

Test uploading a file to HDFS:
[hdfs@master1 ~]$ hdfs dfs -mkdir /test
[hdfs@master1 ~]$ hdfs dfs -put /etc/fstab /test/fstab
[hdfs@master1 ~]$ hdfs dfs -ls /test
Found 1 items
-rw-r--r--   1 hdfs supergroup       1065 2018-04-20 15:04 /test/fstab
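
Reading the file back from HDFS should reproduce the local /etc/fstab exactly:
[hdfs@master1 ~]$ hdfs dfs -cat /test/fstab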

This block file is the fstab file that was just uploaded; the DataNode stores it verbatim as a block on the local filesystem:
[hdfs@master1 ~]$ cat /data/hadoop/hdfs/dn/current/BP-908063675-10.201.106.131-1524136482474/current/finalized/subdir0/subdir0/blk_1073741825

The directory on the local host's filesystem where the DataNode keeps its data:
[hdfs@master1 ~]$ ls /data/hadoop/hdfs/dn/current/
BP-908063675-10.201.106.131-1524136482474  VERSION

1.8.2 Start the YARN daemons

Switch to the yarn user:
[root@master1 ~]# su - yarn

Start the ResourceManager:
[yarn@master1 ~]$ yarn-daemon.sh start resourcemanager

Start the NodeManager:
[yarn@master1 ~]$ yarn-daemon.sh start nodemanager
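
Both daemons should show up in jps, and once the NodeManager has registered (this can take a few seconds) the ResourceManager should report one live node:
[yarn@master1 ~]$ jps
[yarn@master1 ~]$ yarn node -list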

1.9 Check Hadoop's status

Browse to the NameNode web UI: http://10.201.106.131:50070

Browse to the ResourceManager web UI: http://10.201.106.131:8088

1.10 Submit and run a program on Hadoop

1.10.1 Run the MapReduce example programs

Switch users:
[root@master1 mapreduce]# su - hdfs

Run the examples jar with no arguments to list the available example programs:
[hdfs@master1 ~]$ yarn jar /bdapps/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar 

Count word occurrences in the uploaded fstab file:
[hdfs@master1 ~]$ yarn jar /bdapps/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar wordcount /test/fstab /test/fstab.out
View the results:
[hdfs@master1 ~]$ hdfs dfs -cat /test/fstab.out/part-r-00000
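
A successful run also leaves a _SUCCESS marker alongside the part files in the output directory:
[hdfs@master1 ~]$ hdfs dfs -ls /test/fstab.out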

Reposted from blog.51cto.com/zhongle21/2106524