Installing Hadoop 3.0.3 and JDK 1.8 in pseudo-distributed mode on CentOS 7

Create a regular user named hadoop

useradd hadoop
passwd hadoop

Give the hadoop user sudo privileges

chmod u+w /etc/sudoers
vi /etc/sudoers
Add the line:
hadoop ALL=(ALL) ALL
or, for passwordless sudo:
hadoop ALL=(root) NOPASSWD:ALL
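
Since write permission was added to /etc/sudoers above, it is safer to remove it again once the edit is saved:

chmod u-w /etc/sudoers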

Switch to the hadoop user

su - hadoop

Install Hadoop into /home/hadoop/hadoop3.03

# mv renames the extracted directory, so the target does not need to be created first
cd /home/hadoop
tar -zxvf hadoop-3.0.3.tar.gz
mv hadoop-3.0.3 hadoop3.03

Install the JDK into /home/hadoop/java/jdk1.8

mkdir -p /home/hadoop/java
tar -zxvf jdk-8u172-linux-x64.gz -C /home/hadoop/java
mv /home/hadoop/java/jdk1.8.0_172 /home/hadoop/java/jdk1.8

Configure environment variables

sudo vi /etc/profile

##java
export JAVA_HOME=/home/hadoop/java/jdk1.8
export PATH=$PATH:$JAVA_HOME/bin

##hadoop
export HADOOP_HOME=/home/hadoop/hadoop3.03
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

Reload the profile and verify:

source /etc/profile
echo $JAVA_HOME
echo $HADOOP_HOME

Set the JAVA_HOME parameter in hadoop-env.sh, mapred-env.sh, and yarn-env.sh

export JAVA_HOME=/home/hadoop/java/jdk1.8
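
A minimal way to apply this to all three scripts at once, assuming they live under the default etc/hadoop directory of this installation:

cd /home/hadoop/hadoop3.03/etc/hadoop
for f in hadoop-env.sh mapred-env.sh yarn-env.sh; do
  echo 'export JAVA_HOME=/home/hadoop/java/jdk1.8' >> "$f"
done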

Configure core-site.xml

hadoop-localhost is the hostname of this machine, and the /opt/data/tmp directory must be created in advance (see below).

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
      <name>fs.defaultFS</name>
      <value>hdfs://hadoop-localhost:8020</value>
      <description>HDFS URI: filesystem://namenode-host:port</description>
  </property>
  <property>
      <name>hadoop.tmp.dir</name>
      <value>/opt/data/tmp</value>
      <description>Local Hadoop temporary directory on the NameNode host</description>
  </property>
</configuration>

hadoop.tmp.dir sets Hadoop's temporary directory; for example, the HDFS NameNode's data is stored under it by default. Browsing the *-default.xml default configuration files shows many settings that depend on ${hadoop.tmp.dir}.

The default hadoop.tmp.dir is /tmp/hadoop-${user.name}, which means the NameNode stores HDFS metadata under /tmp. When the operating system reboots it clears /tmp, and the NameNode metadata is lost, which is a very serious problem, so this path should be changed.
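
To see how other paths resolve from hadoop.tmp.dir, hdfs getconf prints the effective value of a configuration key; for example, for the NameNode directory configured in hdfs-site.xml below:

bin/hdfs getconf -confKey dfs.namenode.name.dir
# prints /opt/data/tmp/dfs/name with the settings in this post;
# with the defaults it would resolve to file://${hadoop.tmp.dir}/dfs/name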

sudo mkdir -p /opt/data/tmp

Change the owner of the temporary directory to hadoop:
sudo chown -R hadoop:hadoop /opt/data/tmp
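
To confirm the ownership change (the exact permissions may differ):

ls -ld /opt/data/tmp
# drwxr-xr-x ... hadoop hadoop ... /opt/data/tmp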

Configure hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration> 
  <property>
      <name>dfs.namenode.name.dir</name>
      <value>/opt/data/tmp/dfs/name</value>
      <description>Where the NameNode stores the HDFS namespace metadata</description>
  </property>
  <property>
      <name>dfs.datanode.data.dir</name>
      <value>/opt/data/tmp/dfs/data</value>
      <description>Physical storage location of data blocks on the DataNode</description>
  </property>
  <!-- HDFS replication factor -->
  <property>
      <name>dfs.replication</name>
      <value>1</value>
  </property>
</configuration>

Format HDFS

sudo chown -R hadoop:hadoop /opt/data
hdfs namenode -format

Inspect the NameNode directory created by formatting:
$ ll /opt/data/tmp/dfs/name/current
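# A freshly formatted directory typically contains (transaction IDs in the fsimage names may differ):
# VERSION  seen_txid
# fsimage_0000000000000000000  fsimage_0000000000000000000.md5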

Start the NameNode
sbin/hadoop-daemon.sh start namenode

Start the DataNode
sbin/hadoop-daemon.sh start datanode

Start the SecondaryNameNode
sbin/hadoop-daemon.sh start secondarynamenode

Use the jps command to check whether the daemons started successfully; if the processes show up in the output, they are running.
$ jps
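# Typical output for this setup (the PIDs will differ):
# 3072 NameNode
# 3185 DataNode
# 3298 SecondaryNameNode
# 3411 Jps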

Test creating a directory, uploading, and downloading files on HDFS

[hadoop@hadoop-localhost hadoop3.03]$
Create a directory
bin/hdfs dfs -mkdir /demo1

Upload a file
bin/hdfs dfs -put etc/hadoop/core-site.xml /demo1

Read the file content on HDFS
bin/hdfs dfs -cat /demo1/core-site.xml

Download the file from HDFS to the local filesystem
bin/hdfs dfs -get /demo1/core-site.xml
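
To confirm the upload succeeded, list the directory (sizes and timestamps will differ):

bin/hdfs dfs -ls /demo1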

View the HDFS web UI

In HDFS 2.x the web UI port is 50070:
http://192.168.145.129:50070

In HDFS 3.x the web UI port is 9870:
http://192.168.145.129:9870/dfshealth.html#tab-overview
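
A quick command-line check that the NameNode web UI is up (192.168.145.129 is the VM address used throughout this post):

curl -s http://192.168.145.129:9870/ | head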

Configure and start YARN

Configure mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<!-- Run MapReduce on YARN -->
<!-- ${full path of your hadoop distribution directory} -->
<configuration>
   <property>  
      <name>mapreduce.framework.name</name>  
      <value>yarn</value>  
  </property>  
   <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=/home/hadoop/hadoop3.03</value>
    </property>
    <property>
      <name>mapreduce.map.env</name>
      <value>HADOOP_MAPRED_HOME=/home/hadoop/hadoop3.03</value>
    </property>
    <property>
      <name>mapreduce.reduce.env</name>
      <value>HADOOP_MAPRED_HOME=/home/hadoop/hadoop3.03</value>
    </property>
</configuration>

Configure yarn-site.xml

yarn.nodemanager.aux-services configures YARN's shuffle service; here it is set to MapReduce's default shuffle implementation (mapreduce_shuffle).

yarn.resourcemanager.hostname specifies which node the ResourceManager runs on.

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Specify the address of YARN's master, the ResourceManager -->
<configuration>
<!-- Site specific YARN configuration properties -->
<!-- How reducers fetch data: the MapReduce shuffle service -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<!-- The node on which the ResourceManager runs -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>hadoop-localhost</value>
</property>
</configuration>

Start the ResourceManager

sbin/yarn-daemon.sh start resourcemanager

Start the NodeManager

sbin/yarn-daemon.sh start nodemanager
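
Once both daemons are running, the NodeManager should have registered with the ResourceManager, which can be checked with:

bin/yarn node -list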

Alternatively, the services can be started with the batch scripts.
Start HDFS and YARN:
sbin/start-dfs.sh
sbin/start-yarn.sh

Or everything at once:
sbin/start-all.sh
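
Note that start-dfs.sh and start-yarn.sh log in to localhost over SSH, so the hadoop user needs passwordless SSH to localhost. If that is not set up yet, the usual key setup looks like this:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys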

The YARN web UI

The YARN web UI port is 8088; it can be viewed at http://192.168.145.129:8088/.

Run a MapReduce job

Create the input directory for the test on HDFS:
bin/hdfs dfs -mkdir -p /wordcountdemo/input

The wc.input file contains:

hadoop mapreduce hive
hbase spark storm
sqoop hadoop hive
spark hadoop
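
Create this file locally before uploading; /opt/data is the path used by the put command below:

cat > /opt/data/wc.input <<'EOF'
hadoop mapreduce hive
hbase spark storm
sqoop hadoop hive
spark hadoop
EOF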

Upload the wc.input file to the /wordcountdemo/input directory on HDFS:
bin/hdfs dfs -put /opt/data/wc.input /wordcountdemo/input

Run the WordCount MapReduce job

[hadoop@hadoop-localhost hadoop3.03]$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.3.jar wordcount /wordcountdemo/input /wordcountdemo/output

2018-07-03 19:38:23,956 INFO client.RMProxy: Connecting to ResourceManager at hadoop-localhost/192.168.145.129:8032
2018-07-03 19:38:24,565 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1530615244194_0002
2018-07-03 19:38:24,879 INFO input.FileInputFormat: Total input files to process : 1
2018-07-03 19:38:25,784 INFO mapreduce.JobSubmitter: number of splits:1
2018-07-03 19:38:25,841 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-07-03 19:38:26,314 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1530615244194_0002
2018-07-03 19:38:26,315 INFO mapreduce.JobSubmitter: Executing with tokens: []
2018-07-03 19:38:26,466 INFO conf.Configuration: resource-types.xml not found
2018-07-03 19:38:26,466 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2018-07-03 19:38:26,547 INFO impl.YarnClientImpl: Submitted application application_1530615244194_0002
2018-07-03 19:38:26,590 INFO mapreduce.Job: The url to track the job: http://hadoop-localhost:8088/proxy/application_1530615244194_0002/
2018-07-03 19:38:26,590 INFO mapreduce.Job: Running job: job_1530615244194_0002
2018-07-03 19:38:35,985 INFO mapreduce.Job: Job job_1530615244194_0002 running in uber mode : false
2018-07-03 19:38:35,988 INFO mapreduce.Job:  map 0% reduce 0%
2018-07-03 19:38:42,310 INFO mapreduce.Job:  map 100% reduce 0%
2018-07-03 19:38:47,402 INFO mapreduce.Job:  map 100% reduce 100%
2018-07-03 19:38:49,469 INFO mapreduce.Job: Job job_1530615244194_0002 completed successfully
2018-07-03 19:38:49,579 INFO mapreduce.Job: Counters: 53
    File System Counters
        FILE: Number of bytes read=94
        FILE: Number of bytes written=403931
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=195
        HDFS: Number of bytes written=60
        HDFS: Number of read operations=8
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=4573
        Total time spent by all reduces in occupied slots (ms)=2981
        Total time spent by all map tasks (ms)=4573
        Total time spent by all reduce tasks (ms)=2981
        Total vcore-milliseconds taken by all map tasks=4573
        Total vcore-milliseconds taken by all reduce tasks=2981
        Total megabyte-milliseconds taken by all map tasks=4682752
        Total megabyte-milliseconds taken by all reduce tasks=3052544
    Map-Reduce Framework
        Map input records=4
        Map output records=11
        Map output bytes=115
        Map output materialized bytes=94
        Input split bytes=122
        Combine input records=11
        Combine output records=7
        Reduce input groups=7
        Reduce shuffle bytes=94
        Reduce input records=7
        Reduce output records=7
        Spilled Records=14
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=171
        CPU time spent (ms)=1630
        Physical memory (bytes) snapshot=332750848
        Virtual memory (bytes) snapshot=5473169408
        Total committed heap usage (bytes)=165810176
        Peak Map Physical memory (bytes)=214093824
        Peak Map Virtual memory (bytes)=2733207552
        Peak Reduce Physical memory (bytes)=118657024
        Peak Reduce Virtual memory (bytes)=2739961856
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=73
    File Output Format Counters 
        Bytes Written=60
[hadoop@hadoop-localhost hadoop3.03]$

The word-count output is:

[hadoop@hadoop-localhost hadoop3.03]$ bin/hdfs dfs -cat /wordcountdemo/output/part-r-00000
hadoop  3
hbase   1
hive    2
mapreduce   1
spark   2
sqoop   1
storm   1
[hadoop@hadoop-localhost hadoop3.03]$ 

The results are sorted by key.

Stop Hadoop

sbin/hadoop-daemon.sh stop namenode
sbin/hadoop-daemon.sh stop datanode
sbin/yarn-daemon.sh stop resourcemanager
sbin/yarn-daemon.sh stop nodemanager

Batch scripts to stop everything:
sbin/stop-yarn.sh
sbin/stop-dfs.sh

Or all at once:
sbin/stop-all.sh

Overview of the HDFS module

HDFS is responsible for storing big data. By splitting large files into blocks and storing them in a distributed fashion, it breaks through the disk-size limit of a single server and solves the problem that one machine cannot hold a very large file. HDFS is a relatively independent module: it can serve YARN, and it can also serve other components such as HBase.

Overview of the YARN module

YARN is a general-purpose resource coordination and task scheduling framework. It was created to address the excessive load on the JobTracker in Hadoop 1.x MapReduce, among other problems.

YARN is a general framework: besides MapReduce, it can also run other computing frameworks such as Spark and Storm.

Overview of the MapReduce module

MapReduce is a computing framework. It defines a way of processing data: a Map phase and a Reduce phase that process data in a distributed, streaming manner. It is only suitable for offline processing of big data and is not appropriate for applications with strict real-time requirements.
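
As a rough illustration of the map, shuffle, and reduce phases, the word count computed above can be mimicked with a plain shell pipeline (purely illustrative; this is not how Hadoop executes the job):

# map: split each line into words; shuffle: sort; reduce: count per word
tr ' ' '\n' < /opt/data/wc.input | sort | uniq -c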



Reposted from blog.csdn.net/hsg77/article/details/80904101