[Big Data Basics, Part 2: YARN and MapReduce]

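Several nights in a row of extra late-night study reminded me of night training in boot camp. Before you become a qualified soldier, you have to go through boot camp. In fact every profession has a boot camp of its own, and without being tempered there you will struggle to gain a foothold in it. I acknowledge innate advantages, but I believe even more in later effort. Some people may struggle their whole lives and never reach another person's starting point; I find it shameful when that other person wastes his life, and I find it a blessing that this one strove all his life. However small we are, I will still do my best to bloom: the moss flower is as small as a grain of rice, yet it learns to open like a peony!
---------------- Preface: dedicated to you, working hard at whatever post you hold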
1. Getting started
HDFS: storage
MapReduce: computation
Spark, Flink
YARN: resource and job scheduling

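Pseudo-distributed deployment
Requirements: environment setup, configuration (parameter) files, and passwordless SSH for startup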

The jps command
[hadoop@hadoop002 ~]$ jps
28288 NameNode NN
27120 Jps
28410 DataNode DN
28575 SecondaryNameNode SNN

2. MapReduce job on YARN
[hadoop@hadoop002 hadoop]$ cp mapred-site.xml.template mapred-site.xml
[hadoop@hadoop002 hadoop]$

Configure parameters as follows:
etc/hadoop/mapred-site.xml:

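<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

etc/hadoop/yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

Start ResourceManager daemon and NodeManager daemon:
$ sbin/start-yarn.sh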

open web:------------

3. Running an MR job
Linux: a file storage system (mkdir, ls)
HDFS: a distributed file storage system
-format
hdfs dfs -???

Make the HDFS directories required to execute MapReduce jobs:
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/
Copy the input files into the distributed filesystem:
$ bin/hdfs dfs -put etc/hadoop input
Run some of the examples provided:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar grep input output 'dfs[a-z.]+'
Examine the output files:
Copy the output files from the distributed filesystem to the local filesystem and examine them:

$ bin/hdfs dfs -get output output
$ cat output/*
or

View the output files on the distributed filesystem:

$ bin/hdfs dfs -cat output/*

bin/hdfs dfs -mkdir /user/hadoop/input
bin/hdfs dfs -put etc/hadoop/core-site.xml /user/hadoop/input

bin/hadoop jar \
share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar \
grep \
/user/hadoop/input \
/user/hadoop/output \
'fs[a-z.]+'

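4. Starting all three HDFS processes under the hostname hadoop002
NN: the fs.defaultFS parameter in core-site.xml
DN: the slaves file
SNN: the following properties (they belong in hdfs-site.xml):

<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop001:50090</value>
</property>
<property>
    <name>dfs.namenode.secondary.https-address</name>
    <value>hadoop001:50091</value>
</property>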

5.jps
[hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ jps
16188 DataNode
16379 SecondaryNameNode
16566 Jps
16094 NameNode
[hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$
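
Reading the jps listing by eye works for one machine, but the same check can be scripted. Below is a minimal sketch, assuming the three HDFS daemons shown above are the ones you expect. The function name missing_daemons is hypothetical and not part of Hadoop or the JDK.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): read jps output on stdin and
# print the expected HDFS daemons that are NOT listed, one per line.
missing_daemons() {
    awk '
        BEGIN {
            want["NameNode"] = 1
            want["DataNode"] = 1
            want["SecondaryNameNode"] = 1
        }
        # jps prints "<pid> <name>". Compare the whole second field so that
        # "SecondaryNameNode" is never mistaken for "NameNode".
        ($2 in want) { delete want[$2] }
        END {
            for (name in want) print name
        }
    ' | sort
}

# Usage (as the user that started the daemons):
#   jps | missing_daemons
```

An empty result means all three daemons were listed. The sketch matches on the class name only, so it should be run as the user that owns the daemons; for any other user, jps may not show the names at all.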

5.1 Location
[hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ which jps
/usr/java/jdk1.7.0_80/bin/jps
[hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$

5.2 Other users
[root@hadoop002 ~]# jps
16188 -- process information unavailable
16607 Jps
16379 -- process information unavailable
16094 -- process information unavailable
[root@hadoop002 ~]#

[root@hadoop002 ~]# useradd jepson
[root@hadoop002 ~]# su - jepson
[jepson@hadoop002 ~]$ jps
16664 Jps
[jepson@hadoop002 ~]$

Here "process information unavailable" refers to processes that are actually still running.

[root@hadoop002 ~]# kill -9 16094
[root@hadoop002 ~]#
[root@hadoop002 ~]# jps
16188 -- process information unavailable
16379 -- process information unavailable
16702 Jps
16094 -- process information unavailable
[root@hadoop002 ~]#
[root@hadoop002 ~]# ps -ef|grep 16094
root 16722 16590 0 22:19 pts/4 00:00:00 grep 16094
[root@hadoop002 ~]#
Here, for pid 16094, "process information unavailable" refers to a process that really is gone.

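The right way to handle "process information unavailable":
1. Find the process ID (pid).
2. Run ps -ef|grep pid to see whether the process exists.
3. If it exists, step 2 also tells you which user is running the process;
su - to that user and check from there.

If you delete the file /tmp/hsperfdata_${user}/pid (the file named after the pid) with rm -f,
the process does not die, but jps no longer shows it, and every script that relies on jps will misbehave.

4. If it does not exist, how do you clear the leftover information?
rm -f /tmp/hsperfdata_${user}/pid (the file named after the pid)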
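
The four steps above can be sketched as a small shell function. This is only an illustration under stated assumptions: the name diagnose_pid is hypothetical, the /proc check is Linux-specific, and the sketch only lists leftover files instead of deleting them, so that you can review them before running rm -f yourself.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the JDK or Hadoop): classify a pid that
# jps lists as "process information unavailable".
diagnose_pid() {
    local pid=$1 owner f
    # kill -0 sends no signal; it only tests whether the pid can be signalled.
    # The /proc check (Linux only) covers processes owned by another user
    # when the caller is not root.
    if kill -0 "$pid" 2>/dev/null || [ -e "/proc/$pid" ]; then
        # Step 3: the process exists, so report who runs it.
        owner=$(ps -o user= -p "$pid" 2>/dev/null || true)
        echo "alive owner=${owner:-unknown}"
        return 0
    fi
    # Step 4: the process is gone; list leftover jps bookkeeping files.
    echo "gone"
    for f in /tmp/hsperfdata_*/"$pid"; do
        if [ -e "$f" ]; then
            echo "stale: $f"
        fi
    done
    return 0
}

# Usage:
#   diagnose_pid 16094
```

For an "alive" result, switch to the reported owner with su - and run jps there. For a "gone" result, the listed files are the leftovers that step 4 removes.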

6. Additional commands
ssh root@ip -p 22
ssh root@ip date

rz sz

How do you transfer files between two Linux systems?
hadoop000 --> hadoop002
[ruoze@hadoop000 ~]$ scp test.log root@ip:/tmp/
This copies a file from the current Linux system to the remote machine with scp.

hadoop000 <-- hadoop002
[ruoze@hadoop002 ~]$ scp test.log root@hadoop000:/tmp/

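But hadoop002 is a production machine that you may not log in to, so pull the file from it instead:
scp root@ip:/tmp/test.log /tmp/rz.log

However, in production you will never be given the password.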

Mutual SSH trust among multiple machines

Pitfalls:
Transfer the .pub files with scp.
Configure the IP and hostname of every machine in /etc/hosts.
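
Once a .pub file has been copied over with scp, it still has to be added to the receiving user's authorized_keys. Here is a minimal sketch of that step, assuming an OpenSSH setup. The function name add_authorized_key and the paths in the usage line are hypothetical. The permissions (700 on the directory, 600 on the file) follow the usual OpenSSH recommendation, because sshd may ignore key files that other users can write to.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a public key to an authorized_keys file only
# if that exact line is not already present, then tighten permissions.
add_authorized_key() {
    local pub_file=$1 auth_file=$2 ssh_dir
    ssh_dir=$(dirname "$auth_file")
    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    touch "$auth_file"
    # -F fixed strings, -x whole-line match, -f read patterns from the .pub
    # file: succeeds only when the key line already exists verbatim.
    if ! grep -qxFf "$pub_file" "$auth_file"; then
        cat "$pub_file" >> "$auth_file"
    fi
    chmod 600 "$auth_file"
}

# Usage (paths are examples only):
#   add_authorized_key /tmp/id_rsa.pub ~/.ssh/authorized_keys
```

Running it twice with the same key leaves a single entry, which keeps the file clean when the trust setup is repeated across several machines.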

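This is boot camp, this is the training unit, this is the selection course, and this is where your transformation begins. Never refuse and never fear any round of tempering, because it shows you how wide the gap is between a soldier and a top soldier. The process is not pleasant; if it were comfortable, everyone would already have done it, and it would have lost the value it ought to have!
---------------- Closing words: dedicated to everyone, in every profession, striving to become a top soldier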
