I. Installing Hadoop
(Some prior Docker knowledge is assumed here.)
1. Installing the image
Instead of the traditional VM approach, we use Docker to install and deploy Hadoop. First we need an image: you can build one that suits your needs from a Dockerfile, or pick a ready-made image with a Hadoop environment from a public registry. Since I have the Aliyun registry accelerator configured, I chose a Hadoop image from the Aliyun registry (the one pulled below).
Pull it to the local machine with:
docker pull registry.cn-hangzhou.aliyuncs.com/kaibb/hadoop
Once the download finishes, docker images shows the image:
REPOSITORY                                       TAG     IMAGE ID      CREATED        SIZE
registry.cn-hangzhou.aliyuncs.com/kaibb/hadoop   latest  44c31aee79de  24 months ago  927MB
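Before creating any containers you can optionally sanity-check the image. This is only a sketch: it assumes the hadoop binary is on the container's PATH, which may not be true for every image.

# start a throwaway container, print the bundled Hadoop version, then remove it
docker run --rm registry.cn-hangzhou.aliyuncs.com/kaibb/hadoop hadoop version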
2. Creating the containers
With the image in place, we create three containers from it: one Master to serve as the Hadoop cluster's NameNode, and two Slaves to serve as DataNodes.
Use the following command:
docker run -i -t --name Master -h Master registry.cn-hangzhou.aliyuncs.com/kaibb/hadoop /bin/bash
The -h flag sets the container's hostname, here Master; without it the container gets a random string of digits and letters as its hostname, which is hard to recognize and work with. --name sets the container's name, again to make the containers easy to tell apart.
Once the Master container is up, run the same command twice more with slightly different parameters to create the two Slave nodes.
For example, to create the Slave1 node:
docker run -i -t --name Slave1 -h Slave1 registry.cn-hangzhou.aliyuncs.com/kaibb/hadoop /bin/bash
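And the same again for Slave2:

docker run -i -t --name Slave2 -h Slave2 registry.cn-hangzhou.aliyuncs.com/kaibb/hadoop /bin/bash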
With that, the basic environment for the cluster is ready.
3. Configuring SSH (for passwordless access between the machines)
First, start the SSH service on the Master:
/etc/init.d/ssh start
Then generate a key pair and save the public key into authorized_keys:
root@Master:/# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
/root/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
47:43:2d:03:fa:93:ce:00:8a:3f:4d:bb:67:09:65:df root@Master
The key's randomart image is:
+---[RSA 2048]----+
| ....            |
| . .o .          |
| . . oo          |
| . . .o. o .     |
|. . .o..S..      |
| . o.. +.oE      |
| o o. .o         |
| . .+            |
| .o              |
+-----------------+
root@Master:/# cd
root@Master:~# cd .ssh/
root@Master:~/.ssh# ls
id_rsa  id_rsa.pub
root@Master:~/.ssh# cat id_rsa.pub > authorized_keys
root@Master:~/.ssh# cat authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDQ2SQCcb1Jxf4hpVlbWkCpywmR4r0zeAuzQ6AUBFDHqfXgMQj/ku5s0xtbjjQWlKShvAiV1wLSwuAHfzKfKG0RFhcAXJDTBrDQnN9lBKeEcD1XlBHwtm8N+37bQWSXDLimx1UsNWkUy3rPrB55kV8ofTztI+sZMoOcS34Scib3hjmrbAMC3O0naJkBSaTch7KyX9q3kzxmMixShgBdxPAp6qLIzluIxtI/GXV7M9lfY1+KOWfD01jzHH2sAWpeIrefmEtLwNlv4YRuk0uHRaXOmnfyNj2K4WhTEhJpYdK9KJ3jNXiZZNGKZt3UGDSowPMHfmmUC+Jr3imr3YyP9VFh root@Master
Then perform the same steps on the two Slaves.
Next, use ip addr to look up each container's IP:
root@Master:~/.ssh# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1
    link/ipip 0.0.0.0 brd 0.0.0.0
3: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN group default qlen 1
    link/tunnel6 :: brd ::
6: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
       valid_lft forever preferred_lft forever
Then record every machine's IP in /etc/hosts:
172.17.0.2 Master
172.17.0.3 Slave1
172.17.0.4 Slave2
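One way to add these entries is to append them inside each of the three containers (a minimal sketch, assuming the IPs above):

# append the cluster's hostname mappings to /etc/hosts (run on Master, Slave1 and Slave2)
cat >> /etc/hosts <<EOF
172.17.0.2 Master
172.17.0.3 Slave1
172.17.0.4 Slave2
EOF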
Copy the Master's public key to the slave nodes:
ssh-copy-id -i Slave2
ssh-copy-id -i Slave1
Now ssh Slave1 succeeds; run exit to leave the SSH session.
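To confirm that passwordless login works to every node, a quick loop helps (you may still have to confirm each host key once on first contact):

# each line should print the remote hostname without asking for a password
for h in Master Slave1 Slave2; do ssh $h hostname; done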
4. Configuring Hadoop
All of the files below live under /opt/tools/hadoop/etc/hadoop/ in this image.
hadoop-env.sh: sets the Java environment (the image already has this configured for us):
export JAVA_HOME=/opt/tools/jdk1.8.0_77
core-site.xml (locates the file system's NameNode):
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://Master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/tmp</value>
  </property>
</configuration>
mapred-site.xml (historically this located the JobTracker on the master node; in Hadoop 2 it is the mapreduce.framework.name property, set to yarn, that actually takes effect):
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>Master:9001</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
hdfs-site.xml (HDFS settings):
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/hadoop/data</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/hadoop/name</value>
  </property>
</configuration>
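The local paths named above have to be usable on every node. Hadoop normally creates them itself when you format HDFS and start the daemons, but creating them up front is a harmless precaution:

# directories backing hadoop.tmp.dir, dfs.namenode.name.dir and dfs.datanode.data.dir
mkdir -p /hadoop/tmp /hadoop/name /hadoop/data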
yarn-site.xml (ResourceManager and NodeManager settings):
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>Master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>Master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>Master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>Master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>Master:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
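After editing you can check that Hadoop actually picks the values up with hdfs getconf (assuming the hadoop binaries are on the PATH, as they are in this image):

# should print hdfs://Master:9000
hdfs getconf -confKey fs.defaultFS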
The slaves file
Add each worker node to it, one hostname per line:
Slave1
Slave2
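For example, the file can be written in one step (the path comes from the scp command below):

# overwrite the slaves file with the two worker hostnames
cat > /opt/tools/hadoop/etc/hadoop/slaves <<EOF
Slave1
Slave2
EOF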
Then send these files via scp to each Slave node, overwriting the originals:
scp core-site.xml hadoop-env.sh hdfs-site.xml mapred-site.xml yarn-site.xml Slave1:/opt/tools/hadoop/etc/hadoop/
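The same files have to reach Slave2 as well; a small loop covers both nodes:

# push the edited configuration to every slave node
for h in Slave1 Slave2; do
  scp core-site.xml hadoop-env.sh hdfs-site.xml mapred-site.xml yarn-site.xml $h:/opt/tools/hadoop/etc/hadoop/
done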
5. Running Hadoop
First, format HDFS (newer Hadoop versions prefer the equivalent hdfs namenode -format):
hadoop namenode -format
Then start Hadoop and check the running processes with jps:
root@Master:/opt/tools/hadoop/sbin# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [Master]
Master: starting namenode, logging to /opt/tools/hadoop-2.7.2/logs/hadoop-root-namenode-Master.out
Slave2: starting datanode, logging to /opt/tools/hadoop-2.7.2/logs/hadoop-root-datanode-Slave2.out
Slave1: starting datanode, logging to /opt/tools/hadoop-2.7.2/logs/hadoop-root-datanode-Slave1.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/tools/hadoop-2.7.2/logs/hadoop-root-secondarynamenode-Master.out
starting yarn daemons
starting resourcemanager, logging to /opt/tools/hadoop/logs/yarn--resourcemanager-Master.out
Slave2: starting nodemanager, logging to /opt/tools/hadoop-2.7.2/logs/yarn-root-nodemanager-Slave2.out
Slave1: starting nodemanager, logging to /opt/tools/hadoop-2.7.2/logs/yarn-root-nodemanager-Slave1.out
root@Master:/opt/tools/hadoop/sbin# jps
2097 ResourceManager
1747 NameNode
1941 SecondaryNameNode
873 NodeManager
2346 Jps
463 DataNode
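At this point the web UIs should be reachable too. The ResourceManager port comes from yarn-site.xml above, and 50070 is the default NameNode web port in Hadoop 2.x (a sketch; it assumes curl is available in the container):

# both commands should print an HTTP status line (200 or a redirect)
curl -sI http://Master:8088/ | head -n 1
curl -sI http://Master:50070/ | head -n 1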
Use hadoop dfsadmin -report to check the status of each node:
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Configured Capacity: 188176871424 (175.25 GB)
Present Capacity: 168504856576 (156.93 GB)
DFS Remaining: 168504778752 (156.93 GB)
DFS Used: 77824 (76 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (3):

Name: 172.17.0.2:50010 (Master)
Hostname: Master
Decommission Status : Normal
Configured Capacity: 62725623808 (58.42 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 6557335552 (6.11 GB)
DFS Remaining: 56168259584 (52.31 GB)
DFS Used%: 0.00%
DFS Remaining%: 89.55%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sat Apr 14 14:35:46 UTC 2018

Name: 172.17.0.4:50010 (Slave2)
Hostname: Slave2
Decommission Status : Normal
Configured Capacity: 62725623808 (58.42 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 6557339648 (6.11 GB)
DFS Remaining: 56168259584 (52.31 GB)
DFS Used%: 0.00%
DFS Remaining%: 89.55%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sat Apr 14 14:35:48 UTC 2018

Name: 172.17.0.3:50010 (Slave1)
Hostname: Slave1
Decommission Status : Normal
Configured Capacity: 62725623808 (58.42 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 6557339648 (6.11 GB)
DFS Remaining: 56168259584 (52.31 GB)
DFS Used%: 0.00%
DFS Remaining%: 89.55%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sat Apr 14 14:35:48 UTC 2018
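As a last smoke test, you can run one of the example jobs that ship with Hadoop. The jar path below follows the standard Hadoop 2.7.2 layout under /opt/tools/hadoop-2.7.2 (the install root seen in the startup logs); adjust it if your layout differs:

# estimate pi with 2 map tasks and 10 samples each; a completed job proves HDFS and YARN work end to end
hadoop jar /opt/tools/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar pi 2 10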