Hadoop Pseudo-Distributed Operation
Download the Java JDK package
curl -O http://download.oracle.com/otn-pub/java/jdk/8u171-b11/512cd62ec5174c3487ac17c61aaa89e8/jdk-8u171-linux-x64.rpm?AuthParam=1525399927_bf1d984d06bb3ec51e18b1e0cd30a4b7
Download the Hadoop package
curl -O http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.1.0/hadoop-3.1.0.tar.gz
Basic environment setup
- Disable IPv6
vi /etc/sysctl.conf
net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1
- Set the hostname
hostnamectl set-hostname master.hadoop
- Stop the firewall
systemctl status firewalld
systemctl stop firewalld
systemctl disable firewalld
systemctl status iptables
- SELinux
sestatus # check current status
setenforce 0 # disable SELinux temporarily
vi /etc/selinux/config # disable permanently
SELINUX=disabled
- Install the JDK
rpm -ivh jdk-8u171-linux-x64.rpm
- Configure passwordless SSH
ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# Disable strict host key checking (for a single user)
vi ~/.ssh/config
host localhost
StrictHostKeyChecking no
host 0.0.0.0
StrictHostKeyChecking no
host *hadoop*
StrictHostKeyChecking no
UserKnownHostsFile /dev/null
# For all users
vi /etc/ssh/ssh_config
change StrictHostKeyChecking ask to StrictHostKeyChecking no
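The per-user configuration above can also be scripted. A minimal sketch, with the three host patterns combined into one Host block for brevity (note it overwrites any existing ~/.ssh/config, so merge by hand if you already have one):

```shell
# Write the per-user SSH client config from the section above.
# WARNING: this overwrites any existing ~/.ssh/config.
mkdir -p "$HOME/.ssh"
cat > "$HOME/.ssh/config" <<'EOF'
Host localhost 0.0.0.0 *hadoop*
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null
EOF
chmod 600 "$HOME/.ssh/config"   # ssh rejects configs writable by others
```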
Deploy Hadoop
- Deploy Hadoop, here into /usr/local
1. Unpack the archive into /usr/local
tar xzvf hadoop-3.1.0.tar.gz -C /usr/local/
cd /usr/local
mv hadoop-3.1.0 hadoop # shorter path for convenience
2. Configure the environment variables
Append the following to /etc/profile:
vi /etc/profile
export JAVA_HOME=/usr/java/default
export HADOOP_HOME=/usr/local/hadoop
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
source /etc/profile
3. Set JAVA_HOME in the Hadoop environment file
vi /usr/local/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/java/default
4. Edit the key configuration files
1. etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
2. etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
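When scripting the installation, the two files above can be generated with heredocs. A sketch, assuming the current directory is the Hadoop root (/usr/local/hadoop here):

```shell
# Generate core-site.xml and hdfs-site.xml non-interactively,
# relative to the Hadoop installation root.
mkdir -p etc/hadoop
cat > etc/hadoop/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
cat > etc/hadoop/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
```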
5. Run Hadoop
1. Format the filesystem
bin/hdfs namenode -format
2. Start the NameNode and DataNode
sbin/start-dfs.sh
If you see the following errors (the daemon user definitions are missing):
[root@master hadoop]# start-dfs.sh
Starting namenodes on [localhost]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [master.hadoop]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
Solution:
vim sbin/start-dfs.sh
vim sbin/stop-dfs.sh
Add the following at the top of each file:
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
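An alternative to patching the sbin scripts is to export the same variables once in etc/hadoop/hadoop-env.sh, which the start/stop scripts source; this should survive upgrades that replace sbin/. A sketch, run from the Hadoop root:

```shell
# Append the HDFS daemon user definitions to hadoop-env.sh instead of
# editing start-dfs.sh / stop-dfs.sh directly.
mkdir -p etc/hadoop
cat >> etc/hadoop/hadoop-env.sh <<'EOF'
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
EOF
```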
3. View the NameNode web UI
Default address: http://localhost:9870
4. Create the directories needed to run MapReduce jobs
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>
5. Upload files to the distributed filesystem
$ bin/hdfs dfs -mkdir input
$ bin/hdfs dfs -put etc/hadoop/*.xml input
6. Run an example job (greps the uploaded XML files for matches of dfs[a-z]+)
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0.jar grep input output 'dfs[a-z]+'
7. Download the output files from the distributed filesystem
$ bin/hdfs dfs -get output output
$ cat output/*
or
hdfs dfs -cat output/*
8. Stop the NameNode and DataNode
$ sbin/stop-dfs.sh
6. Run with YARN on a single node
After completing steps 1 through 4 of section 5 above, do the following:
1. Edit etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
</configuration>
2. Edit etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
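These two files can likewise be generated from a script. A sketch, assuming the current directory is the Hadoop root; the single-quoted heredoc delimiter keeps $HADOOP_MAPRED_HOME literal, since Hadoop expands it at runtime:

```shell
# Generate mapred-site.xml and yarn-site.xml non-interactively.
mkdir -p etc/hadoop
cat > etc/hadoop/mapred-site.xml <<'EOF'
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
  </property>
</configuration>
EOF
cat > etc/hadoop/yarn-site.xml <<'EOF'
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
</configuration>
EOF
```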
3. Start the ResourceManager and NodeManager daemons
start-yarn.sh
If the following errors appear:
[root@master hadoop]# start-yarn.sh
Starting resourcemanagers on []
ERROR: Attempting to operate on yarn resourcemanager as root
ERROR: but there is no YARN_RESOURCEMANAGER_USER defined. Aborting operation.
Starting nodemanagers
ERROR: Attempting to operate on yarn nodemanager as root
ERROR: but there is no YARN_NODEMANAGER_USER defined. Aborting operation.
Solution:
$ vim sbin/start-yarn.sh
$ vim sbin/stop-yarn.sh
Add the following at the top of each file:
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
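Instead of editing the sbin scripts, these variables can also be exported in etc/hadoop/hadoop-env.sh, which the start/stop scripts source. A sketch, run from the Hadoop root:

```shell
# Append the YARN daemon user definitions to hadoop-env.sh instead of
# patching start-yarn.sh / stop-yarn.sh.
mkdir -p etc/hadoop
cat >> etc/hadoop/hadoop-env.sh <<'EOF'
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
EOF
```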
4. View the ResourceManager web UI
Default address: http://localhost:8088
5. Run an example job
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0.jar grep input output 'dfs[a-z]+'
6. Stop YARN
$ sbin/stop-yarn.sh