Hadoop Pseudo-Distributed Deployment

Hadoop Pseudo-Distributed Operation

Download the Java JDK package (note that the AuthParam token in the Oracle URL below is time-limited, so you may need to fetch a fresh download link from Oracle's site)

curl -O http://download.oracle.com/otn-pub/java/jdk/8u171-b11/512cd62ec5174c3487ac17c61aaa89e8/jdk-8u171-linux-x64.rpm?AuthParam=1525399927_bf1d984d06bb3ec51e18b1e0cd30a4b7

Download the Hadoop package

curl -O http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.1.0/hadoop-3.1.0.tar.gz

Basic environment configuration

  1. Disable IPv6 networking
vi /etc/sysctl.conf

net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1
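
These settings take effect after sysctl reloads its configuration; a quick check (assuming a typical CentOS/RHEL 7 host, as the systemctl commands below imply):

sysctl -p                                      # reload /etc/sysctl.conf
cat /proc/sys/net/ipv6/conf/all/disable_ipv6   # should print 1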
  2. Set the hostname
hostnamectl set-hostname master.hadoop
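
If anything needs to resolve this hostname (the SSH config below matches master.hadoop), also map it in /etc/hosts; a minimal sketch, assuming this machine's IP is 192.168.1.10 (substitute your own):

echo '192.168.1.10 master.hadoop' >> /etc/hosts
hostnamectl status   # confirm the new hostname took effect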
  3. Disable the firewall
systemctl status firewalld

systemctl stop firewalld

systemctl disable firewalld

systemctl status iptables
  4. Disable SELinux
sestatus      # check current status

setenforce 0  # disable temporarily (sets permissive mode)

vi /etc/selinux/config  # disable permanently

    SELINUX=disabled
  5. Install the JDK
rpm -ivh jdk-8u171-linux-x64.rpm
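
Verify the install before moving on:

java -version   # should report version 1.8.0_171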
  6. Configure SSH trust (passwordless login)

ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys


# Disable host-key confirmation (for the current user only)
vi ~/.ssh/config

host localhost
  StrictHostKeyChecking no

host 0.0.0.0
  StrictHostKeyChecking no

host *hadoop*
  StrictHostKeyChecking no
  UserKnownHostsFile /dev/null

# For all users
vi /etc/ssh/ssh_config
    StrictHostKeyChecking ask --->   StrictHostKeyChecking no
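
To confirm that passwordless login works end to end (no password or host-key prompt should appear; master.hadoop assumes the /etc/hosts mapping added earlier):

ssh localhost date
ssh master.hadoop date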

Deploy Hadoop

Hadoop will be deployed to /usr/local.

1. Extract the package to /usr/local

tar xzvf hadoop-3.1.0.tar.gz -C /usr/local/
cd /usr/local

mv hadoop-3.1.0 hadoop  # rename for convenience

2. Configure the environment variables

Append the following to /etc/profile:

vi /etc/profile
export JAVA_HOME=/usr/java/default
export HADOOP_HOME=/usr/local/hadoop
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

source /etc/profile
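
A quick sanity check that the variables took effect:

echo $HADOOP_HOME   # expect /usr/local/hadoop
hadoop version      # expect Hadoop 3.1.0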

3. Set JAVA_HOME in the Hadoop environment script

vi /usr/local/hadoop/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr/java/default

4. Edit the key configuration files

1. etc/hadoop/core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

2. etc/hadoop/hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

5. Run Hadoop

1. Format the filesystem

bin/hdfs namenode -format

2. Start the NameNode and DataNode

sbin/start-dfs.sh

If you hit the following error complaining about missing user definitions:
[root@master hadoop]# start-dfs.sh
Starting namenodes on [localhost]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [master.hadoop]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.

Solution:
vim sbin/start-dfs.sh
vim sbin/stop-dfs.sh

Add the following in the blank space at the top of both scripts:

HDFS_DATANODE_USER=root  
HADOOP_SECURE_DN_USER=hdfs  
HDFS_NAMENODE_USER=root  
HDFS_SECONDARYNAMENODE_USER=root  

3. View the NameNode web UI

Default address: http://localhost:9870
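
To confirm the daemons actually came up, jps should list the three HDFS processes and the web port should answer (the PIDs below are illustrative):

jps
# 12001 NameNode
# 12102 DataNode
# 12203 SecondaryNameNode
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:9870   # expect 200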

4. Create the HDFS directories required to run MapReduce jobs

$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>
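
For example, for the current login user (a sketch using command substitution; -p also creates /user if it is missing):

$ bin/hdfs dfs -mkdir -p /user/$(whoami)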

5. Upload files to the distributed filesystem

  $ bin/hdfs dfs -mkdir input
  $ bin/hdfs dfs -put etc/hadoop/*.xml input
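
You can verify the upload before running a job:

  $ bin/hdfs dfs -ls input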

6. Run the example job

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0.jar grep input output 'dfs[a-z]+' 

7. Download the output files from the distributed filesystem

  $ bin/hdfs dfs -get output output
  $ cat output/*

or

hdfs dfs -cat output/*

8. Stop the NameNode and DataNode

  $ sbin/stop-dfs.sh

6. Running on a single node with YARN

After completing steps 1 through 4 of section 5 above, proceed as follows:

1. Configure etc/hadoop/mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>

2. Configure etc/hadoop/yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>

3. Start the ResourceManager and NodeManager daemons

start-yarn.sh

If the following error appears:
[root@master hadoop]# start-yarn.sh
Starting resourcemanagers on []
ERROR: Attempting to operate on yarn resourcemanager as root
ERROR: but there is no YARN_RESOURCEMANAGER_USER defined. Aborting operation.
Starting nodemanagers
ERROR: Attempting to operate on yarn nodemanager as root
ERROR: but there is no YARN_NODEMANAGER_USER defined. Aborting operation.

Solution:
$ vim sbin/start-yarn.sh 
$ vim sbin/stop-yarn.sh 

Add at the top of both scripts:
YARN_RESOURCEMANAGER_USER=root  
HADOOP_SECURE_DN_USER=yarn  
YARN_NODEMANAGER_USER=root  

4. View the ResourceManager web UI

Default address: http://localhost:8088/
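
As with HDFS, jps should now also show the YARN daemons (PIDs illustrative):

jps
# 12301 ResourceManager
# 12402 NodeManager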

5. Run the example job

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0.jar grep input output 'dfs[a-z]+'
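
Note: if the output directory from the earlier HDFS-only run still exists, the job will abort with an "output directory already exists" error, so remove it first:

hdfs dfs -rm -r output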

6. Stop YARN

  $ sbin/stop-yarn.sh

Reposted from blog.csdn.net/uevol14/article/details/80195581