Building a Recommender System from Scratch (1): Standalone Hadoop


All operations in this article are based on:

  • centos-release-7-5.1804 (English-language environment)
  • The root (superuser) account
    • If you are logged in as root, opening a terminal should show [root@localhost ~]#
    • If not, type su to switch to root. (Note: su will ask for root's password.)
  • Java 1.8.0_232
    • [root@VM_0_15_centos hadoop]# java -version
      openjdk version "1.8.0_232"
      OpenJDK Runtime Environment (build 1.8.0_232-b09)
      OpenJDK 64-Bit Server VM (build 25.232-b09, mixed mode)
    • See xxx for how to install it
    • Using Java 11 will cause errors when Hadoop starts
  • SSH is required

I. Preparation

  1. Obtain the Hadoop files

a. Go to the Hadoop download page and pick the version you need (I chose hadoop-3.2.1.tar.gz)

b. On the page that opens, click the download address Hadoop recommends, or use one of the Backup Sites listed below it.
Alternatively, you can download from the command line (assuming the current directory is /home/username/downloads):

nohup wget http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz &
Command   Purpose
nohup     keeps the command running after the terminal hangs up, saving stdout to nohup.out in the current directory (always end the command line with & to put it in the background)
wget      a common command for downloading files, followed by the download URL
curl      a command for uploading and downloading files over HTTP, followed by the address
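If you prefer curl, here is a sketch of the equivalent download (-L follows redirects, -O keeps the remote filename):

curl -L -O http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz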

c. Extract the Hadoop archive with the tar command

tar -zxvf hadoop-3.2.1.tar.gz -C /usr/local/
# extracts to /usr/local/hadoop-3.2.1
Command/flag   Purpose
tar            the archive (compress/extract) command
-z             filter the archive through gzip for compression or decompression
-x             extract files from the tar archive
-v             verbose output; a single "v" lists file names only, while "vv" also lists permissions, owner, size, time, and so on
-f             the archive file to operate on
-C             the directory to extract into (must already exist on disk)
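If you want to peek inside the archive before extracting, tar's -t flag lists the contents instead of extracting them:

tar -tzf hadoop-3.2.1.tar.gz | head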

d. Enter the /usr/local directory and rename the hadoop-3.2.1 folder

cd /usr/local/
# rename the extracted Hadoop directory
mv hadoop-3.2.1 hadoop
  2. Set up a hadoop user (note: a dedicated hadoop user is recommended for running Hadoop, although working directly as root is also fine; the steps below are all done as root)

a. Add the hadoop user and set its password

[root@VM_0_15_centos hadoop]# useradd -m hadoop -s /bin/bash
[root@VM_0_15_centos hadoop]# passwd hadoop
Changing password for user hadoop.
New password: 
BAD PASSWORD: The password contains the user name in some form
(you can ignore this warning and simply confirm the password)
Retype new password:  

b. Grant the hadoop user administrator privileges

[root@VM_0_15_centos hadoop]# visudo

This opens a vi/vim editing session.
Type /root and press Enter to search for "root", pressing n a few times until you reach the line:

root ALL=(ALL) ALL

Press i to enter insert mode and add a new line below it:

hadoop ALL=(ALL) ALL

Then press ESC and type :wq to save and exit.
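An optional quick check that the grant took effect (sudo will ask for hadoop's password):

su - hadoop
sudo whoami   # should print root
exit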

c. Set up passwordless SSH login
First check that SSH is installed by typing ssh and pressing Enter:

[root@VM_0_15_centos hadoop]# ssh
usage: ssh [-1246AaCfGgKkMNnqsTtVvXxYy] [-b bind_address] [-c cipher_spec]
[-D [bind_address:]port] [-E log_file] [-e escape_char]
[-F configfile] [-I pkcs11] [-i identity_file]
[-J [user@]host[:port]] [-L address] [-l login_name] [-m mac_spec]
[-O ctl_cmd] [-o option] [-p port] [-Q query_option] [-R address]
[-S ctl_path] [-W host:port] [-w local_tun[:remote_tun]]
[user@]hostname [command]

If SSH is not installed, see here.
Next, type ssh localhost and press Enter:

[root@VM_0_15_centos hadoop]# ssh localhost
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

The login was refused, so next generate and authorize an RSA key:

[root@VM_0_15_centos ~]# cd ~/.ssh
[root@VM_0_15_centos .ssh]# pwd
/root/.ssh
[root@VM_0_15_centos .ssh]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:vtKP8vpJnCl50rIHWWo9RB+wtGAsXCCroBgR+UT1+lw root@VM_0_15_centos
The key's randomart image is:
+---[RSA 2048]----+
|o+ooo++ o.       |
|...ooo.o.o.      |
|oo.  ...o. .     |
|+o.  .  o .      |
|+   .  *E        |
|     o=*oo       |
|     .BoO.       |
|      oBo+       |
|      oB*..      |
+----[SHA256]-----+
[root@VM_0_15_centos .ssh]# cat id_rsa.pub >> authorized_keys # append the public key to the authorized list
[root@VM_0_15_centos .ssh]# chmod 600 ./authorized_keys # restrict the file to owner read/write (rw-)
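With the key authorized, ssh localhost should now log in without asking for a password (type exit to return to the original shell):

ssh localhost
exit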
Command/parameter   Purpose
chmod               change the permissions of a file or directory
numeric argument    chmod abc filename, where a, b, c are single digits giving the User, Group, and Other permissions respectively
example values      a=rwx (symbolic form: grants full permissions to all users)
                    777 (full permissions for everyone)
                    600 (read/write for the owner only)
how digits add up   r=4, w=2, x=1
                    rwx = 4+2+1 = 7
                    rw- = 4+2 = 6
                    r-x = 4+1 = 5
                    and so on
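For instance, the chmod 600 used above works out as follows:

chmod 600 authorized_keys   # owner: rw- (4+2), group: --- (0), other: --- (0)
chmod 755 some_script.sh    # owner: rwx (4+2+1), group and other: r-x (4+1)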

II. Installation

  1. Configure environment variables for Hadoop
[root@VM_0_15_centos hadoop]# vim /etc/profile
Append at the end of the file:
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

If Hadoop lives in a different directory on your machine, substitute your own path above.
My Java environment variables are configured in the same file, as sketched below.
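A minimal sketch of the Java configuration, assuming the CentOS OpenJDK 1.8 package path (adjust JAVA_HOME to wherever your JDK is actually installed):

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk   # assumed path; adjust to your install
export PATH=$PATH:$JAVA_HOME/bin

After saving /etc/profile, apply the changes to the current shell:

source /etc/profile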

  2. Configure Hadoop

a. Enter Hadoop's configuration directory

[root@VM_0_15_centos hadoop]# cd /usr/local/hadoop/etc/hadoop/ # where Hadoop's configuration files live

Four files need to be modified here: core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml.

b. Modify core-site.xml

[root@VM_0_15_centos hadoop]# vi core-site.xml

Add the following inside the <configuration> element:

<property>
	<name>hadoop.tmp.dir</name>
	<value>file:/usr/local/hadoop/tmp</value>
</property>
<property>
	<name>fs.defaultFS</name>
	<value>hdfs://localhost:9000</value>
</property>

Save and exit with :wq.

c. Modify hdfs-site.xml in the same way, adding inside <configuration>:

<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/data/hadoop/dfs/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/data/hadoop/dfs/data</value>
</property>
<property>
    <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
    <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
<property>
    <name>dfs.namenode.http-address</name>
    <value>localhost:8000</value>
</property>
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>localhost:8001</value>
</property>
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>

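The NameNode and DataNode paths above live under /data/hadoop, so you may create them up front (an optional precaution; the format step and the DataNode will also create them on first use if permissions allow):

mkdir -p /data/hadoop/dfs/name /data/hadoop/dfs/data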

d. Modify mapred-site.xml

[root@VM_0_15_centos hadoop]# mv mapred-site.xml.template mapred-site.xml # newer versions skip this step: 3.x ships mapred-site.xml directly
[root@VM_0_15_centos hadoop]# vi mapred-site.xml

Add the following inside <configuration>:

<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>Execution framework set to Hadoop YARN.</description>
</property>
<property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/tmp/hadoop-yarn/staging</value>
</property>
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>localhost:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>localhost:19888</value>
</property>
<property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>${yarn.app.mapreduce.am.staging-dir}/history/done</value>
</property>
<property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate</value>
</property>
<property>
    <name>mapreduce.jobhistory.joblist.cache.size</name>
    <value>1000</value>
</property>
<property>
    <name>mapreduce.tasktracker.map.tasks.maximum</name>
    <value>8</value>
</property>
<property>
    <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
    <value>8</value>
</property>
<property>
    <name>mapreduce.jobtracker.maxtasks.perjob</name>
    <value>5</value>
</property>

e. And finally yarn-site.xml, again inside <configuration>:

<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>localhost</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>localhost:8088</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>864000</value>
</property>
<property>
    <name>yarn.log-aggregation.retain-check-interval-seconds</name>
    <value>86400</value>
</property>
<property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/yarnapp/logs</value>
</property>
<property>
    <name>yarn.log.server.url</name>
    <value>http://localhost:19888/jobhistory/logs/</value>
</property>
<property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/data/apache/tmp/</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>5000</value>
</property>
<property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
</property>
<property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4.1</value>
</property>
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>

III. Running

  1. Configure the startup scripts
[root@VM_0_15_centos hadoop]# pwd
/usr/local/hadoop/etc/hadoop # the directory we were in while editing the config files
[root@VM_0_15_centos hadoop]# cd /usr/local/hadoop/sbin
[root@VM_0_15_centos sbin]# pwd
/usr/local/hadoop/sbin # where the startup scripts live
[root@VM_0_15_centos sbin]# vi ./start-dfs.sh

Add at the top of start-dfs.sh:

HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=root
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
YARN_RESOURCEMANAGER_USER=root
YARN_NODEMANAGER_USER=root

Add the same lines at the top of start-yarn.sh, stop-yarn.sh, stop-dfs.sh, start-all.sh, and stop-all.sh as well.
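Rather than editing each script by hand, you can prepend the six lines to all of them in one pass (a sketch relying on GNU sed's one-line insert syntax, placing the lines right after the shebang on line 1):

cd /usr/local/hadoop/sbin
for f in start-dfs.sh stop-dfs.sh start-yarn.sh stop-yarn.sh start-all.sh stop-all.sh; do
  sed -i '2i HDFS_DATANODE_USER=root\nHDFS_DATANODE_SECURE_USER=root\nHDFS_NAMENODE_USER=root\nHDFS_SECONDARYNAMENODE_USER=root\nYARN_RESOURCEMANAGER_USER=root\nYARN_NODEMANAGER_USER=root' "$f"
done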

  2. Add JAVA_HOME to hadoop-env.sh
[root@VM_0_15_centos sbin]# vi ../etc/hadoop/hadoop-env.sh

Find the commented line # export JAVA_HOME= in that file, uncomment it, and set it to the same JDK path used in /etc/profile.

  3. Format the NameNode first
[root@VM_0_15_centos hadoop]# pwd
/usr/local/hadoop
[root@VM_0_15_centos hadoop]# ./bin/hdfs namenode -format
# since we configured Hadoop's environment variables in Part II, this also works:
[root@VM_0_15_centos hadoop]# hdfs namenode -format
……
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at VM_0_15_centos/127.0.0.1
************************************************************/

Formatting is complete once the dfs directory appears under /data/hadoop/.

  4. Start Hadoop
[root@VM_0_15_centos hadoop]# start-all.sh
[root@VM_0_15_centos hadoop]# jps # check whether Hadoop started
10288 DataNode
10595 SecondaryNameNode
10954 ResourceManager
11164 NodeManager
10029 NameNode
13887 Jps

If DataNode, SecondaryNameNode, ResourceManager, NodeManager, and NameNode all appear and stay up, Hadoop started successfully.
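You can also confirm through the web UIs configured earlier: the NameNode at http://localhost:8000 (dfs.namenode.http-address) and the ResourceManager at http://localhost:8088 (yarn.resourcemanager.webapp.address). When you need to shut everything down later, the matching stop script mirrors the start script:

stop-all.sh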


IV. Testing

  1. Create an input directory in HDFS
[root@VM_0_15_centos hadoop]# hdfs dfs -mkdir -p /usr/root/wordcount/input
  2. Check that the directory was created
[root@VM_0_15_centos hadoop]# hdfs dfs -ls /usr/root/wordcount
Found 1 items
drwxr-xr-x   - root supergroup          0 2019-10-28 12:39 /usr/root/wordcount/input
  3. Create a text file
[root@VM_0_15_centos downloads]# pwd
/home/tomato/downloads
[root@VM_0_15_centos downloads]# vi test.txt 

Fill test.txt with some content (I used the Zen of Python; anything works, this is only a test sample):

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Save and leave vim with :wq.

  4. Upload test.txt to the directory created earlier, and verify the upload
[root@VM_0_15_centos downloads]# hdfs dfs -put test.txt /usr/root/wordcount/input
[root@VM_0_15_centos downloads]# hdfs dfs -ls /usr/root/wordcount/input
Found 1 items
-rw-r--r--   1 root supergroup       1210 2019-10-28 12:54 /usr/root/wordcount/input/test.txt
  5. Run Hadoop's built-in wordcount example
[root@VM_0_15_centos hadoop]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount /usr/root/wordcount/input /usr/root/wordcount/output
# do not create the output directory in advance, or the job will fail to run
2019-10-30 15:30:09,672 INFO mapreduce.Job: Running job: job_1572417977459_0001
2019-10-30 15:30:22,601 INFO mapreduce.Job: Job job_1572417977459_0001 running in uber mode : false
2019-10-30 15:30:22,608 INFO mapreduce.Job:  map 0% reduce 0%
2019-10-30 15:30:33,278 INFO mapreduce.Job:  map 100% reduce 0%
2019-10-30 15:30:39,318 INFO mapreduce.Job:  map 100% reduce 100%
2019-10-30 15:30:40,340 INFO mapreduce.Job: Job job_1572417977459_0001 completed successfully
2019-10-30 15:30:40,495 INFO mapreduce.Job: Counters: 54
File System Counters
    FILE: Number of bytes read=1206
.........
File Input Format Counters 
    Bytes Read=857
File Output Format Counters 
    Bytes Written=817
# output like the above means the wordcount example has completed
  6. View the wordcount job's output
[root@VM_0_15_centos hadoop]# hdfs dfs -cat /usr/root/wordcount/output/* # prints the results below
Zen	1
a	2
ambiguity,	1
and	1
.....
counts.	1
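To rerun the job, remember that the output directory must not already exist; remove it first:

hdfs dfs -rm -r /usr/root/wordcount/output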

At this point, the Hadoop installation is complete.

Reposted from blog.csdn.net/JikeStardy/article/details/102818670