Preface:
Typical Hadoop application scenarios:
data analysis platforms;
recommendation systems;
underlying storage for business systems;
business monitoring systems.
Real-world uses include e-commerce, energy exploration, energy saving, online travel, fraud detection, image processing, IT security, and more.
My school offers a course on building and using Hadoop in the third year; the textbook was printed in March 2020. As networks keep developing, the setup process and applications will keep changing, so I am writing down my learning process here as a record.
Tools and environment for this post:
VMware virtual machine, Ubuntu, JDK package (), Hadoop package ()
Helper tools: Xshell 6 and Xftp (see my other blog posts for how to use these two tools)
Preliminary environment setup:
CPU: enable CPU virtualization for the virtual machine
Network: use NAT mode
Host network: configure the real host's network adapter to match NAT mode
The steps below follow an offline installation.
Enable CPU virtualization:
Enable the network adapter:
Configure the network adapter:
Configure the NAT network inside the virtual machine:
Connect to the server with Xshell and Xftp (this makes later operations easier).
Use the tools to create directories on the server and upload the files (the HBase archive is placed under /home for now; it will be explained later when configuring HBase).
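If you prefer the command line over Xftp, the staging directories and the upload can be done like this (a sketch; the server IP and local file locations are assumptions, not from this guide):

```shell
# On the server: create staging directories for the two archives
mkdir -p /home/jdk /home/hadoop

# On the real host: copy the archives over with scp
# (192.168.x.x is a placeholder for the server's NAT address)
# scp jdk-8u171-linux-x64.tar.gz root@192.168.x.x:/home/jdk/
# scp hadoop-2.7.3.tar.gz root@192.168.x.x:/home/hadoop/
```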
Setting up Hadoop requires a configured JDK environment:
Extract the JDK archive uploaded earlier, keeping the default extraction directory:
root@user01:/home# cd jdk/
root@user01:/home/jdk# ls
jdk-8u171-linux-x64.tar.gz
root@user01:/home/jdk# tar -xzvf jdk-8u171-linux-x64.tar.gz
Configure the JDK environment variables:
export JAVA_HOME=/home/jdk/jdk1.8.0_171
export CLASSPATH=$JAVA_HOME/lib/
export PATH=$JAVA_HOME/bin:$PATH
export PATH JAVA_HOME CLASSPATH
root@user01:/home/jdk# vi /root/.bashrc
(append the variables above to the end of this file)
Make the environment variables take effect:
root@user01:/home/jdk# source /root/.bashrc
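The edit can also be done non-interactively instead of with vi; a sketch using the same paths as above:

```shell
# Append the JDK variables to root's .bashrc in one step
# (not idempotent; run it once)
cat >> /root/.bashrc <<'EOF'
export JAVA_HOME=/home/jdk/jdk1.8.0_171
export CLASSPATH=$JAVA_HOME/lib/
export PATH=$JAVA_HOME/bin:$PATH
EOF
```

Then re-run `source /root/.bashrc` as above.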
Check that the JDK is installed correctly:
java -version    # show the Java version
echo $JAVA_HOME  # verify the variable's value
That completes the preliminary work. Next, install Hadoop:
root@user01:/home# cd hadoop/
root@user01:/home/hadoop# ls
hadoop-2.7.3.tar.gz
root@user01:/home/hadoop# tar -xzvf hadoop-2.7.3.tar.gz
Configure the Hadoop environment variables (Hadoop's search path):
root@user01:/home# vi /root/.bashrc
export HADOOP_HOME=/home/hadoop/hadoop-2.7.3
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib/
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Make the variables take effect:
root@user01:/home# source /root/.bashrc
Check that the installation works:
root@user01:/home# hadoop version
You should see output roughly like:
Hadoop 2.7.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by root on 2016-08-18T01:41Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
This command was run using /home/hadoop/hadoop-2.7.3/share/hadoop/common/hadoop-common-2.7.3.jar
Pseudo-distributed mode configuration:
Edit the Hadoop configuration files.
1. Edit /home/hadoop/hadoop-2.7.3/etc/hadoop/hadoop-env.sh
vi hadoop-env.sh
export JAVA_HOME=/home/jdk/jdk1.8.0_171    # line to set
Delete the placeholder value on the file's existing JAVA_HOME line and paste the JDK path above in its place.
2. Edit /home/hadoop/hadoop-2.7.3/etc/hadoop/core-site.xml and add the following:
vi core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
Put the properties above inside the file's <configuration> element.
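For context, the properties have to sit between the `<configuration>` and `</configuration>` tags that Hadoop ships in the file; after editing, core-site.xml as a whole looks roughly like this (a sketch; the two header lines are the stock ones in the shipped file):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop/tmp</value>
    <description>Abase for other temporary directories.</description>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
```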
3. Edit /home/hadoop/hadoop-2.7.3/etc/hadoop/hdfs-site.xml and add the following inside its <configuration> element:
vi hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/tmp/dfs/data</value>
</property>
4. Format the NameNode:
hadoop namenode -format
Check whether it succeeded:
"successfully formatted" together with "Exiting with status 0" means success; "Exiting with status 1" means the format failed.
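That success check can be scripted by grepping the format output; a minimal sketch (the function name and log file location are my own, not from this guide):

```shell
# Decide success or failure from the NameNode format output.
# The messages matched here are the ones quoted above.
check_format() {
    if grep -q 'successfully formatted' "$1"; then
        echo 'format OK'
    else
        echo 'format FAILED'
    fi
}

# Usage (run on the server):
# hadoop namenode -format 2>&1 | tee /tmp/format.log
# check_format /tmp/format.log
```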
5. Start the NameNode, DataNode, and SecondaryNameNode manually:
start-dfs.sh
root@user01:/home/hadoop/hadoop-2.7.3/etc/hadoop# start-dfs.sh
Starting namenodes on [localhost]
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:No+kRFk4mIW/DdRFxPw7Y1ylSLKji1k3lzWBcklqDmA.
Are you sure you want to continue connecting (yes/no)? ys
Please type 'yes' or 'no': yes
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
root@localhost's password:
localhost: starting namenode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-root-namenode-user01.out
root@localhost's password:
localhost: starting datanode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-root-datanode-user01.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is SHA256:No+kRFk4mIW/DdRFxPw7Y1ylSLKji1k3lzWBcklqDmA.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
[email protected]'s password:
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-user01.out
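Notice that the log above asks for root's password once per daemon. Setting up passwordless SSH to localhost avoids those prompts; a common extra step this guide skips, sketched below:

```shell
# Generate an RSA key pair if one does not exist yet,
# then authorize it for SSH logins to this machine (localhost)
mkdir -p ~/.ssh
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```

After this, start-dfs.sh and start-yarn.sh should start the daemons without asking for a password.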
6. Start the YARN daemons manually:
start-yarn.sh
7. Check the started processes with jps (you should see NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager, and Jps itself).
8. Open http://localhost:50070 in a browser to test the HDFS web UI.
Summary:
The whole installation process may seem tedious. I set this up on my own after the teacher walked through it once. When I first started learning Hadoop I didn't really understand its structure and principles, and when I asked the teacher what Hadoop could do, the answer was "you'll find out later"; maybe that's because it isn't our major course, haha.
I also searched Baidu and read blogs by more experienced people; one of them compared Hadoop to chopping and stir-frying vegetables, and that kind of analogy made it much easier to understand.
I'll keep learning humbly as a newbie.