Hadoop Pseudo-Distributed Installation Tutorial (on a Windows Host)

What you need before installing:

1. A client machine

2. A virtual machine, with a Linux system configured inside it

3. The JDK installation package (the JDK version I used is jdk-8u201-linux-x64)

4. The Hadoop installation package (the Hadoop version I used is hadoop-2.7.6)

5. SecureCRT (not required; it's just a remote connection client for the server)

Now let's get into the installation process:

Extract the JDK and Hadoop packages on Linux and put them in folders named java and hadoop. The exact commands are:

#jdk-8u201-linux-x64.tar.gz is the JDK package I downloaded;
#note that it extracts to a directory named jdk1.8.0_201

tar -zxvf jdk-8u201-linux-x64.tar.gz
mv jdk1.8.0_201 java

#hadoop-2.7.6.tar.gz is the Hadoop package I downloaded

tar -zxvf hadoop-2.7.6.tar.gz
mv hadoop-2.7.6 hadoop

Next, configure the environment variables. The exact commands are:

vi /etc/profile
#add the following lines at the end of the profile file
#JAVA_HOME
export JAVA_HOME=/home/guohao/java
export PATH=$JAVA_HOME/bin:$PATH 
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar 

#HADOOP_HOME
export HADOOP_HOME=/home/guohao/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

#HADOOP_CONF_DIR
export HADOOP_CONF_DIR=/home/guohao/hadoop/etc/hadoop

Then make the changes take effect:

source /etc/profile
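As a quick sanity check that the variables took effect (assuming the paths above match your layout), the version commands should now work from any directory:

java -version       #should report 1.8.0_201
hadoop version      #should report Hadoop 2.7.6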

PS: I recommend also adding the lines above to ~/.bashrc. In my testing I found that if you don't touch .bashrc, then after Hadoop is configured and running, shutting the server down and starting it again causes problems, and you have to run source /etc/profile all over again... Better to fix it once and for all.
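One minimal way to do that is to append a single line to ~/.bashrc so every new shell re-sources /etc/profile:

echo 'source /etc/profile' >> ~/.bashrc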

Next, configure passwordless SSH login. This requires openssh-server to be installed on the Linux machine beforehand; once the SSH service is set up and you've used it once, a .ssh directory will appear under your home directory (ssh-keygen will also create it the first time you generate a key).
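If openssh-server isn't installed yet, the package manager command depends on your distribution; for example (pick the one that matches your system):

sudo apt-get install openssh-server    #Debian/Ubuntu
sudo yum install openssh-server        #CentOS/RHEL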

Then cd into the .ssh directory and run the following commands:

ssh-keygen -t rsa                 # when prompted, just keep hitting Enter
cat id_rsa.pub >> authorized_keys
chmod 600 ./authorized_keys

And passwordless login is configured. You can verify it with the ssh localhost command.

Ahem... now we get to the main event: editing Hadoop's configuration files. There are six files to configure:

core-site.xml, hadoop-env.sh, hdfs-site.xml, mapred-site.xml.template, slaves, yarn-site.xml

Here is the core-site.xml configuration:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

The commented spots below are the places you need to change; I suggest not touching anything else and copying it as-is.

<configuration>
	<property>
		<!-- use your own host IP here (fs.defaultFS is the newer name for this property) -->
		<name>fs.default.name</name>
		<value>hdfs://192.168.40.128:9000</value>
	</property>

	<property>
		<!-- point the middle of this path at your own hadoop folder; current/tmp does not
		     exist in the original hadoop folder -- it is generated automatically after
		     you format HDFS -->
		<name>hadoop.tmp.dir</name>
		<value>/home/guohao/hadoop/current/tmp</value>
	</property>

	<property>
		<name>fs.trash.interval</name>
		<value>4320</value>
	</property>
</configuration>

Here is the hadoop-env.sh configuration:

###I kept only the part we need to change, because the file has far too many lines###
# The java implementation to use.
export JAVA_HOME=/home/guohao/java/     #put your own JAVA_HOME path here, and remove the leading # if the line is commented out

Here is the hdfs-site.xml configuration:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
	<property>
		<!-- every folder under current/ is generated automatically once you format HDFS;
		     none of them need to exist beforehand -->
		<name>dfs.namenode.name.dir</name>
		<value>/home/guohao/hadoop/current/dfs/name</value>
	</property>

	<property>
		<name>dfs.datanode.data.dir</name>
		<value>/home/guohao/hadoop/current/data</value>
	</property>

	<property>
		<name>dfs.webhdfs.enabled</name>
		<value>true</value>
	</property>

	<property>
		<name>dfs.permissions.superusergroup</name>
		<value>staff</value>
	</property>

	<property>
		<name>dfs.permissions.enabled</name>
		<value>false</value>
	</property>
</configuration>

Here is the mapred-site.xml configuration (if the hadoop/etc/hadoop folder contains mapred-site.xml.template rather than mapred-site.xml, copy the template and rename the copy to mapred-site.xml, as shown just below):
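For example, assuming the directory layout used throughout this tutorial:

cd /home/guohao/hadoop/etc/hadoop
cp mapred-site.xml.template mapred-site.xml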

The only things you need to change below are the host IP and the directories; I won't annotate them one by one.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->
<configuration>
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>

	<property>
		<name>mapreduce.jobtracker.http.address</name>
		<value>192.168.40.128:50030</value>
	</property>

	<property>
		<name>mapreduce.jobhistory.address</name>
		<value>192.168.40.128:10020</value>
	</property>

	<property>
		<name>mapreduce.jobhistory.webapp.address</name>
		<value>192.168.40.128:19888</value>
	</property>

	<property>
		<name>mapreduce.jobhistory.done-dir</name>
		<value>/jobhistory/done</value>
	</property>

	<property>
		<name>mapreduce.jobhistory.intermediate-done-dir</name>
		<value>/jobhistory/done_intermediate</value>
	</property>

	<property>
		<name>mapreduce.job.ubertask.enable</name>
		<value>true</value>
	</property>
</configuration>

Next up is the yarn-site.xml configuration:

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

<!-- Site specific YARN configuration properties -->
<property>
	<name>yarn.resourcemanager.hostname</name>
	<value>192.168.40.128</value>
</property>

<property>
	<name>yarn.nodemanager.aux-services</name>
	<value>mapreduce_shuffle</value>
</property>

<property>
	<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
	<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

<property>
	<name>yarn.resourcemanager.address</name>
	<value>192.168.40.128:18040</value>
</property>

<property>
	<name>yarn.resourcemanager.scheduler.address</name>
	<value>192.168.40.128:18030</value>
</property>

<property>
	<name>yarn.resourcemanager.resource-tracker.address</name>
	<value>192.168.40.128:18025</value>
</property>

<property>
	<name>yarn.resourcemanager.admin.address</name>
	<value>192.168.40.128:18141</value>
</property>

<property>
	<name>yarn.resourcemanager.webapp.address</name>
	<value>192.168.40.128:18088</value>
</property>

<property>
	<name>yarn.log-aggregation-enable</name>
	<value>true</value>
</property>

<property>
	<name>yarn.log-aggregation.retain-seconds</name>
	<value>86400</value>
</property>

<property>
	<name>yarn.log-aggregation.retain-check-interval-seconds</name>
	<value>86400</value>
</property>

<property>
	<name>yarn.nodemanager.remote-app-log-dir</name>
	<value>/tmp/logs</value>
</property>

<property>
	<name>yarn.nodemanager.remote-app-log-dir-suffix</name>
	<value>logs</value>
</property>
</configuration>

The last one is slaves. Since this is pseudo-distributed on a single machine, just put this machine's own IP in it.
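With the IP used throughout this tutorial, that means the slaves file contains a single line, which you can write like this:

echo "192.168.40.128" > /home/guohao/hadoop/etc/hadoop/slaves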

Once all these files are configured, let's initialize HDFS by running:

hdfs namenode -format

If the format succeeds, the output should include a message saying the storage directory has been successfully formatted.

Then we start the Hadoop services by running the script in the sbin folder under the hadoop folder:

/home/guohao/hadoop/sbin/start-all.sh
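Note that start-all.sh is marked as deprecated in Hadoop 2.x; it still works, but the equivalent is to start HDFS and YARN separately:

/home/guohao/hadoop/sbin/start-dfs.sh
/home/guohao/hadoop/sbin/start-yarn.sh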

Then you can check the running processes with jps.
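The process IDs will vary, but with everything above running, jps should list six processes: NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager, and Jps itself.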

Next let's look at the web interface directly in the browser: enter ip:50070 and the HDFS NameNode page should pop up.
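Since the yarn-site.xml above sets yarn.resourcemanager.webapp.address to port 18088, the YARN ResourceManager web UI should likewise be reachable at ip:18088.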

If you enter the address in the browser and get a connection-refused error, you probably haven't turned off the firewall. The Linux commands to turn the firewall off are:

#takes effect immediately, lost after a reboot
service iptables start   #turn on
service iptables stop    #turn off

#persists across reboots
chkconfig iptables on    #turn on
chkconfig iptables off   #turn off
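The commands above are for iptables-based systems such as CentOS 6. If your distribution uses firewalld instead (e.g., CentOS 7 and later), the equivalents are:

systemctl stop firewalld      #takes effect immediately, lost after a reboot
systemctl disable firewalld   #persists across reboots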

————————————————————————————————————————————————————

BINGO!


Reposted from blog.csdn.net/bai_and_hao_1314/article/details/88143460