Linux basic environment construction (CentOS7)-install Hadoop

1 Hadoop download and installation

Hadoop occupies a key position in the big data technology stack: it is the foundation the rest of the ecosystem builds on, and how solidly you master its basics largely determines how far you can go with big data.

Hadoop download

Hadoop download link: https://pan.baidu.com/s/1q7Z6HLHJbq-HNjzVqljCNQ

Extraction code: h5bv

Use Xftp to transfer the downloaded package to the Linux virtual machine (this guide assumes it lands in /opt/software).

Hadoop installation

Create the working path /usr/hadoop, then extract the downloaded package into it.

mkdir /usr/hadoop		# create the working path /usr/hadoop under the root directory
cd /opt/software		# enter the directory holding the installation package
tar -zxvf hadoop-2.7.3.tar.gz -C /usr/hadoop
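
As a quick sanity check that the archive unpacked where expected (this assumes the JDK from the earlier step in this series is already installed, since the hadoop script needs Java):

ls /usr/hadoop/hadoop-2.7.3		# should list bin, etc, sbin, share, ...
/usr/hadoop/hadoop-2.7.3/bin/hadoop version	# should report Hadoop 2.7.3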

2 Configure Hadoop environment variables (all 3 nodes)

vim /etc/profile

Add the following content:

 #HADOOP
export HADOOP_HOME=/usr/hadoop/hadoop-2.7.3
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib
export PATH=$PATH:$HADOOP_HOME/bin


source /etc/profile		# make the updated profile take effect
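
A quick check that the variables are now visible in the current shell:

echo $HADOOP_HOME		# should print /usr/hadoop/hadoop-2.7.3
which hadoop		# should resolve to a path under $HADOOP_HOME/bin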

3 Configure each Hadoop component (copying and pasting the configuration snippets directly is recommended, to avoid typos)

The Hadoop components are configured through files in the etc/hadoop directory of the installation, mostly XML plus the hadoop-env.sh shell script.

1.hadoop-env.sh

cd $HADOOP_HOME/etc/hadoop
vim hadoop-env.sh

Find the JAVA_HOME line and point it at the installed JDK:

export JAVA_HOME=/usr/java/jdk1.8.0_171

Press Esc to leave insert mode, then type ":wq" to save and quit.

2.core-site.xml

vim core-site.xml
<configuration>
<property>
  <name>fs.default.name</name>
   <value>hdfs://master:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
   <value>/usr/hadoop/hadoop-2.7.3/hdfs/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
  <name>io.file.buffer.size</name>
   <value>131072</value>
</property>
<property>
  <name>fs.checkpoint.period</name>
   <value>60</value>
</property>
<property>
  <name>fs.checkpoint.size</name>
   <value>67108864</value>
</property>
</configuration>

master: the hostname (or IP) of the master node, as mapped in /etc/hosts.

9000: the NameNode RPC port; the master and all slave nodes must use the same value here. (fs.default.name is the old name for this property; in Hadoop 2.x it is a deprecated alias of fs.defaultFS, and either spelling works.)
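A single stray character in any of these XML files will keep the daemons from starting, so it is worth checking well-formedness after each edit. A quick sketch, assuming xmllint is available (on CentOS 7 it comes with the libxml2 package):

xmllint --noout core-site.xml && echo OK	# prints only OK if the file parses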

3.mapred-site.xml

Hadoop ships only a template for this file. Copy mapred-site.xml.template to mapred-site.xml and then edit it:

cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml


<configuration>
<property>
<!-- Run MapReduce on YARN -->
   <name>mapreduce.framework.name</name>
   <value>yarn</value>
 </property>
</configuration>

4.yarn-site.xml

vim yarn-site.xml


<configuration>
<!-- Address of the ResourceManager -->
<property>
 <name>yarn.resourcemanager.address</name>
   <value>master:18040</value>
 </property>
 <property>
   <name>yarn.resourcemanager.scheduler.address</name>
   <value>master:18030</value>
 </property>
 <property>
   <name>yarn.resourcemanager.webapp.address</name>
   <value>master:18088</value>
 </property>
 <property>
   <name>yarn.resourcemanager.resource-tracker.address</name>
   <value>master:18025</value>
 </property>
 <property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>master:18141</value>
 </property>
<!-- How reducers fetch data -->
 <property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
 </property>
 <property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
 </property>
</configuration>
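
These five addresses override the YARN 2.x defaults (8032 for the ResourceManager, 8030 for the scheduler, 8088 for the web UI, 8031 for the resource tracker, and 8033 for admin); any free ports will do, as long as every node ends up with the same values, which the scp step below takes care of.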

5.hdfs-site.xml

vim hdfs-site.xml


<configuration>
<property>
 <name>dfs.replication</name>
   <value>2</value>
 </property>
 <property>
   <name>dfs.namenode.name.dir</name>
   <value>file:/usr/hadoop/hadoop-2.7.3/hdfs/name</value>
   <final>true</final>
</property>
 <property>
   <name>dfs.datanode.data.dir</name>
   <value>file:/usr/hadoop/hadoop-2.7.3/hdfs/data</value>
   <final>true</final>
 </property>
 <property>
  <name>dfs.namenode.secondary.http-address</name>
   <value>master:9001</value>
 </property>
 <property>
   <name>dfs.webhdfs.enabled</name>
   <value>true</value>
 </property>
 <property>
   <name>dfs.permissions</name>
   <value>false</value>
 </property>
</configuration>

dfs.replication: the number of copies HDFS keeps of each block for fault tolerance. The value should be less than or equal to the number of DataNodes (here, the 2 slave nodes).

6.slaves & master

Edit the slaves file and add the worker nodes slave1 and slave2, one per line (contents shown below):

vim slaves
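
After editing, the file should contain exactly the two worker hostnames:

slave1
slave2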


Edit the master file and add the master node's hostname (contents shown below):

vim master
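
After editing, the file should contain the single line:

master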


4 Synchronize the other virtual machines

Distribute the profile file and the Hadoop directory to the slave1 and slave2 nodes:

scp /etc/profile root@slave1:/etc/profile	# distribute the environment-variable file to slave1
scp /etc/profile root@slave2:/etc/profile	# distribute the environment-variable file to slave2
scp -r /usr/hadoop root@slave1:/usr/		# distribute the Hadoop directory to slave1
scp -r /usr/hadoop root@slave2:/usr/		# distribute the Hadoop directory to slave2
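
To confirm the copies landed (this assumes the passwordless SSH set up earlier in the series):

ssh slave1 'ls /usr/hadoop && tail -n 4 /etc/profile'	# the hadoop-2.7.3 directory and the HADOOP exports should appear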

Reload the environment variables on the two slave nodes:

source /etc/profile		# run this on both slave1 and slave2

5 Format HDFS (operate only on master)

First run jps to check whether any Hadoop processes are already running; if none are, format the NameNode:

hadoop namenode -format

When "Exiting with status 0" appears, it indicates that the formatting is successful.
Insert picture description here

6 Start the Hadoop cluster

Run the start script on the master host only; it brings up the daemons on the slave nodes as well. (start-all.sh is deprecated in Hadoop 2.x in favor of running start-dfs.sh and start-yarn.sh separately, but it still works.)

cd /usr/hadoop/hadoop-2.7.3		# return to the Hadoop directory
sbin/start-all.sh		# start all services from the master node

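Run jps on each node to verify. With this configuration, the output should look roughly like the following (process IDs will differ):

master:
NameNode
SecondaryNameNode
ResourceManager
Jps

slave1 and slave2:
DataNode
NodeManager
Jps

With the ports configured above, the ResourceManager web UI should also be reachable at http://master:18088, and the NameNode UI at the Hadoop 2.x default, http://master:50070.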

Note how the processes differ between the master and the slaves!
If every node shows the processes above, your fully distributed Hadoop cluster is up and running.
