Linux basic environment setup (CentOS 7): installing Hadoop
1 Hadoop download and installation
Hadoop occupies a central position in the big data technology stack: it is the foundation on which the rest of the ecosystem is built, and a solid grasp of its fundamentals will determine how far you can go with big data technology.
Hadoop download
Hadoop download link: https://pan.baidu.com/s/1q7Z6HLHJbq-HNjzVqljCNQ
Extraction code: h5bv
Transfer the downloaded installation package to the Linux virtual machine via Xftp
Hadoop installation
Create the working path /usr/hadoop and extract the installation package into it.
mkdir /usr/hadoop #first create the working path /usr/hadoop under the root directory
cd /opt/software #enter the folder containing the installation package
tar -zxvf hadoop-2.7.3.tar.gz -C /usr/hadoop
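To confirm the extraction succeeded, you can list the target directory (a quick sanity check, not part of the original steps):
ls /usr/hadoop/hadoop-2.7.3 #should show bin, sbin, etc, share, and other directories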
2 Configure Hadoop environment variables (needed on all 3 machines)
vim /etc/profile
Add the following content:
#HADOOP
export HADOOP_HOME=/usr/hadoop/hadoop-2.7.3
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib
export PATH=$PATH:$HADOOP_HOME/bin
source /etc/profile #apply the profile so the variables take effect
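To verify that the variables took effect, you can run:
echo $HADOOP_HOME #should print /usr/hadoop/hadoop-2.7.3
hadoop version #should report Hadoop 2.7.3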
3 Configure each Hadoop component (it is recommended to copy and paste the configurations directly to avoid typos)
Hadoop's components are configured through XML files stored in the etc/hadoop directory under the Hadoop installation path.
1.hadoop-env.sh
cd $HADOOP_HOME/etc/hadoop
vim hadoop-env.sh
Enter the following to set the Java environment variable:
export JAVA_HOME=/usr/java/jdk1.8.0_171
Type "Esc" to exit the editing mode, and use the command ":wq" to save and exit.
2.core-site.xml
vim core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/hadoop/hadoop-2.7.3/hdfs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>fs.checkpoint.period</name>
    <value>60</value>
  </property>
  <property>
    <name>fs.checkpoint.size</name>
    <value>67108864</value>
  </property>
</configuration>
master: the IP address or hostname (as mapped in /etc/hosts) of the master node. Note that fs.default.name is the deprecated name of fs.defaultFS; both still work in Hadoop 2.7.3.
9000: the NameNode RPC port; the master and all slave nodes must use the same port.
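For the hostname master to resolve on every node, each machine's /etc/hosts needs matching entries. A minimal sketch with placeholder IP addresses (replace them with your own):
192.168.1.100 master
192.168.1.101 slave1
192.168.1.102 slave2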
3.mapred-site.xml
Hadoop does not ship this file. Copy the mapred-site.xml.template sample file to mapred-site.xml and edit it:
cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml
<configuration>
  <property>
    <!-- Run MapReduce on YARN -->
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
4.yarn-site.xml
vim yarn-site.xml
<configuration>
  <!-- Address of the ResourceManager -->
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:18040</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:18030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:18088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:18025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:18141</value>
  </property>
  <!-- How reducers fetch data -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
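These 18xxx ports override YARN's defaults (the ResourceManager web UI, for instance, normally listens on 8088). Once the cluster is started in the last step, you can confirm the web UI responds on the configured port:
curl -sL -o /dev/null -w "%{http_code}\n" http://master:18088 #200 means the ResourceManager web UI is up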
5.hdfs-site.xml
vim hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/hadoop/hadoop-2.7.3/hdfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/hadoop/hadoop-2.7.3/hdfs/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:9001</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
dfs.replication: HDFS keeps multiple copies of each block for reliability; the value is the number of replicas, which should be less than or equal to the number of slave (DataNode) nodes.
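Once HDFS is running (after the cluster is started in the last step), you can confirm the effective replication with fsck on master:
hdfs fsck / #the report includes "Default replication factor: 2"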
6.slaves & master
Edit the slaves file and add the worker nodes slave1 and slave2;
vim slaves
Edit the master file and add the master node master.
vim master
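Each file should contain only hostnames, one per line. You can double-check with cat:
cat slaves #expected contents: two lines, slave1 and slave2
cat master #expected contents: one line, master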
4 Synchronize the other virtual machines
Distribute the profile file and the Hadoop directory to the slave1 and slave2 nodes:
scp -r /etc/profile root@slave1:/etc/profile #distribute the environment variable profile to slave1
scp -r /etc/profile root@slave2:/etc/profile #distribute the environment variable profile to slave2
scp -r /usr/hadoop root@slave1:/usr/ #distribute the Hadoop directory to slave1
scp -r /usr/hadoop root@slave2:/usr/ #distribute the Hadoop directory to slave2
Apply the environment variables on the two slave nodes:
source /etc/profile #run on both slave1 and slave2
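To confirm the distribution worked, run a quick remote check from master (this assumes passwordless SSH to the slaves, which the cluster start scripts require anyway):
ssh slave1 "source /etc/profile && hadoop version" #should report Hadoop 2.7.3
ssh slave2 "source /etc/profile && hadoop version"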
5 Format the NameNode (operate on master only)
First run jps to check whether any Hadoop processes are already running; if none are, format the NameNode:
hadoop namenode -format
When "Exiting with status 0" appears, it indicates that the formatting is successful.
6 Start the Hadoop cluster
Run the start command on the master host only; it will start the daemons on the slave nodes as well. (Operate on master only.)
cd /usr/hadoop/hadoop-2.7.3 #return to the Hadoop directory
sbin/start-all.sh #start the services from the master node
Run jps on each node and compare the running processes:
master: NameNode, SecondaryNameNode, ResourceManager
slave1: DataNode, NodeManager
slave2: DataNode, NodeManager
Pay attention to how the processes differ between nodes!
If the processes on each node match the above, your fully distributed Hadoop cluster is up and running!
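As an optional smoke test (not part of the original steps), you can exercise HDFS and run the example job that ships in the Hadoop 2.7.3 distribution:
hdfs dfs -mkdir -p /test #create a directory in HDFS
hdfs dfs -ls / #the new directory should be listed
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 2 10 #run a small MapReduce job on YARN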