Preface: The master node (Master) runs Ubuntu and the slave node (slave1) runs CentOS; this post walks through configuring a Hadoop big data environment across the two.
Pay attention to the network configuration from the previous blog post!!!
Add a hadoop user
CentOS operating system
1. Add a hadoop user and set its password with the following commands:
adduser hadoop
passwd hadoop
2. Grant the hadoop user administrator privileges (later experiments may need them) with the following commands:
chmod -v u+w /etc/sudoers
#Change sudoers file permissions before modifying it
vi /etc/sudoers
#Modify the sudoers file
Find the line ## Allow root to run any commands anywhere and add the following below the root entry:
hadoop ALL=(ALL) ALL
#This line grants the newly created hadoop user root (sudo) privileges
After saving and exiting, change the permissions of the sudoers file back with the following command:
chmod -v u-w /etc/sudoers
#Restore the sudoers file permissions to their original state
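A quick way to confirm the change took effect (an optional check, not part of the original steps) is to switch to the hadoop user and run a command through sudo:
su - hadoop
sudo whoami #should print root if the sudoers entry works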
Ubuntu operating system
Create a hadoop user with the following command:
sudo useradd -m hadoop -s /bin/bash
Use the following command to set a password for the hadoop user:
sudo passwd hadoop
Give the hadoop user sudo privileges with the following command:
sudo adduser hadoop sudo
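To verify the hadoop user was added to the sudo group (an optional check, not part of the original steps):
groups hadoop #the output should include sudo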
Password-free login
The principle of password-free login
Initially there is no .ssh folder on either server; the .ssh folder is generated automatically the first time you log in with the ssh command.
The .ssh folder also contains no authorized_keys file at first. After generating the id_rsa.pub file, you can append it with the cat command.
So first log in to each node in the cluster with ssh to create the .ssh folder. Switch to the hadoop user before doing this.
On Ubuntu, use the following commands:
su - hadoop
ssh localhost #log in with ssh
exit #exit the ssh session
On CentOS, use the following commands:
su - hadoop
ssh slave1
exit
Generate the public key
Generate a key pair on each server in the cluster with the following commands:
Use the following code in Ubuntu:
cd ~/.ssh
ssh-keygen -t rsa
During the process, just press Enter at every prompt to accept the defaults.
Use the following code in CentOS:
cd ~/.ssh
ssh-keygen -t rsa
Same as on Ubuntu: just press Enter at every prompt.
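On either system, ssh-keygen can also be run non-interactively so there are no prompts at all (an optional alternative, not used in the original steps):
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa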
Transfer the public keys to each other
Use the following code on the Ubuntu system:
scp -r id_rsa.pub hadoop@slave1:/home/hadoop #transfer the Master node's public key to slave1
This process requires entering the password of the hadoop user in slave1
Use the following code on the CentOS system:
scp -r id_rsa.pub hadoop@Master:/home/hadoop #transfer the slave1 node's public key to Master
Similarly, you need to enter the hadoop user's password on the Master.
Append each public key to the other node's authorized_keys file
Note that the public keys just transferred land in the /home/hadoop folder,
but the authorized_keys file only takes effect inside the /home/hadoop/.ssh folder.
Use the following command in Ubuntu system:
cat /home/hadoop/id_rsa.pub >> /home/hadoop/.ssh/authorized_keys
Similarly, use the following command in the CentOS system:
cat /home/hadoop/id_rsa.pub >> /home/hadoop/.ssh/authorized_keys
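As an aside, the scp and cat steps above can be combined into one command with ssh-copy-id, which appends the key to the remote authorized_keys for you (an alternative not used in the original steps):
ssh-copy-id hadoop@slave1 #run on Master
ssh-copy-id hadoop@Master #run on slave1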
Test whether password-free login is effective
Use the following command in CentOS system:
ssh Master
exit
Password-free login successful
Use the following command in Ubuntu system:
ssh slave1
Note that password-free login is not successful here. The reason is that the permissions on the .ssh folder and the authorized_keys file in our CentOS system are wrong.
Modify the permissions with the following commands:
chmod 0755 /home/hadoop
chmod 700 /home/hadoop/.ssh
chmod 600 /home/hadoop/.ssh/authorized_keys
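To double-check the permissions afterwards (an optional check, not part of the original steps):
ls -ld /home/hadoop /home/hadoop/.ssh /home/hadoop/.ssh/authorized_keys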
Go back to the Master node and try passwordless login again:
ssh slave1
Password-free login successful
Install and configure Hadoop
Because our CentOS node does not have a JDK yet, we extract and install the JDK first.
1. Transfer the installation packages; transfer the Hadoop package here as well (the same operation applies on the Ubuntu system).
2. Unzip the JDK and configure the environment file
cd ~
tar -zxf jdk-8u121-linux-x64.tar.gz
cd /usr/lib
sudo mkdir jdk #note: create the jdk folder with sudo
sudo mv ~/jdk1.8.0_121/ /usr/lib/jdk #move the JDK into the /usr/lib/jdk folder
Edit the /etc/profile environment variables (the hadoop environment variables are added here directly as well).
Because the /etc/profile file can only be modified by the super user, switch back to root:
su -
vi /etc/profile
Add the following environment variable code:
#java setting
export JAVA_HOME=/usr/lib/jdk/jdk1.8.0_121
export CLASSPATH=$JAVA_HOME/lib/
export PATH=$PATH:$JAVA_HOME/bin
export PATH JAVA_HOME CLASSPATH
#hadoop setting
export HADOOP_HOME=/home/hadoop/hadoop-2.7.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source /etc/profile #reload the environment variable file
java -version
Unzip and configure hadoop environment variables
1. First unzip hadoop on the Ubuntu system
tar -zxf hadoop-2.7.1.tar.gz
2. Edit hadoop environment variables
vim ~/.bashrc
Add the following environment variable code to .bashrc (actually the same as on CentOS):
#java setting
export JAVA_HOME=/usr/lib/jdk/jdk1.8.0_121
export CLASSPATH=$JAVA_HOME/lib/
export PATH=$PATH:$JAVA_HOME/bin
export PATH JAVA_HOME CLASSPATH
#hadoop setting
export HADOOP_HOME=/home/hadoop/hadoop-2.7.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
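After saving .bashrc, reload it and check that the new variables are picked up (these verification commands are not part of the original steps):
source ~/.bashrc
hadoop version #should report Hadoop 2.7.1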
Configure the Hadoop files
1. Configure the env.sh files
vim /home/hadoop/hadoop-2.7.1/etc/hadoop/hadoop-env.sh
vim /home/hadoop/hadoop-2.7.1/etc/hadoop/mapred-env.sh
vim /home/hadoop/hadoop-2.7.1/etc/hadoop/yarn-env.sh
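In each of these env.sh files, the usual change is to point JAVA_HOME at the JDK installed above; a minimal sketch of the line to set (the exact location in each file may vary):
export JAVA_HOME=/usr/lib/jdk/jdk1.8.0_121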
2. Configure the core-site.xml file
vim /home/hadoop/hadoop-2.7.1/etc/hadoop/core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://Master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoopfile</value>
</property>
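Note: in this file and the XML files below, the <property> blocks go inside the existing <configuration> ... </configuration> element that Hadoop ships with; the wrapper is omitted here for brevity.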
3. Configure the hdfs-site.xml file
vim /home/hadoop/hadoop-2.7.1/etc/hadoop/hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>Master:50090</value>
</property>
4. Configure the mapred-site.xml file
mv /home/hadoop/hadoop-2.7.1/etc/hadoop/mapred-site.xml.template /home/hadoop/hadoop-2.7.1/etc/hadoop/mapred-site.xml #mapred-site.xml ships only as a template, so create a usable file from it
vim /home/hadoop/hadoop-2.7.1/etc/hadoop/mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
5. Configure the yarn-site.xml file
vim /home/hadoop/hadoop-2.7.1/etc/hadoop/yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>Master</value>
</property>
6. Configure the slaves file
vim /home/hadoop/hadoop-2.7.1/etc/hadoop/slaves
Master
slave1
Transfer the configured Hadoop directory to slave1
scp -r /home/hadoop/hadoop-2.7.1 hadoop@slave1:/home/hadoop
Format HDFS and start the Hadoop cluster
cd /home/hadoop/hadoop-2.7.1/sbin
hdfs namenode -format #once again, do not run this command more than once; use it only before the very first start after installing Hadoop
Start hadoop cluster
cd /home/hadoop/hadoop-2.7.1/sbin
start-all.sh
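To confirm the cluster came up, check the running Java processes on each node with jps (a verification step not spelled out in the original post). With the configuration above, Master should show NameNode, SecondaryNameNode, ResourceManager, and, because Master is also listed in slaves, DataNode and NodeManager; slave1 should show DataNode and NodeManager:
jps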