Getting started with Hadoop: installing Hadoop to build a cluster
Preparation
VMware Workstation
CentOS-7 image
Remote connection tools (xshell, MobaXterm (what I use now), etc.)
Hadoop installation package, JDK installation package
Link: https://pan.baidu.com/s/13rWvSjP9ukoIOq-Nr6UDfg
Extraction code: 2580
Install virtual machine
Select typical
Choose to install the operating system later
Select Linux version CentOS 7 64-bit
Size the disk according to your needs, and then choose to split the disk into multiple files.
Click Finish
Edit virtual machine settings
Select CD/DVD (IDE), click Use ISO image file, select the location of your centos image, and click OK
Start the virtual machine
Choose the first option, Install CentOS 7
Choose Chinese
Select the installation source; after entering it, you can verify the image's integrity.
Select installation location
Click on the hard drive and click Finish
Select network and hostname
Open Ethernet, modify the hostname and apply
Enter the settings configured to ipv4
First change the method to Manual and add an IP address. Its first three octets must match the first three octets of your Windows machine's IPv4 address, and the last octet can be any number from 0 to 255 that doesn't conflict with another host (I use 20 here). The subnet mask is fixed; for the gateway and DNS server, take the same first three octets and append .2. Then save and click Done in the upper left corner.
You can check your own IP address on Windows: press Win+R, type cmd, and run ipconfig.
start installation
Set the root password (and remember it)
Create user
The name is arbitrary, but the password must be remembered, and this account must be made an administrator.
Then wait for the installation to complete and restart the virtual machine
Log in as root after booting
The ifconfig command is not included in our CentOS installation, so you need to install it yourself.
// First run this command
yum search ifconfig
// You will see output like this
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
base: mirrors.neusoft.edu.cn
extras: mirrors.neusoft.edu.cn
updates: mirrors.neusoft.edu.cn
============ Matched: ifconfig ==============
net-tools.x86_64 : Basic networking tools
// Then install the matched package, which provides ifconfig
yum install net-tools.x86_64
Use ifconfig to check your IP address. You will find it is not the IP we set; that's because the NIC is still obtaining its address via DHCP.
Use vi editor to edit the configuration file of the network card
vi /etc/sysconfig/network-scripts/ifcfg-ens33
Change dhcp in BOOTPROTO="dhcp" to static
(If you're not familiar with vi: use the arrow keys to move the cursor to the text you want to change, then press i to enter insert mode and edit it. Once you're sure the change is correct, press Esc to leave insert mode and return to command mode, then press Shift+; to enter last-line mode, type wq (w for write, q for quit), and press Enter to save and exit.)
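Putting the pieces together, the relevant lines of ifcfg-ens33 end up looking something like this. This is a sketch using the tutorial's 192.168.29.x example addresses; the installer usually writes the IPADDR/GATEWAY/DNS1 values you entered earlier (and the exact key names, e.g. PREFIX vs NETMASK, can vary), so you may only need to change BOOTPROTO:

```
BOOTPROTO="static"
ONBOOT="yes"
IPADDR=192.168.29.20
PREFIX=24
GATEWAY=192.168.29.2
DNS1=192.168.29.2
```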
After modification, you need to restart the network card
service network restart
Use ifconfig to view ip
If everything looks right, shut down the virtual machine, open the VM menu on the toolbar, and choose Manage > Clone to clone hadoop1
Select the current state of the virtual machine
Create a full clone
Modify the name and location of the virtual machine and click Finish
After the clone finishes, don't power it on yet. First edit hadoop2's settings: click Network Adapter and select Advanced.
Click Generate to create a new MAC address (the clone's MAC address is identical to the original's, so it must be regenerated)
After the change, start hadoop2 and check its IP; because it is a clone, the IP address also needs to be modified.
Enter the configuration file again
vi /etc/sysconfig/network-scripts/ifcfg-ens33
Modify the IP address, save and exit, then restart the network card. (Keep the IP addresses consecutive where possible: my hadoop1 ends in 20, so hadoop2 gets 21, and so on.)
service network restart
Clone another machine, hadoop3; the process is the same as for hadoop2. (Note that the MAC address must be regenerated so it differs from the first two machines, and the IP address must be changed as well.) Once that's done, we can start building the Hadoop cluster.
Modify the hostnames: set the second and third machines to server01 and server02 respectively, and restart each virtual machine after the change.
hostnamectl set-hostname <new-hostname>
Modify the hosts mapping so that the three virtual machines can resolve each other by name (do this on all three)
vi /etc/hosts
Add the following three lines:
192.168.29.20 master1.com master1
192.168.29.21 server01.com server01
192.168.29.22 server02.com server02
How do you test that they can reach each other? Ping each machine by hostname (for example, ping server01); if you get replies, everything is fine.
Turn off the firewall (all three)
Check the firewall status
systemctl status firewalld
active (running) means the firewall is on
Stop the firewall
systemctl stop firewalld
Permanently disable the firewall (the stop command above only lasts until the next reboot, while disabling edits the configuration so the firewall stays off; it's best to stop it first and then disable it)
systemctl disable firewalld.service
Give your user passwordless root (sudo) privileges. (On Linux, try not to work as root directly; you normally use the administrator account you created, so that account should carry the same privileges as root.)
sudo vi /etc/sudoers
Add this on the first line (xiaoyu is the administrator account I created; use your own username):
xiaoyu ALL=(root) NOPASSWD:ALL
Install jdk
Use the remote connection tool to log in to the virtual machine as an ordinary user (that is, the administrator you created), or switch with su xiaoyu (that's my username; substitute yours).
Since this is a freshly created virtual machine, there is nothing on it yet. Create two new folders: models (for extracted software) and softwares (for installation packages).
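For example, the two folders can be created in your home directory like this (a sketch; the tutorial keeps them under the user's home, e.g. /home/xiaoyu):

```shell
# Create a folder for extracted software and one for installation packages
mkdir -p "$HOME/models" "$HOME/softwares"
```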
Go into softwares, drag in the JDK installation package, and adjust the archive's permissions:
chmod 764 jdk-8u191-linux-x64.tar.gz
Then extract the JDK into models:
tar -zxvf jdk-8u191-linux-x64.tar.gz -C ../models
Configure jdk environment variables
sudo vi /etc/profile
Add these two lines at the end:
export JAVA_HOME=/home/xiaoyu/models/jdk1.8.0_191 (use your own JDK installation directory)
export PATH=$PATH:$JAVA_HOME/bin:
Refresh the configuration file after saving:
source /etc/profile
Then run java or javac; if a long usage message appears, the JDK is installed correctly. (The other two machines need the JDK as well.)
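As a quick sanity check that the two profile lines are well-formed, you can replay them in a shell and confirm the JDK's bin directory landed on PATH (the path below is the tutorial's example install directory; use your own):

```shell
# The two lines added to /etc/profile (example path from the tutorial)
export JAVA_HOME=/home/xiaoyu/models/jdk1.8.0_191
export PATH=$PATH:$JAVA_HOME/bin:

# Count how many PATH entries point at the JDK's bin directory;
# 1 means the addition took effect
echo "$PATH" | tr ':' '\n' | grep -cx "$JAVA_HOME/bin"
```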
Set up SSH password-free login
Generate a public/private key pair:
ssh-keygen -t rsa
then press Enter four times
Once generated, copy the SSH key to yourself and to the other two virtual machines:
ssh-copy-id master1
ssh-copy-id server01
ssh-copy-id server02
After copying, test it with ssh:
ssh master1
ssh server01
ssh server02
Install Hadoop (the key part)
As usual, first drag the Hadoop installation package into softwares, then extract it into models:
tar -zxvf ./hadoop-2.7.1.tar.gz -C ../models/
Modify environment variables
sudo vi /etc/profile
Before modification:
export JAVA_HOME=/home/xiaoyu/models/jdk1.8.0_191
export PATH=$PATH:$JAVA_HOME/bin:
After modification (a HADOOP_HOME entry is added, and Hadoop's bin and sbin directories are appended to PATH):
export JAVA_HOME=/home/xiaoyu/models/jdk1.8.0_191
export HADOOP_HOME=/home/xiaoyu/models/hadoop-2.7.1
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:
Modify Hadoop's configuration files (beginners may find the graphical file panel on the left of the remote tool easier).
All of Hadoop's configuration files are in models/hadoop-2.7.1/etc/hadoop under your home directory.
Modify hadoop-env.sh
Double-click the hadoop-env.sh file and change the JDK path inside it.
A quick way to check your own JDK path:
echo $JAVA_HOME
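The line to change in hadoop-env.sh then looks like this (using the tutorial's example JDK path; paste in whatever echo $JAVA_HOME printed on your machine):

```shell
# In hadoop-env.sh: point Hadoop at the JDK explicitly (example path)
export JAVA_HOME=/home/xiaoyu/models/jdk1.8.0_191
```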
Modify core-site.xml (the properties below go inside its <configuration> element):
<property>
<name>fs.defaultFS</name>
<value>hdfs://master1:8020</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
</property>
Modify hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.block.size</name>
<value>134217728</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/xiaoyu/hadoopdata/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/xiaoyu/hadoopdata/dfs/data</value>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>/home/xiaoyu/hadoopdata/checkpoint/dfs/cname</value>
</property>
<property>
<name>fs.checkpoint.edits.dir</name>
<value>/home/xiaoyu/hadoopdata/checkpoint/dfs/cname</value>
</property>
<property>
<name>dfs.http.address</name>
<value>master1:50070</value>
</property>
<property>
<name>dfs.secondary.http.address</name>
<value>server01:50090</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
Modify the mapred-site.xml file
There is no mapred-site.xml in the configuration directory, only a mapred-site.xml.template; rename the template to mapred-site.xml (mv mapred-site.xml.template mapred-site.xml), then add:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<final>true</final>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master1:19888</value>
</property>
Modify yarn-site.xml
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master1</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master1:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master1:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master1:8088</value>
</property>
Modify slaves file
master1
server01
server02
Create the NameNode metadata storage directories (on the other two nodes as well).
The storage paths are the ones configured in hdfs-site.xml, so we need to create the hadoopdata folder tree.
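The directories from hdfs-site.xml can be created in one go (the paths match the tutorial's hdfs-site.xml values under /home/xiaoyu; run this on all three machines, adjusting the home directory to your user):

```shell
# Create the NameNode, DataNode, and checkpoint directories
# referenced by hdfs-site.xml
mkdir -p "$HOME/hadoopdata/dfs/name" \
         "$HOME/hadoopdata/dfs/data" \
         "$HOME/hadoopdata/checkpoint/dfs/cname"
```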
Copy the entire local hadoop directory to the other two virtual machines.
First cd into the models folder where Hadoop lives, then run the following commands:
// scp -r <file-to-send> <user>@<hostname>:<destination-path>
scp -r ./hadoop-2.7.1/ xiaoyu@server01:/home/xiaoyu/models/
scp -r ./hadoop-2.7.1/ xiaoyu@server02:/home/xiaoyu/models/
Format hadoop
hadoop namenode -format
If the output contains "successfully formatted", the format worked.
Start hadoop
start-all.sh
Access the Hadoop web UI
http://192.168.29.20:50070
When you see the following page, the cluster is up.
Then click Live Nodes on the page; our three nodes are listed there.
Test whether Hadoop's HDFS can be used
Create a 1.txt file
// Upload the 1.txt file to HDFS
hdfs dfs -put ./1.txt /
You can see the 1.txt we uploaded listed there; success. (You can also verify with hdfs dfs -ls /.)
Finally, if you have any questions, you can contact me
QQ: 1031248402