From scratch: a one-stop guide from installing the virtual machines to installing Hadoop and building the cluster

Getting started with Hadoop: install Hadoop and build a cluster

Preparation

VMware Workstation
CentOS-7 image
Remote connection tool (Xshell, MobaXterm (what I use now), etc.)
Hadoop installation package, JDK installation package
Link: https://pan.baidu.com/s/13rWvSjP9ukoIOq-Nr6UDfg
Extraction code: 2580

Install virtual machine

Select Typical

Choose to install the operating system later

Select Linux version CentOS 7 64-bit

Size the disk according to your needs, and then choose to split the disk into multiple files.

Click Finish

Edit virtual machine settings

Select CD/DVD (IDE), click Use ISO image file, browse to the location of your CentOS image, and click OK

Start the virtual machine

Choose the first option, Install CentOS 7

Choose Chinese

Select the installation source; after entering it, verify the integrity of the image

Select installation location

Click on the hard drive and click Finish

Select network and hostname

Turn on the Ethernet switch, modify the hostname, and click Apply

Enter the network settings and go to the IPv4 configuration

First change the method to Manual and add an IP address. The first three octets should match the first three octets of the IPv4 address on your Windows host; the last octet can be any number from 0 to 255 that does not conflict with another machine (I use 20 here). The subnet mask is fixed; for the gateway and DNS server, take the same first three octets and append .2. Then save and click Done in the upper left corner.
To check your own IP address on Windows: press Win+R, enter cmd, then run ipconfig and you will see it.
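The scheme above can be sketched concretely (the 192.168.29 prefix and the .20 host are this guide's example values; substitute the first three octets from your own ipconfig output):

```shell
# Example addressing scheme (hypothetical values - adapt the prefix to your host).
PREFIX=192.168.29        # first three octets, copied from your Windows IPv4 address
HOST_IP=$PREFIX.20       # any free last octet between 0 and 255
NETMASK=255.255.255.0    # the fixed subnet mask
GATEWAY=$PREFIX.2        # prefix + ".2" for the gateway...
DNS=$PREFIX.2            # ...and the same for the DNS server
echo "$HOST_IP $NETMASK $GATEWAY $DNS"
```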

Start the installation

Set the root password (and remember it)

Create user

The name is up to you; remember the password, and be sure to make this account an administrator.

Then wait for the installation to complete and restart the virtual machine

Log in as root after booting

Our CentOS does not come with the ifconfig command; you need to install it yourself.
# First run this command
yum search ifconfig

# You will see something like the following
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.neusoft.edu.cn
 * extras: mirrors.neusoft.edu.cn
 * updates: mirrors.neusoft.edu.cn
============ Matched: ifconfig ==============
net-tools.x86_64 : Basic networking tools

# Then install the matched package; it provides ifconfig
yum install net-tools.x86_64

Use ifconfig to check your IP address. You will notice it is not the IP we set; that is because the NIC is still obtaining its address via DHCP.

Use the vi editor to edit the network card's configuration file
 vi  /etc/sysconfig/network-scripts/ifcfg-ens33

Change dhcp in BOOTPROTO="dhcp" to static

(For those unfamiliar with vi: use the arrow keys to move the cursor to the text you want to change, then press i to enter insert mode and edit. When you are done, press Esc to leave insert mode and return to command mode, then press Shift+; to enter last-line mode and type wq (w saves, q quits, so wq saves and exits).)
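After the edit, the relevant lines of ifcfg-ens33 should look roughly like this (a sketch using this guide's example addresses; your file will also contain UUID, DEVICE and other entries that stay as they are):

```shell
# /etc/sysconfig/network-scripts/ifcfg-ens33 (excerpt, example values)
BOOTPROTO="static"      # was "dhcp"
ONBOOT="yes"            # bring the interface up at boot
IPADDR=192.168.29.20    # the static IP chosen earlier
NETMASK=255.255.255.0
GATEWAY=192.168.29.2
DNS1=192.168.29.2
```

If IPv4 was already configured manually in the installer, some of these entries may exist already; the key change is BOOTPROTO.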

After the modification, restart the network service
service network restart
Then use ifconfig to view the IP

If everything is fine, shut down the virtual machine, click the Virtual Machine option on the toolbar, select Manage, and then Clone hadoop1

Select the current state of the virtual machine

Create a full clone

Modify the name and location of the virtual machine and click Finish

After the clone completes, do not power it on yet. Edit hadoop2's settings first: click the network adapter and select Advanced.

Click Generate to create a new MAC address (the clone has the same MAC address as the original, so it must be regenerated)

After the change, start hadoop2 and check the IP. Because it is a clone, the IP address also needs to be modified.

Open the configuration file again
vi /etc/sysconfig/network-scripts/ifcfg-ens33
Modify the IP address, save and exit, then restart the network service (keep the IP addresses consecutive where possible: my hadoop1 is .20, so hadoop2 gets .21, and so on)
service network restart

Clone another machine, hadoop3; the process is the same as for hadoop2. (Note: regenerate the MAC address so it is not the same as the first two machines, and change the IP address as well.) After that, start building the Hadoop cluster.
Modify the hostnames: set the second and third machines to server01 and server02 respectively, and restart each virtual machine after the change.
hostnamectl set-hostname <new-hostname>
Modify the hosts mapping so the three virtual machines can find each other by name (do this on all three):
vi /etc/hosts
Add the following three lines:
192.168.29.20    master1.com      master1
192.168.29.21    server01.com     server01
192.168.29.22    server02.com     server02

How do you test that they can find each other? Ping each hostname (for example, ping master1); if you get replies, everything is fine.

Turn off the firewall (on all three machines)
# Check the firewall status
systemctl status firewalld
# "active (running)" means the firewall is on

# Stop the firewall
systemctl stop firewalld

Permanently disable the firewall (with the method above, the firewall comes back after a reboot, whereas disabling it changes the configuration; it is best to stop the firewall first and then disable it):
systemctl disable firewalld.service
Give your user passwordless root privileges (try not to use root directly on Linux; normally you work as the administrator you created, so that account should get root-equivalent permissions):
sudo vi /etc/sudoers
Add on the first line (xiaoyu is the administrator account I created; use the username you set):
xiaoyu ALL=(root) NOPASSWD:ALL

Install jdk

Use the remote connection tool to connect to the virtual machine. Log in as the ordinary user (that is, the administrator you created), or switch with su xiaoyu (xiaoyu is my ordinary username; change it to yours).

Because this is a newly created virtual machine, there is nothing on it yet. We need to create two new folders: models (to hold the unpacked software) and softwares (to hold the installation packages).
mkdir ~/models ~/softwares

Go into softwares, drag in the JDK installation package, and adjust the package's permissions:
chmod 764 jdk-8u191-linux-x64.tar.gz
Then unpack the JDK into models:
tar -zxvf jdk-8u191-linux-x64.tar.gz -C ../models

Configure the JDK environment variables
sudo vi /etc/profile
Add at the end:
export JAVA_HOME=/home/xiaoyu/models/jdk1.8.0_191      # your own JDK install directory
export PATH=$PATH:$JAVA_HOME/bin

After saving, reload the configuration file
source /etc/profile
Then run java or javac; if a long usage listing appears, the JDK is installed correctly (install it on the other two machines as well)

Set up SSH passwordless login
# Generate a public/private key pair
ssh-keygen -t rsa
# then press Enter four times

After it succeeds, copy the public key to yourself and to the other two virtual machines.
ssh-copy-id  master1

ssh-copy-id  server01

ssh-copy-id  server02

After sending, use ssh to test it.
ssh master1 

ssh server01

ssh server02

Install Hadoop (the key part)

As usual, first drag the Hadoop installation package into softwares, then unpack it into models.

tar -zxvf ./hadoop-2.7.1.tar.gz -C ../models/

Modify the environment variables
sudo vi /etc/profile

Before the change:
export JAVA_HOME=/home/xiaoyu/models/jdk1.8.0_191
export PATH=$PATH:$JAVA_HOME/bin

After the change (HADOOP_HOME is added, and Hadoop's bin and sbin directories are appended to PATH):
export JAVA_HOME=/home/xiaoyu/models/jdk1.8.0_191
export HADOOP_HOME=/home/xiaoyu/models/hadoop-2.7.1
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
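A quick sanity check of how the export line composes PATH (a toy snippet using this guide's example directories; it runs anywhere and does not need Hadoop installed):

```shell
# Demonstrate how the appended entries land at the end of PATH (example paths).
JAVA_HOME=/home/xiaoyu/models/jdk1.8.0_191
HADOOP_HOME=/home/xiaoyu/models/hadoop-2.7.1
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# The last three PATH entries should be the JDK bin, Hadoop bin and Hadoop sbin:
echo "$PATH" | tr ':' '\n' | tail -3
```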

Modify Hadoop's configuration files (beginners may find it easier to use the visual file browser on the left of the remote tool)

All of Hadoop's configuration files are in models/hadoop-2.7.1/etc/hadoop

Modify hadoop-env.sh

Double-click the hadoop-env.sh file and change the JDK path inside it (the export JAVA_HOME line).
Here is a quick way to check your own JDK path:

echo $JAVA_HOME

Modify core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://master1:8020</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
</property>

Modify hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.block.size</name>
<value>134217728</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/xiaoyu/hadoopdata/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/xiaoyu/hadoopdata/dfs/data</value>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>/home/xiaoyu/hadoopdata/checkpoint/dfs/cname</value>
</property>
<property>
<name>fs.checkpoint.edits.dir</name>
<value>/home/xiaoyu/hadoopdata/checkpoint/dfs/cname</value>
</property>
<property>
<name>dfs.http.address</name>
<value>master1:50070</value>
</property>
<property>
<name>dfs.secondary.http.address</name>
<value>server01:50090</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>

Modify the mapred-site.xml file

The configuration directory has no mapred-site.xml, only a mapred-site.xml.template; rename it:
mv mapred-site.xml.template mapred-site.xml

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<final>true</final>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master1:19888</value>
</property>

Modify yarn-site.xml
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master1</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master1:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master1:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master1:8088</value>
</property>

Modify slaves file
master1
server01
server02

Create the namenode metadata storage directories (the other two nodes must create them too)

The namenode metadata storage directories are configured in hdfs-site.xml, so we need to create the hadoopdata folder.
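The commands themselves are not spelled out above; a minimal sketch, assuming the paths configured in hdfs-site.xml (adjust the home directory to your own user, and run it on all three nodes):

```shell
# Create the data directories referenced in hdfs-site.xml.
mkdir -p ~/hadoopdata/dfs/name \
         ~/hadoopdata/dfs/data \
         ~/hadoopdata/checkpoint/dfs/cname
ls -R ~/hadoopdata    # verify the tree was created
```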
Copy the entire local hadoop directory to the other two virtual machines. First enter the models folder where hadoop is stored, then run the following commands:

# scp -r <file-to-send> user@hostname:<destination-path>
scp -r ./hadoop-2.7.1/  xiaoyu@server01:/home/xiaoyu/models/
scp -r ./hadoop-2.7.1/  xiaoyu@server02:/home/xiaoyu/models/

Format HDFS
hadoop namenode -format

If the output says "successfully formatted", the format worked.

Start Hadoop
start-all.sh
(You can run jps on each node to check that the daemons are up.)

Access the Hadoop web UI
http://192.168.29.20:50070

When the overview page loads, you have succeeded.
Then click Live Nodes on the page; our three nodes are displayed there.

Test whether Hadoop's HDFS works

Create a 1.txt file (for example: echo hello > 1.txt)

# Upload the 1.txt file to HDFS
hdfs dfs -put ./1.txt /

If the uploaded 1.txt shows up (for example in the web UI's file browser), the test succeeded.

Finally, if you have any questions, you can contact me
QQ: 1031248402

Origin blog.csdn.net/weixin_57283233/article/details/121398662