Table of Contents
- Preface
- 1. Configure the environment
- 2. Install Docker
- 3. Create the Hadoop container
- Host environment preparation
- Pull image
- Enter the directory where the installation package is stored
- Upload jdk and hadoop
- Unzip the packages
- Create directories to save data
- Container environment preparation
- Start the Hadoop container
- Enter the Hadoop container
- Install vim
- Install ssh
- Configure password-free login
- Generate a key pair
- Set password
- Copy the public key
- Test password-free login
- Configure JDK
- Configure Hadoop
- 4. Initialize and start Hadoop
- 5. Configure the container to start Hadoop on boot
- 6. View the web UIs
- Summary
Preface
This blog introduces how to deploy Hadoop in a Docker container on a Yunyao Cloud Server L instance (云耀云服务器L实例). Hadoop is an open-source distributed computing framework for processing large-scale data sets. By using Docker, we can easily deploy Hadoop in any environment without worrying about dependency and configuration issues. This blog explains in detail how to install and configure Hadoop in Docker; whether you are a beginner or an experienced developer, it provides a detailed guide to getting started with deploying Hadoop in Docker.
This is the third article in Maynor's evaluation series on the Huawei Cloud Yunyao Cloud Server L instance.
Introduction to Yunyao Cloud Server L Instance
The Yunyao Cloud Server L instance is a new generation of lightweight application cloud server, built for small and medium-sized enterprises and developers, and ready to use out of the box. It provides a rich set of carefully selected application images that can be deployed with one click, greatly simplifying the process of building e-commerce websites, web applications, mini programs, learning environments, and various development and testing setups in the cloud.
Introduction to Docker
Docker is an open-source containerization platform that helps developers package applications and their dependencies into a self-contained unit for fast, reliable, and portable deployment. The core concept of Docker is the container: a lightweight, portable, self-contained software unit that includes everything needed to run an application, such as code, runtime, system tools, and system libraries.
1. Configure the environment
Purchase a Yunyao Cloud Server L instance
On the Yunyao Cloud Server L instance details page, click Purchase.
- Check the configuration and confirm the purchase.
Check the status of the Yunyao Cloud Server L instance
Confirm that the purchased instance is running normally.
Reset password
Click the reset password option. Identity verification is required; after choosing mobile phone verification, the password can be reset successfully.
Check the elastic public IP address
- Copy the elastic public IP address; it is used when connecting to the server remotely.
Connect to the server with FinalShell
In the FinalShell tool, fill in the server's elastic public IP address, account, and password, then connect to the remote server via SSH.
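If you prefer a plain terminal to FinalShell, any standard SSH client works; the address below is a placeholder for your instance's elastic public IP.
# connect as root; you will be prompted for the password set above
ssh root@<elastic-public-IP>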
2. Install Docker
Configure the Alibaba Cloud yum source for CentOS 7
cd /etc/yum.repos.d/
mv CentOS-Base.repo CentOS-Base.repo.bak
wget -O CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
Update the yum cache
yum clean all
yum makecache
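To confirm the Aliyun mirror is now in use, you can list the enabled repositories (an optional sanity check):
yum repolist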
Install the dependency packages required by Docker
yum install -y yum-utils device-mapper-persistent-data lvm2
Configure Alibaba Cloud Docker yum source
yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
List the available Docker versions
yum list docker-ce --showduplicates
Install Docker version 18.03.0
yum install docker-ce-18.03.0.ce
Start the Docker service
systemctl enable docker
systemctl start docker
systemctl status docker
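To verify that Docker is working before moving on, an optional check:
docker version                # client and server versions should both be shown
docker run --rm hello-world   # pulls a tiny test image and prints a confirmation message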
3. Create the Hadoop container
Host environment preparation
Pull image
docker pull centos:7
Enter the directory where the installation package is stored
cd /mnt/docker_share
Upload jdk and hadoop
- Prerequisite: install the upload tool lrzsz (yum install -y lrzsz)
rz   # select jdk-8u141-linux-x64.tar.gz in the upload dialog
rz   # select hadoop-2.7.0.tar.gz in the upload dialog
Unzip the packages
- Unzip to the /opt directory; the packages in this directory will later be mounted into the Docker container.
tar -xvzf jdk-8u141-linux-x64.tar.gz -C /opt
tar -xvzf hadoop-2.7.0.tar.gz -C /opt
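A quick check that both archives were unpacked where expected (optional):
ls /opt   # should list hadoop-2.7.0 and jdk1.8.0_141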
Create directories to save data
mkdir -p /data/dfs/nn
mkdir -p /data/dfs/dn
Container environment preparation
Start the Hadoop container
- Note: you must add --privileged=true, otherwise system services cannot be used inside the container.
docker run \
--net docker-bd0 --ip 172.33.0.121 \
-p 50070:50070 -p 8088:8088 -p 19888:19888 \
-v /mnt/docker_share:/mnt/docker_share \
-v /etc/hosts:/etc/hosts \
-v /opt/hadoop-2.7.0:/opt/hadoop-2.7.0 \
-v /opt/jdk1.8.0_141:/opt/jdk1.8.0_141 \
-v /data/dfs:/data/dfs \
--privileged=true \
-d -it --name hadoop centos:7 \
/usr/sbin/init
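The run command above attaches the container to a user-defined bridge network named docker-bd0 with a fixed IP. If that network does not exist yet, create it first; the subnet below is an assumption, and any subnet containing 172.33.0.121 works.
# create the bridge network referenced above (subnet is an assumption)
docker network create --driver bridge --subnet 172.33.0.0/24 docker-bd0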
NOTE: Make sure SELinux is disabled on the host
Enter the Hadoop container
docker exec -it hadoop bash
Install vim
- To make editing the configuration files easier, install vim.
yum install -y vim
Install ssh
- Starting a Hadoop cluster requires password-free login, so install SSH in the CentOS 7 container.
yum install -y openssl openssh-server
yum install -y openssh-client*
- Modify the ssh configuration file
vim /etc/ssh/sshd_config
# add the following at the end of the file
PermitRootLogin yes
RSAAuthentication yes
PubkeyAuthentication yes
- Start the ssh service
systemctl start sshd.service
# enable the ssh service at boot
systemctl enable sshd.service
# check the service status
systemctl status sshd.service
Configure password-free login
Generate a key pair
ssh-keygen
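Pressing Enter through all the prompts accepts the defaults; if you prefer a non-interactive form, the following one-liner is equivalent (RSA key with an empty passphrase):
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa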
Set password
- Set the root user's password to 123456.
passwd
Copy the public key
ssh-copy-id hadoop.bigdata.cn
Test password-free login
ssh hadoop.bigdata.cn
Configure JDK
vim /etc/profile
# configure the JDK environment variables
export JAVA_HOME=/opt/jdk1.8.0_141
export CLASSPATH=${JAVA_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
# make the configuration take effect
source /etc/profile
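To confirm the JDK is active in the current shell (optional):
java -version   # should report version 1.8.0_141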
Configure Hadoop
- core-site.xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop.bigdata.cn:9000</value>
</property>
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.http.staticuser.user</name>
  <value>root</value>
</property>
- hdfs-site.xml
<property>
  <name>dfs.namenode.http-address</name>
  <value>hadoop.bigdata.cn:50070</value>
</property>
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>hadoop.bigdata.cn:50090</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///data/dfs/nn</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///data/dfs/dn</value>
</property>
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
- yarn-site.xml
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>hadoop.bigdata.cn</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/user/container/logs</value>
</property>
- mapred-site.xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>hadoop.bigdata.cn:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>hadoop.bigdata.cn:19888</value>
</property>
<property>
  <name>mapreduce.jobhistory.intermediate-done-dir</name>
  <value>/tmp/mr-history</value>
</property>
<property>
  <name>mapreduce.jobhistory.done-dir</name>
  <value>/tmp/mr-done</value>
</property>
- hadoop-env.sh
export JAVA_HOME=/opt/jdk1.8.0_141
- slaves
hadoop.bigdata.cn
- Configure environment variables
vim /etc/profile
export HADOOP_HOME=/opt/hadoop-2.7.0
export PATH=${HADOOP_HOME}/sbin:${HADOOP_HOME}/bin:$PATH
# make the configuration take effect
source /etc/profile
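To confirm the Hadoop commands are on the PATH (optional):
hadoop version   # should report Hadoop 2.7.0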
4. Initialize and start Hadoop
Format HDFS
hdfs namenode -format
Start Hadoop
start-all.sh
# start the history server
mr-jobhistory-daemon.sh start historyserver
Test Hadoop
cd $HADOOP_HOME
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar pi 2 1
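Once the pi job completes, it should also appear in YARN's application list (an optional check):
yarn application -list -appStates FINISHED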
View the running processes
bash-4.1# jps
561 ResourceManager
659 NodeManager
2019 Jps
1559 NameNode
1752 SecondaryNameNode
249 DataNode
5. Configure the container to start Hadoop on boot
Create a startup script
- Create a new file to store the startup script
touch /etc/bootstrap.sh
chmod a+x /etc/bootstrap.sh
vim /etc/bootstrap.sh
- File content
#!/bin/bash
source /etc/profile
cd /opt/hadoop-2.7.0
start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver
Add the script to the startup services
vim /etc/rc.d/rc.local
# append the following line
/etc/bootstrap.sh
# grant execute permission
chmod 755 /etc/rc.d/rc.local
Configure domain name mapping on the host
- For convenient access later, configure the domain name mapping on Windows by editing the hosts file in C:\Windows\System32\drivers\etc.
- Add the following mapping (you can add all planned domain name mappings here):
192.168.88.100 hadoop.bigdata.cn
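To verify the mapping, a quick check from a Windows command prompt:
ping hadoop.bigdata.cn   # should resolve to 192.168.88.100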
6. View the web UIs
- HDFS: http://192.168.88.100:50070
- YARN: http://192.168.88.100:8088
- Job History Server: http://192.168.88.100:19888
Summary
This blog introduced the steps to deploy Hadoop with Docker on a Yunyao Cloud Server L instance. After purchasing the instance and configuring the environment, we can easily install Docker and create the Hadoop container. Within the container, we uploaded and unpacked the required packages and created the directories that hold the data. Finally, we verified the Hadoop installation and configuration by accessing the corresponding web UIs. By using Docker, we avoid dependency and configuration issues and can easily deploy Hadoop in any environment.