Huawei Cloud Yunyao Cloud Server L instance evaluation | Deploying Hadoop in a Docker environment

Preface

This blog introduces how to deploy Hadoop in a Docker container on a Yunyao Cloud Server L instance. Hadoop is an open-source distributed computing framework for processing large-scale data sets. By using Docker, we can deploy Hadoop in any environment without worrying about dependency and configuration issues. This post explains in detail how to install and configure Hadoop in Docker. Whether you are a beginner or an experienced developer, it provides a step-by-step guide to deploying Hadoop in Docker.

This is the third article in Maynor's Huawei Cloud Yunyao Cloud Server L instance evaluation series. Links to the previous articles in the series:

Huawei Cloud Yunyao Cloud Server L instance evaluation | Deploying a ClickHouse 21.1.9.41 database in a single-node environment

Huawei Cloud Yunyao Cloud Server L instance evaluation | Deploying Hadoop 2.10.1 in a pseudo-distributed environment

Introduction to Yunyao Cloud Server L Instance

The Yunyao Cloud Server L instance (云耀云服务器L实例) is a new generation of lightweight application cloud server, built specifically for small and medium-sized enterprises and developers and ready to use out of the box. It provides a rich set of carefully selected application images that can be deployed with one click, greatly simplifying the process of building e-commerce websites, web applications, mini programs, learning environments, and various development and test setups in the cloud.

Introduction to Docker

Docker is an open-source containerization platform that helps developers package an application and its dependencies into a self-contained container for fast, reliable, and portable deployment. The core concept of Docker is the container: a lightweight, portable, self-contained unit of software that includes everything needed to run the application, such as code, runtime, system tools, and system libraries.

1. Configure the environment

Purchase Yunyao Cloud Server L instance

On the Yunyao Cloud Server L instance details page, click Purchase.


  • Check configuration and confirm purchase.


Check the status of Yunyao Cloud Server L instance

Check that the purchased Yunyao Cloud Server L instance is running normally.


Reset the password

Click the reset password option. Identity verification is required; after choosing mobile phone verification, the password can be reset.


Check the elastic public IP address

  • Copy the elastic public IP address and use it when connecting to the server remotely.


Connect to the server with FinalShell

In the FinalShell tool, fill in the server's elastic public IP address, account, and password, then connect to the remote server via SSH.


2. Install Docker

Configure the CentOS 7 Alibaba Cloud yum source

cd /etc/yum.repos.d/
mv CentOS-Base.repo CentOS-Base.repo.bak
wget -O CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo

Update the yum cache

yum clean all
yum makecache

Install the dependency packages required by Docker

yum install -y yum-utils device-mapper-persistent-data lvm2

Configure Alibaba Cloud Docker yum source

yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo

List the available Docker versions

yum list docker-ce --showduplicates

Install Docker version 18.03.0

yum install docker-ce-18.03.0.ce

Start the Docker service

systemctl enable docker
systemctl start docker
systemctl status docker
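To confirm the installation succeeded and the daemon is responding, a quick optional check:

# Both the client and server versions should be printed if the daemon is up
docker version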

3. Create the Hadoop container

Host environment preparation

Pull image

docker pull centos:7

Enter the directory where the installation package is stored

cd /mnt/docker_share
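Note: the original assumes /mnt/docker_share already exists on the host; if the cd fails, create the directory first:

mkdir -p /mnt/docker_share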

Upload the JDK and Hadoop packages

  • Prerequisite: install the file upload tool first (yum install -y lrzsz)

rz jdk*.tar.gz;rz hadoop*.tar.gz

Unzip the packages

  • Extract them into the /opt directory; the software in this directory will be mapped into the Docker container later.

    tar -xvzf jdk-8u141-linux-x64.tar.gz -C /opt
    tar -xvzf hadoop-2.7.0.tar.gz -C /opt
    

Create directories to store HDFS data (NameNode and DataNode)

mkdir -p /data/dfs/nn
mkdir -p /data/dfs/dn


Container environment preparation

Start the Hadoop container

  • Note: --privileged=true must be added, otherwise system services (systemd) cannot be used inside the container.
docker run \
--net docker-bd0 --ip 172.33.0.121 \
-p 50070:50070 -p 8088:8088 -p 19888:19888 \
-v /mnt/docker_share:/mnt/docker_share \
-v /etc/hosts:/etc/hosts \
-v /opt/hadoop-2.7.0:/opt/hadoop-2.7.0 \
-v /opt/jdk1.8.0_141:/opt/jdk1.8.0_141 \
-v /data/dfs:/data/dfs \
--privileged=true \
-d  -it --name hadoop centos:7 \
/usr/sbin/init

NOTE: Make sure SELinux is disabled on the host
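The run command above attaches the container to a user-defined bridge network named docker-bd0 and assigns it a fixed IP. Creating that network is not shown in the original; a minimal sketch to run on the host beforehand, assuming a subnet that contains 172.33.0.121:

# Create the bridge network referenced by --net/--ip above (subnet and gateway are assumptions)
docker network create --driver bridge --subnet 172.33.0.0/24 --gateway 172.33.0.1 docker-bd0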

Enter the hadoop container

docker exec -it hadoop bash


Install vim

  • To make it easier to edit configuration files later, install vim

    yum install -y vim

Install ssh

  • Because starting the Hadoop cluster requires passwordless login, SSH needs to be installed in the CentOS 7 container.
yum install -y openssl openssh-server
yum install -y openssh-clients
  • Modify the sshd configuration file

    vim /etc/ssh/sshd_config
    # Add at the end of the file
    PermitRootLogin yes
    RSAAuthentication yes
    PubkeyAuthentication yes
    
  • Start the ssh service

    systemctl start sshd.service
    # Enable the ssh service to start on boot
    systemctl enable sshd.service
    # Check the service status
    systemctl status sshd.service
    

Configure passwordless login

Generate the SSH key pair

ssh-keygen

Set the password

  • Set the root user's password to 123456

    passwd

Copy the public key

ssh-copy-id hadoop.bigdata.cn
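ssh-copy-id assumes that hadoop.bigdata.cn already resolves to this container. Since /etc/hosts is bind-mounted from the host in the docker run command above, one way to provide the mapping (a sketch using the fixed IP assigned earlier; run it on the host) is:

# On the host: map the planned hostname to the container's fixed IP
echo "172.33.0.121 hadoop.bigdata.cn" >> /etc/hosts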

Test passwordless login

ssh hadoop.bigdata.cn

Configure JDK

vim /etc/profile
# Configure the JDK environment variables
export JAVA_HOME=/opt/jdk1.8.0_141
export CLASSPATH=${JAVA_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
# Make the configuration take effect
source /etc/profile
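A quick optional check that the JDK is now on the PATH:

java -version   # should report java version "1.8.0_141"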


Configure Hadoop

  • The configuration files below are located in /opt/hadoop-2.7.0/etc/hadoop. Each XML snippet goes inside the <configuration> element of the corresponding file.

  • core-site.xml

      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop.bigdata.cn:9000</value>
      </property>
      <property>
        <name>hadoop.proxyuser.root.hosts</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.proxyuser.root.groups</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.http.staticuser.user</name>
        <value>root</value>
      </property>
    
  • hdfs-site.xml

      <property>
        <name>dfs.namenode.http-address</name>
        <value>hadoop.bigdata.cn:50070</value>
      </property>
      <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop.bigdata.cn:50090</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///data/dfs/nn</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///data/dfs/dn</value>
      </property>
      <property>
        <name>dfs.permissions</name>
        <value>false</value>
      </property>
    
  • yarn-site.xml

      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop.bigdata.cn</value>
      </property>
      <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
      </property>
      <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/user/container/logs</value>
      </property>
    
  • mapred-site.xml (if it does not exist yet, create it by copying mapred-site.xml.template)

      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
      <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop.bigdata.cn:10020</value>
      </property>
      <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop.bigdata.cn:19888</value>
      </property>
      <property>
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>/tmp/mr-history</value>
      </property>
      <property>
        <name>mapreduce.jobhistory.done-dir</name>
        <value>/tmp/mr-done</value>
      </property>
    
  • hadoop-env.sh

    export JAVA_HOME=/opt/jdk1.8.0_141
    
  • slaves

    hadoop.bigdata.cn
    
  • Configure environment variables

    vim /etc/profile
    # Add the following lines
    export HADOOP_HOME=/opt/hadoop-2.7.0
    export PATH=${HADOOP_HOME}/sbin:${HADOOP_HOME}/bin:$PATH
    # Make the configuration take effect
    source /etc/profile
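A quick optional check that the Hadoop binaries are now on the PATH:

hadoop version   # should report Hadoop 2.7.0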
    

4. Initialize and start Hadoop

Format HDFS

hdfs namenode -format
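If the format succeeds, the NameNode metadata directory configured in hdfs-site.xml should now contain a current/ subdirectory; an optional sanity check:

ls /data/dfs/nn/current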

Start Hadoop

start-all.sh
# Start the history server
mr-jobhistory-daemon.sh start historyserver


Test Hadoop

cd $HADOOP_HOME
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar pi 2 1
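Besides the pi example, a quick way to exercise HDFS itself (an optional extra, not part of the original steps) is to create a directory and upload a file:

hdfs dfs -mkdir -p /test
hdfs dfs -put /etc/profile /test
hdfs dfs -ls /test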


View the running processes

bash-4.1# jps
561 ResourceManager
659 NodeManager
2019 Jps
1559 NameNode
1752 SecondaryNameNode
249 DataNode


5. Configure the container to start Hadoop on boot

Create startup script

  • Create a new file to store the startup script

    touch /etc/bootstrap.sh
    chmod a+x /etc/bootstrap.sh
    vim /etc/bootstrap.sh
    
  • Script content

    #!/bin/bash
    source /etc/profile
    cd /opt/hadoop-2.7.0
    start-dfs.sh
    start-yarn.sh
    mr-jobhistory-daemon.sh start historyserver
    


Add the script to the boot-time startup

vim /etc/rc.d/rc.local
# Append the following line to the end of rc.local
/etc/bootstrap.sh
# Grant execute permission
chmod 755 /etc/rc.d/rc.local
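To verify that the boot script works, one option is to restart the container from the host and check that the Hadoop daemons come back up (a sketch; the jps path assumes the JDK location mounted earlier):

# On the host
docker restart hadoop
# Wait a moment for the services to start, then list the Java processes inside the container
docker exec hadoop /opt/jdk1.8.0_141/bin/jps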


Configure domain name mapping on the local machine

  • To make future access easier, configure the domain name mapping on the local Windows machine: edit the hosts file in the C:\Windows\System32\drivers\etc directory

  • Add the following mapping (all planned domain name mappings can be added here); replace 192.168.88.100 with the IP actually used to reach the server — for the cloud server in this article, its elastic public IP

    192.168.88.100 hadoop.bigdata.cn
    

6. View the web UIs

  • HDFS

    • http://192.168.88.100:50070


  • YARN

    • http://192.168.88.100:8088


  • Job History Server

    • http://192.168.88.100:19888

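If a page does not load in the browser, two things are worth checking: the cloud server's security group must allow inbound access to ports 50070, 8088, and 19888, and the services should respond locally on the server, for example:

curl -I http://127.0.0.1:50070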

Summary

This blog introduced the steps to deploy Hadoop with Docker on a Yunyao Cloud Server L instance. After purchasing the cloud server L instance and configuring the environment, we installed and configured Docker and created a Hadoop container. Inside the container we uploaded and unpacked the required packages and created directories for the HDFS data. Finally, we verified the Hadoop installation and configuration by accessing the corresponding web UIs. By using Docker, we avoid dependency and configuration issues and can deploy Hadoop in any environment with ease.

Origin blog.csdn.net/xianyu120/article/details/133017067