Hadoop HA Mode - Deployment Steps
1. On the host machine (192.168.15.47), create the directory /data/zkdocker/bigdata/download/ha
2. Create the Hadoop HA configuration files: core-site.xml, hdfs-site.xml, hosts_ansible.ini, mapred-site.xml, yarn-site.xml
3. On the host, download the software into /data/zkdocker/bigdata/download:
   hadoop-2.7.7.tar.gz, jdk-8u201-linux-x64.tar.gz, zookeeper-3.4.14.tar.gz
4. On the host, create 10 CentOS 7 containers with Docker:
   docker run -it --name=hadoop01 --hostname=hadoop01 -v /data/zkdocker/bigdata:/tmp centos7
   docker run -it --name=hadoop02 --hostname=hadoop02 -v /data/zkdocker/bigdata:/tmp centos7
   docker run -it --name=hadoop03 --hostname=hadoop03 -v /data/zkdocker/bigdata:/tmp centos7
   docker run -it --name=hadoop04 --hostname=hadoop04 -v /data/zkdocker/bigdata:/tmp centos7
   docker run -it --name=hadoop05 --hostname=hadoop05 -v /data/zkdocker/bigdata:/tmp centos7
   docker run -it --name=hadoop06 --hostname=hadoop06 -v /data/zkdocker/bigdata:/tmp centos7
   docker run -it --name=hadoop07 --hostname=hadoop07 -v /data/zkdocker/bigdata:/tmp centos7
   docker run -it --name=slave1 --hostname=slave1 -v /data/zkdocker/bigdata:/tmp centos7
   docker run -it --name=slave2 --hostname=slave2 -v /data/zkdocker/bigdata:/tmp centos7
   docker run -it --name=slave3 --hostname=slave3 -v /data/zkdocker/bigdata:/tmp centos7
5. Enter a container from the host:
   docker exec -it hadoop01 /bin/bash
6. In every container, install and start SSH:
   yum install -y which
   yum install -y openssl openssh-server openssh-clients
   mkdir /var/run/sshd/
   sed -i "s/UsePAM.*/UsePAM no/g" /etc/ssh/sshd_config
   ssh-keygen -t rsa -f /etc/ssh/ssh_host_rsa_key
   ssh-keygen -t ecdsa -f /etc/ssh/ssh_host_ecdsa_key
   ssh-keygen -t ed25519 -f /etc/ssh/ssh_host_ed25519_key
   /usr/sbin/sshd -D &
7. In every container, create a hadoop user:
   useradd hadoop
   passwd hadoop   (password: hadoop)
8. Configure /etc/hosts in all 10 containers:
   vi /etc/hosts
9. Switch to the hadoop account:
   su - hadoop
10. Set up passwordless SSH login:
    In every container:
        ssh-keygen -t rsa
        ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop01
    On hadoop01 and hadoop02:
        ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop01
        ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop02
    ...and so on for the remaining hosts.
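Step 8 leaves the /etc/hosts content implicit and step 10 trails off with "and so on", so the fan-out can be sketched as below. The 172.17.0.x addresses are hypothetical (Docker's default bridge usually assigns them in start order, but verify the real IPs with `docker inspect <name>`):

```shell
#!/bin/sh
# All 10 containers from step 4.
HOSTS="hadoop01 hadoop02 hadoop03 hadoop04 hadoop05 hadoop06 hadoop07 slave1 slave2 slave3"

# Generate the /etc/hosts lines once, then paste them into every container (step 8).
# NOTE: the 172.17.0.x plan is an assumption -- check `docker inspect` for the real IPs.
i=2
for h in $HOSTS; do
    echo "172.17.0.$i $h"
    i=$((i + 1))
done > hosts.fragment
cat hosts.fragment

# Step 10 fan-out, run inside each container as the hadoop user
# (commented out here because ssh-copy-id prompts for the password interactively):
# for h in $HOSTS; do ssh-copy-id -i ~/.ssh/id_rsa.pub "hadoop@$h"; done
```

This is only a sketch; the playbook attempt in step 25 tries to achieve the same fan-out via Ansible.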
11. On the host, use Ansible to copy the XML config files:
    yum install -y ansible
    vi /etc/ansible/hosts
    ansible all -m ping
    su - hadoop
    ansible all -m copy -a "src=/data/zkdocker/bigdata/download/ha/core-site.xml dest=/home/hadoop/hadoop/etc/hadoop/ owner=hadoop mode=644"
    ansible all -m copy -a "src=/data/zkdocker/bigdata/download/ha/hdfs-site.xml dest=/home/hadoop/hadoop/etc/hadoop/ owner=hadoop mode=644"
    ansible all -m copy -a "src=/data/zkdocker/bigdata/download/ha/mapred-site.xml dest=/home/hadoop/hadoop/etc/hadoop/ owner=hadoop mode=644"
    ansible all -m copy -a "src=/data/zkdocker/bigdata/download/ha/yarn-site.xml dest=/home/hadoop/hadoop/etc/hadoop/ owner=hadoop mode=644"
12. In every container, as the hadoop user, install Java and Hadoop; on slave1-slave3 also install ZooKeeper:
    mkdir /home/hadoop/javalib18
    cd /home/hadoop/javalib18
    tar -zxf /tmp/download/jdk-8u201-linux-x64.tar.gz
    mv jdk1.8.0_201/ jdk
    cd ..
    tar -zxf /tmp/download/hadoop-2.7.7.tar.gz
    mv hadoop-2.7.7/ hadoop
    tar -zxf /tmp/download/zookeeper-3.4.14.tar.gz
    mv zookeeper-3.4.14/ zookeeper
13. Configure ~/.bashrc in every container:
    vi ~/.bashrc
    export JAVA_HOME=/home/hadoop/javalib18/jdk
    export ZOOKEEPER_HOME=/home/hadoop/zookeeper
    export HADOOP_HOME=/home/hadoop/hadoop
    export PATH=$PATH:$ZOOKEEPER_HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
14. Reload the shell environment:
    source ~/.bashrc
15. Start ZooKeeper on slave1, slave2 and slave3:
    zkServer.sh start
    zkServer.sh status
    Success check: two followers and one leader.
16. Start the JournalNodes on slave1, slave2 and slave3:
    hadoop-daemon.sh start journalnode
    ps ajx | grep java | awk '{print $11}' | cut -d _ -f 2
    Success check: three journalnode processes.
17. Format the first NameNode in the hadoop01 container:
    hdfs namenode -format
    hadoop-daemon.sh start namenode
    Success check: one namenode process.
18. Sync the first NameNode's metadata in the hadoop02 container:
    hdfs namenode -bootstrapStandby
    hadoop-daemon.sh start namenode
    Success check: two namenode processes.
19. Check the NameNodes in the web UI:
    http://hadoop01:50070 --> still in standby state
    http://hadoop02:50070 --> still in standby state
20. Manually switch nn1 to active in the hadoop01 container:
    hdfs haadmin -transitionToActive nn1               --> fails; a forced switch is required
    hdfs haadmin -transitionToActive --forcemanual nn1
    Success check:
    http://hadoop01:50070 --> active
    http://hadoop02:50070 --> still standby
    hdfs haadmin -getServiceState nn1 --> active
    hdfs haadmin -getServiceState nn2 --> standby
21. Create the automatic-failover znode in ZooKeeper:
    hdfs zkfc -formatZK
    Run zkCli.sh on slave1 to verify:
    ls /
    [zookeeper, hadoop-ha]
22. Start the cluster from the hadoop01 container:
    start-dfs.sh
    Current processes in each container:
    hadoop01  namenode  zkfc
    hadoop02  namenode  zkfc
    hadoop03
    hadoop04
    hadoop05  datanode
    hadoop06  datanode
    hadoop07  datanode
    slave1    journalnode  zookeeper
    slave2    journalnode  zookeeper
    slave3    journalnode  zookeeper
23. Verify high availability by killing nn1:
    In hadoop01: kill -9 <namenode PID>
    hdfs haadmin -getServiceState nn1 --> fails
    hdfs haadmin -getServiceState nn2 --> active
24. Bring nn1 back, syncing from nn2 first:
    hdfs namenode -bootstrapStandby
    hadoop-daemon.sh start namenode
25. Use an Ansible playbook to distribute SSH keys in bulk (did not work as intended; no effect):
    playbook pushssh.yaml:
    ---
    - hosts: all
      user: hadoop
      tasks:
        - name: ssh-copy
          authorized_key: user=hadoop key="{{ lookup('file', '/home/hadoop/.ssh/id_rsa.pub') }}"
    Run: ansible-playbook pushssh.yaml
26. Start YARN HA:
    In the hadoop03 container: start-yarn.sh
    In the hadoop04 container: start-yarn.sh
27. Check the ResourceManager states:
    bin/yarn rmadmin -getServiceState rm1 --> active
    bin/yarn rmadmin -getServiceState rm2 --> standby
    Kill rm1 in the hadoop03 container, then:
    bin/yarn rmadmin -getServiceState rm1 --> offline
    bin/yarn rmadmin -getServiceState rm2 --> active
    Restart rm1 in the hadoop03 container:
    sbin/yarn-daemon.sh start resourcemanager
    bin/yarn rmadmin -getServiceState rm1 --> standby
    bin/yarn rmadmin -getServiceState rm2 --> active
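As a quick sanity check, the process layout from steps 22, 26 and 27 can be encoded in a small lookup and compared against `jps` on each node. The daemon names below (NameNode, DFSZKFailoverController, QuorumPeerMain, ...) are the class names these daemons normally report in `jps`; treat the mapping as a sketch of this cluster's intended layout, not a general rule:

```shell
#!/bin/sh
# Expected daemons per host, per the layout in steps 22 and 26.
# start-yarn.sh also launches NodeManagers on the hosts in the slaves file;
# they are omitted here since the notes above do not list them.
expected_daemons() {
    case "$1" in
        hadoop01|hadoop02)          echo "NameNode DFSZKFailoverController" ;;
        hadoop03|hadoop04)          echo "ResourceManager" ;;
        hadoop05|hadoop06|hadoop07) echo "DataNode" ;;
        slave1|slave2|slave3)       echo "JournalNode QuorumPeerMain" ;;
        *)                          echo "" ;;
    esac
}

# On a live node, compare against the running JVMs:
# for d in $(expected_daemons "$(hostname)"); do
#     jps | grep -qw "$d" || echo "missing daemon: $d"
# done
expected_daemons slave1    # -> JournalNode QuorumPeerMain
```

Running this loop on every container after step 26 would have caught, for example, a JournalNode that failed to come back after a restart.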