Hadoop 2.0 online upgrade without stopping the Hadoop cluster

Introduction

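HDFS rolling upgrade allows individual HDFS daemons to be upgraded independently. For example, the DataNodes can be upgraded without affecting the NameNodes, and vice versa.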

Upgrade

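In Hadoop 2.0, HDFS supports highly available (HA) NameNode services and wire compatibility between versions. Together these two capabilities make it possible to upgrade an HDFS cluster without shutting down the HDFS service. Only a cluster configured with HA can be upgraded in a rolling fashion without downtime.
If the new version introduces a feature that cannot be used with the old version, follow these steps: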

1. Disable the new feature
2. Upgrade the cluster
3. Enable the new feature

Note: rolling upgrade is supported only from Hadoop 2.4.0 onwards.

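An HA cluster has two or more NameNodes (NNs) and multiple DataNodes (DNs), JournalNodes (JNs) and ZooKeeperNodes (ZKNs). Because the JNs are relatively stable and in most cases do not need to be upgraded when HDFS is upgraded, the rolling upgrade covers only the NNs and DNs.

Upgrading a non-federated cluster

Suppose there are two NameNodes, NN1 and NN2, in the active and standby state respectively. The upgrade steps are as follows: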

Prepare the rolling upgrade

Run
hdfs dfsadmin -rollingUpgrade prepare
to create an fsimage for rollback.
The corresponding NameNode code is:

        RollingUpgradeInfo startRollingUpgrade() throws IOException {
            checkSuperuserPrivilege(); // check superuser privilege
            checkOperation(OperationCategory.WRITE);
            writeLock();
            try {
              checkOperation(OperationCategory.WRITE);
              long startTime = now();
              if (!haEnabled) { // for non-HA, we require NN to be in safemode
                startRollingUpgradeInternalForNonHA(startTime);
              } else { // for HA, NN cannot be in safemode
                checkNameNodeSafeMode("Failed to start rolling upgrade");
                startRollingUpgradeInternal(startTime);
              }
              getEditLog().logStartRollingUpgrade(rollingUpgradeInfo.getStartTime());
              if (haEnabled) {
                // roll the edit log to make sure the standby NameNode can tail
                getFSImage().rollEditLog();
              }
            } finally {
              writeUnlock();
            }

            getEditLog().logSync(); // sync the edit log to the JNs and flush the edits buffered in memory
            if (auditLog.isInfoEnabled() && isExternalInvocation()) {
              logAuditEvent(true, "startRollingUpgrade", null, null, null);
            }
            return rollingUpgradeInfo;
        }

Run

hdfs dfsadmin -rollingUpgrade query

to check the status of the rollback image. Repeat the command until "Proceed with rolling upgrade" appears, which means the rollback image is ready.

  RollingUpgradeInfo queryRollingUpgrade() throws IOException {
    checkSuperuserPrivilege();
    checkOperation(OperationCategory.READ);
    readLock();
    try {
      if (rollingUpgradeInfo != null) {
        boolean hasRollbackImage = this.getFSImage().hasRollbackFSImage(); // true if a rollback image exists
        rollingUpgradeInfo.setCreatedRollbackImages(hasRollbackImage);
      }
      return rollingUpgradeInfo;
    } finally {
      readUnlock();
    }
  }
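
The query-and-wait step above could be scripted roughly as follows. This is a minimal sketch and not part of the original procedure: the function name, the retry count and the sleep interval are illustrative assumptions, and it relies only on the "Proceed with rolling upgrade" message quoted above appearing in the command output.

```shell
# Poll `hdfs dfsadmin -rollingUpgrade query` until the rollback image is ready.
# Arguments (both optional, names are illustrative):
#   $1 = maximum number of attempts (default 60)
#   $2 = seconds to sleep between attempts (default 10)
wait_for_rollback_image() {
  max_attempts="${1:-60}"
  interval="${2:-10}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Look for the message that signals the rollback image has been created.
    if hdfs dfsadmin -rollingUpgrade query 2>&1 | grep -q "Proceed with rolling upgrade"; then
      return 0
    fi
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  echo "rollback image not ready after $max_attempts attempts" >&2
  return 1
}
```

For example, wait_for_rollback_image 30 10 would poll every 10 seconds and give up after 30 attempts.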

Upgrade the active and standby NNs

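Shut down the NameNode service on NN2 and upgrade NN2 (for a tarball installation, upgrading means switching directories: repoint the hadoop symlink to the new version's directory).
Start NN2 as standby with
hdfs namenode -rollingUpgrade started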

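Note: judging from the code (CDH 5.3.3), hdfs namenode -rollingUpgrade started behaves the same as a plain hdfs namenode except for the log handling. It is also advisable to start it in the background with nohup ... &.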

Fail over so that NN2 becomes active and NN1 becomes standby: hdfs haadmin -failover nn1 nn2
Stop the NameNode service on NN1: hadoop-daemon.sh stop namenode
Upgrade the NameNode host (switch to the new Hadoop tarball).
Start NN1 as standby with the hdfs namenode -rollingUpgrade started option.
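
The NameNode sequence above can be summarised as a dry-run plan. The sketch below only prints the steps and executes nothing. The function name and the [host] prefixes are illustrative, and using hadoop-daemon.sh stop namenode for NN2 as well is an assumption carried over from the NN1 step.

```shell
# Print the NameNode upgrade sequence for an HA pair; nothing is executed.
#   $1 = NameNode ID that is currently active  (e.g. nn1)
#   $2 = NameNode ID that is currently standby (e.g. nn2)
print_nn_upgrade_plan() {
  active="$1"
  standby="$2"
  # The standby is upgraded first, then the roles are swapped and the
  # former active NameNode is upgraded in the same way.
  cat <<EOF
[$standby] hadoop-daemon.sh stop namenode
[$standby] (switch the hadoop symlink to the new release)
[$standby] nohup hdfs namenode -rollingUpgrade started &
[any] hdfs haadmin -failover $active $standby
[$active] hadoop-daemon.sh stop namenode
[$active] (switch the hadoop symlink to the new release)
[$active] nohup hdfs namenode -rollingUpgrade started &
EOF
}
```

Calling print_nn_upgrade_plan nn1 nn2 lists the seven steps in order, which can be reviewed before any of them is run by hand.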

Upgrade the DataNodes

Choose a subset of DataNodes (for example, those in the same rack).

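Run hdfs dfsadmin -shutdownDatanode <DATANODE_HOST:IPC_PORT> upgrade to shut down one of the chosen DataNodes.
Run hdfs dfsadmin -getDatanodeInfo <DATANODE_HOST:IPC_PORT> to check the status and wait until the DataNode has shut down.
Upgrade and restart the DataNode.
Repeat the three steps above on every DataNode until all of them have been upgraded.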
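
The per-DataNode loop could look roughly like the sketch below. It assumes that hdfs dfsadmin -getDatanodeInfo exits with a non-zero status once the DataNode is no longer reachable. upgrade_and_restart_datanode is a hypothetical, site-specific hook that must be supplied, and DN_POLL_SECONDS is an illustrative variable name.

```shell
# Shut down one DataNode for upgrade and wait until it stops answering.
#   $1 = DATANODE_HOST:IPC_PORT
shutdown_datanode_for_upgrade() {
  target="$1"
  hdfs dfsadmin -shutdownDatanode "$target" upgrade || return 1
  # Assumption: -getDatanodeInfo fails once the DataNode process has exited.
  while hdfs dfsadmin -getDatanodeInfo "$target" >/dev/null 2>&1; do
    sleep "${DN_POLL_SECONDS:-5}"
  done
}

# Process a batch of DataNodes (for example one rack) one after another.
# upgrade_and_restart_datanode is a hypothetical hook defined elsewhere.
upgrade_datanode_batch() {
  for dn in "$@"; do
    shutdown_datanode_for_upgrade "$dn" || return 1
    upgrade_and_restart_datanode "$dn" || return 1
  done
}
```

Processing the nodes sequentially keeps at most one DataNode of the batch down at a time; whether a whole rack can be taken down in parallel depends on the replication settings of the cluster.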

Finalize the rolling upgrade

On the master (NameNode) host, run
hdfs dfsadmin -rollingUpgrade finalize

Notes

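It is best to keep the JNs, NNs and DNs separate, so that they do not share machines.
Back up all edit logs and fsimages under namenode.dir in case something goes wrong.
For rollback or downgrade, refer to the following procedure: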

https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html#namenode_-rollingUpgrade
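
For the backup note above, a minimal sketch of archiving the NameNode metadata directory before the upgrade is shown here. It assumes that namenode.dir refers to the local directory configured as dfs.namenode.name.dir. The function name and the archive naming scheme are illustrative.

```shell
# Archive a NameNode metadata directory into a timestamped tarball.
#   $1 = metadata directory (assumed to be the dfs.namenode.name.dir path)
#   $2 = destination directory for the archive
# Prints the path of the archive on success.
backup_name_dir() {
  src="$1"
  dest="$2"
  archive="$dest/namenode-meta-$(date +%Y%m%d%H%M%S).tar.gz"
  mkdir -p "$dest" || return 1
  # Archive relative to the parent so the tarball holds only the last path component.
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  echo "$archive"
}
```

The configured directory can be looked up with hdfs getconf -confKey dfs.namenode.name.dir. Note that the value may be a comma-separated list or use file:// URIs, which this sketch does not parse.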

Downgrade without Downtime
In a HA cluster, when a rolling upgrade from an old software release to a new software release is in progress, it is possible to downgrade, in a rolling fashion, the upgraded machines back to the old software release. Same as before, suppose NN1 and NN2 are respectively in active and standby states. Below are the steps for rolling downgrade:

Downgrade DNs

Choose a small subset of datanodes (e.g. all datanodes under a particular rack).
Run “hdfs dfsadmin -shutdownDatanode <DATANODE_HOST:IPC_PORT> upgrade” to shut down one of the chosen datanodes.
Run “hdfs dfsadmin -getDatanodeInfo <DATANODE_HOST:IPC_PORT>” to check and wait for the datanode to shut down.
Downgrade and restart the datanode.
Perform the above steps for all the chosen datanodes in the subset in parallel.
Repeat the above steps until all upgraded datanodes in the cluster are downgraded.

Downgrade Active and Standby NNs

Shut down and downgrade NN2.
Start NN2 as standby normally. (Note that it is incorrect to use the “-rollingUpgrade downgrade” option here.)
Fail over from NN1 to NN2 so that NN2 becomes active and NN1 becomes standby.
Shut down and downgrade NN1.
Start NN1 as standby normally. (Note that it is incorrect to use the “-rollingUpgrade downgrade” option here.)

Finalize Rolling Downgrade

Run “hdfs dfsadmin -rollingUpgrade finalize” to finalize the rolling downgrade.

Note that the datanodes must be downgraded before downgrading the namenodes since protocols may be changed in a backward compatible manner but not forward compatible, i.e. old datanodes can talk to the new namenodes but not vice versa.
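
The ordering constraint above (DataNodes first, then NameNodes, then finalize) can be made explicit with a dry-run plan. This sketch only prints the steps. The function name is illustrative, and the failover line assumes the same hdfs haadmin -failover command that was used during the upgrade.

```shell
# Print the rolling-downgrade order; nothing is executed.
#   $1 = active NameNode ID, $2 = standby NameNode ID
#   remaining arguments = DataNode HOST:IPC_PORT values
print_rolling_downgrade_plan() {
  active="$1"
  standby="$2"
  shift 2
  # DataNodes go first: old DataNodes can talk to new NameNodes, not vice versa.
  for dn in "$@"; do
    echo "hdfs dfsadmin -shutdownDatanode $dn upgrade"
    echo "hdfs dfsadmin -getDatanodeInfo $dn"
    echo "# downgrade and restart DataNode $dn"
  done
  # NameNodes are started normally, without any -rollingUpgrade option.
  echo "# shut down and downgrade $standby, then start it as standby normally"
  echo "hdfs haadmin -failover $active $standby"
  echo "# shut down and downgrade $active, then start it as standby normally"
  echo "hdfs dfsadmin -rollingUpgrade finalize"
}
```

For instance, print_rolling_downgrade_plan nn1 nn2 followed by the DataNode addresses prints three lines per DataNode and then the four NameNode and finalize steps.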

Downgrade with Downtime
An administrator may choose to first shut down the cluster and then downgrade it. The steps are as follows:

Shut down all NNs and DNs.
Restore the pre-upgrade release on all machines.
Start NNs with the “-rollingUpgrade downgrade” option.
Start DNs normally.

Original link: https://blog.csdn.net/leone911/article/details/51395874