HA Cluster Overview and Principles Explained

1. Overview

1) HA (High Availability) means a service that stays up 7 * 24 hours, without interruption.

2) The key strategy for achieving high availability is to eliminate single points of failure. Strictly speaking, the HA mechanism should be considered per component: HDFS HA and YARN HA.

3) Before Hadoop 2.0, the NameNode was a single point of failure (SPOF) in an HDFS cluster.

4) The NameNode affects the availability of an HDFS cluster mainly in two ways:

If the NameNode machine fails unexpectedly, e.g. it goes down, the cluster is unusable until an administrator restarts it.

If the NameNode machine needs an upgrade, whether software or hardware, the cluster is likewise unusable for the duration.

To address these problems, the folks behind Hadoop provided the following solution:

HDFS HA solves the problems above by configuring two NameNodes, one Active and one Standby, giving the cluster a hot standby for the NameNode. If a failure occurs, such as a machine crash, or a machine needs upgrading or maintenance, the NameNode role can be switched quickly to the other machine.

The general idea is: if one machine is not enough, I'll just add a few more; after all, the odds of several machines going down together are much smaller. How do these machines cooperate? Notice the Active and Standby states mentioned above: one node is in Active mode, i.e. the one actually doing the work, while the others wait in Standby mode. If the working node goes down, one of the standbys is elected to take its place.
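
To make the Active/Standby setup concrete, below is a minimal sketch of the core HA properties (normally written into hdfs-site.xml), expressed here through Hadoop's Java Configuration API. The nameservice mycluster, the NameNode ids nn1/nn2, and the hostnames are placeholder values, not anything from the handout.

```java
import org.apache.hadoop.conf.Configuration;

// Minimal sketch: one logical nameservice backed by two physical
// NameNodes. "mycluster", "nn1", "nn2" and the hostnames are
// placeholders for illustration.
public class HaBasicConfSketch {
    public static Configuration build() {
        Configuration conf = new Configuration();
        // The logical name clients use instead of a single NameNode host
        conf.set("dfs.nameservices", "mycluster");
        // The two NameNodes behind that logical name
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "nn-host1:8020");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "nn-host2:8020");
        // Lets clients discover which of the two is currently Active
        conf.set("dfs.client.failover.proxy.provider.mycluster",
                 "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        return conf;
    }
}
```

Because clients only ever address mycluster, a failover from nn1 to nn2 is invisible to them.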

2. HDFS-HA key working points

1) The way metadata is managed has to change:

Each of the two NameNodes keeps its own copy of the metadata in memory; there used to be only one copy, now there are two;

Only the NameNode in the Active (working) state may write to the edits log; the Standby can only read;

Both NameNodes can read the edits;

The shared edits are managed in shared storage (qjournal and NFS are the two mainstream implementations), so that every node can see them; a configuration sketch for the qjournal option follows this list;
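
For the qjournal option, the shared edits directory is given as a qjournal:// URI listing the JournalNodes. A minimal sketch, with placeholder hosts jn1/jn2/jn3 and a placeholder local path:

```java
import org.apache.hadoop.conf.Configuration;

// Sketch of the shared-edits settings for the qjournal implementation.
// Hostnames and the local path are placeholders.
public class SharedEditsConfSketch {
    public static void apply(Configuration conf) {
        // The Active NameNode writes edits to this JournalNode quorum;
        // the Standby tails the same stream to keep its metadata hot.
        conf.set("dfs.namenode.shared.edits.dir",
                 "qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster");
        // Where each JournalNode stores the edits it receives on local disk
        conf.set("dfs.journalnode.edits.dir", "/data/journalnode");
    }
}
```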

2) A state management module is needed.

This is implemented as a zkfailover (ZKFC) client that resides on every node hosting a NameNode. Each zkfailover monitors the NameNode on its own node and registers that node's state as a marker in ZooKeeper. When a state switch is required, the zkfailover is responsible for performing it, and the switch must prevent split brain (multiple NameNodes working at the same time, leading to inconsistent data) from happening. A sketch of the settings that enable this follows.
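
In configuration terms, turning this module on comes down to two settings; a minimal sketch with placeholder ZooKeeper hostnames (dfs.ha.automatic-failover.enabled lives in hdfs-site.xml, ha.zookeeper.quorum in core-site.xml):

```java
import org.apache.hadoop.conf.Configuration;

// Sketch: enable automatic failover and point ZKFC at the ZooKeeper
// ensemble. zk1..zk3 are placeholder hostnames.
public class AutoFailoverConfSketch {
    public static void apply(Configuration conf) {
        // Run a ZKFC next to each NameNode and let it drive transitions
        conf.set("dfs.ha.automatic-failover.enabled", "true");
        // The ZooKeeper ensemble used for the session and the lock znode
        conf.set("ha.zookeeper.quorum", "zk1:2181,zk2:2181,zk3:2181");
    }
}
```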

3) Passwordless ssh login must be guaranteed between the two NameNodes, so that the two nodes can communicate at any time.

4) Isolation (fencing): at any given moment, only one NameNode provides service to the outside world (two serving at once is exactly the split-brain scenario). A fencing configuration sketch follows.
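
Fencing is likewise driven by configuration. The sketch below uses the built-in sshfence method, which is exactly why point 3)'s passwordless ssh matters; the private key path is a placeholder:

```java
import org.apache.hadoop.conf.Configuration;

// Sketch: fencing via ssh. Before the standby goes Active, the failover
// controller sshes into the old Active's host and kills the NameNode
// process, so only one NameNode serves at a time.
public class FencingConfSketch {
    public static void apply(Configuration conf) {
        conf.set("dfs.ha.fencing.methods", "sshfence");
        // Private key used for the passwordless ssh between the NameNodes
        conf.set("dfs.ha.fencing.ssh.private-key-files",
                 "/home/hdfs/.ssh/id_rsa");
    }
}
```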

3. HDFS-HA automatic failover mechanism

Now that the key points of HA are clear, how is automatic failover configured and deployed? Automatic failover adds two new components to an HDFS deployment: ZooKeeper and the ZKFailoverController (ZKFC) process.

ZooKeeper (discussed in detail in a separate post under the Zookeeper category of this blog) is a highly available service whose main jobs are maintaining a small amount of coordination data, notifying clients when that data changes, and monitoring clients for failure. Automatic failover in HA depends on the following ZooKeeper capabilities:

1) Failure detection: each NameNode in the cluster maintains a persistent session in ZooKeeper. If the machine crashes, the session terminates, and ZooKeeper notifies the other NameNode that a failover needs to be triggered.

2) Active NameNode selection: ZooKeeper provides a simple mechanism for exclusively electing a single node into the active state. If the current active NameNode crashes, another node can acquire a special exclusive lock in ZooKeeper indicating that it should become the next active NameNode.

The other new component in automatic failover is ZKFC, a ZooKeeper client that also monitors and manages the state of the NameNode. Every host that runs a NameNode also runs a ZKFC process, and ZKFC is responsible for:

  • Health monitoring: ZKFC periodically pings the NameNode on the same host with a health-check command. As long as the NameNode replies promptly with a healthy status, ZKFC considers the node healthy. If the node crashes, freezes, or otherwise enters an unhealthy state, the health monitor marks it as unhealthy.
  • ZooKeeper session management: while the local NameNode is healthy, ZKFC keeps a session open in ZooKeeper. If the local NameNode is in the active state, ZKFC also holds a special lock znode. The lock uses ZooKeeper's support for ephemeral nodes: if the session terminates, the lock node is deleted automatically.
  • ZooKeeper-based election: if the local NameNode is healthy and ZKFC sees that no other node currently holds the lock znode, it tries to acquire the lock for itself. If it succeeds, it has won the election and is responsible for running a failover to make its local NameNode Active. The failover process is similar to the manual failover described above: the previous active NameNode is fenced first if necessary, and then the local NameNode transitions to the Active state. A minimal sketch of this lock pattern follows the list.
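
To make the lock mechanics concrete, here is a minimal sketch of the ephemeral-znode lock pattern the election relies on, written against the ZooKeeper Java client. The lock path, ensemble address, and class name are illustrative placeholders; this shows the pattern, not Hadoop's actual ZKFC implementation.

```java
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Sketch of the ephemeral-lock election pattern. The lock path and
// ensemble address are illustrative, not Hadoop's real internals.
public class ActiveLockSketch implements Watcher {

    private static final String LOCK_PATH = "/hdfs-ha/ActiveStandbyElectorLock";
    private final ZooKeeper zk;
    private final byte[] myId;

    public ActiveLockSketch(String ensemble, String nodeId) throws Exception {
        this.zk = new ZooKeeper(ensemble, 5000, this);
        this.myId = nodeId.getBytes(StandardCharsets.UTF_8);
    }

    /** Try to become Active by creating the ephemeral lock znode. */
    public boolean tryBecomeActive() throws Exception {
        try {
            // EPHEMERAL: the znode is deleted automatically when our
            // ZooKeeper session dies, releasing the lock with no cleanup.
            zk.create(LOCK_PATH, myId,
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            return true;                 // we hold the lock -> go Active
        } catch (KeeperException.NodeExistsException e) {
            zk.exists(LOCK_PATH, true);  // lock taken: watch for its deletion
            return false;                // stay Standby
        }
    }

    @Override
    public void process(WatchedEvent event) {
        // The old Active's session died and its lock znode vanished:
        // race to take the lock, fence the old Active, then go Active.
        if (event.getType() == Event.EventType.NodeDeleted) {
            try {
                if (tryBecomeActive()) {
                    System.out.println("won election: fence old Active, then transition");
                }
            } catch (Exception e) {
                // a real controller would log and retry here
            }
        }
    }
}
```

The design hinge is CreateMode.EPHEMERAL: a crashed Active releases the lock automatically when its session dies, and the watching standby receives a NodeDeleted event and races to take over, which is exactly the failure-detection-plus-election behavior described in 1) and 2) above.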

Reference: Atguigu (尚硅谷) Hadoop (HDFS) lecture notes

Original post: www.cnblogs.com/simon-1024/p/11749930.html