Enterprise-class clustering -- RHCS cluster concepts, using a fence device to resolve resource contention between cluster nodes, and keeping client access uninterrupted while services migrate between cluster nodes (high availability, HA)

I. RHCS cluster definition and related concepts

1.1 Definition of an RHCS cluster

RHCS stands for Red Hat Cluster Suite. It is an integrated set of software components that can be deployed in different configurations to meet needs for high availability, load balancing, scalability, file sharing, and cost savings. It is mainly used with Red Hat Enterprise Linux 6. It provides two different types of clusters:

  • **High availability:** application/service failover; critical applications and services fail over within a cluster of n server nodes
  • **Load balancing:** IP load balancing; requests arriving at one IP address are load-balanced across a group of servers

1.2 Features of an RHCS cluster

(1) Supports up to 128 nodes (Red Hat Enterprise Linux 3 and Red Hat Enterprise Linux 4 support 16 nodes).
(2) Can provide high availability for multiple applications.
(3) NFS/CIFS failover: supports highly available file serving in Unix and Windows environments.
(4) Fully shared storage subsystem: all cluster members can access the same storage subsystem.
(5) Integrated data integrity: uses the latest I/O barrier technology, such as programmable embedded and external power switches.
(6) Service failover: Red Hat Cluster Suite promptly detects hardware faults or outages and recovers the system automatically; it can also monitor applications to ensure they are operating correctly and restart them automatically when they fail.

1.3 Components of an RHCS cluster

(1) Cluster architecture manager
This is the basic kit of an RHCS cluster. It provides the basic functions that let the cluster nodes work together; specifically it includes the distributed cluster manager (CMAN), membership management, the lock manager (DLM), the configuration file manager (CCS), and the fence device (FENCE).

(2) High-availability service manager
Provides service monitoring and service failover: when a service's node fails, the service is moved to another, healthy node.

(3) Cluster configuration management tool
RHCS clusters are managed and configured through luci, a web-based cluster configuration interface; with luci you can easily build a powerful cluster system. ricci runs on each managed node and handles the communication between luci and that node.

(4) Linux Virtual Server
LVS is open-source load-balancing software. With LVS, client requests can be distributed sensibly across the nodes according to specified load-balancing policies and algorithms, achieving dynamic, intelligent load balancing.

(5) Red Hat GFS (Global File System)
GFS is a cluster file system developed by Red Hat; the latest version is GFS2. GFS allows multiple services to read and write the same disk partition at the same time. With GFS, data can be managed centrally, eliminating the trouble of synchronizing or copying data between nodes. GFS cannot exist on its own: it needs the underlying support of the RHCS components.

(6) Cluster Logical Volume Manager
CLVM, the Cluster Logical Volume Manager, is an extension of LVM that lets the machines in a cluster use LVM to manage shared storage.

(7) iSCSI
iSCSI is an Internet protocol, specifically a standard for carrying SCSI data over Ethernet, and a new storage technology based on the idea of IP storage. RHCS can export and allocate shared storage using iSCSI.

1.4 How an RHCS cluster works

(1) Distributed Cluster Manager (CMAN)
CMAN runs on each node and provides cluster management for RHCS: it manages cluster members, messages, and notifications. It monitors the operating status of every node to track node membership; when a node fails, CMAN promptly notifies the underlying layers of the change, which then adjust accordingly.

(2) Lock Manager (DLM)
DLM stands for Distributed Lock Manager. It is a foundation that RHCS is built on, and it provides a common locking mechanism for the cluster. In RHCS, DLM runs on every cluster node: GFS uses the lock manager to synchronize access to file system metadata, and CLVM uses it to synchronize updates to LVM volumes and volume groups. DLM does not need a dedicated lock server; it uses peer-to-peer lock management, which greatly improves processing performance and avoids the performance bottleneck of a full recovery when a single node fails. DLM lock requests are also local and need no network round trip, so they take effect immediately. Finally, through a layering mechanism, DLM can manage multiple lock spaces in parallel.

(3) Configuration File Management (CCS)
The Cluster Configuration System, CCS for short, manages the cluster configuration file and synchronizes it between nodes. Sometimes the luci management interface is unusable, for example because of network problems, and then CCS becomes essential. CCS runs on each cluster node and monitors the single configuration file /etc/cluster/cluster.conf. Whenever this file changes, CCS propagates the change to every node in the cluster, keeping each node's configuration file in sync. For example, if the administrator updates the cluster configuration file on node A, CCS notices the change on node A and immediately spreads it to the other nodes. The RHCS configuration file is cluster.conf, an XML file that contains the cluster name, cluster node information, cluster resource and service information, fence devices, and so on.

(4) Fence device (FENCE)
The FENCE device is an essential part of an RHCS cluster. With a fence device you can avoid the "split-brain" phenomenon that unforeseen failures can cause.
The fence device exists to solve exactly that kind of problem. It works mainly through the management interface of the server or storage hardware itself, or through an external power management device, issuing hardware-level management commands directly to the server or storage: rebooting it, shutting it down, or disconnecting it from the network.

(5) High Availability Service Manager (rgmanager)
The high-availability service manager supervises, starts, and stops the cluster's applications, services, and resources. It provides cluster service management: when a node fails, the high-availability service manager can transfer the service from the failed node to another, healthy node, and this transfer is automatic and transparent.

RHCS manages cluster services through rgmanager, which runs on every cluster node; the corresponding process on the server is clurgmgrd.
In an RHCS cluster, a high-availability service involves two things: the cluster service and the cluster resources. A cluster service is really just an application service, such as apache or mysql.
Cluster resources come in many kinds, for example an IP address, a startup script, or an ext3/GFS file system.

In an RHCS cluster, a high-availability service is tied to a failover domain, which is the set of cluster nodes allowed to run a particular service.
Within a failover domain, each node can be given a priority; the priorities decide the order in which the service is transferred when a node fails.
If no priorities are assigned to the nodes, the high-availability service may be transferred to any node.
Creating a failover domain therefore not only sets the order in which a service moves between nodes, but also restricts a service to switching only among the nodes that the domain specifies.

(6) Cluster configuration management tools (ricci and luci)
Conga is a newer web-based cluster configuration tool; it configures and manages cluster nodes over the web.
Conga has two parts, luci and ricci. luci is installed on a separate machine and is used to configure and manage the cluster; ricci is installed on every cluster node, and luci communicates with each node in the cluster through ricci. RHCS also provides some powerful command-line cluster management tools; commonly used ones are clustat, cman_tool, ccs_tool, fence_tool, and clusvcadm, sketched below.

(7) Red Hat GFS
GFS is the cluster storage solution provided with RHCS. It allows multiple nodes of a cluster to share storage at the block level: every node sees the same shared storage space, and access consistency is guaranteed. Put more concretely, GFS is the cluster file system provided by RHCS; multiple nodes can mount the same file system partition simultaneously without corrupting the file system data, which a single-node file system cannot achieve.

To let multiple nodes read and write the file system at the same time, GFS uses a lock manager to coordinate I/O. When one process writes a file, the file is locked and other processes may not read or write it; once the process finishes writing normally, it releases the lock, and only then can other processes read and write the file. Moreover, when data is modified on one node of a GFS file system, the modification is immediately visible on the other nodes through RHCS's underlying communication mechanism.

When building an RHCS cluster, GFS generally serves as the shared storage; it runs on each node and can be configured and managed with the RHCS management tools. Note the relationship between GFS and RHCS here, because beginners easily confuse the two concepts: GFS is not required to run RHCS, and it is needed only when shared storage is required; conversely, to set up a GFS cluster file system you must have the underlying RHCS support, so a node with a GFS file system must have the RHCS components installed.

II. Using a fence device to resolve resource contention between cluster nodes

  • How FENCE works: when an accident leaves a host abnormal or down, the standby machine first calls the fence device, and the fence device then reboots the abnormal host or isolates it from the network. When the fence operation executes successfully, it returns this information to the standby server; after receiving the success message, the standby machine starts taking over the host's services and resources. Through the fence device, the abnormal node releases the resources it held, ensuring that resources and services always run on a single node. (A minimal command-line sketch follows this list.)
  • RHCS fence devices come in two kinds, internal FENCE and external FENCE. Common internal fence devices are IBM RSA II cards, HP iLO cards, and IPMI devices; external fence devices include UPSes, SAN switches, and network switches.
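
As a concrete illustration, the fence_virt/fence_xvm agent used later in this lab can fence a KVM guest by hand from the command line. A minimal sketch, assuming the guest's domain name is server2:

fence_xvm -H server2 -o reboot   # ask fence_virtd on the host to reboot guest "server2"
fence_xvm -H server2 -o off      # or cut its (virtual) power entirely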

Lab environment
1) OS: rhel6.5
2) Virtual machines

IP usage
Host (physical machine): 172.25.7.250, serves as the fence device; for now it is not used as a front end
Virtual machine server1: 172.25.7.101, installs ricci and luci (Conga, which provides the configuration UI); primary node
Virtual machine server2: 172.25.7.102, installs ricci; secondary node

1. Configure the rhel6.5 yum repositories on virtual machines server1 and server2, adding the HighAvailability, LoadBalancer, ResilientStorage, and ScalableFileSystem repositories (the storage and file system repositories will be needed later for HA shared storage).
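
A sketch of the repository sections to add, assuming the RHEL 6.5 installation media is served over HTTP at http://172.25.7.250/rhel6.5 (that URL is an assumption; point baseurl at your own mirror):

cat >> /etc/yum.repos.d/rhel-source.repo <<'EOF'
[HighAvailability]
name=HighAvailability
baseurl=http://172.25.7.250/rhel6.5/HighAvailability
gpgcheck=0

[LoadBalancer]
name=LoadBalancer
baseurl=http://172.25.7.250/rhel6.5/LoadBalancer
gpgcheck=0

[ResilientStorage]
name=ResilientStorage
baseurl=http://172.25.7.250/rhel6.5/ResilientStorage
gpgcheck=0

[ScalableFileSystem]
name=ScalableFileSystem
baseurl=http://172.25.7.250/rhel6.5/ScalableFileSystem
gpgcheck=0
EOF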


Send server1's yum repository configuration file to server2:
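
The copy itself is a single command (the repo file name rhel-source.repo is an assumption; use whichever file you edited above):

scp /etc/yum.repos.d/rhel-source.repo root@server2:/etc/yum.repos.d/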

Create a basic cluster environment

Build the most basic RHCS cluster environment, that is, create the cluster itself.

2. Install the cluster node service ricci and the graphical cluster management tool luci. server1 is the management node and also serves as an HA node.
(1) server1: yum install -y ricci luci

(2) server2: yum install -y ricci

(3) On server1 and server2, installing ricci creates a ricci user. Set a password for the ricci user: redhat.

  • Starting with Red Hat Enterprise Linux 6.1, pushing updated cluster configuration from any node requires the ricci password, so you must set the ricci user's password after installing the package; this is the password you will enter for each node later. After installation, you can see the automatically created ricci user in the /etc/passwd file.
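
Setting the password is one command on each node (this lab uses redhat):

echo redhat | passwd --stdin ricci   # set the ricci user's password non-interactively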

(4) On both nodes, start ricci (and luci on server1) and enable them at boot.

  • On rhel6, start a service with: /etc/init.d/<service> start
/etc/init.d/ricci start
/etc/init.d/luci start
  • On rhel6, enable a service at boot with: chkconfig <service> on
chkconfig ricci on
chkconfig luci on


Check the ports used by the two services: luci listens on 8084 and ricci on 11111.
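
A quick way to confirm the listening ports, sketched with netstat:

netstat -antlp | grep -E ':8084|:11111'   # luci listens on 8084, ricci on 11111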


3. Now log in from the physical machine at https://172.25.7.101:8084

  • Visiting https://172.25.7.101:8084 in a browser brings up the following screen; import the certificate manually by clicking [Advanced].
  • Note: to delete a cluster, remove its nodes first; once the nodes are deleted, the cluster disappears automatically.

Username: root
Password: redhat
(These are the root credentials of the machine running luci, i.e. server1.)

4. Add server1 and server2 to a cluster:
Click Manage Clusters, then select Create to create a cluster.


Set the cluster's basic information and add the nodes.

Click Create Cluster to start; you can watch the cluster being created.

Here Insert Picture Description

  • Both virtual machines reboot during creation. If the services were not enabled at boot beforehand, an error is reported at this point, and after the reboot you must start the services manually inside the virtual machines for creation to continue.

At this point the host's connections to server1 and server2 are dropped. For convenience we reconnect to server1 and server2 and keep working from the physical machine.


Cluster creation completes and the nodes are added successfully.

The services in the cluster are now visible in the management interface.

5. Check the cluster status: clustat
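
The output should look roughly like the sketch below (the cluster name westos_ha is a stand-in for whatever name you entered when creating the cluster):

clustat
# Cluster Status for westos_ha @ ...
# Member Status: Quorate
#
#  Member Name      ID   Status
#  ------ ----      ---- ------
#  server1          1    Online, Local
#  server2          2    Online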

6. Examine the single configuration file /etc/cluster/cluster.conf on the cluster nodes.

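At this stage the file should be roughly the two-node skeleton sketched below (the cluster name is whatever you chose; luci raises config_version on every change):

cat /etc/cluster/cluster.conf
# <?xml version="1.0"?>
# <cluster config_version="1" name="westos_ha">
#   <clusternodes>
#     <clusternode name="server1" nodeid="1"/>
#     <clusternode name="server2" nodeid="2"/>
#   </clusternodes>
#   <cman expected_votes="1" two_node="1"/>
# </cluster>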

Adding a fence device to resolve resource contention

How does a fence device resolve resource contention between cluster nodes?

An RHCS high-availability cluster is more capable and more complete than the earlier LVS director alone, with more features covering both the front end and the back end. Here you can think of each cluster node as a director, although in fact it does more than scheduling. Normally the cluster has an active node and a standby node, and in normal operation one of them does the work (the active director); but if that director breaks, everything stops.
So when one director fails, director 1 must let director 2 take over its work. Normally director 1 and director 2 communicate constantly; when 2 stops hearing from 1, it concludes that 1 has failed and immediately takes over. But when the heartbeat between 1 and 2 itself has a problem, say one node hangs, both may actually still be able to work; then both will serve clients and contend for the same resources. That is why a physical device like fence is needed to suppress the contention: when resources are contended, node 1 force-reboots node 2 through the fence device (or 2 reboots 1). Either can force-reboot the other, though in practice the cluster still distinguishes active from standby. Here server1 and server2 are the cluster nodes, and the physical machine is the fence device (the "security guard"). The fence device ties the cluster together and guarantees that only one node works at any moment: as soon as resource contention appears, the active node force-reboots the standby through fence, so the active node keeps working normally.

7. Add the fence device
(1) Configure the physical machine's yum repository: send the virtual machines' yum repository configuration file to the physical machine.

Here Insert Picture Description

(2) Install the fence packages on the physical machine.

fence is only a channel in the middle: server1 and server2 both connect to this physical fence device, and through it one node can cut the other's power, preventing resource contention.

yum search fence
yum install -y fence-virtd.x86_64 fence-virtd-libvirt.x86_64 fence-virtd-multicast.x86_64


(3) Configure fence: fence_virtd -c

Choose br0 for the interface; accept the defaults for everything else.
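
A sketch of the interactive session (prompt wording and defaults can differ slightly between fence-virt versions; only the interface answer deviates from the defaults):

fence_virtd -c
# Listener module [multicast]:           <Enter>
# Multicast IP Address [225.0.0.12]:     <Enter>
# Multicast IP Port [1229]:              <Enter>
# Interface [virbr0]: br0
# Key File [/etc/cluster/fence_xvm.key]: <Enter>
# Backend module [libvirt]:              <Enter>
# Replace /etc/fence_virt.conf with the above [y/N]? y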

Here Insert Picture Description

The configuration file can be reviewed with vim /etc/fence_virt.conf:
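
The resulting file looks roughly like this (a sketch; module paths and defaults vary by system):

cat /etc/fence_virt.conf
# fence_virtd {
#     listener = "multicast";
#     backend = "libvirt";
# }
# listeners {
#     multicast {
#         key_file = "/etc/cluster/fence_xvm.key";
#         address = "225.0.0.12";
#         port = "1229";
#         interface = "br0";
#         family = "ipv4";
#     }
# }
# backends {
#     libvirt {
#         uri = "qemu:///system";
#     }
# }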

Here Insert Picture Description

Here Insert Picture Description

(4) Generate fence_xvm.key: dd if=/dev/urandom of=/etc/cluster/fence_xvm.key bs=128 count=1 (if /etc/cluster does not yet exist on the physical machine, create it first with mkdir -p /etc/cluster).

(5) Distribute fence_xvm.key to the HA nodes; this key is used to authenticate fencing of the nodes.

scp fence_xvm.key root@server1:/etc/cluster/
scp fence_xvm.key root@server2:/etc/cluster/


(6) Configure fencing for the nodes in the web interface.

Add a fence device: choose Fence virt (Multicast Mode) and name it vmfence (the name is arbitrary).


  • Add fence methods for server1 and server2: vmfence_1 (a name) with the UUID of the server1 VM, and vmfence_2 (a name) with the UUID of the server2 VM. Because the two nodes' IPs might be the same, fencing by address could shut down both nodes at once, which is unsafe; instead, put each VM's unique UUID in its fence device entry. Look up the two UUIDs with virt-manager on the physical machine, or from the command line as shown below.
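
The UUIDs can also be read from the command line instead of virt-manager (the domain names server1 and server2 are assumptions; check virsh list --all for the real names):

virsh domuuid server1   # print the UUID of the server1 guest
virsh domuuid server2   # print the UUID of the server2 guest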


(7) Start fence on the physical machine: systemctl start fence_virtd.service


8. Testing

(1) On server1, run fence_node server2: cluster node 1 force-reboots node 2.

server1 kills server2 through fence; server2 loses power and reboots.

Here Insert Picture DescriptionHere Insert Picture Description

(2) Crash server1's kernel by running echo c > /proc/sysrq-trigger; fence force-reboots server1.


(3) When server2 drops offline, check the cluster status: server2 is shown as offline.

III. Keeping client access uninterrupted while services migrate between cluster nodes (high availability, HA)

Background: when a cluster node (similar to a director) fails, how do we migrate its services safely to another node so that clients accessing the resources notice nothing, that is, the migration is transparent? This is configured in the cluster's graphical management tool (which acts like an overall coordinator managing all the cluster nodes).
Approach: first set the rules for migrating the service, then set the resources clients access (the entry IP address, and the web service started by a script), and finally the resource group: all the resources go into one group, and they are migrated, or removed, as one complete set.

3.1 Configuring a high-availability service (httpd as the example)

Configure the httpd service on server1 and server2 and write default index pages.
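
A minimal sketch of the per-node preparation (the page contents are just test markers so you can tell the nodes apart):

yum install -y httpd
echo server1 > /var/www/html/index.html   # on server2, write "server2" instead
# do not start or enable httpd here; rgmanager will start it once the service group exists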

1. Set up failover (the failover domain)

(1) In the browser, configure service migration under Failover Domains.


Name it webfail: when one node fails, the service switches to whichever node is still healthy.
Add server1 and server2 to the domain and set a priority for each node; the smaller the number, the higher the priority.
The third option controls whether the service migrates back to the higher-priority node when it recovers; in production this automatic failback is usually not enabled.


2. Add the failover resources

Click Add, add an IP Address (a VIP outside the cluster nodes' own addresses), set the host switching time to 5 seconds, then Submit.
Click Add again, add a Script (httpd is started by a script), and give the file /etc/init.d/httpd.


After the resources are added, check the status of the httpd service. There is no need to start httpd; the cluster will start it itself.

3. Create the resource group

Click Add and name the service group apache; add a resource (the VIP from the previous step), then add another resource (the httpd script).


Refresh the page: the httpd service is shown running on server1 (server1 has the higher priority).
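
The same state can be inspected and driven from the command line (the service name apache comes from the resource group created above):

clustat                          # shows service:apache started on server1
clusvcadm -r apache -m server2   # manually relocate the service to server2
clusvcadm -d apache              # disable the service; clusvcadm -e apache re-enables it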

4. Testing

(1) On server1, stop the httpd service.
The service and its entry IP migrate automatically to server2 and do not fail back.
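
From a client, the entry address keeps answering across the switch. A sketch, assuming a hypothetical VIP of 172.25.7.100 (use whatever address you entered as the IP Address resource):

curl http://172.25.7.100   # now returns server2's test page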

(2) On server2, run echo c > /proc/sysrq-trigger to crash server2's kernel and force it to reboot.
The service migrates automatically to server1.


After server2 reboots successfully, the service does not automatically migrate back (failback can be configured).
When httpd has not been started manually, refreshing the web cluster management interface starts httpd on only one node. So while the service sits on a particular node, only the virtual IP and that node can be reached; the other nodes cannot be accessed.
