LVS load balancing (Introduction to LVS, three working modes, persistent connection)

1. Introduction and principle of LVS
1. Overview of LVS
LVS (Linux Virtual Server) is a virtual server facility that runs on the Linux platform. LVS is integrated into the Linux kernel and consists of two parts: the user-space management tool (ipvsadm) and the kernel-space module (ipvs). The kernel part is the core code; the user-space part is the administration tool. Each works at its own level, one in user space and one in kernel space. LVS implements an IP-based load-balancing scheduling scheme for request traffic inside the Linux kernel; its architecture is shown in Figure 1. End users on the Internet access the company's externally facing load-balancing server, and each user's web request is first delivered to the LVS scheduler (the director). The director then decides, according to a preset algorithm, which back-end web server the request is sent to; for example, the round-robin algorithm distributes external requests evenly across all back-end servers. Although the end user reaches the LVS director and is only then forwarded to a real back-end server, as long as the real servers are attached to the same storage and provide the same service, the content is identical no matter which real server the user lands on, so the whole process is transparent to the user. Finally, depending on the LVS working mode, the real server returns the requested data to the end user in different ways. The LVS working modes are NAT mode, TUN mode, and DR mode.
[Figure 1: LVS architecture]

2. The IPVS principle
A hook function is a Linux kernel mechanism that gets first claim on a data packet. When IPVS is enabled, IPVS uses such hooks to take over the packets sent by the client.
Tip: In production you will hear people say IPVS or LVS, the IPVS module or LVS scheduling, more or less interchangeably. One name refers to the kernel-space code and the other to the project/package; in practice there is essentially no difference, similar to the relationship between Apache and httpd.
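As a minimal sketch of the two halves described above (package and module names as found on typical RHEL/CentOS or Debian hosts; adjust for your distribution), you can check the kernel side and install the user-space tool like this:
modprobe ip_vs                  # load the kernel-side module
lsmod | grep ip_vs              # confirm that ipvs is present in the kernel
yum install -y ipvsadm          # user-space management tool (apt-get install ipvsadm on Debian/Ubuntu)
ipvsadm -Ln                     # list the virtual services and real servers currently known to the kernel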

2. Working mode analysis
1. NAT-based LVS load balancing
NAT (Network Address Translation) rewrites packet headers so that hosts with private IP addresses inside the enterprise can reach the external network, and external users can reach hosts with private IPs inside the company. The topology of the LVS/NAT working mode is shown in Figure 2. The LVS load scheduler uses two network interfaces configured with different IP addresses: eth1 connects to the internal network through a switch, and eth0 (the VIP) connects to the external network.
[Figure 2: LVS/NAT working mode topology]

Step 1: The user at 192.168.45.10 resolves the company's domain name through an Internet DNS server to the public address configured on the load scheduler. In contrast to the real server IPs, the scheduler's external IP, also called the VIP (Virtual IP Address), is the address the whole load-balancing cluster exposes for external access. By visiting the VIP, the user gets connected to a back-end real server (Real Server), and all of this is transparent to the user: the user believes they are accessing the real server directly and does not know that the VIP belongs to a scheduler, nor where the real servers are or how many there are.

Step 2: The user sends the request to the scheduler's eth0 (192.168.66.10). By this point the cluster scheduling rules must have been written into LVS, including the public VIP, the real server IPs, and the corresponding ports. LVS selects a back-end real server (192.168.88.11–192.168.88.13) according to the preset algorithm (for example rr, round robin) and then performs DNAT (destination address translation) on the request packet: the source address stays unchanged, the destination address is rewritten to the IP and port of the real server chosen by the algorithm, and the packet is forwarded to the real server through eth1 (192.168.88.10).

Step 3: After the real server receives the packet, it returns the response packet to the LVS scheduler, since the real server's gateway is set to the scheduler's eth1. On receiving the response, the scheduler performs SNAT (source address translation), rewriting the source address (the real server's IP) to the scheduler's VIP (eth0). Once the rewrite is complete, the scheduler sends the response packet back to the client. In addition, because the LVS scheduler keeps a connection hash table in which connection and forwarding information is recorded, when the next packet of the same connection arrives at the scheduler, the existing record can be found directly in the hash table, and the same real server and port are selected based on the recorded information.
Reminder: about the DNAT/SNAT translations. When the request packet passes through LVS, destination address translation is performed, converting the public IP to a LAN IP (DNAT); when the real server's response passes back through LVS, source address translation is performed, converting the LAN IP to the public IP (SNAT). In short, if the outbound direction used destination address translation, the return direction must use source address translation, and if the outbound direction used source address translation, the return direction must use destination address translation.
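A hedged sketch of the director-side configuration for the three steps above (the addresses 192.168.66.10 and 192.168.88.11–13 are taken from the figure; the scheduling algorithm and port are examples):
echo 1 > /proc/sys/net/ipv4/ip_forward        # let the kernel forward packets between eth0 and eth1
ipvsadm -A -t 192.168.66.10:80 -s rr          # define the virtual service on the VIP with round robin
ipvsadm -a -t 192.168.66.10:80 -r 192.168.88.11:80 -m   # add real servers; -m = masquerading (NAT)
ipvsadm -a -t 192.168.66.10:80 -r 192.168.88.12:80 -m
ipvsadm -a -t 192.168.66.10:80 -r 192.168.88.13:80 -m
ipvsadm -Lnc                                  # inspect the connection hash table mentioned in step 3
Remember that each real server must use the scheduler's eth1 (192.168.88.10) as its default gateway, otherwise the responses will bypass the SNAT step.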

1.1 Features of NAT mode
1) The load scheduler sits between the real servers and the clients;
2) The load scheduler must run Linux, while the real servers can run any operating system;
3) Both inbound and outbound packets are handled by the load scheduler;
4) Port mapping is supported.
The difference in cluster concurrency between LVS NAT mode and Nginx layer-7 load balancing:
In NAT mode, the concurrency of the cluster is tied to the throughput of the system, because LVS is only responsible for taking control of the packets. Although the DNAT/SNAT translation happens while the packets are "in LVS", it is actually done by the Linux kernel, i.e. the same machinery the firewall uses, and LVS never carries the traffic itself. It is like the guard on duty at a gate who does not carry letters back and forth: communication efficiency depends on the throughput of the gate (not on its size). In fact, once the address translation is complete, Linux simply uses its routing-and-forwarding function. With Nginx layer-7 load balancing, the concurrency of the cluster depends on the concurrency of the Nginx scheduler, because Nginx personally receives and forwards every packet, like a letter porter working in the gateway: the speed at which the porter moves letters directly determines the communication efficiency between the two sides of the gate.
Throughput: the amount of data that can be processed per second.

2. DR-based LVS load balancing
DR mode is also called direct routing mode. Its architecture is shown in Figure 3. In this mode, LVS still only handles the inbound request and selects a suitable real server according to the algorithm; the real server is responsible for sending the response packet back to the client directly. After selecting the real server, the scheduler rewrites the destination MAC address of the data frame to the MAC address of the selected real server without modifying the IP packet, and sends the frame to the real server through the switch. Throughout the process, the real server's VIP does not need to be visible to the outside world.
[Figure 3: LVS/DR working mode topology]

Step 1: The client 192.168.45.10 sends the request packet to the router's eth0 (192.168.50.10). The router performs DNAT (destination address translation) on the packet: if the packet's destination is 192.168.50.10:80, the destination address is rewritten to 192.168.88.10:80.

Step 2: Through eth1 (192.168.50.11), the router sends the DNAT-translated request packet on to the load scheduler at 192.168.88.10:80. Because the load scheduler and the real servers sit on the same layer-2 network, i.e. the same broadcast domain, the scheduler's NIC usually has two interfaces: one acting as the VIP that receives public access, and a sub-interface used for LAN communication. In DR mode, the scheduler does not change the packet's destination IP or port; instead, according to the algorithm, it rewrites the destination MAC address of the frame to the MAC of one of the real servers. Once rewritten, the frame is delivered to the physical port of whichever real server on the LAN owns that MAC address.

Step 3: Obviously, because the destination IP of this packet is not the real server's own IP but the scheduler's VIP, the real server will not accept a packet that has merely arrived at its physical port. To receive it properly, the real server brings up an extra sub-interface on its loopback interface whose IP is identical to the scheduler's VIP. As we know, the two basic ARP behaviors of a NIC are announcement and response. Announcement means that, by default, a NIC advertises all of its legitimate, standard addresses to the current broadcast domain; this is why, when a Windows host in the same broadcast domain comes online, its address and name show up in the workgroup view of the other hosts. The real server's NIC would therefore also, via eth0, advertise both eth0's IP and the newly added legitimate IP on the loopback interface to the other hosts on the LAN. As a result, two identical IPs would appear in the broadcast domain: the scheduler's VIP and the real server's loopback sub-interface IP would conflict.

To avoid this, we use a technique called ARP behavior control to hide the IP of the real server's loopback sub-interface, so that this sub-interface responds only when the destination IP of a request is exactly its own IP and it cannot be discovered by other hosts on the LAN. ARP behavior control covers both announcement behavior and response behavior; with it we restrict how the loopback sub-interface IP announces itself and responds. Even though this IP duplicates another IP on the LAN, it never answers other hosts, so no IP address conflict actually occurs (the detailed rules are described at the end of this article). In older setups this was instead handled by binding the scheduler's VIP to its NIC's MAC address on the switch, so that the real server's loopback sub-interface IP could be broadcast on the LAN without any other host answering it. The author does not recommend that approach: production networks are complex, it requires coordination with network engineers, and it is hard to keep under control.

Once the real server has a sub-interface IP identical to the VIP, it has the identity needed to handle the request packet, but it still needs routing/forwarding, because the loopback sub-interface by itself can only communicate internally; eth0 has to pass the request packet on to the loopback sub-interface IP. Only then can the real server process the request.
Tip: 127.0.0.1 on the loopback interface is a loopback address, also called a test IP, and by default it is only valid inside the local host. However, when we configure an additional legitimate, standard IP on the loopback interface, that IP can communicate with the outside world, which is exactly why we need to hide it to avoid address conflicts on the LAN.
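A hedged sketch of the real-server side of step 3 (the VIP 192.168.88.10 is assumed from the figure; the meaning of the arp_ignore/arp_announce values is explained in section 4 below):
echo 1 > /proc/sys/net/ipv4/conf/lo/arp_ignore       # respond only to ARP requests for the arriving interface's own IP
echo 2 > /proc/sys/net/ipv4/conf/lo/arp_announce     # never announce the loopback VIP on the LAN
echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce
ip addr add 192.168.88.10/32 dev lo label lo:0       # put the VIP on a loopback sub-interface with a /32 mask
ip route add 192.168.88.10/32 dev lo                 # host route for locally delivered VIP traffic (often created automatically)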

Step 3 is rather involved, so the author offers a real-life analogy. A residential community receives an epidemic-prevention notice (the request packet), but the courier only knows that this official document is to be delivered to the community gate (the cluster VIP) and has no idea who should actually deal with it; in practice such notices are handled by the gatekeeper (the load scheduler). The gatekeeper does not understand the policy in the document either, so he casually (that is, after very carefully computing with an algorithm) tells the courier to hand it to the community Party branch (a real server) for processing (the packet's destination MAC address is rewritten), and the document is delivered to the Party branch's doorstep (hosts on a LAN talk to each other by MAC address). At that moment only the clerk (the real server's eth0) and Li Si (the loopback sub-interface) are in the office. Li Si, a Party member (the loopback interface) and a simple, kind-hearted fellow, opens the door full of questions; he confirms that the branch was never assigned this task (the packet's destination IP does not match the real server's IP), so he has no right to read the document. With no other members around, after asking how the matter came about, Li Si resolutely decides to accept the handover from the gatekeeper on behalf of the branch (through ARP behavior control it gains the identity needed to look at the packet). The branch then collectively decides that Li Si will be responsible for such matters from now on (the real server enables forwarding and takes in the packet), and the whole branch works together to complete the task laid out in the document (the real server returns the response packet to the router).

Step 4: The real server sends the response packet to the router's eth1. The router performs SNAT (source address translation), rewriting the source address of the response packet to the IP of the router's eth0 (the public destination IP the client originally sent its request to), and the response packet is then delivered to the client.
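On the scheduler side, a hedged sketch of the matching DR configuration (the VIP 192.168.88.10 comes from the figure; the real-server addresses 192.168.88.11–13 are assumed for illustration):
ipvsadm -A -t 192.168.88.10:80 -s rr                     # virtual service on the VIP, round robin
ipvsadm -a -t 192.168.88.10:80 -r 192.168.88.11:80 -g    # -g = gatewaying, i.e. direct routing (DR)
ipvsadm -a -t 192.168.88.10:80 -r 192.168.88.12:80 -g
ipvsadm -a -t 192.168.88.10:80 -r 192.168.88.13:80 -g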

2.1 Features of DR mode
1) The load scheduler and the real servers must be in the same broadcast domain;
2) The load scheduler must run Linux, and the real servers must also run Linux (because ARP behavior control is required);
3) Inbound traffic is handled by the load scheduler, while outbound traffic is sent by the real servers;
4) Port mapping is not supported.

3. TUN-based LVS load balancing
In an LVS (NAT) cluster, every request packet and every response packet has to be forwarded by the LVS scheduler, so once there are more than about ten back-end servers the scheduler becomes the bottleneck of the whole cluster. As we know, request packets are usually far smaller than response packets, because the response carries the actual data the client asked for. The idea behind LVS (TUN) is therefore to separate requests from responses: the scheduler handles only the request packets, while the real servers send the response packets straight back to the clients. The topology of the LVS/TUN working mode is shown in Figure 4. IP tunneling (IP encapsulation) is a packet-encapsulation technique: the original packet, whose destination is the scheduler's VIP, is wrapped in a new outer header (carrying a new source address and, as the destination, the address of the real server selected by the scheduler) and forwarded through the tunnel to the back-end real server (Real Server). LVS (TUN) mode requires that the real servers can connect to the external network directly, so that after receiving the request packet a real server responds to the client host directly.
[Figure 4: LVS/TUN working mode topology]

Step 1: The client sends the request packet to the load scheduler. After receiving it, the scheduler encapsulates the request packet a second time; the destination of this outer encapsulation is the IP of the selected real server, and in this way the request packet is delivered to the real server.

Step 2: After receiving the packet, the real server strips off the outer encapsulation and, based on the source address in the original request packet, sends the response packet directly back to the client.
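A hedged sketch of how this mode is typically set up (the VIP 10.0.0.100 and real-server addresses are placeholders for illustration; -i selects the IP-tunneling forwarding method):
ipvsadm -A -t 10.0.0.100:80 -s rr
ipvsadm -a -t 10.0.0.100:80 -r 172.16.1.11:80 -i     # -i = ipip tunneling
ipvsadm -a -t 10.0.0.100:80 -r 172.16.1.12:80 -i
# on each real server: bring up the tunnel interface, give it the VIP, and relax reverse-path filtering
modprobe ipip
ip addr add 10.0.0.100/32 dev tunl0
ip link set tunl0 up
sysctl -w net.ipv4.conf.tunl0.rp_filter=0
sysctl -w net.ipv4.conf.all.rp_filter=0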

3.1 Features of TUN mode
1) The load scheduler and the real servers must have public IP addresses, or at least be reachable from each other via public routing (in production these servers are usually in different regions);
2) The load scheduler must run Linux, and the real servers must also run Linux (packets have to be encapsulated and de-encapsulated);
3) Inbound traffic is handled by the load scheduler, while outbound traffic is sent by the real servers;
4) Port mapping is not supported.
Tip: Because most of the traffic in this mode travels across the public network, latency cannot be guaranteed; moreover, the servers are generally deployed in different regions, which makes maintenance expensive, so this mode is rarely used.

3. Comparison of the three modes
1. Concurrency
DR > TUN > NAT > layer-7 load balancing with Nginx
Tip: latency is unrelated to load, which is why TUN still beats NAT mode.

2. Usage frequency in production environments
NAT/DR > TUN
Tip: if the cluster needs higher concurrency, choose DR mode;
if the cluster needs more flexibility, choose NAT mode (it supports port mapping, so the front-end and back-end ports may differ).

4. Persistent connections and ARP behavior control
1. The problem with HTTPS
Take NAT mode as an example. If a user accesses the servers over HTTPS, the first visit involves a lengthy HTTPS handshake. When the same user connects a second time, the request may be assigned to a second real server, which again needs a full HTTPS handshake before communication is established; the whole process clearly consumes a lot of resources. HTTPS is itself a resource-hungry service; there is even a kind of hardware called an SSL accelerator built specifically to speed up the HTTPS handshake. It is quite similar to an HTTPS session-offloading layer, and its structure is shown below:
When the user talks to the SSL accelerator, the HTTPS protocol is used; as the traffic passes through the SSL layer it is offloaded into plain HTTP.
[Figure: SSL accelerator / HTTPS offloading structure]

2. How persistent connections work and what they are for
Requests from the same client are recorded in the LVS hash table, and the retention time is controlled by persistence_timeout, in seconds. If persistence_timeout is configured, then when a client's request reaches the LB (load scheduler), IPVS adds a connection entry with state NONE to its table. The entry's source IP is the client IP, its port is 0, and its timeout is the persistence_timeout mentioned above, which counts down over time. When the NONE entry's timeout reaches 0, if connections in ESTABLISHED or FIN_WAIT state still exist in the IPVS table, the persistence_timeout value is refreshed back to its initial value. For as long as this NONE entry exists, requests from the same client IP are scheduled to the same RS node.
Tips:
1) The effect is similar to the SH algorithm: requests from the same IP are sent to the same server, but with a time limit. (The author will cover the scheduling algorithms in a later article; for now just think of an algorithm as a conditional selection scheme used by the computer.)
2) Persistence is not a scheduling algorithm, but it is applied together with the algorithm and takes precedence over it.
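The behavior described above can be observed directly from the IPVS connection table; a hedged example (option names as in current ipvsadm releases):
ipvsadm -Lnc                    # connection table; persistent "template" entries show state NONE and client port 0
ipvsadm -Ln --persistent-conn   # per-service persistence information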

3. Types of persistent connections
First, note that the most common use case for persistent connections is HTTPS.
3.1 PCC (persistent client connections)
All requests from the same client are directed to the RS selected earlier; that is, as long as the source IP is the same, the assigned server stays the same. Put differently, this also lets us schedule several clusters at once: for example, two different clusters can share the VIP 10.10.10.10 but be mapped to different ports, one on 443 and one on 80. When a user accesses port 443, the request is assigned to real server RS1 in the cluster behind port 443, and the same goes for port 80; only the client's source IP is considered here, not the port, so both clusters can be scheduled. In short, as long as the scheduler VIP (cluster IP) is the same, no matter which port of the cluster is accessed, the same user is assigned to the same real server for the duration of the persistence window.
ipvsadm -A -t 172.16.0.8:0 -s wlc -p 120
ipvsadm: the command-line management tool
-A: add a new virtual service (cluster)
-t: a TCP service
172.16.0.8:0: the cluster (VIP) address; :0 means any port
-s wlc: use the wlc scheduling algorithm
-p 120: set the persistence time to 120 seconds
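Real servers are then attached to this port-0 (all-ports) service in the usual way; a hedged continuation of the example (real-server addresses are placeholders, and DR forwarding with -g is assumed, where the original destination port is kept unchanged):
ipvsadm -a -t 172.16.0.8:0 -r 172.16.0.11:0 -g
ipvsadm -a -t 172.16.0.8:0 -r 172.16.0.12:0 -g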

3.2 PPC (persistent port connections)
Requests from the same client to the same service (port) are always directed to the RS selected earlier. This type is essentially the previous one with the port specified as well: both the scheduler VIP (cluster IP) and the accessed port must match, and as long as those two stay the same, the same user is assigned to the same real server for the duration of the persistence window.
ipvsadm -A -t 172.16.0.8:80 -s rr -p 120

3.3 PFMC (persistent firewall-mark connections)
Requests from the same client to the specified services (ports) are always directed to the RS selected earlier, but this type can bind two otherwise unrelated ports into one cluster service; it is used less often.
Principle: the firewall tool is used to classify user traffic and tag it with a mark, and the cluster then performs load scheduling according to the mark value.
iptables -t mangle -A PREROUTING -d 172.16.0.8 -p tcp --dport 80 -j MARK --set-mark 10
# write the rules into the firewall; packets that match them are then scheduled by the algorithm declared in the ipvsadm command
iptables -t mangle -A PREROUTING -d 172.16.0.8 -p tcp --dport 443 -j MARK --set-mark 10
service iptables save
ipvsadm -A -f 10 -s wlc -p 120
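The firewall-mark service also needs real servers attached to it; a hedged continuation of the example (real-server addresses are placeholders):
ipvsadm -a -f 10 -r 172.16.0.11:80 -g
ipvsadm -a -f 10 -r 172.16.0.12:80 -g
ipvsadm -Ln    # the single FWM 10 service now covers both port 80 and port 443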

4. ARP behavior control
4.1 ARP response behavior
arp_ignore
0: respond as long as the local machine has the corresponding IP address configured on any interface.
1: respond only when the requested target IP is configured on the interface on which the request arrived. In other words, the interface ignores ARP requests for any address other than its own; it replies only when the requested IP is exactly its own IP.

4.2 ARP announcement behavior
arp_announce
0: announce any local address on any network interface of this machine.
1: try as far as possible to avoid announcing, on a given network, addresses that do not belong to that network.
# "as far as possible" means there may still be exceptions
2: announce to the target network only addresses that belong to that network. In other words, the interface will not broadcast any IP other than the address of the current interface; it announces only its own IP.
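A hedged example of applying these values on a DR-mode real server and keeping them across reboots (the file path is the conventional location for persistent sysctl settings):
sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.lo.arp_ignore=1
sysctl -w net.ipv4.conf.all.arp_announce=2
sysctl -w net.ipv4.conf.lo.arp_announce=2
# make the settings persistent
cat >> /etc/sysctl.conf <<EOF
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.lo.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.lo.arp_announce = 2
EOF
sysctl -p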

Conclusion
There is a lot to learn about the principles and workflows of LVS, and hands-on practice is needed to understand them properly. The author will build the lab experiments together with readers in follow-up articles to deepen the understanding.

Source: blog.csdn.net/weixin_43455673/article/details/112385732