Spend a weekend to master the core principles of SpringCloud Ribbon

Preface

This is the second article in the distributed framework series, following SpringCloud Feign. It sticks to the same goal: one SpringCloud component per weekend.

If you want to read the Feign article, click here; the core principles of Feign are waiting for you.

In everyday SpringCloud development Feign is what we usually reach for, because Feign already integrates Ribbon.

But Ribbon is a topic that cannot be skipped, and it is considerably harder than Feign. Here is the outline of the article:

  1. How to obtain service instances from the registry
  2. How unhealthy service instances go offline
  3. How Ribbon works under the hood
  4. Custom Ribbon load balancing strategy

The article is based on the SpringCloud Ribbon source code from Hoxton.SR9, version 2.2.6.RELEASE.

In addition, at the end of the article I share some thoughts from reading the source code, including places where I think the flow written by the Ribbon author is unreasonable.

Key concepts

Load balancing

Load balancing means distributing requests across multiple execution units according to a load balancing strategy. There are two common approaches:

  • Independent process: a standalone unit distributes requests to different execution units according to a load balancing strategy, as Nginx does
  • Client-side behavior: the load balancing strategy is bound to the client, which maintains a list of service providers and distributes requests among them according to that strategy
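
To make the second approach concrete, here is a minimal sketch of client-side load balancing in plain Java. It is not Ribbon code: the class name ClientSideBalancerSketch and the provider addresses are made up for illustration, and simple round robin stands in for whatever strategy the client is configured with.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration: the client owns the provider list and picks one per request.
class ClientSideBalancerSketch {
    private final List<String> providers;                        // locally maintained provider list
    private final AtomicInteger counter = new AtomicInteger();   // position of the next pick

    ClientSideBalancerSketch(List<String> providers) {
        this.providers = new ArrayList<>(providers);
    }

    // Round robin: each call returns the next provider in the list, or null if there is none.
    String choose() {
        if (providers.isEmpty()) {
            return null;
        }
        int index = Math.floorMod(counter.getAndIncrement(), providers.size());
        return providers.get(index);
    }
}
```

The point is only that the decision is made inside the calling process, against a list the caller keeps itself, rather than by a separate proxy.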

Ribbon

Ribbon is a load balancing component open-sourced by Netflix. Load balancing happens on the client side, so it belongs to the second type described above.

In a typical SpringCloud setup, Ribbon serves as the client-side load balancing tool. It is rarely used on its own; instead it is combined with RestTemplate or Feign. Feign integrates Ribbon under the hood, so it works out of the box with no extra configuration.

To stay closer to the Ribbon theme, this article uses RestTemplate as the HTTP client.

RestTemplate is the HTTP client in Spring Web for calling third-party RESTful HTTP interfaces.

Environment setup

The registry is Alibaba Nacos. Two services are created: the producer runs as a cluster, and the consumer calls it through RestTemplate + Ribbon. The overall call structure is as follows

The producer registers itself with Nacos and exposes an HTTP GET endpoint. Its code is as follows

The consumer registers itself with Nacos and makes load-balanced remote calls through RestTemplate + Ribbon. Its code is as follows

RestTemplate is not load balanced by default, so you need to add @LoadBalanced

Start three producer instances and register them with Nacos. A successful startup and registration looks like this

It is almost impossible to explain a framework's principles in strict order without using terms that have not been introduced yet. I will try to explain as clearly as possible.

How to obtain service instances from the registry

Let's first look at how Ribbon, on the client side, obtains the running instances from the registry. This used to confuse me.

Topics related to service registration itself will be covered in a separate Nacos source code analysis.

Start with an example. When we execute a request, load balancing has to happen. At that point the code enters the load balancing logic, so we follow it to see where the service list comes from.

Here is what the items marked in the yellow boxes above mean:

  • RibbonLoadBalancerClient: handles load-balanced requests
  • ILoadBalancer: an interface that defines the methods needed for load balancing, acting as a router. Ribbon's default implementation is ZoneAwareLoadBalancer
  • unknown: ZoneAwareLoadBalancer is a multi-zone load balancer, and "unknown" is the name of the default zone
  • allServerList: the service instances obtained from the Nacos registry; upServerList holds the healthy instances

To find out how Ribbon gets the service instances, we need to step into getLoadBalancer()

getLoadBalancer

First, a note on semantics: getLoadBalancer() obtains the Spring Bean of type ILoadBalancer.class for the service named ribbon-produce from Ribbon's parent-child context container.

As I said in the Feign article, Ribbon creates a Spring child context for each service provider, and the Bean is fetched from that child context.

This does not answer our question yet. I expected the method to contain code that pulls the service list, but it just returns a Bean that already contains the service instances, so we have to trace where this Bean comes from.
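
The idea of one child context per service provider, created on first use, can be pictured with the sketch below. This is a simplified model, not Spring Cloud's actual implementation: ChildContextSketch, its map, and the factory function standing in for the child context that builds the ILoadBalancer Bean are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical model of "one isolated set of beans per service name".
class ChildContextSketch {
    // service name -> the object ("bean") that belongs only to that service
    private final Map<String, Object> contexts = new ConcurrentHashMap<>();
    // stand-in for building the child context and its load balancer
    private final Function<String, Object> factory;

    ChildContextSketch(Function<String, Object> factory) {
        this.factory = factory;
    }

    // Created on the first call for a service name, reused on every later call.
    Object getLoadBalancer(String serviceId) {
        return contexts.computeIfAbsent(serviceId, factory);
    }
}
```

Under this model, looking up the load balancer is only a map lookup after the first call, which matches the observation that the method itself contains no code for pulling the service list.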

We start with the load balancing client. Since the default is ZoneAwareLoadBalancer, we need to find out when it is created and what its initialization does.

ZoneAwareLoadBalancer

ZoneAwareLoadBalancer is a zone-aware load balancer. When services are deployed across data centers in different regions, cross-region calls incur higher latency. ZoneAwareLoadBalancer exists to address this, although by default everything is in the same zone.

ZoneAwareLoadBalancer is very important, or rather, the load balancing routing role it plays is very important. Before a service call is made, this class is used to pick an available Server according to the load balancing algorithm, so we need to understand what happens when this load balancing client is created.

ZoneAwareLoadBalancer is created through the child container when the service is called for the first time

// When RibbonClientConfiguration is loaded, the matching instances are taken
// from the IOC container and passed to ZoneAwareLoadBalancer
@Bean
@ConditionalOnMissingBean
public ILoadBalancer ribbonLoadBalancer(IClientConfig config,
                                        ServerList<Server> serverList, ServerListFilter<Server> serverListFilter,
                                        IRule rule, IPing ping, ServerListUpdater serverListUpdater) {
    ...
    return new ZoneAwareLoadBalancer<>(config, rule, ping, serverList,
            serverListFilter, serverListUpdater);
}

public ZoneAwareLoadBalancer(IClientConfig clientConfig, IRule rule,
                             IPing ping, ServerList<T> serverList, ServerListFilter<T> filter,
                             ServerListUpdater serverListUpdater) {
    // Call the parent class constructor
    super(clientConfig, rule, ping, serverList, filter, serverListUpdater);
}

DynamicServerListLoadBalancer calls the constructor of its parent class BaseLoadBalancer to initialize part of the configuration, and then initializes the server list and other metadata itself.

public DynamicServerListLoadBalancer(IClientConfig clientConfig, IRule rule, IPing ping,
                                     ServerList<T> serverList, ServerListFilter<T> filter,
                                     ServerListUpdater serverListUpdater) {
    // Call the parent class BaseLoadBalancer to initialize some configuration,
    // including Ping (checks whether a service is available) and Rule (load balancing rule)
    super(clientConfig, rule, ping);
    // Important: the interface used to fetch service instances from the registry
    this.serverListImpl = serverList;
    this.filter = filter;
    this.serverListUpdater = serverListUpdater;
    if (filter instanceof AbstractServerListFilter) {
        ((AbstractServerListFilter) filter).setLoadBalancerStats(getLoadBalancerStats());
    }
    // Initialization is split into two steps: the first is above, and this call does the rest
    restOfInit(clientConfig);
}

First, the initialization in BaseLoadBalancer. It mainly assigns some important parameters along with Ping and Rule, and starts a timer according to the IPing implementation class. What Ping and Rule are is described below.

The method roughly does the following things:

  1. Set key parameters such as the client configuration object and name
  2. Get the Ping interval and the maximum Ping duration
  3. Set the concrete load balancing rule IRule; the default is ZoneAvoidanceRule, which selects by round robin based on the Server's zone and availability
  4. Set the concrete Ping implementation; the default is DummyPing, which simply returns true
  5. Start the scheduled task that pings the Servers, according to the Ping implementation

Here is an introduction to IPing and IRule and their implementations

IPing service detection

The IPing interface is responsible for sending a ping request to a Server instance and using whether it responds to determine whether the Server is available.

The interface has a single method, isAlive; the actual detection is done by the implementation classes.

public interface IPing {
    public boolean isAlive(Server server);
}

The IPing implementation classes are as follows:

  • PingUrl: makes a network call to check whether the Server is available (when creating a PingUrl you generally specify the path; the default is IP + Port)
  • PingConstant: returns a fixed availability value; by default it returns true, meaning available
  • NoOpPing: does nothing and returns true, meaning available
  • DummyPing: the default class; returns true directly and implements the initWithNiwsConfig method

IRule load balancing

The IRule interface applies a load balancing strategy according to different algorithms and logic. There are 7 built-in strategies, and the default is ZoneAvoidanceRule

  1. BestAvailableRule: selects the Server with the smallest number of requests in the service list
  2. RandomRule: selects a Server at random from the service list
  3. RetryRule: selects a Server by round robin, with retries
  4. ZoneAvoidanceRule: selects a Server by round robin, based on the Server's zone and availability
  5. ...

As mentioned above, initialization happens in two steps and I have only covered the first. Next is the remaining initialization method, restOfInit. Despite its name, it is very important.

void restOfInit(IClientConfig clientConfig) {
    boolean primeConnection = this.isEnablePrimingConnections();
    // turn this off to avoid duplicated asynchronous priming done in BaseLoadBalancer.setServerList()
    this.setEnablePrimingConnections(false);
    // Initialize the service list and start the timer that keeps it updated
    enableAndInitLearnNewServersFeature();
    // Update the service list; the timer started in enableAndInitLearnNewServersFeature runs this same method
    updateListOfServers();
    if (primeConnection && this.getPrimeConnections() != null) {
        this.getPrimeConnections()
                .primeConnections(getReachableServers());
    }
    this.setEnablePrimingConnections(primeConnection);
    LOGGER.info("DynamicServerListLoadBalancer for client {} initialized: {}", clientConfig.getClientName(), this.toString());
}

The code that obtains the service list and periodically updates it is here, so it is worth reading carefully. Focus on the method that updates the service list.

public void updateListOfServers() {
    List<T> servers = new ArrayList<T>();
    if (serverListImpl != null) {
        // Fetch the service list
        servers = serverListImpl.getUpdatedListOfServers();
        LOGGER.debug("List of Servers for {} obtained from Discovery client: {}",
                getIdentifier(), servers);

        if (filter != null) {
            servers = filter.getFilteredListOfServers(servers);
            LOGGER.debug("Filtered List of Servers for {} obtained from Discovery client: {}",
                    getIdentifier(), servers);
        }
    }
    // Update the list of all servers
    updateAllServerList(servers);
}

After going round and round, we finally reach the first question: how the service list is obtained. serverListImpl is an implementation of ServerList. Because we use the Nacos registry, the concrete implementation is NacosServerList.

public interface ServerList<T extends Server> {
    public List<T> getInitialListOfServers();
    public List<T> getUpdatedListOfServers();
}

ServerList has only two methods: one gets the initial service list and the other gets the updated service list. Nacos backs both with a single implementation method, which is probably how the interface was meant to be used.

ServerList is effectively an extension interface that Ribbon exposes. Registry developers who want to integrate with Ribbon implement this interface, and Ribbon takes care of calling the methods of the implementation class.

The relationship between Ribbon and the various service registries is much like that between JDBC and the various databases.
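
The relationship can be sketched as follows. The first interface mirrors the two-method shape of ServerList, but everything here is a hypothetical stand-in: InstanceSource, RegistryClient, and RegistryInstanceSource are invented names, and a real integration would implement Ribbon's own interface against the registry's client API. Note how both methods delegate to one lookup, as described for Nacos.

```java
import java.util.List;

// Stand-in for the contract the load balancer exposes (same two methods as ServerList).
interface InstanceSource {
    List<String> getInitialListOfServers();
    List<String> getUpdatedListOfServers();
}

// Hypothetical registry client; a real one would query the registry over the network.
interface RegistryClient {
    List<String> selectInstances(String serviceId);
}

// What a registry vendor would supply: one lookup backing both methods.
class RegistryInstanceSource implements InstanceSource {
    private final RegistryClient client;
    private final String serviceId;

    RegistryInstanceSource(RegistryClient client, String serviceId) {
        this.client = client;
        this.serviceId = serviceId;
    }

    @Override
    public List<String> getInitialListOfServers() {
        return lookup();
    }

    @Override
    public List<String> getUpdatedListOfServers() {
        return lookup();
    }

    // Single place where the registry is asked for the instances of this service.
    private List<String> lookup() {
        return client.selectInstances(serviceId);
    }
}
```

The load balancer only ever sees InstanceSource, so swapping one registry for another means swapping the implementation class, much as a JDBC driver is swapped without touching the calling code.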

Now that the question is settled, let's summarize how service instances are obtained from the registry.

  1. The load balancing client obtains the service registration list from the Nacos registry during initialization
  2. Depending on the IPing implementation, pings are sent serially to the servers in the obtained list to determine their availability. Yes, serially; if you have many instances, consider rewriting this ping logic
  3. If the availability of a service changes, or it is taken offline manually, the service list is pulled again or updated
  4. Once the load balancing client has the list of registered services, it can apply the IRule load balancing strategy
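
Point 2 above, pinging the instances one after another, can be sketched like this. It is a simplified model rather than Ribbon's code: SerialPingSketch and the Predicate standing in for IPing#isAlive are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class SerialPingSketch {
    // Pings every server in turn and returns those that answered.
    // Total time grows with the number of servers, since nothing runs in parallel.
    static List<String> pingAll(List<String> allServers, Predicate<String> isAlive) {
        List<String> upServers = new ArrayList<>();
        for (String server : allServers) {
            if (isAlive.test(server)) {
                upServers.add(server);
            }
        }
        return upServers;
    }
}
```

If each ping can block for up to its timeout, one slow instance delays the check of every instance after it, which is why a large cluster may want a different approach here.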

How unhealthy service instances go offline

First I ran two "bold" experiments. In the first, I shut down the producer SpringBoot process normally; the Nacos registry detected this in real time and removed the instance.

This shows that the Nacos client has something like a shutdown hook: when the project stops, it tells the Nacos server to deregister the instance. But there is one thing to consider: whether the process is killed forcibly or shut down normally, the service list cached in the Ribbon client does not notice the change.

The second experiment closely reproduces a problem you may run into when using Ribbon in production.

  1. Change the client's load balancing strategy to the random rule RandomRule, which makes testing easier; the rule is not fixed and others work too
  2. Register three producer instances with Nacos and check that they appear under the correct service group
  3. This is the key step. First call the producer interface through the consumer instance, to make sure Ribbon has cached the Servers on the client
  4. Stop one producer, then immediately send requests with JMeter using a thread group of 100 (the requests must be sent before the Server cache is updated)
  5. You will see random failures. This means that after a service is stopped, in the worst case requests can keep failing for up to 30 seconds in production. This interval is configurable, and I explain where the 30 seconds comes from later.

Periodic maintenance of the service list

Ribbon maintains the service list in two ways, both of which maintain the client's cached list through scheduled tasks

  1. Use the IPing implementation class PingUrl to ping each service address every 10 seconds. If the returned status is not 200, the instance is considered offline
  2. The Ribbon client's built-in refresh pulls the service instances from the registry (Nacos here) every 30 seconds by default. If an instance has gone offline, it is removed from the client cache
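
The second mechanism, a periodic pull that replaces the cached list, can be sketched with a plain ScheduledExecutorService. This is only an illustration under assumptions: ServerListRefresherSketch is a made-up class, the Supplier stands in for the registry query, and the period passed to start would be the 30 seconds mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class ServerListRefresherSketch {
    private final Supplier<List<String>> registry;   // stand-in for the registry query
    private volatile List<String> cached = new ArrayList<>();
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

    ServerListRefresherSketch(Supplier<List<String>> registry) {
        this.registry = registry;
    }

    // One refresh: pull the current instances and swap the cached list.
    void refresh() {
        cached = new ArrayList<>(registry.get());
    }

    // Repeat the refresh at a fixed delay; between runs the cache may be stale.
    void start(long periodSeconds) {
        timer.scheduleWithFixedDelay(this::refresh, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    void stop() {
        timer.shutdownNow();
    }

    List<String> getCached() {
        return cached;
    }
}
```

Until the next refresh runs, the cache still holds the stopped instance, which is where the failure window seen in the experiment comes from.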

I won't paste this source code; here are the two locations, so take a look yourself if you are interested

+ DynamicServerListLoadBalancer#enableAndInitLearnNewServersFeature
+ BaseLoadBalancer#setupPingTask

If an interviewer asks about the content of this section and you can answer both of these points, you have a pretty solid grasp of this part of the SpringCloud source code

How Ribbon works under the hood

For the underlying principles, I will first describe the whole flow of a remote call load balanced by Ribbon, and then focus on how the RandomRule strategy is implemented

  1. Create the ILoadBalancer load balancing client, which initializes Ribbon's timers and the list of service instances from the registry
  2. Use the ILoadBalancer to select one Server from the list of healthy services by load balancing
  3. Replace the service name (ribbon-produce) with the Server's IP + Port, then issue the HTTP request and return the data
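
Step 3 can be illustrated with java.net.URI alone. The sketch below is not the code Spring Cloud uses; UriRewriteSketch and its reconstruct method are hypothetical names, and the host and port would come from the Server chosen in step 2.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriRewriteSketch {
    // Replaces the logical service name in the host position with a concrete host and port,
    // keeping scheme, path, query and fragment unchanged.
    static URI reconstruct(URI original, String host, int port) {
        try {
            return new URI(original.getScheme(), original.getUserInfo(), host, port,
                    original.getPath(), original.getQuery(), original.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

For example, http://ribbon-produce/hello with a chosen Server at 192.168.1.10:8080 becomes http://192.168.1.10:8080/hello, and that rewritten URI is what the HTTP client actually requests.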

As mentioned above, ILoadBalancer is responsible for load balancing routing, and internally it uses an IRule implementation class to make the choice

public interface ILoadBalancer {
    public Server chooseServer(Object key);
    ...
}

During chooseServer, the choose method of the IRule strategy is called, and it obtains a healthy Server

public Server choose(ILoadBalancer lb, Object key) {
    ...
    Server server = null;
    while (server == null) {
        ...
        List<Server> upList = lb.getReachableServers();  // healthy instances in the service list
        List<Server> allList = lb.getAllServers();  // all instances in the service list
        int serverCount = allList.size();  // total number of instances
        if (serverCount == 0) {  // no instances at all: return null, which is effectively an error
            return null;
        }
        int index = chooseRandomInt(serverCount);  // for efficiency, the random number comes from ThreadLocalRandom, which suits multithreaded use
        server = upList.get(index);  // take the instance from the healthy list
        if (server == null) {
            // The author treats a null server as a sign that the service list is being adjusted.
            // That is only temporary, so the CPU is yielded here
            Thread.yield();
            continue;
        }
        if (server.isAlive()) {  // the server is healthy: return it
            return (server);
        }
        ...
    }
    return server;
}

A brief walk through choose in the random strategy

  1. Get the list of all instances and the list of healthy instances, and check whether the total number of instances is 0. If so, return null, which is effectively an error
  2. Pick an index based on the size of the list of all instances, then use it to get a Server from the list of healthy instances
  3. If the Server obtained is null, yield the CPU and repeat the process, which acts as a retry mechanism
  4. If the Server obtained is not healthy, set server to null, yield, and run the process again

It's fairly simple. Some readers may ask: what if there are fewer healthy instances than total instances? There are two possibilities

  1. With good luck, a relatively small index is drawn from the total number of instances and it falls within the list of healthy instances, so a Server is returned
  2. With bad luck, the index drawn from the total number of instances is greater than or equal to the size of the healthy list (or that list is empty), and the index goes out of bounds
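
The second case is easy to reproduce outside Ribbon. The sketch below only mimics the indexing pattern from choose (index drawn from the size of the full list, then used on the healthy list); IndexMismatchSketch is a hypothetical class, and the index is passed in rather than drawn at random so that the failure is deterministic.

```java
import java.util.List;

class IndexMismatchSketch {
    // Mimics the pattern: index valid for allList, then applied to upList.
    static String pick(List<String> allList, List<String> upList, int index) {
        if (index < 0 || index >= allList.size()) {
            throw new IllegalArgumentException("index must be within allList");
        }
        // Throws IndexOutOfBoundsException whenever index >= upList.size()
        return upList.get(index);
    }
}
```

With three instances in total and two healthy ones, indexes 0 and 1 succeed while index 2 throws, so roughly one in three random draws would fail in that situation.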

A question for the reader:

Why not select an instance directly from the healthy instances?

Selecting directly from the list of healthy instances would avoid the index-out-of-bounds exception. So why does the author take the index from the list of all instances first?

Custom Ribbon load balancing strategy

The framework makes custom strategies easy to write. Based on the issue raised above, let's write our own strategy

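import com.netflix.client.config.IClientConfig;
import com.netflix.loadbalancer.AbstractLoadBalancerRule;
import com.netflix.loadbalancer.ILoadBalancer;
import com.netflix.loadbalancer.Server;
import lombok.extern.slf4j.Slf4j;
import org.springframework.util.CollectionUtils;

import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

@Slf4j
public class MyRule extends AbstractLoadBalancerRule {
    @Override
    public Server choose(Object key) {
        ILoadBalancer loadBalancer = getLoadBalancer();
        while (true) {
            Server server = null;
            // Get the list of servers that are up and reachable
            List<Server> reachableServers = loadBalancer.getReachableServers();
            if (CollectionUtils.isEmpty(reachableServers)) return null;
            // Draw the index from the healthy list itself, so it can never be out of bounds
            int idx = ThreadLocalRandom.current().nextInt(reachableServers.size());
            server = reachableServers.get(idx);
            if (server == null || !server.isAlive()) {
                log.warn("Ribbon server instance problem: null or not healthy");
                Thread.yield();
                continue;
            }
            return server;
        }
    }

    @Override
    public void initWithNiwsConfig(IClientConfig clientConfig) {
        // Nothing to configure for this rule
    }
}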

Here is the logic of the MyRule strategy we implemented:

  1. Access to the service list is not implemented by each IRule itself but abstracted into AbstractLoadBalancerRule, so a rule that needs the service list simply extends it
  2. It is much like the random rule, except that the flow is simplified: the Server is taken directly from the list of healthy instances
  3. Make sure the Server is not null and the node is healthy before returning it; otherwise log a warning, yield briefly, and try again
  4. To be on the safe side, add a loop count condition to the while to avoid an infinite loop
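
Point 4 can be sketched without any Ribbon types. The model below is an illustration under assumptions, not the MyRule class itself: BoundedChooseSketch, the maxAttempts parameter, and the Predicate standing in for Server#isAlive are invented here.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Predicate;

class BoundedChooseSketch {
    // Picks a random healthy server, giving up after maxAttempts tries instead of looping forever.
    static String choose(List<String> reachable, Predicate<String> isAlive, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (reachable == null || reachable.isEmpty()) {
                return null;
            }
            String server = reachable.get(ThreadLocalRandom.current().nextInt(reachable.size()));
            if (server != null && isAlive.test(server)) {
                return server;
            }
            Thread.yield(); // let other threads run before the next attempt
        }
        return null; // no healthy server found within the attempt budget
    }
}
```

Returning null after the budget is spent lets the caller fail fast instead of spinning while every instance is down; the right budget is a judgment call.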

Then register MyRule in the Spring IOC container, and it will replace the default rule during initialization.

Thoughts on IPing

When reading the source code of Ribbon Ping, I found two places that I think are unreasonable.

  1. It is pointless for setPingInterval, which only sets the Ping interval, to also call the method that sets up the Ping task
  2. In the BaseLoadBalancer constructor, ping is still null when setPingInterval is called, so the call simply returns early

Both setPingInterval and setPing are called while BaseLoadBalancer is being initialized, continuing the logic described above. I will explain the execution logic first and then the unreasonable parts.

setupPingTask schedules the periodic ping task against the Servers, that is, it checks whether each Server is available

Personally I see no need to call setupPingTask in setPingInterval

The above conclusions are based on the following:

  1. The first time setPingInterval runs, ping must be null, so canSkipPing returns true and the method ends right away
  2. Then I wondered whether it is called elsewhere to force a refresh. But a global search shows it is only called during this initialization. Of course, other dependent packages might use this method
  3. In short, it is pointless for setPingInterval to call the method that sets up the Ping task

I also think that call in the constructor has no practical effect. As above, setPingInterval runs while ping is still null

These two points are things I found questionable while reading the source code. I have spent some space on them mainly to share two thoughts with readers.

  1. Don't be in awe of source code; it is the production environment that deserves awe! Reading framework source code is not out of reach. Sometimes code you cannot understand is just the result of confusion accumulated over maintenance by many people. If circumstances allow, it is still worth stepping through the source code.
  2. State your opinions openly. If you only think about them privately, you will probably never get an answer. Writing them in an article lets more people see them and either correct what is wrong or confirm what is right.

Conclusion

Overall, the article focuses on design and source code analysis, so some familiarity with reading source code helps. At the same time, it is organized around questions, so even if you don't follow all of the source code you can still gain something.

The Ribbon walkthrough starts with the initialization of the load balancing client ILoadBalancer and describes what happens during initialization, including how the IPing timer and the service list update timer are started.

Reading the source also shows that Ribbon's service list is obtained by calling the interface provided by Nacos and saving the result in a local cache. That leads to the question of how unhealthy instances are taken offline: the IPing timer and the service list update timer.

The last sections cover the full path of a Ribbon load-balanced request and how to define your own load balancing algorithm. The final part discusses the IPing code that seemed pointless to me while reading the source. Of course, it may have been left that way for integration with other packages.


Author: Source Interests circle
link: https://juejin.cn/post/6933767343570944008
Source: Nuggets
 

Origin blog.csdn.net/m0_50180963/article/details/114172048