Architecture design of the Ctrip API Gateway: handling 20 billion requests per day

1. Overview

Like many enterprises, Ctrip introduced its API Gateway as part of adopting a microservice architecture, with the first version released in 2014. As servitization advanced rapidly inside the company, the gateway gradually became the standard way to expose applications to the external network, and it kept evolving alongside the company's shared services and infrastructure through projects such as "ALL IN Wireless", internationalization, and multi-region active-active. As of July 2021, more than 3,000 services are connected to the gateway, which processes an average of 20 billion requests per day.

On the technical side, the company's early microservice work was deeply influenced by NetflixOSS, and the first version of the gateway was built with reference to Zuul 1.0. Its core can be summarized in the following four points:

  • Server side: Tomcat NIO + AsyncServlet

  • Business process: independent thread pool, staged chain-of-responsibility model

  • Client side: Apache HttpClient, synchronous calls

  • Core components: Archaius (dynamic configuration client), Hystrix (circuit breaking and rate limiting), Groovy (hot-update support)

As is well known, synchronous calls block threads, so the system's throughput is heavily constrained by IO.

As an industry-leading project, Zuul took this into account in its design: Hystrix provides resource isolation and rate limiting, so that faults (slow IO) are confined to a limited scope; combined with the circuit-breaker strategy, some thread resources can be released early. The ultimate goal is that a local anomaly does not affect the system as a whole.

However, as the company's business continued to grow, these strategies gradually became less effective, for two main reasons:

  • International expansion: when the gateway acts as the overseas access layer, part of the traffic has to be routed back to data centers in China, so slow IO becomes the norm.

  • Growth in service scale: local exceptions become the norm, and because failures propagate across microservices, the thread pools may stay in a sub-healthy state for long periods.

Full asynchronous transformation has been a core effort of the Ctrip API Gateway in recent years. This article focuses on that transformation and shares our work and practical experience on the gateway.

The key topics include performance optimization, business forms, technical architecture, and governance experience.

2. High-performance gateway core design

2.1. Asynchronous process design

Full asynchronous = server-side asynchronous + business process asynchronous + client-side asynchronous

For the server and client, we use the Netty framework, whose NIO/Epoll + Eventloop is essentially an event-driven design.

The core of our transformation is making the business process asynchronous. Common asynchronous scenarios include:

  • Business IO events: such as request verification and identity authentication, which involve remote calls

  • Intrinsic IO events: for example, the first xx bytes of the message have been read

  • Request forwarding: including TCP connection establishment and HTTP requests

In our experience, asynchronous programming is somewhat harder than synchronous programming to design, write, and read, mainly in the following areas:

  • Process Design & State Transition

  • Exception handling, including general exceptions and timeouts

  • Context propagation, including business context and trace logs

  • Thread scheduling

  • Flow control

In the Netty context in particular, if the ByteBuf life cycle is not managed carefully, memory leaks can easily occur.

To address these issues, we designed supporting frameworks around the business code, doing our best to hide the differences between synchronous and asynchronous code to ease development, and making safety and fault tolerance the default to keep the program safe as a whole.
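As a small illustration of the ByteBuf concern (a sketch assuming RxJava 2 and Netty, not the actual framework code), one defensive pattern is to tie the buffer's release to the terminal event of the asynchronous chain that captured it:

import io.netty.buffer.ByteBuf;
import io.netty.util.ReferenceCountUtil;
import io.reactivex.Maybe;

// Illustrative helper: when a ByteBuf is captured by an asynchronous Maybe chain,
// releasing it in doFinally ensures the success, error and timeout paths all free
// the buffer exactly once (assuming the chain is always subscribed).
public final class BufferLifecycle {
    private BufferLifecycle() {}

    public static <T> Maybe<T> withBuffer(ByteBuf buf, Maybe<T> chain) {
        return chain.doFinally(() -> ReferenceCountUtil.release(buf));
    }
}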

In terms of tools, we use RxJava, and its main process is shown in the figure below.

[Figure: RxJava-based asynchronous processing flow]

  • Maybe

  • RxJava's built-in container type, representing three outcomes: normal completion with no value, exactly one value emitted, or an error

  • Reactive, which makes the overall state-machine design convenient, with built-in wrappers for exception handling, timeouts, thread scheduling, and more

  • Maybe.empty()/Maybe.just(T), suitable for synchronous scenarios

  • The RxJavaPlugins utility class makes it easy to encapsulate aspect (cross-cutting) logic

  • Filter

  • Represents an independent piece of business logic, with a unified interface for both synchronous and asynchronous logic, returning a Maybe

  • Asynchronous scenarios (such as remote calls) are encapsulated uniformly; if thread switching is involved, control is switched back via maybe.observeOn(eventloop)

  • Asynchronous filters get a timeout by default and are treated as weak dependencies whose errors are ignored

// Imports assume RxJava 2.x (io.reactivex); ScalarCallable marks synchronously fulfilled Maybes (RxJava internal API).
import io.reactivex.Maybe;
import io.reactivex.internal.fuseable.ScalarCallable;
import io.reactivex.schedulers.Schedulers;
import java.util.concurrent.TimeUnit;

public interface Processor<T> {
    ProcessorType getType();

    int getOrder();

    boolean shouldProcess(RequestContext context);

    // Uniformly exposed to the outside as a Maybe
    Maybe<T> process(RequestContext context) throws Exception;
}

public abstract class AbstractProcessor<T> implements Processor<T> {
    // Synchronous & no response: override this method
    // Scenario: regular business processing
    protected void processSync(RequestContext context) throws Exception {}

    // Synchronous & with response: override this method
    // Scenario: health checks, static responses when validation fails
    protected T processSyncAndGetResponse(RequestContext context) throws Exception {
        processSync(context);
        return null;
    }

    // Asynchronous: override this method
    // Scenario: authentication, authorization and other modules involving remote calls
    protected Maybe<T> processAsync(RequestContext context) throws Exception {
        T response = processSyncAndGetResponse(context);
        if (response == null) {
            return Maybe.empty();
        } else {
            return Maybe.just(response);
        }
    }

    @Override
    public Maybe<T> process(RequestContext context) throws Exception {
        Maybe<T> maybe = processAsync(context);
        if (maybe instanceof ScalarCallable) {
            // Marks a synchronous method; no extra wrapping needed
            return maybe;
        } else {
            // Uniformly add a timeout; errors are ignored by default
            return maybe.timeout(getAsyncTimeout(context), TimeUnit.MILLISECONDS,
                                 Schedulers.from(context.getEventloop()), timeoutFallback(context));
        }
    }

    protected long getAsyncTimeout(RequestContext context) {
        return 2000;
    }

    protected Maybe<T> timeoutFallback(RequestContext context) {
        return Maybe.empty();
    }
}
  • Overall process

  • Following the chain-of-responsibility design, the flow is divided into four stages: inbound, outbound, error, and log.

  • Each stage consists of one or more filters.

  • Filters are executed in order, and the chain is interrupted when an exception occurs; during the inbound stage, any filter that returns a response also interrupts the chain.

public class RxUtil {
    // Combines the filters of one stage (e.g. inbound), each expressed as a Callable<Maybe<T>>
    public static <T> Maybe<T> concat(Iterable<? extends Callable<Maybe<T>>> iterable) {
        Iterator<? extends Callable<Maybe<T>>> sources = iterable.iterator();
        while (sources.hasNext()) {
            Maybe<T> maybe;
            try {
                maybe = sources.next().call();
            } catch (Exception e) {
                return Maybe.error(e);
            }
            if (maybe != null) {
                if (maybe instanceof ScalarCallable) {
                    // Synchronous method
                    T response = ((ScalarCallable<T>) maybe).call();
                    if (response != null) {
                        // A response was produced: interrupt the chain
                        return maybe;
                    }
                } else {
                    // Asynchronous method
                    if (sources.hasNext()) {
                        // Pass the remaining sources into the callback; subsequent filters repeat this logic
                        return new ConcattedMaybe(maybe, sources);
                    } else {
                        return maybe;
                    }
                }
            }
        }
        return Maybe.empty();
    }
}
public class ProcessEngine {
    // For each stage, add a default timeout and error handling
    private void process(RequestContext context) {
        List<Callable<Maybe<Response>>> inboundTask = get(ProcessorType.INBOUND, context);
        List<Callable<Maybe<Void>>> outboundTask = get(ProcessorType.OUTBOUND, context);
        List<Callable<Maybe<Response>>> errorTask = get(ProcessorType.ERROR, context);
        List<Callable<Maybe<Void>>> logTask = get(ProcessorType.LOG, context);

        RxUtil.concat(inboundTask)    // inbound stage
            .toSingle()               // obtain the response
            .flatMapMaybe(response -> {
                context.setOriginResponse(response);
                return RxUtil.concat(outboundTask);
            })                        // enter the outbound stage
            .onErrorResumeNext(e -> {
                context.setThrowable(e);
                return RxUtil.concat(errorTask).flatMap(response -> {
                    context.resetResponse(response);
                    return RxUtil.concat(outboundTask);
                });
            })                        // on error, run the error stage and re-enter outbound
            .flatMap(response -> RxUtil.concat(logTask))  // log stage
            .timeout(asyncTimeout.get(), TimeUnit.MILLISECONDS, Schedulers.from(context.getEventloop()),
                     Maybe.error(new ServerException(500, "Async-Timeout-Processing"))
                    )                 // global fallback timeout
            .subscribe(               // release resources
                unused -> {
                    logger.error("this should not happen, " + context);
                    context.release();
                },
                e -> {
                    logger.error("this should not happen, " + context, e);
                    context.release();
                },
                () -> context.release()
            );
    }
}

2.2. Streaming forwarding & single thread

Taking HTTP as an example, a message can be divided into three parts: the start line, the headers, and the body.

 

At Ctrip, the gateway-layer business logic does not need the request body.

Since the full message does not need to be buffered, the business process can start as soon as the request headers have been parsed.

When body data is subsequently received:

① if the request has already been forwarded to the upstream, the body is forwarded directly;

② otherwise it is buffered temporarily and sent after the business process completes, together with the start line and headers;

③ the upstream response is handled in the same way.
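A highly simplified sketch of rules ① and ② (class and callback names are hypothetical, not the gateway's actual code): body chunks are forwarded immediately once the upstream connection exists, otherwise they are buffered until the business process releases the request.

import io.netty.channel.Channel;
import io.netty.handler.codec.http.HttpContent;
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch of the buffering rule above. Runs on a single eventloop, so no synchronization
// is needed; reference-counting of HttpContent is omitted for brevity.
public class StreamingBodyRelay {
    private final Queue<HttpContent> pending = new ArrayDeque<>();
    private Channel upstreamChannel;   // null until the request has been forwarded upstream

    // Called for every body chunk decoded on the Netty server side.
    public void onBodyChunk(HttpContent chunk) {
        if (upstreamChannel != null) {
            upstreamChannel.writeAndFlush(chunk);   // case ①: request already forwarded
        } else {
            pending.add(chunk);                     // case ②: buffer until the business process completes
        }
    }

    // Called once the business process completes and the start line/headers have been sent upstream.
    public void onUpstreamReady(Channel channel) {
        this.upstreamChannel = channel;
        HttpContent chunk;
        while ((chunk = pending.poll()) != null) {
            channel.write(chunk);
        }
        channel.flush();
    }
}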

Compared with fully parsing the HTTP message before acting on it, this streaming approach has two benefits:

  • Entering the business process earlier means the upstream receives the request earlier, which effectively reduces the latency introduced by the gateway layer.

  • The body's life cycle is shortened, which reduces the gateway's own memory overhead.

Despite the performance improvements, streaming also significantly increases the complexity of the overall process.


In non-streaming scenarios, the sub-processes of the Netty server-side codec, inbound business logic, the Netty client-side codec, and outbound business logic are independent of each other, each handling a complete HTTP object. With streaming, a request may be in several of these sub-processes at the same time, which brings three challenges:

  • Thread safety: if the sub-processes run on different threads, the context may be modified concurrently;

  • Multi-stage coordination: for example, if the Netty server's connection is interrupted halfway through receiving a request while the upstream connection is already established, the protocol exchange on the upstream side cannot be completed and that connection must be closed accordingly;

  • Edge cases: for example, if the upstream returns 404/413 before the request has been fully sent, should the gateway keep sending to complete the protocol exchange and allow the connection to be reused, or terminate early to save resources at the cost of giving up the connection? Or, if the upstream has received the request but not yet responded and the Netty server connection suddenly drops, should the Netty client connection be closed too? And so on.

To deal with these challenges, we adopted a single-threaded approach. The core design includes:

  • The request context is bound to an Eventloop, and the Netty server, business process, and Netty client all execute on the same eventloop;

  • If an asynchronous filter must use an independent thread pool because of its IO library, subsequent processing must switch back to that eventloop;

  • Necessary thread isolation is applied to in-process resources (such as connection pools);

The single-threaded approach avoids concurrency problems; when handling multi-stage coordination and edge cases, the system is always in a deterministic state, which effectively reduces development difficulty and risk. Reducing thread switching also improves performance to some extent. However, because the number of worker threads is small (generally equal to the number of CPU cores), IO operations must be kept strictly out of the eventloop, otherwise system throughput suffers significantly.
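A minimal sketch of the "switch back to the request's eventloop" rule for asynchronous filters, assuming RxJava 2 and a hypothetical blocking IO client that requires its own thread pool:

import io.netty.channel.EventLoop;
import io.reactivex.Maybe;
import io.reactivex.schedulers.Schedulers;
import java.util.concurrent.ExecutorService;

// Sketch only: an async filter whose IO library forces a separate thread pool.
// The result is observed back on the request's eventloop so that all context
// mutation stays single-threaded.
public class AuthFilterSketch {

    private final ExecutorService ioPool;   // thread pool imposed by the IO client

    public AuthFilterSketch(ExecutorService ioPool) {
        this.ioPool = ioPool;
    }

    public Maybe<Void> process(EventLoop eventloop, String token) {
        return Maybe.<Void>fromAction(() -> remoteAuthCall(token)) // runs on ioPool
                .subscribeOn(Schedulers.from(ioPool))
                .observeOn(Schedulers.from(eventloop));            // switch back before touching the context
    }

    private void remoteAuthCall(String token) {
        // ... blocking remote call (hypothetical) ...
    }
}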

2.3 Other optimizations

  • Lazy loading of internal variables

Fields such as the request's cookies and query string are not parsed ahead of time unless they are actually needed.
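For instance, a lazily parsed query string might look like the sketch below (an illustrative class, not the gateway's actual implementation): the raw URI is kept as-is and decoded only on first access.

import io.netty.handler.codec.http.QueryStringDecoder;
import java.util.List;
import java.util.Map;

// Sketch: query parameters are decoded only the first time a filter asks for them,
// so requests whose business logic never reads them pay no parsing cost.
public class LazyQueryParams {
    private final String uri;
    private Map<String, List<String>> parsed;   // null until first access

    public LazyQueryParams(String uri) {
        this.uri = uri;
    }

    public Map<String, List<String>> get() {
        if (parsed == null) {                   // safe: the request is bound to one eventloop thread
            parsed = new QueryStringDecoder(uri).parameters();
        }
        return parsed;
    }
}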

  • Off-heap memory & zero copy

Combined with the previous streaming forwarding design, the system memory usage is further reduced.

  • ZGC

Because the project was upgraded to TLSv1.3, we moved to JDK 11 (TLSv1.3 support on JDK 8 arrived late, in 8u261 on 2020-07-14) and took the opportunity to try the new-generation garbage collector (ZGC, enabled on JDK 11 via -XX:+UnlockExperimentalVMOptions -XX:+UseZGC). It performed as well as expected: although CPU usage rose, overall GC time dropped significantly.


  • Customized HTTP codec

Because of the HTTP protocol's long history and openness, many "bad practices" have emerged; at best they hurt the request success rate, at worst they threaten site security.

  • Traffic management

For problems such as an over-sized request body (413), an over-long URI (414), or non-ASCII characters (400), a typical web server simply rejects the request and returns the corresponding status code. Because such requests skip the business process, they cause trouble for statistics, service attribution, and troubleshooting. By extending the codec, problematic requests can still complete the routing process, which helps manage this non-standard traffic.
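As an illustration (a sketch based on Netty's standard decoder-failure marker, not Ctrip's actual codec), a handler can record the decode failure on the channel instead of rejecting immediately, so that routing, statistics, and logging still run:

import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.handler.codec.http.HttpRequest;
import io.netty.util.AttributeKey;

// Sketch: instead of rejecting a malformed request at the codec layer, record the decode
// failure so that routing, logging and statistics still run, and let a later filter
// produce the 400/413/414 response. The attribute key is a made-up example.
public class NonStandardTrafficHandler extends ChannelInboundHandlerAdapter {

    static final AttributeKey<Throwable> DECODE_FAILURE = AttributeKey.valueOf("decodeFailure");

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        if (msg instanceof HttpRequest) {
            HttpRequest request = (HttpRequest) msg;
            if (request.decoderResult().isFailure()) {
                ctx.channel().attr(DECODE_FAILURE).set(request.decoderResult().cause());
            }
        }
        ctx.fireChannelRead(msg);   // continue the pipeline either way
    }
}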

  • Request filtering

One example is HTTP request smuggling (fixed in Netty 4.1.61.Final, released on March 30, 2021). By extending the codec and adding custom validation logic, security patches can be applied more quickly.

3. Gateway business form

As an independent and unified entry point for inbound traffic, the gateway's value to enterprises is mainly demonstrated in three aspects:

  • Decoupling different network environments: Typical scenarios include intranet & external network, production environment & office area, different security domains within IDC, dedicated lines, etc.;

  • A natural place for cross-cutting public concerns: including security, authentication & anti-crawling, routing & gray release, rate limiting, circuit breaking & degradation, monitoring, alerting & troubleshooting, etc.;


  • Efficient and flexible flow control

Here are a few specific scenarios:

  • Private protocol

In the closed client (the app), the framework layer intercepts HTTP requests initiated by the user and transmits them to the server over a private protocol (SOTP).

For endpoint selection: ① IPs are allocated by the server side to prevent DNS hijacking; ② connections are pre-warmed; ③ a customized selection strategy switches automatically based on network conditions, environment, and other factors.

For the interaction model: ① a lighter protocol body is used; ② encryption, compression, and multiplexing are handled uniformly; ③ the gateway converts the protocol uniformly at the entrance, with no impact on the business.

  • Link optimization

The key is to introduce an access layer so that remote users connect to a nearby point of presence, solving the problem of excessive handshake overhead. And because both the access layer and the IDC side are under our control, there is more room for optimization in network link selection, protocol interaction, and so on.

  • Multi-region active-active

Unlike proportional allocation or nearest-access policies, in multi-region active-active mode the gateway (access layer) must split traffic by a business-dimension shardingKey (such as userId) to prevent conflicts in the underlying data.
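A toy illustration of shardingKey-based splitting (the region list and modulo rule are made up for the example; the real policy is driven by configuration):

import java.util.List;

// Toy example: route a request to the data-owning region based on a business
// sharding key (e.g. userId), so that writes for the same user always land in
// the same region. The region list and modulo rule are illustrative only.
public class ShardingRouter {
    private final List<String> regions;   // e.g. ["SHA", "FRA"]

    public ShardingRouter(List<String> regions) {
        this.regions = regions;
    }

    public String selectRegion(String userId) {
        int bucket = Math.floorMod(userId.hashCode(), regions.size());
        return regions.get(bucket);
    }
}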

4. Gateway Management

The diagram below summarizes how the online gateways work. The vertical dimension corresponds to the business process: traffic from various channels (such as the app, H5, mini programs, and suppliers) and various protocols (such as HTTP and SOTP) is distributed to the gateway through load balancing, passes through a series of business logic, and is finally forwarded to backend services. After the improvements described in Section 2, this processing flow has seen clear gains in performance and stability.

On the other hand, because of the multiple channels and protocols, the online gateways are deployed as independent clusters per business. In the early days, business differences (such as routing data and functional modules) were managed through separate code branches, but as the number of branches grew, so did the overall operation and maintenance complexity; in system design, complexity usually means risk. How to manage multi-protocol, multi-role gateways uniformly, and how to quickly build customized gateways for new businesses at low cost, therefore became the focus of our next phase of work.

The solution is shown intuitively in the figure: first, make the protocols compatible so that all online code runs on a single framework; second, introduce a control plane to uniformly manage the differing characteristics of the online gateways.

4.1 Multi-protocol compatibility

The approach to multi-protocol compatibility is not new; Tomcat's abstraction over HTTP/1.0, HTTP/1.1, and HTTP/2.0 is a good reference. Although each HTTP version adds many new features, we usually do not notice these changes during business development; the key is the abstraction provided by the HttpServletRequest interface.

At Ctrip, the online gateways handle stateless, request-response protocols whose messages can likewise be divided into three parts: metadata, extension headers, and the business message, so a similar abstraction is easy to attempt. The related work can be summarized in the following two points (see the interface sketch after this list):

  • Protocol adaptation layer: shields the differences between protocols in encoding/decoding, interaction mode, TCP connection handling, and so on.

  • Common intermediate models and interfaces: business code is written against the intermediate model and interfaces, focusing on the business semantics that the protocol carries rather than on protocol details.
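A minimal sketch of what such an intermediate model could look like (method names are illustrative, not the gateway's real API):

import java.util.Map;

// Sketch of a protocol-neutral intermediate model: filters program against this
// interface, while protocol adapters (HTTP, SOTP, ...) handle codec and connection details.
public interface GatewayMessage {
    Map<String, String> metadata();          // e.g. method/uri for HTTP, service identifier for SOTP
    Map<String, String> extensionHeaders();  // headers / protocol extension fields
    byte[] businessPayload();                // opaque business message, streamed when possible
}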


4.2 Routing module

The routing module is one of the two main components of the control plane. Besides managing the mapping between gateways and services, a service itself can be described by the following model:

{
    // Matching method
    "type": "uri",

    // HTTP uses URI prefix matching by default, addressed internally through a tree structure;
    // the private protocol (SOTP) is located by its unique service identifier.
    "value": "/hotel/order",
    "matcherType": "prefix",

    // Tags and properties
    // Used for permission management on the portal side, aspect logic at runtime (e.g. core vs non-core), etc.
    "tags": [
        "owner_admin",
        "org_framework",
        "appId_123456"
    ],
    "properties": {
        "core": "true"
    },

    // Endpoint information
    "routes": [{
        // "condition" is used for second-level routing, e.g. splitting by app version or redistributing by query
        "condition": "true",
        "conditionParam": {},
        "zone": "PRO",

        // Concrete service addresses; the weight is used for gray release scenarios
        "targets": [{
            "url": "http://test.ctrip.com/hotel",
            "weight": 100
        }
                   ]
    }]
}
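Since the comment above mentions that HTTP routes are addressed internally through a tree structure, a minimal URI-prefix trie might look like the following (an illustrative data structure, not the production router):

import java.util.HashMap;
import java.util.Map;

// Minimal URI-prefix trie: each node is one path segment; lookup walks the request
// path and remembers the deepest node that carries a route, giving longest-prefix match.
public class RouteTrie<T> {

    private static final class Node<T> {
        final Map<String, Node<T>> children = new HashMap<>();
        T route;
    }

    private final Node<T> root = new Node<>();

    public void add(String prefix, T route) {
        Node<T> node = root;
        for (String seg : prefix.split("/")) {
            if (seg.isEmpty()) continue;
            node = node.children.computeIfAbsent(seg, k -> new Node<>());
        }
        node.route = route;
    }

    public T match(String path) {
        Node<T> node = root;
        T best = root.route;
        for (String seg : path.split("/")) {
            if (seg.isEmpty()) continue;
            node = node.children.get(seg);
            if (node == null) break;
            if (node.route != null) best = node.route;
        }
        return best;
    }
}

With this structure, a lookup such as match("/hotel/order/123") would return the route registered under the "/hotel/order" prefix.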

4.3 Module orchestration

Module orchestration is the other key component of the control plane. The gateway processing flow has multiple stages (shown in pink in the diagram). Besides common functions such as circuit breaking, rate limiting, and logging, the business functions each gateway needs to run at runtime are allocated uniformly by the control plane. These functions exist as independent code modules inside the gateway, and the control plane additionally defines their execution conditions, parameters, gray-release ratio, and error handling. This orchestration approach also helps keep the modules decoupled.

{
    // Module name, corresponding to a concrete module inside the gateway
    "name": "addResponseHeader",

    // Execution stage
    "stage": "PRE_RESPONSE",

    // Execution order
    "ruleOrder": 0,

    // Gray-release ratio
    "grayRatio": 100,

    // Execution condition
    "condition": "true",
    "conditionParam": {},

    // Execution parameters
    // Many ${...} built-in templates are available for fetching runtime data
    "actionParam": {
        "connection": "keep-alive",
        "x-service-call": "${request.func.remoteCost}",
        "Access-Control-Expose-Headers": "x-service-call",
        "x-gate-root-id": "${func.catRootMessageId}"
    },

    // Exception handling: throw (return) or ignore
    "exceptionHandle": "return"
}
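To make the orchestration model concrete, here is a simplified sketch of two pieces a gateway could implement when applying such a rule (the ${...} template syntax follows the JSON above; the helper class and its methods are hypothetical): gray-ratio selection and template resolution against runtime data.

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified sketch of two pieces of the orchestration model above:
// (1) gray-ratio selection, (2) resolving ${...} templates in actionParam
// against runtime data. Everything else (condition engine, module registry) is omitted.
public class RuleSketch {

    private static final Pattern TEMPLATE = Pattern.compile("\\$\\{([^}]+)}");

    // Returns true if this request falls inside the rule's gray ratio (0-100).
    public static boolean inGray(int grayRatio) {
        return ThreadLocalRandom.current().nextInt(100) < grayRatio;
    }

    // Replaces ${key} placeholders with values from runtime data,
    // e.g. "${request.func.remoteCost}" -> "12ms".
    public static Map<String, String> resolve(Map<String, String> actionParam,
                                              Map<String, String> runtimeData) {
        Map<String, String> resolved = new HashMap<>();
        actionParam.forEach((header, value) -> {
            Matcher m = TEMPLATE.matcher(value);
            StringBuffer sb = new StringBuffer();
            while (m.find()) {
                m.appendReplacement(sb, Matcher.quoteReplacement(
                        runtimeData.getOrDefault(m.group(1), "")));
            }
            m.appendTail(sb);
            resolved.put(header, sb.toString());
        });
        return resolved;
    }
}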

5. Summary

Gateways have always been a hot topic on technical exchange platforms, and there are many mature solutions: the early and easy-to-use Zuul 1.0, the high-performance Nginx, the well-integrated Spring Cloud Gateway, the increasingly popular Istio, and so on.

The final choice still depends on each company's business background and technology ecosystem.

At Ctrip, we therefore chose the path of in-house development.

Technology keeps evolving, and we keep exploring: the relationship between public gateways and business gateways, the adoption of new protocols (such as HTTP/3), integration with Service Mesh, and more.


Source: blog.csdn.net/WXF_Sir/article/details/132708652