Meituan's OCTO distributed service governance system, clearly explained

OCTO is Meituan's distributed service communication framework and service governance system, handling call volumes on the order of 100 billion. It provides service registration, automatic service discovery, service management, fault tolerance, data visualization, service monitoring and alerting, service grouping, and more. This article summarizes the principles of the OCTO architecture, how Java applications integrate with it, and how to use its console.

1 Overview

OCTO is short for octopus. It is company-level infrastructure at Meituan that provides a unified, high-performance service communication framework for all of the company's businesses. It gives those businesses solid service operation capabilities and makes it easy to implement service registration, automatic service discovery, load balancing, fault tolerance, grayscale release, data visualization, monitoring and alerting, and other functions, improving service development efficiency, availability, and operation and maintenance efficiency.

[Special note] OCTO is an internal Meituan system. It is not open source and cannot be deployed outside the company. [Significance of this article] OCTO is a heavyweight service governance system in China, currently handling hundreds of billions of calls. Working through its architecture principles and usage helps deepen our understanding of distributed services.

1.1 Position in Meituan's technical architecture

1.2 Features

  • Naming service - service registration; automatic service discovery
  • Service management - service status monitoring; service start and stop; service load balancing
  • Fault tolerance - abnormal services are screened out in real time and request traffic is reallocated automatically
  • Traffic distribution - scenarios such as grayscale release and dynamic per-node traffic distribution
  • Data visualization - statistics and analysis of service calls, presented as clear charts that make the dependencies between services easy to see
  • Service grouping - supports dynamic automatic grouping and custom grouping for different scenarios, addressing problems such as cross-data-center calls and sandboxing in multi-data-center deployments
  • Service monitoring and alerting - supports multi-metric, multi-dimensional monitoring at the service and interface levels, with multiple alerting methods
  • Unified configuration management - supports centralized management of service configuration, flexible per-environment differences, historical versions, and real-time delivery of changed configuration items
  • Distributed service tracing - makes it easier to diagnose problems such as slow service access and abnormal jitter
  • Overload protection - quotas can be defined flexibly per service consumer; when call volume exceeds the maximum threshold, overload protection is triggered and QoS is differentiated by consumer
  • Service access control
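
As a rough illustration of the overload protection item above, per-consumer quotas could be sketched as a fixed-window counter. This is only a sketch under assumptions: the class `ConsumerQuota`, its methods, and the window-based policy are hypothetical and are not taken from OCTO, whose actual quota mechanism is not described in this article.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-consumer quota using a fixed time window; not OCTO's actual implementation.
final class ConsumerQuota {
    private final long windowMillis;
    private final Map<String, Integer> limits = new HashMap<>();
    private final Map<String, Integer> used = new HashMap<>();
    private long windowStart = 0L;

    ConsumerQuota(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Sets the maximum number of calls a consumer may make per window.
    synchronized void setLimit(String consumerAppkey, int maxCallsPerWindow) {
        limits.put(consumerAppkey, maxCallsPerWindow);
    }

    // Returns true if the call is admitted, false if it is rejected.
    // The caller passes the current time so the behaviour is easy to test.
    synchronized boolean tryAcquire(String consumerAppkey, long nowMillis) {
        if (nowMillis - windowStart >= windowMillis) {
            windowStart = nowMillis; // start a new window and forget old counts
            used.clear();
        }
        int limit = limits.getOrDefault(consumerAppkey, 0); // unknown consumers get no quota in this sketch
        int count = used.getOrDefault(consumerAppkey, 0);
        if (count >= limit) {
            return false;
        }
        used.put(consumerAppkey, count + 1);
        return true;
    }
}
```

Each consumer appkey is counted separately, so one noisy consumer exhausting its quota does not block the others within the same window.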

1.3 Environment Division

The service provider's environment is divided into two systems: online (IDC) and offline (office cloud). The offline system is a simulation of the online system. Each system has three environments: test, staging, and prod.

1.4 Calling process

  • Each party (provider and consumer) registers its own dedicated appkey on OCTO, for example appkey-provider and appkey-consumer
  • The provider registers its service (identified by appkey-provider) on OCTO; the same appkey is deployed in all three environments
  • Suppose a consumer in the staging environment requests the service from OCTO (identifying itself as appkey-consumer and naming appkey-provider as the target)
  • OCTO looks up the service list of appkey-provider in the staging environment and sends it to the consumer
  • The consumer calls the service at IP:PORT through mtthrift
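
The flow above can be sketched with a small in-memory registry. This is a minimal sketch, not OCTO's API: the class `ToyRegistry`, its method names, and the sample addresses are all hypothetical, and the real system involves SG_agent and MNS rather than a single map.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the naming service: maps (environment, appkey) to provider addresses.
final class ToyRegistry {
    private final Map<String, List<String>> nodes = new HashMap<>();

    private static String key(String env, String appkey) {
        return env + "/" + appkey;
    }

    // A provider registers one of its nodes under its appkey in a given environment.
    void register(String env, String appkey, String ipPort) {
        nodes.computeIfAbsent(key(env, appkey), k -> new ArrayList<>()).add(ipPort);
    }

    // A consumer asks for the provider list of a target appkey in its own environment.
    // consumerAppkey is unused here; a real system could use it for access control or quotas.
    List<String> discover(String env, String consumerAppkey, String targetAppkey) {
        List<String> found = nodes.getOrDefault(key(env, targetAppkey), Collections.<String>emptyList());
        return new ArrayList<>(found); // defensive copy so callers cannot mutate the registry
    }
}
```

Because the lookup key includes the environment, a consumer in staging only ever sees staging providers, even though the provider uses the same appkey in test, staging, and prod.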

1.5 Understanding appkey

Nginx is a useful reference for understanding appkeys. In the traditional configuration, Nginx maintains the mapping between domain names and physical servers, so adding or removing physical servers requires manual adjustment by operations staff and cannot be done dynamically.

With appkeys, an extra appkey layer is added:

  • The mapping relationship between the domain name and the appkey is configured by Nginx and does not need to be adjusted in the future;
  • The mapping relationship between the appkey and the physical server can be dynamically adjusted.

The same applies to Thrift: for a client request addressed as appkey:port, Thrift Server finds the physical server (IP:port) through the appkey.
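
The two-layer mapping can be illustrated with a short sketch. The class `AppkeyRouter` and its methods are hypothetical names chosen for illustration; the point is only that the domain-to-appkey binding is set once while the appkey-to-server list changes freely.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical two-layer router: domain -> appkey is static, appkey -> servers is dynamic.
final class AppkeyRouter {
    private final Map<String, String> domainToAppkey = new ConcurrentHashMap<>();
    private final Map<String, List<String>> appkeyToServers = new ConcurrentHashMap<>();

    // Static layer: configured once (the role Nginx plays in the text).
    void bindDomain(String domain, String appkey) {
        domainToAppkey.put(domain, appkey);
    }

    // Dynamic layer: servers may be added or removed at any time.
    void addServer(String appkey, String ipPort) {
        appkeyToServers.computeIfAbsent(appkey, k -> new CopyOnWriteArrayList<>()).add(ipPort);
    }

    void removeServer(String appkey, String ipPort) {
        List<String> servers = appkeyToServers.get(appkey);
        if (servers != null) {
            servers.remove(ipPort);
        }
    }

    // Resolves a domain to the current server list via its appkey.
    List<String> resolve(String domain) {
        String appkey = domainToAppkey.get(domain);
        if (appkey == null) {
            return Collections.<String>emptyList();
        }
        List<String> servers = appkeyToServers.get(appkey);
        return servers == null ? Collections.<String>emptyList() : new ArrayList<>(servers);
    }
}
```

Removing a server changes what `resolve` returns on the next call without touching the domain binding, which is the property the appkey layer is meant to provide.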

2 Overall Architecture

2.1 MTransport (Service Communication Framework)

  MTthrift is a secondary development based on Thrift (open-sourced by Facebook, now Apache Thrift). It is a distributed service communication framework dedicated to providing a high-performance, transparent RPC solution, and it is the core framework of the OCTO service governance solution. It supports 4,000+ services and 200 billion+ calls, and is widely used across the business lines of Meituan-Dianping.

  MTransport is a multi-language service communication framework that hides the implementation details of the underlying high-performance network communication, making service development simple and efficient. MTransport supports protocols such as Thrift, HTTP, and pigeon. The Thrift family includes MTthrift (Java), PThrift (PHP), CThrift (C/C++), Turbo Thrift (Node.js), and others; the implementations in different languages keep the communication protocol consistent and support service registration, automatic service discovery, distributed call tracing, and so on. HTTP currently supports Java, Node.js, and C++.

  MTthrift provides efficient tools such as service template management, code generation engine, etc.

2.2 HLB (Elastic Load Balancer)

HLB is short for Hardware Load Balance. All HTTP request/response traffic passes through this system, which is similar to Amazon ELB.

2.3 SG_agent (Service Governance Agent)

SG stands for Service Governance. SG_agent is deployed on every service node (both providers and consumers). It communicates with MNS, provides service registration/discovery, configuration updates, access control, quota limits, and other functions, and reports call statistics to the performance monitoring platform.

2.4 MNS (Meituan Naming Service)

MNS is an acronym for Meituan Naming Service. MNS is a service registration and routing center, built on ZooKeeper, to provide a stable and reliable naming service management component for various distributed services of the company, and to quickly realize service registration, routing, and automatic service discovery.

It mainly provides storage/access of service summary, node IP/Port, node weight, quota and other information, and service health status detection.

  1. Reliability: (1) Consistency: a client sees the same view of the data no matter which server node in the cluster it connects to. (2) Atomicity: a node update either succeeds completely or fails completely. (3) High availability: in a cluster of 2n+1 machines, the cluster remains available even if n machines fail.
  2. Decentralization: the central MNS mainly provides service registration, discovery, and routing strategy, while the other functions are mainly provided by the SG_agent on each service node.
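
The 2n+1 claim follows from majority-quorum arithmetic, which is how ZooKeeper ensembles stay available. The sketch below assumes such a majority quorum; the class `Quorum` and its method names are illustrative only.

```java
// Majority-quorum arithmetic, assuming a ZooKeeper-style ensemble that needs more than half its members.
final class Quorum {
    // Smallest number of live machines that still forms a majority.
    static int majority(int clusterSize) {
        return clusterSize / 2 + 1;
    }

    // Largest number of machines that can fail while a majority survives.
    static int tolerableFailures(int clusterSize) {
        return clusterSize - majority(clusterSize);
    }

    // True if the cluster can still serve requests with the given number of failed machines.
    static boolean isAvailable(int clusterSize, int failed) {
        return clusterSize - failed >= majority(clusterSize);
    }
}
```

For a cluster of 2n+1 = 5 machines, the majority is 3, so n = 2 failures are tolerated. A 4-machine cluster tolerates only 1 failure, the same as a 3-machine cluster, which is why odd sizes are the usual choice.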

2.5 Data-center (service data center)

Collects the log data reported by all of the company's OCTO services and provides system performance indicators, health status, basic alerts, and so on for each business line.

2.6 Scanner (Health Check System)

Scans the health of each service and removes it from MNS when unavailable.
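
A single scan pass could look roughly like the sketch below. It is an illustration under assumptions: `ToyScanner` and the probe passed in as a `Predicate` are hypothetical, and the article does not say how the real Scanner probes nodes or how many failed checks it requires before removal.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical single-pass health scan over a mutable list of registered nodes.
final class ToyScanner {
    // Removes every node the probe reports as unhealthy and returns how many were removed.
    static int scan(List<String> registeredNodes, Predicate<String> isHealthy) {
        int removed = 0;
        for (Iterator<String> it = registeredNodes.iterator(); it.hasNext(); ) {
            String node = it.next();
            if (!isHealthy.test(node)) {
                it.remove(); // safe removal while iterating
                removed++;
            }
        }
        return removed;
    }
}
```

In practice the probe might be a TCP connect or a heartbeat call, and a production scanner would likely wait for several consecutive failures before removing a node to avoid reacting to transient network glitches.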

2.7 MCC (Meituan Configuration Center)

MCC is short for Meituan Config Center. This unified configuration center provides unified configuration management: it separates configuration from code, updates configuration in real time, and offers high availability and version control, which improves service development efficiency and reduces operation and maintenance costs. It works by storing configuration files in JSON format under a ZooKeeper directory. When a user changes the configuration in MSGP, MSGP notifies SG_agent to pull the data, and the configuration data from ZooKeeper is placed in a specified directory on the local machine.
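
The notify-then-pull flow can be sketched as follows. Everything here is a stand-in: `ToyConfigStore` plays the role of ZooKeeper, `ToyConfigAgent` plays the role of SG_agent, and the file naming scheme (`<appkey>.json`) is an assumption made for the example, not something the article specifies.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the central store (ZooKeeper in the text): appkey -> JSON text.
final class ToyConfigStore {
    private final Map<String, String> jsonByAppkey = new ConcurrentHashMap<>();

    void put(String appkey, String json) {
        jsonByAppkey.put(appkey, json);
    }

    String get(String appkey) {
        return jsonByAppkey.get(appkey);
    }
}

// Hypothetical stand-in for the per-node agent: pulls on notification and writes a local file.
final class ToyConfigAgent {
    private final ToyConfigStore store;
    private final Path localDir;

    ToyConfigAgent(ToyConfigStore store, Path localDir) {
        this.store = store;
        this.localDir = localDir;
    }

    // Called when the platform signals that the configuration for an appkey has changed.
    Path onConfigChanged(String appkey) {
        String json = store.get(appkey);
        if (json == null) {
            throw new IllegalStateException("no config stored for " + appkey);
        }
        Path target = localDir.resolve(appkey + ".json");
        try {
            Files.write(target, json.getBytes(StandardCharsets.UTF_8)); // overwrites any previous version
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }
}
```

The application then only ever reads the local file, so a brief outage of the central store does not stop it from starting with the last configuration it received.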

2.8 MSGP (Meituan Service Governance Platform)

MSGP is short for Meituan Service Governance Platform. Goal: provide a one-stop management platform covering registration, governance, diagnosis, configuration, quota, and other functions for the company's services.

3 Access method

Use annotations such as @ThriftService, @ThriftMethod, @ThriftStruct, and @ThriftField provided by Thrift to mark ordinary Java classes as Thrift data models and service interfaces. The usage model is very similar to Dubbo: service providers and consumers are defined against a common set of interfaces.

The following three modules, interface/provider/consumer, are created with Spring Boot. Running the provider on a local machine successfully registers it with OCTO in the dev environment, and running the consumer on a local machine successfully consumes the service offered by the provider. This example has been run successfully on the Meituan intranet.

3.1 Public interface service-interface

Import the dependency packages and define the interface DemoThriftService. The types it uses, StudentParam and GenderEnum, must be marked with the relevant annotations.

3.1.1 pom.xml

<dependency>
    <groupId>com.meituan.service.mobile</groupId>
    <artifactId>mtthrift</artifactId>
    <version>1.8.5</version>
</dependency>
<dependency>
    <groupId>com.meituan.mtrace</groupId>
    <artifactId>mtrace</artifactId>
    <version>1.1.14</version>
</dependency>

3.1.2 StudentParam.java

/**
 * @author: kefeng.wang
 * @date: 2018-06-29 16:34
 * @description: Student definition (used as an input parameter)
 **/
@ThriftStruct
public class StudentParam {
    private Integer id;
    private String name;

    @ThriftConstructor
    public StudentParam(Integer id, String name) {
        this.id = id;
        this.name = name;
    }

    @ThriftField
    public Integer getId() {
        return id;
    }

    @ThriftField(1)
    public void setId(Integer id) {
        this.id = id;
    }

    @ThriftField
    public String getName() {
        return name;
    }

    @ThriftField(2)
    public void setName(String name) {
        this.name = name;
    }
}

3.1.3 GenderEnum.java

/**
 * @author: kefeng.wang
 * @date: 2018-06-29 16:41
 * @description: Gender definition (used as an output parameter)
 **/
@ThriftEnum
public enum GenderEnum {
    GENDER_MALE(1, "male", "男性"),
    GENDER_FEMALE(2, "female", "女性"),
    GENDER_UNKNOWN(0, "unknown", "未知性别");

    private Integer id;
    private String value;
    private String desc;

    GenderEnum(Integer id, String value, String desc) {
        this.id = id;
        this.value = value;
        this.desc = desc;
    }

    // @ThriftEnumValue
    public Integer getId() {
        return id;
    }

    public void setId(Integer id) {
        this.id = id;
    }

    public String getValue() {
        return value;
    }

    public void setValue(String value) {
        this.value = value;
    }

    public String getDesc() {
        return desc;
    }

    public void setDesc(String desc) {
        this.desc = desc;
    }
}

3.1.4 DemoThriftService.java

/**
 * @author: kefeng.wang
 * @date: 2018-06-29 16:19
 * @description: Thrift interface definition
 **/
@ThriftService
public interface DemoThriftService {
    @ThriftMethod
    String getVersion() throws TException;

    @ThriftMethod
    StudentParam getGenderStudent(GenderEnum gender) throws TException;
}

3.2 Service provider service-provider

Import the dependency packages: service-interface is the public interface just defined, and hystrix is used for fault tolerance. In this module, first implement the public interface, then define and publish the relevant configuration, and finally run ServiceProviderApplication to start the service provider.

3.2.1 pom.xml

<dependency>
    <groupId>com.meituan</groupId>
    <artifactId>service-interface</artifactId>
    <version>1.0.0</version>
</dependency>
<dependency>
    <groupId>com.netflix.hystrix</groupId>
    <artifactId>hystrix-javanica</artifactId>
    <version>1.5.12</version>
</dependency>

3.2.2 DemoThriftServiceImpl.java

/**
 * @author: kefeng.wang
 * @date: 2018-06-29 17:29
 * @description: Thrift interface implementation (service provider)
 **/
public class DemoThriftServiceImpl implements DemoThriftService {
    @Override
    public String getVersion() throws TException {
        return "1.0.0";
    }

    @Override
    @HystrixCommand
    public StudentParam getGenderStudent(GenderEnum gender) throws TException {
        return new StudentParam(1, "张三");
    }
}

3.2.3 DemoServiceProviderConfig.java

/**
 * @author: kefeng.wang
 * @date: 2018-06-29 17:45
 * @description: Thrift publishing (service provider)
 **/
@Configuration
public class DemoServiceProviderConfig {
    @Resource(name = "serviceProcessor")
    private DemoThriftService serviceProcessor;

    @Bean(name = "serviceProcessor")
    public DemoThriftService getDemoThriftService() {
        return new DemoThriftServiceImpl();
    }

    @Bean(name = "serverPublisher", initMethod = "publish", destroyMethod = "destroy")
    public ThriftServerPublisher getThriftServerPublisher() {
        ThriftServerPublisher serverPublisher = new ThriftServerPublisher();
        serverPublisher.setServiceInterface(DemoThriftService.class); // [MUST] interface class
        serverPublisher.setServiceImpl(serviceProcessor); // [MUST] implementation
        serverPublisher.setAppKey(APPKEY_TEST_SERVER); // [MUST] service provider appkey
        serverPublisher.setPort(9001); // [MUST] service provider listening port
        return serverPublisher;
    }
}

3.2.4 ServiceProviderApplication.java

/**
 * @author: kefeng.wang
 * @date: 2018-06-29 17:50
 * @description: Startup (service provider)
 **/
@SpringBootApplication
public class ServiceProviderApplication {
    public static void main(String[] args) {
        SpringApplication.run(ServiceProviderApplication.class, args);
    }
}

3.3 Service consumer service-consumer

Import the dependency package: service-interface is the public interface just defined. In this module, first specify the service provider and consumption options, then define a Controller that calls the service through the shared interface, and finally run ServiceConsumerApplication to start the service consumer. Open http://localhost:8080/demo in a browser to confirm that the call succeeds.

3.3.1 pom.xml

<dependency>
    <groupId>com.meituan</groupId>
    <artifactId>service-interface</artifactId>
    <version>1.0.0</version>
</dependency>

3.3.2 DemoServiceConsumerConfig.java

/**
 * @author: kefeng.wang
 * @date: 2018-06-29 18:01
 * @description: Thrift consumer
 **/
@Configuration
public class DemoServiceConsumerConfig {
    @Bean(name = "thriftPoolConfig")
    public MTThriftPoolConfig getMTThriftPoolConfig() {
        MTThriftPoolConfig thriftPoolConfig = new MTThriftPoolConfig();
        thriftPoolConfig.setMaxActive(100);
        thriftPoolConfig.setMaxIdle(20);
        thriftPoolConfig.setMinIdle(5);
        thriftPoolConfig.setMaxWait(3000);
        thriftPoolConfig.setTestOnBorrow(true);
        thriftPoolConfig.setTestOnReturn(false);
        thriftPoolConfig.setTestWhileIdle(false);
        return thriftPoolConfig;
    }

    @Bean(name = "demoThriftService", destroyMethod = "destroy")
    public ThriftClientProxy getThriftClientProxy(MTThriftPoolConfig thriftPoolConfig) {
        ThriftClientProxy thriftClientProxy = new ThriftClientProxy();
        thriftClientProxy.setMtThriftPoolConfig(thriftPoolConfig); // [OPTIONAL] connection pool configuration
        thriftClientProxy.setServiceInterface(DemoThriftService.class); // [MUST] interface class
        thriftClientProxy.setAppKey(APPKEY_TEST_CLIENT); // [MUST] service consumer appkey
        thriftClientProxy.setRemoteAppkey(APPKEY_TEST_SERVER); // [MUST] service provider appkey
        thriftClientProxy.setRemoteServerPort(9001); // [COMMON] service provider port
        thriftClientProxy.setTimeout(30000); // [COMMON] call timeout
        return thriftClientProxy;
    }
}

3.3.3 DemoConsumerController.java

/**
 * @author: kefeng.wang
 * @date: 2018-06-29 18:10
 * @description: External entry point for the Thrift demo
 **/
@RestController
public class DemoConsumerController {
    private static final Logger logger = LoggerFactory.getLogger(DemoConsumerController.class);

    @Resource
    private DemoThriftService demoThriftService;

    @GetMapping("/demo")
    public StudentParam demo() {
        try {
            return demoThriftService.getGenderStudent(GenderEnum.GENDER_MALE);
        } catch (TException e) {
            logger.warn(e.getMessage(), e);
        }

        return null;
    }
}

3.3.4 ServiceConsumerApplication.java

/**
 * @author: kefeng.wang
 * @date: 2018-06-29 17:50
 * @description: Startup (service consumer)
 **/
@SpringBootApplication
public class ServiceConsumerApplication {
    public static void main(String[] args) {
        SpringApplication.run(ServiceConsumerApplication.class, args);
    }
}

4 Publishing platform (plus)

MTthrift is a customized modification of Thrift, so that code released through Plus can be discovered and handled by the OCTO platform.

5 Service Governance Platform (MSGP)

Each of the test, staging, and prod environments has a corresponding web management platform (accessible only within the company or through VPN).

For information security reasons, screenshots are not provided. Common functions are:

  • Service details / service providers: lists each host (hostname/IP/PORT) of the current appkey; hosts can be added or deleted, have their weight adjusted, and be enabled or disabled;
  • Service details / service consumers: shows the consumers of the current appkey, the provider hosts they call, call volume, and so on for different time periods;
  • Service operation / service grouping: lets you give priority to the same center and the same data center;
  • Data analysis / source analysis: statistics on calls from upstream services to the current appkey (call volume, QPS, latency, etc.) by time period;
  • Data analysis / destination analysis: statistics on calls from the current appkey to downstream services (call volume, QPS, latency, etc.) by time period;
  • Data analysis / host analysis: statistics on calls handled by each host of the current appkey (call volume, QPS, latency, etc.) by time period.
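
To show how the per-host weight and enable/disable settings from the provider list might influence routing, here is a minimal weighted-selection sketch. The classes `HostEntry` and `WeightedPicker` are hypothetical, and weighted random selection is one common strategy; the article does not state which algorithm OCTO actually uses.

```java
import java.util.List;

// Hypothetical view of one row in the provider list: address, weight, and enabled flag.
final class HostEntry {
    final String address;
    final int weight;
    final boolean enabled;

    HostEntry(String address, int weight, boolean enabled) {
        this.address = address;
        this.weight = weight;
        this.enabled = enabled;
    }
}

final class WeightedPicker {
    // Sum of weights over enabled hosts only; disabled hosts receive no traffic.
    static int totalWeight(List<HostEntry> hosts) {
        int total = 0;
        for (HostEntry h : hosts) {
            if (h.enabled) {
                total += h.weight;
            }
        }
        return total;
    }

    // Maps roll in [0, totalWeight) to a host; each enabled host owns a slice proportional to its weight.
    static HostEntry pick(List<HostEntry> hosts, int roll) {
        int total = totalWeight(hosts);
        if (roll < 0 || roll >= total) {
            throw new IllegalArgumentException("roll must be in [0, " + total + ")");
        }
        for (HostEntry h : hosts) {
            if (!h.enabled) {
                continue;
            }
            if (roll < h.weight) {
                return h;
            }
            roll -= h.weight;
        }
        throw new IllegalStateException("unreachable when weights are non-negative");
    }
}
```

A caller would typically supply the roll as `ThreadLocalRandom.current().nextInt(WeightedPicker.totalWeight(hosts))`, provided at least one enabled host has a positive weight. Passing the roll in explicitly keeps the selection logic deterministic and easy to test.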

Author: Wang Kefeng
Source: https://kefeng.wang/2018/06/29/distributed-octo/

Origin blog.csdn.net/m0_67645544/article/details/124426223