The best of both worlds: a new service mesh that combines Sidecarless and Sidecar models

01 Preface

The Istio community released a new architecture called Ambient in September 2022. Because it splits the functions of the Sidecar into separate Layer 4 and Layer 7 proxies that are deployed apart from the workload, this model is also called the Sidecarless model. Alibaba Cloud Service Mesh (ASM) is the industry's first managed service mesh that supports Ambient mode.

This article is based on a talk given at the 2023 Yunqi Conference on the latest progress of Alibaba Cloud Service Mesh (ASM) product technology. In four parts, Shi Zehuan and Yin Hang from the service mesh team of the Alibaba Cloud native product line introduce how ASM implements this new form of service mesh: the integration of Sidecarless and Sidecar modes, together with the serverless transformation of the service mesh.

02 Evolution of service mesh architecture

As service meshes become increasingly popular in the cloud-native technology stack, more and more enterprise technical teams are using them in production. Many readers already know service meshes well, while others have only just started working with them. To make the rest of this article easier to follow, let's first introduce the classic Sidecar mode of the service mesh.

The classic service mesh architecture is divided into a control plane and a data plane according to the responsibilities of its components. The control plane acts as the steward of the service mesh and provides each data plane component with the configuration it needs. The data plane controls traffic according to the configuration delivered by the control plane, which makes it the real executor of service mesh capabilities.

To implement service mesh capabilities such as traffic routing, load balancing, fault injection, request and response manipulation, authentication, and zero-trust networking, the service mesh injector (Injector) injects a dedicated container that carries out mesh capabilities into the application Pod. This container runs in the same Pod as the application and shares its network namespace. It is the Sidecar of the service mesh.

Since the Sidecar shares the network namespace with the application container, iptables rules can easily redirect application traffic to the mesh proxy process running in the Sidecar container. This is the classic Sidecar architecture.

ASM decouples the control plane from the data plane and deploys the control plane as a managed service. Compared with Istio, ASM has some significant advantages. First, ASM has complete life cycle management capabilities. In the practice of service mesh operation and maintenance, the complex configuration of the service mesh and the rapid iteration of the community often make installing and upgrading the mesh challenging. With ASM's life cycle management capabilities, users can create, delete, and upgrade service mesh instances with one click, without having to consider configuration, compatibility, or adaptation issues.

Second, ASM can block misconfigurations and diagnose them. We have observed that a common problem for customers using a service mesh is that configuration errors lead to unexpected results. ASM turns these problems into check items and diagnostic items that intercept incorrect configurations and raise alerts as early as possible, helping service mesh operation and maintenance personnel find and solve problems in time. In addition, as the network infrastructure for cloud-native applications, a service mesh can help applications connect to observability platforms. To this end, ASM can connect to multiple cloud services with one click, helping users quickly integrate with the cloud-native ecosystem.

Finally, ASM provides enterprise-level multi-cluster support. With ASM, users can quickly build a mesh that spans multiple data plane clusters.

ASM currently makes some control plane components serverless. Serverless components scale elastically and automatically and are consumed on demand. Building on the capabilities of the Serverless foundation, they achieve higher scheduling efficiency and faster startup, and image caching significantly reduces readiness and startup time.

03 New data plane model

The Sidecar pattern is intuitive and effective, but it still has some disadvantages:

1) Sidecar injection is intrusive to the workload. Injecting or removing a Sidecar requires restarting the workload, and adjusting the Sidecar configuration (such as resources) may also require a restart. The Sidecar and the workload are tightly bound.

2) Resource utilization is not ideal. To cope with the worst case, each Sidecar needs to reserve a portion of resources. The larger the cluster, the more resources sit idle.

3) The computational cost of traffic capture and of Layer 7 processing such as protocol identification is high, but not all requests use HTTP or need to be processed by a Sidecar.

For these reasons, we need a less intrusive and lower-cost approach that makes the service mesh suitable for more scenarios. Therefore, in September 2022, the Istio community launched the Sidecarless Ambient mode, which splits the functions of the Sidecar into Layer 4 and Layer 7 proxies and deploys both separately from the workload. This makes up well for the shortcomings of Sidecar mode in some scenarios. ASM is the industry's first managed service mesh that supports Ambient mode.

In Ambient mode, the Layer 4 proxy focuses on transport-layer observability, routing, and communication encryption, while the Layer 7 proxy performs more complex processing in the traffic management, security, and observability dimensions based on Layer 7 protocols. Users can progressively choose, according to actual business needs, whether to enable proxies for an application and which layer of proxy to enable.

The L4 proxy on the data plane is called ztunnel. It is an L4 processing layer responsible for forwarding all Layer 4 communication of the application. With ztunnel, an application obtains zero-trust security capabilities, including mTLS, authentication, and authorization policies, as soon as it joins the mesh. ztunnel is deployed as a DaemonSet, and the CNI then configures traffic interception rules on the node so that traffic from Pods in the mesh is redirected to the ztunnel instance. ztunnel encrypts the traffic with mTLS before transmitting it, and the peer ztunnel decrypts the traffic and forwards it to the application. Because it sits on this path, ztunnel can also collect TCP metrics, access logs, and so on.

The Layer 7 proxy is called the Waypoint proxy. Like the Sidecar in the classic architecture, the Waypoint proxy is based on Envoy, and it implements the more advanced capabilities that depend on Layer 7 protocols in Ambient Mesh mode. For example, it can apply advanced mesh policies based on request headers and credentials, such as circuit breaking, traffic shaping, traffic splitting, retries, fault injection, and role-based access control authorization policies. Whereas ztunnel is deployed at the node level, the Waypoint proxy is deployed at the service level. Users can enable or disable the Layer 7 proxy for a given service and scale it freely, deploying on demand and increasing resource utilization in the cluster.
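To make the resource argument concrete, here is a minimal back-of-the-envelope sketch that compares reserving CPU for one Sidecar per Pod with reserving CPU for one ztunnel per node plus one Waypoint per service that needs Layer 7 processing. Every number in it (cluster size, millicores per proxy, a single Waypoint replica per service) is a made-up assumption for illustration, not a measurement of Istio or ASM.

```python
def per_pod_sidecar_reservation(pods: int, sidecar_millicores: int) -> int:
    """CPU (in millicores) reserved when every Pod carries its own Sidecar."""
    return pods * sidecar_millicores


def ambient_reservation(nodes: int, ztunnel_millicores: int,
                        l7_services: int, waypoint_millicores: int) -> int:
    """CPU reserved for one ztunnel per node plus one Waypoint per L7-enabled service."""
    return nodes * ztunnel_millicores + l7_services * waypoint_millicores


if __name__ == "__main__":
    # Hypothetical cluster: 1000 Pods on 50 nodes, 20 services that need L7 policies.
    sidecar = per_pod_sidecar_reservation(pods=1000, sidecar_millicores=100)
    ambient = ambient_reservation(nodes=50, ztunnel_millicores=200,
                                  l7_services=20, waypoint_millicores=500)
    print(f"Sidecar mode reserves {sidecar}m CPU; Ambient mode reserves {ambient}m CPU")
```

The point of the sketch is only the shape of the two formulas: the Sidecar reservation grows with the number of Pods, while the Ambient reservation grows with the number of nodes and with the number of services that actually opt in to Layer 7 processing.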

Now that we understand the specific capabilities of the L4 and L7 proxies, let's take a look at the network topology in Ambient mode, where L4 and L7 are decoupled:

Next, let's take a look at the traffic path in Ambient mode, starting with the L4 proxy:

1. When the application Pod in Ambient mode starts, the CNI plug-in will write its IP address into the ipset under the node network namespace.

2. When a request is initiated, the packet reaches the node network namespace through the veth pair interface of the Pod. Packets whose source address is in the ipset are captured and processed by the iptables rules on the node.

3. The iptables rule marks the packet with 0x100.

4. The policy routing rules on the node specify that any packet marked 0x100 is directed to the destination 192.168.127.2 through the istioout network interface.

5. The transparent proxy iptables rule in the ztunnel Pod sends the packets arriving from pistioout to the ztunnel outbound port 15001.

6. ztunnel processes the packet and forwards it to the IP address of the target service (httpbin). This address is the veth device address of httpbin on Node B, so the packet is routed to Node B.

7. After the packet reaches Node B, the rules for inbound traffic ensure that the packet is routed to the istioin interface.

8. The data packet enters the ztunnel pod through the tunnel composed of istioin and pistioin.

9. The iptables rules in the ztunnel Pod capture packets from pistioin and direct them to port 15008 based on their marks.

10. The ztunnel pod processes the packet and sends it to the target pod.
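The ten steps above can be condensed into a small simulation. This is a toy model, not how the kernel or ztunnel is implemented: the `Packet` class, the function names, and the example Pod IPs are hypothetical, while the mark `0x100`, the next hop `192.168.127.2`, and ports 15001 and 15008 are taken from the steps.

```python
from dataclasses import dataclass, field
from typing import List, Set

MESH_MARK = 0x100                   # mark set by the node iptables rule (step 3)
ZTUNNEL_NEXT_HOP = "192.168.127.2"  # policy routing destination (step 4)
OUTBOUND_PORT = 15001               # ztunnel outbound port (step 5)
INBOUND_PORT = 15008                # ztunnel inbound port (step 9)


@dataclass
class Packet:
    src_ip: str
    dst_ip: str
    mark: int = 0
    hops: List[str] = field(default_factory=list)


def leave_source_node(packet: Packet, ambient_ipset: Set[str]) -> Packet:
    """Steps 1-5: only traffic from Pods whose IP is in the ipset is captured."""
    if packet.src_ip not in ambient_ipset:
        packet.hops.append("not in ipset: bypasses ztunnel")
        return packet
    packet.mark = MESH_MARK
    packet.hops.append(f"policy route via {ZTUNNEL_NEXT_HOP}")
    packet.hops.append(f"ztunnel outbound :{OUTBOUND_PORT}")
    return packet


def enter_destination_node(packet: Packet) -> Packet:
    """Steps 7-10: inbound traffic is steered into the local ztunnel, then to the Pod."""
    packet.hops.append(f"ztunnel inbound :{INBOUND_PORT}")
    packet.hops.append(f"delivered to {packet.dst_ip}")
    return packet


if __name__ == "__main__":
    ipset = {"10.0.1.5"}  # hypothetical IP of an application Pod in Ambient mode
    pkt = leave_source_node(Packet("10.0.1.5", "10.0.2.7"), ipset)
    pkt = enter_destination_node(pkt)
    for hop in pkt.hops:
        print(hop)
```

A Pod whose address is not in the ipset never receives the mark, so in this model its traffic skips the ztunnel hops entirely, which mirrors how membership in the ipset decides whether a workload is part of the mesh.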

When the Layer 4 proxy forwards outbound data and the target application has enabled the Layer 7 proxy, the Layer 4 proxy forwards the traffic to the Layer 7 proxy of the target application through the HBONE tunnel. The traffic enters the Layer 7 proxy through the connect_terminate Listener listening on port 15008, which processes it with a specific Filter. After the HBONE traffic is unpacked and identity authentication is completed, the traffic is sent to the main_internal Listener for subsequent processing. On the main Listener, the traffic is matched by Service IP and port, and the Layer 7 traffic policy is executed to determine the target Cluster. Finally, the traffic enters an internal Listener named connect_originate, which continues forwarding it to the upstream destination over the HBONE tunnel.
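The listener chain inside the Layer 7 proxy can be sketched as a three-stage pipeline. This is a simplified model under stated assumptions: the listener names are written with underscores (`connect_terminate`, `main_internal`, `connect_originate`), the `HboneRequest` type, the identity strings, and the cluster names are hypothetical, and real Envoy listeners and filters do far more than this.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class HboneRequest(NamedTuple):
    peer_identity: str   # identity presented by the sending L4 proxy (hypothetical format)
    service_ip: str      # Service IP the client originally addressed
    service_port: int


def waypoint_pipeline(req: HboneRequest,
                      clusters: Dict[Tuple[str, int], str],
                      trusted_identities: Set[str]) -> List[str]:
    """Return the stages a request passes through inside the Layer 7 proxy."""
    stages = ["connect_terminate:15008"]      # HBONE traffic is unpacked here
    if req.peer_identity not in trusted_identities:
        stages.append("rejected: unknown identity")
        return stages
    stages.append("main_internal")            # match on Service IP + port
    cluster = clusters.get((req.service_ip, req.service_port))
    if cluster is None:
        stages.append("rejected: no matching service")
        return stages
    stages.append(f"cluster:{cluster}")       # L7 policy picked the target Cluster
    stages.append("connect_originate")        # re-wrap in HBONE toward the upstream
    return stages


if __name__ == "__main__":
    clusters = {("10.96.0.20", 8000): "httpbin"}   # hypothetical Service IP and port
    trusted = {"cluster.local/ns/default/sa/sleep"}
    req = HboneRequest("cluster.local/ns/default/sa/sleep", "10.96.0.20", 8000)
    print(" -> ".join(waypoint_pipeline(req, clusters, trusted)))
```

Modeling each listener as a stage makes the ordering explicit: identity is checked before any Layer 7 matching happens, and the request only reaches `connect_originate` once a target Cluster has been chosen.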

Readers interested in the traffic path of Ambient Mesh can also refer to another article by the author, which analyzes the traffic path in Ambient mode in more depth and detail.

04 A new form of integration of Sidecarless and Sidecar modes

In the new architecture, Sidecar mode and Sidecarless mode do not conflict. Users can deploy the two modes side by side, which makes it possible to switch over gradually as needed. ASM's managed Ambient mode can reduce resource overhead by up to 60%, operation and maintenance work by 50%, and request latency by 40% in certain scenarios.

Through a unified control plane API, ASM provides a configuration management platform for the data plane and delivers different configurations on demand to each kind of proxy in the converged form: Sidecars in Sidecar mode, Layer 4 and Layer 7 proxies in Sidecarless mode, and even managed proxies.

Because the Layer 7 proxy carries richer service mesh capabilities, in production practice it is more likely to need to scale out and in together with the business application. Building on the flexible deployment of Layer 7 proxies in the new architecture, ASM provides managed Layer 7 proxies. They are deployed in serverless form, which hides the operation and maintenance complexity of the Layer 7 proxy: users do not need to plan capacity for it in advance or take on any operation and maintenance work for it. According to business needs, the Layer 7 proxy can be deployed, reclaimed, and scaled out or in with one click at any time.

Let's take a look at the technical architecture of the managed Layer 7 proxy. Users declare the Layer 7 proxy through the standard K8s Gateway API. The Waypoint Proxy Controller on the ASM managed side watches the Gateway API. When a Gateway CR is created or changed, the Waypoint Proxy Controller manages the life cycle of the Waypoint Proxy workload according to the Gateway specification. Through the Gateway API, users can specify whether to deploy the Layer 7 proxy in the Waypoint Proxy pool hosted by ASM or on an ECI Serverless node in the user's cluster.
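As a rough illustration of what declaring a Layer 7 proxy through the Gateway API can look like, the sketch below builds a Gateway manifest as a Python dictionary and prints it as JSON. The shape is modeled on the waypoint Gateway used by upstream Istio at the time (class `istio-waypoint`, an HBONE listener on port 15008). Whether ASM accepts exactly these values is an assumption, and the ASM-specific setting that selects the hosted Waypoint Proxy pool or an ECI node is not shown because the article does not specify it. The helper name and the example resource names are hypothetical.

```python
import json


def waypoint_gateway(name: str, namespace: str) -> dict:
    """Build a Gateway manifest that requests a Layer 7 (Waypoint) proxy."""
    return {
        "apiVersion": "gateway.networking.k8s.io/v1beta1",
        "kind": "Gateway",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Gateway class used for waypoints in upstream Istio (assumed to apply here).
            "gatewayClassName": "istio-waypoint",
            "listeners": [
                {"name": "mesh", "port": 15008, "protocol": "HBONE"},
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical names; the JSON output could be applied with any Kubernetes client.
    print(json.dumps(waypoint_gateway("waypoint", "default"), indent=2))
```

Because the proxy is declared as an ordinary Kubernetes resource, creating, changing, or deleting it is just a change to this object, and the controller described above reconciles the actual workload to match.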

05 Customer Cases

Lixun Logistics is a service provider under Belle that focuses on the fashion industry and provides enterprises with professional logistics and supply chain solutions. Lixun Logistics has 70+ omni-channel physical cloud warehouses and 6 central e-commerce warehouses across the country, with a total area of 1 million+ square meters. Its services cover 300+ cities and 3000+ business districts, and it provides omni-channel delivery services to many well-known fashion brands and their brand stores.

Currently, Lixun Logistics has switched 100% of its core production systems to ASM. With its managed, operation-free architecture and rich productized capabilities, ASM helped Lixun Logistics shorten the mesh implementation cycle by at least 50%, and also helped its operation and maintenance team improve efficiency by at least 40% in areas such as network traffic management and security configuration management. After switching to ASM, operation and maintenance personnel complete all configuration related to traffic rules, security, and observability through the APIs provided by ASM. Connecting the ASM gateway to a custom authentication service strengthens security at the entry point. ASM's external service registration capability opens up communication between services inside and outside the mesh. Through ASM's productized capabilities, Lixun Logistics also connects efficiently to a unified observability platform and obtains a complete observability solution, from data generation, collection, query, and search to dashboard visualization, instant insight into the mesh topology, and mesh service health assessment.

Finally, let us take an overview of the enterprise-level service mesh capabilities provided by Alibaba Cloud Service Mesh. In terms of mesh capabilities, it includes unified management of heterogeneous services, network management of multi-cluster and cross-cluster applications, grayscale release and deployment of applications integrated with CI/CD tools, smooth evolution of the application's cloud architecture, KServe-based elastic AI service management, and more. In terms of integration and compatibility, ASM offers a Web user interface and a complete OpenAPI, providing powerful and flexible integration capabilities. At the same time, ASM is fully compatible with Istio usage specifications and supports making configuration changes to mesh instances through the standard K8s API.

The core components of the ASM control plane adopt a fully managed design. The standard version and the enterprise version share a unified elastic architecture and provide complete multi-version support. ASM also provides powerful customization capabilities, including traffic management enhancement, protocol enhancement, adaptive xDS optimization, hardware and software integrated optimization, mesh diagnostic analysis, an extension center, and heterogeneous service registration integration.

ASM is a network platform for cloud-native applications that provides unified mesh governance capabilities for application services running on heterogeneous computing infrastructure. With its strong support for heterogeneous computing facilities, ASM helps users connect workloads running on K8s nodes, Serverless workloads running on ECI nodes, managed Serverless components, and even workloads in other public clouds or IDCs, and manage, operate, and maintain these heterogeneous facilities in a unified way. For more ASM capabilities, please go to the ASM homepage or learn more through the ASM official website documents.

ASM homepage:
https://www.aliyun.com/product/servicemesh?spm=5176.28508143.J_4VYgf18xNlTAyFFbOuOQe.183.e939154a6Av1f8

Author: Shi Zehuan, Yin Hang

This article is original content from Alibaba Cloud and may not be reproduced without permission.

Origin my.oschina.net/yunqi/blog/10150797