[Cloud Native: Cost Reduction and Efficiency Increase] The Era Is Coming

1. Origin of the event:

One day, I learned by email that CSDN and Tencent were jointly holding a Tencent Cloud Native talk event. On a closer look, it turned out to be a compilation of live broadcasts that had already aired. I had watched several of those broadcasts, including "Zuoyebang (Homework Help): Cloud-Native Practice for Cost Reduction and Efficiency Increase" and "Is Using the Cloud on a Game Platform Costing or Saving Money?", and to be honest they were quite good.

So I took part in the event. After I submitted the sign-up form, an e-book downloaded automatically. When I opened it, I found a PDF of more than 150 pages: a real e-book. Fortunately, I managed to read the whole thing in my spare time. Although it is long, a closer look shows that its structure is carefully designed. The whole book revolves around one core theme: reducing costs and increasing efficiency. It starts with the current state of cloud native in enterprises, argues that cost reduction and efficiency increase are the greatest value enterprises get from cloud native, then analyzes technical approaches one by one (resource analysis and optimization, utilization improvement, and so on), and finally uses real enterprise cloud cases to show that cost reduction and efficiency increase are the inevitable path, closing the loop.

2. Introduction to the e-book

1. The e-book opens with the five major current situations and three major trends facing enterprise cloud native today.

Five major current situations of cloud native:

1. The cloud-native industry is developing rapidly, and cloud-native construction accounts for a notable share of investment

2. The adoption rate of cloud-native technologies in enterprise production environments continues to rise

3. Cloud native is penetrating various industries at an accelerating pace, with each industry's focus differing slightly

4. Cost reduction and efficiency increase are the greatest value of enterprise cloud-native applications

5. Deeper enterprise cloud use has led to wasted spending, and cloud-native cost governance is attracting attention

Three major trends in cloud native:

1. Cloud native has become an inclusive technology, with a flourishing ecosystem and integrated development

2. Cloud native's transformation of the software industry will continue to deepen

3. Cloud native originates from open source, thrives on open source, and pushes open source to a new height

It also discusses how cloud-native technology can be used to optimize costs, and notes that cost management in the cloud-native era faces many challenges, such as decentralized management, serious waste, and the need to keep the business stable. Using Tencent's own experience of the problems it faced and how it arrived at the best solution, this opening chapter sets up the theme of cost reduction and efficiency increase.

2. It then explains and analyzes a series of technologies for achieving better cost reduction and efficiency increase.

Analysis and optimization of Kubernetes resources on the cloud: the book analyzes three scenarios of resource waste (waste caused by resource reservation, waste caused by resource shortage, and waste caused by large amounts of resources that cannot be used) and then introduces Crane, an open-source project from Tencent. A further look at Crane's resource analysis and optimization for Kubernetes shows a significant improvement in results.
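To make the reservation-waste idea concrete, here is a minimal sketch of usage-based request recommendation, assuming a simple rule (a usage percentile plus a safety margin). This is not Crane's actual algorithm or code; `recommend_request`, `reclaimable`, the 95th percentile, and the 15% margin are all hypothetical choices for illustration.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sample list (pct in (0, 100])."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def recommend_request(samples: list[float], pct: float = 95.0, margin: float = 0.15) -> float:
    """Hypothetical rule: recommended CPU request = usage percentile * (1 + margin)."""
    if not samples:
        raise ValueError("need at least one usage sample")
    return round(percentile(samples, pct) * (1 + margin), 3)


def reclaimable(current_request: float, samples: list[float]) -> float:
    """Cores released if the request were lowered to the recommendation (never negative)."""
    return max(0.0, round(current_request - recommend_request(samples), 3))


if __name__ == "__main__":
    # CPU usage samples (cores) for a pod that currently requests 4 cores
    usage = [0.4, 0.5, 0.45, 0.6, 0.55, 0.5, 0.7, 0.52, 0.48, 0.5]
    print(recommend_request(usage))  # 0.805
    print(reclaimable(4.0, usage))   # 3.195
```

Real recommenders typically work on longer time series and account for bursts and workload type; the point here is only that the gap between reserved and observed usage is what gets reclaimed.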

Kubernetes cluster utilization improvement practice: low cluster utilization leads to high costs, and unreasonable configuration of clusters and applications can make it impossible to keep improving utilization, leading to a vicious circle. This chapter shares several ideas and implementation methods for raising Kubernetes cluster utilization, including a two-level scaling mechanism, two-level dynamic overselling (overcommitment), dynamic scheduling, and dynamic eviction, which together keep nodes stable under high load.
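Dynamic overselling can be illustrated with one node-level rule. The sketch below assumes (hypothetically) that new pods will use the same fraction of their requests as existing pods, and advertises more allocatable CPU when real usage is low and none extra when the node is hot. The formula, `NodeStats`, and the default 70% target and 2x cap are illustrative assumptions, not the e-book's implementation, and only one of the two levels is modeled.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    capacity: float   # physical CPU cores on the node
    requested: float  # sum of CPU requests of pods already on the node
    used: float       # observed CPU usage, e.g. a recent average


def oversell_ratio(node: NodeStats, target_util: float = 0.7, max_ratio: float = 2.0) -> float:
    """Hypothetical node-level oversell ratio.

    Assumes new pods use the same fraction of their requests as current pods do,
    and picks the ratio at which a fully requested node would run at target_util.
    """
    if node.requested <= 0 or node.used <= 0:
        return 1.0  # no usage signal yet: do not oversell
    usage_per_requested_core = node.used / node.requested
    ratio = target_util / usage_per_requested_core
    # never advertise less than physical capacity, never more than max_ratio times it
    return min(max_ratio, max(1.0, ratio))


def advertised_allocatable(node: NodeStats, **kwargs: float) -> float:
    """CPU the scheduler would see for this node after overselling."""
    return round(node.capacity * oversell_ratio(node, **kwargs), 2)


if __name__ == "__main__":
    print(advertised_allocatable(NodeStats(capacity=16, requested=10, used=5)))  # 22.4
    print(advertised_allocatable(NodeStats(capacity=16, requested=14, used=4)))  # capped: 32.0
    print(advertised_allocatable(NodeStats(capacity=16, requested=10, used=9)))  # hot node: 16.0
```

Because the ratio is recomputed from live usage, a node that heats up automatically stops attracting extra pods; eviction then handles the load that is already there.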

Interpretation of the cloud-native hybrid deployment standard: drawing on reflections and practical cases from Tencent's product Caelus in online/offline hybrid deployment (colocation), this chapter discusses what a cloud-native hybrid deployment standard should look like: which capability requirements it sets and which key technologies it requires.

Capability requirements for hybrid deployment:

Business: resources that have been requested but not used. During cost reduction, this unused portion can be reduced. In the cloud-native field, containers make refined resource management possible.

System: resources that have been allocated but not used. Traditionally, resources allocated through virtual machines can only be used by that system, and without enough flexibility this portion cannot be shared. Container Requests and Limits make it possible to measure and control resource usage effectively.

Application: idle resources caused by the peak-valley effect. During troughs in resource usage, applications leave large amounts of resources idle. Other workloads can be filled into these gaps, and horizontal and vertical scaling provides elastic resource supply and service scheduling.
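The three kinds of idle resources above can be measured separately on a node. The sketch below is one possible mapping, assuming per-pod request, current usage, and peak usage are available; the bucket names, the `Pod` type, and the exact correspondence to "Business", "System", and "Application" are hypothetical interpretations, not Caelus's definitions.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    request: float     # CPU cores requested
    usage_now: float   # current CPU usage
    usage_peak: float  # peak CPU usage over the observation window


def idle_breakdown(node_allocatable: float, pods: list[Pod]) -> dict[str, float]:
    """Split a node's idle CPU into three hypothetical, non-overlapping buckets."""
    total_request = sum(p.request for p in pods)
    return {
        # "Business": requested by workloads but unused even at peak
        "requested_unused": sum(max(0.0, p.request - p.usage_peak) for p in pods),
        # "System": allocatable on the node but not requested by any pod
        "allocatable_unrequested": max(0.0, node_allocatable - total_request),
        # "Application": gap between peak and current usage during a trough
        "peak_valley_idle": sum(max(0.0, p.usage_peak - p.usage_now) for p in pods),
    }


if __name__ == "__main__":
    pods = [Pod(request=4.0, usage_now=1.0, usage_peak=2.0),
            Pod(request=4.0, usage_now=2.0, usage_peak=3.5)]
    print(idle_breakdown(16.0, pods))
    # {'requested_unused': 2.5, 'allocatable_unrequested': 8.0, 'peak_valley_idle': 2.5}
```

When no pod peaks above its request, the three buckets add up to allocatable minus current usage (here 13 = 16 - 3), which is the pool an offline workload could be filled into.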

Key technologies for hybrid deployment:

Infrastructure: priority preemption, load sensing, interference identification, and QoS guarantee, etc.;

Platform-level hybrid deployment: refined resource orchestration, intelligent resource overselling, service and task awareness, customized conflict handling, etc.;

Business applications: Spark, Flink, Hadoop, AI Jobs, etc.
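Priority preemption and QoS guarantees often come down to one decision on a hot node: which pods to evict. The sketch below assumes a simple watermark rule in which offline jobs (low priority) are evicted first and online services above a protected priority are never touched. The watermark, `protected_priority`, and `RunningPod` are hypothetical, not Caelus's interface.

```python
from dataclasses import dataclass


@dataclass
class RunningPod:
    name: str
    priority: int     # higher means more important (online services > offline jobs)
    cpu_usage: float  # current CPU usage in cores


def pods_to_evict(node_capacity: float, pods: list[RunningPod],
                  watermark: float = 0.85, protected_priority: int = 1000) -> list[str]:
    """Evict lowest-priority pods until usage drops to the watermark.

    Pods at or above protected_priority (the online services) are never evicted.
    """
    usage = sum(p.cpu_usage for p in pods)
    limit = node_capacity * watermark
    victims: list[str] = []
    # lowest priority first; among equal priorities, free the most CPU first
    for pod in sorted(pods, key=lambda p: (p.priority, -p.cpu_usage)):
        if usage <= limit or pod.priority >= protected_priority:
            break
        victims.append(pod.name)
        usage -= pod.cpu_usage
    return victims


if __name__ == "__main__":
    node_pods = [
        RunningPod("web", 1000, 8.0),
        RunningPod("spark-a", 10, 3.0),
        RunningPod("spark-b", 10, 2.5),
        RunningPod("flink", 20, 2.0),
    ]
    print(pods_to_evict(16, node_pods))  # ['spark-a']
```

Evicting the heaviest low-priority pod first minimizes the number of interrupted offline jobs; a real system would also weigh job progress and restart cost.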

Attached below is Caelus's full-scenario online/offline hybrid deployment diagram.

Managing Kubernetes GPU resources the cloud-native way: this chapter introduces how to manage GPU resources with cloud-native methods and raise GPU utilization through qGPU sharing, which makes cluster-level GPU management easier and improves usage efficiency.

Three major functions of qGPU:

• Multiple containers sharing one GPU;
• Strong isolation of computing power and video memory;
• Online/offline hybrid deployment.

Features of qGPU:

• Flexibility: fine-grained configuration of each container's share of GPU computing power and its video memory size;
• Strong isolation: strict isolation of video memory and computing power;
• Online/offline hybrid deployment: described as the only such GPU colocation capability in the industry, maximizing GPU utilization;
• Coverage: supports mainstream cards such as T4 and V100 as well as Ampere-architecture A10, A100, etc.;
• Cloud native: supports standard Kubernetes and NVIDIA Docker;
• Compatibility: no need to rewrite services or replace CUDA libraries; transparent to the business;
• High performance: virtualization at the GPU device layer with efficient convergence and near-zero throughput loss.
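The "fine-grained ratio of computing power and memory" idea means each container asks for a slice of a card along two dimensions. The sketch below models only the scheduling bookkeeping with first-fit placement; isolation, which qGPU enforces below the container, is not modeled. `SharedGpu`, `place`, and first-fit itself are hypothetical choices for illustration, not qGPU's API or scheduler.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SharedGpu:
    name: str
    memory_gib: float
    free_compute_pct: float = 100.0            # share of compute not yet assigned
    free_memory_gib: float = field(init=False)

    def __post_init__(self) -> None:
        self.free_memory_gib = self.memory_gib


def place(gpus: list[SharedGpu], compute_pct: float, memory_gib: float) -> Optional[str]:
    """First-fit: put a container's (compute %, memory) slice on the first card with room."""
    for gpu in gpus:
        if gpu.free_compute_pct >= compute_pct and gpu.free_memory_gib >= memory_gib:
            gpu.free_compute_pct -= compute_pct
            gpu.free_memory_gib -= memory_gib
            return gpu.name
    return None  # no card can hold the slice; the container would stay pending


if __name__ == "__main__":
    cards = [SharedGpu("gpu0", 16.0), SharedGpu("gpu1", 16.0)]
    requests = [(30, 6), (50, 8), (30, 4), (100, 20)]
    print([place(cards, c, m) for c, m in requests])  # ['gpu0', 'gpu0', 'gpu1', None]
```

Without sharing, these three fitting containers would occupy three whole cards; with slices they fit on two, which is where the utilization gain comes from.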

The qGPU technical framework diagram is attached below.

Kubernetes fine-grained scheduling for container resource allocation: this chapter centers on topology-aware resource scheduling in Kubernetes. Starting from CPU architecture and the noisy-neighbor problem, it explains the shortcomings of native Kubernetes and the limits of its computing-power awareness in hybrid deployment scenarios, then gives corresponding solutions. With the optimized strategies, resources are used more reasonably.

In some scenarios, an Exclusive strategy is needed:

• Offline CVMs obtain low-quality CPU time slices through the kernel's VMF scheduler;
• Offline Pods have exclusive use of CPU cores so they do not interfere with each other;
• The kernel VMF scheduler lets offline Pods drift to other cores when busy, making full use of CPU resources.

In other scenarios, a NUMA strategy is needed:

• Offline Pods obtain low-quality CPU time slices through cgroup limits;
• Offline Pods are bound to an entire NUMA node so that no individual CPU core is suppressed;
• Offline Pods share the entire NUMA node to make full use of CPU resources.
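The difference between the two strategies is what CPU set an offline Pod is pinned to. The sketch below contrasts them: Exclusive hands out dedicated cores, while NUMA binds a whole node, chosen here (as a hypothetical heuristic) as the node least used by online pods. It also formats the result in the cgroup `cpuset.cpus` list syntax (for example `0-2,5`). All function names and the selection heuristic are assumptions, not the e-book's implementation.

```python
def to_cpuset(cpus: list[int]) -> str:
    """Format CPU IDs in cgroup cpuset list syntax, e.g. [0, 1, 2, 5] -> '0-2,5'."""
    ordered = sorted(set(cpus))
    parts: list[str] = []
    i = 0
    while i < len(ordered):
        j = i
        # extend the run while the next CPU ID is consecutive
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[j] + 1:
            j += 1
        parts.append(str(ordered[i]) if i == j else f"{ordered[i]}-{ordered[j]}")
        i = j + 1
    return ",".join(parts)


def exclusive_cores(free_cores: list[int], count: int) -> list[int]:
    """Exclusive-style: dedicate `count` idle cores to one offline pod (lowest IDs first)."""
    if count > len(free_cores):
        raise ValueError("not enough free cores for an exclusive allocation")
    return sorted(free_cores)[:count]


def numa_binding(topology: dict[int, list[int]], online_busy: set[int]) -> tuple[int, list[int]]:
    """NUMA-style: bind an offline pod to a whole NUMA node.

    Hypothetical choice: the node whose cores are least used by online pods.
    """
    def busy(node: int) -> int:
        return sum(1 for cpu in topology[node] if cpu in online_busy)

    node = min(topology, key=lambda n: (busy(n), n))
    return node, sorted(topology[node])


if __name__ == "__main__":
    topology = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
    print(to_cpuset(exclusive_cores([2, 3, 5, 6, 7], 2)))  # 2-3
    node, cpus = numa_binding(topology, {0, 1, 4})
    print(node, to_cpuset(cpus))  # 1 4-7
```

Exclusive cores give the strongest separation but can strand capacity; binding a whole NUMA node keeps memory access local and lets offline Pods share the node's cores, matching the trade-off the two strategies above describe.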

3. Typical cases: the book analyzes three typical industries (education: "Zuoyebang (Homework Help): Cloud-Native Practice for Cost Reduction and Efficiency Increase"; gaming: "Is Using the Cloud on a Game Platform Costing or Saving Money?"; e-commerce: "JD.com's Road to Large-Scale Cloud-Native Practice"). For each, it explains why the industry needs to reduce costs and increase efficiency, where the pain points, difficulties, and key points lie, and examines the industry's status quo and technical research one by one. Through cost reduction and efficiency increase, costs dropped substantially and each company fully embraced cloud native.

4. Eunomia cloud-native resource orchestration optimization: unreasonable application resource settings, resource differences across instances of the same Pod, and severe fragmentation of idle resources across multiple dimensions all waste resources on the cloud. Facing these pain points in resource usage, Eunomia orchestrates and optimizes resources more effectively, substantially improving both cost and stability.
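Multi-dimensional fragmentation means a cluster can have plenty of idle CPU and memory in total, yet no single node has enough of both for a pod. One way to quantify this, offered as a hypothetical metric rather than Eunomia's definition, is to compare how many reference-shaped pods fit node by node against how many would fit if all idle resources were pooled.

```python
def fits(idle_cpu: float, idle_mem: float, pod_cpu: float, pod_mem: float) -> int:
    """How many pods of the given shape fit into one idle (CPU, memory) pair."""
    return int(min(idle_cpu // pod_cpu, idle_mem // pod_mem))


def fragmentation_rate(idle: list[tuple[float, float]], pod: tuple[float, float]) -> float:
    """Hypothetical metric: share of pooled capacity (in pods) lost to per-node fragmentation."""
    pod_cpu, pod_mem = pod
    per_node = sum(fits(cpu, mem, pod_cpu, pod_mem) for cpu, mem in idle)
    pooled = fits(sum(c for c, _ in idle), sum(m for _, m in idle), pod_cpu, pod_mem)
    return 0.0 if pooled == 0 else 1 - per_node / pooled


if __name__ == "__main__":
    # idle (cores, GiB) per node; reference pod shape is 2 cores / 4 GiB
    print(fragmentation_rate([(4, 2), (0.5, 16), (3, 3)], (2, 4)))  # 1.0
    print(fragmentation_rate([(4, 8), (4, 8)], (2, 4)))             # 0.0
```

In the first case the cluster has 7.5 idle cores and 21 GiB, enough for three pods in aggregate, but every node is short on one dimension, so none can actually be scheduled.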

3. Summary

Cloud native is an emerging software development method that utilizes cloud computing principles for development, deployment and management. It can improve development efficiency, reduce deployment costs, and better support development processes such as continuous integration and continuous delivery. Through the cloud-native approach, we can develop and deploy software more flexibly and efficiently, thereby improving our efficiency and competitiveness.

Cost reduction and efficiency increase is one of the main benefits of cloud native: cloud-native methods can lower an enterprise's IT infrastructure, software development, and operations costs. Cloud-native technology packages enterprise applications and microservices into containers so that they can be deployed and updated faster, reducing the time and cost of development, testing, and deployment. At the same time, cloud-native automated deployment and elastic scaling let enterprises adapt quickly to market changes and business growth, improving productivity and competitiveness.
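Elastic scaling in Kubernetes is commonly done by the Horizontal Pod Autoscaler, whose documented core rule is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change when the ratio is within a tolerance (0.1 by default). The sketch below reproduces only that rule plus min/max clamping; the real controller also handles pod readiness, missing metrics, and stabilization windows, and the default bounds here are illustrative.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     tolerance: float = 0.1, min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Core HPA scaling rule: ceil(current * current_metric / target_metric), with tolerance."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: leave the workload alone
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    print(desired_replicas(4, current_metric=90, target_metric=60))  # 6: scale out at peak
    print(desired_replicas(4, current_metric=20, target_metric=60))  # 2: scale in at trough
    print(desired_replicas(4, current_metric=63, target_metric=60))  # 4: within tolerance
```

Scaling in during troughs is where the cost saving comes from: replicas that would otherwise sit idle are released back to the cluster.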

In conclusion, cloud native is a very promising technology that can help us better deal with current and future IT challenges and improve our efficiency and innovation capabilities.


Origin blog.csdn.net/m0_58954887/article/details/130061044