Insight into GaussDB fine-grained resource management and control technology

This article is shared from the Huawei Cloud Community post "[GaussTech Express] Technical Interpretation of Fine-grained Resource Management and Control", author: GaussDB database.

Background

Resource control and resource isolation within database clusters have been long-standing demands of enterprise customers. As an enterprise-level distributed database, Huawei Cloud GaussDB has been committed to meeting the management needs of enterprises for large database clusters.

The resources that the database can manage include computing resources and storage resources. Computing resources include CPU, memory, IO, and network. Storage resources include data storage space, log storage space, and temporary files.

From the user's perspective, resource management and control helps honor service level agreements (SLAs) by setting thresholds or priority limits on resource usage, while also isolating resources between users so that multiple tenants can share the same database resources.

From the system's perspective, monitoring and controlling resources keeps their use rational and controllable, avoids resource exhaustion, and prevents the system from becoming unresponsive or crashing. Job priorities keep jobs running smoothly: a job whose resource usage grows too high is prevented from affecting other jobs, while resources can still be fully utilized when they are plentiful. Resource control also helps meet external service expectations while making the most of system resources, and controlling jobs keeps their execution stable and free of unpredictable behavior.

To achieve these goals, Huawei Cloud GaussDB provides fine-grained resource management and control, a solution for managing and controlling resources within a database cluster at a fine granularity. It offers control at different granularities (user level, session level, and statement level) and in different dimensions (CPU, memory, and IO). Users can choose the dimensions and granularity that fit their business needs to achieve resource control and resource isolation in different scenarios.

Technology Architecture

Let’s first look at the technical architecture and operating principles of fine-grained resource management and control:

[Figure: Technical architecture of fine-grained resource management and control (Slide 1)]

As the picture above shows, GaussDB implements the control logic for CPU, memory, and IO in a resource pool module. Users create a resource pool, specify the CPU, memory, and IO shares it may use, and bind the pool to a user. From then on, every job that user initiates is controlled in real time as it passes through the kernel's optimizer and parser, execution engine, and storage engine, so that its CPU, memory, and IO usage stays within the limits of the corresponding resource pool.

Suppose Company A deploys a GaussDB instance that three applications access at the same time: an OLTP business, a reporting business, and other low-priority businesses. Company A wants to manage the resources of the three businesses so that the system runs smoothly while resources are used as fully as possible. A system administrator can run the following commands to set the resource shares of the three business users to 50:30:10, with the remaining 10% reserved for the system.

These are only simple usage examples; the meaning of each parameter is explained in detail in the following sections.

create resource pool respool_tp with(control_group="cgroup_tp", max_dynamic_memory="5GB", max_shared_memory="5GB", io_limits=50, io_priority="High");
alter role tp_user RESOURCE POOL 'respool_tp';

create resource pool respool_report with(control_group="cgroup_report", max_dynamic_memory="3GB", max_shared_memory="3GB", io_limits=30, io_priority="Medium");
alter role report_user RESOURCE POOL 'respool_report';

create resource pool respool_other with(control_group="cgroup_other", max_dynamic_memory="1GB", max_shared_memory="1GB", io_limits=10, io_priority="Low");
alter role other_user RESOURCE POOL 'respool_other';

After these operations, when the OLTP business, the reporting business, and the other low-priority businesses connect to GaussDB as tp_user, report_user, and other_user respectively, their jobs are controlled by the resource pools respool_tp, respool_report, and respool_other. When resources are contended, the three businesses are guaranteed 50%, 30%, and 10% of the GaussDB cluster's resources respectively.
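The 50:30:10 split is plain arithmetic on the instance's capacity. The following Python sketch is a hypothetical illustration, not GaussDB code: it assumes each pool's share is a percentage of the whole instance and computes what each pool is guaranteed under full contention plus the system reserve. The function name guaranteed_capacity and the 32-core figure are made up for the example.

```python
# Hypothetical model: each pool's share is a percentage of the whole instance.
POOL_SHARES = {"respool_tp": 50, "respool_report": 30, "respool_other": 10}


def guaranteed_capacity(shares: dict, total_units: float):
    """Capacity each pool is guaranteed under full contention, plus the system reserve."""
    used = sum(shares.values())
    if used > 100:
        # This sketch refuses over-committed shares; GaussDB's own behavior may differ.
        raise ValueError(f"pool shares sum to {used}%, which exceeds 100%")
    per_pool = {name: total_units * pct / 100 for name, pct in shares.items()}
    reserved = total_units * (100 - used) / 100
    return per_pool, reserved


per_pool, reserved = guaranteed_capacity(POOL_SHARES, 32)  # e.g. 32 CPU cores
```

With 32 cores, respool_tp is guaranteed 16, respool_report 9.6, and respool_other 3.2, leaving 3.2 for the system. The guarantee only matters under contention; when other pools are idle, a busy pool can use more.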

Key capabilities

After understanding the overall architecture and usage of fine-grained resource management and control, let's take a look at its key capabilities and what business value these capabilities can bring to customers.

CPU control

GaussDB controls CPU usage per user at resource pool granularity. Each resource pool is bound to a control group (Control Group, CGroup), and CPU control is implemented through that control group. CGroup is a Linux kernel mechanism for limiting, accounting for, and isolating the physical resources (such as CPU, memory, and IO) used by groups of processes.

To isolate and configure the database system, users, and jobs along different dimensions, GaussDB uses the hierarchical nature of control groups to build a model suited to database scenarios (see the figure below). The model meets customers' key SLA requirements and supports hierarchical isolation and control in three dimensions: between database programs and non-database programs, between the database's resident background threads and the threads that execute jobs, and among multiple database users.

[Figure: Hierarchical control group model of GaussDB (Slide 2)]

A GaussDB control group can set a CPU percentage and an upper limit on the number of cores. The root node controls the CPU share available to the GaussDB process; the Backend control group controls the CPU share of the database's resident background threads (Vacuum, DefaultBackend); the Class control group controls the CPU share of users' job threads (UserClass1, UserClass2, ..., UserClassN); and Workload control groups (TopWD, RemainWD, ...) can be created within a Class control group for finer-grained control.
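One way to read this hierarchy is that, when every group is busy, a node's effective CPU fraction is the product of the percentages along its path from the root. The Python sketch below illustrates that reading under stated assumptions: each percentage is relative to the parent group, sibling percentages add up to at most 100, and all percentages in the example tree are invented. It is a mental model, not how GaussDB or the Linux kernel computes shares internally.

```python
from dataclasses import dataclass, field


@dataclass
class CGroupNode:
    name: str
    percent: float  # share of the parent's CPU, in percent (illustrative values)
    children: dict = field(default_factory=dict)

    def add(self, name: str, percent: float) -> "CGroupNode":
        child = CGroupNode(name, percent)
        self.children[name] = child
        return child


def effective_fraction(root: CGroupNode, path: list) -> float:
    """Fraction of host CPU a node gets when every group is busy, assuming each
    percent is relative to its parent and siblings sum to at most 100."""
    node = root
    fraction = root.percent / 100
    for name in path:
        node = node.children[name]
        fraction *= node.percent / 100
    return fraction


# Hypothetical tree mirroring the figure; every percentage here is made up.
root = CGroupNode("Root", 80)            # CPU available to the GaussDB process
backend = root.add("Backend", 40)        # resident background threads
backend.add("Vacuum", 20)
backend.add("DefaultBackend", 80)
class_group = root.add("Class", 60)      # user job threads
user_class1 = class_group.add("UserClass1", 50)
user_class1.add("TopWD", 40)             # Workload groups inside a Class group
user_class1.add("RemainWD", 60)
```

In this made-up tree, UserClass1 would get 0.8 × 0.6 × 0.5 = 24% of the host CPU under full contention, and its TopWD workload group 9.6%.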

Continuing the example above, we use the gs_cgroup tool provided by GaussDB to create control groups for Company A's OLTP business, reporting business, and other low-priority businesses, with CPU shares of 50%, 30%, and 10% respectively.

gs_cgroup -c -S cgroup_tp -s 50;
gs_cgroup -c -S cgroup_report -s 30;
gs_cgroup -c -S cgroup_other -s 10;

These commands create the three control groups. The control group name can then be specified when creating a resource pool, and jobs initiated by users bound to that pool are limited to the control group's CPU share.

There are two issues to keep in mind when controlling CPU with CGroup:

First, for a thread's CPU to be controlled by a CGroup, the thread must be attached to that CGroup through the CGroup system interface, which is a time-consuming operation.

Second, CGroup controls CPU most effectively when the number of threads is proportional to the CPU share.

To address these issues, GaussDB introduces thread groups. Each resource pool corresponds to a thread group whose threads are bound to the pool's CGroup, and GaussDB adjusts the number of threads in each thread group to match the CPU share of the corresponding CGroup. See the figure below for details:

Each user-initiated job is dispatched to a thread in the corresponding thread group. Because that thread is already bound to the corresponding CGroup node, the operating system enforces the CPU limit when it schedules the thread.
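The thread group idea boils down to sizing each group in proportion to its CPU share. The sketch below assumes a fixed total number of worker threads and splits it with largest-remainder rounding so the counts always add up; both the fixed total and the rounding rule are assumptions for illustration, since the article does not say how GaussDB rounds. The function name size_thread_groups is hypothetical.

```python
def size_thread_groups(total_threads, shares):
    """Hypothetical sketch: split total_threads across pools in proportion to
    their CPU shares, using largest-remainder rounding to keep the total exact."""
    total_share = sum(shares.values())
    exact = {pool: total_threads * share / total_share for pool, share in shares.items()}
    counts = {pool: int(value) for pool, value in exact.items()}
    leftover = total_threads - sum(counts.values())
    # Give the remaining threads to the pools with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda pool: exact[pool] - counts[pool], reverse=True)
    for pool in by_remainder[:leftover]:
        counts[pool] += 1
    return counts


company_a = {"respool_tp": 50, "respool_report": 30, "respool_other": 10}
counts = size_thread_groups(64, company_a)
```

For 64 threads and Company A's 50:30:10 pools, this yields 36, 21, and 7 threads.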

GaussDB provides a two-tier user mechanism. A resource pool bound to a Class control group is called a group resource pool, and its users are group users; a resource pool bound to a Workload control group is called a business resource pool, and its users are business users. Group users typically correspond to a department, while business users correspond to the department's individual businesses. In every resource dimension, the share of a business resource pool does not exceed that of the group resource pool it belongs to, which achieves two-level resource management and control.
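The two-tier rule can be expressed as a per-dimension comparison. In this hypothetical Python sketch, PoolLimits and excess_dimensions are invented names, and only memory and IO limits are modeled (CPU is omitted to keep the sketch short); a business pool is flagged in every dimension where its limit exceeds that of its group pool.

```python
from dataclasses import dataclass, fields


@dataclass
class PoolLimits:
    # Hypothetical subset of per-pool limits (MB, and IOs per second).
    max_dynamic_memory_mb: int
    max_shared_memory_mb: int
    io_limits: int


def excess_dimensions(group: PoolLimits, business: PoolLimits) -> list:
    """List the dimensions in which a business pool exceeds its group pool."""
    return [
        f.name
        for f in fields(PoolLimits)
        if getattr(business, f.name) > getattr(group, f.name)
    ]


group_pool = PoolLimits(max_dynamic_memory_mb=5120, max_shared_memory_mb=5120, io_limits=50)
```

A business pool with 2048 MB dynamic memory, 6144 MB shared memory, and io_limits=30 would be flagged only for max_shared_memory_mb.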

For session-level CPU control, GaussDB also provides a GUC parameter named session_respool, which keeps the CPU usage of a single session within the CPU limit of the corresponding resource pool.

Memory management

GaussDB manages and controls both dynamic memory and the shared cache. When creating a resource pool, you can specify max_dynamic_memory and max_shared_memory to set the thresholds for dynamic memory and the shared cache respectively.

Dynamic memory control does not change the existing memory allocation mechanism. It only adds a logical check before each allocation: the memory about to be allocated is accounted for, and the accounted total is checked against the allowed upper limit. When dynamic memory exceeds the limit, the job's memory request fails. When a job exits, the memory it requested is released so that other jobs can run normally. Similarly, when the shared cache (such as the BufferPool) used by a job has reached the resource pool's upper limit and the job requests more, it must first release shared cache it already occupies: when the job requests a new page, one of the pages it already holds is evicted, and the freed space is then available for its continued use.
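The accounting layer and the shared-cache behavior can be sketched as follows. This is a simplified, hypothetical Python model, not GaussDB's allocator: DynamicMemoryAccount checks an accounted total before granting a request and returns everything a job holds when it exits, and SharedCacheQuota makes a pool at its page limit evict one of its own pages before taking a new one. Least-recently-used eviction is an assumption; the article does not name the eviction policy.

```python
from collections import OrderedDict


class DynamicMemoryAccount:
    """Hypothetical logical accounting layer placed in front of the real allocator."""

    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.used = 0
        self.by_job = {}  # job -> bytes currently charged to it

    def allocate(self, job, nbytes):
        """Return False (the memory request fails) if the pool limit would be exceeded."""
        if self.used + nbytes > self.limit:
            return False
        self.used += nbytes
        self.by_job[job] = self.by_job.get(job, 0) + nbytes
        return True  # the real allocation would happen here

    def job_exit(self, job):
        # Release everything the job requested so other jobs can proceed.
        self.used -= self.by_job.pop(job, 0)


class SharedCacheQuota:
    """Hypothetical per-pool page quota: at the limit, the pool evicts its own LRU page."""

    def __init__(self, max_pages):
        self.max_pages = max_pages
        self.pages = OrderedDict()  # page_id -> None, least recently used first

    def access(self, page_id):
        """Touch a page; return the id of the page evicted to make room, or None."""
        if page_id in self.pages:
            self.pages.move_to_end(page_id)
            return None
        evicted = None
        if len(self.pages) >= self.max_pages:
            evicted, _ = self.pages.popitem(last=False)
        self.pages[page_id] = None
        return evicted
```

The key design point mirrored here is that the check sits before the allocation and never changes how memory is actually obtained.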

In addition to user-level memory control, GaussDB provides two GUC parameters, session_max_dynamic_memory and query_max_mem, for session-level and statement-level dynamic memory control. When the dynamic memory used by a session or a statement reaches the GUC threshold, the job's memory request fails.
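With statement, session, and pool limits all active, a request has to fit under each of them. The sketch below assumes the three levels are checked together and charged together; the Level type and try_allocate function are hypothetical, the level names reuse the parameter names from the text, and treating a limit of 0 as unlimited is a convention of this sketch only.

```python
from dataclasses import dataclass


@dataclass
class Level:
    """One hypothetical accounting level: statement, session, or resource pool."""
    name: str
    limit: int      # bytes; 0 means "no limit" in this sketch only
    used: int = 0


def try_allocate(levels, nbytes):
    """Charge nbytes to every level if all stay within their limits.
    Returns None on success, or the name of the first level that would overflow
    (in which case nothing is charged and the memory request fails)."""
    for level in levels:
        if level.limit and level.used + nbytes > level.limit:
            return level.name
    for level in levels:
        level.used += nbytes
    return None


levels = [
    Level("query_max_mem", 100),               # statement level
    Level("session_max_dynamic_memory", 150),  # session level
    Level("max_dynamic_memory", 0),            # resource pool level (unlimited here)
]
```

A statement that has already used 80 of its 100 bytes fails a further 30-byte request at the statement level; a second 80-byte statement in the same session fails at the session level instead.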

IO control

All of GaussDB's disk reads and writes are performed by a background thread. That thread cannot tell which user a page belongs to; it simply writes to disk in chronological order, so it cannot apply different IO limits to different users. For this reason, the IO control function uses logical IO counting to control and limit the read and write IO of users or sessions. A logical IO counter sits between the worker threads and the shared cache. For row-store tables, every 6000 logical reads or writes (adjustable through the GUC parameter io_control_unit) count as one IO. When the number of read and write IO requests generated within one second exceeds the threshold set by the resource pool, further IO requests are added to a waiting queue. A background thread monitors the queued requests and wakes them from the queue once their waiting time meets the required condition.
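The logical IO path can be modeled as a counter, a one-second window, and a waiting queue. This Python sketch is an assumption-laden illustration: the fixed one-second window, the min_wait wake-up condition, and the class and method names are invented, while io_limits and the 6000-operation io_control_unit default come from the text. Woken requests are charged to the current window, so the queue drains only as budget becomes available.

```python
from collections import deque


class LogicalIOThrottle:
    """Hypothetical model of logical-IO throttling for one resource pool."""

    def __init__(self, io_limits, io_control_unit=6000):
        self.io_limits = io_limits      # counted IOs allowed per one-second window
        self.unit = io_control_unit     # raw reads/writes that make up one counted IO
        self.window_start = 0.0
        self.raw_ops = 0
        self.waiting = deque()          # (op_id, enqueue_time), oldest first

    def _roll_window(self, now):
        # Start a fresh one-second window (an assumed accounting period).
        if now - self.window_start >= 1.0:
            self.window_start, self.raw_ops = now, 0

    def _budget_left(self):
        return self.raw_ops // self.unit < self.io_limits

    def request(self, op_id, now):
        """Worker thread path: True if the read/write proceeds, False if queued."""
        self._roll_window(now)
        if not self._budget_left():
            self.waiting.append((op_id, now))
            return False
        self.raw_ops += 1
        return True

    def wake_due(self, now, min_wait):
        """Background thread path: wake requests that waited at least min_wait
        seconds, as long as the current window still has budget."""
        self._roll_window(now)
        woken = []
        while self.waiting and now - self.waiting[0][1] >= min_wait and self._budget_left():
            op_id, _ = self.waiting.popleft()
            self.raw_ops += 1
            woken.append(op_id)
        return woken
```

With io_limits=2 and a tiny io_control_unit of 3, the first six raw operations in a window proceed, later ones queue, and the background step releases them once a new window opens.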

GaussDB supports two IO control modes. The upper-limit mode controls IO by setting a fixed limit on the number of IOs triggered. The priority mode applies when disk utilization stays above 95% for a long time while none of the jobs has reached its upper limit; in this mode, each job's IO is throttled to a proportion of the IO it originally triggered, according to the job's priority. There are three priority levels: High, Medium, and Low.

Continuing the example above, we create resource pools for Company A's OLTP business, reporting business, and other low-priority businesses, and set their IO priorities to High, Medium, and Low respectively. The OLTP business can then use 50% of its IO requests to read or write data in the BufferPool, with relatively few requests entering the waiting queue; the reporting business can use 20%, with more requests waiting; and the other low-priority businesses can use 10%, with still more requests waiting. A background monitoring thread periodically traverses the IO waiting queue and wakes the requests that have waited long enough, so that they can read or write data in the BufferPool.
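In priority mode, each job keeps only a fraction of the IO it would otherwise issue. The sketch below uses the 50%, 20%, and 10% ratios from the example and the 95% disk-utilization condition from the text; the function name and the per-round request count are hypothetical, and "for a long time" is reduced to a boolean flag rather than modeled.

```python
# Ratios taken from the example above; integer percentages keep the math exact.
PRIORITY_PERCENT = {"High": 50, "Medium": 20, "Low": 10}
DISK_BUSY_THRESHOLD = 0.95


def split_io_requests(requests, priority, disk_util, sustained):
    """Hypothetical sketch: return (admitted, queued) for one scheduling round."""
    if not (sustained and disk_util > DISK_BUSY_THRESHOLD):
        # Priority mode only applies under sustained disk pressure.
        return requests, 0
    admitted = requests * PRIORITY_PERCENT[priority] // 100
    return admitted, requests - admitted
```

Out of 1000 requests on a saturated disk, a High job proceeds with 500, a Medium job with 200, and a Low job with 100; the rest wait for the background thread.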

Besides user-level IO control, GaussDB also supports the session-level GUC parameters io_limits and io_priority for controlling the IO of jobs running in a specific session.

Number of connections and concurrency management

GaussDB provides connection count control and concurrency control based on resource pools. When creating a resource pool, you can specify max_connections and max_concurrency to set the number of connections and the concurrency respectively. The following SQL configures connection and concurrency limits on the resource pools of Company A's three businesses from the previous example:

alter resource pool respool_tp with(max_connections=-1, max_concurrency = -1);
alter resource pool respool_report with(max_connections=200, max_concurrency = 100);
alter resource pool respool_other with(max_connections=100, max_concurrency = 50);

Once these statements execute successfully, they take effect immediately. The number of connections and the concurrency of Company A's OLTP business are unlimited, so it can use them as long as the cluster has resources. The reporting business may open at most 200 connections and the other low-priority businesses at most 100; when either business tries to exceed its limit, the GaussDB kernel intercepts the attempt and reports that not enough connections are available, and the connection fails. The reporting business may run at most 100 concurrent jobs and the other low-priority businesses at most 50; when either business initiates more jobs at once, the excess jobs enter a waiting queue, and GaussDB wakes them to continue execution only after existing jobs complete.
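These rules map onto a small admission controller. In this hypothetical Python sketch, -1 means unlimited as in the SQL above, a connection over the limit is rejected, and a job over the concurrency limit is queued and started when a running job finishes. FIFO wake-up order is an assumption; the article only says queued jobs resume after existing jobs complete.

```python
from collections import deque

UNLIMITED = -1  # same convention as max_connections=-1 / max_concurrency=-1 above


class PoolAdmission:
    """Hypothetical per-pool admission control for connections and running jobs."""

    def __init__(self, max_connections, max_concurrency):
        self.max_connections = max_connections
        self.max_concurrency = max_concurrency
        self.connections = 0
        self.running = 0
        self.queue = deque()  # jobs waiting for a concurrency slot, FIFO

    @staticmethod
    def _has_room(current, limit):
        return limit == UNLIMITED or current < limit

    def connect(self):
        """False models the 'not enough connections' error: the connection fails."""
        if not self._has_room(self.connections, self.max_connections):
            return False
        self.connections += 1
        return True

    def submit(self, job):
        """True if the job starts now, False if it joins the waiting queue."""
        if self._has_room(self.running, self.max_concurrency):
            self.running += 1
            return True
        self.queue.append(job)
        return False

    def finish(self):
        """A running job completed; hand its slot to the oldest waiting job, if any."""
        self.running -= 1
        if self.queue:
            self.running += 1
            return self.queue.popleft()
        return None
```

Configured like respool_report (200 connections, 100 concurrent jobs), the 201st connection would be refused and the 101st simultaneous job would wait.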

Storage space management and control

Storage space management and control limits the space quota each user can use, so that excessive storage use by a single user cannot block the entire database business. GaussDB controls storage resources through the storage space sizes specified when creating a user.

Storage space resources are divided into three types: permanent tablespace (Perm Space), temporary tablespace (Temp Space), and operator spill space (Spill Space), which holds data that operators spill to disk.

The following SQL sets disk space quotas for the users of Company A's three businesses from the previous example:

alter user tp_user PERM SPACE '200G' TEMP SPACE '20G' SPILL SPACE '20G';
alter user report_user PERM SPACE '100G' TEMP SPACE '10G' SPILL SPACE '10G';
alter user other_user PERM SPACE '100G' TEMP SPACE '10G' SPILL SPACE '10G';

Storage space control applies to both group users and business users. If the group user that a business user belongs to has a space limit, the business user's space is also bounded by that group limit. Once a storage space size is specified, every write the user performs on a DN increases the user's used space, and every deletion decreases it. The CN periodically collects the used space from the DNs, computes the user's total used space, and checks it against the limit. When the limit is exceeded, the running write job (insert/create table as/copy) is canceled, and subsequent write jobs report an error and exit.
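The DN counters and the periodic CN check can be sketched as below. The SpaceQuotaTracker class is hypothetical; it assumes that a group user's usage includes the usage of its business users, that deletions are reported as negative sizes, and that the check uses only the CN's last refreshed totals, so a write can overshoot the quota slightly before it is caught.

```python
from collections import defaultdict


class SpaceQuotaTracker:
    """Hypothetical CN-side view of per-user space usage, refreshed from DN counters."""

    def __init__(self):
        self.limits = {}                 # user -> limit in bytes
        self.group_of = {}               # business user -> group user
        self.dn_used = defaultdict(int)  # (dn, user) -> bytes used on that DN
        self.total = {}                  # user -> cluster-wide bytes at last refresh

    def set_limit(self, user, limit, group=None):
        self.limits[user] = limit
        if group is not None:
            self.group_of[user] = group

    def on_dn_write(self, dn, user, nbytes):
        # Writes grow the user's usage on that DN; deletions pass a negative size.
        self.dn_used[(dn, user)] += nbytes

    def refresh(self):
        """Periodic CN step: sum DN counters; a group user also absorbs its business users."""
        totals = defaultdict(int)
        for (_dn, user), used in self.dn_used.items():
            totals[user] += used
            group = self.group_of.get(user)
            if group is not None:
                totals[group] += used
        self.total = dict(totals)

    def write_allowed(self, user):
        """False means the running write (insert/create table as/copy) is canceled."""
        for who in (user, self.group_of.get(user)):
            if who is None or who not in self.limits:
                continue
            if self.total.get(who, 0) > self.limits[who]:
                return False
        return True
```

Because the CN works from periodic snapshots, enforcement here is eventually consistent rather than exact per write.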

Feature demonstration

Here we briefly demonstrate the effect of CPU control, since CPU has the greatest impact on business workloads.

Create two resource pools with 20% and 60% of the CPU respectively, run workloads as two users bound to those pools, and observe the actual CPU usage.

1. Create a control group:

gs_cgroup -c -S class1 -s 20;
gs_cgroup -c -S class2 -s 60;

2. Create a resource pool:

CREATE RESOURCE POOL xuuer_pool with(control_group = "class1");
CREATE RESOURCE POOL xyuser1_pool with(control_group = "class2");

3. Create users bound to the resource pools:

create role user1 RESOURCE POOL 'xuuer_pool';
create role user2 RESOURCE POOL 'xyuser1_pool';

4. Observe the system CPU status with top. Fine-grained resource management and control also provides the gs_wlm_respool_cpu_info function for observing the real-time CPU status of each resource pool.

[Figure: System CPU in top and resource pool CPU monitoring results during the test (Slide 5)]

As shown in the figure above, the system CPU is initially idle, and the resource pool CPU monitoring view also shows 0 usage. When user1 starts running its workload, part of the system CPU becomes occupied, and querying the resource pool CPU monitoring view shows that user1 can use 80% of the CPU. When user2 then starts running its workload as well, the system CPU becomes busy, and querying the resource pool CPU monitoring function shows user1's CPU usage starting to decrease while user2's starts to increase.

Plotting the CPU usage of the two users gives the curves shown below. The CPU usage of user1 and user2 eventually stabilizes at a ratio of 1:3, consistent with the 20% and 60% settings of the CGroup control groups behind their resource pools, which shows that CPU control is working.

[Figure: CPU usage curves of user1 and user2 (Slide 6)]
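The expected steady state follows from proportional sharing: when only these two groups compete, each gets its share divided by the sum of the busy groups' shares. The short Python check below works that arithmetic with exact fractions; the function name is hypothetical, and it assumes the other control groups stay idle during the test.

```python
from fractions import Fraction


def contended_split(shares):
    """Expected CPU split among busy control groups: each gets share / sum of busy shares.
    Assumes all listed groups are saturated and no other group is competing."""
    total = sum(shares.values())
    return {name: Fraction(share, total) for name, share in shares.items()}


split = contended_split({"user1": 20, "user2": 60})  # class1 = 20%, class2 = 60%
```

This gives user1 1/4 and user2 3/4 of the contended CPU, a 1:3 ratio between user1 and user2.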

Summary

The fine-grained resource management and control feature currently supports both centralized and distributed deployments. In a distributed deployment, computing resources are controlled independently by each node for its own resources, while storage resources are managed as a whole at the cluster level.

Fine-grained resource management and control serves as the basis for multi-tenant resource isolation. It enables precise division and control of resources and prevents the cluster from becoming unserviceable because of resource shortages under high load. The feature suits scenarios where data isolation is not a concern but different businesses need resource isolation. If you need both resource isolation and data isolation, stay tuned for the multi-tenant database feature we will share later!

Click to follow and learn about Huawei Cloud’s new technologies as soon as possible~