AntDB 3.1: use and effect of several new performance parameters

Compared with version 2.2, AntDB 3.1 improves performance considerably in three ways:

1. Parallel query is supported (inherited from the parallel query feature introduced in PostgreSQL 9.6);

2. The execution plan is optimized: as much computation as possible is pushed down to the datanodes and the results are then aggregated on the coordinator, instead of pulling the data up to the coordinator for computation as in 2.2;

3. Data can be reduced between datanodes, and between datanodes and coordinators. When the data to be queried is unevenly distributed, it is reduced to one node for computation to maximize query efficiency.

 

The parameters below are the most important ones for performance. Their use is illustrated with examples.

enable_cluster_plan

Compared with 2.2, version 3.1 contains many performance optimizations. Turning this switch on makes queries use the 3.1 execution plan, which greatly improves performance.

The same query statement gets different execution plans with the switch on and off. With the switch on, the query time is much shorter.

pgxc_enable_remote_query = off

Switch on:

Switch off:

max_parallel_workers_per_gather

This parameter sets the maximum number of worker processes that a single Gather node may start.

max_worker_processes

This parameter sets the maximum number of worker processes that each node may run at the same time.

Together, these two parameters determine how many workers are started on the datanodes and the coordinator, and the number of workers in turn affects query efficiency.

In addition, the execution plan is generated on the coordinator, so it is recommended to set these parameters to the same values on the coordinator and the datanodes. Otherwise, the value that finally takes effect on a datanode is the smaller of the two.
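The rule described above can be written down as a small sketch. It assumes the simplified model suggested by the examples in this article: the number of workers started is the smaller of the two parameters, and a datanode ends up with the smaller of its own setting and the coordinator's. The function names are hypothetical and not part of AntDB. In a real cluster the planner may still choose fewer workers, for example for small tables or when other background processes occupy worker slots.

```python
def effective_workers(max_parallel_workers_per_gather: int,
                      max_worker_processes: int) -> int:
    """Upper bound on workers started for one Gather node.

    Simplified model (an assumption, not AntDB source code): the
    per-Gather limit cannot exceed the per-node process limit.
    """
    return max(0, min(max_parallel_workers_per_gather, max_worker_processes))


def value_on_datanode(coordinator_setting: int, datanode_setting: int) -> int:
    """Value that takes effect on a datanode when the settings differ.

    The article states that the smaller of the two values wins.
    """
    return min(coordinator_setting, datanode_setting)


if __name__ == "__main__":
    # The three parameter pairs used later in this article.
    for per_gather, max_procs in [(2, 3), (4, 3), (4, 4)]:
        print(per_gather, max_procs, effective_workers(per_gather, max_procs))
```

Under this model the three pairs used in the tests below give 2, 3 and 4 workers, which matches the worker counts reported for each configuration.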

 

The following example illustrates how the settings of these two parameters affect the execution plan and query efficiency.

 

postgres=# create table aa(a1 int, a2 int);

CREATE TABLE

postgres=# copy aa from '/home/mass/data/big_ranint_int_10million.sql' with delimiter as ',';

COPY 10000000

postgres=# analyze aa;

ANALYZE

postgres=# explain(verbose, analyze) select count(*) from aa;

We now change the parameters that control the number of workers and test how query efficiency relates to the number of workers and the number of datanodes.
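The test matrix that follows (three parameter pairs, each run on 2, 3 and 4 datanodes) can be enumerated with a short sketch. It assumes that every datanode starts the same number of workers, namely the smaller of the two parameters, so the cluster-wide total is simply datanodes times workers per node. That total is only an estimate under this assumption, not a measured value, and the names used here are hypothetical.

```python
from itertools import product

# (max_parallel_workers_per_gather, max_worker_processes) pairs from the tests
SETTINGS = [(2, 3), (4, 3), (4, 4)]
DATANODE_COUNTS = [2, 3, 4]


def workers_per_node(per_gather: int, max_procs: int) -> int:
    # Assumption: the smaller of the two limits takes effect.
    return min(per_gather, max_procs)


def test_matrix():
    """Return one row per test case:
    (per_gather, max_procs, datanodes, workers per node, estimated total)."""
    rows = []
    for (per_gather, max_procs), datanodes in product(SETTINGS, DATANODE_COUNTS):
        per_node = workers_per_node(per_gather, max_procs)
        # Assumption: every datanode starts the same number of workers.
        rows.append((per_gather, max_procs, datanodes, per_node,
                     datanodes * per_node))
    return rows


if __name__ == "__main__":
    for row in test_matrix():
        print(row)
```

Listing the cases this way makes it easy to see that both adding datanodes and raising the worker limits increase the total number of processes working on the query.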

max_parallel_workers_per_gather = 2

max_worker_processes = 3

Result: 2 workers are started

2 datanodes:

3 datanodes:

4 datanodes:

max_parallel_workers_per_gather = 4

max_worker_processes = 3

Result: 3 workers are started

 

2 datanodes:

3 datanodes:

4 datanodes:

max_parallel_workers_per_gather = 4

max_worker_processes = 4

Result: 4 workers are started

2 datanodes:

3 datanodes:

4 datanodes:

As can be seen from the figure below, as the number of datanodes increases, the number of workers increases, and the query time decreases.

 

For reference:

QQ exchange group: 496464280

Source address: http://github.com/ADBSQL

PostgreSQL enthusiasts are welcome to try it out and join the discussion.
