Compared with version 2.2, AntDB 3.1 delivers a substantial performance improvement in three areas:
1. Parallel query is supported (inherited from the parallel query feature introduced in PostgreSQL 9.6);
2. The execution plan is optimized: as much computation as possible is pushed down to the datanodes and only the results are aggregated on the coordinator, instead of pulling the data up to the coordinator for computation as in 2.2;
3. Data reduce is supported between datanodes, and between datanodes and coordinators. When the data to be queried is unevenly distributed, it is reduced to one node for computation to maximize query efficiency.
The important performance-related parameters are described below, with examples of how to use them.
enable_cluster_plan
Version 3.1 contains many performance optimizations over 2.2. Turning on this switch makes queries use the 3.1 execution plan, which greatly improves performance.
The same query produces different execution plans with the switch on and off, as shown below. With the switch on, the query time is much shorter.
pgxc_enable_remote_query = off
Switch turned on:
Switch turned off:
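To compare the two plans on your own cluster, a session along the following lines could be used. This is only a minimal sketch: it assumes that enable_cluster_plan can be changed per session with SET (on a build where that is not the case, it would have to be set in the node configuration instead), and it borrows the table aa and the count(*) query from the example later in this article.

```sql
-- Assumption: enable_cluster_plan is settable at session level.
-- Plan and timing with the 3.1 cluster plan enabled.
SET enable_cluster_plan = on;
EXPLAIN (VERBOSE, ANALYZE) SELECT count(*) FROM aa;

-- Same query with the switch turned off, for comparison.
SET enable_cluster_plan = off;
EXPLAIN (VERBOSE, ANALYZE) SELECT count(*) FROM aa;
```

Comparing the execution time reported at the end of each EXPLAIN output shows how much the switch affects this particular query.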
max_parallel_workers_per_gather
This parameter sets the maximum number of worker processes that a single Gather node is allowed to start.
max_worker_processes
This parameter sets the maximum number of worker processes that each node is allowed to run at the same time.
Together, these two parameters determine how many workers are started on the datanodes and the coordinator, and the number of workers in turn affects query efficiency.
In addition, because the execution plan is generated on the coordinator, it is recommended to set these parameters to the same values on the coordinator and the datanodes. Otherwise, the value that takes effect on a datanode is the smaller of the two settings.
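One way to check what a node is currently using is to query the settings from a psql session connected to it. The sketch below relies only on standard PostgreSQL commands. In stock PostgreSQL, max_worker_processes can only be changed at server start, while max_parallel_workers_per_gather can also be changed per session. Whether AntDB forwards a session-level SET from the coordinator to the datanodes is not covered in this article, so treat the last statement as an assumption to verify on your cluster.

```sql
-- Show the current limits on the node this session is connected to.
SHOW max_parallel_workers_per_gather;
SHOW max_worker_processes;

-- Both settings side by side, with the context in which each may be changed
-- ('postmaster' means a restart is required, 'user' means SET works).
SELECT name, setting, context
FROM pg_settings
WHERE name IN ('max_parallel_workers_per_gather', 'max_worker_processes');

-- Session-level change of the per-Gather limit (standard PostgreSQL behaviour).
SET max_parallel_workers_per_gather = 2;
```

Running the same checks against the coordinator and each datanode makes it easy to spot a mismatch between the nodes.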
The following examples illustrate the impact of these two parameter configurations on the execution plan and efficiency.
postgres=# create table aa(a1 int, a2 int);
CREATE TABLE
postgres=# copy aa from '/home/mass/data/big_ranint_int_10million.sql' with delimiter as ',';
COPY 10000000
postgres=# analyze aa;
ANALYZE
postgres=# explain(verbose, analyze) select count(*) from aa;
Next, we vary these parameters to control the number of workers and test how query efficiency relates to the number of workers and the number of datanodes.
max_parallel_workers_per_gather = 2
max_worker_processes = 3
Result: 2 workers are started
2 datanodes:
3 datanodes:
4 datanodes:
max_parallel_workers_per_gather = 4
max_worker_processes = 3
Result: 3 workers are started
2 datanodes:
3 datanodes:
4 datanodes:
max_parallel_workers_per_gather = 4
max_worker_processes = 4
Result: 4 workers are started
2 datanodes:
3 datanodes:
4 datanodes:
As can be seen from the figure below, the query time decreases as the number of datanodes and the number of workers increase.
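The worker counts reported for the three configurations above are consistent with a simple rule of thumb: the number of workers started is the smaller of the two settings. The query below is only a worked illustration of that arithmetic, using the values from this article. It is an inference from the three test cases, not a description of how the planner decides, and in practice the planner may start fewer workers depending on table size and free worker slots.

```sql
-- Each row holds the two settings used in one test;
-- workers_started is the smaller of the two.
SELECT per_gather,
       worker_processes,
       least(per_gather, worker_processes) AS workers_started
FROM (VALUES (2, 3), (4, 3), (4, 4)) AS t(per_gather, worker_processes);
```

For the three rows this yields 2, 3 and 4, matching the worker counts observed in the tests.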
For more information:
QQ discussion group: 496464280
Source code: http://github.com/ADBSQL
PostgreSQL enthusiasts are welcome to try it out and join the discussion.