Postgres Conference 2024 | A detailed look at Tuoshupai's technical talk

From April 17 to 19, Postgres Conference 2024, one of the world's largest PostgreSQL conferences, was held in San Jose, USA. The conference had four tracks: Ops, Dev, Essentials and Google Cloud. Sessions covered the PostgreSQL kernel, database management and applications, and user case studies and experience, with senior speakers from Google, AWS, EDB, Yugabyte, DBeaver and other companies. On the strength of its standing in the international technology community, Tuoshupai was invited to take part as a conference sponsor and to deliver a technical talk.

At the conference, PieCloudDB technical expert Richard Guo, a new PostgreSQL Contributor, was invited to deliver the technical talk "A high-level introduction to the query planner in PostgreSQL". Drawing on his experience building the PieCloudDB Database optimizer, he explained from a developer's perspective how the PostgreSQL optimizer works and described in detail how a query tree is converted into a plan tree. The talk was well received by the audience and led to in-depth discussion.

In a database management system (DBMS), SQL query processing is a complex and critical process. For PostgreSQL, a SQL statement needs to go through the following five main steps from receipt to execution:

  • Parser: checks the statement for syntax errors and generates a parse tree;
  • Analyzer: performs semantic analysis on the parse tree and generates a query tree;
  • Rewriter: rewrites the query tree according to the rules stored in the system;
  • Planner/optimizer: generates, from the query tree, the plan tree it estimates to be cheapest to execute;
  • Executor: accesses tables and indexes in the order given by the plan tree and executes the query.

The same query can generally be executed in several different ways. As a core component of the database, the query optimizer's job is to find the lowest-cost plan among the candidate execution strategies and convert it into an executable plan tree.
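To make "lowest cost" concrete, the following is a minimal sketch that compares two candidate access paths for one table. The constants match PostgreSQL's default cost settings (seq_page_cost, random_page_cost, cpu_tuple_cost, cpu_index_tuple_cost, cpu_operator_cost), but the formulas and the function names are simplified assumptions made for illustration, not the planner's actual cost model.

```python
# Simplified cost sketch: compare a sequential scan with an index scan.
# The constants mirror PostgreSQL's default cost settings; the formulas are
# deliberately simplified assumptions and are NOT the planner's real ones.
SEQ_PAGE_COST = 1.0
RANDOM_PAGE_COST = 4.0
CPU_TUPLE_COST = 0.01
CPU_INDEX_TUPLE_COST = 0.005
CPU_OPERATOR_COST = 0.0025


def seq_scan_cost(pages, rows):
    """Read every page sequentially and evaluate one filter per row."""
    return pages * SEQ_PAGE_COST + rows * (CPU_TUPLE_COST + CPU_OPERATOR_COST)


def index_scan_cost(rows, selectivity):
    """Assume (pessimistically) one random heap page fetch per matching row."""
    matched = rows * selectivity
    return matched * (RANDOM_PAGE_COST + CPU_INDEX_TUPLE_COST + CPU_TUPLE_COST)


def cheapest_scan(pages, rows, selectivity):
    """Return (name, cost) of the cheaper of the two candidate paths."""
    candidates = {
        "SeqScan": seq_scan_cost(pages, rows),
        "IndexScan": index_scan_cost(rows, selectivity),
    }
    name = min(candidates, key=candidates.get)
    return name, candidates[name]


if __name__ == "__main__":
    # A highly selective filter favors the index; a weak one favors the scan.
    print(cheapest_scan(pages=1000, rows=100_000, selectivity=0.001))
    print(cheapest_scan(pages=1000, rows=100_000, selectivity=0.5))
```

The point of the sketch is only that the same query can flip between plans as the estimated selectivity changes, which is why the choice is made by cost rather than by fixed rules.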

The rest of this article focuses on the planning/optimization phase of PostgreSQL query processing, which is the most important and most complex phase of the whole process. It is generally divided into four stages: preprocessing, scan/join optimization, optimization beyond scan/join, and post-processing.

1. Preprocessing stage

Early in the preprocessing stage, the query is simplified as far as possible by folding constant expressions (functions, Boolean expressions, CASE, etc.) and inlining simple SQL functions. The join tree is simplified as well: IN, EXISTS and similar subqueries are converted into semi-joins, subqueries are pulled up into the parent query, and outer joins are reduced where possible (converted to inner joins or anti-joins).
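Constant folding can be illustrated with a small sketch. The tuple-based expression format and the fold function below are hypothetical and exist only for this example; PostgreSQL works on its own node trees and handles many more cases (functions, CASE, NULL handling for other operators).

```python
import operator

# Hypothetical mini expression format (not PostgreSQL's node types):
#   ("const", value) | ("var", name)
#   ("op", symbol, left, right) | ("and", left, right)
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "=": operator.eq, "<": operator.lt, ">": operator.gt}


def is_bool_const(expr, value):
    """True if expr is the Boolean constant `value` (identity check avoids 0/1)."""
    return expr[0] == "const" and expr[1] is value


def fold(expr):
    """Recursively replace constant subexpressions with their values."""
    kind = expr[0]
    if kind in ("const", "var"):
        return expr
    if kind == "op":
        _, symbol, left, right = expr
        left, right = fold(left), fold(right)
        if left[0] == "const" and right[0] == "const":
            return ("const", OPS[symbol](left[1], right[1]))
        return ("op", symbol, left, right)
    if kind == "and":
        left, right = fold(expr[1]), fold(expr[2])
        if is_bool_const(left, False) or is_bool_const(right, False):
            return ("const", False)      # x AND FALSE is FALSE
        if is_bool_const(left, True):
            return right                 # TRUE AND x is x
        if is_bool_const(right, True):
            return left                  # x AND TRUE is x
        return ("and", left, right)
    raise ValueError(f"unknown expression kind: {kind!r}")


if __name__ == "__main__":
    # WHERE x > 2 + 3 AND 1 = 1   simplifies to   WHERE x > 5
    where = ("and",
             ("op", ">", ("var", "x"), ("op", "+", ("const", 2), ("const", 3))),
             ("op", "=", ("const", 1), ("const", 1)))
    print(fold(where))
```

Doing this once at planning time means the executor never has to re-evaluate `2 + 3` or `1 = 1` for every row.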

Later in the preprocessing stage, several further steps are carried out, including:

  • Distribute WHERE and JOIN/ON constraints
  • Build equivalence classes
  • Gather information about join order restrictions
  • Remove useless joins
  • ...
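As an illustration of the "build equivalence classes" step, the sketch below groups columns that are known to be equal, so that `a.x = b.y` and `b.y = c.z` also imply `a.x = c.z`. It uses a plain union-find structure; the function name and the string column labels are assumptions made for this example and do not reflect PostgreSQL's internal data structures.

```python
def build_equivalence_classes(equalities):
    """Group expressions connected by equality clauses.

    equalities: iterable of (left, right) pairs, e.g. ("a.x", "b.y").
    Returns a list of frozensets, one per class with at least two members.
    """
    parent = {}

    def find(x):
        # Find the class representative, shortening the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for left, right in equalities:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_left] = root_right   # merge the two classes

    classes = {}
    for member in list(parent):
        classes.setdefault(find(member), set()).add(member)
    return [frozenset(c) for c in classes.values() if len(c) > 1]


if __name__ == "__main__":
    clauses = [("a.x", "b.y"), ("b.y", "c.z"), ("d.k", "e.k")]
    for eq_class in build_equivalence_classes(clauses):
        print(sorted(eq_class))
```

Once columns are grouped this way, a planner can derive join clauses that were not written explicitly in the query, which widens the set of join orders it can consider.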

2. Scan/join optimization stage

The scan/join optimization stage mainly handles the FROM and WHERE parts of the query, and also takes ORDER BY information into account. This stage is entirely cost-driven.

It first generates scan paths for each base table and estimates their costs. It then searches the join order space, using dynamic programming or, for queries that join many tables, a genetic algorithm (GEQO), and generates join paths. The search must also respect the join order restrictions imposed by outer joins.

With dynamic programming, the join search proceeds as follows:

  • First generate scan paths for each base table
  • Generate join paths for all possible joins of two tables
  • Generate join paths for all possible joins of three tables
  • Generate join paths for all possible joins of four tables
  • ...
  • Continue until all base tables have been joined together
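The level-by-level search described above can be sketched as follows. This is a toy model under stated assumptions: the cost of a plan is simply the sum of scan sizes and intermediate result sizes, row estimates come from hypothetical per-pair selectivities, and outer-join restrictions are ignored. The function name dp_join_search and its parameters are made up for this example.

```python
from itertools import combinations


def dp_join_search(rows, selectivity):
    """Toy dynamic-programming join search.

    rows:        {table: estimated row count}
    selectivity: {frozenset({a, b}): fraction of row pairs that match}
    Returns (cost, plan, out_rows) for the cheapest way to join all tables,
    where plan is a nested tuple such as ("c", ("a", "b")).
    """
    tables = sorted(rows)
    # Level 1: one "scan path" per base table; its cost is its row count.
    best = {frozenset([t]): (float(rows[t]), t, float(rows[t])) for t in tables}

    def joined_rows(left, right):
        # |L| * |R| scaled by every join clause that spans the two sides.
        estimate = best[left][2] * best[right][2]
        for a in left:
            for b in right:
                estimate *= selectivity.get(frozenset((a, b)), 1.0)
        return estimate

    # Level n: build every n-table set from two smaller, already-solved sets.
    for level in range(2, len(tables) + 1):
        for combo in combinations(tables, level):
            target = frozenset(combo)
            for left_size in range(1, level // 2 + 1):
                for left_combo in combinations(combo, left_size):
                    left = frozenset(left_combo)
                    right = target - left
                    out = joined_rows(left, right)
                    cost = best[left][0] + best[right][0] + out
                    if target not in best or cost < best[target][0]:
                        best[target] = (cost, (best[left][1], best[right][1]), out)
    return best[frozenset(tables)]


if __name__ == "__main__":
    rows = {"a": 1000, "b": 10, "c": 100_000}
    selectivity = {frozenset(("a", "b")): 0.01, frozenset(("b", "c")): 0.001}
    cost, plan, out_rows = dp_join_search(rows, selectivity)
    print(plan, round(cost), round(out_rows))
```

Because each level reuses the best result already stored for every smaller set of tables, no sub-join is ever planned twice, which is what makes this cheaper than enumerating complete join orders one by one.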

However, this process is very expensive. In theory, n tables can be joined in n! different orders, so enumerating every possible join order is unrealistic. Heuristics are therefore used to shrink the search space: for example, avoid joining tables that share no join condition, and break a large problem into several smaller subproblems to reduce complexity.
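A small sketch shows how quickly n! grows and how much the "no join condition, no join" heuristic prunes. It counts left-deep join orders only, by brute force, and the function name count_orders is hypothetical; it is meant to illustrate the size of the search space, not how PostgreSQL enumerates it.

```python
from itertools import permutations
from math import factorial


def count_orders(tables, join_clauses):
    """Count left-deep join orders: all of them, and those without cross joins.

    join_clauses: iterable of (a, b) pairs that share a join condition.
    Returns (total, without_cross_joins).
    """
    clauses = {frozenset(pair) for pair in join_clauses}
    total = 0
    without_cross_joins = 0
    for order in permutations(tables):
        total += 1
        joined = {order[0]}
        usable = True
        for table in order[1:]:
            # The next table must share a clause with something already joined.
            if not any(frozenset((table, other)) in clauses for other in joined):
                usable = False
                break
            joined.add(table)
        if usable:
            without_cross_joins += 1
    return total, without_cross_joins


if __name__ == "__main__":
    # A chain query: a-b, b-c, c-d
    print(count_orders(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")]))
    # Growth of n! alone
    print([factorial(n) for n in (4, 8, 12)])
```

For the four-table chain, only 8 of the 24 orders avoid a cross join, and the gap widens rapidly as more tables are added.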

3. Optimization stage beyond scan/join

At this stage, the optimizer first handles GROUP BY, aggregation, window functions and DISTINCT, then set operations (UNION/INTERSECT/EXCEPT), and finally ORDER BY. Each of these steps generates one or more paths. The optimizer selects among these paths by cost and adds LockRows, Limit and ModifyTable nodes on top of the surviving paths.

4. Post-processing stage

At this stage, the optimizer needs to convert the least-cost path into a plan tree and adjust some details in the plan tree:

  • Flatten the range table of a subquery
  • Rewrite variables in upper plan nodes as OUTER_VAR or INNER_VAR references that point to the outputs of their subplans
  • Delete unnecessary SubqueryScan, Append, MergeAppend and other nodes
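The OUTER_VAR/INNER_VAR adjustment can be pictured with the sketch below: each column a join node outputs is replaced by a reference to "which child, which position". The function fix_join_references, the string column names and the 1-based positions are assumptions made for illustration; PostgreSQL performs this on Var nodes and target lists rather than on strings.

```python
OUTER_VAR = "OUTER_VAR"   # refers to the output of the outer (left) subplan
INNER_VAR = "INNER_VAR"   # refers to the output of the inner (right) subplan


def fix_join_references(join_targets, outer_output, inner_output):
    """Rewrite a join node's output columns as (side, position) references.

    join_targets: columns the join node must produce, e.g. ["a.id", "b.name"]
    outer_output: columns produced by the outer subplan, in order
    inner_output: columns produced by the inner subplan, in order
    Positions are 1-based (an assumption of this sketch).
    """
    fixed = []
    for column in join_targets:
        if column in outer_output:
            fixed.append((OUTER_VAR, outer_output.index(column) + 1))
        elif column in inner_output:
            fixed.append((INNER_VAR, inner_output.index(column) + 1))
        else:
            raise LookupError(f"{column} is not produced by either subplan")
    return fixed


if __name__ == "__main__":
    print(fix_join_references(
        ["a.id", "b.name"],
        outer_output=["a.id", "a.val"],
        inner_output=["b.id", "b.name"],
    ))
```

After this rewrite, an upper node no longer needs to know which table a value came from; it only reads a given slot of the row handed up by one of its children.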

Once this step is complete, the optimizer has the full plan tree and hands it to the executor, which runs it and returns the query results.

As a high-tech innovation enterprise based in China, Tuoshupai has in recent years been deeply involved in the international open source technology ecosystem through code contributions, talks, conference sponsorship and participation, ecosystem partnerships and other activities. Going forward, Tuoshupai will continue to broaden its international outlook, take an active part in global technological innovation, expand its international influence, and build itself into an international, technology-driven enterprise.


Origin my.oschina.net/u/5944765/blog/11059181