This article is shared from Huawei Cloud Community " GaussDB (DWS) Vectorized Execution Engine Detailed Explanation ", author: yd_212508532.
Preface
- Applicable version: [Baseline function]
Most traditional row execution engines adopt a one-tuple-at-a-time execution model. Under this model, much of the execution time is spent not on processing data but on traversing the execution tree, so effective CPU utilization is low, and the huge number of function calls in OLAP scenarios incurs a large overhead. To solve this problem, GaussDB (DWS) added a vectorization engine. The vectorization engine processes one batch of tuples at a time, which greatly reduces the cost of traversing execution nodes. It also pairs naturally with column storage, making it easy for the underlying scan nodes to load vectorized column data. Column storage plus the vectorized execution engine is one of the golden keys that opens the door to OLAP performance!
About row storage and column storage tables
A row-store table stores tuples in pages row by row. It is mostly used in TP scenarios where data is updated frequently, there are many inserts, updates and deletes, and query results involve most columns of the table.
A column-store table stores data column by column, with each column kept in its own file. It is mostly used in AP scenarios, for the following reasons (a table-creation sketch follows the list):
- When a wide table is queried on only a few columns, only those columns are read, reducing I/O.
- Data within a column is homogeneous, which improves the compression ratio.
- Operating on batches of column data gives a higher CPU cache hit rate.
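As a minimal sketch of the two orientations (the table and column names here are hypothetical, not taken from the article):

```sql
-- Row-store table: suited to TP workloads with frequent updates
-- and queries that read most columns of a row.
CREATE TABLE sales_row (
    id   int,
    x    numeric,
    y    numeric,
    note text
) WITH (ORIENTATION = ROW);

-- Column-store table: suited to AP workloads that scan a few columns
-- of a wide table; each column is stored (and compressed) separately.
CREATE TABLE sales_col (
    id   int,
    x    numeric,
    y    numeric,
    note text
) WITH (ORIENTATION = COLUMN);
```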
Execution framework
The executor is the hub between the optimizer and the storage engine. It takes the execution plan tree generated by the optimizer as input, fetches data from the storage engine, and applies the execution operators specified by the plan to process the data. Execution follows the Pipeline model: the row executor handles one tuple at a time, while the column executor handles one batch at a time. The upper layer drives the lower layer, so data flows up the execution tree through the operators that perform the various processing steps. The figure below shows the top-down control flow and the bottom-up data flow.
The execution process of the executor can be divided into three steps:
- Executor initialization: build the executor's global state (EState) and recursively traverse the plan tree, initializing the execution state (PlanState) of each node.
- Executor run: the row engine and the vectorization engine have separate entry points. Starting from the root of the plan tree, execution recurses down to the leaf nodes to fetch a tuple/batch; after the node operators process it layer by layer, a result tuple/batch is returned, and this repeats until no more tuples/batches are produced.
- Executor cleanup: release the executor's global state and clean up the execution state of each plan node.
Column executor
The problem with the row executor is that most CPU time goes into traversing the plan tree rather than actually processing data, so effective CPU utilization is low. In addition, column-store tables need a matching vectorization engine to truly realize their performance advantage in OLAP scenarios. The basic idea behind the column executor is therefore to process one column of data at a time.
Like the row executor, the vectorized execution engine follows the Pipeline model, but each operator processes and passes on one batch at a time (1,000 rows of data), which improves the CPU cache hit rate and reduces I/O reads. The data-flow structure of the column executor, VectorBatch, is shown in the figure below.
Mixing rows and columns: Adapter operator
Some operations on column-store tables are not supported by the vectorized execution engine, for example string_to_array, listagg, and string_agg.
GaussDB can automatically switch between the row and column engines by inserting adapter operators (RowToVec/VecToRow) into the plan, as sketched below.
If only a row engine were available, column data would usually have to be reassembled into row tuples before the engine could process it row by row; this tuple-deform step hurts the performance of queries on column-store data.
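Below is a hedged sketch of how such a plan can be inspected, using the hypothetical sales_col table from earlier; the exact node names printed by EXPLAIN (e.g. a Row Adapter / VecToRow node) may differ between GaussDB versions:

```sql
-- string_agg is not supported by the vectorized engine, so the planner
-- is expected to add a column-to-row adapter and let the row engine
-- evaluate the aggregate, while the scan itself stays vectorized.
EXPLAIN
SELECT id, string_agg(note, ',')
FROM sales_col
GROUP BY id;
```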
Vectorized execution engine performance
Comparing the row and column engines on the same expression, x*(1-y), the CStore Scan operator of the column engine takes about 85% less time than the Seq Scan operator of the row engine.
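A sketch of how such a comparison could be reproduced on the hypothetical sales_row/sales_col tables above; the 85% figure is the article's measurement, not something these statements guarantee:

```sql
-- Evaluate the same expression on both tables and compare the
-- per-operator times reported by EXPLAIN ANALYZE: the row plan uses
-- Seq Scan, the column plan uses CStore Scan with vectorized batches.
EXPLAIN ANALYZE SELECT sum(x * (1 - y)) FROM sales_row;
EXPLAIN ANALYZE SELECT sum(x * (1 - y)) FROM sales_col;
```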
The characteristics of vector computing are: computing multiple values at a time, fewer function calls and context switches, and full use of the CPU cache and vectorized instructions to improve performance.
Performance advantages of vectorized execution engines:
- One batch at a time: more data is read per call, reducing the number of I/O reads.
- Because each batch holds many records, the CPU cache hit rate increases.
- The number of function calls during Pipeline-mode execution is reduced.
- Paired with column-store tables, it reduces tuple deform, i.e. the overhead of reconstructing row tuples from column data.
Comparison of row and column executor operators
The execution operators of the vectorization engine are similar to those of the row engine and include control, scan, materialization and join operators. They are likewise represented as nodes, inherit from the row execution nodes, and are executed recursively. The main nodes are CStoreScan (sequential scan), CStoreIndexScan (index scan), CStoreIndexHeapScan (fetching tuples via a bitmap), VecMaterial (materialization), VecSort (sorting), VecHashJoin (vectorized hash join), and so on. These execution operators are introduced one by one below.
Scan operators
A scan operator scans the data in a table and, on each call, returns one tuple as input for the node above it; scan operators sit at the leaf nodes of the query plan tree. Besides tables, they can scan function result sets, the VALUES list, and subquery result sets. Common scan operators are listed in the table (example queries follow the table).
Operator (row-store / column-store) | Meaning | Typical scenario |
---|---|---|
SeqScan/ CStoreScan | sequential scan | The most basic scan operator, used to scan physical tables (sequential scan without index assistance) |
IndexScan/CStoreIndexScan | index scan | An index is created on the attributes involved in the selection criteria |
IndexOnlyScan/CStoreIndexOnlyScan | Return tuple directly from index | Index columns completely cover result set columns |
BitmapScan(BitmapIndexScan, BitmapHeapScan) / CStoreIndexHeapScan (CStoreIndexAnd, CStoreIndexOr,CStoreIndexCtidScan) | Use Bitmap to get tuples | BitmapIndexScan uses the index on the attribute to scan and returns the result as a bitmap; BitmapHeapScan obtains the tuple from the bitmap output by BitmapIndexScan |
TidScan | Get tuple by tuple tid | 1. WHERE conditions such as CTID = tid or CTID IN (tid1, tid2, …); 2. UPDATE/DELETE … WHERE CURRENT OF cursor |
SubqueryScan/VecSubqueryScan | subquery scan | Use another query plan tree (subplan) as the scan object to scan tuples |
FunctionScan | function scan | FROM function_name |
ValuesScan | VALUES list scan | Scan the collection of tuples given by a VALUES clause |
ForeignScan/VecForeignScan | External table scan | Query external table |
CteScan/VecCteScan | CTE table scan | Scan subqueries defined with WITH clause in a SELECT query |
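The queries below are a hedged sketch of statements that typically map to the scan operators above (the tables and index are the hypothetical ones from earlier; the operator actually chosen depends on statistics and cost):

```sql
-- Full scan of a column-store table: expected to use CStoreScan
-- (SeqScan on a row-store table).
SELECT count(*) FROM sales_col;

-- With an index on the filter column, a selective predicate is
-- expected to use CStoreIndexScan (IndexScan on a row-store table).
CREATE INDEX idx_sales_col_id ON sales_col (id);
SELECT x, y FROM sales_col WHERE id = 42;

-- Fetching a tuple by its tuple identifier on a row-store table: TidScan.
SELECT * FROM sales_row WHERE ctid = '(0,1)';
```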
Join operators
The join operator corresponds to the join operation in relational algebra. Taking t1 join t2 as an example, the main join types are inner join, left join, right join, full join, semi join and anti join; they are implemented with NestLoop, HashJoin and MergeJoin (example joins follow the table).
Operator (row-store / column-store) | Meaning | Typical scenario |
---|---|---|
NestLoop/VecNestLoop | Nested-loop join; a brute-force join that rescans the inner table for each outer row | Inner Join, Left Outer Join, Semi Join, Anti Join |
MergeJoin/VecMergeJoin | Merge join (ordered input): the inner and outer tables are sorted, the start and end positions are located, and the tuples are joined in one pass; equi-join only | Inner Join, Left Outer Join, Right Outer Join, Full Outer Join, Semi Join, Anti Join |
HashJoin/VecHashJoin | Hash join: a hash table is built on the hash value of the join column, so equal values fall into the same hash bucket; equi-join only | Inner Join, Left Outer Join, Right Outer Join, Full Outer Join, Semi Join, Anti Join |
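A sketch of joins that typically map to the operators above, on the hypothetical tables from earlier (the planner picks the actual method by cost):

```sql
-- Equi-join between two column-store inputs: a hash join (VecHashJoin)
-- is the common choice for large, unsorted inputs.
SELECT r.id, r.x, s.y
FROM sales_col r
JOIN sales_col s ON r.id = s.id;

-- Non-equi join condition: hash and merge join do not apply, so a
-- nested-loop join (VecNestLoop) is expected.
SELECT r.id, s.id
FROM sales_col r
JOIN sales_col s ON r.x > s.y;
```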
Materialization operators
Materialization operators are nodes that can cache tuples. During execution, many physical operators must first collect all their input tuples before they can operate (aggregate functions, sorting without index assistance, and so on), and they use materialization operators to cache those tuples (example statements follow the table).
Operator (row-store / column-store) | Meaning | Typical scenario |
---|---|---|
Material/VecMaterial | materialize | Cache child node results |
Sort/VecSort | Sort | ORDER BY clause; join, grouping and set operations; cooperates with Unique |
Group/VecGroup | Grouping operation | GROUP BY clause |
Agg/VecAggregation | Execute aggregate function | 1. Aggregation functions such as COUNT/SUM/AVG/MAX/MIN; 2. DISTINCT clause; 3. UNION to remove duplicates; 4. GROUP BY clause |
WindowAgg/VecWindowAgg | window function | WINDOW clause |
Unique/VecUnique | Deduplication (input already sorted by the lower node) | 1. DISTINCT clause; 2. UNION deduplication |
Hash | HashJoin helper node | Construct a hash table and cooperate with HashJoin |
SetOp/VecSetOp | Handling collection operations | INTERSECT/INTERSECT ALL, EXCEPT/EXCEPT ALL |
LockRows | Handling row-level locks | SELECT … FOR SHARE/UPDATE |
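A sketch of statements that typically exercise the materialization operators above (same hypothetical tables):

```sql
-- GROUP BY with aggregate functions: expected to use VecAggregation.
SELECT id, sum(x), avg(y) FROM sales_col GROUP BY id;

-- ORDER BY without a usable index: expected to use VecSort.
SELECT id, x FROM sales_col ORDER BY x DESC;

-- DISTINCT: expected to use VecUnique above sorted input
-- (or a hash-based aggregate, depending on the plan).
SELECT DISTINCT id FROM sales_col;

-- Window function: expected to use VecWindowAgg.
SELECT id, x, rank() OVER (PARTITION BY id ORDER BY x) FROM sales_col;
```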
Control operators
Control operators are nodes used to handle special situations and implement special execution flows (example statements follow the table).
Operator (row-store / column-store) | Meaning | Typical scenario |
---|---|---|
Result/VecResult | Direct computation | 1. The query contains no table scan; 2. An INSERT statement with a single VALUES clause; 3. When Append/MergeAppend is the plan root node (projection push-up) |
ModifyTable | INSERT/UPDATE/DELETE upper node | INSERT/UPDATE/DELETE |
Append/VecAppend | Appends the result sets of multiple subplans | 1. UNION (ALL); 2. Inheritance tables |
MergeAppend | Append with ordered inputs | 1. UNION (ALL); 2. Inheritance tables |
RecursiveUnion | Handling UNION subqueries defined recursively in WITH clause | WITH RECURSIVE … SELECT … statement |
BitmapAnd | Bitmap logical AND operation | BitmapScan for multidimensional index scanning |
BitmapOr | Bitmap logical OR operation | BitmapScan for multidimensional index scanning |
Limit/VecLimit | Handling LIMIT clauses | OFFSET … LIMIT … |
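A sketch of statements that typically map to the control operators above:

```sql
-- UNION ALL of two subplans: expected to use VecAppend on column input.
SELECT id FROM sales_col WHERE x > 0
UNION ALL
SELECT id FROM sales_col WHERE y > 0;

-- OFFSET ... LIMIT ...: expected to add a VecLimit node on top.
SELECT id, x FROM sales_col ORDER BY x LIMIT 10 OFFSET 20;

-- Recursively defined WITH subquery: handled by RecursiveUnion.
WITH RECURSIVE nums(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM nums WHERE n < 10
)
SELECT * FROM nums;
```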
Other operators
Other operators include the Stream operator and operators such as RemoteQuery (a sketch follows the table).
Operator (row-store / column-store) | Meaning | Typical scenario |
---|---|---|
Stream | Multi-node data exchange | Execute a distributed query plan, and there is data exchange between nodes |
Partition Iterator | Partitioned iterator | Partition table scan, iteratively scan each partition |
VecToRow/RowToVec | Column-to-row / row-to-column conversion | Mixed row/column scenarios |
DfsScan / DfsIndexScan | HDFS table (index) scan | HDFS table scan |
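A hedged sketch for the Stream and Partition Iterator operators; the distributed and partitioned tables below are hypothetical, and the exact operator names printed by EXPLAIN vary by deployment:

```sql
-- Hash-distributed table. Joining on a column other than the
-- distribution key forces data exchange between nodes, so a Stream
-- (redistribute/broadcast) operator is expected in the plan.
CREATE TABLE orders_col (
    id      int,
    cust_id int,
    amount  numeric
) WITH (ORIENTATION = COLUMN)
DISTRIBUTE BY HASH (id);

EXPLAIN
SELECT o.id, s.x
FROM orders_col o
JOIN sales_col s ON o.cust_id = s.id;

-- Range-partitioned table: scanning several partitions is expected to
-- appear under a Partition Iterator node.
CREATE TABLE logs_col (
    id int,
    ts date
) WITH (ORIENTATION = COLUMN)
PARTITION BY RANGE (ts)
(
    PARTITION p2023 VALUES LESS THAN ('2024-01-01'),
    PARTITION p2024 VALUES LESS THAN ('2025-01-01')
);

EXPLAIN SELECT count(*) FROM logs_col WHERE ts >= '2023-06-01';
```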
The evolution of GaussDB vectorization
After the first-generation vectorization engine, GaussDB evolved higher-performance engines: the Sonic vectorization engine and the Turbo vectorization engine.
To keep improving OLAP execution performance, GaussDB continues to evolve along the path of column storage + vectorized execution + batch computation (a parameter sketch follows the list):
- Stream operator + distributed execution framework: supports data flow between multiple nodes
- SMP: multi-thread parallelism within a node, making full use of idle hardware resources
- LLVM: a new code-generation framework with a JIT (just-in-time) compiler that eliminates tuple-deform bottlenecks
- Sonic vectorization engine: further vectorizes the HashAgg and HashJoin operators and uses type-specific arrays for each column's data
- Turbo vectorization engine: the new generation, which vectorizes most operators and, on top of Sonic, adds NULL optimization, large-integer optimization, Stream optimization, Sort optimization and more to further improve performance
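As a hedged sketch, the session parameters below are commonly associated with the SMP and LLVM features above; treat the parameter names and values as assumptions to check against your GaussDB version's documentation:

```sql
-- SMP: raise the degree of intra-node parallelism for this session
-- (assumed parameter: query_dop).
SET query_dop = 4;

-- LLVM code generation (JIT): enable codegen and lower the cost
-- threshold so that more expressions qualify for compilation
-- (assumed parameters: enable_codegen, codegen_cost_threshold).
SET enable_codegen = on;
SET codegen_cost_threshold = 10000;
```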
Summary
This article introduced the GaussDB vectorized execution engine, covering its framework, principles, the main operators, and the sources of its performance gains.