A few pictures to understand columnar storage

I recently came across a very good piece of material that explains column-based storage in a few words and a few pictures. What I like most is that this simple, easy-to-understand style makes the background clear without a long lecture on concepts.

1 Why store by column

Columnar (column-based) storage is defined in contrast to the row-based storage of traditional relational databases. Simply put, the two differ in how they organize a table (I could not translate this well, so I quote the original text directly):

Ø  Row-based storage stores a table in a sequence of rows.

Ø  Column-based storage stores a table in a sequence of columns.

 

Here's an example:

 

The figure above shows it clearly: in row storage, all the data of a table's row is kept together, while in column storage each column is stored separately. Each layout therefore has the following advantages and disadvantages:
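To make the two layouts concrete, here is a minimal sketch of how the same table could be flattened into one storage sequence in each style. The table, its column names, and the two helper functions are made up for illustration; real engines use far more elaborate page and block formats.

```python
# Hypothetical three-row table; column names and values are invented.
rows = [
    {"id": 1, "name": "Alice", "country": "US"},
    {"id": 2, "name": "Bob", "country": "DE"},
    {"id": 3, "name": "Carol", "country": "US"},
]


def to_row_layout(rows):
    """Row-based: all values of one record sit next to each other."""
    return [value for row in rows for value in row.values()]


def to_column_layout(rows):
    """Column-based: all values of one column sit next to each other."""
    column_names = list(rows[0].keys())
    return [row[name] for name in column_names for row in rows]


print(to_row_layout(rows))     # record after record
print(to_column_layout(rows))  # column after column
```

In the row layout, reading one whole record touches a contiguous run of values; in the column layout, reading one whole column does.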

                             

Row storage, advantages:

Ø Data is kept together

Ø INSERT/UPDATE is easy

Row storage, disadvantages:

Ø During selection, all data is read even if only a few columns are involved

Columnar storage, advantages:

Ø When querying, only the columns involved are read

Ø Projection is very efficient

Ø Any column can be used as an index

Columnar storage, disadvantages:

Ø After selection completes, the selected columns have to be reassembled into rows

Ø INSERT/UPDATE is more troublesome

Note: as a review of relational database theory, selection picks the rows that satisfy a condition, and projection picks a subset of the columns.
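The trade-off between projection and selection can be sketched on a toy column store. This is only an illustration under assumed names: the `columns` dictionary, `project`, and `select` are hypothetical and do not come from any particular database.

```python
# Toy column store: one list per column (hypothetical data).
columns = {
    "id": [1, 2, 3, 4],
    "name": ["Alice", "Bob", "Carol", "Dave"],
    "country": ["US", "DE", "US", "FR"],
}


def project(columns, wanted):
    """Projection: pick columns. Only the requested lists are touched."""
    return {name: columns[name] for name in wanted}


def select(columns, column, predicate):
    """Selection: pick rows. One column is scanned, then every column
    is visited again to reassemble the matching rows."""
    positions = [i for i, v in enumerate(columns[column]) if predicate(v)]
    return [
        {name: values[i] for name, values in columns.items()}
        for i in positions
    ]


print(project(columns, ["name"]))
print(select(columns, "country", lambda v: v == "US"))
```

Projection here is a plain lookup of the wanted lists, while selection pays an extra reassembly step, which matches the disadvantage listed for columnar storage.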


 

2 Supplement: Data Compression

I actually skipped another technique mentioned in that material: compressing data with dictionary tables. It is covered here as well, because the next section relies on it.

Below is what the table looks like. After compression with a dictionary table, the strings in the table become numbers. Because each distinct string is stored only once in the dictionary table, the data is compressed (a bit like normalizing versus denormalizing a schema).
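Dictionary compression of a single column can be sketched as follows. The function names and the sample values are assumptions made for this example; a real system would also pack the codes into as few bits as possible.

```python
def dictionary_encode(values):
    """Replace each string with a small integer code.

    Returns (dictionary, codes): dictionary[code] is the original string,
    and codes has one entry per row.
    """
    dictionary = []   # code -> string, each distinct string stored once
    code_of = {}      # string -> code, used only while encoding
    codes = []
    for value in values:
        if value not in code_of:
            code_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(code_of[value])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[code] for code in codes]


country = ["US", "DE", "US", "FR", "US"]  # hypothetical column
dictionary, codes = dictionary_encode(country)
print(dictionary)  # distinct strings in first-seen order
print(codes)       # one integer per row
```

The more often a string repeats in the column, the more space the integer codes save compared with storing the string itself in every row.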

 

3 Query execution performance

The following picture is the best of the set: it illustrates the advantages of columnar storage (and data compression) by walking through the execution of a query:

 

The key steps are as follows:

1. Look up the query string in the dictionary table to find its corresponding number (strings are compared only in this step, not once per row).

2. Scan the column for that number, and set the bit at each matching position to 1.

3. Combine the match results of the different columns with bitwise operations to obtain the positions of the records that satisfy all the conditions.

4. Use these positions to assemble the final result set.
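The four steps above can be sketched as a small program. Everything here is an assumption for illustration: the table contents, the representation of each column as a `(dictionary, codes)` pair, and the function names. Bitmaps are shown as plain lists of 0 and 1 for readability, where a real engine would use packed bit vectors.

```python
# Each column is (dictionary, codes); the data is invented for the example.
table = {
    "name": (["Alice", "Bob", "Carol", "Dave"], [0, 1, 2, 3]),
    "country": (["US", "DE"], [0, 1, 0, 0]),
    "gender": (["F", "M"], [0, 1, 0, 1]),
}


def lookup_code(dictionary, value):
    """Step 1: find the code of a string; None if it is not in the dictionary."""
    try:
        return dictionary.index(value)
    except ValueError:
        return None


def match_bitmap(codes, code):
    """Step 2: integer comparison per row; 1 marks a matching position."""
    return [1 if c == code else 0 for c in codes]


def query(table, conditions, output_columns):
    row_count = len(next(iter(table.values()))[1])
    result = [1] * row_count
    for column, value in conditions.items():
        dictionary, codes = table[column]
        bitmap = match_bitmap(codes, lookup_code(dictionary, value))
        # Step 3: bitwise AND keeps positions that match every condition.
        result = [a & b for a, b in zip(result, bitmap)]
    positions = [i for i, bit in enumerate(result) if bit]
    # Step 4: decode the requested columns at the surviving positions.
    return [
        {col: table[col][0][table[col][1][i]] for col in output_columns}
        for i in positions
    ]


print(query(table, {"country": "US", "gender": "F"}, ["name"]))
```

Only the columns named in the conditions and in the output are touched, and the per-row work is integer comparison and bitwise AND rather than string comparison.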
