Series of articles
- Real-time storage engine and real-time calculation engine
- Meituan Dianping Hadoop/Spark System Practice
- Meituan Big Data Query Technology
This article mainly covers the data resources and services part of the data products and data services section. Its contents are as follows:
1. Application scenarios
Background: suppose we want to understand how a given business line affects the entire Meituan App.
"Growth Hacker" mentioned a pirate model method, which is essentially a funnel-shaped disassembly and analysis of traffic conversion. Contains the steps from user acquisition to user conversion and activation and corresponding analysis methods.
But only methods are not enough, and there must be corresponding data support. How is the data organized? This is divided into 5 parts.
What do we actually do when running these analyses?
It comes down to SQL like the one in the figure.
First look at FROM: it joins the order table with the city table and city dimension table.
Then look at WHERE: it selects bus-business repurchase orders between August 18 and August 19.
Finally look at GROUP BY and SUM, and the intent of the query is basically clear.
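The SQL in the original figure is not reproduced here, so the sketch below is a hypothetical query of the same shape (FROM with a join, WHERE filtering business, repurchase flag, and date range, then GROUP BY with SUM), run on SQLite. All table and column names (`fact_order`, `dim_city`, `gmv`, `is_repurchase`) are made up for illustration, not Meituan's real schema:

```python
import sqlite3

# Hypothetical fact and dimension tables standing in for the order table
# and city dimension table mentioned in the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_order (order_id INT, city_id INT, biz TEXT,
                         dt TEXT, is_repurchase INT, gmv REAL);
CREATE TABLE dim_city (city_id INT, city_name TEXT);
INSERT INTO dim_city VALUES (1, 'Beijing'), (2, 'Shanghai');
INSERT INTO fact_order VALUES
  (101, 1, 'bus', '2019-08-18', 1, 3.0),
  (102, 1, 'bus', '2019-08-18', 0, 2.0),
  (103, 2, 'bus', '2019-08-19', 1, 5.0),
  (104, 2, 'food', '2019-08-19', 1, 30.0);
""")

# FROM: join fact and dimension; WHERE: bus business, repurchase orders,
# date range; GROUP BY + SUM: aggregate per city.
rows = conn.execute("""
SELECT c.city_name, SUM(o.gmv) AS repurchase_gmv
FROM fact_order o JOIN dim_city c ON o.city_id = c.city_id
WHERE o.biz = 'bus' AND o.is_repurchase = 1
  AND o.dt BETWEEN '2019-08-18' AND '2019-08-19'
GROUP BY c.city_name ORDER BY c.city_name
""").fetchall()
print(rows)  # [('Beijing', 3.0), ('Shanghai', 5.0)]
```

Reading it FROM, then WHERE, then GROUP BY, exactly as the text suggests, is usually the fastest way to grasp an analytical query.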
This is where the OLAP analysis mentioned earlier comes in. What are the common OLAP operations? Five are listed here.
- Drill down: add dimensions and analyze the problem at a finer granularity. (Imagine a cube with one layer expanding into three layers.)
- Roll up: remove dimensions and look at the problem from a (relatively) macro perspective. (Imagine a cube with three layers compressed into one.)
- Slice: fix one dimension to a single value. (Imagine keeping only one of the three layers.)
- Dice: fix one dimension to a few values. (Imagine keeping two of the three layers.)
- Rotate (pivot): swap rows and columns.
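The operations above can be sketched on a toy data cube. This is a minimal illustration with made-up dimensions and counts, not any particular OLAP engine's API:

```python
from collections import defaultdict

# Toy cube: (date, city, business) -> order count. All values invented.
DIM_NAMES = ("date", "city", "biz")
cube = {
    ("2019-08-18", "Beijing",  "bus"):  10,
    ("2019-08-18", "Shanghai", "bus"):   7,
    ("2019-08-19", "Beijing",  "bus"):  12,
    ("2019-08-19", "Beijing",  "food"):  5,
}

def roll_up(cube, keep):
    """Roll up: drop the dimensions not in `keep` and re-aggregate,
    giving a coarser (more macro) view."""
    out = defaultdict(int)
    for dims, v in cube.items():
        out[tuple(d for d, name in zip(dims, DIM_NAMES) if name in keep)] += v
    return dict(out)

def slice_(cube, axis, value):
    """Slice: fix one dimension to a single value (keep one layer)."""
    return {dims: v for dims, v in cube.items() if dims[axis] == value}

per_city = roll_up(cube, {"city"})
one_day = slice_(cube, 0, "2019-08-19")
print(per_city)  # {('Beijing',): 27, ('Shanghai',): 7}
print(one_day)   # only the 2019-08-19 layer remains
```

Drill down is simply the inverse of `roll_up` (aggregating over fewer dimensions yields finer granularity), and dicing is a `slice_` that accepts a set of values instead of one.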
(The figure shows some commercial BI systems.)
2. System architecture
2.1 System architecture review: Presto
This section focuses on a widely used and representative engine, to see how a distributed SQL statement actually runs.
First, Meituan's engine-selection thinking, which mainly divides into the following three scenarios:
Presto, introduced next, mainly serves the ad-hoc query scenario. If you are interested, take a look at Presto's evolution history first.
Presto's design philosophy, in a nutshell, is to trade reliability for performance. The Spark and MapReduce engines discussed earlier materialize intermediate results in the shuffle stage, so that even if a node goes down, the job can quickly resume from the previous step and reuse as much earlier data as possible. Presto deliberately skips this: it is not positioned for the ultra-large-scale scenarios of Hive and Spark, so when a query cannot be handled it simply fails fast. Presto can also join data from other types of databases.
From a macro perspective, the overall architecture is as shown in the figure above. The blue boxes are Presto's own services; at the front is a client-side control, similar to the MySQL command line. The MetaStore mainly stores metadata for the tables and databases on HDFS (metadata about data; no deeper nesting, promise).
Presto is still a master-slave architecture: the Coordinator is the coordinator and the Workers are the workers.
The detailed structure is shown in the figure:
The client submits a SQL statement; the Coordinator parses, plans, splits, and schedules it, then hands tasks to the workers it manages. Each worker takes its task, scans data from HDFS, and computes on it. After the workers' results are aggregated, they are returned to the client through a streaming interface.
The important part is the internal processing of Coordinator.
How is the SQL parsed? This involves compiler theory: lexical analysis and syntax analysis produce a syntax tree.
What does the syntax tree look like? Roughly like the figure.
Prepending EXPLAIN to a SQL statement makes the database tell you how it will execute next.
From the syntax tree obtained above, a logical plan of the execution process is constructed.
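The lex-then-parse step can be sketched in miniature. This toy handles only uppercase keywords and one flat `SELECT ... FROM ... WHERE col = 'val'` shape; Presto's real parser is a full ANTLR-generated grammar, so treat this purely as an illustration of "token stream in, syntax tree out":

```python
import re

# Lexical analysis: split a tiny subset of SQL into tokens.
TOKEN_RE = re.compile(r"\s*(SELECT|FROM|WHERE|[A-Za-z_]\w*|=|'[^']*'|,)")

def lex(sql):
    return TOKEN_RE.findall(sql)

# Syntax analysis: build a toy syntax tree (a dict of nodes) for
# "SELECT col, ... FROM table [WHERE col = 'val']".
def parse(tokens):
    i = tokens.index("FROM")
    cols = [t for t in tokens[1:i] if t != ","]
    node = {"select": cols, "from": tokens[i + 1], "where": None}
    if "WHERE" in tokens:
        j = tokens.index("WHERE")
        # tokens[j+1] '=' tokens[j+3] -> an equality predicate node.
        node["where"] = ("eq", tokens[j + 1], tokens[j + 3])
    return node

tree = parse(lex("SELECT city, gmv FROM orders WHERE biz = 'bus'"))
print(tree)
```

The resulting tree (columns, source table, predicate) is exactly the structure the next step walks to build a logical plan.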
Here is Presto's abstraction over data sources: the layer that reads data is shielded from the SQL itself, and accessing a data source only requires some connector configuration.
This is the SQL optimization part; pv is a table of user page views.
Normally, one writes the SQL to join pv and user and then select the qualifying rows. Can this be optimized, and if so, how?
When there is a lot of data, it pays to filter out just the data we need before joining. In Presto, the table scan before a join does not fetch everything: only the columns actually used, such as siteid and userid, are read for the join, which is far more efficient.
How are such optimizations implemented? First, some rewrite rules are defined; then these optimization strategies are run over the tree structure of the logical plan again and again, and whenever a matching structure is found, the plan tree is adjusted to make the whole SQL more efficient.
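The rule-matching loop described above can be sketched as follows. This is a minimal, assumed representation (plan nodes as tuples, one hand-written pushdown rule), not Presto's actual optimizer framework:

```python
# Plan nodes as tuples: ("scan", table), ("filter", pred_table, pred, child),
# ("join", left, right). pred_table records which table the predicate touches.

def push_filter_below_join(node):
    """One rewrite rule: Filter(Join(L, R)) -> Join(Filter(L), R) when the
    predicate only references the left input. Illustrative only."""
    if node[0] == "filter" and node[3][0] == "join":
        _, pred_table, pred, (_, left, right) = node
        if left == ("scan", pred_table):
            return ("join", ("filter", pred_table, pred, left), right)
    return node

def optimize(node, rules):
    """Walk the plan tree bottom-up, applying every rule at every node."""
    if node[0] == "join":
        node = ("join", optimize(node[1], rules), optimize(node[2], rules))
    elif node[0] == "filter":
        node = node[:3] + (optimize(node[3], rules),)
    for rule in rules:
        node = rule(node)
    return node

# Filter applied after the join of pv and user, as a user would write it.
plan = ("filter", "pv", "siteid = 101",
        ("join", ("scan", "pv"), ("scan", "user")))
optimized = optimize(plan, [push_filter_below_join])
print(optimized)
```

After optimization, the filter sits below the join, so far fewer pv rows ever reach the join operator, which is precisely the win described in the text.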
After optimizing the logical plan, we must consider how to realize it physically.
The plan is split mainly along the Optimizer's abstractions; after splitting, stages such as Shuffle, GROUP BY, and HashJoin are introduced.
We will not spoil what Shuffle does just yet.
Question: without a Shuffle stage, what restrictions would SQL face? (Hint: think about JOIN / GROUP BY.)
Answer: when joining or aggregating, we want all rows with the same key to be computed on the same node; but if the data is skewed, a node that receives too much data will run out of memory.
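The question and answer above can be made concrete with a toy hash shuffle. This is an assumed sketch (Python's built-in `hash`, invented row data), not Presto's partitioning code:

```python
def shuffle(rows, key, n_workers):
    """Hash-partition rows so that rows with equal keys always land on
    the same worker, which is what JOIN / GROUP BY require."""
    parts = [[] for _ in range(n_workers)]
    for row in rows:
        parts[hash(row[key]) % n_workers].append(row)
    return parts

# A skewed key distribution: 97 rows for one hot user, 3 for others.
rows = [{"userid": "hot_user"}] * 97 + [{"userid": f"u{i}"} for i in range(3)]
parts = shuffle(rows, "userid", 4)
sizes = sorted(len(p) for p in parts)
print(sizes)  # one partition holds at least the 97 hot-user rows
```

Because every `hot_user` row hashes to the same partition, one worker ends up with almost the entire dataset: that is data skew, and it is why a single node's memory can blow up.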
We then obtain a distributed execution plan, which is scheduled to run on each node separately. This is the detailed execution logic inside the Coordinator.
Below is the scheduling of the physical plan.
Presto also optimizes at the physical level, for example with codegen, whose main feature is run-time compilation.
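The codegen idea can be illustrated in Python, with the caveat that real Presto generates JVM bytecode; here we merely compile a predicate expression tree into a Python function once, instead of interpreting the tree for every row. The expression and column names are invented:

```python
# An expression tree for: biz = 'bus' AND gmv > 1.0 (invented predicate).
expr = ("and", ("eq", "biz", "bus"), ("gt", "gmv", 1.0))

def to_source(e):
    """Turn the expression tree into Python source text."""
    op, a, b = e
    if op == "and":
        return f"({to_source(a)}) and ({to_source(b)})"
    if op == "eq":
        return f"row[{a!r}] == {b!r}"
    if op == "gt":
        return f"row[{a!r}] > {b!r}"

# Compile once at plan time (the "run-time compiler" step)...
predicate = eval(compile(f"lambda row: {to_source(expr)}", "<gen>", "eval"))

# ...then call the compiled function per row, with no tree-walking cost.
data = [{"biz": "bus", "gmv": 3.0}, {"biz": "food", "gmv": 9.0}]
print([predicate(r) for r in data])  # [True, False]
```

Amortized over millions of rows, paying the compilation cost once and running straight-line code per row is a significant speedup over re-interpreting the tree.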
Another optimization sits partly in the storage layer: data indexing and data organization. Column-oriented storage formats (such as ORC and Parquet) evolved from row-oriented layouts.
2.2 Distributed OLAP system extension techniques
This part introduces the trade-offs made in the architectural design of several systems, mainly four: Kylin, Druid, ClickHouse, and Doris.
2.2.1 Kylin and Cube pre-aggregation
Kylin has open-source and commercial versions. Its defining feature is pre-aggregation. What does that mean?
Assume a fact table has many dimensions, and we frequently need to aggregate along them. Kylin aggregates in advance, before we ever query; at query time, we simply select from the precomputed results. It is commonly used for data-cube analysis.
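Pre-aggregation can be sketched as follows: precompute one aggregate table (a "cuboid") for every subset of the dimensions, so a GROUP BY at query time becomes a lookup. All dimension names and values are made up, and real Kylin prunes and stores cuboids far more cleverly:

```python
from itertools import combinations
from collections import defaultdict

dims = ("date", "city", "biz")
facts = [
    {"date": "08-18", "city": "BJ", "biz": "bus",  "gmv": 3.0},
    {"date": "08-18", "city": "SH", "biz": "bus",  "gmv": 5.0},
    {"date": "08-19", "city": "BJ", "biz": "food", "gmv": 2.0},
]

# Build every cuboid: one pre-aggregated table per subset of dimensions.
cuboids = {}
for r in range(len(dims) + 1):
    for subset in combinations(dims, r):
        agg = defaultdict(float)
        for f in facts:
            agg[tuple(f[d] for d in subset)] += f["gmv"]
        cuboids[subset] = dict(agg)

# At query time, "GROUP BY city" is a lookup, not a scan of the fact table.
print(cuboids[("city",)])  # {('BJ',): 5.0, ('SH',): 5.0}
print(cuboids[()])         # {(): 10.0}, the grand total
```

The trade-off is clear: with n dimensions there are up to 2^n cuboids, so Kylin buys query latency with storage and build time.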
2.2.2 Druid: isolated streaming writes and inverted dimension columns
Druid itself is also columnar storage, and its most distinctive feature is the inverted index: the values of a column are indexed with bitmaps.
For example, suppose the column advertiser has two values. Scanning the whole column yields the bitmaps {"bing.com": [0,0,0,1], "google.com": [1,1,1,0]}. Each array is as long as the column has rows. The [0,0,0,1] for bing.com means bing.com appears in row 4: its single 1 sits in the fourth position, counting rows from 1. Likewise, the [1,1,1,0] for google.com means google.com appears in rows 1, 2, and 3.
Thus, when a WHERE clause has multiple conditions, we can simply AND the corresponding bitmaps, which makes the operation very efficient.
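The bitmap index above can be sketched directly, using Python integers as bitmaps (real Druid uses compressed bitmaps such as Roaring; the second column here is invented to give the AND something to do):

```python
from collections import defaultdict

def build_bitmap_index(column):
    """Map each distinct value to a bitmap (an int) of the rows holding it."""
    index = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row
    return index

advertiser = ["google.com", "google.com", "google.com", "bing.com"]
country    = ["US",         "UK",         "US",         "US"]  # invented

adv_idx = build_bitmap_index(advertiser)
cty_idx = build_bitmap_index(country)

# WHERE advertiser = 'google.com' AND country = 'US' is one bitwise AND.
hits = adv_idx["google.com"] & cty_idx["US"]
matching_rows = [r for r in range(len(advertiser)) if hits >> r & 1]
print(matching_rows)  # [0, 2], i.e. the 1st and 3rd rows
```

However many conditions the WHERE clause has, each extra condition costs only one more bitwise AND over precomputed bitmaps, never another column scan.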
2.2.3 ClickHouse and SIMD
ClickHouse mainly improves on-the-fly computing power by exploiting hardware and memory, making full use of the CPU, for example through SIMD instructions.
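Real SIMD runs in native code, so pure Python can only mimic the style: the sketch below contrasts row-at-a-time evaluation (a branch per row) with batch-at-a-time, branch-free arithmetic over contiguous arrays, which is the form a vectorizing compiler can map onto SIMD registers. The columns and values are invented:

```python
from array import array

# Contiguous, typed columns, as in a columnar engine.
prices = array("d", [3.0, 2.0, 5.0, 30.0])
flags  = array("b", [1, 0, 1, 1])  # e.g. an is_repurchase column

# Row-at-a-time: one branch and one add per row (hard to vectorize).
row_total = 0.0
for p, f in zip(prices, flags):
    if f:
        row_total += p

# Batch-at-a-time: one pass, branch-free multiply-accumulate over the
# whole column, the access pattern SIMD hardware is built for.
batch_total = sum(p * f for p, f in zip(prices, flags))

print(row_total, batch_total)  # 38.0 38.0
```

Both forms compute the same answer; the point is that operating on whole column batches with uniform arithmetic is what lets ClickHouse keep the CPU's vector units busy.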
2.2.4 Doris and our integration plan
Doris is self-contained, with no major external dependencies, and is compatible with the MySQL protocol. (It was previously called Palo, which is OLAP reversed.)
The Frontend can be viewed as a horizontally scalable version of Presto's Coordinator, and the Backend as horizontally scalable Workers plus storage.
Doris uses the LSM-tree model, which delivers fast point (KV) lookups while sustaining high throughput for fast, large-batch writes.
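The LSM-tree trade-off just mentioned can be sketched in a few lines: writes are absorbed by an in-memory memtable and flushed as sorted, immutable runs, while point reads check the memtable first and then the runs, newest first. This is a generic toy, not Doris's actual storage engine:

```python
class TinyLSM:
    """A minimal LSM-tree sketch: high write throughput via an in-memory
    memtable, flushed to sorted immutable runs; reads go newest-first."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.runs = []  # list of sorted immutable runs, newest last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value          # absorb the write in memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the memtable as a sorted, immutable run (an "SSTable").
        self.runs.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

    def get(self, key):
        # Point lookup: memtable first, then runs from newest to oldest,
        # so the most recent version of a key always wins.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            if key in run:
                return run[key]
        return None

db = TinyLSM()
db.put("k1", "v1"); db.put("k2", "v2"); db.put("k1", "v1-new")
print(db.get("k1"), db.get("k2"))  # v1-new v2
```

Real implementations add write-ahead logs and background compaction that merges runs, which is what keeps read amplification bounded as data grows.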
3. Transformation cases
3.1 Presto on Yarn
This binds Presto's elastic scaling and query scheduling together with YARN.
3.2 Unified ad-hoc queries: One SQL
This mainly solves the problem of the differing SQL dialects of multiple engines.
The refactored architecture is shown in the figure.
(There is even a trained decision-tree model: it extracts features to judge which engine will run a statement faster, then generates SQL in that engine's dialect.)
3.3 Unified OLAP construction
3.4 Database comparison method
Databases are compared along multiple dimensions. Two methods help here: generate the database contents and the SQL workload structure, then test each database implementation against that workload.
This is the performance comparison after the transformation:
Learning is not easy; a like and a bookmark would be appreciated.