Greenplum is a distributed database. Under the hood, it shards data across multiple PostgreSQL databases and tables. It has the following characteristics:
- Supports standard SQL: almost any SQL that PostgreSQL supports, Greenplum supports as well
- Supports ACID and distributed transactions
- Supports clusters of hundreds of machines (this is a bit weak; Hadoop can scale to the order of ten thousand machines)
System Architecture
Master Host
- Handles user requests, generates the execution plan, and performs any necessary final aggregation (e.g. avg) or sorting
- Contains a PostgreSQL database internally, which stores all metadata and index information
- Monitors the status of all segments
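The final aggregation step on the master can be illustrated with a small sketch. It assumes, purely for illustration, that each segment returns a partial (sum, count) pair for avg and that the master merges them. The function names and sample data are hypothetical and are not Greenplum internals.

```python
# Illustrative sketch only: partial_avg and final_avg are hypothetical names,
# not Greenplum functions.

def partial_avg(rows):
    """Run on each segment: reduce the local rows to a (sum, count) pair."""
    return (sum(rows), len(rows))


def final_avg(partials):
    """Run on the master: merge the per-segment pairs into one average."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


# Three segments holding different numbers of rows (made-up data).
segment_rows = [[10, 20], [30], [40, 50, 60]]
partials = [partial_avg(rows) for rows in segment_rows]
result = final_avg(partials)
print(result)  # 35.0
```

Averaging the three per-segment averages directly would give about 31.67 rather than 35.0, which is why the sketch ships sums and counts to the master instead of local averages.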
Segment Host
- Each segment host runs multiple segments; the number of segments generally equals the number of CPU cores
- Each segment is a PostgreSQL database and is responsible for storing the actual data
Internal Network
Greenplum's internal network uses UDP, but Greenplum verifies the packets itself, so the reliability is equivalent to TCP. When TCP is used instead, at most 1000 segments are supported.
Execution Plan
When the master receives a SQL statement, it parses the statement into an execution plan in the form of a DAG. The parts of the DAG that need no data exchange are grouped into slices. Multi-table joins, aggregations, and sorts require data to be redistributed between slices, and a motion task performs that redistribution. The slices are then dispatched to the segments involved.
I think a slice is similar to the concept of a stage in Spark: within a slice there is no data shuffle.
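To make the slice idea concrete, here is a minimal sketch that cuts a plan tree into slices at every motion node. The PlanNode class, the split_into_slices function, and the table names are hypothetical. Recording the motion node with the slice that consumes its output is a simplification for this sketch, not a statement about how Greenplum numbers slices.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    """Hypothetical plan node: an operator name plus its child operators."""
    name: str
    children: list = field(default_factory=list)


def split_into_slices(root):
    """Group plan nodes into slices, starting a new slice below each motion."""
    slices = {0: []}

    def visit(node, slice_id):
        slices[slice_id].append(node.name)
        for child in node.children:
            if node.name.endswith("Motion"):
                # Data crosses the network here, so the child starts a new slice.
                child_slice = len(slices)
                slices[child_slice] = []
                visit(child, child_slice)
            else:
                # No data exchange: the child stays in the same slice.
                visit(child, slice_id)

    visit(root, 0)
    return slices


# A made-up join plan with one redistribution and a final gather.
plan = PlanNode("Gather Motion", [
    PlanNode("Hash Join", [
        PlanNode("Seq Scan orders"),
        PlanNode("Hash", [
            PlanNode("Redistribute Motion", [
                PlanNode("Seq Scan customers"),
            ]),
        ]),
    ]),
])

slices = split_into_slices(plan)
for slice_id, nodes in slices.items():
    print(slice_id, nodes)
```

In this example the plan splits into three slices: the gather on the master, the join together with the scan of one table, and the scan of the table whose rows must be redistributed first.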
Motion Types
- gather motion (N -> 1): gathers the data from all segments onto the master node, typically for operations such as sort, group, and join
- broadcast motion (N -> N): each segment broadcasts its data to all other segments
- redistribute motion (N -> N): each segment's data is redistributed across the segments by hash
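The three motion types above can be simulated on plain Python lists, where each inner list stands for the rows held by one segment. This is a sketch under stated assumptions: the function names are hypothetical, and Python's built-in hash on small integers is only a stand-in for whatever hash function Greenplum actually uses.

```python
def gather(segments):
    """N -> 1: collect every segment's rows in one place (the master)."""
    return [row for rows in segments for row in rows]


def broadcast(segments):
    """N -> N: every segment ends up with a full copy of all rows."""
    everything = gather(segments)
    return [list(everything) for _ in segments]


def redistribute(segments, key):
    """N -> N: route each row to the segment chosen by hashing its key."""
    n = len(segments)
    out = [[] for _ in range(n)]
    for rows in segments:
        for row in rows:
            # hash() of a small int is the int itself, so this is deterministic.
            out[hash(key(row)) % n].append(row)
    return out


# Three segments with made-up (id, value) rows.
segments = [[(1, "a"), (4, "b")], [(2, "c")], [(3, "d"), (6, "e")]]

gathered = gather(segments)
copies = broadcast(segments)
rehashed = redistribute(segments, key=lambda row: row[0])

print(len(gathered))             # 5
print([len(c) for c in copies])  # [5, 5, 5]
print(rehashed)
```

After the redistribute step, rows with the same key always land on the same segment, which is what lets a join or aggregation on that key run locally on each segment.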
- Chapter 1: Introduction to Greenplum
- Chapter 2: Greenplum Quick Start
- Chapter 3: Greenplum in Practice
- Chapter 4: The Data Dictionary in Detail
- Chapter 5: Execution Plans in Detail
- Chapter 6: Advanced Greenplum Applications
- Chapter 7: Introduction to the Greenplum Architecture
- Chapter 8: Deploying Greenplum in a Production Environment
- Chapter 9: Database Management
Author: Liangqiu _ but not the arrival of spring
Link: https://www.jianshu.com/p/9be1439f5bd3
Source: Jianshu
Copyright on Jianshu belongs to the author. For reproduction in any form, please contact the author for authorization and cite the source.