A brief introduction to Phoenix

Reprinted from: https://blog.csdn.net/carolzhang8406/article/details/79455684

1. Phoenix defined

Phoenix began as an open source project at Salesforce and later became a top-level project of the Apache Software Foundation.

Phoenix builds a SQL layer on top of HBase. It lets us use the standard JDBC API, rather than the HBase client API, to create tables, insert data into HBase, and query that data.

Its slogan: "We put the SQL back in NoSQL."

Phoenix is written entirely in Java and is delivered as a JDBC driver embedded in the HBase client. The Phoenix query engine compiles a SQL query into one or more HBase scans and orchestrates their execution to produce standard JDBC result sets. Because it uses the HBase API directly, together with coprocessors and custom filters, its performance is on the order of milliseconds for simple queries and on the order of seconds for queries over millions of rows.

Many tools can query HBase, including Hive, Tez, Impala, Spark SQL, and Phoenix.

Phoenix lets us write less code, and performs better than the code we would write ourselves, by:

  • Compiling SQL into native HBase scans
  • Determining the optimal start and stop keys for each scan
  • Executing the scans in parallel
  • ...
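To make the "start and stop keys" point concrete, here is a minimal Python sketch of how a filter on the leading primary key column can be turned into a bounded key range instead of a full table scan. It is a toy model, not Phoenix's actual planner or row key encoding: the table layout (a two-letter state code followed by a city name), the helper name `prefix_scan_range`, and the sample rows are all assumptions made for illustration.

```python
def prefix_scan_range(prefix: bytes):
    """Return (start_row, stop_row) covering every row key that starts with prefix.

    stop_row is exclusive, as in an HBase scan. It is the prefix with its last
    non-0xFF byte incremented; None means "scan to the end of the table".
    """
    stop = bytearray(prefix)
    while stop and stop[-1] == 0xFF:
        stop.pop()  # a trailing 0xFF byte cannot be incremented, so drop it
    if not stop:
        return prefix, None
    stop[-1] += 1
    return prefix, bytes(stop)


# Hypothetical table whose row key is state (2 chars) followed by city.
# A filter such as  WHERE state = 'CA'  constrains the leading key column,
# so only the key range [b"CA", b"CB") needs to be scanned.
start, stop = prefix_scan_range(b"CA")

rows = sorted([b"AZPhoenix", b"CALos Angeles", b"CASan Diego", b"NYNew York"])
matched = [r for r in rows if r >= start and (stop is None or r < stop)]
```

Because the rows are sorted by key, everything outside the computed range can be ignored without being read, which is where the saving over a filtered full scan comes from.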

 

2. Phoenix architecture

 

 

[Figure: Phoenix Architecture.png]
  • Phoenix's location in the Hadoop ecosystem

[Figure: Location.png]

3. Features

 

3.1 Transactions (beta)

This feature is still in beta and is not yet an official release. By integrating Tephra, Phoenix can support ACID transactions. Tephra, also an Apache project, is a transaction manager that provides globally consistent transactions on top of distributed data stores such as HBase. HBase itself provides strong consistency at the row level and the region level; Tephra adds cross-region and cross-table consistency while preserving scalability.

3.2 User-defined functions (UDFs)

3.2.1 Overview

Phoenix has supported user-defined functions since version 4.4.0.

Users can create temporary or permanent user-defined functions. These functions can be called just like built-in functions in statements such as SELECT, UPSERT, and DELETE. A temporary function belongs to a specific session or connection and is not visible to other sessions or connections. The metadata of a permanent function is stored in a system table named SYSTEM.FUNCTION and is visible to every session or connection.

3.3 Secondary indexing

In HBase, only the single row key is indexed, sorted lexicographically. Queries by row key are fast, but a query that does not use the row key has to scan the whole table with a filter, which greatly reduces retrieval performance. Phoenix provides secondary indexing to handle retrieval by something other than the row key.

  • Covered Indexes

The query can be answered from the index alone, without going back to the data table, so the index must contain every column the query needs (both the SELECT columns and the WHERE columns).

  • Functional Indexes

Supported since Phoenix 4.3. A functional index is not limited to columns: it can be created on an arbitrary expression, and when a query uses that expression, the index can return the result directly.

  • Global Indexes

Global indexes suit read-heavy, write-light workloads.
With a global index, writes carry significant overhead: every update to the data table (DELETE, UPSERT VALUES, and UPSERT SELECT) also updates the index table, and because the index table is distributed across different data nodes, the cross-node data transfer adds considerable cost. On reads, Phoenix chooses the index table to reduce query time. By default, if a query references a column that is not part of the index, the index is not used and the query is not sped up.

  • Local Indexes

Local indexes suit write-heavy workloads.
As with global indexes, Phoenix automatically decides at query time whether to use the index. With a local index, the index data is stored on the same server as the table data, which avoids the overhead of writing index entries to index tables on other servers. Unlike a global index, a local index is used even when the query references columns that are not in the index, so the query is still sped up. All local index data for a table is stored in a single, separate shared table.
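The covering rule described above can be sketched as a small check. This is a simplified model, not Phoenix's optimizer: the function name `index_can_serve` and the table `my_table` with columns `k`, `v1`, `v2`, `v3` are hypothetical, and the sketch assumes that an index stores its indexed columns, any INCLUDE columns, and the data table's primary key columns. The DDL in the comments follows the general form of Phoenix's CREATE INDEX statement.

```python
def index_can_serve(query_columns, indexed_columns,
                    included_columns=(), primary_key=()):
    """Return True if every column the query touches is stored in the index."""
    stored = set(indexed_columns) | set(included_columns) | set(primary_key)
    return set(query_columns) <= stored


# Hypothetical data table: my_table(k PRIMARY KEY, v1, v2, v3)
#
# Covered global index:
#   CREATE INDEX my_index ON my_table (v1) INCLUDE (v2)
# Local index on the same column:
#   CREATE LOCAL INDEX my_local_index ON my_table (v1)
index = dict(indexed_columns=["v1"], included_columns=["v2"], primary_key=["k"])

# SELECT v2 FROM my_table WHERE v1 = 'x'  -> answered from the index alone
covered = index_can_serve(["v1", "v2"], **index)

# SELECT v3 FROM my_table WHERE v1 = 'x'  -> v3 is not in the index, so a
# global index is skipped by default; a local index could still be used.
not_covered = index_can_serve(["v1", "v3"], **index)
```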

3.4 Statistics collection

The UPDATE STATISTICS command updates the statistics collected on a table, which helps improve query performance.

3.5 Row timestamp

Since version 4.6, Phoenix provides a way to map HBase's native row timestamp to a Phoenix column. This makes it possible to take full advantage of the time-range optimizations HBase provides for its store files, as well as the query optimizations built into Phoenix.

3.6 Paged queries

Phoenix supports paged queries in two ways:

  • Row value constructors (RVC)
  • OFFSET with LIMIT
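The two paging styles can be compared side by side. The sketch below uses Python's built-in sqlite3 module as a stand-in engine so that it runs without a cluster; it is not Phoenix itself. The `library` table, its columns, and the helper names are assumptions for illustration. The row value comparison `(title, author, isbn) > (?, ?, ?)` has the same shape as a Phoenix row value constructor.

```python
import sqlite3

PAGE_SIZE = 4
ORDER = "ORDER BY title, author, isbn"

# In-memory stand-in table; in Phoenix this would be a table on HBase.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE library (title TEXT, author TEXT, isbn TEXT, "
    "PRIMARY KEY (title, author, isbn))"
)
conn.executemany(
    "INSERT INTO library VALUES (?, ?, ?)",
    [(f"Title {i:02d}", "Author", f"isbn-{i:02d}") for i in range(10)],
)


def first_page():
    sql = f"SELECT title, author, isbn FROM library {ORDER} LIMIT ?"
    return conn.execute(sql, (PAGE_SIZE,)).fetchall()


def next_page_rvc(last_row):
    """Row value constructor paging: resume right after the last row seen."""
    sql = (
        "SELECT title, author, isbn FROM library "
        "WHERE (title, author, isbn) > (?, ?, ?) "
        f"{ORDER} LIMIT ?"
    )
    return conn.execute(sql, (*last_row, PAGE_SIZE)).fetchall()


def page_by_offset(page_index):
    """OFFSET paging: skip page_index * PAGE_SIZE rows, then read one page."""
    sql = f"SELECT title, author, isbn FROM library {ORDER} LIMIT ? OFFSET ?"
    return conn.execute(sql, (PAGE_SIZE, page_index * PAGE_SIZE)).fetchall()


page1 = first_page()
page2 = next_page_rvc(page1[-1])
```

The row value form only needs the last row of the previous page, so the engine can seek directly to that position in the key order, whereas OFFSET has to step over all the skipped rows first.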

3.7 Salted tables

If row keys increase monotonically, sequential writes all land on the same HBase region server and create a data hot spot. Phoenix's salted tables solve this region server hot-spotting problem.
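The idea behind salting can be illustrated with a toy simulation: a leading byte derived from a hash of the row key is prepended, so consecutive keys are spread over several buckets instead of piling onto one key range. The hash used here (`zlib.crc32`) and the helper name `salted_key` are stand-ins chosen for this sketch, not Phoenix's own hash function. In Phoenix the bucket count is declared on the table, for example with `SALT_BUCKETS = 4` in the CREATE TABLE statement.

```python
import zlib
from collections import Counter

SALT_BUCKETS = 4  # plays the role of SALT_BUCKETS = 4 in CREATE TABLE


def salted_key(row_key: bytes, buckets: int = SALT_BUCKETS) -> bytes:
    """Prepend one salt byte computed from a hash of the row key."""
    salt = zlib.crc32(row_key) % buckets  # stand-in hash, not Phoenix's
    return bytes([salt]) + row_key


# Monotonically increasing keys, the pattern that causes hot spots.
keys = [f"{i:08d}".encode() for i in range(1000)]
salted = [salted_key(k) for k in keys]

# How many keys ended up under each salt byte (that is, in each bucket).
per_bucket = Counter(s[0] for s in salted)
```

Since the salt is computed from the key itself, a reader can recompute it for a point lookup; range scans, however, have to consult every bucket, which is the usual trade-off of this technique.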

3.8 Skip scan

A skip scan can improve performance when a query would otherwise scan a whole range of row keys.
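The difference between a range scan and a skip scan can be shown with a toy model over a sorted list of composite keys. It is a sketch of the general idea (seek to each candidate key instead of reading everything in between), not Phoenix's implementation; the key layout `(key1, key2)` and both function names are assumptions for illustration.

```python
from bisect import bisect_left

# Sorted composite row keys (key1, key2), as they would be ordered in HBase.
rows = sorted((k1, k2) for k1 in "abcdef" for k2 in range(100))


def range_scan(rows, k1_values, k2_values):
    """Read every row between the smallest and largest key1, filtering as we go."""
    lo = bisect_left(rows, (min(k1_values), min(k2_values)))
    examined, hits = 0, []
    for row in rows[lo:]:
        if row[0] > max(k1_values):
            break
        examined += 1
        if row[0] in k1_values and row[1] in k2_values:
            hits.append(row)
    return hits, examined


def skip_scan(rows, k1_values, k2_values):
    """Seek directly to each candidate key combination, skipping the rest."""
    seeks, hits = 0, []
    for k1 in sorted(k1_values):
        for k2 in sorted(k2_values):
            i = bisect_left(rows, (k1, k2))
            seeks += 1
            if i < len(rows) and rows[i] == (k1, k2):
                hits.append(rows[i])
    return hits, seeks


# Roughly: WHERE key1 IN ('a', 'c') AND key2 IN (1, 2)
range_hits, examined = range_scan(rows, {"a", "c"}, {1, 2})
skip_hits, seeks = skip_scan(rows, {"a", "c"}, {1, 2})
```

Both functions return the same four rows, but the range scan reads 299 rows to find them while the skip scan performs 4 seeks.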

3.9 Views

Phoenix also supports the standard SQL view syntax. This makes it possible to create multiple virtual tables on top of the same underlying physical HBase table.

3.10 Multi-tenancy

Data access is isolated per tenant: each tenant connects through its own tenant-specific connection and can only see its own data.

3.11 Dynamic columns

Since Phoenix 1.2, columns can be specified dynamically by including column definitions in parentheses after the table name in the FROM clause of a SELECT statement. Although this is not standard SQL, it is a useful way to take advantage of HBase's late-binding capability.

3.12 Bulk CSV data loading

CSV data can be loaded into a Phoenix table in two ways: 1. single-threaded loading with the psql command, suitable for small amounts of data; 2. the MapReduce-based bulk load tool, suitable for large volumes of data.
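As a rough illustration of the two loading paths, the sketch below prepares a small CSV payload and assembles the two command lines as argument lists without running them. The table name `EXAMPLE`, the sample rows, the file paths, and the helper names are assumptions; the jar name keeps its `<version>` placeholder because it depends on the installed Phoenix release. The commands follow the general shape of the `psql.py` script and the `CsvBulkLoadTool` class shipped with Phoenix, so check the flags against the installed version before relying on them.

```python
import csv
import io

# Hypothetical rows for a table EXAMPLE(id, first_name, last_name).
rows = [(12345, "John", "Doe"), (67890, "Mary", "Poppins")]


def to_csv(rows):
    """Serialize rows to CSV text, one line per row."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def psql_command(table, zookeeper_quorum, csv_path):
    """Single-threaded load, intended for small files."""
    return ["bin/psql.py", "-t", table, zookeeper_quorum, csv_path]


def bulk_load_command(client_jar, table, hdfs_input):
    """MapReduce-based load, intended for large files already in HDFS."""
    return [
        "hadoop", "jar", client_jar,
        "org.apache.phoenix.mapreduce.CsvBulkLoadTool",
        "--table", table,
        "--input", hdfs_input,
    ]


csv_text = to_csv(rows)
small = psql_command("EXAMPLE", "localhost", "data.csv")
large = bulk_load_command(
    "phoenix-<version>-client.jar", "EXAMPLE", "/data/example.csv"
)
```

Either list could be handed to `subprocess.run` on a machine where Phoenix and Hadoop are installed.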

3.13 Query Server

Phoenix 4.4 introduced a standalone server that thin clients connect to.
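One thin client is the third-party `phoenixdb` Python package, which talks to the Query Server over HTTP. The sketch below assumes that package is installed and that a Query Server is listening at `http://localhost:8765/`; the `users` table and the function name `run_demo` are made up for illustration. The import sits inside the function so the snippet can be loaded even where `phoenixdb` is not available.

```python
# Phoenix SQL used by the demo; the table and columns are hypothetical.
CREATE_SQL = (
    "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, username VARCHAR)"
)
UPSERT_SQL = "UPSERT INTO users VALUES (?, ?)"  # Phoenix uses UPSERT, not INSERT
SELECT_SQL = "SELECT id, username FROM users ORDER BY id"


def run_demo(url="http://localhost:8765/"):
    """Create a table, write one row, and read it back through the Query Server."""
    import phoenixdb  # third-party thin client, imported lazily

    conn = phoenixdb.connect(url, autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(CREATE_SQL)
        cursor.execute(UPSERT_SQL, (1, "admin"))
        cursor.execute(SELECT_SQL)
        return cursor.fetchall()
    finally:
        conn.close()
```

Calling `run_demo()` against a running Query Server should return the rows of the `users` table. The statements are the same ones a JDBC client would send.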

3.14 Tracing

Since version 4.1, Phoenix can trace each query. This lets users see every step performed behind a query or insert, from the client through to execution on the HBase side.

3.15 Metrics

Phoenix exposes a variety of metrics that show what happens inside the Phoenix client while it executes different SQL statements. These metrics are collected in the client JVM in two ways:

  • Request level metrics - collected at an individual SQL statement
    level
  • Global metrics - collected at the client JVM level


Origin www.cnblogs.com/xibuhaohao/p/11855389.html