Phoenix Explain Plan (translation)

Explain Plan
An EXPLAIN plan tells you a lot about how a query will be run:
All the HBase range queries that will be executed
An estimate of the number of bytes that will be scanned
An estimate of the number of rows that will be traversed
The time at which the above estimates were collected
Which HBase table will be used for each scan
Which operations (sort, merge, scan, limit) are executed on the client versus the server
Use an EXPLAIN plan to check how a query will run, and consider rewriting queries to meet the following goals:
Emphasize operations on the server rather than the client. Server operations are distributed across the cluster and run in parallel, while client operations execute within the single client JDBC driver.
Use RANGE SCAN or SKIP SCAN whenever possible rather than a full table scan (FULL SCAN OVER in the plan).
Filter against leading columns in the primary key constraint. This assumes you have designed the primary key to lead with frequently-accessed or frequently-filtered columns as described in “Primary Keys,” above.
If necessary, introduce a local index or a global index that covers your query.
If you have an index that covers your query but the optimizer is not detecting it, try hinting the query: SELECT /*+ INDEX() */ …
See also: http://phoenix.apache.org/language/index.html#explain
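
The goals above can be checked mechanically against the text of a plan. The following is a minimal sketch, assuming the plan lines have already been read into a list of strings. The class PlanChecker and its methods are hypothetical names for illustration and are not part of Phoenix, and the "client step" rule is only a heuristic based on the leading keyword of each line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Phoenix) that flags plan lines worth a second look. */
final class PlanChecker {

    /** Returns the plan lines that scan every row of a table. */
    static List<String> fullScans(List<String> planLines) {
        List<String> hits = new ArrayList<>();
        for (String line : planLines) {
            if (line.contains("FULL SCAN OVER")) {
                hits.add(line.trim());
            }
        }
        return hits;
    }

    /** Returns non-scan steps that run on the client, such as CLIENT MERGE SORT (heuristic). */
    static List<String> clientSteps(List<String> planLines) {
        List<String> hits = new ArrayList<>();
        for (String line : planLines) {
            String step = line.trim();
            if (step.startsWith("CLIENT") && !step.contains("SCAN OVER")) {
                hits.add(step);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> plan = Arrays.asList(
            "CLIENT 36-CHUNK PARALLEL 36-WAY FULL SCAN OVER exDocStoreb",
            "    SERVER FILTER BY FIRST KEY ONLY",
            "CLIENT MERGE SORT");
        System.out.println("Full scans:   " + fullScans(plan));
        System.out.println("Client steps: " + clientSteps(plan));
    }
}
```

A full scan or a client-side step is not always wrong, so the output is a list of candidates to review rather than a verdict.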

Anatomy of an Explain Plan
An explain plan consists of lines of text that describe the operations Phoenix will perform during a query, using the following terms:
AGGREGATE INTO ORDERED DISTINCT ROWS: aggregates the returned rows using an operation such as addition. When ORDERED is used, the GROUP BY operation is applied to the leading part of the primary key constraint, which allows the aggregation to be done in place rather than keeping all distinct groups in memory on the server side.
AGGREGATE INTO SINGLE ROW: aggregates the results into a single row using an aggregate function with no GROUP BY clause. For example, a COUNT() query returns one row with the total number of rows that match the query.
CLIENT: the operation will be performed on the client side. Most operations are faster on the server side, so consider whether the query can be rewritten to give the server more of the work.
FILTER BY expression: returns only results that match the expression.
FULL SCAN OVER tableName: the operation will scan every row in the specified table.
INNER-JOIN: the operation will join multiple tables on rows where the join condition is met.
MERGE SORT: performs a merge sort on the results.
RANGE SCAN OVER tableName [ … ]: the information in the square brackets indicates the start and stop for each primary key that’s used in the query.
ROUND ROBIN: when the query doesn’t contain ORDER BY, and the rows can therefore be returned in any order, ROUND ROBIN order maximizes parallelization on the client side.
x-CHUNK: describes how many threads will be used for the operation. The maximum parallelism is limited to the number of threads in the thread pool. The minimum parallelization corresponds to the number of regions the table has between the start and stop rows of the scan. The number of chunks increases with a lower guidepost width, as there is then more than one chunk per region.
PARALLEL x-WAY: describes how many parallel scans will be merge sorted during the operation.
SERIAL: some queries run serially, for example a single-row lookup, or a query that filters on the leading part of the primary key and limits the results below a configurable threshold.
EST_BYTES_READ: an estimate of the total number of bytes that will be scanned as part of executing the query.
EST_ROWS_READ: an estimate of the total number of rows that will be scanned as part of executing the query.
EST_INFO_TS: the epoch time, in milliseconds, at which the estimate information was collected.
Example
+-------------------------------------------------------------------------------------------------------------------------------+
| PLAN                                                                         | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS   |
+-------------------------------------------------------------------------------------------------------------------------------+
| CLIENT 36-CHUNK 237878 ROWS 6787437019 BYTES PARALLEL 36-WAY FULL SCAN       |                |               |               |
| OVER exDocStoreb                                                             | 237878         | 6787437019    | 1510353318102 |
| PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE)                                     | 237878         | 6787437019    | 1510353318102 |
| CLIENT 36-CHUNK PARALLEL 36-WAY RANGE SCAN OVER indx_exdocb                  |                |               |               |
| [0,' 42ecf4abd4bd7e7606025dc8eee3de 6a3cc04418cbc2619ddc01f54d88d7 c3bf']    |                |               |               |
| - [0,' 42ecf4abd4bd7e7606025dc8eee3de 6a3cc04418cbc2619ddc01f54d88d7 c3bg']  | 237878         | 6787437019    | 1510353318102 |
| SERVER FILTER BY FIRST KEY ONLY                                              | 237878         | 6787437019    | 1510353318102 |
| SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID"]                        | 237878         | 6787437019    | 1510353318102 |
| CLIENT MERGE SORT                                                            | 237878         | 6787437019    | 1510353318102 |
| DYNAMIC SERVER FILTER BY (A.CURRENT_TIMESTAMP, A.ID) IN ((TMP.MCT, TMP.TID)) | 237878         | 6787437019    | 1510353318102 |
+-------------------------------------------------------------------------------------------------------------------------------+
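
The scan lines of a plan such as the one above pack the chunk count, the optional row and byte estimates, and the degree of parallelism into one line of text. The following is a minimal sketch of pulling those numbers out with a regular expression. ScanLine is a hypothetical class, not part of Phoenix, and the assumed line shape is taken only from the example lines above, so other plan formats (for example SERIAL scans) are not handled.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser (not part of Phoenix) for the numbers on a parallel scan line. */
final class ScanLine {
    // Assumed shape: "<n>-CHUNK [<rows> ROWS <bytes> BYTES] PARALLEL <m>-WAY".
    private static final Pattern SHAPE = Pattern.compile(
        "(\\d+)-CHUNK(?: (\\d+) ROWS (\\d+) BYTES)? PARALLEL (\\d+)-WAY");

    final int chunks;
    final long rows;   // -1 when the line carries no estimate
    final long bytes;  // -1 when the line carries no estimate
    final int ways;

    private ScanLine(int chunks, long rows, long bytes, int ways) {
        this.chunks = chunks;
        this.rows = rows;
        this.bytes = bytes;
        this.ways = ways;
    }

    /** Parses one plan line; returns empty if it is not a parallel scan line. */
    static Optional<ScanLine> parse(String line) {
        Matcher m = SHAPE.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        boolean hasEstimate = m.group(2) != null;
        return Optional.of(new ScanLine(
            Integer.parseInt(m.group(1)),
            hasEstimate ? Long.parseLong(m.group(2)) : -1L,
            hasEstimate ? Long.parseLong(m.group(3)) : -1L,
            Integer.parseInt(m.group(4))));
    }

    public static void main(String[] args) {
        String line = "CLIENT 36-CHUNK 237878 ROWS 6787437019 BYTES "
            + "PARALLEL 36-WAY FULL SCAN OVER exDocStoreb";
        parse(line).ifPresent(s -> System.out.println(
            s.chunks + " chunks, " + s.rows + " rows, " + s.bytes + " bytes, " + s.ways + "-way"));
    }
}
```

The byte estimate is held in a long because values such as 6787437019 do not fit in an int.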
JDBC Explain Plan API and estimate information
The information displayed in an explain plan can also be accessed programmatically through the standard JDBC interfaces. When statistics collection is enabled for a table, the explain plan also gives an estimate of the number of rows and bytes a query is going to scan. To read these values, use the corresponding columns in the result set returned by the EXPLAIN statement. When statistics collection is not enabled, or if for some reason Phoenix cannot provide the estimate information, the columns return null. Below is an example:

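import java.sql.ResultSet;
import java.sql.Statement;
import org.apache.phoenix.util.PhoenixRuntime;

// conn is an open java.sql.Connection to Phoenix.
String explainSql = "EXPLAIN SELECT * FROM T";
Long estimatedBytes = null;
Long estimatedRows = null;
Long estimateInfoTs = null;
// createStatement() takes no SQL argument; the SQL is passed to executeQuery().
try (Statement statement = conn.createStatement();
     ResultSet rs = statement.executeQuery(explainSql)) {
    // The estimate columns are read from the first row of the plan.
    if (rs.next()) {
        // Each column is null when the estimate is unavailable.
        estimatedBytes =
            (Long) rs.getObject(PhoenixRuntime.EXPLAIN_PLAN_ESTIMATED_BYTES_READ_COLUMN);
        estimatedRows =
            (Long) rs.getObject(PhoenixRuntime.EXPLAIN_PLAN_ESTIMATED_ROWS_READ_COLUMN);
        estimateInfoTs =
            (Long) rs.getObject(PhoenixRuntime.EXPLAIN_PLAN_ESTIMATE_INFO_TS_COLUMN);
    }
}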
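
Because EST_INFO_TS is an epoch time in milliseconds, it is easier to judge how fresh an estimate is after converting it to a timestamp. The following is a minimal sketch using java.time. EstimateAge is a hypothetical helper, not part of Phoenix, and treating a missing timestamp as stale is an assumption made here for illustration.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper (not part of Phoenix) for interpreting EST_INFO_TS values. */
final class EstimateAge {

    /** Converts a nullable EST_INFO_TS value (epoch milliseconds) to an Instant, or null. */
    static Instant collectedAt(Long estimateInfoTs) {
        return estimateInfoTs == null ? null : Instant.ofEpochMilli(estimateInfoTs);
    }

    /** Returns true if the estimate is older than maxAge at the given time. */
    static boolean isStale(Long estimateInfoTs, Instant now, Duration maxAge) {
        Instant collected = collectedAt(estimateInfoTs);
        // Assumption: a missing timestamp is treated as stale.
        return collected == null || Duration.between(collected, now).compareTo(maxAge) > 0;
    }

    public static void main(String[] args) {
        Long ts = 1510353318102L; // value taken from the example plan
        System.out.println(collectedAt(ts)); // prints 2017-11-10T22:35:18.102Z
        System.out.println(isStale(ts, Instant.now(), Duration.ofDays(1)));
    }
}
```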

https://phoenix.apache.org/explainplan.html

If any part of this translation is inaccurate, corrections are welcome. Thank you.
