A First Look at Histograms

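Quote
When some tables in the system have highly non-uniform data distributions, histograms give the optimizer better selectivity estimates and therefore lead to better execution plans.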


The following example shows what a histogram actually does.

Base data
drop user sure cascade;
create user sure identified by oracle;
grant resource to sure;
create table sure.tab (a number, b number);


Insert 10,000 rows
begin
for i in 1..10000 loop
insert into sure.tab values (i, i);
end loop;
commit;
end;
/


Create a skewed distribution
update sure.tab set b=5 where b between 6 and 9995;
commit;


Column b is now distributed as follows
select b, count(*) from sure.tab group by b order by b;
         B   COUNT(*)
---------- ----------
         1          1
         2          1
         3          1
         4          1
         5       9991
      9996          1
      9997          1
      9998          1
      9999          1
     10000          1
    

[1] Before the index is created, both the b=1 and the b=5 query can only use a full table scan
explain plan for select * from sure.tab where b=1;
select * from table(dbms_xplan.display);
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    26 |     6   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TAB  |     1 |    26 |     6   (0)| 00:00:01 |
--------------------------------------------------------------------------


explain plan for select * from sure.tab where b=5;
select * from table(dbms_xplan.display);
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  9991 |   253K|     6   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TAB  |  9991 |   253K|     6   (0)| 00:00:01 |
--------------------------------------------------------------------------


[2] Create an index on column b
create index sure.ix_tab_b on sure.tab(b);

explain plan for select * from sure.tab where b=1;
select * from table(dbms_xplan.display);
----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |     1 |    26 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TAB      |     1 |    26 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IX_TAB_B |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
   - dynamic sampling used for this statement
18 rows selected.


explain plan for select * from sure.tab where b=5;
select * from table(dbms_xplan.display);
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  9991 |   253K|     6   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TAB  |  9991 |   253K|     6   (0)| 00:00:01 |
--------------------------------------------------------------------------
   - dynamic sampling used for this statement
17 rows selected


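Examples found online claim that once the index exists, both queries should use INDEX RANGE SCAN. In practice (Oracle 10.2.0.1), the b=5 query still picks the right path: a full table scan. The table has no statistics yet, so Oracle falls back on dynamic sampling, which in effect gathers a small set of statistics on the fly, and the estimates turn out to be quite accurate.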

[3] Gather statistics, but without a histogram
analyze table sure.tab compute statistics;

Views related to the statistics
select num_rows, blocks, empty_blocks, avg_space, chain_cnt, avg_row_len
from dba_tables where table_name = 'TAB';
  NUM_ROWS     BLOCKS EMPTY_BLOCKS  AVG_SPACE  CHAIN_CNT AVG_ROW_LEN
---------- ---------- ------------ ---------- ---------- -----------
     10000         20            4       2080          0          10
   
col low_value format a16  
col high_value format a16
col column_name format a16
select column_name, num_distinct, low_value, high_value, density, num_buckets
from dba_tab_columns where table_name = 'TAB';
COLUMN_NAME      NUM_DISTINCT LOW_VALUE        HIGH_VALUE          DENSITY NUM_BUCKETS
---------------- ------------ ---------------- ---------------- ---------- -----------
A                       10000 C102             C302                  .0001           1
B                          10 C102             C302                     .1           1

select table_name, column_name, endpoint_number, endpoint_value
from dba_tab_histograms where table_name = 'TAB';
TABLE_NA COLUMN_NAME      ENDPOINT_NUMBER ENDPOINT_VALUE
-------- ---------------- --------------- --------------
TAB      B                              0              1
TAB      A                              0              1
TAB      B                              1          10000
TAB      A                              1          10000


Check the execution plans
explain plan for select * from sure.tab where b=1;
select * from table(dbms_xplan.display);
----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |  1000 |  6000 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TAB      |  1000 |  6000 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IX_TAB_B |  1000 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------


explain plan for select * from sure.tab where b=5;
select * from table(dbms_xplan.display);
----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |  1000 |  6000 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TAB      |  1000 |  6000 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IX_TAB_B |  1000 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------


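With statistics but no histogram, the b=5 query actually goes off course: the optimizer estimates 1000 rows for both b=1 and b=5 and picks the index range scan in both cases.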
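
The 1000-row estimate matches NUM_ROWS × DENSITY from the views queried above (10000 × 0.1 for column B). Without a histogram the optimizer has only one density figure per column, so every value of b gets the same estimate. A minimal sketch of that arithmetic as a query, assuming the statistics from step [3] are in place and the session can read the DBA_ views (the alias est_rows_per_value is only illustrative):

```sql
-- Sketch: reproduce the no-histogram row estimate for an equality predicate.
-- Assumes statistics were gathered as in step [3]; owner filter added so that
-- tables named TAB in other schemas are not mixed in.
select c.column_name,
       t.num_rows,
       c.num_distinct,
       c.density,
       round(t.num_rows * c.density) as est_rows_per_value
from   dba_tables t
join   dba_tab_columns c
on     c.owner = t.owner
and    c.table_name = t.table_name
where  t.owner = 'SURE'
and    t.table_name = 'TAB';
```

With the values shown earlier, this should give 1000 for column B and 1 for column A, which is why b=1 and b=5 look identical to the optimizer at this point.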

[4] Gather histogram statistics on column b of tab, so that the optimizer knows how the values in that column are actually distributed.
analyze table sure.tab compute statistics for columns b size 10;
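
As an aside, Oracle recommends the DBMS_STATS package over ANALYZE for gathering optimizer statistics. The following is a rough equivalent of the command above, not what was run for the output in this post, so the resulting numbers may differ slightly. The estimate_percent => 100 setting is an assumption meant to mirror "compute statistics".

```sql
-- Sketch: gather a histogram on column B with DBMS_STATS instead of ANALYZE.
-- Not the command used for the output shown in this post.
begin
  dbms_stats.gather_table_stats(
    ownname          => 'SURE',
    tabname          => 'TAB',
    estimate_percent => 100,                      -- read every row, like COMPUTE
    method_opt       => 'FOR COLUMNS B SIZE 10',  -- up to 10 buckets on column B
    cascade          => TRUE);                    -- also gather index statistics
end;
/
```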

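For a frequency histogram like this one (one bucket per distinct value), ENDPOINT_VALUE is the column value and ENDPOINT_NUMBER is the cumulative row count up to and including that value. (dba_histograms is a synonym for dba_tab_histograms.)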
select table_name, column_name, endpoint_number, endpoint_value
from dba_histograms where table_name = 'TAB';
TABLE_NA COLUMN_NAME      ENDPOINT_NUMBER ENDPOINT_VALUE
-------- ---------------- --------------- --------------
TAB      B                              1              1
TAB      B                              2              2
TAB      B                              3              3
TAB      B                              4              4
TAB      B                           9995              5
TAB      B                           9996           9996
TAB      B                           9997           9997
TAB      B                           9998           9998
TAB      B                           9999           9999
TAB      B                          10000          10000
TAB      A                              0              1
TAB      A                              1          10000
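
Because ENDPOINT_NUMBER is cumulative, the row count for each value is the difference between consecutive endpoints. A small sketch that does the subtraction with the LAG analytic function, assuming the histogram on column B shown above (the aliases b_value and rows_for_value are only illustrative):

```sql
-- Sketch: turn cumulative ENDPOINT_NUMBER into a per-value row count.
-- lag(..., 1, 0) supplies 0 as the "previous" endpoint for the first bucket.
select endpoint_value as b_value,
       endpoint_number
         - lag(endpoint_number, 1, 0) over (order by endpoint_number) as rows_for_value
from   dba_tab_histograms
where  owner = 'SURE'
and    table_name = 'TAB'
and    column_name = 'B'
order  by endpoint_number;
```

For b=5 this gives 9995 - 4 = 9991, and 1 for every other value, which lines up with the original group-by output.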


Check the execution plans
explain plan for select * from sure.tab where b=1;
select * from table(dbms_xplan.display);
----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |     1 |     6 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TAB      |     1 |     6 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IX_TAB_B |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------


explain plan for select * from sure.tab where b=5;
select * from table(dbms_xplan.display);
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  9991 | 59946 |     6   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TAB  |  9991 | 59946 |     6   (0)| 00:00:01 |
--------------------------------------------------------------------------


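[Summary] For the b=5 query, a full table scan is more reasonable than an index range scan, and with a histogram the optimizer can make that call correctly. (With no statistics at all, the right path was chosen only because of dynamic sampling, which was a lucky accident.)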


Reposted from jcat.iteye.com/blog/2229140