引用
当系统中的某些表存在高度不均匀的数据分布时,使用直方图能够产生更好的选择性评估,从而产生更加优化的执行计划。
通过下面的例子来感受直方图的作用
基础数据
drop user sure cascade; create user sure identified by oracle; grant resource to sure; create table sure.tab (a number, b number);
插入1万条数据
begin for i in 1..10000 loop insert into sure.tab values (i, i); end loop; commit; end; /
制造 不均匀的情况
update sure.tab set b=5 where b between 6 and 9995; commit;
此时b列的数据分布为
select b, count(*) from sure.tab group by b order by b; B COUNT(*) ---------- ---------- 1 1 2 1 3 1 4 1 5 9991 9996 1 9997 1 9998 1 9999 1 10000 1
【1】在创建索引之前,无论是查询b=1或者是b=5,都只能走 全表扫描。
explain plan for select * from sure.tab where b=1; select * from table(dbms_xplan.display); -------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 26 | 6 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TAB | 1 | 26 | 6 (0)| 00:00:01 | --------------------------------------------------------------------------
explain plan for select * from sure.tab where b=5; select * from table(dbms_xplan.display); -------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 9991 | 253K| 6 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TAB | 9991 | 253K| 6 (0)| 00:00:01 | --------------------------------------------------------------------------
【2】在b列上创建一个索引
create index sure.ix_tab_b on sure.tab(b);
explain plan for select * from sure.tab where b=1; select * from table(dbms_xplan.display); ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 26 | 2 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| TAB | 1 | 26 | 2 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IX_TAB_B | 1 | | 1 (0)| 00:00:01 | ---------------------------------------------------------------------------------------- - dynamic sampling used for this statement 18 rows selected.
explain plan for select * from sure.tab where b=5; select * from table(dbms_xplan.display); -------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 9991 | 253K| 6 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TAB | 9991 | 253K| 6 (0)| 00:00:01 | -------------------------------------------------------------------------- - dynamic sampling used for this statement 17 rows selected
网上的例子说,在有了索引以后,应该都走INDEX RANGE SCAN,但是实际情况(10201)却是,b=5时,依然选择到了正确的路径-- 全表扫描。这是因为没有统计信息,Oracle进行了 动态采样,相当于临时收集了一小份统计信息,所以这时反而挺准的。
【3】收集统计信息,但不收集直方图
analyze table sure.tab compute statistics;
与统计信息相关的视图
select num_rows, blocks, empty_blocks, avg_space, chain_cnt, avg_row_len from dba_tables where table_name = 'TAB'; NUM_ROWS BLOCKS EMPTY_BLOCKS AVG_SPACE CHAIN_CNT AVG_ROW_LEN ---------- ---------- ------------ ---------- ---------- ----------- 10000 20 4 2080 0 10 col low_value format a16 col high_value format a16 col column_name format a16 select column_name, num_distinct, low_value, high_value, density, num_buckets from dba_tab_columns where table_name = 'TAB'; COLUMN_NAME NUM_DISTINCT LOW_VALUE HIGH_VALUE DENSITY NUM_BUCKETS ---------------- ------------ ---------------- ---------------- ---------- ----------- A 10000 C102 C302 .0001 1 B 10 C102 C302 .1 1 select table_name, column_name, endpoint_number, endpoint_value from dba_tab_histograms where table_name = 'TAB'; TABLE_NA COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE -------- ---------------- --------------- -------------- TAB B 0 1 TAB A 0 1 TAB B 1 10000 TAB A 1 10000
观察执行计划
explain plan for select * from sure.tab where b=1; select * from table(dbms_xplan.display); ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1000 | 6000 | 4 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| TAB | 1000 | 6000 | 4 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IX_TAB_B | 1000 | | 2 (0)| 00:00:01 | ----------------------------------------------------------------------------------------
explain plan for select * from sure.tab where b=5; select * from table(dbms_xplan.display); ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1000 | 6000 | 4 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| TAB | 1000 | 6000 | 4 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IX_TAB_B | 1000 | | 2 (0)| 00:00:01 | ----------------------------------------------------------------------------------------
有了统计信息, b=5的查询反而跑偏了。
【4】创建tab表b列的柱状图统计信息,使得优化器能够知道该列每个值的具体分布情况。
analyze table sure.tab compute statistics for columns b size 10;
直方图中的ENDPOINT_VALUE表示列值,ENDPOINT_NUMBER表示累积的行数。
select table_name, column_name, endpoint_number, endpoint_value from dba_histograms where table_name = 'TAB'; TABLE_NA COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE -------- ---------------- --------------- -------------- TAB B 1 1 TAB B 2 2 TAB B 3 3 TAB B 4 4 TAB B 9995 5 TAB B 9996 9996 TAB B 9997 9997 TAB B 9998 9998 TAB B 9999 9999 TAB B 10000 10000 TAB A 0 1 TAB A 1 10000
观察执行计划
explain plan for select * from sure.tab where b=1; select * from table(dbms_xplan.display); ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 6 | 2 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| TAB | 1 | 6 | 2 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IX_TAB_B | 1 | | 1 (0)| 00:00:01 | ----------------------------------------------------------------------------------------
explain plan for select * from sure.tab where b=5; select * from table(dbms_xplan.display); -------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 9991 | 59946 | 6 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TAB | 9991 | 59946 | 6 (0)| 00:00:01 | --------------------------------------------------------------------------
【总结】对于b=5的查询来说,全表扫描比之索引范围扫描更加合理,有直方图,优化器就可以做出正确的判断。(没有统计信息,因为动态采样而选对路径,属于歪打正着)