Oracle: when an unnested subquery runs as a hash inner join

Environment:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
PL/SQL Release 11.2.0.1.0 - Production
CORE 11.2.0.1.0 Production
TNS for 64-bit Windows: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production

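Background: an unnested subquery is almost never executed as a plain hash join. When it is, the CBO has either concluded that the subquery returns no duplicate values, or decided that duplicates do not matter and the two row sources can be joined directly. Whenever you see this, check the cardinality of the join column, because it is a likely source of trouble. The example below shows why.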

The problem is triggered by two DISTINCT columns:


drop table a purge;
drop table b purge;
create table a as select * from dba_objects;  
create table b as select * from dba_objects;  

select count(1) from a;
-- Scenario 1: very slow
select count(distinct owner), count(distinct object_name)
  from a  
 where owner in (select owner from b);

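In the execution plan the CBO chooses a plain HASH JOIN, even though the join column owner has very low cardinality. Take SYS as an example: it has 30925 rows, so the hash join produces 30925*30925 (about 956 million) rows for that one value alone. This is effectively a small Cartesian product.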
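The row multiplication can be estimated before running the slow query. The following is a minimal sketch against the tables a and b created above; the aliases a_cnt, b_cnt, and joined_rows are names chosen here for illustration. It multiplies the per-owner row counts of both tables, which is the number of rows a plain inner join on owner would produce for each value.

```sql
-- Rows a plain inner join on owner would generate, per owner value.
select a_cnt.owner,
       a_cnt.cnt             as rows_in_a,
       b_cnt.cnt             as rows_in_b,
       a_cnt.cnt * b_cnt.cnt as joined_rows
  from (select owner, count(*) cnt from a group by owner) a_cnt
  join (select owner, count(*) cnt from b group by owner) b_cnt
    on a_cnt.owner = b_cnt.owner
 order by joined_rows desc;
```

Owners with many objects, such as SYS, should appear at the top and dominate the total.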

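Why does the CBO choose a hash join? Presumably it reasons that duplicates in the subquery do not matter: every aggregate is a count(distinct), so the result is unaffected and it can join directly.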
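To check which join method the CBO picks on your own system, one option is EXPLAIN PLAN together with DBMS_XPLAN.DISPLAY. This is only a sketch: it shows the estimated plan, and the actual plan at run time may differ.

```sql
-- Generate the estimated plan for the scenario 1 query.
explain plan for
select count(distinct owner), count(distinct object_name)
  from a
 where owner in (select owner from b);

-- Display it. Look at the Operation column:
-- HASH JOIN means a plain inner join, HASH JOIN RIGHT SEMI means a semi join.
select * from table(dbms_xplan.display);
```

The same two statements can be reused for each of the other queries in this post by swapping in the query text.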

-- Scenario 2: fast, because it uses a hash right semi join
select count(owner), count(distinct object_name)
from a
where owner in (select owner from b);
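Why a right semi join? Because the CBO has no other choice: count(owner) counts rows, so duplicates in the subquery would change the result if it used a plain join.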

-- Scenario 3: fast
select count(distinct owner), count(distinct object_name)
from a
where object_id in (select object_id from b);
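This one also uses a plain hash join. It is fast because the join column is object_id, which has high cardinality, so the hash join does not multiply rows.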
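The difference between scenario 1 and scenario 3 comes down to how many distinct values each join column has. A quick way to compare them, as a sketch against table b from above (the column aliases are illustrative):

```sql
-- Compare the number of distinct values of the two candidate join columns
-- with the total row count of the subquery table.
select count(*)                  as total_rows,
       count(distinct owner)     as distinct_owners,
       count(distinct object_id) as distinct_object_ids
  from b;
```

If distinct_object_ids is close to total_rows while distinct_owners is tiny, a plain join on object_id stays roughly 1:1 and a plain join on owner fans out heavily.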

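This tells us how to optimize the SQL in scenario 1:
1. Keep the hash join, but deduplicate the subquery result so the join becomes n:1. The join then returns at most one row per row of the main table.
2. Change the join method to a right semi join, which returns only rows from the main table.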

Fix 1: deduplicate
select count(distinct owner), count(distinct object_name)
from a
where owner in (select owner from b group by b.owner ) ;
The subquery is still unnested here, only in its second form: it is turned into an inline view first and then joined.

Fix 2: use a right semi join
select count(distinct owner), count(distinct object_name)
from a
where owner in (select /*+ hash_sj */ owner from b );
Unfortunately, the hint has no effect here.

So rewrite the query by hand.
2.1 Prevent view merging
select count(distinct owner), count(distinct object_name)
from (select owner, object_name
from a
where owner in (select owner from b)
and rownum > 0);

2.2 Materialize with a WITH clause
with tt as
(select /*+ materialize */ owner, object_name
from (select owner, object_name
from a
where owner in (select owner from b)))
select count(distinct owner), count(distinct object_name) from tt;
The materialize hint is required in this WITH clause; without it the view is merged back into the main query.

2.3 Add a count(1)
select count(1), count(distinct owner), count(distinct object_name)
from a
where owner in (select owner from b);
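Why does one extra count(1) make the CBO fall back to a right semi join? The query now has to return the row count of the main table. To turn the IN subquery into a plain join, the CBO must know either that the subquery has no duplicate values or that duplicates cannot affect the result.
Here duplicates in the IN subquery would change count(1), so the CBO will not choose a plain hash join.
Every CBO transformation has to be an equivalent transformation.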
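The equivalence argument can be checked by hand on a tiny data set. The sketch below uses two hypothetical tables, t_main and t_sub (not part of the original example), where t_sub contains a duplicate key. It writes the plain join explicitly to show which aggregates survive the duplication.

```sql
-- Hypothetical demo tables: t_sub has the key 'X' twice.
create table t_main as
select 'X' as k from dual union all
select 'Y' from dual;

create table t_sub as
select 'X' as k from dual union all
select 'X' from dual union all
select 'Y' from dual;

-- IN semantics (semi join): each t_main row is counted once.
select count(1) from t_main where k in (select k from t_sub);
-- returns 2

-- Plain join written by hand: the duplicate 'X' in t_sub doubles that row.
select count(1), count(distinct m.k)
  from t_main m, t_sub s
 where m.k = s.k;
-- returns 3 and 2: count(1) is inflated, count(distinct) is not

drop table t_main purge;
drop table t_sub purge;
```

Only count(distinct) is immune to the duplicated rows, which is why a query made up solely of count(distinct) aggregates can legally be run as a plain join, while adding count(1) rules that out.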

Note: in scenario 1 the CBO's choice of a plain hash join has nothing to do with whether histograms have been gathered.
This small bug is fixed in 12c.

Reference:
https://blog.csdn.net/robinson1988/article/details/51148332
