读书人

(转)关于semi-join/anti-join的1点探讨

发布时间: 2013-08-27 10:20:47 作者: rapoo

(转)关于semi-join/anti-join的一点探讨

? 原博客地址:http://space.itpub.net/15480802/viewspace-703260

?

create?table?dept (deptno number(5), dname varchar2(20));
create table emp (deptno number(5), name varchar2(20));
insert into dept values(1,'IT');
insert into dept values(2,'HR');
insert into dept values(3,'Marketplace');
insert into emp values(1,'justin1');
insert into emp values(1,'justin2');
insert into emp values(2,'justin3');
commit;

Semi-join
通常出现在使用了exists或in的sql中,所谓semi-join即在两表关联时,当第二个表中存在一个或多个匹配记录时,返回第一个表的记录;
与普通join的区别在于semi-join时,第一个表里的记录最多只返回一次;

例1
现在要查询出职员不为空的部门
普通的表连接会返回重复值
SQL> select d.deptno,d.dname from dept d, emp e where e.deptno = d.deptno;
?
DEPTNO DNAME
------ --------------------
???? 1 IT
???? 1 IT
???? 2 HR
---------------------------------------
| Id? | Operation????????? | Name | Rows? | Bytes | Cost (%CPU)| Time???? |
---------------------------------------
|?? 0 | SELECT STATEMENT?? |????? |???? 3 |?? 114 |???? 7? (15)| 00:00:01 |
|*? 1 |? HASH JOIN???????? |????? |???? 3 |?? 114 |???? 7? (15)| 00:00:01 |
|?? 2 |?? TABLE ACCESS FULL| DEPT |???? 3 |??? 75 |???? 3?? (0)| 00:00:01 |
|?? 3 |?? TABLE ACCESS FULL| EMP? |???? 3 |??? 39 |???? 3?? (0)| 00:00:01 |
---------------------------------------
而使用semi-join时候,不会返回重复记录
SQL> select d.deptno,d.dname from dept d where d.deptno in (select e.deptno from emp e);
?
DEPTNO DNAME
------ --------------------
???? 1 IT
???? 2 HR
---------------------------------------
| Id? | Operation????????? | Name | Rows? | Bytes | Cost (%CPU)| Time???? |
---------------------------------------
|?? 0 | SELECT STATEMENT?? |????? |???? 2 |??? 76 |???? 7? (15)| 00:00:01 |
|*? 1 |? HASH JOIN SEMI??? |????? |???? 2 |??? 76 |???? 7? (15)| 00:00:01 |
|?? 2 |?? TABLE ACCESS FULL| DEPT |???? 3 |??? 75 |???? 3?? (0)| 00:00:01 |
|?? 3 |?? TABLE ACCESS FULL| EMP? |???? 3 |??? 39 |???? 3?? (0)| 00:00:01 |
---------------------------------------

?

Anti-join
而anti-join则与semi-join相反,即当在第二张表没有发现匹配记录时,才会返回第一张表里的记录;
当使用not exists/not in的时候会用到,两者在处理null值的时候会有所区别


何时选择anti-join
1 使用not in且相应列有not null约束
2 not exists,不保证每次都用到anti-join
当无法选择anti-join时,oracle常会采用filter替代

例2
查询职员为空的部门
普通sql
SQL> select d.deptno,d.dname from dept d
? 2? minus
? 3? select d.deptno,d.dname from dept d where d.deptno in (select e.deptno from emp e);
?
DEPTNO DNAME
------ --------------------
???? 3 Marketplace
-----------------------------------------
| Id? | Operation??????????? | Name | Rows? | Bytes | Cost (%CPU)| Time???? |
-----------------------------------------
|?? 0 | SELECT STATEMENT???? |????? |???? 3 |?? 189 |??? 12? (75)| 00:00:01 |
|?? 1 |? MINUS?????????????? |????? |?????? |?????? |??????????? |????????? |
|?? 2 |?? SORT UNIQUE??????? |????? |???? 3 |??? 75 |???? 4? (25)| 00:00:01 |
|?? 3 |??? TABLE ACCESS FULL | DEPT |???? 3 |??? 75 |???? 3?? (0)| 00:00:01 |
|?? 4 |?? SORT UNIQUE??????? |????? |???? 3 |?? 114 |???? 8? (25)| 00:00:01 |
|*? 5 |??? HASH JOIN???????? |????? |???? 3 |?? 114 |???? 7? (15)| 00:00:01 |
|?? 6 |???? TABLE ACCESS FULL| EMP? |???? 3 |??? 39 |???? 3?? (0)| 00:00:01 |
|?? 7 |???? TABLE ACCESS FULL| DEPT |???? 3 |??? 75 |???? 3?? (0)| 00:00:01 |
-----------------------------------------

使用anti join
SQL> select d.deptno,d.dname from dept d
? 2? where d.deptno not in (select deptno from emp);
?
DEPTNO DNAME
------ --------------------
???? 3 Marketplace
---------------------------------------
| Id? | Operation????????? | Name | Rows? | Bytes | Cost (%CPU)| Time???? |
---------------------------------------
|?? 0 | SELECT STATEMENT?? |????? |???? 3 |?? 114 |???? 7? (15)| 00:00:01 |
|*? 1 |? HASH JOIN ANTI NA |????? |???? 3 |?? 114 |???? 7? (15)| 00:00:01 |
|?? 2 |?? TABLE ACCESS FULL| DEPT |???? 3 |??? 75 |???? 3?? (0)| 00:00:01 |
|?? 3 |?? TABLE ACCESS FULL| EMP? |???? 3 |??? 39 |???? 3?? (0)| 00:00:01 |
---------------------------------------

注:倘若subquery返回一条null值,则整个not in都变为false,即不返回任何值
If the subquery of a NOT IN clause returns at least one row with a null value, the entire NOT IN clause evaluates to false for all rows
SQL> select d.deptno,d.dname from dept d where d.deptno not in (select deptno from emp);

??? DEPTNO DNAME
---------- --------------------
???????? 3 Marketplace

SQL> insert into emp values(null,'NULL');

1 row created.

SQL> commit;

Commit complete.

SQL> select d.deptno,d.dname from dept d where d.deptno not in (select deptno from emp);

no rows selected

?

而换成not exists时候,hash join anti na变成了hash join anti

select d.deptno,d.dname from dept d where not exists (select deptno from emp where emp.deptno = d.deptno);
---------------------------------------
| Id? | Operation????????? | Name | Rows? | Bytes | Cost (%CPU)| Time???? |
---------------------------------------
|?? 0 | SELECT STATEMENT?? |????? |???? 3 |?? 114 |???? 7? (15)| 00:00:01 |
|*? 1 |? HASH JOIN ANTI??? |????? |???? 3 |?? 114 |???? 7? (15)| 00:00:01 |
|?? 2 |?? TABLE ACCESS FULL| DEPT |???? 3 |??? 75 |???? 3?? (0)| 00:00:01 |
|?? 3 |?? TABLE ACCESS FULL| EMP? |???? 3 |??? 39 |???? 3?? (0)| 00:00:01 |
---------------------------------------
Not exists与not in的区别在于not exists不受null值影响


注意
Oracle在解析sql时候会尽可能的把子查询转换为表连接
Oracle在以下情况不会使用semi-join
1、??? 使用distinct或者union
2、??? Exists/in子查询中使用了or

Hash_sj/merge_sj/nl_sj都是关于semi-join的一些hint

读书人网 >其他数据库

热点推荐