对NESTED LOOP进行Hash Join优化
在网上看到一篇不错的讨论贴 记录下来
大概意思是这样:
? 现有两个表:A表(3-4w记录) B表(1-2w记录)
? 要求:找出A表name字段 前面部分与B表name相同的记录
?
一般我们会这样写
SQL> select a.*,b.object_name from TEST_OBJECT a, TEST_OBJ1 b 2 where substr(a.object_name,1,length(b.object_name))=b.object_name 3 and substr(a.object_name,1,4)=substr(b.object_name,1,4) and length(b.object_name)>3 5 union all 6 select a.*,b.object_name from TEST_OBJECT a, TEST_OBJ1 b 7 where substr(a.object_name,1,length(b.object_name))=b.object_name 8 and length(b.object_name)<4;已选择108612行。已用时间: 00: 00: 21.54执行计划----------------------Plan hash value: 4080738151-----------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |-----------------------------------------------| 0 | SELECT STATEMENT | | 3468 | 365K| 90997 (100)| 00:18:12 || 1 | UNION-ALL | | | | | ||* 2 | HASH JOIN | | 34 | 3672 | 176 (2)| 00:00:03 ||* 3 | TABLE ACCESS FULL| TEST_OBJ1 | 659 | 12521 | 35 (0)| 00:00:01 || 4 | TABLE ACCESS FULL| TEST_OBJECT | 52544 | 4566K| 139 (1)| 00:00:02 || 5 | NESTED LOOPS | | 3434 | 362K| 90821 (1)| 00:18:10 ||* 6 | TABLE ACCESS FULL| TEST_OBJ1 | 659 | 12521 | 35 (0)| 00:00:01 ||* 7 | TABLE ACCESS FULL| TEST_OBJECT | 5 | 445 | 138 (1)| 00:00:02 |-----------------------------------------------Predicate Information (identified by operation id):--------------------------------------------------- 2 - access(SUBSTR("A"."OBJECT_NAME",1,4)=SUBSTR("B"."OBJECT_NAME",1,4)) filter("B"."OBJECT_NAME"=SUBSTR("A"."OBJECT_NAME",1,LENGTH("B"."OBJE CT_NAME"))) 3 - filter(LENGTH("B"."OBJECT_NAME")>3) 6 - filter(LENGTH("B"."OBJECT_NAME")<4) 7 - filter("B"."OBJECT_NAME"=SUBSTR("A"."OBJECT_NAME",1,LENGTH("B"."OBJE CT_NAME")))统计信息---------------------- 1 recursive calls 0 db block gets 33946 consistent gets 0 physical reads 0 redo size 4491423 bytes sent via SQL*Net to client 80055 bytes received via SQL*Net from client 7242 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 108612 rows processed?已用时间:? 00: 00: 21.54
? 相差不是一个数据级的
?? 这是如何做到的?
执行计划虽然变复杂了,但是耗时大幅减少,consistent gets也大幅降低,作出巨大贡献的是Hash Join的引入。
????? 这里再展开一下,Hash Join的复杂度是O(A+B),简单来说就是对A、B表各扫描一次,如果A、B都比较大的情况来看,无疑Hash Join要比Nested Loop 优越很多。
????? 扯远了,回到anlinew的具体方法上吧,导致Hash Join出现的关键因素是一个谓词的引入:
?1and substr(a.object_name,1,4)=substr(b.object_name,1,4) ????? 套用我剽窃的那个词来说,这是“不相关值”的对比!
????? anlinew的核心思想是将数据“分片”,该例子中分片的依据是多少位首字母(这里是4),其中“大头”由Hash Join处理,而“小头”走Nested Loop,这种“抓大放小”的做法直接就从复杂度上进行了优化。
????? 这种“分片”的思想非常值得借鉴,将“相关值”判断转化成“不相关值”的判断也是处理问题的一种有效手法。