今天,某省的同事来告诉我,表重组后,他用于统计的一个sql脚本运行变慢了,之前只需要17、8分钟能出来的结果,现在1小时40分钟左右才能出来结果。
我们一起来看看脚本中的一个sql:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 |
SQL> explain plan for 2 select a.startdate,b.subsid from tab_1 a,tab_2 b where 3 a.servid='025001003681' and a.status!='C' and a.mid=b.mid; Explained. Elapsed: 00:00:00.03 SQL> select * from table(dbms_xplan.display) SQL> / PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Pstart| Pstop | ---------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 369 | 23985 | 980 | | | | 1 | NESTED LOOPS | | 369 | 23985 | 980 | | | | 2 | PARTITION HASH ALL | | | | | 1 | 4 | |* 3 | TABLE ACCESS BY LOCAL INDEX ROWID| tab_1 | 369 | 14022 | 242 | 1 | 4 | |* 4 | INDEX RANGE SCAN | IDX_tab_1_SERVID | 492 | | 10 | 1 | 4 | | 5 | PARTITION HASH ITERATOR | | | | | KEY | KEY | | 6 | TABLE ACCESS BY LOCAL INDEX ROWID| tab_2 | 1 | 27 | 2 | KEY | KEY | |* 7 | INDEX UNIQUE SCAN | PK_tab_2_MID | 1 | | 1 | KEY | KEY | ---------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("A"."STATUS"<>'C') 4 - access("A"."SERVID"='025001003681') 7 - access("A"."MID"="B"."MID") Note: cpu costing is off 22 rows selected. Elapsed: 00:00:00.56 |
我们看到这个sql是通过索引后在走nested loops,我们做一个sqltrace来观察一下它的执行时间和consistent gets:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 |
SQL> set timing on SQL> set autotrace traceonly SQL> select a.startdate,b.subsid from tab_1 a,tab_2 b where 2 a.servid='025001003681' and a.status!='C' and a.mid=b.mid; 627965 rows selected. Elapsed: 00:36:13.04 Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=980 Card=369 Bytes=2 3985) 1 0 NESTED LOOPS (Cost=980 Card=369 Bytes=23985) 2 1 PARTITION HASH (ALL) 3 2 TABLE ACCESS (BY LOCAL INDEX ROWID) OF 'tab_1' (Cost=242 Card=369 Bytes=14022) 4 3 INDEX (RANGE SCAN) OF 'IDX_tab_1_SERVID' (NON -UNIQUE) (Cost=10 Card=492) 5 1 PARTITION HASH (ITERATOR) 6 5 TABLE ACCESS (BY LOCAL INDEX ROWID) OF 'tab_2' (C ost=2 Card=1 Bytes=27) 7 6 INDEX (UNIQUE SCAN) OF 'PK_tab_2_MID' (UNIQUE) (Cost=1 Card=1) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 2725638 consistent gets 406711 physical reads 1120 redo size 20890925 bytes sent via SQL*Net to client 461160 bytes received via SQL*Net from client 41866 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 627965 rows processed SQL> |
我们看到有270多万个consistent gets,并且这个sql也确实要36分钟才能运行出来,那为什么同事说之前10多分钟就能跑出来?我们来试试hash连接:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 |
SQL> select /*+ use_hash(a,b) */a.startdate,b.subsid from tab_1 a,tab_2 b where a.servid='025001003681' and a.status!='C' and a.mid=b.mid; 2 628514 rows selected. Elapsed: 00:04:21.90 Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=20594 Card=369 Bytes =23985) 1 0 HASH JOIN (Cost=20594 Card=369 Bytes=23985) 2 1 PARTITION HASH (ALL) 3 2 TABLE ACCESS (BY LOCAL INDEX ROWID) OF 'tab_1' (Cost=242 Card=369 Bytes=14022) 4 3 INDEX (RANGE SCAN) OF 'IDX_tab_1_SERVID' (NON -UNIQUE) (Cost=10 Card=492) 5 1 PARTITION HASH (ALL) 6 5 TABLE ACCESS (FULL) OF 'tab_2' (Cost=20251 Card=7 199100 Bytes=194375700) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 400061 consistent gets 308119 physical reads 560 redo size 20909078 bytes sent via SQL*Net to client 461556 bytes received via SQL*Net from client 41902 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 628514 rows processed SQL> |
呀,走hash连接的话,只要4分多钟就能跑出来,且consistent gets也仅仅只有40万,物理读也小了不少。看来oracle确实没有选择合适的执行计划了。
可oracle为什么没有选择合适的执行计划?难道是统计信息的问题?
1 2 3 4 5 6 |
SQL> select TABLE_NAME,LAST_ANALYZED from user_tables where table_name in ('tab_2','tab_1'); TABLE_NAME LAST_ANAL ------------------------------ --------- tab_2 29-JAN-08 tab_1 11-JUN-08 |
难道是tab_2的统计信息太久了导致执行计划出错?尝试备份统计信息并重新分析tab_2之后,再次查看执行计划 :
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 |
SQL> exec dbms_stats.gather_table_stats(OWNNAME=>'MISC',tabname=>'tab_2',cascade=>TRUE,estimate_percent=> 50); PL/SQL procedure successfully completed. Elapsed: 00:18:37.95 SQL> SQL> SQL> SQL> explain plan for 2 select a.startdate,b.subsid from tab_1 a,tab_2 b where 3 a.servid='025001003681' and a.status!='C' and a.mid=b.mid; Explained. Elapsed: 00:00:00.00 SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Pstart| Pstop | ---------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 369 | 23985 | 980 | | | | 1 | NESTED LOOPS | | 369 | 23985 | 980 | | | | 2 | PARTITION HASH ALL | | | | | 1 | 4 | |* 3 | TABLE ACCESS BY LOCAL INDEX ROWID| tab_1 | 369 | 14022 | 242 | 1 | 4 | |* 4 | INDEX RANGE SCAN | IDX_tab_1_SERVID | 492 | | 10 | 1 | 4 | | 5 | PARTITION HASH ITERATOR | | | | | KEY | KEY | | 6 | TABLE ACCESS BY LOCAL INDEX ROWID| tab_2 | 1 | 27 | 2 | KEY | KEY | |* 7 | INDEX UNIQUE SCAN | PK_tab_2_MID | 1 | | 1 | KEY | KEY | ---------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("A"."STATUS"<>'C') 4 - access("A"."SERVID"='025001003681') 7 - access("A"."MID"="B"."MID") Note: cpu costing is off 22 rows selected. Elapsed: 00:00:00.02 |
天,竟然还是走nl!!
难道是因为servid中的倾斜度的问题?再次查了一下tab_1表中servid=’025001003681′
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 |
SQL> select count(*) from tab_1 a where a.servid='025001003681' 2 / COUNT(*) ---------- 1564299 #### 而其他的servid的值: SQL> select servid from tab_1 where rownum<=10; SERVID ------------ 010001000983 010001001058 010001001327 010001001327 010001001327 010001001327 010001001327 010001001327 010001001327 010001001327 10 rows selected. Elapsed: 00:00:00.02 SQL> select count(*) from tab_1 a where a.servid='010001001327'; COUNT(*) ---------- 33 Elapsed: 00:00:00.06 SQL> select count(*) from tab_1 a where a.servid='010001000983'; COUNT(*) ---------- 2 Elapsed: 00:00:00.00 SQL> select count(*) from tab_1 a where a.servid='010001000983'; COUNT(*) ---------- 2 |
可以看到其他值的返回数据量确实很小,在返回数据量小的情况下,我们走索引确实是一种高效的查询方式,但是当返回数据很多时,由于nl的比较要花大量的时间,因此时间也就基本消耗在这里了。
对于某一列中的不同数值有不同的倾斜度(skew),要选择不同的执行计划,我们需要在收集直方图:
1 2 3 4 5 |
SQL> exec dbms_stats.gather_table_stats(OWNNAME=>'MISC',tabname=>'tab_1',degree=>4,cascade=>TRUE,estimate_percent=> 50,method_opt => 'FOR COLUMNS servid SIZE auto'); PL/SQL procedure successfully completed. Elapsed: 00:12:48.22 |
此时,我们再次检查其执行计划的时候,看到已经走了hash连接了:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 |
SQL> explain plan for 2 select a.startdate,b.subsid from tab_1 a,tab_2 b where 3 a.servid='025001003681' and a.status!='C' and a.mid=b.mid; Explained. Elapsed: 00:00:00.01 SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost | Pstart| Pstop | ---------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1172K| 72M| | 39071 | | | |* 1 | HASH JOIN | | 1172K| 72M| 55M| 39071 | | | | 2 | PARTITION HASH ALL | | | | | | 1 | 4 | |* 3 | TABLE ACCESS FULL | tab_1 | 1172K| 42M| | 13663 | 1 | 4 | | 4 | PARTITION HASH ALL | | | | | | 1 | 4 | | 5 | TABLE ACCESS FULL | tab_2 | 7199K| 185M| | 20251 | 1 | 4 | ---------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("A"."MID"="B"."MID") 3 - filter("A"."SERVID"='025001003681' AND "A"."STATUS"<>'C') Note: cpu costing is off 19 rows selected. Elapsed: 00:00:00.02 SQL> SQL> |
这个时候,我们不加hints,oracle已经正确的选择了执行计划。至此,不走hash join是因为索引列的倾斜度的问题,收集直方图后,问题搞定。
4条评论
写的很好!可以让我学很多的东西,我先去查查什么是索引列的倾斜度去!
写的非常好!
不错,学习了:)
Study it! Thank you!