遇到这样一个问题,客户的某系统在进入2014年后,某sql执行就非常慢,看到这个sql的执行计划发生了改变,原来很好的走hash连接的执行计划变成了走索引的nest loop的执行计划。在收集统计信息后,恢复正常。
这个故障的测试案例,我们可以这样模拟:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
--创建表和索引: create table t1 (a number,b varchar2(20)); create index idx_a on t1(a); --加载数据: INSERT INTO T1 SELECT TO_CHAR(TO_DATE(TRUNC(DBMS_RANDOM.VALUE(TO_NUMBER(TO_CHAR(TO_DATE('20130101', 'yyyymmdd'), 'j')), TO_NUMBER(TO_CHAR(TO_DATE('20131231', 'yyyymmdd'), 'j')))), 'j'), 'yyyymmdd') AS RANDOM_DATE, DBMS_RANDOM.STRING('u', TRUNC(DBMS_RANDOM.VALUE(1, 20))) AS RANDOM_CHAR FROM DUAL CONNECT BY ROWNUM <= 100000; |
我们看一下数据大致的情况:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
SQL> select a,count(*) from t1 group by a order by 1; A COUNT(*) ---------- ---------- 20130101 304 20130102 242 20130103 314 20130104 288 20130105 279 20130106 265 20130107 281 20130108 240 …… 20130403 277 …… 20131224 267 20131225 280 20131226 253 20131227 268 20131228 263 20131229 298 20131230 273 364 rows selected. SQL> |
注意这里建立的a字段是number,然后我放入的数字是类似年月的形式,但是是number型的表示数字表示日期。另外,从count也可以看到,此次insert的数据基本均匀,每个值基本在250个左右。
我们收集下统计信息,看看统计信息中的信息:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
--收集统计信息: exec dbms_stats.gather_table_stats(user,'T1',method_opt=>'FOR ALL COLUMNS SIZE 1',cascade=>TRUE); --看起统计信息: SQL> SELECT 2 a.column_name, 3 a.low_value,display_raw(a.low_value,b.data_type) AS dislay_low_val, 4 a.high_value,display_raw(a.high_value,b.data_type) AS display_high_val, 5 b.data_type, 6 a.density, 7 a.histogram, 8 a.num_buckets 9 FROM 10 user_tab_col_statistics a, user_tab_cols b 11 WHERE 12 a.table_name='T1' AND 13 a.table_name=b.table_name AND 14 a.column_name=b.column_name; COLUMN_NAME LOW_VALUE DISLAY_LOW_VAL HIGH_VALUE DISPLAY_HIGH_VAL DATA_TYPE DENSITY HISTOGRAM NUM_BUCKETS ------------------------------ -------------------- ------------------------------ ----------------------------------- ------------------------------ ---------------------------------------------------------------------------------------------------------- ---------- --------------- ----------- A C4150E0202 20130101 C4150E0D1F 20131230 NUMBER .002747253 NONE 1 B 41 A 5A5A5A5955574350474E535651 ZZZYUWCPGNSVQ VARCHAR2 .00001122 NONE 1 SQL> |
这里用了个函数display_raw。不是oracle自带的,其定义如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 |
--创建function: create or replace function display_raw (rawval raw, type varchar2) return varchar2 is cn number; cv varchar2(32); cd date; cnv nvarchar2(32); cr rowid; cc char(32); cbf binary_float; cbd binary_double; begin if (type = 'VARCHAR2') then dbms_stats.convert_raw_value(rawval, cv); return to_char(cv); elsif (type = 'DATE') then dbms_stats.convert_raw_value(rawval, cd); return to_char(cd); elsif (type = 'NUMBER') then dbms_stats.convert_raw_value(rawval, cn); return to_char(cn); elsif (type = 'BINARY_FLOAT') then dbms_stats.convert_raw_value(rawval, cbf); return to_char(cbf); elsif (type = 'BINARY_DOUBLE') then dbms_stats.convert_raw_value(rawval, cbd); return to_char(cbd); elsif (type = 'NVARCHAR2') then dbms_stats.convert_raw_value(rawval, cnv); return to_char(cnv); elsif (type = 'ROWID') then dbms_stats.convert_raw_value(rawval, cr); return to_char(cr); elsif (type = 'CHAR') then dbms_stats.convert_raw_value(rawval, cc); return to_char(cc); else return 'UNKNOWN DATATYPE'; end if; end; / --创建公共同义词,授权给其他用户使用: create public synonym display_raw for display_raw; grant execute on display_raw to public; |
我们可以看到在统计信息中A列的最小值(DISLAY_LOW_VAL)是20130101,最大值(DISPLAY_HIGH_VAL)是20131230。
我们来模拟下故障场景,先用a=20130403查询,注,这是在最大值和最小值范围内的。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 |
SQL> explain plan for 2 select * from t1 where a=20130403; Explained. SQL> select * from table(dbms_xplan.display(null,null,'advanced')); PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------ Plan hash value: 3617692013 -------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 275 | 4675 | 103 (1)| 00:00:02 | |* 1 | TABLE ACCESS FULL| T1 | 275 | 4675 | 103 (1)| 00:00:02 | -------------------------------------------------------------------------- Query Block Name / Object Alias (identified by operation id): ------------------------------------------------------------- 1 - SEL$1 / T1@SEL$1 Outline Data ------------- /*+ BEGIN_OUTLINE_DATA FULL(@"SEL$1" "T1"@"SEL$1") OUTLINE_LEAF(@"SEL$1") ALL_ROWS DB_VERSION('11.2.0.4') OPTIMIZER_FEATURES_ENABLE('11.2.0.4') IGNORE_OPTIM_EMBEDDED_HINTS END_OUTLINE_DATA */ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("A"=20130403) Column Projection Information (identified by operation id): ----------------------------------------------------------- 1 - "A"[NUMBER,22], "T1"."B"[VARCHAR2,20] 37 rows selected. SQL> |
可以看到估算的值rows是275,是在我们的实际值277行附近,是符合实际情况的。
然后我们继续insert数据,insert跨年的数据,但不收集统计信息:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 |
SQL> SQL> INSERT INTO t1 2 SELECT to_char(to_date(TRUNC(DBMS_RANDOM.VALUE(to_number(to_char(to_date('20140101', 3 'yyyymmdd'), 4 'J')), 5 to_number(to_char(to_date('20140111', 6 'yyyymmdd'), 7 'J')))), 8 'J'), 9 'yyyymmdd') AS random_date, 10 dbms_random.string('U', TRUNC(DBMS_RANDOM.VALUE(1, 20))) AS random_char 11 FROM dual 12 CONNECT BY ROWNUM <= 2900; 2900 rows created. SQL> commit; Commit complete. SQL> SQL> select a,count(*) from t1 group by a order by 1; A COUNT(*) ---------- ---------- 20130101 304 20130102 242 20130103 314 20130104 288 20130105 279 …… 20140101 260 20140102 292 20140103 302 20140104 289 20140105 297 20140106 287 20140107 307 20140108 289 20140109 289 20140110 288 374 rows selected. SQL> |
我们查询一个跨年的数据,就出现错误的估计值,rows为1行,这是错误的估算值,且由于认为才1行,所以选择了走索引:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 |
SQL> explain plan for 2 select * from t1 where a=20140101; Explained. SQL> select * from table(dbms_xplan.display(null,null,'advanced')); PLAN_TABLE_OUTPUT ----------------------------------------------------------------------------- Plan hash value: 2435036509 ------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 17 | 2 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 17 | 2 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IDX_A | 1 | | 1 (0)| 00:00:01 | ------------------------------------------------------------------------------------- Query Block Name / Object Alias (identified by operation id): ------------------------------------------------------------- 1 - SEL$1 / T1@SEL$1 2 - SEL$1 / T1@SEL$1 Outline Data ------------- /*+ BEGIN_OUTLINE_DATA INDEX_RS_ASC(@"SEL$1" "T1"@"SEL$1" ("T1"."A")) OUTLINE_LEAF(@"SEL$1") ALL_ROWS DB_VERSION('11.2.0.4') OPTIMIZER_FEATURES_ENABLE('11.2.0.4') IGNORE_OPTIM_EMBEDDED_HINTS END_OUTLINE_DATA */ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("A"=20140101) Column Projection Information (identified by operation id): ----------------------------------------------------------- 1 - "A"[NUMBER,22], "T1"."B"[VARCHAR2,20] 2 - "T1".ROWID[ROWID,10], "A"[NUMBER,22] 40 rows selected. SQL> --查询上一天的,rows的估算值正确: SQL> explain plan for 2 select * from t1 where a=20131230; Explained. SQL> select * from table(dbms_xplan.display(null,null,'advanced')); PLAN_TABLE_OUTPUT -------------------------------------------------------------------- Plan hash value: 3617692013 -------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 275 | 4675 | 103 (1)| 00:00:02 | |* 1 | TABLE ACCESS FULL| T1 | 275 | 4675 | 103 (1)| 00:00:02 | -------------------------------------------------------------------------- Query Block Name / Object Alias (identified by operation id): ------------------------------------------------------------- 1 - SEL$1 / T1@SEL$1 Outline Data ------------- /*+ BEGIN_OUTLINE_DATA FULL(@"SEL$1" "T1"@"SEL$1") OUTLINE_LEAF(@"SEL$1") ALL_ROWS DB_VERSION('11.2.0.4') OPTIMIZER_FEATURES_ENABLE('11.2.0.4') IGNORE_OPTIM_EMBEDDED_HINTS END_OUTLINE_DATA */ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("A"=20131230) Column Projection Information (identified by operation id): ----------------------------------------------------------- 1 - "A"[NUMBER,22], "T1"."B"[VARCHAR2,20] 37 rows selected. SQL> |
我们如果测试不同的a的值,可以大致得到rows的估计值为如下曲线:
我们可以看到,在超过边界值20131230之后,rows的值就缓慢下降直至变成1。这个下降是个缓慢的过程,且和10g的算法不同,当超过边界时,10g的算法是density=1/(2*NUM_ROWS)(可见Jonathan Lewis’s 的文章),11g的下降过程平缓的多。
解决方法:收集统计信息,或者将a字段改成date型,因为20131230和20140101虽然在日期上只是相隔了一天,但是在数值上却隔了8871,在算out of range的值的时候,很容易就到了距离边界值很远的地方。