HP的进程优先级和cursor pin S

进程的优先级往往会不可思议的影响数据库的性能，在很多时候，可能突发性的（持续时间只有几秒钟到几分钟）、很难确认原因的latch，都和进程优先级有关。下面就是这样一个例子，在没有正确设置进程优先级的情况下，客户的系统常常时不时的出现高CPU的情况，持续时间几分钟，且伴随着cursor pin S的等待事件。

客户那里的系统是HP IA的系统，之前经常在周一或周二的下午出现瞬间CPU冲高的情况，但是除了发现数据库中大量cursor pin S等待，也没有其他发现。在某天下午，CPU再次突然冲高了，在这3分钟内，CPU几乎达到100%全满，在数据库中也确实看到大量的cursor pin S等待。3分钟后，立马回落成正常。

一、初步分析：
1. 检查当时CPU 100%的1分钟左右的ash：确实当时最多的等待是cursor pin s：

EVENT                                                              COUNT(*)
---------------------------------------------------------------- ----------
LNS wait on SENDREQ                                                       1
latch: library cache                                                      1
null event                                                                1
SQL*Net more data to client                                               3
read by other session                                                     3
latch: cache buffers chains                                               5
enq: TX - index contention                                                7
log file parallel write                                                   8
log file sequential read                                                  8
latch: checkpoint queue latch                                            10
SQL*Net more data from client                                            12
db file parallel write                                                   52
log file sync                                                           395
db file sequential read                                                 760
cursor: pin S                                                           877
                                                                        948

16 rows selected

SQL>

EVENT COUNT(*)

---------------------------------------------------------------- ----------

LNS wait on SENDREQ 1

latch: library cache 1

null event 1

SQL*Net more data to client 3

read by other session 3

latch: cache buffers chains 5

enq: TX - index contention 7

log file parallel write 8

log file sequential read 8

latch: checkpoint queue latch 10

SQL*Net more data from client 12

db file parallel write 52

log file sync 395

db file sequential read 760

cursor: pin S 877

948

16 rows selected

SQL>

2. 该数据库是的版本是10.2.0.5，从lsinventory检查，cursor pin S的在10.2.0.5中相关的bug 6904068和9499302，在该数据库中已经fixed，因此应该不是由bug引起。

3.通过

select count(distinct(session)),sql_id from gv$active_session_history a
where sample_time between
to_date('2013-07-03 17:18:00','yyyy-mm-dd hh24:mi:ss') and 
to_date('2013-07-03 17:18:59','yyyy-mm-dd hh24:mi:ss')
and event='cursor: pin S';

select count(distinct(session)),sql_id from gv$active_session_history a

where sample_time between

to_date('2013-07-03 17:18:00','yyyy-mm-dd hh24:mi:ss') and

to_date('2013-07-03 17:18:59','yyyy-mm-dd hh24:mi:ss')

and event='cursor: pin S';

看到，一分钟内引发cursor pin s 的sql语句只有一个语句。有281个session在同时跑这个sql。

4. 通过

select event,p1 from gv$active_session_history a
where sample_time between
to_date('2013-07-03 17:18:00','yyyy-mm-dd hh24:mi:ss') and 
to_date('2013-07-03 17:18:59','yyyy-mm-dd hh24:mi:ss')
and event='cursor: pin S'
group by event,p1;

select event,p1 from gv$active_session_history a

where sample_time between

to_date('2013-07-03 17:18:00','yyyy-mm-dd hh24:mi:ss') and

to_date('2013-07-03 17:18:59','yyyy-mm-dd hh24:mi:ss')

and event='cursor: pin S'

group by event,p1;

查到 cursor pin S的session的idn都是同一个，为3280391651。

5. 检查那个时间段的mutex的情况，发现block session为846，1264，1546等等的几个session:

select * from v$mutex_sleep_history a where a.mutex_identified='3280391651' and a.sleep_time between
to_date('2013-07-03 17:18:00','yyyy-mm-dd hh24:mi:ss') and 
to_date('2013-07-03 17:18:59','yyyy-mm-dd hh24:mi:ss')

select * from v$mutex_sleep_history a where a.mutex_identified='3280391651' and a.sleep_time between

to_date('2013-07-03 17:18:00','yyyy-mm-dd hh24:mi:ss') and

to_date('2013-07-03 17:18:59','yyyy-mm-dd hh24:mi:ss')

将上述的blocking session带入到ash中，发现部分的等待均在IO上。

select blocking_session ,event,a.*
from gv$session_session_history a
where sample_time between
to_date('2013-07-03 17:18:00','yyyy-mm-dd hh24:mi:ss') and 
to_date('2013-07-03 17:18:59','yyyy-mm-dd hh24:mi:ss')
and session_id in
(select blocking_session from v$mutex_sleep_history a where a.mutex_identified='3280391651' and a.sleep_time between
to_date('2013-07-03 17:18:00','yyyy-mm-dd hh24:mi:ss') and 
to_date('2013-07-03 17:18:59','yyyy-mm-dd hh24:mi:ss')
);

select blocking_session ,event,a.*

from gv$session_session_history a

where sample_time between

to_date('2013-07-03 17:18:00','yyyy-mm-dd hh24:mi:ss') and

to_date('2013-07-03 17:18:59','yyyy-mm-dd hh24:mi:ss')

and session_id in

(select blocking_session from v$mutex_sleep_history a where a.mutex_identified='3280391651' and a.sleep_time between

to_date('2013-07-03 17:18:00','yyyy-mm-dd hh24:mi:ss') and

to_date('2013-07-03 17:18:59','yyyy-mm-dd hh24:mi:ss')

);

检查其中的blocking的进程，sid是1868，发现是lgwr进程。

select blocking_session ,event,a.*
from gv$session_session_history a
where sample_time between
to_date('2013-07-03 17:18:00','yyyy-mm-dd hh24:mi:ss') and 
to_date('2013-07-03 17:18:59','yyyy-mm-dd hh24:mi:ss')
and session_id='1868'

select blocking_session ,event,a.*

from gv$session_session_history a

where sample_time between

to_date('2013-07-03 17:18:00','yyyy-mm-dd hh24:mi:ss') and

to_date('2013-07-03 17:18:59','yyyy-mm-dd hh24:mi:ss')

and session_id='1868'

因此，综上，初步结论应该是IO问题，IO性能变差，导致IO等待，如log file sync（blocking session也有很多处于db file sequence read的等待，也从侧面说明的当时的IO不好），那些因为IO原因变慢的session，堵塞后续的session，造成cursor pin s变高，进一步导致cpu使用率上升。

本来到这里，问题似乎已经明了，但是其实不然，随着进一步的挖掘，我们还找到了更加深入的原因。

二、再次深入分析：
从再次检查的情况看，问题应该和进程优先级设置有关。

1. 首先从整体上看，看到cursor: pin S的等待超过512纳秒的，存在不少，因此从整体上讲，系统确实有cursor: pin S的问题。

SQL> select* from v$event_histogram
  2  where event='cursor: pin S'
  3  /

    EVENT# EVENT                                                            WAIT_TIME_MILLI WAIT_COUNT
---------- ---------------------------------------------------------------- --------------- ----------
       629 cursor: pin S                                                                  1    5499194
       629 cursor: pin S                                                                  2       5032
       629 cursor: pin S                                                                  4       2401
       629 cursor: pin S                                                                  8        681
       629 cursor: pin S                                                                 16        419
       629 cursor: pin S                                                                 32        269
       629 cursor: pin S                                                                 64        274
       629 cursor: pin S                                                                128        239
       629 cursor: pin S                                                                256        159
       629 cursor: pin S                                                                512        198
       629 cursor: pin S                                                               1024         61
       629 cursor: pin S                                                               2048         56
       629 cursor: pin S                                                               4096         11
       629 cursor: pin S                                                               8192         17

14 rows selected

SQL>

SQL> select* from v$event_histogram

2 where event='cursor: pin S'

3 /

EVENT# EVENT WAIT_TIME_MILLI WAIT_COUNT

---------- ---------------------------------------------------------------- --------------- ----------

629 cursor: pin S 1 5499194

629 cursor: pin S 2 5032

629 cursor: pin S 4 2401

629 cursor: pin S 8 681

629 cursor: pin S 16 419

629 cursor: pin S 32 269

629 cursor: pin S 64 274

629 cursor: pin S 128 239

629 cursor: pin S 256 159

629 cursor: pin S 512 198

629 cursor: pin S 1024 61

629 cursor: pin S 2048 56

629 cursor: pin S 4096 11

629 cursor: pin S 8192 17

14 rows selected

SQL>

2. 其次，排除了在初步分析中因为log file syc的怀疑点。Log file syc应该是受害者，取2次awr，第一次为昨天17:15:26~13 17:30:29，第二次为昨天16:45:19~17:00:22：

log file sync                       262,031     1         631       2       1.0
log file sync                       264,795   N/A         273       1       1.0

log file parallel write             258,195   N/A         132       1       1.0
log file parallel write             266,453   N/A         134       1       1.0

log file sync 262,031 1 631 2 1.0

log file sync 264,795 N/A 273 1 1.0

log file parallel write 258,195 N/A 132 1 1.0

log file parallel write 266,453 N/A 134 1 1.0

发现log file parallel write的平均等待时间一样，说明写log file的IO效率一样，log file syc的平均等待时间变长是因为cpu的紧张，导致io操作完之后，无法及时将io完成的消息放入cpu runqueue。

3. 从awr报告看，那个sql的执行次数基本类似（因为cursor pin S很大程度上也是因为并发量太大），因此应该不是sql的问题

Executions   Rows Processed  Rows per Exec   Exec (s)   Exec (s)     SQL Id    
------------ --------------- -------------- ---------- ----------- -------------
     198,585         189,366            1.0       0.00        0.01 b55xptz1sdmg3             
 Executions   Rows Processed  Rows per Exec   Exec (s)   Exec (s)     SQL Id    
------------ --------------- -------------- ---------- ----------- -------------
     196,573         188,267            1.0       0.00        0.00 b55xptz1sdmg3

Executions Rows Processed Rows per Exec Exec (s) Exec (s) SQL Id

------------ --------------- -------------- ---------- ----------- -------------

198,585 189,366 1.0 0.00 0.01 b55xptz1sdmg3

Executions Rows Processed Rows per Exec Exec (s) Exec (s) SQL Id

------------ --------------- -------------- ---------- ----------- -------------

196,573 188,267 1.0 0.00 0.00 b55xptz1sdmg3

4. 从awr报告看，下午17:15:26~13 17:30:29存在logon storm，1050次的logons cumulative，比16:45:19~17:00:22高不少。并且也可以看其他时间段的awr，我抽查了其他3个时间段，logons cumulative也基本在300多。

logons cumulative                             1,050            1.2           0.0
logons cumulative                               301            0.3           0.0

1 2	logons cumulative 1,050 1.2 0.0 logons cumulative 301 0.3 0.0

logon storm是突然间的大量的连接，而连接需要attach到SGA，需要从share pool中allocate内存，需要初始化session结构，这些操作都是消耗CPU的。

5. 最后，应该是最根本的一个问题，就是没有在/etc/privgroup设置sched_noage，
我们说当latch getter在spin之后还是获取不到latch，会sleep，sleep的时候释放cpu。但是10.2采用mutex之后，mutex和latch的机制有所不同，mutex getter在spin之后如果还是获取不到mutex，它不会sleep，而是yield CPU。也就是说它会一直排在runqueue里直到轮到它获得cpu。

所以如果操作系统的进程存在优先级差别的时候，操作系统的scheduler会将cpu占用率高的进程降低其优先级。当mutex holder被降低cpu优先级（这个操作叫priority inversion），那么其他的进程虽然有比较高的优先级，但是它们没有获得相关的资源（如mutex、lock），所以就变成了mutex getter优先级高但是没有获得相关资源，却一直yield cpu，让cpu冲高。

所以我们让要所有的进程处于同一优先级，不让操作系统自动降低进程的优先级。这个操作在HPUX上叫SCHED_NOAGE scheduling Policy。

类似的问题，在国外的一个电信级的数据库中也碰到了。见此。

改sched_noage具体的操作见文档，我在很早很早的时候，也写过HP-UX上数据库调优（OLTP），上面也讲了这个问题。

注意该参数的修改要在oracle初始化参数、和主机层面/etc/privgroup同时修改。一般情况下/etc/privgroup中有3个参数，2个和进程优先级有关(RTPRIO RTSCHED)，一个是和异步IO有关（MLOCK）。

另外需要注意的是，进程优先级需要重启数据库才能生效。之前的进程，在数据库没重启之前，还会保持着原来的优先级，如我们按照文档，在HP下，我们已经设置了oracle进程的优先级是178（在初始化参数中设置的），那么，此时用ps -elf看进程的优先级，大概是在第七个字段，PRI字段，显示的应该也是178，如果oracle进程的这个值还不是178，说明修改还没成功。

三、后续：
客户改了HP的进程优先级之后，持续观察两个星期，再没有出现瞬间高CPU的情况，cursor pin S的平均等待时间也从几百纳秒，下降到几十纳秒。

ps：客户还有另外一个HP的系统，在某天发生高active logfile的情况（40多个redo logfile，有30多个active），当时查到了引发问题的SQL语句，是一个很大的insert语句，也发现了问题发生时ckpt进程处于rdbms ipc message，等待事件正常，但是dbwr处于latch free等待，latch是93和98，大部分的是93。除了insert的事务量大，进程也是有点异常的。怀疑是dbwr写完后没有将信号量返回给ckpt，或者ckpt进程异常，无法接受信号量，导致ckpt无法complete，处于rdbms ipc message，即等待message的传递过来。switch logfile ckpt完成不了，因此日志处于高active状态。但是上述只是猜测了，究竟是哪个进程的问题，后续没有深究，但是我个人怀疑还是HP的进程优先级没有设置好，造成latch free。

HP的进程优先级和cursor pin S

相关文章

oracle无感知修改应用用户密码

阿里云dts过滤测试

图解TSM客户端和服务器端的注册关系

发表回复 取消回复

发表回复取消回复