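Before 12c, the LGWR main thread read the redo strands, threads spawned by LGWR emulated asynchronous I/O to write the redo, and the main thread then notified the foreground (FG) processes, ending their log file sync wait. (Note that LWP 0 uses somewhat more CPU than the other LWPs.)
12c introduces scalable LGWR. LGWR, as the primary process, only coordinates, and the actual work is done by worker processes named LGnn. LGWR guarantees that redo is written in order, while the LGnn workers, as directed by LGWR, read the redo strands, write the redo to disk, and notify the FG processes directly when the write completes, ending the log file sync wait.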
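The key constraint in this division of labor is that workers may finish their writes in any order, yet redo must still be acknowledged in order. The toy model below illustrates that idea only; it is an assumption for illustration, not Oracle's implementation. A coordinator hands out sequence numbers (hypothetical), writes complete in any order, and a sequence number counts as durable only once every earlier one has completed.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class OrderedCompletion:
    """Toy model (not Oracle code): writes carry coordinator-assigned sequence
    numbers and may finish out of order, but a write is reported durable only
    after every earlier sequence number has also finished."""

    def __init__(self):
        self._lock = threading.Lock()
        self._finished = set()
        self.durable_upto = 0  # writes 1..durable_upto are all on disk

    def complete(self, seq):
        with self._lock:
            self._finished.add(seq)
            # Advance the durable watermark over any contiguous finished run.
            while self.durable_upto + 1 in self._finished:
                self._finished.remove(self.durable_upto + 1)
                self.durable_upto += 1
            return self.durable_upto

# Write 2 lands first, but nothing is durable until write 1 lands.
t = OrderedCompletion()
print(t.complete(2))  # 0
print(t.complete(1))  # 2
print(t.complete(3))  # 3

# Several worker threads completing 100 writes still end with all 100 durable.
t2 = OrderedCompletion()
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(t2.complete, range(1, 101)))
print(t2.durable_upto)  # 100
```

A foreground session waiting on its commit would, in this model, wait until `durable_upto` reaches its own sequence number.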
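1. The LGWR worker processes (LGnn) are intended for multi-CPU systems. Oracle controls them with the hidden parameter _use_single_log_writer (default adaptive; the other values are true and false). With the default adaptive, Oracle switches automatically between single log writer and scalable log writer mode according to system load. When it switches, entries like the following appear in the LGWR trace file: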
```
*** 2016-03-10 18:35:27.186
*** SESSION ID:(6.30501) 2016-03-10 18:35:27.186
*** CLIENT ID:() 2016-03-10 18:35:27.186
*** SERVICE NAME:() 2016-03-10 18:35:27.186
*** MODULE NAME:() 2016-03-10 18:35:27.186
*** CLIENT DRIVER:() 2016-03-10 18:35:27.186
*** ACTION NAME:() 2016-03-10 18:35:27.186
*** CONTAINER ID:(1) 2016-03-10 18:35:27.186

Created 2 redo writer workers (2 groups of 1 each)

*** 2016-03-10 18:36:01.973
kcrfw_slave_adaptive_updatemode: scalable->single group0=612 all=711 rw=21594 single=17982 scalable_nopipe=43188 scalable_pipe=23753 scalable=40044

*** 2016-03-10 18:36:01.974
Adaptive scalable LGWR disabling workers

*** 2016-03-10 18:37:03.821
kcrfw_slave_adaptive_updatemode: single->scalable redorate=16125727 switch=6310908

*** 2016-03-10 18:37:03.821
Adaptive scalable LGWR enabling workers

*** 2016-03-10 18:39:04.897
kcrfw_slave_adaptive_updatemode: scalable->single group0=1629 all=1995 rw=23472 single=33246 scalable_nopipe=46944 scalable_pipe=25819 scalable=42197

*** 2016-03-10 18:39:04.898
Adaptive scalable LGWR disabling workers

*** 2016-03-10 18:46:17.654
Warning: log write elapsed time 513ms, size 7283KB

*** 2016-03-10 18:46:49.177
kcrfw_slave_adaptive_updatemode: single->scalable redorate=72735660 switch=35732461

*** 2016-03-10 18:46:49.178
Adaptive scalable LGWR enabling workers
```
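To see how often adaptive mode flips over a longer period, the mode-switch lines can be pulled out of the LGWR trace with a small script. This is a minimal sketch that assumes the trace layout shown above (a `*** timestamp` line followed by a `kcrfw_slave_adaptive_updatemode:` line); the meaning of the numeric fields is not documented here, so the parser simply keeps them as integers.

```python
import re

# Patterns based on the trace excerpt above; other versions may format lines differently.
TS_RE = re.compile(r"^\*\*\* (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)$")
SWITCH_RE = re.compile(r"kcrfw_slave_adaptive_updatemode: (\w+)->(\w+)\s*(.*)")

def parse_mode_switches(lines):
    """Return (timestamp, from_mode, to_mode, metrics) for each mode switch."""
    switches = []
    last_ts = None
    for raw in lines:
        line = raw.strip()
        m = TS_RE.match(line)
        if m:
            last_ts = m.group(1)  # remember the most recent trace timestamp
            continue
        m = SWITCH_RE.search(line)
        if m:
            # Fields such as "single=17982" become {"single": 17982}.
            pairs = (kv.split("=", 1) for kv in m.group(3).split())
            metrics = {k: int(v) for k, v in pairs}
            switches.append((last_ts, m.group(1), m.group(2), metrics))
    return switches

trace_text = """*** 2016-03-10 18:36:01.973
kcrfw_slave_adaptive_updatemode: scalable->single group0=612 all=711 rw=21594 single=17982 scalable_nopipe=43188 scalable_pipe=23753 scalable=40044
*** 2016-03-10 18:37:03.821
kcrfw_slave_adaptive_updatemode: single->scalable redorate=16125727 switch=6310908"""

for ts, src, dst, metrics in parse_mode_switches(trace_text.splitlines()):
    print(ts, src, "->", dst, metrics)
```

In practice you would pass the lines of the real LGWR trace file instead of the inline sample.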
2. The maximum number of LGnn processes is determined by _max_outstanding_log_writes:
```
SQL> select KSPPINM,KSPPDESC,KSPPSTVL from x$ksppi a,x$ksppsv b where a.indx=b.indx and a.KSPPINM like '%outstanding_log%';

KSPPINM                                  KSPPDESC                                                                         KSPPSTVL
---------------------------------------- -------------------------------------------------------------------------------- -----------
_max_outstanding_log_writes              Maximum number of outstanding redo log writes                                    2

SQL>
```
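The parameter description speaks of "outstanding redo log writes", that is, a cap on writes in flight at the same time (the value 2 here matches "Created 2 redo writer workers" in the trace earlier). The sketch below only illustrates what such a cap means, using a bounded semaphore; the function and its arguments are hypothetical and say nothing about how Oracle enforces the limit internally.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def max_in_flight(max_outstanding, num_writes):
    """Cap concurrent 'writes' at max_outstanding and return the peak observed."""
    gate = threading.BoundedSemaphore(max_outstanding)
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def write(_):
        nonlocal in_flight, peak
        with gate:  # blocks while max_outstanding writes are already active
            with lock:
                in_flight += 1
                peak = max(peak, in_flight)
            time.sleep(0.01)  # stand-in for the actual disk write
            with lock:
                in_flight -= 1

    # More submitting threads than the cap, so the semaphore does the limiting.
    with ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(write, range(num_writes)))
    return peak

print(max_in_flight(2, 10))  # never more than 2
```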
3. Data Guard in SYNC mode cannot take advantage of multiple LGWR (scalable LGWR):
LGnn (Log Writer Worker) On multiprocessor systems, LGWR creates worker processes to improve the performance of writing to the redo log.
LGWR workers are not used when there is a SYNC standby destination. Possible processes include LG00-LG99.
Reference: New Background Processes In 12c (Doc ID 1625912.1)
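Since the note says LGWR workers are not used when a SYNC standby destination exists, a quick check is whether any LOG_ARCHIVE_DEST_n setting carries the SYNC attribute. The helper below is a minimal sketch that assumes you have already fetched the destination strings (for example the values of the LOG_ARCHIVE_DEST_n parameters) into a list; it only looks for an explicit SYNC token and does not check whether the destination is enabled or deferred.

```python
def has_sync_destination(dest_settings):
    """Return True if any LOG_ARCHIVE_DEST_n value explicitly contains SYNC.

    dest_settings: iterable of attribute strings, assumed already fetched
    from the database. ASYNC is a different token and does not match.
    """
    for setting in dest_settings:
        tokens = setting.upper().split()
        if "SYNC" in tokens:
            return True
    return False

dests = [
    "SERVICE=stby ASYNC DB_UNIQUE_NAME=stby",
    "SERVICE=stby2 SYNC AFFIRM DB_UNIQUE_NAME=stby2",
]
print(has_sync_destination(dests))       # True
print(has_sync_destination(dests[:1]))   # False
```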
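4. The workers can suffer scheduling delay. In a single instance, LGWR appears in neither the high-priority nor the highest-priority process list (the hidden parameters _high_priority_processes and _highest_priority_processes); in RAC, LGWR is in the high-priority list but the LG* workers are not. LGWR may therefore run at elevated priority while its workers do not, so you may need to give the LGWR worker processes the same priority as LGWR.
Reference: Bug 20055279: RAC PERF: LGWR CHOOSES TO USE SCALABLE LGWR BUT SINGLE LGWR PERFORMS BETTER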
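The gap described above is easy to see if you treat a priority list as a set of '|'-separated glob patterns, which is the format these process lists appear to use (for example values containing entries like LMS*). The pattern strings below are hypothetical, chosen only to illustrate the point; they are not Oracle's defaults for any version.

```python
from fnmatch import fnmatchcase

def matches_priority_list(process_name, pattern_list):
    """True if process_name matches any '|'-separated glob in pattern_list."""
    return any(fnmatchcase(process_name, p) for p in pattern_list.split("|") if p)

# Hypothetical lists for illustration only (not Oracle defaults).
lgwr_only = "LMS*|LGWR"
with_workers = "LMS*|LGWR|LG*"

for name in ("LGWR", "LG00", "LG01"):
    print(name, matches_priority_list(name, lgwr_only),
          matches_priority_list(name, with_workers))
```

With only LGWR listed, the workers LG00 and LG01 fall outside the list; adding LG* covers them, though it would also match any other process whose name starts with LG.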
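5. Multiple LGWR suits systems with many CPUs, especially those with more than 64. My guess is that, on the one hand, LGWR acts as the coordinator in scalable mode and coordination itself costs CPU; on the other hand, the workers also need CPU to synchronize with each other so that write ordering is preserved. Bug 20055279 describes the problem on smaller systems: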
LGWR workers are used when using a single LGWR would perform better. This applies to small systems (<= 64 cpus).
Reference: Bug 20055279: RAC PERF: LGWR CHOOSES TO USE SCALABLE LGWR BUT SINGLE LGWR PERFORMS BETTER
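6. The LGnn processes synchronize with one another; I believe this is how consistency is guaranteed when redo is written. Sometimes, however, unnecessary synchronization between LGnn processes degrades performance.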
This means there will be unneeded LGWR slaves in each group and we will incur intra group synchronization costs for these.
Reference: Bug 18683889: SIGNIFICANT WAIT ON "LGWR INTRA GROUP SYNC" WITH SCALABLE LGWR
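The bug text says unneeded workers in a group still incur intra-group synchronization costs. A toy model (an assumption, not Oracle's mechanism) makes the point: if every worker in a group must reach a shared barrier before the group's write counts as done, workers that were given no redo still take part in the synchronization.

```python
import threading

def run_group(group_size, chunks):
    """Toy model (not Oracle code): every worker in the group must reach a
    barrier, even the ones that were handed no redo to write.
    Returns (workers that wrote, workers that synchronized)."""
    barrier = threading.Barrier(group_size)
    written = []
    lock = threading.Lock()

    def worker(i):
        if i < chunks:  # only the first `chunks` workers have work
            with lock:
                written.append(i)
        barrier.wait()  # idle workers still pay the synchronization cost

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(group_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(written), barrier.parties

print(run_group(4, 1))  # (1, 4): one worker wrote, four synchronized
```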
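7. On AIX and HP-UX Itanium, enabling scalable LGWR can even prevent the database from opening, so apply patch 21915719 in advance. (Note: once patch 21915719 is applied, _use_single_log_writer=true is set automatically.)
Reference: ALERT: Bug 21915719 Database hang or may fail to OPEN in 12c IBM AIX or HPUX Itanium – ORA-742, DEADLOCK or ORA-600 [kcrfrgv_nextlwn_scn] ORA-600 [krr_process_read_error_2] (Doc ID 1957710.1)
Note that this document has been reclassified as an ALERT, which indicates that a fair number of customers have hit the problem. ALERT documents are generally early-warning notes worth paying attention to.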