不知道大家注意到没有rac中的redo log?
(1)如果关闭这个实例,它的redo log还会被写吗?
答案是会,就算实例关闭了,甚至server关闭了,在共享存储上的redo还是会被其他实例所写(redo必须在共享存储上,以便于恢复时使用)。但是这个由谁写(master node还是所有的node)?在什么时候写?写些什么内容?这个目前还是没想明白,如果您知道,不妨在blog留言。
例子:我有一个3节点的实例,目前node3的server已经关闭。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
Name Type R/RA F/FT Target State Host ---------------------------------------------------------------------- ora.devdb.db application 0/1 0/1 ONLINE ONLINE rac2 ora....b1.inst application 0/5 0/0 ONLINE ONLINE rac1 ora....b2.inst application 0/5 0/0 ONLINE ONLINE rac2 ora....b3.inst application 0/5 0/0 OFFLINE OFFLINE ora....SM1.asm application 0/5 0/0 ONLINE ONLINE rac1 ora....C1.lsnr application 0/5 0/0 ONLINE ONLINE rac1 ora.rac1.gsd application 0/5 0/0 ONLINE ONLINE rac1 ora.rac1.ons application 0/3 0/0 ONLINE ONLINE rac1 ora.rac1.vip application 0/0 0/0 ONLINE ONLINE rac1 ora....SM2.asm application 0/5 0/0 ONLINE ONLINE rac2 ora....C2.lsnr application 0/5 0/0 ONLINE ONLINE rac2 ora.rac2.gsd application 0/5 0/0 ONLINE ONLINE rac2 ora.rac2.ons application 0/3 0/0 ONLINE ONLINE rac2 ora.rac2.vip application 0/0 0/0 ONLINE ONLINE rac2 ora....SM3.asm application 0/5 0/0 OFFLINE OFFLINE ora....C3.lsnr application 0/5 0/0 OFFLINE OFFLINE ora.rac3.gsd application 0/5 0/0 OFFLINE OFFLINE ora.rac3.ons application 0/3 0/0 OFFLINE OFFLINE ora.rac3.vip application 0/0 0/0 ONLINE ONLINE rac1 |
switch logfile之前:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
SQL> select * from gv$log order by 1,2,3,4 2 / INST_ID GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIME ---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ------------------- 1 1 1 30 52428800 2 YES INACTIVE 1715249 2011-12-07 15:14:32 1 2 1 31 52428800 2 NO CURRENT 1715265 2011-12-07 15:14:33 1 3 2 23 52428800 2 YES ACTIVE 1689319 2011-12-07 14:45:35 1 4 2 24 52428800 2 NO CURRENT 1715268 2011-12-07 15:14:36 1 5 3 15 52428800 2 NO CURRENT 1715265 2011-12-07 15:14:34 1 6 3 14 52428800 2 YES INACTIVE 1669323 2011-12-06 16:33:28 2 1 1 30 52428800 2 YES INACTIVE 1715249 2011-12-07 15:14:32 2 2 1 31 52428800 2 NO CURRENT 1715265 2011-12-07 15:14:33 2 3 2 23 52428800 2 YES ACTIVE 1689319 2011-12-07 14:45:35 2 4 2 24 52428800 2 NO CURRENT 1715268 2011-12-07 15:14:36 2 5 3 15 52428800 2 NO CURRENT 1715265 2011-12-07 15:14:34 2 6 3 14 52428800 2 YES INACTIVE 1669323 2011-12-06 16:33:28 12 rows selected. SQL> |
我们在node1上运行switch logfile之后的结果:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
SQL> alter system switch logfile; System altered. SQL> select * from gv$log order by 1,2,3,4 2 / INST_ID GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIME ---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ------------------- 1 1 1 32 52428800 2 NO CURRENT 1715319 2011-12-07 15:15:07 1 2 1 31 52428800 2 YES ACTIVE 1715265 2011-12-07 15:14:33 1 3 2 23 52428800 2 YES ACTIVE 1689319 2011-12-07 14:45:35 1 4 2 24 52428800 2 NO CURRENT 1715268 2011-12-07 15:14:36 1 5 3 15 52428800 2 NO CURRENT 1715265 2011-12-07 15:14:34 1 6 3 14 52428800 2 YES INACTIVE 1669323 2011-12-06 16:33:28 2 1 1 32 52428800 2 NO CURRENT 1715319 2011-12-07 15:15:07 2 2 1 31 52428800 2 YES ACTIVE 1715265 2011-12-07 15:14:33 2 3 2 23 52428800 2 YES ACTIVE 1689319 2011-12-07 14:45:35 2 4 2 24 52428800 2 NO CURRENT 1715268 2011-12-07 15:14:36 2 5 3 15 52428800 2 NO CURRENT 1715265 2011-12-07 15:14:34 2 6 3 14 52428800 2 YES INACTIVE 1669323 2011-12-06 16:33:28 12 rows selected. SQL> |
我们看到thread1上变了(从30,31变成31,32),但是其他的thread2,和已经被关闭server3的thread3都没变。
难道不会变么?
我们再来几次switch logfile:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 |
SQL> SQL> alter system switch logfile; System altered. SQL> alter system checkpoint; System altered. SQL> select * from gv$log order by 1,2,3,4; INST_ID GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIME ---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ------------------- 1 1 1 32 52428800 2 YES INACTIVE 1715319 2011-12-07 15:15:07 1 2 1 33 52428800 2 NO CURRENT 1715421 2011-12-07 15:16:10 1 3 2 23 52428800 2 YES INACTIVE 1689319 2011-12-07 14:45:35 1 4 2 24 52428800 2 NO CURRENT 1715268 2011-12-07 15:14:36 1 5 3 15 52428800 2 YES INACTIVE 1715265 2011-12-07 15:14:34 1 6 3 16 52428800 2 NO CURRENT 1715421 2011-12-07 15:16:10 2 1 1 32 52428800 2 YES INACTIVE 1715319 2011-12-07 15:15:07 2 2 1 33 52428800 2 NO CURRENT 1715421 2011-12-07 15:16:10 2 3 2 23 52428800 2 YES INACTIVE 1689319 2011-12-07 14:45:35 2 4 2 24 52428800 2 NO CURRENT 1715268 2011-12-07 15:14:36 2 5 3 15 52428800 2 YES INACTIVE 1715265 2011-12-07 15:14:34 2 6 3 16 52428800 2 NO CURRENT 1715421 2011-12-07 15:16:10 12 rows selected. SQL> |
看到thread2依然没变,但是thread1变成了32,33——这个在我们的预料之内,但是thread3变成了15,16。谁发起的?有意思吧,是谁动了我的redo。
(2)如果我新增加一个thread 4,会有人去写它吗?
答案也是会,但是需要enable thread。另外,每个instance中都有一个初始化参数thread=n,该参数规定了某个thread的redo只能有某个instance来写。因此,手工switch logfile的时候,只会影响到instance对应的thread,但是却无法影响到自动写的redo。
我们尝试增加redo thread:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 |
SQL> select inst_id,THREAD#,STATUS,GROUPS,INSTANCE from gv$thread; INST_ID THREAD# STATUS GROUPS INSTANCE ---------- ---------- ------ ---------- -------------------------------------------------------------------------------- 1 1 OPEN 2 devdb1 1 2 OPEN 2 devdb2 1 3 CLOSED 2 devdb3 2 1 OPEN 2 devdb1 2 2 OPEN 2 devdb2 2 3 CLOSED 2 devdb3 6 rows selected. SQL> --注意这里必须至少要加2个group,不然单个group无法切换。 SQL> alter database add logfile thread 4; Database altered. SQL> alter database add logfile thread 4; Database altered. SQL> select inst_id,THREAD#,STATUS,GROUPS,INSTANCE from gv$thread; INST_ID THREAD# STATUS GROUPS INSTANCE ---------- ---------- ------ ---------- -------------------------------------------------------------------------------- 1 1 OPEN 2 devdb1 1 2 OPEN 2 devdb2 1 3 CLOSED 2 devdb3 1 4 CLOSED 2 UNNAMED_INSTANCE_4 2 1 OPEN 2 devdb1 2 2 OPEN 2 devdb2 2 3 CLOSED 2 devdb3 2 4 CLOSED 2 UNNAMED_INSTANCE_4 8 rows selected. SQL> alter database enable public thread 4; Database altered. SQL> select inst_id,THREAD#,STATUS,GROUPS,INSTANCE from gv$thread; INST_ID THREAD# STATUS GROUPS INSTANCE ---------- ---------- ------ ---------- -------------------------------------------------------------------------------- 2 1 OPEN 2 devdb1 2 2 OPEN 2 devdb2 2 3 CLOSED 2 devdb3 2 4 CLOSED 2 UNNAMED_INSTANCE_4 1 1 OPEN 2 devdb1 1 2 OPEN 2 devdb2 1 3 CLOSED 2 devdb3 1 4 CLOSED 2 UNNAMED_INSTANCE_4 8 rows selected. SQL> create pfile='?/dbs/initdevdb1.ora.bak20111212' from spfile; File created. SQL> alter system set thread=4 scope=spfile sid='devdb1'; System altered. SQL> shutdown immediate; Database closed. Database dismounted. ORACLE instance shut down. SQL> startup ORACLE instance started. Total System Global Area 209715200 bytes Fixed Size 1218556 bytes Variable Size 121636868 bytes Database Buffers 83886080 bytes Redo Buffers 2973696 bytes Database mounted. Database opened. SQL> SQL> select inst_id,THREAD#,STATUS,GROUPS,INSTANCE from gv$thread; INST_ID THREAD# STATUS GROUPS INSTANCE ---------- ---------- ------ ---------- -------------------------------------------------------------------------------- 2 1 CLOSED 2 UNNAMED_INSTANCE_1 2 2 OPEN 2 devdb2 2 3 CLOSED 2 devdb3 2 4 OPEN 2 devdb1 1 1 CLOSED 2 UNNAMED_INSTANCE_1 1 2 OPEN 2 devdb2 1 3 CLOSED 2 devdb3 1 4 OPEN 2 devdb1 8 rows selected. SQL> SQL> select * from gv$log order by 1,3,2; INST_ID GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIM ---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- --------- 1 1 1 32 52428800 2 YES INACTIVE 1718779 07-DEC-11 1 2 1 33 52428800 2 NO CURRENT 1714384 07-DEC-11 1 3 2 23 52428800 2 YES INACTIVE 1714380 07-DEC-11 1 4 2 24 52428800 2 NO CURRENT 1718782 07-DEC-11 1 5 3 15 52428800 2 YES INACTIVE 1714384 07-DEC-11 1 6 3 16 52428800 2 NO CURRENT 1718779 07-DEC-11 1 7 4 3 104857600 2 NO CURRENT 1718779 07-DEC-11 1 8 4 2 104857600 2 YES INACTIVE 1718737 07-DEC-11 2 1 1 32 52428800 2 YES INACTIVE 1718779 07-DEC-11 2 2 1 33 52428800 2 NO CURRENT 1714384 07-DEC-11 2 3 2 23 52428800 2 YES INACTIVE 1714380 07-DEC-11 2 4 2 24 52428800 2 NO CURRENT 1718782 07-DEC-11 2 5 3 15 52428800 2 YES INACTIVE 1714384 07-DEC-11 2 6 3 16 52428800 2 NO CURRENT 1718779 07-DEC-11 2 7 4 3 104857600 2 NO CURRENT 1718779 07-DEC-11 2 8 4 2 104857600 2 YES INACTIVE 1718737 07-DEC-11 16 rows selected. SQL> SQL> SQL> SQL> alter system switch logfile; System altered. SQL> alter system switch logfile; System altered. SQL> alter system switch logfile; System altered. SQL> alter system checkpoint; System altered. SQL> select * from gv$log order by 1,3,2; INST_ID GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIM ---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- --------- 1 1 1 34 52428800 2 YES INACTIVE 1718779 07-DEC-11 1 2 1 33 52428800 2 NO CURRENT 1714384 07-DEC-11 1 3 2 23 52428800 2 NO CURRENT 1714380 07-DEC-11 1 4 2 24 52428800 2 YES INACTIVE 1718782 07-DEC-11 1 5 3 17 52428800 2 NO CURRENT 1714384 07-DEC-11 1 6 3 16 52428800 2 YES INACTIVE 1718779 07-DEC-11 1 7 4 5 104857600 2 YES INACTIVE 1718801 07-DEC-11 1 8 4 6 104857600 2 NO CURRENT 1718803 07-DEC-11 2 1 1 34 52428800 2 YES INACTIVE 1718779 07-DEC-11 2 2 1 33 52428800 2 NO CURRENT 1714384 07-DEC-11 2 3 2 23 52428800 2 NO CURRENT 1714380 07-DEC-11 2 4 2 24 52428800 2 YES INACTIVE 1718782 07-DEC-11 2 5 3 17 52428800 2 NO CURRENT 1714384 07-DEC-11 2 6 3 16 52428800 2 YES INACTIVE 1718779 07-DEC-11 2 7 4 5 104857600 2 YES INACTIVE 1718801 07-DEC-11 2 8 4 6 104857600 2 NO CURRENT 1718803 07-DEC-11 16 rows selected. SQL> |
同样的,虽然没有instance对应于thread1和thread3(因为node3已经宕机),但是还是会被写。
6条评论
在其它instance使用lsof查看下,,谁在打开此文件? 然后在针对性的做个strace跟踪下不就知道了.
re jametong: 谢谢提供思路!很好的主意,我试试。
re jametong: redo在asm上,lsof貌似没有版本看到进程。有什么好办法吗?
做个strace跟踪lgwr进程试试呢?
ASM 确实会比较麻烦.. 因为操作系统文件描述符的管理都被ASM本身给抽象化了.
可以考虑根据v$session_event抽取出有log file parallel write/log file single write等待事件操作的进程, 一般情况下为LGWR/CKPT 两个进程, (RAC需要不断推进每个库的SCN/checkpoint信息, 这样才能确保恢复以及SCN的整体一致性).
确定了有此等待事件的进程, 再在Oracle中对这几个进程打开10046的跟踪事件, 基本上就可以定位具体是谁动了你的Redo..
更进一步的话, 可以使用strace跟踪上一步得到的进程, 还可以知道它(们)都如何动了你的Redo(我个人估计checkpoint信息的可能性大).
如果oracle不是在尝试制造足够的混乱的话,redo内容应该只会由lgwr进程去写。用lsof看也只有lgwr进程会打开online redo。
你可以尝试enable private thread再测试一下是否还有相似的情况。