This is my first blog post written in English.
Today I got a ticket from the EBR team (the third-party backup team) saying that a backup job had failed with ORA-00235:
    ……
    RMAN-00571: ===========================================================
    RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
    RMAN-00571: ===========================================================
    RMAN-03002: failure of backup command at 10/12/2011 02:38:13
    ORA-00235: controlfile fixed table inconsistent due to concurrent update
    RMAN-06031: could not translate database keyword
    Recovery Manager complete.
So I went to the Veritas NetBackup log directory to check the backup logs:
    au11qap830tels2:SANL01P1:/usr/openv/netbackup/logs/user_ops/dbext/oracle>ls -lrt
    total 1000
    -rw-rw-rw-   1 root     root        3302 Oct  9 23:13 progress.1318162395.13914.log.Z
    -rw-rw-rw-   1 root     root      120859 Oct 10 11:48 progress.1318165604.231.log
    -rw-rw-rw-   1 root     root      107600 Oct 11 06:49 progress.1318248053.7838.log
    -rw-rw-rw-   1 root     root      102098 Oct 11 23:10 progress.1318334454.10590.log
    -rw-rw-rw-   1 root     root        8139 Oct 12 02:38 progress.1318347478.12109.log
    -rw-rw-rw-   1 root     root      121113 Oct 12 16:57 progress.1318362511.1274.log
There are two backup logs for today (2011-10-12): one from a failed backup and one from a successful backup.
FAILED BACKUP LOG:
    au11qap830tels2:SANL01P1:/usr/openv/netbackup/logs/user_ops/dbext/oracle>tail -20 progress.1318347478.12109.log
    INF - released channel: ch06
    INF - released channel: ch07
    INF - released channel: ch08
    INF - released channel: ch09
    INF - released channel: ch10
    INF - released channel: ch11
    INF - released channel: ch12
    INF - released channel: ch13
    INF - released channel: ch14
    INF - released channel: ch15
    INF - RMAN-00571: ===========================================================
    INF - RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
    INF - RMAN-00571: ===========================================================
    INF - RMAN-03002: failure of backup command at 10/12/2011 02:38:13
    INF - ORA-00235: controlfile fixed table inconsistent due to concurrent update
    INF - RMAN-06031: could not translate database keyword
    INF - Recovery Manager complete.
    INF - logout
    INF - End of Recovery Manager output.
    INF - End Oracle Recovery Manager.
    au11qap830tels2:SANL01P1:/usr/openv/netbackup/logs/user_ops/dbext/oracle>
SUCCESSFUL BACKUP LOG:
    au11qap830tels2:SANL01P1:/usr/openv/netbackup/logs/user_ops/dbext/oracle>tail -20 progress.1318362511.1274.log
    INF - released channel: ch09
    INF - released channel: ch10
    INF - released channel: ch11
    INF - released channel: ch12
    INF - released channel: ch13
    INF - released channel: ch14
    INF - released channel: ch15
    INF - allocated channel: ch00
    INF - channel ch00: starting full datafile backupset
    INF - including current controlfile in backupset
    INF - piece handle=ctrl_uapmou8hk_s108889_p1_t764355124 comment=API Version 2.0,MMS Version 5.0.0.0
    INF - channel ch00: backup set complete, elapsed time: 00:03:06
    INF - Starting Control File and SPFILE Autobackup at 12-OCT-11
    INF - piece handle=c-3411474590-20111012-12 comment=API Version 2.0,MMS Version 5.0.0.0
    INF - Finished Control File and SPFILE Autobackup at 12-OCT-11
    INF - released channel: ch00
    INF - Recovery Manager complete.
    INF - logout
    INF - End of Recovery Manager output.
    INF - End Oracle Recovery Manager.
    au11qap830tels2:SANL01P1:/usr/openv/netbackup/logs/user_ops/dbext/oracle>
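Instead of tailing each progress log by hand, the failed run can be picked out by searching for the RMAN error banner. This is a minimal sketch, assuming uncompressed logs that follow the progress.*.log naming shown in the directory listing; the function name `find_failed_backups` is made up for this example.

```shell
#!/bin/sh
# List the progress logs that contain an RMAN error stack, followed by
# the ORA- messages found in each of them.
# Assumption: logs are uncompressed and named progress.*.log
# (compressed progress.*.log.Z files do not match the pattern and are skipped).
find_failed_backups() {
    log_dir=${1:-.}
    for f in "$log_dir"/progress.*.log; do
        [ -f "$f" ] || continue
        # RMAN-00569 is the "ERROR MESSAGE STACK FOLLOWS" banner line.
        if grep -q 'RMAN-00569' "$f"; then
            echo "FAILED: $(basename "$f")"
            # Print each ORA- message, indented, without the "INF - " prefix.
            sed -n 's/.*\(ORA-[0-9][0-9]*: .*\)/    \1/p' "$f"
        fi
    done
}
```

It could be called as `find_failed_backups /usr/openv/netbackup/logs/user_ops/dbext/oracle`.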
The backup failed with ORA-00235 at 02:38, and the job succeeded when it was re-run later the same day.
The error message itself states the cause: the controlfile fixed table was inconsistent due to a concurrent update.
When RMAN backs up without a recovery catalog, the control file is the only repository of backup information, so RMAN reads it to get information such as the SCN.
When the database has a high rate of change, redo log switches happen frequently, and each log switch triggers a checkpoint.
The checkpoint writes the latest SCN to the control file.
If that update lands while RMAN is reading, what RMAN read at first no longer matches the control file, and ORA-00235 is raised.
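Because the conflict is a matter of timing, running the backup again later can succeed, as it did here. The following is only a sketch of an automatic retry, assuming the backup is started by a command that exits non-zero on failure and prints the RMAN output. `retry_on_ora235` is a hypothetical helper, not part of NetBackup or RMAN.

```shell
#!/bin/sh
# Re-run a command when its output contains ORA-00235.
# Assumptions: the command exits non-zero on failure and writes the RMAN
# messages to stdout or stderr. Any other failure is not retried.
retry_on_ora235() {
    max_attempts=$1   # total attempts, including the first one
    delay=$2          # seconds to wait between attempts
    shift 2
    attempt=1
    while :; do
        log=$(mktemp) || return 1
        if "$@" >"$log" 2>&1; then status=0; else status=$?; fi
        cat "$log"
        if [ "$status" -eq 0 ]; then
            rm -f "$log"
            return 0
        fi
        # Give up on any other error, or once the attempts are used up.
        if ! grep -q 'ORA-00235' "$log" || [ "$attempt" -ge "$max_attempts" ]; then
            rm -f "$log"
            return "$status"
        fi
        rm -f "$log"
        echo "attempt $attempt failed with ORA-00235; retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}
```

For example, `retry_on_ora235 3 600 ./rman_backup.sh` would try up to three times with ten minutes between attempts, where `./rman_backup.sh` stands in for whatever script starts the backup.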
The NetBackup log shows that the error occurred at 10/12/2011 02:38:13.
The log history also shows several log switches shortly before 02:38:13:
    TO_CHAR(FIRST_TIME,  SEQUENCE#
    ------------------- ----------
    2011-10-12 00:30:18      59728
    2011-10-12 00:31:47      59729
    2011-10-12 01:28:08      59730
    2011-10-12 01:30:30      59731
    2011-10-12 02:29:23      59732
    2011-10-12 02:34:07      59733
    2011-10-12 02:34:45      59734
    2011-10-12 03:38:52      59735
    2011-10-12 03:40:28      59736
    2011-10-12 04:43:04      59737
    2011-10-12 04:44:56      59738
    ====================================
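To put a number on "some log switches before the failure", the listing can be fed through a small filter. This is a sketch, assuming the listing has been saved to a text file and that every row of interest falls on the same calendar day as the failure; `count_switches_before` is a name chosen for this example.

```shell
#!/bin/sh
# Count the log switches that happened in the N seconds before a given time.
# Input (stdin): rows of "YYYY-MM-DD HH:MM:SS SEQUENCE#", as in the listing.
# Assumption: the rows to count are on the same calendar day as the failure;
# header and separator lines are ignored because their first field is not the date.
count_switches_before() {
    fail_date=$1      # e.g. 2011-10-12
    fail_time=$2      # e.g. 02:38:13
    window=${3:-600}  # look-back window in seconds
    awk -v d="$fail_date" -v t="$fail_time" -v w="$window" '
        # Convert HH:MM:SS to seconds since midnight.
        function secs(hms,   p) { split(hms, p, ":"); return p[1]*3600 + p[2]*60 + p[3] }
        BEGIN { limit = secs(t) }
        $1 == d {
            s = secs($2)
            if (s <= limit && limit - s <= w) n++
        }
        END { print n + 0 }
    '
}
```

With the rows above, `count_switches_before 2011-10-12 02:38:13 600 < log_history.txt` counts the three switches at 02:29:23, 02:34:07 and 02:34:45, where `log_history.txt` is a hypothetical file holding the listing.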
So here are the root cause and the solution:
    ++++++++++++++++
    +CAUSE:
    ++++++++++++++++
    As each redo log is archived, the control file is updated with the
    latest SCN of the redo log switch. If this happens very frequently, the
    control file is never released and made available to RMAN for the resync.
    +++++++++++++++
    +SOLUTION
    +++++++++++++++
    (1) Back up the database at a time when the control file is not being updated frequently.
    (2) Reduce the frequency of checkpoints.
    (2.1) Increase the size of the redo log files. Since they are already 4 GB each, this option is not recommended here.
    (2.2) Increase the value of fast_start_mttr_target from 300 to 600.
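Option (2.2) can be applied from the shell along these lines. This is a sketch, assuming `sqlplus` is on the PATH, ORACLE_SID points at the right instance, and OS authentication as SYSDBA is allowed. `SCOPE=BOTH` further assumes the instance runs from an spfile, and `set_mttr_target` is a name made up for this example.

```shell
#!/bin/sh
# Show fast_start_mttr_target, change it, and show it again.
# Assumptions: sqlplus is on PATH, ORACLE_SID is set, OS authentication
# as SYSDBA works, and the instance uses an spfile (needed for SCOPE=BOTH).
set_mttr_target() {
    target=${1:-600}
    sqlplus -s "/ as sysdba" <<EOF
WHENEVER SQLERROR EXIT FAILURE
SHOW PARAMETER fast_start_mttr_target
ALTER SYSTEM SET fast_start_mttr_target = ${target} SCOPE=BOTH;
SHOW PARAMETER fast_start_mttr_target
EOF
}
```

Running `set_mttr_target 600` prints the parameter before and after the change, so the old value is on record if it has to be put back.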
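2 comments
Hello, I have a few questions about this article.
I could not work out what actually triggers the 235 error (perhaps my English is not good enough). My rough understanding is that heavy read/write activity during the backup causes the control file to be updated.
My question is this: when we run a backup, especially of a large database that may take 1-2 days to finish, checkpoints and log switches are bound to happen along the way, particularly during the daytime business peak, yet I have never run into this error. So what are the special circumstances that trigger it?
Thank you!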
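re hctech: ORA-235 is raised because, when a redo log is switched out and archived, the control file is updated with the latest SCN. After the update, the control file and RMAN synchronize so that RMAN learns the latest SCN (if your RMAN runs in nocatalog mode). If log switches are very frequent, and I do mean very frequent, so that the SCN is being updated all the time, there is never a chance to synchronize with RMAN, and ORA-235 is raised.
If your system also has log switches, but not as frequently as I describe, they will not cause ORA-235.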