RMAN Backup Hits ORA-235

This is the first time I have written a blog post in English.

Today I got a ticket from the EBR team (the third-party backup team) saying that the backup job failed with ORA-235:

……
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup command at 10/12/2011 02:38:13
ORA-00235: controlfile fixed table inconsistent due to concurrent update
RMAN-06031: could not translate database keyword
Recovery Manager complete.

So I went to the Veritas NetBackup log directory to check the backup logs:

au11qap830tels2:SANL01P1:/usr/openv/netbackup/logs/user_ops/dbext/oracle>ls -lrt
total 1000
-rw-rw-rw-   1 root     root        3302 Oct  9 23:13 progress.1318162395.13914.log.Z
-rw-rw-rw-   1 root     root      120859 Oct 10 11:48 progress.1318165604.231.log
-rw-rw-rw-   1 root     root      107600 Oct 11 06:49 progress.1318248053.7838.log
-rw-rw-rw-   1 root     root      102098 Oct 11 23:10 progress.1318334454.10590.log
-rw-rw-rw-   1 root     root        8139 Oct 12 02:38 progress.1318347478.12109.log
-rw-rw-rw-   1 root     root      121113 Oct 12 16:57 progress.1318362511.1274.log

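When the directory holds many progress logs, the ones from a given day can also be picked out with a short script instead of reading `ls -lrt` by eye. This is a minimal sketch, assuming only that the logs match `progress.*.log` as in the listing above; `logs_modified_on` is a hypothetical helper name, and it compares modification dates in the server's local time zone, as `ls -l` does.

```python
from datetime import date, datetime
from pathlib import Path


def logs_modified_on(log_dir, day):
    """Return progress.*.log names in log_dir whose mtime falls on `day`, oldest first.

    Hypothetical helper: it only looks at file modification times (local time).
    """
    matches = []
    for path in Path(log_dir).glob("progress.*.log"):  # skips compressed *.log.Z files
        mtime = datetime.fromtimestamp(path.stat().st_mtime)  # local time, like ls -l
        if mtime.date() == day:
            matches.append((mtime, path.name))
    matches.sort()  # oldest modification first, the same order as ls -lrt
    return [name for _, name in matches]
```

For example, `logs_modified_on("/usr/openv/netbackup/logs/user_ops/dbext/oracle", date(2011, 10, 12))` should return the two Oct 12 logs from the listing above, oldest first. The compressed `.log.Z` file is skipped because the pattern requires the name to end in `.log`.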
We can see there are two backup log files from today (2011-10-12): one from the failed backup and one from a successful backup:

BACKUP FAIL LOG:

au11qap830tels2:SANL01P1:/usr/openv/netbackup/logs/user_ops/dbext/oracle>tail -20 progress.1318347478.12109.log
INF - released channel: ch06
INF - released channel: ch07
INF - released channel: ch08
INF - released channel: ch09
INF - released channel: ch10
INF - released channel: ch11
INF - released channel: ch12
INF - released channel: ch13
INF - released channel: ch14
INF - released channel: ch15
INF - RMAN-00571: ===========================================================
INF - RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
INF - RMAN-00571: ===========================================================
INF - RMAN-03002: failure of backup command at 10/12/2011 02:38:13
INF - ORA-00235: controlfile fixed table inconsistent due to concurrent update
INF - RMAN-06031: could not translate database keyword
INF - Recovery Manager complete.
INF - logout
INF - End of Recovery Manager output.
INF - End Oracle Recovery Manager.
au11qap830tels2:SANL01P1:/usr/openv/netbackup/logs/user_ops/dbext/oracle>

BACKUP SUCCESS LOG:

au11qap830tels2:SANL01P1:/usr/openv/netbackup/logs/user_ops/dbext/oracle>tail -20 progress.1318362511.1274.log
INF - released channel: ch09
INF - released channel: ch10
INF - released channel: ch11
INF - released channel: ch12
INF - released channel: ch13
INF - released channel: ch14
INF - released channel: ch15
INF - allocated channel: ch00
INF - channel ch00: starting full datafile backupset
INF - including current controlfile in backupset
INF - piece handle=ctrl_uapmou8hk_s108889_p1_t764355124 comment=API Version 2.0,MMS Version 5.0.0.0
INF - channel ch00: backup set complete, elapsed time: 00:03:06
INF - Starting Control File and SPFILE Autobackup at 12-OCT-11
INF - piece handle=c-3411474590-20111012-12 comment=API Version 2.0,MMS Version 5.0.0.0
INF - Finished Control File and SPFILE Autobackup at 12-OCT-11
INF - released channel: ch00
INF - Recovery Manager complete.
INF - logout
INF - End of Recovery Manager output.
INF - End Oracle Recovery Manager.
au11qap830tels2:SANL01P1:/usr/openv/netbackup/logs/user_ops/dbext/oracle>

The backup failed with ORA-00235 at 02:38 am, while re-running the backup job at a later time succeeded.
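Both logs end with `Recovery Manager complete.`, so that line alone does not tell a failed run from a good one; the `ERROR MESSAGE STACK FOLLOWS` banner does. A minimal sketch of that check, assuming the NetBackup progress-log format shown above; `summarize_rman_log` is a hypothetical helper, not part of NetBackup or RMAN.

```python
import re

# RMAN prints this banner right before its error stack (see the failed log above).
ERROR_BANNER = "ERROR MESSAGE STACK FOLLOWS"
# Matches message codes such as RMAN-03002 or ORA-00235 anywhere in a line.
CODE_RE = re.compile(r"\b(?:RMAN|ORA)-\d{5}\b")
# Codes that only form the banner's separator lines, not real errors.
BANNER_CODES = {"RMAN-00571", "RMAN-00569"}


def summarize_rman_log(lines):
    """Return (failed, codes): whether an RMAN error stack was found and its codes in order."""
    failed = False
    codes = []
    for line in lines:
        if ERROR_BANNER in line:
            failed = True
            continue
        if failed:
            for code in CODE_RE.findall(line):
                if code not in BANNER_CODES and code not in codes:
                    codes.append(code)
    return failed, codes
```

An open file can be passed directly, e.g. inside `with open("progress.1318347478.12109.log") as log:`; for the failed tail above the stack codes are RMAN-03002, ORA-00235 and RMAN-06031, while the successful tail has no error stack.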

As the message says, the error occurs because a control file fixed table became inconsistent due to a concurrent update.

When we run an RMAN backup without a recovery catalog, using only the control file to store backup metadata, RMAN reads information such as the SCN from the control file.

When the database has a high rate of change, it fills the online redo logs quickly and triggers frequent log switches, and each log switch triggers a checkpoint.

The checkpoint writes the newest SCN to the control file.

So the SCN no longer matches what RMAN read the first time, and ORA-00235 is raised.
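The failure mode above is a reader that needs a consistent view of data that a writer keeps changing. The toy model below illustrates only that general pattern and is not Oracle's control file implementation: the reader records a version number, copies the records, and accepts the copy only if the version is unchanged; after a fixed number of failed attempts it gives up, which plays the role of ORA-00235 here. `ToyControlFile`, `consistent_read`, `InconsistentRead` and the limit of 3 attempts are all hypothetical.

```python
# Toy model only (hypothetical names); not how Oracle implements control file reads.


class InconsistentRead(Exception):
    """Raised when every read attempt overlapped an update (stands in for ORA-00235)."""


class ToyControlFile:
    """Hypothetical stand-in for a control file: SCN records plus a version counter."""

    def __init__(self):
        self.version = 0
        self.records = []

    def log_switch(self, scn):
        # Each log switch/checkpoint records a new SCN and bumps the version.
        self.records.append(scn)
        self.version += 1


def consistent_read(cf, during_read=None, max_attempts=3):
    """Copy cf.records, accepting the copy only if cf.version did not change meanwhile."""
    for attempt in range(max_attempts):
        before = cf.version
        snapshot = list(cf.records)
        if during_read is not None:
            during_read(attempt)  # simulate a writer that runs while we are reading
        if cf.version == before:
            return snapshot
    raise InconsistentRead(f"records changed during all {max_attempts} attempts")


if __name__ == "__main__":
    cf = ToyControlFile()
    cf.log_switch(100)

    # Quiet period: nothing writes during the read, so the first attempt succeeds.
    print(consistent_read(cf))  # prints [100]

    # Busy period: a log switch lands inside every attempt, so the read gives up.
    try:
        consistent_read(cf, during_read=lambda attempt: cf.log_switch(200 + attempt))
    except InconsistentRead as exc:
        print("read failed:", exc)
```

The demo mirrors the situation in this post: during a quiet period the first read succeeds, while a writer that lands inside every read attempt makes the read fail however often it retries.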

From the NetBackup log, we see the error happened at 10/12/2011 02:38:13.

From the log history, we can also see several log switches shortly before 02:38:13:

TO_CHAR(FIRST_TIME,  SEQUENCE#
------------------- ----------
2011-10-12 00:30:18      59728
2011-10-12 00:31:47      59729
2011-10-12 01:28:08      59730
2011-10-12 01:30:30      59731
2011-10-12 02:29:23      59732
2011-10-12 02:34:07      59733
2011-10-12 02:34:45      59734
2011-10-12 03:38:52      59735
2011-10-12 03:40:28      59736
2011-10-12 04:43:04      59737
2011-10-12 04:44:56      59738
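The timing can also be checked with a short script. This sketch hard-codes the rows from the output above rather than querying the database, and the 10-minute window before the failure is an arbitrary choice; `switches_before` is a hypothetical helper.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# (FIRST_TIME, SEQUENCE#) rows copied from the log history output above.
HISTORY = [
    ("2011-10-12 00:30:18", 59728),
    ("2011-10-12 00:31:47", 59729),
    ("2011-10-12 01:28:08", 59730),
    ("2011-10-12 01:30:30", 59731),
    ("2011-10-12 02:29:23", 59732),
    ("2011-10-12 02:34:07", 59733),
    ("2011-10-12 02:34:45", 59734),
    ("2011-10-12 03:38:52", 59735),
    ("2011-10-12 03:40:28", 59736),
    ("2011-10-12 04:43:04", 59737),
    ("2011-10-12 04:44:56", 59738),
]


def switches_before(history, failure, window):
    """Return (sequence, gap since previous switch) for switches in [failure - window, failure]."""
    rows = [(datetime.strptime(t, FMT), seq) for t, seq in history]
    result = []
    # Pair each switch with the one before it to measure the gap.
    for (prev_time, _), (time, seq) in zip(rows, rows[1:]):
        if failure - window <= time <= failure:
            result.append((seq, time - prev_time))
    return result


if __name__ == "__main__":
    failure = datetime.strptime("2011-10-12 02:38:13", FMT)
    for seq, gap in switches_before(HISTORY, failure, timedelta(minutes=10)):
        print(f"sequence {seq}: {gap} after the previous switch")
```

On these rows it lists sequences 59732, 59733 and 59734, with the last two switches only 38 seconds apart, a few minutes before the 02:38:13 failure.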

This gives us the root cause and the solution:

++++++++++++++++
+CAUSE:
++++++++++++++++
As each redo log is archived, the control file is updated with the latest SCN of the redo log switch. If this happens very frequently, the control file is never released and made available to RMAN for the resync.

+++++++++++++++
+SOLUTION
+++++++++++++++
(1) Back up the database at a time when the control file is not being updated frequently.

(2) Reduce the frequency of checkpoints:
(2.1) Increase the size of the redo log files. Since the redo log files are already 4 GB, this option is not recommended here.
(2.2) Increase fast_start_mttr_target from 300 to 600 (seconds).

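For solution (1), a quieter backup window can be estimated by counting log switches per hour of day and preferring the hours with the fewest. A minimal sketch with hypothetical helper names; hours that never appear in the input count as zero, so it should be fed FIRST_TIME values covering several full days, not just the few hours listed in this post.

```python
from collections import Counter
from datetime import datetime


def switches_per_hour(first_times, fmt="%Y-%m-%d %H:%M:%S"):
    """Count log switches by hour of day (index 0-23) from FIRST_TIME strings."""
    counts = Counter(datetime.strptime(t, fmt).hour for t in first_times)
    return [counts.get(hour, 0) for hour in range(24)]


def quietest_hours(first_times, candidate_hours=range(24), how_many=3):
    """Return up to `how_many` candidate hours with the fewest switches (earliest first on ties)."""
    per_hour = switches_per_hour(first_times)
    return sorted(candidate_hours, key=lambda h: (per_hour[h], h))[:how_many]
```

On the eleven rows above, hours 0 to 4 have 2, 2, 3, 2 and 2 switches, so the 02:00 hour in which the backup failed is the busiest of that small sample.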
2 Comments

hctech says:

2011-10-19 15:37

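Hi, guru. I have a few questions after reading this article.
I didn't quite understand what triggers error 235 (maybe my English isn't good enough). My rough understanding is that heavy read/write activity during the backup causes the control file to be updated.
My question is: when we run backups, especially of a large database that may take one to two days to finish, checkpoints and log switches are bound to happen along the way, especially during daytime business peaks, yet we have never run into this error. So what specific situation actually triggers it?
Thank you!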

小荷 says:

2011-10-25 20:50

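Re hctech: ORA-235 is raised because when a redo log is switched out to an archive log, the control file is updated with the latest SCN. After the update, the control file syncs with RMAN and passes the latest SCN information to RMAN (if your RMAN runs with nocatalog). If log switches happen very frequently (note: very frequently), so that the SCN is constantly being updated, there is no chance to sync with RMAN, and therefore ORA-235 is raised.
If your system also has log switches but they are not as frequent as I described, they will not cause ORA-235.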
