I was implementing a change recently that was simple in itself: multiplexing the control files across different mount points. The plan was to run `alter system set control_files=... scope=spfile`, shut down the RAC database, `mv` the control files to their new locations, and start the database back up so that the new paths recorded in the spfile would take effect. The trouble started right after shutting down the RAC: the database would not come back up.
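For context, the intended change amounts to something like the following sketch. This is my reconstruction, not the exact statement run during the change; the three paths are the control file locations that appear later in the alert log.

```shell
# Sketch of the intended control-file relocation (assumed form).
CF1=/ora/db001/data001/SIAP/control01SIAP.ctl
CF2=/ora/db001/redo002/SIAP/control02SIAP.ctl
CF3=/ora/db001/redo001/SIAP/control03SIAP.ctl
# Statement to run in sqlplus before shutting the RAC down:
STMT="alter system set control_files='$CF1','$CF2','$CF3' scope=spfile sid='*';"
echo "$STMT"
# After this: srvctl stop database -d SIAP, mv the control files to the
# new mount points, then srvctl start database -d SIAP.
```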
Let's start from the shutdown and walk through what happened.
oracle@vus029pa:SIAP1:/opt/app/oracle/admin $ srvctl stop database -d SIAP
While the command ran, we were tailing (`tail -f`) the alert logs on both nodes. Node 2 shut down quickly and cleanly, but node 1 lingered.
Node 2's alert log:
Sat Jun 02 14:04:45 GMT 2012
Shutting down instance (immediate)
Sat Jun 02 14:04:45 GMT 2012
Shutting down instance: further logons disabled
Sat Jun 02 14:04:49 GMT 2012
Stopping background process CJQ0
Sat Jun 02 14:04:49 GMT 2012
Stopping background process QMNC
Sat Jun 02 14:04:51 GMT 2012
Stopping background process MMNL
Sat Jun 02 14:04:52 GMT 2012
Stopping background process MMON
License high water mark = 1004
Sat Jun 02 14:04:53 GMT 2012
Job queue slave processes stopped
Sat Jun 02 14:05:59 GMT 2012
ALTER DATABASE CLOSE NORMAL
Sat Jun 02 14:05:59 GMT 2012
SMON: disabling tx recovery
SMON: disabling cache recovery
Sat Jun 02 14:06:00 GMT 2012
Shutting down archive processes
Archiving is disabled
Sat Jun 02 14:06:05 GMT 2012
ARCH shutting down
ARC9: Archival stopped
Sat Jun 02 14:06:10 GMT 2012
ARCH shutting down
ARC8: Archival stopped
Sat Jun 02 14:06:15 GMT 2012
ARCH shutting down
ARC7: Archival stopped
Sat Jun 02 14:06:21 GMT 2012
ARCH shutting down
ARC6: Archival stopped
Sat Jun 02 14:06:31 GMT 2012
ARCH shutting down
ARC4: Archival stopped
Sat Jun 02 14:06:36 GMT 2012
ARCH shutting down
ARC3: Archival stopped
Sat Jun 02 14:06:46 GMT 2012
ARCH shutting down
ARC1: Archival stopped
Sat Jun 02 14:06:51 GMT 2012
ARCH shutting down
Sat Jun 02 14:06:51 GMT 2012
ARC5: Becoming the 'no FAL' ARCH
Sat Jun 02 14:06:51 GMT 2012
ARC0: Archival stopped
Sat Jun 02 14:06:51 GMT 2012
ARC5: Becoming the 'no SRL' ARCH
ARC5: Archiving disabled
Sat Jun 02 14:06:51 GMT 2012
ARCH shutting down
ARC5: Archival stopped
Sat Jun 02 14:06:51 GMT 2012
ARC2: Becoming the heartbeat ARCH
Sat Jun 02 14:06:58 GMT 2012
Thread 2 closed at log sequence 21703
Successful close of redo thread 2
Sat Jun 02 14:07:26 GMT 2012
Completed: ALTER DATABASE CLOSE NORMAL
Sat Jun 02 14:07:26 GMT 2012
ALTER DATABASE DISMOUNT
Completed: ALTER DATABASE DISMOUNT
ARCH: Archival disabled due to shutdown: 1089
Shutting down archive processes
Archiving is disabled
Archive process shutdown avoided: 0 active
ARCH: Archival disabled due to shutdown: 1089
Shutting down archive processes
Archiving is disabled
Archive process shutdown avoided: 0 active
Sat Jun 02 14:07:33 GMT 2012
freeing rdom 0
Node 2 was fully down by about 14:07.
Node 1's alert log:
Sat Jun 02 14:04:52 GMT 2012
Shutting down instance (immediate)
Sat Jun 02 14:04:52 GMT 2012
Shutting down instance: further logons disabled
Sat Jun 02 14:04:54 GMT 2012
Stopping background process CJQ0
Sat Jun 02 14:04:54 GMT 2012
Stopping background process QMNC
Sat Jun 02 14:04:56 GMT 2012
Stopping background process MMNL
Sat Jun 02 14:04:57 GMT 2012
Stopping background process MMON
License high water mark = 1417
Sat Jun 02 14:04:58 GMT 2012
Job queue slave processes stopped
Sat Jun 02 14:12:41 GMT 2012
IPC Send timeout detected. Sender: ospid 14002 Receiver: inst 2 binc 379475 ospid 17274
Sat Jun 02 14:12:42 GMT 2012
IPC Send timeout to 1.0 inc 4 for msg type 8 from opid 33
Sat Jun 02 14:12:42 GMT 2012
Communications reconfiguration: instance_number 2
Sat Jun 02 14:12:42 GMT 2012
Trace dumping is performing id=[cdmp_20120602141242]
Sat Jun 02 14:13:33 GMT 2012
Shutting down instance (abort)
License high water mark = 1417
Termination issued to instance processes. Waiting for the processes to exit
Sat Jun 02 14:13:43 GMT 2012
Instance termination failed to kill one or more processes
Sat Jun 02 14:14:13 GMT 2012
Instance terminated by USER, pid = 5214
So node 1 was finally shut down at 14:13, and the clusterware did it with a shutdown abort.
At that point the stop command returned with the following error:
oracle@vus029pa:SIAP1:/opt/app/oracle/admin $ srvctl stop database -d SIAP
PRKP-1002 : Error stopping instance SIAP1 on node vus029pa
CRS-0216: Could not stop resource 'ora.SIAP.SIAP1.inst'.
oracle@vus029pa:SIAP1:/opt/app/oracle/admin $
Although Oracle reported that instance 1 could not be stopped, a check showed all of its OS processes were gone. Believing node 1 to be fully down, we went ahead and restarted the database. Unexpectedly, it would not start:
oracle@vus029pa:SIAP1:/ora/db001/data001/SIAP $ srvctl start database -d SIAP
PRKP-1001 : Error starting instance SIAP1 on node vus029pa
CRS-1028: Dependency analysis failed because of:
CRS-0223: Resource 'ora.SIAP.SIAP1.inst' has placement error.
PRKP-1001 : Error starting instance SIAP2 on node vus160po
CRS-0215: Could not start resource 'ora.SIAP.SIAP2.inst'.
oracle@vus029pa:SIAP1:/ora/db001/data001/SIAP $
Node 1 got roughly as far as nomount — its background processes existed but the startup went no further — while node 2 did not come up at all.
The corresponding alert logs:
Node 1:
Starting ORACLE instance (normal)
Sat Jun 02 14:40:57 GMT 2012
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Picked latch-free SCN scheme 3
LICENSE_MAX_USERS = 0
SYS auditing is disabled
ksdpec: called for event 13740 prior to event group initialization
Starting up ORACLE RDBMS Version: 10.2.0.5.0.
System parameters with non-default values:
  processes                = 1500
  timed_statistics         = TRUE
  __shared_pool_size       = 6442450944
  shared_pool_size         = 6442450944
  __large_pool_size        = 218103808
  large_pool_size          = 218103808
  __java_pool_size         = 33554432
  java_pool_size           = 33554432
  __streams_pool_size      = 0
  spfile                   = /ora/db001/data001/SIAP/spfileSIAP.ora
  filesystemio_options     = SETALL
  disk_asynch_io           = TRUE
  cluster_interconnects    = 192.168.0.1
  sga_target               = 21474836480
  control_files            = /ora/db001/data001/SIAP/control01SIAP.ctl, /ora/db001/redo002/SIAP/control02SIAP.ctl, /ora/db001/redo001/SIAP/control03SIAP.ctl
  control_file_record_keep_time= 30
  db_block_size            = 8192
  __db_cache_size          = 14763950080
  db_cache_size            = 10737418240
  db_writer_processes      = 10
  compatible               = 10.2.0
  log_archive_config       = DG_CONFIG=(SIAP,SIAP_DG)
  log_archive_dest_1       = location=/ora/db001/dbfra001/SIAP valid_for=(all_logfiles,all_roles) db_unique_name=SIAP
  log_archive_dest_2       = service=SIAP_DG valid_for=(online_logfiles,primary_role) LGWR ASYNC DB_UNIQUE_NAME=SIAP_DG
  log_archive_dest_state_1 = ENABLE
  log_archive_dest_state_2 = ENABLE
  log_archive_max_processes= 10
  standby_archive_dest     = /ora/db001/dbfra001/SIAP/standby
  log_archive_format       = arch_SIAP_%s_%t_%r.arc
  fal_client               = SIAP
  fal_server               = SIAP_DG
  log_buffer               = 14234112
  db_files                 = 1000
  db_file_multiblock_read_count= 16
  cluster_database         = TRUE
  cluster_database_instances= 2
  db_recovery_file_dest    = /ora/db001/dbfra001/SIAP
  db_recovery_file_dest_size= 64424509440
  standby_file_management  = AUTO
  thread                   = 1
  fast_start_mttr_target   = 300
  log_checkpoints_to_alert = TRUE
  instance_number          = 1
  undo_management          = AUTO
  undo_tablespace          = undots
  undo_retention           = 3600
  create_stored_outlines   = FALSE
  remote_login_passwordfile= EXCLUSIVE
  db_domain                = in.telstra.world
  instance_name            = SIAP1
  local_listener           = LISTENER_VUS029PA
  remote_listener          = LISTENER_SIAP
  session_cached_cursors   = 100
  utl_file_dir             = /tmp
  job_queue_processes      = 2
  cursor_sharing           = force
  background_dump_dest     = /opt/app/oracle/admin/SIAP/bdump
  user_dump_dest           = /opt/app/oracle/admin/SIAP/udump
  max_dump_file_size       = 5M
  core_dump_dest           = /opt/app/oracle/admin/SIAP/cdump
  audit_file_dest          = /opt/app/oracle/admin/SIAP/adump
  session_max_open_files   = 200
  open_links               = 4
  audit_trail              = DB
  db_name                  = SIAP
  open_cursors             = 1000
  star_transformation_enabled= true
  query_rewrite_enabled    = true
  query_rewrite_integrity  = trusted
  pga_aggregate_target     = 17179869184
  workarea_size_policy     = auto
Cluster communication is configured to use the following interface(s) for this instance
  192.168.0.1
Sat Jun 02 14:41:01 GMT 2012
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
PMON started with pid=2, OS id=10522
DIAG started with pid=3, OS id=10524
PSP0 started with pid=4, OS id=10526
LMON started with pid=5, OS id=10528
LMD0 started with pid=6, OS id=10530
LMS0 started with pid=7, OS id=10539
LMS1 started with pid=8, OS id=10544
LMS2 started with pid=9, OS id=10548
LMS3 started with pid=10, OS id=10552
LMS4 started with pid=11, OS id=10556
LMS5 started with pid=12, OS id=10560
LMS6 started with pid=13, OS id=10564
LMS7 started with pid=14, OS id=10576
LMS8 started with pid=15, OS id=10580
LMS9 started with pid=16, OS id=10584
LMSa started with pid=17, OS id=10594
LMSb started with pid=18, OS id=10598
LMSc started with pid=19, OS id=10602
LMSd started with pid=20, OS id=10606
MMAN started with pid=21, OS id=10623
DBW0 started with pid=22, OS id=10653
DBW1 started with pid=23, OS id=10673
DBW2 started with pid=24, OS id=10686
DBW3 started with pid=25, OS id=10690
DBW4 started with pid=26, OS id=10692
DBW5 started with pid=27, OS id=10694
DBW6 started with pid=28, OS id=10696
DBW7 started with pid=29, OS id=10706
DBW8 started with pid=30, OS id=10708
DBW9 started with pid=31, OS id=10710
LGWR started with pid=32, OS id=10712
CKPT started with pid=33, OS id=10714
SMON started with pid=34, OS id=10716
RECO started with pid=35, OS id=10718
CJQ0 started with pid=36, OS id=10720
MMON started with pid=37, OS id=10723
MMNL started with pid=38, OS id=10732    <<< hung here
Node 2's alert log:
Sat Jun 02 14:22:15 GMT 2012
Starting ORACLE instance (normal)
Sat Jun 02 14:23:25 GMT 2012
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Picked latch-free SCN scheme 3
LICENSE_MAX_USERS = 0
SYS auditing is disabled
ksdpec: called for event 13740 prior to event group initialization
Starting up ORACLE RDBMS Version: 10.2.0.5.0.
System parameters with non-default values:
  processes                = 1500
  timed_statistics         = TRUE
  __shared_pool_size       = 6442450944
  shared_pool_size         = 6442450944
  __large_pool_size        = 218103808
  large_pool_size          = 218103808
  __java_pool_size         = 33554432
  java_pool_size           = 33554432
  __streams_pool_size      = 0
  spfile                   = /ora/db001/data001/SIAP/spfileSIAP.ora
  filesystemio_options     = SETALL
  disk_asynch_io           = TRUE
  cluster_interconnects    = 192.168.0.2
  sga_target               = 21474836480
  control_files            = /ora/db001/data001/SIAP/control01SIAP.ctl, /ora/db001/redo002/SIAP/control02SIAP.ctl, /ora/db001/redo001/SIAP/control03SIAP.ctl
  control_file_record_keep_time= 30
  db_block_size            = 8192
  __db_cache_size          = 14763950080
  db_cache_size            = 10737418240
  db_writer_processes      = 10
  compatible               = 10.2.0
  log_archive_config       = DG_CONFIG=(SIAP,SIAP_DG)
  log_archive_dest_1       = location=/ora/db001/dbfra001/SIAP valid_for=(all_logfiles,all_roles) db_unique_name=SIAP
  log_archive_dest_2       = service=SIAP_DG valid_for=(online_logfiles,primary_role) LGWR ASYNC DB_UNIQUE_NAME=SIAP_DG
  log_archive_dest_state_1 = ENABLE
  log_archive_dest_state_2 = ENABLE
  log_archive_max_processes= 10
  standby_archive_dest     = /ora/db001/dbfra001/SIAP/standby
  log_archive_format       = arch_SIAP_%s_%t_%r.arc
  fal_client               = SIAP
  fal_server               = SIAP_DG
  log_buffer               = 14234112
  db_files                 = 1000
  db_file_multiblock_read_count= 16
  cluster_database         = TRUE
  cluster_database_instances= 2
  db_recovery_file_dest    = /ora/db001/dbfra001/SIAP
  db_recovery_file_dest_size= 64424509440
  standby_file_management  = AUTO
  thread                   = 2
  fast_start_mttr_target   = 300
  log_checkpoints_to_alert = TRUE
  instance_number          = 2
  undo_management          = AUTO
  undo_tablespace          = undots2
  undo_retention           = 3600
  create_stored_outlines   = FALSE
  remote_login_passwordfile= EXCLUSIVE
  db_domain                = in.telstra.world
  instance_name            = SIAP2
  local_listener           = LISTENER_VUS160PO
  remote_listener          = LISTENER_SIAP
  session_cached_cursors   = 100
  utl_file_dir             = /tmp
  job_queue_processes      = 2
  cursor_sharing           = force
  background_dump_dest     = /opt/app/oracle/admin/SIAP/bdump
  user_dump_dest           = /opt/app/oracle/admin/SIAP/udump
  max_dump_file_size       = 5M
  core_dump_dest           = /opt/app/oracle/admin/SIAP/cdump
  audit_file_dest          = /opt/app/oracle/admin/SIAP/adump
  session_max_open_files   = 200
  open_links               = 4
  audit_trail              = DB
  db_name                  = SIAP
  open_cursors             = 1000
  star_transformation_enabled= true
  query_rewrite_enabled    = true
  query_rewrite_integrity  = trusted
  pga_aggregate_target     = 17179869184
  workarea_size_policy     = auto
Cluster communication is configured to use the following interface(s) for this instance
  192.168.0.2
Sat Jun 02 14:23:28 GMT 2012
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
PMON started with pid=2, OS id=24777
DIAG started with pid=3, OS id=24779
PSP0 started with pid=4, OS id=24781
LMON started with pid=5, OS id=24783
LMD0 started with pid=6, OS id=24785
LMS0 started with pid=7, OS id=24787
LMS1 started with pid=8, OS id=24794
LMS2 started with pid=9, OS id=24803
LMS3 started with pid=10, OS id=24807
LMS4 started with pid=11, OS id=24811
LMS5 started with pid=12, OS id=24816
LMS6 started with pid=13, OS id=24841
LMS7 started with pid=14, OS id=24852
LMS8 started with pid=15, OS id=24857
LMS9 started with pid=16, OS id=24861
LMSa started with pid=17, OS id=24865
LMSb started with pid=18, OS id=24901
LMSc started with pid=19, OS id=24907
LMSd started with pid=20, OS id=24920
MMAN started with pid=21, OS id=24924
DBW0 started with pid=22, OS id=24926
DBW1 started with pid=23, OS id=24928
DBW2 started with pid=24, OS id=24930
DBW3 started with pid=25, OS id=24932
DBW4 started with pid=26, OS id=24934
DBW5 started with pid=27, OS id=24938
DBW6 started with pid=28, OS id=24946
DBW7 started with pid=29, OS id=24948
DBW8 started with pid=30, OS id=24950
DBW9 started with pid=31, OS id=24952
LGWR started with pid=32, OS id=24954
CKPT started with pid=33, OS id=24956
SMON started with pid=34, OS id=24958
RECO started with pid=35, OS id=24968
CJQ0 started with pid=36, OS id=24970
MMON started with pid=37, OS id=24972
MMNL started with pid=38, OS id=24976
Sat Jun 02 14:23:46 GMT 2012
USER: terminating instance due to error 304
Instance terminated by USER, pid = 23354
So node 2 did in fact attempt to start, but it hit error 304 and terminated itself.
After a manual shutdown abort of node 1, we tried again: same story. Node 1 spawned its background processes and then hung; node 2 started and was immediately terminated.
The srvctl error was slightly different this time, reporting only that node 2 failed to start:
oracle@vus029pa:SIAP1:/ora/db001/data001/SIAP $ srvctl start database -d SIAP
PRKP-1001 : Error starting instance SIAP2 on node vus160po
CRS-0215: Could not start resource 'ora.SIAP.SIAP2.inst'.
oracle@vus029pa:SIAP1:/ora/db001/data001/SIAP $
Repeated attempts produced the same error. So the RAC would not start because node 2 could not start, and node 2 could not start because it kept hitting error 304.
Let's look up what error 304 means:
oracle@vus160po:SIAP2:/opt/app/oracle/product/10.2.0/db_1/dbs $ oerr ora 304
00304, 00000, "requested INSTANCE_NUMBER is busy"
// *Cause:  An instance tried to start by using a value of the
//          initialization parameter INSTANCE_NUMBER that is already in use.
// *Action: Either
//          a) specify another INSTANCE_NUMBER,
//          b) shut down the running instance with this number
//          c) wait for instance recovery to complete on the instance with
//             this number.
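Action (b) hints at the useful question: is something still holding the instance number? A quick sanity check — my own addition, not a step from the original session — is to grep the process list for survivors of the supposedly-dead instance. Shown here against canned `ps -ef` rows so the filter itself can be verified:

```shell
# Count processes still belonging to instance SIAP2. The canned SAMPLE stands
# in for real `ps -ef` output; note the tail -f row must not be counted.
SAMPLE='oracle 26392 1 0 Jan 25 ? 0:00 oracleSIAP2 (LOCAL=NO)
oracle 12509 8593 0 17:31:03 pts/4 0:00 tail -f alert_SIAP2.log
oracle 26374 1 0 Jan 25 ? 0:00 oracleSIAP2 (LOCAL=NO)'
LEFTOVER=$(echo "$SAMPLE" | grep -c 'oracleSIAP2 (LOCAL=NO)')
echo "leftover SIAP2 server processes: $LEFTOVER"
```

Against the live system, replace the `echo "$SAMPLE"` with `ps -ef`. A nonzero count while the instance is supposed to be down is exactly the ORA-00304 setup described below.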
According to the alert log, node 2's instance_number is 2:
……
fast_start_mttr_target   = 300
log_checkpoints_to_alert = TRUE
instance_number          = 2
……
Following action (a), we tried remapping instance_number in the pfile:
SIAP1.instance_number=3
SIAP2.instance_number=4
Another restart attempt still failed:
oracle@vus029pa:SIAP1:/opt/app/oracle/product/10.2.0/db_1/dbs $ srvctl start service -d SIAP -s SIAP2
PRKP-1030 : Failed to start the service SIAP2.
CRS-0215: Could not start resource 'ora.SIAP.SIAP2.cs'.
Node 1's alert log looked the same as before — hung right after `MMNL started with pid=xxx` — while node 2's alert log got no new entries at all; node 2 never even attempted to start.
Since changing instance_number made no difference, we reverted it.
Next we tried starting node 2 manually:
SQL> startup nomount
ORA-00304: requested INSTANCE_NUMBER is busy
SQL>
At this point a colleague who was looking at the problem with me spotted the root cause: although the database had been shut down, quite a few `LOCAL=NO` server processes were still alive at the OS level.
ps -ef |grep SIAP2
oracle  5074     1  0 17:49:36 ?     0:02 oracleSIAP2 (LOCAL=NO)
oracle 18782     1  0 18:00:05 ?     1:01 oracleSIAP2 (LOCAL=NO)
oracle  4649     1  0 17:49:19 ?     0:46 oracleSIAP2 (LOCAL=NO)
oracle 21736     1  0 18:02:31 ?     0:29 oracleSIAP2 (LOCAL=NO)
oracle  3693     1  0   Mar 23 ?     0:00 oracleSIAP2 (LOCAL=NO)
oracle 26392     1  0   Jan 25 ?     0:00 oracleSIAP2 (LOCAL=NO)
oracle 12509  8593  0 17:31:03 pts/4 0:00 tail -f /opt/app/oracle/admin/SIAP/bdump/alert_SIAP2.log
oracle 26382     1  0   Jan 25 ?     0:00 oracleSIAP2 (LOCAL=NO)
oracle 29425     1  0   Apr 14 ?     0:00 oracleSIAP2 (LOCAL=NO)
oracle 26372     1  0   Jan 25 ?     0:00 oracleSIAP2 (LOCAL=NO)
oracle 23702     1  0   May 19 ?     0:00 oracleSIAP2 (LOCAL=NO)
oracle 26368     1  0   Jan 25 ?     0:00 oracleSIAP2 (LOCAL=NO)
oracle 26377     1  0   Jan 25 ?     0:00 oracleSIAP2 (LOCAL=NO)
oracle  7945     1  0   Apr 28 ?     0:00 oracleSIAP2 (LOCAL=NO)
oracle 26396     1  0   Jan 25 ?     0:00 oracleSIAP2 (LOCAL=NO)
oracle 26379     1  0   Jan 25 ?     0:00 oracleSIAP2 (LOCAL=NO)
oracle 14564     1  0   May 25 ?     0:00 oracleSIAP2 (LOCAL=NO)
oracle 26367     1  0   Jan 25 ?     0:00 oracleSIAP2 (LOCAL=NO)
oracle 26373     1  0   Jan 25 ?     0:00 oracleSIAP2 (LOCAL=NO)
oracle 23715     1  0   May 19 ?     0:00 oracleSIAP2 (LOCAL=NO)
oracle  6311     1  0   Apr 18 ?     0:00 oracleSIAP2 (LOCAL=NO)
oracle 29428     1  0   Apr 14 ?     0:00 oracleSIAP2 (LOCAL=NO)
oracle 27062     1  0   Apr 02 ?     0:00 oracleSIAP2 (LOCAL=NO)
oracle  1283     1  0   May 12 ?     0:00 oracleSIAP2 (LOCAL=NO)
oracle 26380     1  0   Jan 25 ?     0:00 oracleSIAP2 (LOCAL=NO)
oracle 26374     1  0   Jan 25 ?     0:00 oracleSIAP2 (LOCAL=NO)
oracle 26369     1  0   Jan 25 ?     0:00 oracleSIAP2 (LOCAL=NO)
oracle 26395     1  0   Jan 25 ?     0:00 oracleSIAP2 (LOCAL=NO)
oracle 26370     1  0   Jan 25 ?     0:00 oracleSIAP2 (LOCAL=NO)
oracle 20775     1  0   May 28 ?     0:00 oracleSIAP2 (LOCAL=NO)
oracle 26371     1  0   Jan 25 ?     0:00 oracleSIAP2 (LOCAL=NO)
The `ps -ef` output confirmed it: plenty of `LOCAL=NO` processes that never went away. They made Oracle believe instance 2 was still active, which is why the instance could not be restarted.
This system runs two RAC databases on the same pair of servers, and we had stopped only one of them. With the other database still running, the servers were naturally full of `LOCAL=NO` processes, which is why I didn't notice at first that some of them still belonged to the database we had stopped.
OK, the problem was found. We killed the leftover processes with:
kill -9 `ps -ef|grep LOCAL=NO |grep SIAP2|grep -v grep|awk '{print $2}'` |
After killing them, the database restarted without a hitch.
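A pipeline like that is worth sanity-checking before aiming `kill -9` at it — a wrong pattern here could take out sessions of the other database running on the same servers. The same filter run against a few canned `ps` rows (abridged from the listing above; the PIDs shown are from that listing):

```shell
# Verify the PID-harvesting filter on sample rows before wiring it into kill.
SAMPLE='oracle  5074     1  0 17:49:36 ?     0:02 oracleSIAP2 (LOCAL=NO)
oracle 12509  8593  0 17:31:03 pts/4 0:00 tail -f alert_SIAP2.log
oracle 26392     1  0   Jan 25 ?     0:00 oracleSIAP2 (LOCAL=NO)'
# Same filter chain as the real command, fed sample text instead of ps -ef:
PIDS=$(echo "$SAMPLE" | grep 'LOCAL=NO' | grep SIAP2 | grep -v grep | awk '{print $2}')
echo $PIDS   # only the two SIAP2 server-process PIDs survive the filter
```

The `grep SIAP2` term is what protects the other database's `LOCAL=NO` sessions; the `tail -f` row is dropped because it never matches `LOCAL=NO`.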
Sigh. I actually wrote an article long ago about tricks for shutting down a database, including killing `LOCAL=NO` processes. I didn't expect the same trick to apply in a RAC environment — and that failing to kill the `LOCAL=NO` processes could actually prevent the database from starting at all.