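Today I was told that the Legato backup for one province could not run. The Legato monitor window showed no obvious error; the backup simply reported a failure.
Because the monitor said “……Hostname(s) Unresolved,1 Failed,1 Succeeded(xj_db Failed)”, I first suspected a hostname problem, but pinging the client from the backup server worked without any trouble: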
    C:\Documents and Settings\Administrator>ping xj_db

    Pinging xj_db [10.203.102.11] with 32 bytes of data:

    Reply from 10.203.102.11: bytes=32 time<1ms TTL=255
    Reply from 10.203.102.11: bytes=32 time<1ms TTL=255
    Reply from 10.203.102.11: bytes=32 time<1ms TTL=255

    Ping statistics for 10.203.102.11:
        Packets: Sent = 3, Received = 3, Lost = 0 (0% loss),
    Approximate round trip times in milli-seconds:
        Minimum = 0ms, Maximum = 0ms, Average = 0ms
Log in to the client, that is, the DB host, and check the relevant logs as root:
Go to /nsr/applogs and open the debug log: vi nsrnmostart.log
    ……
    (20721) Legato NetWorker Module for Oracle v4.1
    (20721) Tue Jan 6 17:12:20 2009
    (20721) Entering Function nwora_process_calling_args
    (20721) argc = 16
    (20721) Calling Summary
    (20721) argv[ 0] = nsrnmostart
    (20721) argv[ 1] = -s
    (20721) argv[ 2] = xj_bak01
    (20721) argv[ 3] = -g
    (20721) argv[ 4] = OracleArch
    (20721) argv[ 5] = -LL
    (20721) argv[ 6] = -m
    (20721) argv[ 7] = xj_db
    (20721) argv[ 8] = -l
    (20721) argv[ 9] = full
    (20721) argv[10] = -q
    (20721) argv[11] = -W
    (20721) argv[12] = 78
    (20721) argv[13] = -N
    (20721) argv[14] = /oracle/app/oracle/product/9.2.0/bin/OracleArch
    (20721) argv[15] = /oracle/app/oracle/product/9.2.0/bin/OracleArch
    (20721) Environment Read by nsrnmostart
    (20721) ORACLE_SID = xjmisc
    (20721) ORACLE_HOME = /oracle/app/oracle/product/9.2.0
    (20721) PRECMD =
    (20721) POSTCMD =
    (20721) PATH = /bin:/usr/sbin:/usr/bin:/nsr/bin:/opt/networker/bin
    (20721) NSR_RMAN_ARGUMENTS = msglog '/nsr/applogs/msglog.log' append
    (20721) NSR_RMAN_OUTPUT = /nsr/applogs/msglog.log append
    (20721) Leaving Function nwora_process_calling_args
    (20721) Entering Function nwora_scan_rman_script
    (20721) Checking rman script /oracle/app/oracle/product/9.2.0/bin/OracleArch for validity.
    (20721) found connect catalog string.
    (20721) found connect target string.
    (20721) found allocate channel: allocate channel t1 type 'sbt_tape'
    (20721) found allocate channel: allocate channel t2 type 'sbt_tape'
    (20721) found allocate channel: allocate channel t3 type 'sbt_tape'
    (20721) Completed checking of rman script.
    (20721) Leaving Function nwora_scan_rman_script
    (20721) Entering Function nwora_nsrnmostart_rman
    (20721) nwora_find_rman_version: file /oracle/app/oracle/product/9.2.0/bin/tmp000002 created
    (20721) nwora_find_rman_version: RMAN version: major 9, minor 2
    (20721) RMAN internal version 0 found after send command testing
    (20721) savegrp information added to 3 channels
    (20721) exepath = /oracle/app/oracle/product/9.2.0/bin/rman
    (20721) cmd_args = msglog '/nsr/applogs/msglog.log' append
    (20721) rman_script = /oracle/app/oracle/product/9.2.0/bin/nmosb000003
    (20721) saveset_name = /oracle/app/oracle/product/9.2.0/bin/OracleArch
    (20721) Launching backup process
    (20721) Backup process failed: RMAN exited with return code '1'.
    (20721) nwora_nsrnmostart_rman: RMAN script execution is not successful. RMAN exited with return code '1'.
    (20721) Leaving Function nwora_nsrnmostart_rman
The log shows that the RMAN script itself failed: "RMAN script execution is not successful", with RMAN exiting with return code 1.
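Rather than paging through the whole debug log in vi, the failure lines can be pulled out directly. The following is only a rough sketch: the function names `nmo_failures` and `nmo_return_code` are made up for this post, and the patterns simply match the two messages seen in this incident, so other NMO versions may word them differently.

```shell
#!/bin/sh
# Hypothetical helpers (names invented for illustration).

# Print the lines of an NMO debug log that report a failed RMAN run.
nmo_failures() {
    log_file="$1"
    grep -E 'Backup process failed|RMAN script execution is not successful' "$log_file"
}

# Print the last RMAN return code recorded in the log (nothing if none is found).
nmo_return_code() {
    log_file="$1"
    sed -n "s/.*RMAN exited with return code '\([0-9][0-9]*\)'.*/\1/p" "$log_file" | tail -n 1
}
```

On the host in this post it would be called as `nmo_failures /nsr/applogs/nsrnmostart.log`.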
Let's test the RMAN script. From the group's save set in the Legato console, I located the script /oracle/app/oracle/product/9.2.0/bin/OracleArch:
    xj_db01:[/nsr/applogs]#cat /oracle/app/oracle/product/9.2.0/bin/OracleArch
    connect catalog rman/rman@xjrman;
    connect target sys/pwd111;
    run {
    allocate channel t1 type 'sbt_tape'
    parms 'ENV=(NSR_CLIENT=xj_db)';
    allocate channel t2 type 'sbt_tape'
    parms 'ENV=(NSR_CLIENT=xj_db)';
    allocate channel t3 type 'sbt_tape'
    parms 'ENV=(NSR_CLIENT=xj_db)';
    sql 'alter system archive log current';
    crosscheck archivelog all;
    backup
    format "arch_%d_t%t_s%s_p%p"
    (archivelog all delete input);
    release channel t1;
    release channel t2;
    release channel t3;
    }
Run as the oracle user, the script backs up successfully!
Next, check msglog.log under /nsr/applogs: vi msglog.log
    Recovery Manager: Release 9.2.0.6.0 - 64bit Production

    Copyright (c) 1995, 2002, Oracle Corporation. All rights reserved.

    RMAN> connect catalog rman/rman@xjrman;
    2> connect target sys/pwd111;
    3> run {
    4> allocate channel t1 type 'sbt_tape'
    5> parms 'ENV=(NSR_CLIENT=xj_db,
    6> NSR_SERVER=xj_bak01,
    7> NSR_GROUP=OracleArch,
    8> NSR_SAVESET_NAME=/oracle/app/oracle/product/9.2.0/bin/OracleArch)';
    9> allocate channel t2 type 'sbt_tape'
    10> parms 'ENV=(NSR_CLIENT=xj_db,
    11> NSR_SERVER=xj_bak01,
    12> NSR_GROUP=OracleArch,
    13> NSR_SAVESET_NAME=/oracle/app/oracle/product/9.2.0/bin/OracleArch)';
    14> allocate channel t3 type 'sbt_tape'
    15> parms 'ENV=(NSR_CLIENT=xj_db,
    16> NSR_SERVER=xj_bak01,
    17> NSR_GROUP=OracleArch,
    18> NSR_SAVESET_NAME=/oracle/app/oracle/product/9.2.0/bin/OracleArch)';
    19> sql 'alter system archive log current';
    20> crosscheck archivelog all;
    21> backup
    22> format "arch_%d_t%t_s%s_p%p"
    23> (archivelog all delete input);
    24> release channel t1;
    25> release channel t2;
    26> release channel t3;
    27> }
    28>
    connected to recovery catalog database

    RMAN-00571: ===========================================================
    RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
    RMAN-00571: ===========================================================
    ORA-01031: insufficient privileges
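Here is the actual RMAN error: ORA-01031. Legato was installed as root and runs the backup as root, and it is root that fails to run the RMAN script. Could the problem be in how root picks up the oracle user's environment variables?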
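To get at the real error quickly in a long msglog.log, the ORA-/RMAN- codes can be extracted and the banner lines dropped. This is a small sketch, not part of Legato or Oracle: `rman_error_codes` is a name invented here, and it assumes the error codes always have five digits as they do in the output above.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct ORA-/RMAN- error codes in an RMAN log,
# skipping RMAN-00569 and RMAN-00571, which only frame the error stack.
rman_error_codes() {
    log_file="$1"
    awk '{
        line = $0
        # Collect every code on the line, not just the first one.
        while (match(line, /(ORA|RMAN)-[0-9][0-9][0-9][0-9][0-9]/)) {
            print substr(line, RSTART, RLENGTH)
            line = substr(line, RSTART + RLENGTH)
        }
    }' "$log_file" | grep -v -E 'RMAN-005(69|71)' | sort -u
}
```

Against the log above, `rman_error_codes /nsr/applogs/msglog.log` would leave only the ORA-01031 line to look at.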
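Next, look for the file where Legato sets its environment variables:
Based on the file name in the backup command field of the Legato console, I found /opt/networker/bin/nsrnmo1 and printed it with cat nsrnmo1: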
    xj_db01:[/opt/networker/bin]#cat nsrnmo1
    #!/bin/sh
    #
    # $Id: nsrnmo.template,v 1.3.52.4 2003/06/25 21:42:19 yozekinc Exp $ Copyright (c) 2003, Legato Systems, Inc.
    #
    # All rights reserved.
    #
    # nsrnmo.sh
    #
    # Legato Networker Module for Oracle 4.1
    #
    # This script is part of the Legato NetWorker Module for Oracle.
    # Modification of this script should be done with care and only after reading
    # the administration manual included with this product.
    #
    # This script should only be run as part of a scheduled savegroup.
    #
    # Returns 0 on success; 1 on failure.
    #
    #
    # REQUIRED Variable: ORACLE_HOME
    #
    # Default value: NONE (site specific)
    #
    # Description: Specifies where the Oracle Server installation is located.
    # It is a requirement that rman be located in ORACLE_HOME/bin.
    #
    # Samples:
    # ORACLE_HOME=/disk3/oracle/app/oracle/product/8.1.6
    #
    ORACLE_HOME=/oracle/app/oracle/product/9.2.0

    # REQUIRED Variable: PATH
    #
    # Default value: NONE (site and platform specific)
    #
    # Description: Set up the PATH environment variable.
    # This must be configured to include the path to "nsrnmostart"
    #
    # Samples:
    # PATH=/bin:/usr/sbin:/usr/bin:/nsr/bin:/opt/networker/bin
    #
    PATH=/bin:/usr/sbin:/usr/bin:/nsr/bin:/opt/networker/bin
    #
    # Optional Variable: ORACLE_SID
    #
    # Default value: NONE (site specific)
    #
    # Description: Specifies the SID of the Oracle database being backed up.
    # It is required by proxy copy backups when catalog synchronization is
    # enabled.
    #
    # Samples:
    # ORACLE_SID=orcl815
    #
    ORACLE_SID=xjmisc
    #
    # Optional Variable: NSR_RMAN_ARGUMENTS
    #
    # Default value: NONE (site specific)
    #
    # Description: Provide extra rman parameters.
    # You must enclose the command in quotes or it will not be
    # passed correctly to rman.
    #
    # Samples:
    # NSR_RMAN_ARGUMENTS="nocatalog msglog '/nsr/applogs/msglog.log' append"
    #
    # NSR_RMAN_ARGUMENTS="nocatalog"
    #
    NSR_RMAN_ARGUMENTS="msglog '/nsr/applogs/msglog.log' append"
    #
    # Optional Variable: NSR_RMAN_OUTPUT
    #
    # Default value: NONE (site specific)
    #
    # Description: Provide option to capture the RMAN standard output
    # if RMAN "msglog" or "log" command line option is not set.
    # The connect strings will be hidden in this file.
    #
    # Samples:
    # NSR_RMAN_OUTPUT="/nsr/applogs/msglog.log append"
    #
    # NSR_RMAN_OUTPUT="/nsr/applogs/msglog.log"
    #
    NSR_RMAN_OUTPUT="/nsr/applogs/msglog.log append"
    #
    # Optional Variable: NSR_SB_DEBUG_FILE
    #
    # Default value: NONE (site specific)
    #
    # Description: To enable debugging output for NMO scheduled backups set
    # the following to an appropriate path and file name.
    # Set this variable for debugging purposes only
    #
    # Samples:
    # NSR_SB_DEBUG_FILE=/nsr/applogs/nsrnmostart.log
    #
    NSR_SB_DEBUG_FILE=
    #
    # Optional Variable: PRECMD
    #
    # Default value: NONE
    #
    # Description: This variable can be used to run a command or command script
    # before nsrnmostart. It will be launched once for every saveset
    # entered in the client setup.
    #
    PRECMD=
    #
    # Optional Variable: POSTCMD
    #
    # Default value: NONE
    #
    # Description: This variable can be used to run a command or command script
    # after nsrnmostart has completed. It will be launched once for
    # every saveset entered in the client setup.
    #
    POSTCMD=
    #
    # Optional Variable: SHLIB_PATH,LD_LIBRARY_PATH
    #
    # Default value: NONE
    #
    # Description: These variables may have to be set on HP-UX 11.0 (64 bit) operating systems.
    # We suggest leaving it unset unless you have a scheduled backup problem.
    # If it is set you must also uncomment the export SHLIB_PATH and LD_LIBRARY_PATH
    # in the function export_environment_variables below.
    #
    # Samples:
    # SHLIB_PATH=/disk3/oracle/app/oracle/product/8.1.6/lib
    # LD_LIBRARY_PATH=/disk3/oracle/app/oracle/product/8.1.6/lib64
    #
    #
    # Optional Variable: TNS_ADMIN
    #
    # Default value: NONE
    #
    # Description: This variable needs to be set if Oracle Net configuration
    # files are not located in default locations.If it is set you must also uncomment
    # the export TNS_ADMIN in the function export_environment_variables below.
    #
    # Samples:
    # TNS_ADMIN=/disk3/oracle/app/oracle/product/8.1.6/network/admin1
    #

    export_environment_variables()
    {
        export ORACLE_HOME
        export ORACLE_SID
        export NSR_RMAN_ARGUMENTS
        export NSR_RMAN_OUTPUT
        export PRECMD
        export POSTCMD
        export PATH
        export NSR_SB_DEBUG_FILE
        #export SHLIB_PATH
        #export LD_LIBRARY_PATH
        #export TNS_ADMIN
    }

    ###########################################################################
    # Do not edit anything below this line.
    ###########################################################################

    Pid=0 # process to kill if we are cancelled
    nsrnmostart_status=0 # did it work?

    #
    # Handle cancel signals sent by savegrp when user stops the group.
    #
    handle_signal()
    {
        if [ $Pid != 0 ]; then
            kill -2 $Pid
        fi
        exit 1
    }

    #
    # The main portion of this shell.
    #
    #
    # Make sure we respond to savegrp cancellations.
    #
    trap handle_signal 2 15

    #
    # Build the nsrnmostart command
    #
    opts=""
    while [ $# -gt 0 ]; do
        case "$1" in
        -s ) # server name
            opts="$opts $1 '$2'"
            shift 2
            ;;
        -N ) # save set name
            opts="$opts $1 '$2'"
            shift 2
            ;;
        -e ) # expiration time
            opts="$opts $1 '$2'"
            shift 2
            ;;
        -b ) # Specify pool
            opts="$opts $1 '$2'"
            shift 2
            ;;
        -c ) # Specify the client name
            opts="$opts $1 '$2'"
            shift 2
            ;;
        -g ) # Specify group
            opts="$opts $1 '$2'"
            shift 2
            ;;
        -m ) # Specify masquerade
            opts="$opts $1 '$2'"
            shift 2
            ;;
        -A ) # Specify PowerSnap options
            opts="$opts $1 '$2'"
            shift 2
            ;;
        *) # rest of options
            opts="$opts $1"
            shift
            ;;
        esac
    done

    if [ "${BACKUP_OPT}" != "" ]; then
        BACKUP_COMMAND_LINE="nsrnmostart ""$BACKUP_OPT"" $opts"
    else
        BACKUP_COMMAND_LINE="nsrnmostart $opts"
    fi

    #
    # Export all necessary environment variables
    #
    export_environment_variables

    #
    # Call nsrnmostart to do the backups.
    #
    #print $BACKUP_COMMAND_LINE
    eval ${BACKUP_COMMAND_LINE} &
    Pid=$!
    wait $Pid
    nsrnmostart_status=$?
    if [ $nsrnmostart_status != 0 ] ; then
        echo "nsrnmostart returned status of "$nsrnmostart_status
        echo $0 "exiting."
        exit 1
    fi
    exit 0
The environment variables in it look fine: ORACLE_SID, ORACLE_HOME, and PATH are all set correctly.
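Because the template is full of commented-out sample values, it is easy to misread which assignments are actually active. A minimal sketch of an automated check follows; `active_value` and `check_nsrnmo_env` are hypothetical names for this post, and the check only looks at plain `VAR=value` lines that are not commented out.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting an nsrnmo-style script.

# Print the value of the last non-commented "VAR=value" line in a file.
active_value() {
    var_name="$1"
    script_file="$2"
    sed -n "s/^[[:space:]]*${var_name}=\(.*\)\$/\1/p" "$script_file" | tail -n 1
}

# Report the three variables checked in this post; return 1 if any is empty or missing.
check_nsrnmo_env() {
    script_file="$1"
    status=0
    for name in ORACLE_HOME ORACLE_SID PATH; do
        value=$(active_value "$name" "$script_file")
        if [ -z "$value" ]; then
            echo "$name: not set"
            status=1
        else
            echo "$name=$value"
        fi
    done
    return $status
}
```

For the file above the call would be `check_nsrnmo_env /opt/networker/bin/nsrnmo1`.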
As root, I set the environment variables by hand and connected to the target database manually:
    xj_db01:[/opt/networker/bin]#export ORACLE_SID=xjmisc
    xj_db01:[/opt/networker/bin]#export ORACLE_HOME=/oracle/app/oracle/product/9.2.0
    xj_db01:[/opt/networker/bin]#export PATH=$ORACLE_HOME/bin
    xj_db01:[/opt/networker/bin]#rman

    Recovery Manager: Release 9.2.0.6.0 - 64bit Production

    Copyright (c) 1995, 2002, Oracle Corporation. All rights reserved.

    RMAN> connect target sys/pwd111

    RMAN-00571: ===========================================================
    RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
    RMAN-00571: ===========================================================
    ORA-01031: insufficient privileges

    RMAN> exit
So root really cannot log in, even with the environment set correctly.
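One side effect of the manual test is that `export PATH=$ORACLE_HOME/bin` replaces root's PATH for the rest of the session. A possible alternative, sketched here under the assumption that a POSIX shell is available, is to set the variables in a subshell and prepend to PATH instead. The name `with_oracle_env` is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: run a command with ORACLE_SID/ORACLE_HOME set and
# ORACLE_HOME/bin prepended to PATH, without touching the calling shell.
with_oracle_env() {
    (
        ORACLE_SID="$1"
        ORACLE_HOME="$2"
        shift 2
        PATH="$ORACLE_HOME/bin:$PATH"
        export ORACLE_SID ORACLE_HOME PATH
        "$@"
    )
}
```

The test in this post would then be started with `with_oracle_env xjmisc /oracle/app/oracle/product/9.2.0 rman`, and root's own PATH stays as it was afterwards.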
Check the database's login authentication settings:
Switch to the oracle user and, after logging in with sqlplus:
    SQL> show parameter remote

    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- ------------------------------
    remote_archive_enable                string      true
    remote_dependencies_mode             string      TIMESTAMP
    remote_listener                      string
    remote_login_passwordfile            string      EXCLUSIVE
    remote_os_authent                    boolean     FALSE
    remote_os_roles                      boolean     FALSE
    SQL>
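With these settings, OS users in the dba group are authenticated by the operating system, while every other user has to be verified against the password file. That fits what we have seen so far: the oracle user, whose group is dba, can run the script, whereas root's connect target sys/pwd111 is checked against the password file.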
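Whether a given OS account falls under the dba-group case can be checked from the shell. This is a small sketch using the standard `id` command; `user_in_group` is a name made up here, and it assumes group names contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given OS user belongs to the given group.
user_in_group() {
    user_name="$1"
    group_name="$2"
    # id -Gn prints all group names of the user on one line, space separated.
    id -Gn "$user_name" 2>/dev/null | tr ' ' '\n' | grep -qxF "$group_name"
}
```

On the DB host one would compare `user_in_group oracle dba` with `user_in_group root dba`; the expectation from this incident is that only the first succeeds, but that has to be checked on the machine itself.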
Next, look at the timestamp on the password file:
    oracle@xj_db01:/oracle/app/oracle/product/9.2.0 > cd dbs
    oracle@xj_db01:/oracle/app/oracle/product/9.2.0/dbs > ll
    total 27548
    -rw-r--r-- 1 oracle dba     8385 Mar  9  2002 init.ora
    -rw-r--r-- 1 oracle dba    12920 Mar  9  2002 initdw.ora
    -rw-r--rw- 1 oracle dba     1041 Jun 19  2005 initxjmisc.bak
    -rw-rw-rw- 1 oracle dba       70 Apr 28  2008 initxjmisc.ora
    -rw-rw-rw- 1 oracle dba       36 Dec 26  2005 initxjmisc.ora.20051226
    -rw-rw---- 1 oracle dba       24 Dec  3 05:07 lkXJMISC
    -rwSr----- 1 oracle dba     3072 Jan  5 16:34 orapwxjmisc
    -rw-rw---- 1 oracle dba 14065664 Jan  6 15:49 snapcf_xjmisc.f
    oracle@xj_db01:/oracle/app/oracle/product/9.2.0/dbs >
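The password file orapwxjmisc was modified very recently (Jan 5 16:34), so my conclusion was that someone had changed the sys password in the last few days!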
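The same check can be scripted instead of reading the `ll` output by eye. The sketch below relies on the standard `find -mtime` test; `changed_within_days` is a hypothetical name, and "days" here means 24-hour periods counted back from now, which is how `find` measures it.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file was modified less than N*24 hours ago.
changed_within_days() {
    target_file="$1"
    days="$2"
    [ -n "$(find "$target_file" -mtime "-$days" -print 2>/dev/null)" ]
}
```

For example, `changed_within_days /oracle/app/oracle/product/9.2.0/dbs/orapwxjmisc 2 && echo "password file changed recently"` would flag a change like the one found here.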
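After checking with the on-site staff, it was confirmed that someone did change the sys password on the afternoon of the 5th, to pwd222. The RMAN script still connects with sys/pwd111, so the cause of this failure is confirmed.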