A Legato Backup Ordeal

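A province in the West China region was upgrading its database host, moving from an old machine to a new one. Once the database software had been installed on the new machine, the storage was detached from the old host and attached to the new one, and the database migration was considered done. Ever since the upgrade, however, the legato backup had never completed successfully. I spent several days working on the problem with Wang Shucheng, the West China DBA.

Day 1:
The backup itself ran to completion, but the job status never returned to normal. In the archive directory on the database host, the arch logs had already been backed up and removed, yet the legato GUI still showed the clock icon that marks a backup job as unfinished.

Restarting the networker service on the server side had no effect. Restarting the networker service on the client side had no effect either.
Checking the related processes: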

root@gz_dc01:/ # ps -ef |grep nsr
    root 17205 16714  0 12:38:21 ?         0:01 /opt/networker/bin/nsrmmd -n 7 -s gz_bak01
    root 17207 16714  0 12:38:24 ?         0:00 /opt/networker/bin/nsrmmd -n 9 -s gz_bak01
    root 17584 16714  0 12:43:22 ?         0:00 /opt/networker/bin/nsrmmd -n 2 -s gz_bak01
    root 20543     1  0 13:20:12 ?         0:00 /bin/sh /opt/networker/bin/nsrnmo1 -s gz_bak01 -g OracleArch -L
    root 17204 16714  0 12:38:19 ?         0:00 /opt/networker/bin/nsrmmd -n 5 -s gz_bak01
    root 20544 20543 192 13:20:12 ?        47:29 nsrnmostart -s gz_bak01 -g OracleArch -LL -m gz_db -l full -q -
    root 24951 24484  0 14:17:33 pts/tb    0:00 grep nsr
    root 16714 16712  0 12:32:08 ?         0:00 /opt/networker/bin/nsrexecd
    root 16712     1  0 12:32:08 ?         0:00 /opt/networker/bin/nsrexecd
root@gz_dc01:/ # 
root@gz_dc01:/ #

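The rman process had already finished. In principle, once rman completes the backup, the index and bootstrap are written next, but here things stayed stuck at the rman-finished stage, and the process nsrnmostart -s gz_bak01 -g OracleArch -LL -m gz_db -l full -q - hung indefinitely. Even after stopping the job in the GUI, the processes above were still visible on the client.

I reconfigured the group and client in the GUI, assigned a schedule, and ran the backup again. It hung again, and this time the tape was not even ejected after rman finished. Unmounting the tape by hand returned an error. Unbelievable!

With no other options left, we had to send someone to the machine room to restart the tape library.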
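A quick way to confirm this kind of hang from the client side is to filter the process list for the NMO wrapper and its child. The following is a minimal sketch, not part of the original troubleshooting. The function name find_nmo_procs is made up for illustration, and it assumes that ps -ef prints the PID in column 2 and the parent PID in column 3, as in the listing above.

```sh
#!/bin/sh
# Hypothetical helper: read `ps -ef` output on stdin and print "PID PPID"
# for each NMO process (the nsrnmo1 wrapper or nsrnmostart).
find_nmo_procs() {
    awk '/nsrnmo/ && !/grep/ { print $2, $3 }'
}

# Usage on the client (not run here):
#   ps -ef | find_nmo_procs
```

If the same PIDs are still listed long after rman has finished, the wrapper has not exited.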

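Day 2:
The tape library was restarted last night, but today's backup attempt hung in exactly the same way. By rights it should have moved on to backing up the index and bootstrap, which are written through tape0 or tape1, so could the local drives be at fault? Labeling tapes with tape0 and tape1 completed normally. While checking the machine's hardware, however, we found that the medium changer was enabled!

The medium changer is how the tape library, an HP MSL 6000, shows up as hardware on the Windows machine. Because we run the legato backup software on Windows, legato uses its own interface to find the tape library. If the medium changer is enabled, legato's own interface and the medium changer's interface can easily contend for the library, so the medium changer must be disabled.
PS: if the backup software is DP, the medium changer must not be disabled, because DP drives the tape library through the medium changer.

After disabling it, a test backup still hung.
We rebooted the Windows machine, restarted the networker service on the client, and reran jbconfig. I thought we had found the cause, but it still failed. Total despair...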

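Day 3:
Why could the index and bootstrap still not be written? Was something really wrong with the local drives? I downloaded LTT (HP StorageWorks Library and Tape Tools) from HP's website. LTT reported the drive status as normal, and I upgraded the firmware of both the drives and the library.

Tried the backup again; it still hung.

Rechecked each script once more, both the oraclearch backup script and the nsrnmo1 script: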

#!/bin/sh
#
# $Id: nsrnmo.template,v 1.3.52.4 2003/06/25 21:42:19 yozekinc Exp $ Copyright (c) 2003, Legato Systems, Inc.
#
# All rights reserved.
#
# nsrnmo.sh
#
# Legato Networker Module for Oracle 4.1
#
# This script is part of the Legato NetWorker Module for Oracle.
# Modification of this script should be done with care and only after reading
# the administration manual included with this product.
#
# This script should only be run as part of a scheduled savegroup.
#
# Returns 0 on success; 1 on failure.
#

#
# REQUIRED Variable: ORACLE_HOME
#
# Default value: NONE (site specific)
#
# Description: Specifies where the Oracle Server installation is located.
# It is a requirement that rman be located in ORACLE_HOME/bin.
#
# Samples: 
#       ORACLE_HOME=/disk3/oracle/app/oracle/product/8.1.6
#
ORACLE_HOME=/oracle/app/oracle/product/9.2.0


# REQUIRED Variable: PATH
#
# Default value: NONE (site and platform specific)
#
# Description: Set up the PATH environment variable.
# This must be configured to include the path to "nsrnmostart"
#
# Samples: 
#       PATH=/bin:/usr/sbin:/usr/bin:/nsr/bin:/opt/networker/bin
#
PATH=/bin:/usr/sbin:/usr/bin:/nsr/bin:/opt/networker/bin

#
# Optional Variable: ORACLE_SID
#
# Default value: NONE (site specific)
#
# Description: Specifies the SID of the Oracle database being backed up.
# It is required by proxy copy backups when catalog synchronization is
# enabled.
#
# Samples: 
#       ORACLE_SID=orcl815
#
ORACLE_SID=gzmisc

#
# Optional Variable: NSR_RMAN_ARGUMENTS
#
# Default value: NONE (site specific)
#
# Description: Provide extra rman parameters.
# You must enclose the command in quotes or it will not be 
# passed correctly to rman.
#
# Samples: 
#       NSR_RMAN_ARGUMENTS="nocatalog msglog '/nsr/applogs/msglog.log' append"
#
#       NSR_RMAN_ARGUMENTS="nocatalog"
#
NSR_RMAN_ARGUMENTS="msglog '/nsr/applogs/msglog.log' append"

#
# Optional Variable: NSR_RMAN_OUTPUT
#
# Default value: NONE (site specific)
#
# Description: Provide option to capture the RMAN standard output
# if RMAN "msglog" or "log" command line option is not set.
# The connect strings will be hidden in this file.
#
# Samples:
#       NSR_RMAN_OUTPUT="/nsr/applogs/msglog.log append"
#
#       NSR_RMAN_OUTPUT="/nsr/applogs/msglog.log"
#
NSR_RMAN_OUTPUT="/nsr/applogs/msglog.log append"

#
# Optional Variable: NSR_SB_DEBUG_FILE
#
# Default value: NONE (site specific)
#
# Description:  To enable debugging output for NMO scheduled backups set
#                               the following to an appropriate path and file name.
#                               Set this variable for debugging purposes only 
#
# Samples:
#       NSR_SB_DEBUG_FILE=/nsr/applogs/nsrnmostart.log
#
NSR_SB_DEBUG_FILE=

#
# Optional Variable: PRECMD
#
# Default value: NONE
#
# Description:  This variable can be used to run a command or command script
#                               before nsrnmostart. It will be launched once for every saveset
#                               entered in the client setup.
#
PRECMD=

#
# Optional Variable: POSTCMD
#
# Default value: NONE
#
# Description:  This variable can be used to run a command or command script
#                               after nsrnmostart has completed. It will be launched once for
#                               every saveset entered in the client setup.
#
POSTCMD=

#
# Optional Variable: SHLIB_PATH,LD_LIBRARY_PATH
#
# Default value: NONE
#
# Description:  These variables may have to be set on HP-UX 11.0 (64 bit) operating systems.
#                               We suggest leaving it unset unless you have a scheduled backup problem.
#                               If it is set you must also uncomment the export SHLIB_PATH and LD_LIBRARY_PATH
#                               in the function export_environment_variables below.
#
# Samples: 
#       SHLIB_PATH=/disk3/oracle/app/oracle/product/8.1.6/lib
#       LD_LIBRARY_PATH=/disk3/oracle/app/oracle/product/8.1.6/lib64
#

#
# Optional Variable: TNS_ADMIN
#
# Default value: NONE
#
# Description:  This variable needs to be set if Oracle Net configuration
#                       files are not located in default locations.If it is set you must also uncomment
#                       the export TNS_ADMIN in the function export_environment_variables below.
#
# Samples: 
#       TNS_ADMIN=/disk3/oracle/app/oracle/product/8.1.6/network/admin1
#

export_environment_variables()
{

export ORACLE_HOME
export ORACLE_SID
export NSR_RMAN_ARGUMENTS
export NSR_RMAN_OUTPUT
export PRECMD
export POSTCMD
export PATH
export NSR_SB_DEBUG_FILE
#export SHLIB_PATH
#export LD_LIBRARY_PATH
#export TNS_ADMIN

}


###########################################################################
# Do not edit anything below this line.
###########################################################################


Pid=0                   # process to kill if we are cancelled
nsrnmostart_status=0    # did it work?


#
# Handle cancel signals sent by savegrp when user stops the group.
#
handle_signal()
{
        if [ $Pid != 0 ]; then
                kill -2 $Pid
        fi
        exit 1
}

#
# The main portion of this shell.
#

#
# Make sure we respond to savegrp cancellations.
#
trap handle_signal 2 15

#
# Build the nsrnmostart command
#

opts=""
while [ $# -gt 0 ]; do
        case "$1" in
        -s )    # server name
                opts="$opts $1 '$2'"
                shift 2
                ;;
        -N )    # save set name
                opts="$opts $1 '$2'"
                shift 2 
                ;;
        -e )    # expiration time
                opts="$opts $1 '$2'"
                shift 2 
                ;;
        -b )    # Specify pool
                opts="$opts $1 '$2'"
                shift 2 
                ;;
        -c )    # Specify the client name
                opts="$opts $1 '$2'"
 
                shift 2 
                ;;
        -g )    # Specify group
                opts="$opts $1 '$2'"
                shift 2 
                ;;
        -m )    # Specify masquerade
                opts="$opts $1 '$2'"
                shift 2 
                ;;
        -A )    # Specify PowerSnap options
                opts="$opts $1 '$2'"
                shift 2 
                ;;
        *)      # rest of options      
                opts="$opts $1"
                shift
                ;;
        esac
done

if [ "${BACKUP_OPT}" != "" ];
then
        BACKUP_COMMAND_LINE="nsrnmostart ""$BACKUP_OPT"" $opts"
else
        BACKUP_COMMAND_LINE="nsrnmostart $opts"
fi

#
# Export all necessary environment variables
#
export_environment_variables

#
# Call nsrnmostart to do the backups.
#

#print $BACKUP_COMMAND_LINE
eval ${BACKUP_COMMAND_LINE} &
 Pid=$!
 wait $Pid

 nsrnmostart_status=$?
 if [ $nsrnmostart_status != 0 ] ; then
        echo "nsrnmostart returned status of "$nsrnmostart_status
        echo  $0 "exiting."
        exit 1
 fi

exit 0

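Everything looked normal.
Reading through the scripts and adding log tracing for debugging also showed me how legato invokes them: the Windows machine calls the nsrnmo1 script on the client, nsrnmo1 takes the arguments passed from the Windows host, builds the command line, and then calls nsrnmostart. Tracing line 254 of the script above (the BACKUP_COMMAND_LINE="nsrnmostart $opts" assignment), the statement actually executed was: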

nsrnmostart -s 'gz_bak01' -g 'OracleArch' -LL -m 'gz_db' -t 1235638289 -l incr -q -W 78 -N '/oracle/app/oracle/product/9.2.0/bin/oraclearch' /oracle/app/oracle/product/9.2.0/bin/oraclearch
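
The way the wrapper assembles this command can be reproduced in isolation. The sketch below mirrors the option loop of nsrnmo1 in simplified form (values of options that take an argument are single-quoted, everything else passes through) and appends the result to a trace file. The function name build_nmo_cmd and the TRACE_FILE variable are assumptions for illustration and are not part of the vendor script.

```sh
#!/bin/sh
# Hypothetical trace file; defaults to a file under /tmp.
TRACE_FILE=${TRACE_FILE:-/tmp/nsrnmo_trace.log}

# Rebuild the nsrnmostart command line the way nsrnmo1 does:
# options that take a value get the value single-quoted, the rest pass through.
build_nmo_cmd() {
    opts=""
    while [ $# -gt 0 ]; do
        case "$1" in
        -s|-N|-e|-b|-c|-g|-m|-A)
            opts="$opts $1 '$2'"
            shift 2
            ;;
        *)
            opts="$opts $1"
            shift
            ;;
        esac
    done
    cmd="nsrnmostart$opts"
    # Record what would be executed, with a timestamp, then print it.
    printf '%s %s\n' "$(date)" "$cmd" >> "$TRACE_FILE"
    printf '%s\n' "$cmd"
}
```

Calling build_nmo_cmd with the arguments that savegrp passes in shows the exact string that eval would run, without starting a backup.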

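Only after nsrnmo1 finishes are the index and bootstrap written. That raised a thought: could it be that nsrnmo1 was failing to exit properly, and that this was blocking the subsequent index and bootstrap writes?

I started the backup job from the GUI again, and when it hung I killed the nsrnmo1 process. Sure enough, the index and bootstrap writes began!
But why could nsrnmo1 not exit on its own?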
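
For a repeatable version of this experiment, the manual kill can be replaced by a small watchdog that gives a process a grace period and then terminates it. This is a diagnostic sketch only, with a hypothetical function name and timeout. It works around the hang and does not explain why nsrnmo1 fails to exit.

```sh
#!/bin/sh
# Hypothetical helper: poll a PID once per second; if it is still alive
# after LIMIT seconds, send SIGTERM and return 1. Return 0 if it exited
# on its own before the limit.
wait_or_kill() {
    pid=$1
    limit=$2
    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            kill -TERM "$pid" 2>/dev/null
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Usage sketch (PID and timeout are placeholders, not run here):
#   wait_or_kill 20543 600
```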

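Day 4:
Searching online, I found a thread on CU describing symptoms very similar to mine. The cause there was the nsrnmo version: under oracle 10g, nmo 4.2 could not end the process normally, whereas nmo 4.1 could. I immediately asked the host engineer to check the nmo version, and the result was exciting!

The nmo version in Guizhou, where the problem occurred: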
NMO 4.2 NetWorker Module for Oracle
NetWorker 7.1.2.Build.325 NetWorker for HP-UX 11.XX (64-bit)

For comparison, the version in Shaanxi, where backups work:
NMO 4.1 Legato NetWorker Module for Oracle
NetWorker 7.1.2.Build.325 NetWorker for HP-UX 11.XX (64-bit)

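I went back to the engineer who did the installation, and he said he had simply grabbed a disc off the desk at the time without checking the version!

So we promptly reinstalled the nmo 4.1 package and tested the backup again. This time, once the rman backup finished, the library finally started loading a tape to write the index and bootstrap! The excitement faded quickly, though: when the tape was loaded, it failed with read open error.

Day 5:
If the backup reports read open error, what happens when I test a label operation?
Labeling with tape0 or tape1 in slow mode works fine.

In fast mode, it reports an error.

Tried the backup again; the log shows the same error: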

02/20/09 11:04:06 nsrd: media event cleared: Waiting for 1 writable volumes to backup pool 'Default' tape(s) on gz_dc01
02/20/09 11:04:06 nsrd: gz_db:/oracle/app/oracle/product/9.2.0/bin/oraclearch saving to pool 'Default' (FZX239L1)
02/20/09 11:04:06 nsrd: gz_db:/oracle/app/oracle/product/9.2.0/bin/oraclearch saving to pool 'Default' (FZX239L1)
02/20/09 11:04:25 nsrd: gz_db:/oracle/app/oracle/product/9.2.0/bin/oraclearch done saving to pool 'Default' (FZX239L1) 255 MB
02/20/09 11:04:28 nsrd: gz_db:/oracle/app/oracle/product/9.2.0/bin/oraclearch done saving to pool 'Default' (FZX239L1) 522 MB
02/20/09 11:04:42 nsrd: media info: suggest mounting FZX243L1 on gz_bak01 for writing  to pool 'index'
02/20/09 11:04:42 nsrd: media waiting event: Waiting for 1 writable volumes to backup pool 'index' tape(s) on gz_bak01
02/20/09 11:04:43 nsrd: media info: loading volume FZX243L1 into \\.\Tape1
02/20/09 11:04:44 nsrmmd #12: Start nsrmmd #12, with PID 544, at HOST gz_bak01
02/20/09 11:05:01 nsrd: \\.\Tape1 1:Verify label operation in progress
02/20/09 11:05:04 nsrd: write completion notice: Writing to volume FZX239L1 complete
02/20/09 11:05:16 nsrd: media warning: \\.\Tape1 reading: read open error: drive status is The tape drive is ready for use
02/20/09 11:05:17 nsrd: \\.\Tape1 1:Eject operation in progress
02/20/09 11:05:57 nsrd: Jukebox 'hpmsl6000' failed: expected volume 'FZX243L1' got 'NULL'.
02/20/09 11:06:04 nsrd: media info: suggest mounting RHY016L2 on gz_bak01 for writing  to pool 'index'
02/20/09 11:06:05 nsrd: media info: loading volume RHY016L2 into \\.\Tape1
02/20/09 11:06:22 nsrd: \\.\Tape1 1:Verify label operation in progress
02/20/09 11:06:38 nsrd: media warning: \\.\Tape1 reading: read open error: drive status is The tape drive is ready for use
02/20/09 11:06:38 nsrd: \\.\Tape1 1:Eject operation in progress
02/20/09 11:07:18 nsrd: Jukebox 'hpmsl6000' failed: expected volume 'RHY016L2' got 'NULL'.
02/20/09 11:07:24 nsrd: media info: Suggest manually labeling a new writable volume for pool 'index'
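
When the daemon log is long, it helps to tally these failures per device. The sketch below assumes the log line layout shown above, where the device name is the sixth whitespace-separated field of a media warning line. The function name count_read_open_errors is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: read NetWorker daemon log lines on stdin and print
# "<device> <count>" for every device that reported "read open error".
count_read_open_errors() {
    awk '/read open error/ { n[$6]++ }
         END { for (d in n) print d, n[d] }'
}

# Usage (the log path is site specific and not given in the original):
#   count_read_open_errors < /path/to/daemon.log
```

Run against the excerpt above, both warnings are attributed to the same drive, which points at one drive rather than at a particular tape.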

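Following the log's suggestion, I manually labeled a tape into the index pool, but it still failed with read open error.

I brought in Rill, a DBA from the Beijing office, to discuss the problem. He said he had run into something similar before and suggested changing the device's CDI access method from scsi commands to not used.

After the change, I ran the backup again. Wow! It finally succeeded!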

02/20/09 13:06:18 nsrd: media info: suggest mounting FZX243L1 on gz_bak01 for writing  to pool 'index'
02/20/09 13:06:18 nsrd: media waiting event: Waiting for 1 writable volumes to backup pool 'index' tape(s) on gz_bak01
02/20/09 13:06:19 nsrd: media info: loading volume FZX243L1 into \\.\Tape1
02/20/09 13:06:20 nsrmmd #12: Start nsrmmd #12, with PID 5680, at HOST gz_bak01
02/20/09 13:06:37 nsrd: \\.\Tape1 2:Verify label operation in progress
02/20/09 13:06:41 nsrd: write completion notice: Writing to volume FZX239L1 complete
02/20/09 13:07:01 nsrd: \\.\Tape1 2:Mount operation in progress
02/20/09 13:07:07 nsrd: media event cleared: Waiting for 1 writable volumes to backup pool 'index' tape(s) on gz_bak01
02/20/09 13:07:07 nsrd: gz_bak01:index:gz_db saving to pool 'index' (FZX243L1)
02/20/09 13:07:12 nsrd: gz_bak01:index:gz_db done saving to pool 'index' (FZX243L1) 22 KB
02/20/09 13:07:16 nsrd: gz_bak01:bootstrap saving to pool 'index' (FZX243L1)
02/20/09 13:07:17 nsrmmdbd: media db is saving its data.  This may take a while.
02/20/09 13:07:17 nsrmmdbd: media db is open for business.
02/20/09 13:07:19 nsrd: gz_bak01:bootstrap done saving to pool 'index' (FZX243L1) 802 KB
02/20/09 13:07:25 nsrd: savegroup info: Added 'gz_bak01' to the group 'OracleArch' for bootstrap backup.
02/20/09 13:07:25 nsrd: savegroup notice: OracleArch completed, total 2 client(s), 0 Hostname(s) Unresolved, 0 Failed, 2 Succeeded. 
02/20/09 13:08:07 nsrd: write completion notice: Writing to volume FZX243L1 complete

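But why did backups work with scsi commands, as configured before the cutover, and not now?

I changed the CDI setting back to scsi commands, and, bizarrely, this time the backup ran normally!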
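
The CDI toggle made in the GUI could presumably also be scripted through nsradmin. The sketch below only generates the nsradmin input. The function name, the device name in the usage note, and the exact attribute spelling are assumptions based on the nsr_device description of a CDI attribute with the values Not used and SCSI commands, so check them against the nsradmin and nsr_device man pages for your release before use. Windows device names that contain backslashes may also need escaping.

```sh
#!/bin/sh
# Hypothetical helper: print nsradmin commands that select one device
# resource and set its CDI attribute ("Not used" or "SCSI commands").
cdi_update_script() {
    device=$1
    mode=$2
    printf '. type: NSR device; name: %s\n' "$device"
    printf 'update CDI: %s\n' "$mode"
}

# Usage sketch (not run here; server and device names are placeholders):
#   cdi_update_script tape1_device "Not used" > /tmp/cdi.nsradmin
#   nsradmin -s gz_bak01 -i /tmp/cdi.nsradmin
```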

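Day 6:
I kept digging into this. From the documentation, my reading is that when CDI is manually changed to not used, the drive is automatically cleaned once, and that the read open error we kept getting presumably meant the drive needed cleaning. Changing CDI to not used triggered one cleaning, so when it was changed back to scsi commands the drive had already been cleaned and the backup ran normally.

Attached is the relevant description from the Legato Command Reference: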
Next, if the jukebox contains tape devices, you are asked if automated cleaning of devices in the
jukebox should be turned on. If automated cleaning is enabled, the jukebox and all devices in the
jukebox are configured for automated cleaning. On successful installation, the information that
pertains to device cleaning for the jukebox and all its devices are displayed. Note that with the
introduction of the Common Device Interface (CDI), NetWorker now has two events that will
cause an automatic cleaning to occur: schedule-based cleaning, with devices being cleaned after a
certain (configurable) amount of time has elapsed, and on-demand cleaning, where cleaning is initiated
by TapeAlert warnings issued by the devices. Schedule-based cleaning is always active when
autocleaning is enabled. On-demand cleaning is used when the CDI attribute for a tape device is
set to anything other than ’Not Used’ in the device resource. If on-demand cleaning is being used,
you should set the Cleaning Interval for the device itself to a large time, such as 6 months, so that
NetWorker does not clean the device unnecessarily. See nsr_device.5 for a more detailed explanation
of CDI, TapeAlert and Cleaning Interval.

With that, the problem and the open questions were finally all resolved!

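6 comments

jason said:

2009-03-02 12:02

Very much worth learning from.
By the way, how did you register your website?

小荷 said:

2009-03-02 15:03

re jason: I signed up with dreamhost. After signing up there is a one-click install, which finishes quickly. Then you configure the mysql connection string and it is done.

wind_7th said:

2009-06-27 17:02

jm, this is hfg from Guangxi; I stumbled on this post by chance. I also struggled for a long time with the "The tape drive is ready for use" problem and reached much the same conclusion as you. I don't know which of us got there first; it would have been nice to share it sooner :) However, the material I collected does not seem to mention any cleaning when CDI is switched to not used. My understanding is that it only changes how the tape drive is controlled. Here is some of the information I gathered, for sharing: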

———————————————————————————
The CDI feature controls how tape drive cleaning is handled.
———————————————————————————

CDI was introduced in NetWorker 7 and is a new API used for communicating with tape devices. CDI helps because it (sometimes) has clearer error messages and it uses Tape Alert. There have been lots of problems with CDI and certain tape drives.

The best way then usually is to turn off CDI and use the old way of communicating with the tape drive. This is an attribute on the tape device resource in NetWorker that you can change from “SCSI commands” to “Not used”.

CDI is a good thing when it is working so you should only change it for test purposes and see if it makes any difference for you. However, I have never had problems with LTO and CDI so far.

———————————————————————————
CDI (read-write, no create)
Not used essentially turns off the use of CDI.

SCSI commands
NetWorker will use the CDI interface to send explicit
SCSI commands to tape drives. This allows the best
control of and status collection from a device and is
the default for SCSI or SCSI-like tape drives directly
under NetWorker’s control.

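wind_7th said:

2009-06-27 17:09

I just noticed this line: "[Article info] Posted: 2009-02-26 @ 19:38:46". I found this issue around early December 2008, so I may have found it a little earlier. If information had been shared more fully, you could have been spared this ordeal, haha.

小荷 said:

2009-06-29 11:02

re wind_7th: Haha, yes, the company just lacks a platform for this kind of exchange. Frustrating!

niujiantao said:

2010-12-22 16:16

I have used this too and it wore me out, only in my case it was with sybase.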
