周末有个加班,需要在一台sever上升级22个instance,数据库从9207升级到9208,在打一个CPUApr2010。在这里简单记录一下升级的过程和经验。
从这个架构来看,这个server上的数据库属于库小且多的类型,虽然每个库都很小,也就几个G,但是数量不少。所以在做9207升级到9208时候,我选择了在升级完oracle软件之后,并发的升级数据字典,来提高效率。同时,由于升级数据字典大部分是sga内的操作,比如编辑,生成数据字典等,为了加快速度,我也将原来100M的shared_pool_size临时的加大了。同时为了保证并发,多个数据库能启动一起跑,也不能加太多。最后找到一个比较平衡的值:sga max size为600M,shared pool size 为200M,java pool为200M,执行catpatch脚本的时候,剩余的物理内存:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
kthr memory page faults cpu ----- ----------- ------------------------ ------------ ----------- r b avm fre re pi po fr sr cy in sy cs us sy id wa 1 0 775970 1154 0 0 0 128 406 0 283 69513 2820 19 4 73 4 1 0 775967 1034 0 0 0 321 1508 0 265 58798 2322 18 3 71 8 2 0 775963 1063 0 0 0 386 870 0 348 58771 2305 19 9 64 8 1 0 776063 1073 0 0 0 450 642 0 388 68940 2642 21 6 70 4 1 0 776064 1684 0 0 0 644 1356 0 374 64883 3350 20 3 72 4 1 0 776058 960 0 0 0 0 0 0 470 51368 4602 18 5 71 6 1 1 776056 1196 0 0 0 514 1173 0 506 44987 4851 18 3 72 7 1 0 776039 1102 0 1 0 323 630 0 551 41085 5471 18 4 69 10 2 0 776040 1024 0 0 0 324 696 0 476 54143 4648 20 4 72 5 2 0 776141 1727 0 0 0 772 4029 0 452 55381 4149 19 3 72 6 1 0 776111 1035 0 0 0 0 0 0 463 61001 4753 20 6 70 4 2 0 776110 1036 0 0 0 322 1077 0 366 48175 3192 18 2 72 8 1 0 776102 1037 0 0 0 193 1741 0 322 49947 2725 19 2 75 4 1 1 776100 1002 0 0 0 129 313 0 281 54982 2177 21 2 71 6 |
可以看到在2个instance起来后,并且同时跑catpatch脚本的时候,大约剩余的物理内存为4M左右(1000个页面,每个页面4K,因此大约为4M)。至于为什么只启动2个600M sga的库,就把物理内存压的那么死,后面我会进一步分析。
升级9207到9208的过程很顺利,原来估计每个库1小时左右的时间,在执行的时候,也就每个库6,7分钟左右。
升级完后,开始打CPUApr2010的补丁。
在利用opatch apply的时候,报错:
OPatch encounters the following issues during file patching:
1 2 3 4 5 6 7 8 9 10 |
The following files had problems with being patched: 1. /u01/oracle/product/9.2.0/lib/libjox9.a [ Couldn't copy /migr/ora_patch/p9352224_CPUApr2010/9352224/files/lib/libjox9.a to /u01/oracle/product/9.2.0/lib/libjox9.a from /migr/ora_patch/p9352224_CPUApr2010/9352224. ] Replying 'Y' will terminate the patch installation immediately. It WILL NOT restore any updates that have been performed to this point. It WILL NOT update the inventory. Replying 'N' will update the inventory showing the patch has been applied. NOTE: After replying either 'Y' or 'N' it is critical to review: My Oracle Support Note 312767.1 How to rollback a failed Interim patch installation. Do you want to STOP? Please respond Y|N > |
ok,这是这个patch在read me中说到的issue:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 |
Issue 3 When installing the CPUApr2010 using the OPatch utility, you may see the following error: OPatch encounters the following issues during file patching: The following files had problems with being patched:1. <ORACLE_HOME>/lib [ Couldn't copy <Patch location>/files/lib/libjox9.a to <ORACL:E_HOME>/lib from <Patch location>. ] Workaround: 1. Reply Y to the Do you want to STOP? prompt. 2. As root, run the following command: /usr/sbin/slibclean 3. Rollback the patch using the following command: sh $ORACLE_HOME/.patch_storage/123456/rollback_123456.sh (where 123456 is the patch number) 4. Apply the patch again. |
于是按照wordround的方式操作,执行slibclean之后,再次执行opatch。没想到,这次报错的是:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 |
Comparing "/migr/ora_patch/p9352224_CPUApr2010/9352224/files/lib/liboraolap9.a/ExpressOlapiDataCursorModule.o" and "/u01/oracle/product/9.2.0/.patch_storage/verify/lib/liboraolap9.a/ExpressOlapiDataCursorModule.o" Source file name is : /migr/ora_patch/p9352224_CPUApr2010/9352224/files/lib/liboraolap9.a/ExpressOlapiDataCursorModule.o, size is : 59616 Destination file name(from OracleHome) is : /u01/oracle/product/9.2.0/.patch_storage/verify/lib/liboraolap9.a/ExpressOlapiDataCursorModule.o, size is : 59046 Archive failed: failed to update "/u01/oracle/product/9.2.0/lib/liboraolap9.a" with updated "/migr/ora_patch/p9352224_CPUApr2010/9352224/files/lib/liboraolap9.a/ExpressOlapiDataCursorModule.o" Comparing "/migr/ora_patch/p9352224_CPUApr2010/9352224/files/lib/liboraolap9.a/ExpressOlapiDataSourceModule.o" and "/u01/oracle/product/9.2.0/.patch_storage/verify/lib/liboraolap9.a/ExpressOlapiDataSourceModule.o" Source file name is : /migr/ora_patch/p9352224_CPUApr2010/9352224/files/lib/liboraolap9.a/ExpressOlapiDataSourceModule.o, size is : 148542 Destination file name(from OracleHome) is : /u01/oracle/product/9.2.0/.patch_storage/verify/lib/liboraolap9.a/ExpressOlapiDataSourceModule.o, size is : 147238 Archive failed: failed to update "/u01/oracle/product/9.2.0/lib/liboraolap9.a" with updated "/migr/ora_patch/p9352224_CPUApr2010/9352224/files/lib/liboraolap9.a/ExpressOlapiDataSourceModule.o" Comparing "/migr/ora_patch/p9352224_CPUApr2010/9352224/files/lib/liboraolap9.a/ExpressOlapiModule.o" and "/u01/oracle/product/9.2.0/.patch_storage/verify/lib/liboraolap9.a/ExpressOlapiModule.o" Source file name is : /migr/ora_patch/p9352224_CPUApr2010/9352224/files/lib/liboraolap9.a/ExpressOlapiModule.o, size is : 706 Destination file name(from OracleHome) is : /u01/oracle/product/9.2.0/.patch_storage/verify/lib/liboraolap9.a/ExpressOlapiModule.o, size is : 706 Archive failed: failed to update "/u01/oracle/product/9.2.0/lib/liboraolap9.a" with updated "/migr/ora_patch/p9352224_CPUApr2010/9352224/files/lib/liboraolap9.a/ExpressOlapiModule.o" Comparing "/migr/ora_patch/p9352224_CPUApr2010/9352224/files/lib/liboraolap9.a/xsoqftbl.o" and "/u01/oracle/product/9.2.0/.patch_storage/verify/lib/liboraolap9.a/xsoqftbl.o" Source file name is : /migr/ora_patch/p9352224_CPUApr2010/9352224/files/lib/liboraolap9.a/xsoqftbl.o, size is : 46110 Destination file name(from OracleHome) is : /u01/oracle/product/9.2.0/.patch_storage/verify/lib/liboraolap9.a/xsoqftbl.o, size is : 46110 Archive failed: failed to update "/u01/oracle/product/9.2.0/lib/liboraolap9.a" with updated "/migr/ora_patch/p9352224_CPUApr2010/9352224/files/lib/liboraolap9.a/xsoqftbl.o" There are 56 issues copying files to Oracle Home. There are 14 issues patching Java library in Oracle Home. There are 115 issues patching static library in Oracle Home. FILE PROBLEM: some files are not patched. OPATCH_JAVA_ERROR: Patch was not successfully applied. Verification of the patch failed. ERROR: OPatch failed as verification of the patch failed. [resoprod:oracle:/migr/ora_patch/p9352224_CPUApr2010/9352224:] |
由于用opatch rollback进行回滚,回滚成功,再次进行opatch,还是遇到libjox9.a的报错,再次运行rollback_
1 2 3 4 5 6 7 8 9 10 11 12 13 |
./rollback_9352224.sh[905]: System: not found. ./rollback_9352224.sh[913]: System: not found. ./rollback_9352224.sh[921]: System: not found. ./rollback_9352224.sh[929]: System: not found. ./rollback_9352224.sh[937]: System: not found. ./rollback_9352224.sh[945]: System: not found. ./rollback_9352224.sh[953]: System: not found. ./rollback_9352224.sh[961]: System: not found. ./rollback_9352224.sh[969]: System: not found. ./rollback_9352224.sh[977]: System: not found. cp: /u01/oracle/product/9.2.0//lib/libjox9.a: Cannot open or remove a file containing a running program. Rollback completed. [resoprod:oracle:/u01/oracle/product/9.2.0/.patch_storage/9352224:] |
用fuser和lsof看,看不出是谁用在这个文件:
1 2 3 4 5 6 7 8 |
[resoprod:oracle:/u01/oracle/product/9.2.0/.patch_storage/9352224:] fuser /u01/oracle/product/9.2.0//lib/libjox9.a /u01/oracle/product/9.2.0//lib/libjox9.a: [resoprod:oracle:/u01/oracle/product/9.2.0/.patch_storage/9352224:] [resoprod:oracle:/u01/oracle/product/9.2.0/.patch_storage/9352224:] [resoprod:oracle:/u01/oracle/product/9.2.0/.patch_storage/9352224:] lsof /u01/oracle/product/9.2.0//lib/libjox9.a In while loop:256 Value of I :130 np:256 [resoprod:oracle:/u01/oracle/product/9.2.0/.patch_storage/9352224:] |
尝试重命名:
[resoprod:oracle:/u01/oracle/product/9.2.0/.patch_storage/9352224:] mv $OH//lib/libjox9.a $OH//lib/libjox9.a.bak20100710
再次opatch apply,成功!
这里遇到的一个问题,就是在opatch的时候,aix中会有写lib的文件会被占用,虽然在opatch的read me中会建议使用slibclean来清理,但是在实际的使用情况下往往会不适用,最好的方式就是用cp一个同名的,将原来被占用的那个mv成别的名字。再重新opatch。
最后,来说说那个机器的内存使用情况的问题。之前可能你会觉得奇怪,为什么只是启动了2个instance,每个instance的sga max size为600M,2个合计也就1.2G,为什么在执行catpatch的时候,剩余内存会那么少?
这个我在事后检查了主机的参数设置。
我们在数据库升级和patch完之后,恢复到原来sga大小,启动所有的实例,检查机器的剩余内存:
1 2 3 4 5 6 7 8 |
[resoprod:oracle:/home/oracle:] svmon -G size inuse free pin virtual memory 2621440 2617514 3926 196189 1565904 pg space 2850816 578410 work pers clnt other pin 96974 6 0 99209 in use 1294414 1323054 46 |
剩余内存大约为3926×4k=16M
继续检查内存情况:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 |
[resoprod:root:/home/root:] vmo -a cpu_scale_memp = 8 data_stagger_interval = 161 defps = 1 force_relalias_lite = 0 framesets = 2 htabscale = n/a kernel_heap_psize = 4096 kernel_psize = 4096 large_page_heap_size = 0 lgpg_regions = 0 lgpg_size = 0 low_ps_handling = 1 lru_file_repage = 1 lru_poll_interval = 10 lrubucket = 131072 maxclient% = 80 maxfree = 1088 maxperm = 2010410 maxperm% = 80 maxpin = 2116994 maxpin% = 80 mbuf_heap_psize = 4096 memory_affinity = 1 memory_frames = 2621440 memplace_data = 2 memplace_mapped_file = 2 memplace_shm_anonymous = 2 memplace_shm_named = 2 memplace_stack = 2 memplace_text = 2 memplace_unmapped_file = 2 mempools = 1 minfree = 960 minperm = 502602 minperm% = 20 nokilluid = 0 npskill = 22272 npsrpgmax = 178176 npsrpgmin = 133632 npsscrubmax = 178176 npsscrubmin = 133632 npswarn = 89088 num_spec_dataseg = 0 numpsblks = 2850816 page_steal_method = 0 pagecoloring = n/a pinnable_frames = 2423453 psm_timeout_interval = 5000 pta_balance_threshold = n/a relalias_percentage = 0 rpgclean = 0 rpgcontrol = 2 scrub = 0 scrubclean = 0 soft_min_lgpgs_vmpool = 0 spec_dataseg_int = 512 strict_maxclient = 1 strict_maxperm = 0 v_pinshm = 0 vm_modlist_threshold = -1 vmm_fork_policy = 1 vmm_mpsize_support = 1 wlm_memlimit_nonpg = 1 [resoprod:root:/home/root:] |
我们看到maxperm为80%。我个人认为,maxperm是用来设置non-computational的分页,(参考《Overview of AIX page replacement》此文),而non-computational的分页,也就是用于文件系统buffer,一般如果database使用裸设备,不经过文件系统缓存,maxperm可以设置成3%~5%,而对于文件系统做datafile的database,这个值也只要设置成20%~30%左右即可。把太多物理内存用于文件系统buffer,是没有必要的,对于database来说,将数据块从datafile中读取之后,就是在db buffer cache中进行运算了。
并且,在IBM的官方文档中也找到相关的文章也印证我的猜想。可见《Oracle 9i在AIX上的性能调整 — 内存篇》:
在Oracle数据库应用的环境下,可以将MINPERM和MAXPERM分别设为5%和20%甚至更小,从而使内存更多地被用于Oracle的SGA而不是系统的文件缓存。
3条评论
very nice !
那针对你现在这个情况,aix可用内存比较少,你改这三个maxclient,maxperm,minperm参数,系统可用内存增加没有呢…
re magscott:修改了这个参数后,效果比较明显。启动24个instance后,剩余内存还有2G左右。