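A colleague ran into a rather odd problem. A customer runs a 4-node RAC: olsnodes lists all of the nodes, but crsctl check cluster -all shows only some of them and also reports CRS-4404.
I searched MOS, and the notes on CRS-4404 all point to gpnp: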
“crsctl check cluster -all” command gives CRS-4404, CRS-4405 errors (Doc ID 1392934.1)
CRSCTL CHECK CLUSTER -ALL errors out with CRS-4404 & CRS-2332 seen in GI Alert Log (Doc ID 1620503.1)
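However, I suspected that the problem was not the gpnpd process but the mdnsd process. I tested this on a 3-node RAC: killing gpnpd left the results normal, whereas killing mdnsd reproduced the symptom, with only some of the nodes visible and CRS-4404 reported:
Kill the gpnpd process on node 3: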
[root@12102-rac3 ~]# ps -ef |grep d.bin
root      2034     1  6 21:15 ?        00:00:19 /u01/app/12.1.0.2/grid/bin/ohasd.bin reboot
oracle    2349     1  0 21:16 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/oraagent.bin
oracle    2361     1  0 21:16 ?        00:00:01 /u01/app/12.1.0.2/grid/bin/mdnsd.bin
oracle    2364     1  1 21:16 ?        00:00:03 /u01/app/12.1.0.2/grid/bin/evmd.bin
oracle    2377     1  0 21:16 ?        00:00:01 /u01/app/12.1.0.2/grid/bin/gpnpd.bin
oracle    2387     1  3 21:16 ?        00:00:09 /u01/app/12.1.0.2/grid/bin/gipcd.bin
oracle    2402  2364  0 21:16 ?        00:00:01 /u01/app/12.1.0.2/grid/bin/evmlogger.bin -o /u01/app/12.1.0.2/grid/log/[HOSTNAME]/evmd/evmlogger.info -l /u01/app/12.1.0.2/grid/log/[HOSTNAME]/evmd/evmlogger.log
root      2415     1  1 21:16 ?        00:00:03 /u01/app/12.1.0.2/grid/bin/orarootagent.bin
root      2664     1  0 21:16 ?        00:00:01 /u01/app/12.1.0.2/grid/bin/cssdmonitor
root      2681     1  0 21:16 ?        00:00:01 /u01/app/12.1.0.2/grid/bin/cssdagent
oracle    2692     1  5 21:16 ?        00:00:14 /u01/app/12.1.0.2/grid/bin/ocssd.bin
root      2850     1  1 21:16 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/octssd.bin reboot
root      2891     1  1 21:16 ?        00:00:04 /u01/app/12.1.0.2/grid/bin/osysmond.bin
root      2898     1  2 21:16 ?        00:00:05 /u01/app/12.1.0.2/grid/bin/crsd.bin reboot
root      3011     1  0 21:17 ?        00:00:01 /u01/app/12.1.0.2/grid/bin/orarootagent.bin
oracle    3117     1  1 21:19 ?        00:00:00 /u01/app/12.1.0.2/grid/bin/oraagent.bin
oracle    3138     1  0 21:19 ?        00:00:00 /u01/app/12.1.0.2/grid/bin/tnslsnr ASMNET1LSNR_ASM -no_crs_notify -inherit
root      3307  3259  0 21:20 pts/1    00:00:00 grep d.bin
[root@12102-rac3 ~]# kill -9 2377
[root@12102-rac3 ~]# ps -ef |grep d.bin
root      2034     1  5 21:15 ?        00:00:26 /u01/app/12.1.0.2/grid/bin/ohasd.bin reboot
oracle    2349     1  0 21:16 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/oraagent.bin
oracle    2361     1  0 21:16 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/mdnsd.bin
oracle    2364     1  1 21:16 ?        00:00:05 /u01/app/12.1.0.2/grid/bin/evmd.bin
oracle    2387     1  3 21:16 ?        00:00:15 /u01/app/12.1.0.2/grid/bin/gipcd.bin
oracle    2402  2364  0 21:16 ?        00:00:01 /u01/app/12.1.0.2/grid/bin/evmlogger.bin -o /u01/app/12.1.0.2/grid/log/[HOSTNAME]/evmd/evmlogger.info -l /u01/app/12.1.0.2/grid/log/[HOSTNAME]/evmd/evmlogger.log
root      2415     1  1 21:16 ?        00:00:05 /u01/app/12.1.0.2/grid/bin/orarootagent.bin
root      2664     1  0 21:16 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/cssdmonitor
root      2681     1  0 21:16 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/cssdagent
oracle    2692     1  5 21:16 ?        00:00:20 /u01/app/12.1.0.2/grid/bin/ocssd.bin
root      2850     1  1 21:16 ?        00:00:04 /u01/app/12.1.0.2/grid/bin/octssd.bin reboot
root      2891     1  1 21:16 ?        00:00:06 /u01/app/12.1.0.2/grid/bin/osysmond.bin
root      2898     1  2 21:16 ?        00:00:08 /u01/app/12.1.0.2/grid/bin/crsd.bin reboot
root      3011     1  0 21:17 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/orarootagent.bin
oracle    3117     1  0 21:19 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/oraagent.bin
oracle    3138     1  0 21:19 ?        00:00:00 /u01/app/12.1.0.2/grid/bin/tnslsnr ASMNET1LSNR_ASM -no_crs_notify -inherit
oracle    3839     1  3 21:23 ?        00:00:00 /u01/app/12.1.0.2/grid/bin/gpnpd.bin    ####### gpnpd was restarted automatically
root      3878  3259  0 21:23 pts/1    00:00:00 grep d.bin
[root@12102-rac3 ~]#
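The restart can also be confirmed by comparing the daemon's PID before and after the kill instead of reading the listing by eye. The following is only a minimal sketch, assuming `ps -ef` output in the column layout shown above (PID in column 2, command in column 8); the function name `daemon_pid` is made up for illustration.

```shell
# daemon_pid NAME
# Reads `ps -ef` output on stdin and prints the PID (column 2) of every
# process whose command path (column 8) ends in /NAME.
daemon_pid() {
    awk -v name="$1" '{
        n = split($8, parts, "/")          # break the command path into components
        if (parts[n] == name) print $2     # last component is the binary name
    }'
}

# Hypothetical usage on a test node (do not run this on production):
#   before=$(ps -ef | daemon_pid gpnpd.bin)
#   kill -9 "$before"; sleep 60
#   after=$(ps -ef | daemon_pid gpnpd.bin)
#   [ -n "$after" ] && [ "$after" != "$before" ] && echo "gpnpd.bin respawned as PID $after"
```

Matching on the last path component keeps the `grep d.bin` line itself out of the result, since its command column is just `grep`.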
At this point, the output of the loop
while true; do olsnodes; crsctl check cluster -all; crsctl stat res -t | grep ons; echo "================`date`================="; sleep 3; done
was as follows:
在节点1上: ================Mon Nov 16 21:42:39 CST 2015================= 12102-rac1 12102-rac2 12102-rac3 ************************************************************** 12102-rac1: CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online ************************************************************** 12102-rac2: CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online ************************************************************** 12102-rac3: CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online ************************************************************** ora.ons ================Mon Nov 16 21:42:43 CST 2015================= 12102-rac1 12102-rac2 12102-rac3 ************************************************************** 12102-rac1: CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online ************************************************************** 12102-rac2: CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online ************************************************************** 12102-rac3: CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online ************************************************************** ora.ons ================Mon Nov 16 21:42:47 CST 2015================= 在节点2上: ================Mon Nov 16 21:42:05 CST 2015================= 12102-rac1 12102-rac2 12102-rac3 ************************************************************** 12102-rac1: CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online 
************************************************************** 12102-rac2: CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online ************************************************************** 12102-rac3: CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online ************************************************************** ora.ons ================Mon Nov 16 21:42:11 CST 2015================= 12102-rac1 12102-rac2 12102-rac3 ************************************************************** 12102-rac1: CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online ************************************************************** 12102-rac2: CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online ************************************************************** 12102-rac3: CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online ************************************************************** ora.ons ================Mon Nov 16 21:42:14 CST 2015================= 12102-rac1 12102-rac2 12102-rac3 ************************************************************** 12102-rac1: CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online ************************************************************** 12102-rac2: CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online ************************************************************** 12102-rac3: CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online 
************************************************************** ora.ons ================Mon Nov 16 21:42:18 CST 2015================= 在节点3上: ================Mon Nov 16 21:43:55 CST 2015================= 12102-rac1 12102-rac2 12102-rac3 ************************************************************** 12102-rac1: CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online ************************************************************** 12102-rac2: CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online ************************************************************** 12102-rac3: CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online ************************************************************** ora.ons ================Mon Nov 16 21:43:58 CST 2015================= 12102-rac1 12102-rac2 12102-rac3 ************************************************************** 12102-rac1: CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online ************************************************************** 12102-rac2: CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online ************************************************************** 12102-rac3: CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online ************************************************************** ora.ons ================Mon Nov 16 21:44:02 CST 2015================= 12102-rac1 12102-rac2 12102-rac3 ************************************************************** 12102-rac1: CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is 
online ************************************************************** 12102-rac2: CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online ************************************************************** 12102-rac3: CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online ************************************************************** ora.ons ================Mon Nov 16 21:44:05 CST 2015================= |
Kill the mdnsd process on node 3:
[root@12102-rac3 ~]# ps -ef |grep d.bin
root      2143     1  5 21:37 ?        00:00:26 /u01/app/12.1.0.2/grid/bin/ohasd.bin reboot
oracle    2413     1  0 21:37 ?        00:00:03 /u01/app/12.1.0.2/grid/bin/oraagent.bin
oracle    2433     1  0 21:37 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/mdnsd.bin
oracle    2436     1  1 21:37 ?        00:00:05 /u01/app/12.1.0.2/grid/bin/evmd.bin
oracle    2456     1  0 21:37 ?        00:00:03 /u01/app/12.1.0.2/grid/bin/gpnpd.bin
oracle    2472     1  3 21:37 ?        00:00:16 /u01/app/12.1.0.2/grid/bin/gipcd.bin
oracle    2478  2436  0 21:37 ?        00:00:01 /u01/app/12.1.0.2/grid/bin/evmlogger.bin -o /u01/app/12.1.0.2/grid/log/[HOSTNAME]/evmd/evmlogger.info -l /u01/app/12.1.0.2/grid/log/[HOSTNAME]/evmd/evmlogger.log
root      2495     1  1 21:37 ?        00:00:05 /u01/app/12.1.0.2/grid/bin/orarootagent.bin
root      2803     1  0 21:37 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/cssdmonitor
root      2820     1  0 21:37 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/cssdagent
oracle    2831     1  4 21:37 ?        00:00:21 /u01/app/12.1.0.2/grid/bin/ocssd.bin
root      2958     1  1 21:37 ?        00:00:04 /u01/app/12.1.0.2/grid/bin/octssd.bin reboot
root      2979     1  1 21:37 ?        00:00:07 /u01/app/12.1.0.2/grid/bin/osysmond.bin
root      2986     1  2 21:37 ?        00:00:08 /u01/app/12.1.0.2/grid/bin/crsd.bin reboot
root      3086     1  0 21:38 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/orarootagent.bin
oracle    3163     1  0 21:39 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/oraagent.bin
root      4082  3274  0 21:45 pts/0    00:00:00 grep d.bin
[root@12102-rac3 ~]#
[root@12102-rac3 ~]#
[root@12102-rac3 ~]#
[root@12102-rac3 ~]# kill -9 2433
[root@12102-rac3 ~]#
[root@12102-rac3 ~]#
[root@12102-rac3 ~]#
[root@12102-rac3 ~]# ps -ef |grep d.bin
root      2143     1  5 21:37 ?        00:00:26 /u01/app/12.1.0.2/grid/bin/ohasd.bin reboot
oracle    2413     1  0 21:37 ?        00:00:03 /u01/app/12.1.0.2/grid/bin/oraagent.bin
oracle    2436     1  1 21:37 ?        00:00:05 /u01/app/12.1.0.2/grid/bin/evmd.bin
oracle    2456     1  0 21:37 ?        00:00:03 /u01/app/12.1.0.2/grid/bin/gpnpd.bin
oracle    2472     1  3 21:37 ?        00:00:17 /u01/app/12.1.0.2/grid/bin/gipcd.bin
oracle    2478  2436  0 21:37 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/evmlogger.bin -o /u01/app/12.1.0.2/grid/log/[HOSTNAME]/evmd/evmlogger.info -l /u01/app/12.1.0.2/grid/log/[HOSTNAME]/evmd/evmlogger.log
root      2495     1  1 21:37 ?        00:00:05 /u01/app/12.1.0.2/grid/bin/orarootagent.bin
root      2803     1  0 21:37 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/cssdmonitor
root      2820     1  0 21:37 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/cssdagent
oracle    2831     1  4 21:37 ?        00:00:21 /u01/app/12.1.0.2/grid/bin/ocssd.bin
root      2958     1  1 21:37 ?        00:00:04 /u01/app/12.1.0.2/grid/bin/octssd.bin reboot
root      2979     1  1 21:37 ?        00:00:07 /u01/app/12.1.0.2/grid/bin/osysmond.bin
root      2986     1  2 21:37 ?        00:00:09 /u01/app/12.1.0.2/grid/bin/crsd.bin reboot
root      3086     1  0 21:38 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/orarootagent.bin
oracle    3163     1  0 21:39 ?        00:00:02 /u01/app/12.1.0.2/grid/bin/oraagent.bin
oracle    4127     1  1 21:45 ?        00:00:00 /u01/app/12.1.0.2/grid/bin/mdnsd.bin
root      4136  3274  0 21:45 pts/0    00:00:00 grep d.bin
[root@12102-rac3 ~]#
At this point, the output of the loop
while true; do olsnodes; crsctl check cluster -all; crsctl stat res -t | grep ons; echo "================`date`================="; sleep 3; done
was as follows:
On node 1:
================Mon Nov 16 21:47:18 CST 2015=================
12102-rac1
12102-rac2
12102-rac3
**************************************************************
12102-rac1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
12102-rac2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
CRS-4404: The following nodes did not reply within the allotted time:
12102-rac3
ora.ons
================Mon Nov 16 21:48:22 CST 2015=================
12102-rac1
12102-rac2
12102-rac3
**************************************************************
12102-rac1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
12102-rac2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
CRS-4404: The following nodes did not reply within the allotted time:
12102-rac3
ora.ons
================Mon Nov 16 21:49:25 CST 2015=================

On node 2:
================Mon Nov 16 21:50:28 CST 2015=================
12102-rac1
12102-rac2
12102-rac3
**************************************************************
12102-rac1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
12102-rac2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
CRS-4404: The following nodes did not reply within the allotted time:
12102-rac3
ora.ons
================Mon Nov 16 21:51:31 CST 2015=================
12102-rac1
12102-rac2
12102-rac3
**************************************************************
12102-rac1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
12102-rac2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
CRS-4404: The following nodes did not reply within the allotted time:
12102-rac3
ora.ons
================Mon Nov 16 21:52:35 CST 2015=================

On node 3:
================Mon Nov 16 21:50:28 CST 2015=================
12102-rac1
12102-rac2
12102-rac3
**************************************************************
12102-rac3:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
CRS-4404: The following nodes did not reply within the allotted time:
12102-rac1, 12102-rac2
ora.ons
================Mon Nov 16 21:51:31 CST 2015=================
12102-rac1
12102-rac2
12102-rac3
**************************************************************
12102-rac3:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
CRS-4404: The following nodes did not reply within the allotted time:
12102-rac1, 12102-rac2
ora.ons
================Mon Nov 16 21:52:34 CST 2015=================
12102-rac1
12102-rac2
12102-rac3
**************************************************************
12102-rac3:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
CRS-4404: The following nodes did not reply within the allotted time:
12102-rac1, 12102-rac2
ora.ons
================Mon Nov 16 21:53:38 CST 2015=================
12102-rac1
12102-rac2
12102-rac3
**************************************************************
12102-rac3:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
CRS-4404: The following nodes did not reply within the allotted time:
12102-rac1, 12102-rac2
ora.ons
================Mon Nov 16 21:54:41 CST 2015=================
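When watching for this symptom from a script, it can help to pull the non-responding node names out of the crsctl check cluster -all output. What follows is a minimal sketch, not an Oracle-provided tool: it assumes the English CRS-4404 message text seen in the output above, the function name `unresponsive_nodes` is made up for illustration, and it accepts the comma-separated node list either on the CRS-4404 line itself or on the line after it.

```shell
# unresponsive_nodes
# Reads `crsctl check cluster -all` output on stdin and prints, one per line,
# the node names reported by CRS-4404.
unresponsive_nodes() {
    awk '
        /CRS-4404:/ {
            rest = $0
            # Drop everything up to and including the colon that ends the message text.
            sub(/^.*CRS-4404:[^:]*:[ \t]*/, "", rest)
            # If nothing follows the message, the node list is on the next line.
            if (rest == "") getline rest
            gsub(/[ \t]/, "", rest)
            n = split(rest, nodes, ",")
            for (i = 1; i <= n; i++)
                if (nodes[i] != "") print nodes[i]
        }
    '
}

# Hypothetical usage:
#   crsctl check cluster -all | unresponsive_nodes
```

Run on node 3 during the mdnsd test, this would be expected to list 12102-rac1 and 12102-rac2, while on nodes 1 and 2 it would list 12102-rac3.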