最近同事遇到一个问题,某个报表程序报错ora-4030的错误:
1 |
错误信息:[Error] **** Dbupdate.sqlUpdateSubs Function:begin ? := trt_subs_1_i_update_target (?, ?, ?, ?, ?); end;, Execute GD:1li run finished with error, errcode:-4030,errmsg:ORA-04030: out of process memory when trying to allocate 16408 bytes (koh-kghu call ,pmucalm coll),errlog:GD_1li TableA_S.(SCS or ADS) Update TableP_S, circle= 2. |
通过trace,发现是以下语句的问题:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 |
declare type subs_mobile_no_arr is table of trt_subs_GD_1li_wap_S.subs_mobile_no%type index by pls_integer; type icp_id_arr is table of trt_subs_GD_1li_wap_S.icp_id%type index by pls_integer; type icp_code_arr is table of trt_subs_GD_1li_wap_S.icp_code%type index by pls_integer; type channel_id_arr is table of trt_subs_GD_1li_wap_S.channel_id%type index by pls_integer; type serv_id_arr is table of trt_subs_GD_1li_wap_S.serv_id%type index by pls_integer; type brand_id_arr is table of trt_subs_GD_1li_wap_S.brand_id%type index by pls_integer; type fee_mobile_no_arr is table of trt_subs_GD_1li_wap_S.fee_mobile_no%type index by pls_integer; type action_id_arr is table of trt_subs_GD_1li_wap_S.action_id%type index by pls_integer; type crt_his_date_arr is table of trt_subs_GD_1li_wap_S.crt_his_date%type index by pls_integer; type city_id_arr is table of trt_subs_GD_1li_wap_S.city_id%type index by pls_integer; type attr_arr is table of trt_subs_GD_1li_wap_S.attr%type index by pls_integer; type srh_channel_id_arr is table of trt_subs_GD_1li_wap_S.srh_channel_id%type index by pls_integer; arr_subs_mobile_no subs_mobile_no_arr; arr_icp_id icp_id_arr; arr_icp_code icp_code_arr; arr_channel_id channel_id_arr; arr_serv_id serv_id_arr; arr_brand_id brand_id_arr; arr_fee_mobile_no fee_mobile_no_arr; arr_action_id action_id_arr; arr_crt_his_date crt_his_date_arr; arr_city_id city_id_arr; arr_attr attr_arr; arr_srh_channel_id srh_channel_id_arr; begin select subs_mobile_no, icp_id, icp_code, channel_id, serv_id, brand_id, fee_mobile_no, decode(action_id, 'SCS', 'C', 'ADS', 'D') action_id, crt_his_date, city_id, attr, srh_channel_id bulk collect into arr_subs_mobile_no, arr_icp_id, arr_icp_code, arr_channel_id, arr_serv_id, arr_brand_id, arr_fee_mobile_no, arr_action_id, arr_crt_his_date, arr_city_id, arr_attr, arr_srh_channel_id from trt_subs_GD_1li_wap_S where action_id = 'SCS' or action_id = 'ADS'; if arr_subs_mobile_no.last > 0 then arr_srh_channel_id srh_channel_id_arr; begin select subs_mobile_no, icp_id, icp_code, channel_id, serv_id, brand_id, fee_mobile_no, decode(action_id, 'SCS', 'C', 'ADS', 'D') action_id, crt_his_date, city_id, attr, srh_channel_id bulk collect into arr_subs_mobile_no, arr_icp_id, arr_icp_code, arr_channel_id, arr_serv_id, arr_brand_id, arr_fee_mobile_no, arr_action_id, arr_crt_his_date, arr_city_id, arr_attr, arr_srh_channel_id from trt_subs_GD_1li_wap_S where action_id = 'SCS' or action_id = 'ADS'; if arr_subs_mobile_no.last > 0 then forall i in arr_subs_mobile_no.first .. arr_subs_mobile_no.last update f_wap_subs_GD_p_s c set c.city_id = arr_city_id(i), c.icp_id = arr_icp_id(i), c.icp_code = arr_icp_code(i), c.subs_cancel_channel_id = arr_channel_id(i), c.brand_id = arr_brand_id(i), c.subs_status_id = arr_action_id(i), c.fee_mobile_no = arr_fee_mobile_no(i), c.usubs_date = arr_crt_his_date(i), c.usubs_day = to_char(arr_crt_his_date(i), 'YYYYMMDD'), c.idx_day = to_char(arr_crt_his_date(i), 'YYYYMMDD'), c.is_subs_valid = 0, c.flow_time = sysdate, c.srh_channel_id = arr_srh_channel_id(i) where subs_mobile_no = arr_subs_mobile_no(i) and serv_id = arr_serv_id(i) and is_subs_valid = 1; end if; end; |
这个sql在执行的时候,pga的内存被迅速的消耗,通过v$sesstat看到这个session每次使用pga快到2G时,就报错了。而且每次都是这样。所以一开始,我们怀疑是不是pga不够,增加pga大小为3G继续观察程序的运行情况,结果发现还是一样的报错。
在metalink上发现类似的有个bug:Bug 4994859: ORA-04030 WHILE EXECUTING A PROCEDURE CONTAINING BULK COLLECT。于是打算分2步走:
1、向开发建议修改程序,不使用BULK COLLECT(上面sql的85行处)
2、由于bug是在Sun Solaris SPARC (64-bit)上的9207版本,我们的是HP UX的9208版本,可能不是这个bug。所以向oracle开SR进行确认。
oracle回复:
1 2 3 4 5 6 7 8 9 10 |
Basically, an ORA-4030 indicates a limit has been reached with respect to process private memory allocation. An ORA-4030 error is an error in the PGA; Memory limitation related to an Oracle rdbms server process is reached. Typical causes: * OS Memory limit reached such as physical memory and/or swap/virtual paging * OS limits reached such as kernel or user shell limits that limit overall, user level or process level memory usage * OS limit on PGA memory size due to SGA attach address <<Note 262540.1>> Relocate SGABEG on 64-bit Oracle * Oracle internal limit example Bug 3130972 * Application design causing limits to be reached * Bug – space leaks, heap leaks |
于是用RDA收集主机信息,发现有一个参数设置:
1 2 3 4 5 6 7 8 9 10 11 |
Comments -------- Ulimit ===== time(seconds) unlimited file(blocks) unlimited data(kbytes) 2097152 <---- stack(kbytes) 131072 memory(kbytes) unlimited coredump(blocks) 4194303 nofiles(descriptors) 4096 |
从这里我们看到ulimit -d参数被限制了,大小为2G,我们通过修改2个内核参数从而达到修改这个参数的目的:
1、将maxdsiz从2147483648 (2G) 调整到4187593114(4G)
2、将maxdsiz_64bit从2147483648 (2G)调整6442450944 (6G).
ulimit -d自动的变成:
1 |
data(kbytes) 4063168 |
再次运行原来的应用程序,发现不再报错。
一条评论
最好将程序中的 “批量绑定” 修改为游标方式。
曾经遇到类似问题, 将PGA设置非常大, ulimits改为无限大,都不能解决问题, 最后让项目组把代码改成游标,问题终于解决。