Thursday, October 20, 2011

BPEL container terminated abnormally and does not start afterwards

Problem: The BPEL container terminated abnormally and does not start afterwards.

Log files:
Warnings in the logs:

10/01/14 11:01:12 Exception in thread "PingWaiter" java.lang.OutOfMemoryError: unable to create new native thread
10/01/14 11:01:12 at java.lang.Thread.start0(Native Method)
10/01/14 11:01:12 at java.lang.Thread.start(Thread.java:574)
10/01/14 11:01:12 at org.collaxa.thirdparty.jgroups.protocols.PingSender.start(PingSender.java:34)
10/01/14 11:01:12 at org.collaxa.thirdparty.jgroups.protocols.PingWaiter.findInitialMembers(PingWaiter.java:113)
10/01/14 11:01:12 at org.collaxa.thirdparty.jgroups.protocols.PingWaiter.run(PingWaiter.java:99)
10/01/14 11:01:12 at java.lang.Thread.run(Thread.java:595)
Warnings when the node tries to rejoin the cluster:

Jan 14, 2010 1:08:27 PM org.collaxa.thirdparty.jgroups.protocols.pbcast.ClientGmsImpl join
WARNING: join(172.29.16.30:49652) sent to 172.29.16.30:55491 timed out, retrying
Jan 14, 2010 1:08:34 PM org.collaxa.thirdparty.jgroups.protocols.pbcast.ClientGmsImpl join
WARNING: join(172.29.16.30:49652) sent to 172.29.16.30:55491 timed out, retrying
Jan 14, 2010 1:08:41 PM org.collaxa.thirdparty.jgroups.protocols.pbcast.ClientGmsImpl join
WARNING: join(172.29.16.30:49652) sent to 172.29.16.30:55491 timed out, retrying
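
The first trace says the JVM could not create a new native thread, so the number of threads in the process is worth checking. The following is a minimal sketch, not part of the original diagnosis, that reads thread counts through the standard java.lang.management API. The class name ThreadCountProbe is hypothetical, and the code only reports on the JVM it runs in; looking at the OC4J process itself would need a thread dump or a remote JMX connection.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: prints thread statistics for the current JVM.
public class ThreadCountProbe {

    // Formats the three counters into one log-friendly line.
    static String summarize(int live, int peak, long totalStarted) {
        return "live=" + live + " peak=" + peak + " started=" + totalStarted;
    }

    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        System.out.println(summarize(
                threads.getThreadCount(),             // threads alive right now
                threads.getPeakThreadCount(),         // highest live count so far
                threads.getTotalStartedThreadCount()  // threads started since JVM start
        ));
    }
}
```

A live count that keeps climbing while large files are processed would be consistent with the thread exhaustion reported in the trace.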

Solution:


1.       This happened after processing many large files. Combined with the OOM error in the logs, that matches the behavior of a bug you might be hitting:
Bug 8813964 BPEL PROCESS LOOPS WHILE PROCESSING LARGE FILE

The fix for this bug is included in MLR #19 for 10.1.3.3 (patch number 8590151).
2.       MaxPermSize sets the maximum size of the JVM permanent generation for the OC4J processes. This area is separate from the main Java heap.

If you have enough memory, please increase it for the BPEL container:
-XX:MaxPermSize=512M
3.       Also, to reduce the risk of nodes dropping out of the cluster, you can increase the fault detection (FD) timeout setting. FD timeout specifies the maximum time in milliseconds for a cluster node to respond to a message. A node is dropped from the cluster after the maximum number of tries (max_tries) is reached without a response. Both the UDP and TCP sections use FD timeout and max_tries.

For details, see section 6.11, "jGroup Nodes May Drop Out of the BPELCluster".


The first two changes combined should prevent further OOM errors on the system. The FD element from step 3 looks like this:
<FD timeout="2000"
max_tries="3"
shun="true"/>
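
To get a feel for what these two values mean together, here is a minimal sketch that estimates how long a node can stay silent before FD gives up on it. It assumes the usual JGroups reading that the detection window is roughly timeout multiplied by max_tries. The class name FdWindow and the 10000 ms value are hypothetical and only there for illustration.

```java
// Hypothetical helper: estimates the FD detection window from timeout and max_tries.
public class FdWindow {

    // Approximate time in milliseconds before an unresponsive node is suspected,
    // assuming each of the maxTries attempts waits the full timeout.
    static long detectionWindowMillis(long timeoutMillis, int maxTries) {
        if (timeoutMillis <= 0 || maxTries <= 0) {
            throw new IllegalArgumentException("timeout and max_tries must be positive");
        }
        return timeoutMillis * maxTries;
    }

    public static void main(String[] args) {
        // Values from the FD element shown above: 2000 ms * 3 tries.
        System.out.println(detectionWindowMillis(2000, 3));   // prints 6000
        // A larger, purely illustrative timeout.
        System.out.println(detectionWindowMillis(10000, 3));  // prints 30000
    }
}
```

With the values shown, a node that stalls for more than about six seconds, for example during a long garbage collection pause, could be dropped. Raising timeout widens that window at the cost of slower detection of nodes that are really down.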
References:
Note 455152.1 How To Configure BPEL 10.1.3 Cluster
Note 967076.1 jGroup Nodes May Drop Out of the BPELCluster


