Monday, October 10, 2011

How to troubleshoot hanging jdbc sessions from application connect pool

Visit the Below Website to access unlimited exam questions for all IT vendors and Get Oracle Certifications for FREE
http://www.free-online-exams.com

Problem: How to troubleshoot hanging jdbc sessions from application connect pool

Symptoms:
Oracle Application server 10.1.3.4 (default jdbc=10.1.0.5)

Hibernate Framework is being used.

Application is not using jdbc API directly. Application is using application server connection pool managed by datasource.

this is a description for set parameters:
we have min connections - 30
inital chache - 30
max connections - 150
incatvity time out - 120

based on this it should be the following:






Log files:
* application.log:
contains errors like this:
11/01/10 19:11:14.166 M2 Declaration webExternal: Broken pipe (errno:32)
11/01/10 19:11:14.167 M2 Declaration webExternal: Servlet error
java.io.IOException: Broken pipe (errno:32)
at sun.nio.ch.FileDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
at sun.nio.ch.IOUtil.write(IOUtil.java:75)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:302)
at java.nio.channels.Channels.write(Channels.java:60)
at java.nio.channels.Channels.access$000(Channels.java:47)
at java.nio.channels.Channels$1.write(Channels.java:134)
at com.evermind.server.http.AJPOutputStream.endRequest(AJPOutputStream.java:117)
at com.evermind.server.http.AJPRequestHandler.run(AJPRequestHandler.java:317)
at com.evermind.server.http.AJPRequestHandler.run(AJPRequestHandler.java:199)
at oracle.oc4j.network.ServerSocketReadHandler$SafeRunnable.run(ServerSocketReadHandler.java:260)
at com.evermind.util.ReleasableResourcePooledExecutor$MyWorker.run(ReleasableResourcePooledExecutor.java:303)
at java.lang.Thread.run(Thread.java:595)

* same errors are seen in application_internal.log
11/01/10 19:00:59.927 M2 Declaration webInternal-SSO: Broken pipe (errno:32)
11/01/10 19:00:59.928 M2 Declaration webInternal-SSO: Servlet error
java.io.IOException: Broken pipe (errno:32)
at sun.nio.ch.FileDispatcher.write0(Native Method)
...

* system-application.log:
same errors:
11/01/10 09:58:43.964 dms: Servlet error
java.io.IOException: Broken pipe (errno:32)
at sun.nio.ch.FileDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
..

* log.xml contains a lot like this:




2011-01-10T19:28:04.237+04:00
webservices
OWS-04046

1
jdpuap01.dubaiworld.ae
172.30.112.30
service
197
appsoau

M2 Declaration webInternal-SSO
M2 Declaration Internal-SSO-0.28.0.12
WebServiceCouriorSoapHttpPort
WebServiceCourior


1294673283:172.30.112.30:17491:0:1952


Caught exception while handling request: org.springframework.orm.hibernate3.HibernateOptimisticLockingFailureException: Object of class [com.pcfc.customs.declaration.domain.Declaration] with identifier [7471565]: optimistic locking failed; nested exception is org.hibernate.StaleObjectStateException: Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect): [com.pcfc.customs.declaration.domain.Declaration#7471565] org.springframework.orm.hibernate3.HibernateOptimisticLockingFailureException: Object of class [com.pcfc.customs.declaration.domain.Declaration] with identifier [7471565]: optimistic locking failed; nested exception is org.hibernate.StaleObjectStateException: Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect): [com.pcfc.customs.declaration.domain.Declaration#7471565]



* Customs~UAT_Customs~Customs~1.log: i assume this is stdout of OC4J(opmn/logs) and contains full thread stacks.

contains errors like this:
11/01/10 18:22:03 java.lang.RuntimeException: failed to evaluate: =Class.new();
11/01/10 18:22:03 Continuing ...
11/01/10 18:22:03 java.lang.InstantiationException: java.math.BigInteger
11/01/10 18:22:03 Continuing ...

will analyze thread dumps with timestamps..


I definitely see a number of threads that are 'hanging' (waiting on the server)
for example the following PreparedStatement objects I found in both thread stacks.. (note that these are not all,but some that I tested to be duplicate:
seconddump.txt: - locked <1fffffff550d7720> (a oracle.jdbc.driver.T4CPreparedStatement)
seconddump.txt: - locked <1fffffff4e1a71f0> (a oracle.jdbc.driver.T4CPreparedStatement)
seconddump.txt: - locked <1fffffff550f55d0> (a oracle.jdbc.driver.T4CPreparedStatement)
seconddump.txt: - locked <1fffffff53c11d50> (a oracle.jdbc.driver.T4CPreparedStatement)
seconddump.txt: - locked <1fffffff4e2d6208> (a oracle.jdbc.driver.T4CPreparedStatement)
seconddump.txt: - locked <1fffffff4cbf5848> (a oracle.jdbc.driver.T4CPreparedStatement)
seconddump.txt: - locked <1fffffff52ad3798> (a oracle.jdbc.driver.T4CPreparedStatement)
seconddump.txt: - locked <1fffffff4f9655b8> (a oracle.jdbc.driver.T4CPreparedStatement)
seconddump.txt: - locked <1fffffff48f25e80> (a oracle.jdbc.driver.T4CPreparedStatement)
seconddump.txt: - locked <1fffffff513656a0> (a oracle.jdbc.driver.T4CPreparedStatement)
seconddump.txt: - locked <1fffffff5514eea0> (a oracle.jdbc.driver.T4CPreparedStatement)
seconddump.txt: - locked <1fffffff4b790ec0> (a oracle.jdbc.driver.T4CPreparedStatement)
seconddump.txt: - locked <1fffffff4b74f128> (a oracle.jdbc.driver.T4CPreparedStatement)
seconddump.txt: - locked <1fffffff4b1887f8> (a oracle.jdbc.driver.T4CPreparedStatement)
seconddump.txt: - locked <1fffffff4b73e188> (a oracle.jdbc.driver.T4CPreparedStatement)
seconddump.txt: - locked <1fffffff4beabb98> (a oracle.jdbc.driver.T4CPreparedStatement)
seconddump.txt: - locked <1fffffff53b4f398> (a oracle.jdbc.driver.T4CPreparedStatement)
seconddump.txt: - locked <1fffffff4b182a70> (a oracle.jdbc.driver.T4CPreparedStatement)
seconddump.txt: - locked <1fffffff4c573d20> (a oracle.jdbc.driver.T4CPreparedStatement)
seconddump.txt: - locked <1fffffff4c5c7ab8> (a oracle.jdbc.driver.T4CPreparedStatement)
seconddump.txt: - locked <1fffffff4b1991e0> (a oracle.jdbc.driver.T4CPreparedStatement)

so it seems that this is taking at least 3 minutes for the RDBMS to process the query... I guess this is what is causing the problem.
I assume this is no real leak but more threads that are spending a lot of time..

Solution:



There are different approaches to this problem.
1. Either your threads are busy on the connections itself or are busy processing
==> for this a kill -3 (on unix/linux) might help you furhter==> this to see if threads are stuck.
2. You have a Connection leak in your application.
This means your thread finished working but did not call the close() on the Connection. (this should normally return a Connection to the pool)
Normally one should code the close() request in a finally block. This to ensure that the code is executed no matter if Exceptions occur.
Solutions could be:
what is configurable is to have an abandonedConnnectionTimeout.

http://download.oracle.com/docs/cd/B25221_05/web.1013/b14427/datasrc.htm#sthref407
Note that if you want to use this feature, you'll need to apply a patch to the OC4J instance (that is using JDBC 10.1.0.5):
Note 730096.1: Timeout of Implicit Connection Cache Causes Deadlock

This because you risk hitting a deadlock when setting abandoned or timetolive timeouts..

When you have hit the limit, can you check:
2.1. What is the OC4J home page showing on the Connections? (and number of threads)?

2.2. Also use dmstool to check the number of open and closed connection calls:

2.2.1 cd $ORACLE_HOME/bin
2.2.2. dmstool -list | grep jdbc or dmstool -list | find "JDBC"
==> this output all things we can monitor
2.2.3. monitor the open and closed and completed requests:
for example in my case:
dmstool -q /tcleyman-lap/OC4J:12501:6003/JDBC/Driver/ConnectionOpenCount.count
dmstool -q /tcleyman-lap/OC4J:12501:6003/JDBC/Driver/ConnectionCloseCount.count
dmstool -q /tcleyman-lap/OC4J:12501:6003/JDBC/Driver/ConnectionCreate.completed

2.3. Did you also check v$session/v$session_wait on database side?
Does this confirm the number of connections opened from OC4J side?
Do you see a number of Connections in waiting for a longer time?

2.4. can you do a couple of kill -3 on the java process (OC4J)?
maybe 3 with 1 minute interval.

*** Issue is resolved by running dbms_stats

issue was caused by queries spending a lot of time in RDBMS.
Seems this was due to index usage, stats were not updated.

References:


Note 402480.1: How To Ensure No ResultSet/Connection/Statement Leaks Exist in Your JDBC Code

Get Oracle Certifications for all Exams
Free Online Exams.com

No comments: