Question & Answer
Question
How do you analyze blocking locks and trace them to a hanging JVM? In particular, how do you use lsof to find out which client machine/socket your database process is talking to (lsof -p #dbprocess | grep #port1521)?
Answer
Summary of Recommendations for Analyzing Blocking Locks
=======================================================
1. Find out who the root blockers are (your DBA should have a script for this, or you can query V$SESSION to find them; a sample query is sketched below).
Record the root blocker's session ID (#SID).
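As a hedged example (assuming Oracle 10g or later, where V$SESSION exposes a BLOCKING_SESSION column), a minimal query to list waiting sessions and their root blockers might look like this:

-- List blocked sessions and the SID of the session blocking each one.
-- BLOCKING_SESSION is available in Oracle 10g and later.
SELECT BLOCKING_SESSION AS BLOCKER_SID,
       SID AS WAITING_SID,
       SECONDS_IN_WAIT
FROM V$SESSION
WHERE BLOCKING_SESSION IS NOT NULL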
2. Join V$SESSION and V$PROCESS to find the operating-system process ID (SPID) of the database process linked to this SID, using the SQL below:

select P.SPID from V$SESSION S, V$PROCESS P
where S.PADDR = P.ADDR and SID = #SID

e.g.,

select P.SPID, P.PID from V$SESSION S, V$PROCESS P
where S.PADDR = P.ADDR and SID = 12
3. This should show up in UNIX as an Oracle database process.
Confirm that by running the command below:

ps -ef | grep #SPID

oracle 15233 1 0 09:40:40 ? 1:23 oracleyf38jtsc (LOCAL=NO)
4. On the DB machine, run an lsof -p command to find out which client machine/socket your database process is talking to:

lsof -p #dbprocess | grep #port1521

e.g.,

# lsof -p 15233 | grep 1521
oracle 15233 oracle 17u IPv4 0x3000663fb58 0t70170814 TCP mc3800.company.com:1521->10.20.30.45:55194 (ESTABLISHED)

This indicates that your DB server mc3800, port 1521, is talking to client machine 10.20.30.45 on socket 55194. Ideally the host name would clue you in to whether the client is an appserver, agent server, etc.
5. Now log in to the client machine 10.20.30.45 and issue an lsof -i command. This will tell you the Java process ID that is running on the client machine:

e.g.,

[root@mc root]# lsof -i TCP:55194
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
java 11585 mc60 112u IPv4 10367259 TCP mc.company.com:55194->10.20.30.60:1521 (ESTABLISHED)
6. Run ps -ef | grep on the process ID to find out the java command that initiated the process. This should tell you which JVM owns the root blocker:

[root@mc root]# ps -ef | grep 11585
mc60 11585 11542 99 09:40 pts/5 1-06:07:56 /opt/was6/WebSphere/AppServer/java/bin/java -DAG=PrintWaveServer -Xss2m -Xms2048m -Xmx2048m -verbosegc -Dsun.rmi.dgc.server.gcInterval=3600000 -DSYSTEST=Y -Dlog4j.configuration=resources/log4jconfig.xml -DLOGFILE=/app2/mc60/logs/agents/yfs_PrintWaveServer_mc.yan
root 15805 15748 0 17:22 pts/7 00:00:00 grep 11585
7. Take three successive thread dumps by running kill -3 against the process ID for that JVM (a sketch for spacing the dumps out is shown below):

kill -3 11585
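As a minimal sketch, you can space the three dumps apart so that thread progress between dumps is visible; the 30-second interval is an assumption, not part of the original procedure:

# Take three thread dumps roughly 30 seconds apart.
# The interval is arbitrary; adjust it to your situation.
for i in 1 2 3
do
    kill -3 11585
    sleep 30
done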
8. Within a thread dump, each run is demarcated by the string "2XMFULLTHDDUMP Full thread dump".
Within each run, analyze each thread.
If a thread is in sleep, wait, polling, etc., it indicates a passive "willing to wait" nature.
Skip such threads.
3XMTHREADINFO "Thread-66"
4XESTACKTRACE at java.lang.Thread.sleep
4XESTACKTRACE at
com.yantra.interop.services.reprocess.ErrorDispatchConsumer.run
4XESTACKTRACE at java.lang.Thread.run(Thread.java:568)
If the thread does not have any immediate STACKTRACE, but shows that it is runnable (state:R) and has a NativeStacktrace showing sysMonitorWait, you would recursively analyze it against other MonitorWaits; for now, skip such threads.

3XMTHREADINFO "Thread-65_TAX_REMITTANCE" (TID:..., state:R, ...)
NULL
3HPNATIVESTACK Native Stack of "Thread-65_TAX_REMITTANCE" PID 21584
NULL -------------------------
3HPSTACKLINE sysMonitorWait at B71CC344 in libhpi.so
In some cases we also see threads that are waiting on an MQGET; these are just listening on queues.
Skip such threads.

3XMTHREADINFO "Thread-46_GIDManagePriceUpdates"
4XESTACKTRACE at java.net.SocketInputStream.socketRead0(Native Method)
...
4XESTACKTRACE at com.ibm.mq.MQSESSIONClient.MQGET(MQSESSIONClient.java())
4XESTACKTRACE at com.yantra.interop.services.jms.JMSConsumer.run(JMSConsumer.java
After eliminating these threads, you will be left with a smaller set of threads to analyze.
Each thread should occur three times in your thread dump output, because you ran kill -3 three times.
Examine each repeating thread and see if there has been any change in its stack.
If the stack has changed, that thread has progressed.
Skip such threads. (Some commands for navigating the dumps are sketched below.)
9. By elimination, you should be left with a very small set of threads that indicate which Java process is hung, and you should then be in a position to localize your troubleshooting to that Java process.
Application traces and error logs will support your research here.
10. It is possible that you have a blocking lock on the DB and have done a thorough investigation down to the final JVM, but you do not find an active blocker in the JVM.
This could mean that a connection held the lock, but the thread is no longer in existence.
Before killing the locking session, there is an important data point to capture: the time of the last SQL. For this, track the V$SESSION entry for the blocker, the last SQL it held, and its LAST_CALL_ET:

SELECT SID, LAST_CALL_ET from V$SESSION where SID = #SID

You might need to backtrack in time by the value indicated here; for that time period, look in the JVM's application logs and see if any exception occurred. Analyze the exception and trace it to code to see if there is any scope for correction or better error handling. (A sketch for converting LAST_CALL_ET into a timestamp follows.)
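LAST_CALL_ET is reported in seconds, so a minimal sketch for turning it into an approximate wall-clock time to search the logs around (substitute your blocker's SID for #SID) might be:

-- 86400 seconds per day; SYSDATE minus the elapsed time gives the
-- approximate moment of the session's last call.
SELECT SID, LAST_CALL_ET,
       SYSDATE - LAST_CALL_ET/86400 AS APPROX_LAST_ACTIVE
FROM V$SESSION
WHERE SID = #SID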
This information is also available in Word document format from Applications Support, in their Troubleshooting\General_Performance folder, as "BlockingLocksTroubleshooting.doc."
Historical Number
PRI49577
Document Information
Modified date:
16 June 2018
UID
swg21525864