IBM Support

How to analyze blocking locks and trace them to a hanging JVM: using lsof to find which client machine and socket your database process is talking to (lsof -p #dbprocess | grep 1521)

Question & Answer


Question

How do you analyze blocking locks in the database and trace them to a hanging JVM? In particular, how do you use lsof to find which client machine and socket your database process is talking to (lsof -p #dbprocess | grep 1521)?

Answer

Summary of Recommendations for Analyzing Blocking Locks:

=================================================

1. Find out who the root blockers are. (Your DBA should have a script for this; otherwise, query V$SESSION to find them.)
Record the root blocker's session ID (#SID).
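
If no DBA script is at hand, a root blocker is a session that blocks other sessions but is not itself blocked. The sketch below assumes Oracle 10g or later, where V$SESSION has a BLOCKING_SESSION column; you fetch SID and BLOCKING_SESSION with any Oracle client and pass the rows in. The query string and the function name root_blockers are illustrative only, and the code does not connect to a database.

```python
# Query whose rows feed root_blockers(); run it with your own Oracle client.
# Assumes Oracle 10g or later, where V$SESSION exposes BLOCKING_SESSION.
SESSIONS_SQL = "SELECT SID, BLOCKING_SESSION FROM V$SESSION"


def root_blockers(rows):
    """Return SIDs that block other sessions but are not blocked themselves.

    rows: iterable of (sid, blocking_session) pairs; blocking_session is
    None when the session is not waiting on another session.
    """
    blocked_by = dict(rows)
    blockers = {b for b in blocked_by.values() if b is not None}
    # A root blocker heads a wait chain: it blocks someone, nobody blocks it.
    return sorted(sid for sid in blockers if blocked_by.get(sid) is None)


# Session 12 blocks 40, which in turn blocks 41; session 55 is not blocked.
print(root_blockers([(12, None), (40, 12), (41, 40), (55, None)]))  # [12]
```

A pure deadlock cycle has no root, so the function returns an empty list in that case.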

2. Join V$SESSION and V$PROCESS to find the PID of the database process (SPID) linked to this SID, using the SQL below:

select P.SPID from V$SESSION S , V$PROCESS P where S.PADDR=P.ADDR
and SID = #SID
For example:
select P.SPID,P.PID from V$SESSION S , V$PROCESS P where S.PADDR=P.ADDR
and SID = 12

3. This SPID should appear in UNIX as an Oracle database process. Confirm this by running the command below:

ps -ef | grep #SPID

oracle 15233 1 0 09:40:40 ? 1:23 oracleyf38jtsc (LOCAL=NO)
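
A plain grep matches the number anywhere in the line, including other columns and the grep command itself (the ps output in step 6 shows this). As a minimal sketch, the hypothetical helper below keeps only ps -ef lines whose PID column matches exactly; the (LOCAL=NO) check is a heuristic, assuming the usual naming of Oracle dedicated server processes that serve network clients. Alternatively, ps -f -p #SPID asks ps for that one process directly.

```python
def lines_for_pid(ps_output, pid):
    """Return ps -ef lines whose PID column (the 2nd field) equals pid."""
    matches = []
    for line in ps_output.splitlines():
        fields = line.split()
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD
        if len(fields) > 1 and fields[1] == str(pid):
            matches.append(line)
    return matches


def looks_like_remote_oracle_server(line):
    """Heuristic: dedicated server processes for network clients show (LOCAL=NO)."""
    return "(LOCAL=NO)" in line


ps_output = """\
oracle 15233     1  0 09:40:40 ?     1:23 oracleyf38jtsc (LOCAL=NO)
root   15805 15748  0 17:22    pts/7 0:00 grep 15233"""
for line in lines_for_pid(ps_output, 15233):
    print(line, looks_like_remote_oracle_server(line))
```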

4. On the database machine, run lsof -p to find out which client machine and socket your database process is talking to. Filter on the listener port (1521 by default):

lsof -p #dbprocess | grep 1521

For example:
# lsof -p 15233 | grep 1521
oracle 15233 oracle 17u IPv4 0x3000663fb58 0t70170814 TCP mc3800.company.com:1521->10.20.30.45:55194 (ESTABLISHED)

This indicates that port 1521 on your database server mc3800 is talking to port 55194 on client machine 10.20.30.45. Ideally, the client's host name tells you whether it is an application server, an agent server, and so on.
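
When several blocker processes need the same treatment, the NAME column can be parsed instead of read by eye. This sketch assumes the TCP NAME format shown above (host:port->host:port (STATE)); TcpLink and parse_lsof_tcp are hypothetical names. If lsof prints a service name instead of 1521 on your system, its -P option keeps port numbers numeric (and -n does the same for host addresses).

```python
import re
from typing import NamedTuple, Optional

# NAME column of an lsof TCP entry, e.g.
# "mc3800.company.com:1521->10.20.30.45:55194 (ESTABLISHED)"
_TCP_NAME = re.compile(r"(\S+):(\d+)->(\S+):(\d+)(?:\s+\((\w+)\))?")


class TcpLink(NamedTuple):
    local_host: str
    local_port: int
    remote_host: str
    remote_port: int
    state: Optional[str]


def parse_lsof_tcp(line):
    """Extract local and remote endpoints from one lsof line, or return None."""
    match = _TCP_NAME.search(line)
    if match is None:
        return None
    lhost, lport, rhost, rport, state = match.groups()
    return TcpLink(lhost, int(lport), rhost, int(rport), state)


line = ("oracle 15233 oracle 17u IPv4 0x3000663fb58 0t70170814 TCP "
        "mc3800.company.com:1521->10.20.30.45:55194 (ESTABLISHED)")
link = parse_lsof_tcp(line)
print(link.remote_host, link.remote_port)  # 10.20.30.45 55194
```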

5. Now log in to the client machine 10.20.30.45 and run lsof -i on the client port (55194 in this example). This tells you the process ID of the Java process that owns that socket on the client machine.
For example:
[root@mc root]# lsof -i TCP:55194
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
java 11585 mc60 112u IPv4 10367259 TCP mc.company.com:55194->10.20.30.60:1521 (ESTABLISHED)
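
The PID is the second column of the lsof table. A minimal sketch that pulls it out (pids_from_lsof is a hypothetical helper); lsof's own -t option, which prints only PIDs, is a simpler alternative when you do not need the command name.

```python
def pids_from_lsof(output, command=None):
    """Return unique (COMMAND, PID) pairs from lsof output, skipping the header."""
    pairs = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines and the "COMMAND PID USER ..." header row.
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        pair = (fields[0], int(fields[1]))
        # One process can own several matching file descriptors.
        if (command is None or fields[0] == command) and pair not in pairs:
            pairs.append(pair)
    return pairs


lsof_output = """\
COMMAND   PID USER  FD  TYPE   DEVICE SIZE NODE NAME
java    11585 mc60 112u IPv4 10367259      TCP  mc.company.com:55194->10.20.30.60:1521 (ESTABLISHED)"""
print(pids_from_lsof(lsof_output, command="java"))  # [('java', 11585)]
```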

6. Run ps -ef and grep for that process ID to find the java command that started the process. This should tell you which JVM owns the root blocker.
[root@mc root]# ps -ef | grep 11585
mc60 11585 11542 99 09:40 pts/5 1-06:07:56 /opt/was6/WebSphere/AppServer/java/bin/java -DAG=PrintWaveServer -Xss2m -Xms2048m -Xmx2048m -verbosegc -Dsun.rmi.dgc.server.gcInterval=3600000 -DSYSTEST=Y -Dlog4j.configuration=resources/log4jconfig.xml -DLOGFILE=/app2/mc60/logs/agents/yfs_PrintWaveServer_mc.yan
root 15805 15748 0 17:22 pts/7 00:00:00 grep 11585
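
The -D system properties on the command line usually identify the JVM; in this example, -DAG=PrintWaveServer appears to name the agent server. A minimal sketch that collects them (jvm_system_properties is hypothetical). It assumes no property value contains spaces, because ps does not preserve the original quoting, and on some platforms ps truncates long command lines.

```python
def jvm_system_properties(ps_line):
    """Collect -Dname=value options from a java command line printed by ps."""
    props = {}
    for token in ps_line.split():
        if token.startswith("-D") and len(token) > 2:
            name, sep, value = token[2:].partition("=")
            props[name] = value if sep else ""
    return props


ps_line = ("mc60 11585 11542 99 09:40 pts/5 1-06:07:56 "
           "/opt/was6/WebSphere/AppServer/java/bin/java -DAG=PrintWaveServer "
           "-Xss2m -Xms2048m -Xmx2048m -verbosegc -DSYSTEST=Y")
print(jvm_system_properties(ps_line).get("AG"))  # PrintWaveServer
```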

7. Take three successive thread dumps of that JVM by running kill -3 on its process ID:
kill -3 11585
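
Step 7 can be scripted so the three dumps are evenly spaced. This is a minimal sketch: kill -3 sends SIGQUIT, which a JVM answers by writing a thread dump and continuing to run. The 30-second spacing is an assumption, not a value from this note, and the send and sleep parameters exist only so the function can be exercised without signaling a real process.

```python
import os
import signal
import time


def take_thread_dumps(pid, count=3, interval_s=30.0, send=os.kill, sleep=time.sleep):
    """Send SIGQUIT (what kill -3 sends) to a JVM count times.

    interval_s is an assumed spacing between dumps; adjust it to your case.
    Returns the wall-clock time at which each signal was sent.
    """
    sent_at = []
    for i in range(count):
        send(pid, signal.SIGQUIT)
        sent_at.append(time.time())
        if i < count - 1:
            sleep(interval_s)  # give running threads time to move on
    return sent_at


# Dry run that records the signals instead of sending them:
calls = []
take_thread_dumps(11585, send=lambda pid, sig: calls.append((pid, sig)),
                  sleep=lambda seconds: None)
print(len(calls))  # 3
# Real use against the JVM found in step 6: take_thread_dumps(11585)
```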

8. Within a thread dump, each run is marked by the string "2XMFULLTHDDUMP Full thread dump".
Within each run, analyze each thread.
If a thread's stack shows sleep, wait, polling, and so on, the thread is passively waiting; it is willing to wait.
Skip such threads. For example:
3XMTHREADINFO "Thread-66"
4XESTACKTRACE at java.lang.Thread.sleep
4XESTACKTRACE at com.yantra.interop.services.reprocess.ErrorDispatchConsumer.run
4XESTACKTRACE at java.lang.Thread.run(Thread.java:568)

If a thread has no Java stack trace (no 4XESTACKTRACE lines) but is runnable (state:R) and its native stack shows sysMonitorWait, it would eventually need to be analyzed recursively against the other monitor waits. For now, skip such threads. For example:

3XMTHREADINFO "Thread-65_TAX_REMITTANCE" (TID:..., state:R, ...)
NULL
3HPNATIVESTACK Native Stack of "Thread-65_TAX_REMITTANCE" PID 21584
NULL -------------------------
3HPSTACKLINE sysMonitorWait at B71CC344 in libhpi.so

Threads that are waiting in an MQGET call are simply listening on queues.
Skip such threads as well. For example:

3XMTHREADINFO "Thread-46_GIDManagePriceUpdates"
4XESTACKTRACE at java.net.SocketInputStream.socketRead0(Native Method)
...
4XESTACKTRACE at com.ibm.mq.MQSESSIONClient.MQGET(MQSESSIONClient.java())
4XESTACKTRACE at com.yantra.interop.services.jms.JMSConsumer.run(JMSConsumer.java


After eliminating these threads, you will be left with a smaller set of threads to analyze.
Each thread should occur three times in your thread dump, because you ran kill -3 three times.
Examine each repeating thread and check whether its stack has changed between runs.
If the stack has changed, that thread has progressed.
Skip such threads.
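
The elimination in step 8 can be roughed out in code. This sketch assumes all runs are in one text, each starting with the 2XMFULLTHDDUMP marker, that thread names are unique within a run, and that the marker list (Thread.sleep, Object.wait, poll, MQGET, plus native-only sysMonitorWait) is a reasonable stand-in for the passive threads described above; the marker list and function names are hypothetical. It reports the non-passive threads whose Java stack is identical in every run, which are the candidates for step 9. If your JVM wrote each dump to a separate file, concatenate the files first.

```python
import re
from collections import defaultdict

THREAD_RE = re.compile(r'^3XMTHREADINFO\s+"([^"]*)"')
# Assumed markers of a passive ("willing to wait") thread; extend as needed.
PASSIVE_MARKERS = ("Thread.sleep", "Object.wait", "poll", "MQGET")


def parse_runs(text):
    """Split dump text into runs; each maps thread name -> (java, native) frames."""
    runs = []
    for chunk in text.split("2XMFULLTHDDUMP")[1:]:
        threads, current = {}, None
        for line in (raw.strip() for raw in chunk.splitlines()):
            match = THREAD_RE.match(line)
            if match:
                current = ([], [])
                threads[match.group(1)] = current
            elif current is not None and line.startswith("4XESTACKTRACE"):
                current[0].append(line.split(None, 1)[-1])
            elif current is not None and line.startswith("3HPSTACKLINE"):
                current[1].append(line.split(None, 1)[-1])
        runs.append(threads)
    return runs


def is_passive(java, native):
    """Sleeping, waiting, polling, MQGET, or native-only sysMonitorWait."""
    if any(m in frame for frame in java for m in PASSIVE_MARKERS):
        return True
    return not java and any("sysMonitorWait" in frame for frame in native)


def unchanged_threads(text):
    """Non-passive threads whose Java stack is identical in every run."""
    runs = parse_runs(text)
    seen = defaultdict(list)
    for run in runs:
        for name, (java, native) in run.items():
            if not is_passive(java, native):
                seen[name].append(tuple(java))
    return sorted(name for name, stacks in seen.items()
                  if len(stacks) == len(runs) and len(set(stacks)) == 1)


# Usage (hypothetical file name):
# print(unchanged_threads(open("javacore.txt").read()))
```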


9. By elimination, you should be left with a very small set of threads that indicate which Java process is hung, and you can then focus your troubleshooting on that Java process.
Application traces and error logs will support your research here.

10. It is possible that there is a blocking lock in the database and you have traced it all the way to the final JVM, yet you do not find an active blocker in that JVM.
This could mean that a connection held the lock, but the thread that used it no longer exists.

Before killing the locking session, capture one important data point: the time of the last SQL call. To do this, look at the blocker's V$SESSION entry, note the last SQL it ran, and record its LAST_CALL_ET:
SELECT SID, LAST_CALL_ET FROM V$SESSION WHERE SID = #SID

LAST_CALL_ET is an elapsed time in seconds, so you might need to go back in time by that amount. For that time period, look in the JVM's application logs for any exceptions. Analyze each exception and trace it to the code to see whether there is scope for a correction or better error handling.
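
The backtracking is simple date arithmetic. A minimal sketch, assuming you record the time at which LAST_CALL_ET was read; selecting SYSDATE in the same query (SELECT SID, LAST_CALL_ET, SYSDATE FROM V$SESSION WHERE SID = #SID) gives that time on the database clock. The five-minute margin and the function names are hypothetical, and any clock difference between the database host and the JVM host still has to be allowed for.

```python
from datetime import datetime, timedelta


def last_call_time(last_call_et, sampled_at):
    """Approximate time of the session's last call.

    last_call_et: V$SESSION.LAST_CALL_ET in seconds, read at sampled_at.
    """
    return sampled_at - timedelta(seconds=last_call_et)


def log_search_window(last_call_et, sampled_at, margin=timedelta(minutes=5)):
    """Window of application-log time to search; the margin is an assumption."""
    center = last_call_time(last_call_et, sampled_at)
    return center - margin, center + margin


start, end = log_search_window(3600, datetime(2018, 6, 16, 17, 22))
print(start, end)  # 2018-06-16 16:17:00 2018-06-16 16:27:00
```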

Note

This information is also available in Word document format from Applications Support, in their Troubleshooting\General_Performance folder, as "BlockingLocksTroubleshooting.doc."

Product: IBM Sterling Order Management
Business Unit: IBM Software
Component: Not Applicable
Platform: Platform Independent
Version: All
Line of Business: Sustainability Software

Historical Number

PRI49577


Document Information

Modified date:
16 June 2018

UID

swg21525864