Fixes are available
APAR status
Closed as program error.
Error description
You are running WebSphere MQ on Linux and experience one or more of the following: an unexplained core dump, an MQ FFST that reports a SIGSEGV, or a hang of a queue manager process. These symptoms apply to a wide range of problems, but a review of the failure documentation may show some errors that are specific to this problem.

Analysis of the thread stack of a core file generated at the time that the issue occurred may show the following for the failing thread:

#0 __kernel_vsyscall ()
#1 __lll_mutex_lock_wait () from /lib/libc.so.6
#2 _L_lock_113 () from /lib/libc.so.6
#3 ptmalloc_lock_all () from /lib/libc.so.6
#4 fork () from /lib/libc.so.6
#5 fork () from /lib/libpthread.so.0
#6 j9dump_create () from /opt/WebSphere61/AppServer/java/jre/bin/libj9prt23.so
#7 doSystemDump () from /opt/WebSphere61/AppServer/java/jre/bin/libj9dmp23.so
#8 protectedDumpFunction () from /opt/WebSphere61/AppServer/java/jre/bin/libj9dmp23.so
#9 j9sig_protect () from /opt/WebSphere61/AppServer/java/jre/bin/libj9prt23.so
#10 runDumpFunction () from /opt/WebSphere61/AppServer/java/jre/bin/libj9dmp23.so
#11 triggerDumpAgents () from /opt/WebSphere61/AppServer/java/jre/bin/libj9dmp23.so
#12 dumpCrashData () from /opt/WebSphere61/AppServer/java/jre/bin/libj9vm23.so
#13 j9sig_protect () from /opt/WebSphere61/AppServer/java/jre/bin/libj9prt23.so
#14 structuredSignalHandler () from /opt/WebSphere61/AppServer/java/jre/bin/libj9vm23.so
#15 masterSynchSignalHandler () from /opt/WebSphere61/AppServer/java/jre/bin/libj9prt23.so
#16 xehInterpretSavedSigaction () from /opt/mqm/lib/libmqmcs_r.so
#17 xehExceptionHandler () from /opt/mqm/lib/libmqmcs_r.so
#18 <signal handler called>
#19 malloc_consolidate () from /lib/libc.so.6
#20 _int_malloc () from /lib/libc.so.6
#21 malloc () from /lib/libc.so.6
#22 j9mem_allocate_memory_basic () from /opt/WebSphere61/AppServer/java/jre/bin/libj9prt23.so
#23 j9mem_allocate_memory_callSite () from /opt/WebSphere61/AppServer/java/jre/bin/libj9prt23.so
#24 allocateJavaStack () from /opt/WebSphere61/AppServer/java/jre/bin/libj9vm23.so
#25 allocateVMThread () from /opt/WebSphere61/AppServer/java/jre/bin/libj9vm23.so
#26 startJavaThread () from /opt/WebSphere61/AppServer/java/jre/bin/libj9vm23.so
#27 java_lang_Thread_startImpl () from /opt/WebSphere61/AppServer/java/jre/bin/libjclscar_23.so
#28 ?? ()
#29 ?? ()
#30 ?? ()
#31 ?? ()
#32 ?? ()
#33 allocateVMThread () from /opt/WebSphere61/AppServer/java/jre/bin/libj9vm23.so
#34 javaProtectedThreadProc () from /opt/WebSphere61/AppServer/java/jre/bin/libj9vm23.so
#35 j9sig_protect () from /opt/WebSphere61/AppServer/java/jre/bin/libj9prt23.so
#36 javaThreadProc () from /opt/WebSphere61/AppServer/java/jre/bin/libj9vm23.so
#37 thread_wrapper () from /opt/WebSphere61/AppServer/java/jre/bin/libj9thr23.so
#38 start_thread () from /lib/libpthread.so.0
#39 clone () from /lib/libc.so.6

This stack happened to come from an environment where MQ was being called by a JVM, but the important part of the stack is the malloc_consolidate call (frame #19), as this is the stack frame that causes the SIGSEGV.

Another example of the failure is shown here. Note that in this example, both MQ and Java signal handlers were disabled:

#0 malloc_consolidate (av=0x8b300010) at malloc.c:4576
#1 0xb7dfc769 in _int_malloc (av=0x8b300048, bytes=4372) at malloc.c:3975
#2 0xb7dfdce6 in *__GI___libc_malloc (bytes=4372) at malloc.c:3393
#3 0x8e787cce in xihQueryThreadEntry () from /opt/mqm/lib/libmqmcs_r.so
#4 0x8e7501e4 in xcsInitialize () from /opt/mqm/lib/libmqmcs_r.so
#5 0x8e909e6b in zstMQCONNX () from /opt/mqm/lib/libmqz_r.so
#6 0x8f32e8f7 in MQCONNX () from /opt/mqm/lib/libmqm_r.so
#7 0x8e97e7eb in Java_com_ibm_mq_server_MQSESSION__1MQCONNX () from /opt/mqm/java/lib/libmqjbnd05.so
#8 0xb7cf328e in VMprJavaSendNative () from

The core file may show one or more threads in an xcsWaitThread MQ call, for example:

#0 __kernel_vsyscall ()
#1 __lll_mutex_lock_wait () from /lib/libpthread.so.0
#2 pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#3 pthread_cond_timedwait@GLIBC_2.0 () from /lib/libpthread.so.0
#4 xcsWaitThread () from /opt/mqm/lib/libmqmcs_r.so
#5 xtmStopTimerThread () from /opt/mqm/lib/libmqmcs_r.so
#6 xcsTerminate () from /opt/mqm/lib/libmqmcs_r.so
#7 xcsReleaseThread () from /opt/mqm/lib/libmqmcs_r.so
#8 zutReleaseSharedPCD () from /opt/mqm/lib/libmqz_r.so
#9 zstMQDISC () from /opt/mqm/lib/libmqz_r.so
#10 MQDISC () from /opt/mqm/lib/libmqm_r.so
#11 Java_com_ibm_mq_server_MQSESSION__1MQDISC () from /opt/mqm/java/lib/libmqjbnd05.so
#12 VMprJavaSendNative () from /opt/WebSphere61/AppServer/java/jre/bin/libj9vm23.so

Further examination of the core file will show that there are no threads containing a stack frame named xtmTimerThread.

The following symptoms were not directly observed for this problem, but either could be indicative of it:
1. An FDC may be generated with Probe Id XC130003 in xcsWaitThread.
2. An MQ thread may appear to hang in a call to pthread_cond_timedwait.
Local fix
Problem summary
****************************************************************
USERS AFFECTED:
Whether a system is potentially affected by this problem depends on its implementation of the pthread_cond_destroy() API, specifically whether the EBUSY check is implemented as described by the POSIX standard. This varies between platforms and C library releases, so it is not feasible to give a definitive list of at-risk systems.

Platforms affected:
Linux (Power), Linux (s390x), Linux (x86), Linux (x86-64), Linux (zSeries)
****************************************************************
PROBLEM SUMMARY:
The problem is caused by an incorrect expectation of how the pthread_cond_destroy API behaves.

When destroying a condition variable using the pthread_cond_destroy API, the WebSphere MQ code expected the API to return EBUSY if a pthread_cond_timedwait call was currently using the condition variable. This assumption was based on the specification at:
http://www.opengroup.org/onlinepubs/009695399/functions/pthread_cond_destroy.html

The [EBUSY] and [EINVAL] error checks, if implemented
[EBUSY] The implementation has detected an attempt to destroy the object referenced by cond while it is referenced (for example, while being used in a pthread_cond_wait() or pthread_cond_timedwait()) by another thread.

It was assumed that the EBUSY check would be implemented as standard, but it is in fact an optional part of the specification, so not all platform implementations behave in this way.

As a result of this incorrect assumption, it was possible for a waiting thread to try to use a condition variable that had already been destroyed by the thread being waited on. This would happen if the thread being waited on called pthread_cond_destroy() before the thread calling pthread_cond_timedwait() had been dispatched.
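The flawed expectation described above can be sketched in C. This is a hypothetical illustration only: the APAR does not show the WebSphere MQ source, and the function names destroy_assuming_ebusy and demo_no_waiters are invented for this example. The sketch retries pthread_cond_destroy() while it returns EBUSY, which is only safe if the C library actually implements that optional check.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical illustration of the flawed assumption; not MQ source code.
 * The caller treats "no EBUSY" as proof that nobody is using the variable. */
int destroy_assuming_ebusy(pthread_cond_t *cond)
{
    int rc;

    /* Flaw: POSIX makes the EBUSY check optional, so an implementation may
     * return 0 here even though another thread is waiting. A thread that has
     * not yet reached pthread_cond_timedwait() cannot be detected at all. */
    while ((rc = pthread_cond_destroy(cond)) == EBUSY)
        sched_yield();

    return rc;
}

/* With no waiters the call succeeds, which is the only case where the
 * pattern above is well defined. Returns 0 on success, -1 on init failure. */
int demo_no_waiters(void)
{
    pthread_cond_t cond;

    if (pthread_cond_init(&cond, NULL) != 0)
        return -1;
    return destroy_assuming_ebusy(&cond);
}
```

The uncontended case works on any implementation, which is one reason a pattern like this can pass testing and only fail under an unlucky thread schedule.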
Problem conclusion
The problem was resolved by changing the way that WebSphere MQ destroys condition variables. Instead of relying on the thread being waited on to destroy the condition variable, the code now ensures that the variable is destroyed only when no other threads are using it. This avoids the race condition that caused the failure.

---------------------------------------------------------------
The fix is targeted for delivery in the following PTFs:

v6.0
Platform          Fix Pack 6.0.2.11
--------          --------------------
Linux (x86)       tbc_p600_0_2_11
Linux (x86-64)    tbc_p600_0_2_11
Linux (zSeries)   tbc_p600_0_2_11
Linux (Power)     tbc_p600_0_2_11
Linux (s390x)     tbc_p600_0_2_11

v7.0
Platform          Fix Pack 7.0.1.4
--------          --------------------
Linux (x86)       U836460
Linux (x86-64)    U836464
Linux (zSeries)   U836461
Linux (Power)     U836462

The latest available maintenance can be obtained from 'WebSphere MQ Recommended Fixes':
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037

If the maintenance level is not yet available, information on its planned availability can be found in 'WebSphere MQ Planned Maintenance Release Dates':
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
---------------------------------------------------------------
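One common way to guarantee that a condition variable is destroyed only when no other thread is using it is reference counting, sketched below in C. This is an assumption about the general technique, not the actual WebSphere MQ change, which the APAR does not show. All names (event_t, event_create, event_release, event_signal, event_wait, demo_handshake) are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical names throughout; this is not WebSphere MQ source code. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             done;  /* predicate, protected by lock */
    int             refs;  /* threads that still hold a pointer to the event */
} event_t;

/* Create an event shared by `refs` threads; returns NULL on failure. */
event_t *event_create(int refs)
{
    event_t *ev = malloc(sizeof *ev);
    if (ev == NULL) return NULL;
    if (pthread_mutex_init(&ev->lock, NULL) != 0) { free(ev); return NULL; }
    if (pthread_cond_init(&ev->cond, NULL) != 0) {
        pthread_mutex_destroy(&ev->lock);
        free(ev);
        return NULL;
    }
    ev->done = 0;
    ev->refs = refs;
    return ev;
}

/* Drop one reference. Only the last holder destroys the condition variable,
 * so no thread can still be waiting on it. Returns 1 if this call destroyed. */
int event_release(event_t *ev)
{
    pthread_mutex_lock(&ev->lock);
    int last = (--ev->refs == 0);
    pthread_mutex_unlock(&ev->lock);
    if (last) {
        pthread_cond_destroy(&ev->cond);
        pthread_mutex_destroy(&ev->lock);
        free(ev);
    }
    return last;
}

void event_signal(event_t *ev)
{
    pthread_mutex_lock(&ev->lock);
    ev->done = 1;
    pthread_cond_broadcast(&ev->cond);
    pthread_mutex_unlock(&ev->lock);
}

/* Wait until signalled or until timeout_ms elapses; returns 0 or ETIMEDOUT. */
int event_wait(event_t *ev, long timeout_ms)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    int rc = 0;
    pthread_mutex_lock(&ev->lock);
    while (!ev->done && rc == 0)
        rc = pthread_cond_timedwait(&ev->cond, &ev->lock, &deadline);
    if (ev->done) rc = 0;
    pthread_mutex_unlock(&ev->lock);
    return rc;
}

/* The thread being waited on: signals, then drops its own reference. */
static void *worker(void *arg)
{
    event_t *ev = arg;
    event_signal(ev);
    return (void *)(intptr_t)event_release(ev);
}

/* Returns how many threads destroyed the event, or -1 on error. */
int demo_handshake(void)
{
    event_t *ev = event_create(2);   /* one reference each: waiter and worker */
    pthread_t tid;
    void *res = NULL;

    if (ev == NULL) return -1;
    if (pthread_create(&tid, NULL, worker, ev) != 0) {
        event_release(ev);
        event_release(ev);
        return -1;
    }
    int rc = event_wait(ev, 5000);
    int destroyed = event_release(ev);
    pthread_join(tid, &res);
    destroyed += (int)(intptr_t)res;
    return rc == 0 ? destroyed : -1;
}
```

In this sketch the waiter owns a reference from the moment the event is created, before it is ever dispatched, so the signalling thread cannot destroy the variable early regardless of scheduling order. Whichever thread releases last performs the destroy, and demo_handshake() reports that exactly one thread did so.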
Temporary fix
Comments
APAR Information
APAR number
IZ74801
Reported component name
WMQ LIN X86 V6
Reported component ID
5724H7204
Reported release
602
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2010-04-19
Closed date
2010-07-30
Last modified date
2010-07-30
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
WMQ LIN X86 V6
Fixed component ID
5724H7204
Applicable component levels
R602 PSY
UP
Document Information
Modified date:
31 March 2023