IT19169: MQ APPLIANCE HA QUEUE MANAGERS NOT TOLERANT OF NETWORK PING FAILURES.

APAR status

Closed as program error.

Error description

A failover occurs from the primary to the standby queue manager
in an HA/DR environment.  This is soon followed by a failover
back to the primary queue manager.

The IBM MQ Appliance system log contains entries such as this:

Dec 16 16:50:51 (none) pengine[24787]:  warning: unpack_rsc_op:
Processing failed op monitor for QMGR on x.y.a.b: not running
(7)
Dec 16 16:50:51 (none) pengine[24787]:    error: color_instance:
Pre-allocation failed: got x.y.a.b instead of x.y.a.b
Dec 16 16:50:51 (none) pengine[24787]:   notice: LogActions:
Demote  QMGR_drbd:0#011(Master -> Slave x.y.a.b)
Dec 16 16:50:51 (none) pengine[24787]:   notice: LogActions:
Promote QMGR_drbd:1#011(Slave -> Master x.y.a.b)
Dec 16 16:50:51 (none) pengine[24787]:   notice: LogActions:
Stop    QMGR_fs#011(x.y.a.b)
Dec 16 16:50:51 (none) pengine[24787]:   notice: LogActions:
Stop    QMGR#011(x.y.a.b)
Dec 16 16:50:51 (none) pengine[24787]:   notice: LogActions:
Move    QMGR_DR_IP#011(Started x.y.a.b -> x.y.a.b)
Dec 16 16:50:51 (none) pengine[24787]:   notice: LogActions:
Demote  QMGR_DR_drbd:0#011(Master -> Slave x.y.a.b - blocked)
Dec 16 16:50:51 (none) pengine[24787]:   notice: LogActions:
Move    QMGR_DR_drbd:0#011(Slave x.y.a.b -> x.y.a.b)
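
The entries above can be scanned for automatically. The following is a minimal sketch, not an IBM-supplied tool. It assumes that each entry occupies a single line in the raw system log (the entries are wrapped in this document) and that the "#011" sequence separating the resource name from the detail appears literally, as shown above. The names parse_log_actions and role_changes are hypothetical.

```python
import re

# Matches one unwrapped pengine LogActions entry, for example:
#   ... pengine[24787]:   notice: LogActions: Demote  QMGR_drbd:0#011(Master -> Slave x.y.a.b)
# Assumption: "#011" appears literally in the log text, as in the APAR excerpt.
LOG_ACTION = re.compile(
    r"pengine\[\d+\]:\s+notice:\s+LogActions:\s+"
    r"(?P<action>\w+)\s+(?P<resource>\S+?)#011\((?P<detail>.*)\)\s*$"
)


def parse_log_actions(lines):
    """Return (action, resource, detail) tuples for pengine LogActions entries."""
    actions = []
    for line in lines:
        match = LOG_ACTION.search(line)
        if match:
            actions.append((match["action"], match["resource"], match["detail"]))
    return actions


def role_changes(actions):
    """Keep only Promote/Demote actions, which mark a change of HA role."""
    return [a for a in actions if a[0] in ("Promote", "Demote")]


# Entries taken from the excerpt above, joined onto single lines.
sample = [
    "Dec 16 16:50:51 (none) pengine[24787]:   notice: LogActions: Demote  QMGR_drbd:0#011(Master -> Slave x.y.a.b)",
    "Dec 16 16:50:51 (none) pengine[24787]:   notice: LogActions: Promote QMGR_drbd:1#011(Slave -> Master x.y.a.b)",
    "Dec 16 16:50:51 (none) pengine[24787]:   notice: LogActions: Stop    QMGR#011(x.y.a.b)",
]

for action, resource, detail in role_changes(parse_log_actions(sample)):
    print(f"{action:8} {resource:14} {detail}")
```

A Demote of one DRBD resource instance paired with a Promote of the other at the same timestamp, as in the sample, is the pattern this APAR describes. Repeated pairs in quick succession would suggest the back-and-forth failover.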

Local fix

It may be possible to work around this problem, or to reduce the
likelihood of it occurring, by fixing any network issues that
affect connectivity between the IBM MQ Appliances.

Problem summary

****************************************************************
USERS AFFECTED:
Users of the IBM MQ Appliance who have configured a combination
of HA and DR and who have unreliable network connectivity may be
affected by this problem.


Platforms affected:
MultiPlatform

****************************************************************
PROBLEM DESCRIPTION:
A transient DR ping failure could have resulted in a queue
manager briefly starting on the HA secondary appliance before
switching back to the HA primary appliance.  If a network is
particularly unreliable then this failover behaviour may have
happened frequently.

Problem conclusion

The code that detects whether the DR secondary IBM MQ Appliance
can be contacted was modified so that it is more tolerant of
transient network failures.
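
This APAR does not describe how the detection code was changed. As a generic illustration only, one common way to tolerate transient failures is to treat a peer as unreachable only after several consecutive probes fail. The sketch below shows that technique. The class name, the threshold of 3, and the probe results are all assumptions made for the example and do not reflect the appliance's implementation.

```python
class TolerantReachability:
    """Report a peer as unreachable only after N consecutive failed probes.

    Illustrative only: the threshold and the class itself are assumptions,
    not the MQ Appliance's actual detection logic.
    """

    def __init__(self, failure_threshold=3):
        if failure_threshold < 1:
            raise ValueError("failure_threshold must be at least 1")
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    def record(self, ping_ok):
        """Record one probe result; return True while the peer counts as reachable."""
        if ping_ok:
            # Any success clears the run of failures.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures < self.failure_threshold


checker = TolerantReachability(failure_threshold=3)
probe_results = [True, False, True, False, False, False]
states = [checker.record(ok) for ok in probe_results]
print(states)  # [True, True, True, True, True, False]
```

With a threshold of 1 the same probe sequence would report the peer unreachable at the second probe, which corresponds to the intolerant behaviour described in the problem description.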

---------------------------------------------------------------
The fix is targeted for delivery in the following PTFs:

Version    Maintenance Level
v8.0       8.0.0.7
v9.0 CD    9.0.2

The latest available maintenance can be obtained from
'WebSphere MQ Recommended Fixes'
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037

If the maintenance level is not yet available, information on
its planned availability can be found in 'WebSphere MQ
Planned Maintenance Release Dates'
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
---------------------------------------------------------------
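
The maintenance levels listed above can be compared against an installed level with a short script. The following is a rough sketch under two assumptions: that the installed level is available as a dotted numeric string, and that later maintenance levels in the same version stream also contain the fix. The function name includes_fix is hypothetical, and version streams not listed in this APAR are reported as unknown rather than guessed.

```python
# Fix levels from this APAR, keyed by (version, release) stream.
FIX_LEVELS = {
    (8, 0): (8, 0, 0, 7),  # v8.0: 8.0.0.7
    (9, 0): (9, 0, 2),     # v9.0 CD: 9.0.2
}


def includes_fix(level):
    """Return True/False for listed streams, or None if the stream is not listed.

    Assumption: maintenance levels at or above the listed fix level in the
    same stream include the fix for IT19169.
    """
    parts = tuple(int(p) for p in level.strip().split("."))
    fix_level = FIX_LEVELS.get(parts[:2])
    if fix_level is None:
        return None
    # Tuple comparison orders dotted levels component by component.
    return parts >= fix_level


for level in ("8.0.0.6", "8.0.0.7", "9.0.1", "9.0.2"):
    print(level, includes_fix(level))
```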

Temporary fix

Comments

APAR Information

APAR number
IT19169
Reported component name
IBM MQ APPL M20
Reported component ID
5725S1400
Reported release
800
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2017-02-08
Closed date
2017-02-28
Last modified date
2017-06-01

APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:

Fix information

Fixed component name
IBM MQ APPL M20
Fixed component ID
5725S1400

Applicable component levels

R800 PSY
UP

Document Information

Modified date:
01 June 2017
