A fix is available
APAR status
Closed as program error.
Error description
During restart, the queue manager fails with abend 5C6-00E20045 because of a partially deleted shared queue. When a task then tries to open the queue, the error causes the task to loop. The looping task is part of a DISPLAY QUEUE command running on a DB2 server task. Checkpoint processing then runs and issues a query to DB2; this query happens to pick the same DB2 server task and is queued behind the looping task. As a result, checkpoints were no longer being written, while the tests continued to perform large amounts of persistent/recoverable work. The queue manager was then cancelled, leaving a large backlog of outstanding recovery processing since the last successful checkpoint. On the next startup, the volume of this recovery processing caused various memory problems.
Local fix
Problem summary
USERS AFFECTED: All users of IBM MQ for z/OS Version 9 Release 2 Modification 0 and Release 3 Modification 0.
PROBLEM DESCRIPTION: When MQ checkpointing has stalled, a Queue Manager could get into an unrecoverable situation after the logs containing the most recent checkpoint have been reused.
If logging activity continues after MQ checkpoint processing has stalled for long enough, the log containing the most recent checkpoint could be reused. This has the potential to leave the Queue Manager in an unrecoverable situation, as there would be no checkpoint to rebuild the Queue Manager from.
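The condition described above can be pictured with a small model. The following is a minimal sketch, not MQ's actual implementation: it assumes each active log covers a contiguous RBA range and that a log switch overwrites the oldest log with the next range. The names `ActiveLog`, `checkpoint_on_active_logs`, and `reuse_oldest`, and all RBA values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ActiveLog:
    """Hypothetical model of one active log data set and the RBA range it holds."""
    name: str
    start_rba: int
    end_rba: int  # inclusive


def checkpoint_on_active_logs(checkpoint_rba: int, logs: List[ActiveLog]) -> bool:
    """Return True if the checkpoint RBA falls inside any active log's range."""
    return any(log.start_rba <= checkpoint_rba <= log.end_rba for log in logs)


def reuse_oldest(logs: List[ActiveLog], size: int) -> List[ActiveLog]:
    """Simulate a log switch: the oldest log is overwritten with the next RBA range."""
    newest_end = max(log.end_rba for log in logs)
    oldest = min(logs, key=lambda log: log.start_rba)
    replacement = ActiveLog(oldest.name, newest_end + 1, newest_end + size)
    return [replacement if log is oldest else log for log in logs]


if __name__ == "__main__":
    # Three hypothetical active logs of 0x1000 bytes each.
    logs = [
        ActiveLog("LOG1", 0x0000, 0x0FFF),
        ActiveLog("LOG2", 0x1000, 0x1FFF),
        ActiveLog("LOG3", 0x2000, 0x2FFF),
    ]
    last_checkpoint_rba = 0x0800  # checkpointing stalls after this point

    print(checkpoint_on_active_logs(last_checkpoint_rba, logs))

    # Logging continues and wraps around, overwriting LOG1.
    logs = reuse_oldest(logs, 0x1000)
    print(checkpoint_on_active_logs(last_checkpoint_rba, logs))
```

In this model, once the log that held the last checkpoint is reused, the checkpoint can only be found on archive logs, if those are available.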
Problem conclusion
A new message CSQJ169E has been added to indicate when the last checkpoint is no longer contained on any of the active logs. When this scenario is detected this new message will be output during active log switch processing to indicate to the user that checkpoint processing may have stalled. Action may need to be taken to ensure that a new checkpoint is taken to prevent the Queue Manager from proceeding into an unrecoverable situation.
The IBM Documentation is updated as follows:
Both the V930 & V920 doc pages below will have new entries for message CSQJ169E:
IBM MQ - Reference - Messages and reason codes - IBM MQ for z/OS messages, completion, and reason codes - Messages for IBM MQ for z/OS - Recovery log manager messages (CSQJ...)
CSQJ169E LAST CHECKPOINT NOT FOUND IN ACTIVE LOG COPY & WITH STARTRBA=&, CHECKPOINT RBA=&.
Explanation
During active log switch processing the last checkpoint was not found on any active logs. This could leave the Queue Manager in an unrecoverable position if there are insufficient archive logs available to find the required recovery point during restart processing. This may be an indication that checkpoint processing may have stalled or is not completing in a timely manner and should be investigated.
System action
Log switch processing will continue.
System programmer response
You may be able to re-establish checkpointing by stopping and restarting the Queue Manager. If checkpointing is stalled, the STOP QMGR command may not be able to shut down the Queue Manager normally. If this happens, then the Queue Manager may need to be cancelled. Before doing so, ensure that the logs from the restart RBA onwards are available. The restart RBA can be found using the DISPLAY USAGE command. If it appears that checkpointing has stalled, then take a dump of the Queue Manager Address Space and contact your IBM support center for assistance to help understand why checkpointing may have stalled.
If checkpointing does not appear to have stalled, then an alternative reason for this situation might be that the Queue Manager's active logs are too small for the current workload and checkpoint processing is not completing within the lifespan of one active log.
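The log-sizing case can be illustrated with simple arithmetic. This is a rough sketch under stated assumptions, not an IBM sizing formula: it assumes a steady log write rate and compares the time needed to fill one active log with the time a checkpoint takes to complete. The function names and all figures are hypothetical; real values would have to come from your own measurements.

```python
def seconds_to_fill_log(log_size_bytes: int, write_rate_bytes_per_sec: float) -> float:
    """Estimate how long one active log lasts at a steady write rate."""
    if write_rate_bytes_per_sec <= 0:
        raise ValueError("write rate must be positive")
    return log_size_bytes / write_rate_bytes_per_sec


def checkpoint_fits_in_log_lifespan(
    log_size_bytes: int,
    write_rate_bytes_per_sec: float,
    checkpoint_seconds: float,
) -> bool:
    """Return True if a checkpoint completes before one active log fills up."""
    lifespan = seconds_to_fill_log(log_size_bytes, write_rate_bytes_per_sec)
    return checkpoint_seconds < lifespan


if __name__ == "__main__":
    # Hypothetical figures: a 1 GB active log written at 10 MB per second.
    log_size = 1_000_000_000
    write_rate = 10_000_000

    print(seconds_to_fill_log(log_size, write_rate))
    print(checkpoint_fits_in_log_lifespan(log_size, write_rate, checkpoint_seconds=30))
    print(checkpoint_fits_in_log_lifespan(log_size, write_rate, checkpoint_seconds=120))
```

If the estimate shows checkpoints taking longer than one log's lifespan, larger active logs may be worth considering for that workload.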
Temporary fix
Comments
APAR Information
APAR number
PH47266
Reported component name
IBM MQ Z/OS V9
Reported component ID
5655MQ900
Reported release
200
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2022-06-15
Closed date
2023-09-21
Last modified date
2023-11-01
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UI93668 UI93669 UI93670 UI93671 UI93672 UI93673 UI93674 UI93675
UI93676 UI93677 UI93678 UI93679
Modules/Macros
CSQFJDIC CSQFJDIE CSQFJDIF CSQFJDIK CSQFJDIU CSQFLTXC CSQFLTXE CSQFLTXF CSQFLTXK CSQFLTXU CSQFMTXC CSQFMTXE CSQFMTXF CSQFMTXK CSQFMTXU CSQJW307
Fix information
Fixed component name
IBM MQ Z/OS V9
Fixed component ID
5655MQ900
Applicable component levels
R200 PSY UI93674 UP23/10/10 P F310
R201 PSY UI93675 UP23/10/10 P F310
R202 PSY UI93676 UP23/10/10 P F310
R203 PSY UI93677 UP23/10/10 P F310
R204 PSY UI93678 UP23/10/10 P F310
R205 PSY UI93679 UP23/10/10 P F310
R300 PSY UI93668 UP23/10/10 P F310
R301 PSY UI93669 UP23/10/10 P F310
R302 PSY UI93670 UP23/10/10 P F310
R303 PSY UI93671 UP23/10/10 P F310
R304 PSY UI93672 UP23/10/10 P F310
R305 PSY UI93673 UP23/10/10 P F310
Fix is available
Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.
Document Information
Modified date:
02 November 2023