IBM Support

IT43654: Qmgr might generate FDC RN189010 with rrcE_FILE_CORRUPT after applying IT42194; channel might go unresponsive with high CPU

APAR status

  • Closed as program error.

Error description

  • Under heavy load, a queue manager that has the fix for APAR
    IT42194 applied might generate a failure data capture (FDC)
    record with probe ID RN189010 or RN262010 and error code
    rrcE_FILE_CORRUPT.
    
    The FDC looks like this:
    
    | Operating System  :- SunOS 5.11                           |
    | LVLS              :- 9.0.0.16                             |
    | Product Long Name :- IBM MQ for Solaris (x86-64 platform) |
    | Vendor            :- IBM                                  |
    | O/S Registered    :- 1                                    |
    | Data Path         :- /var/mqm                             |
    | Installation Path :- /opt/mqm                             |
    | Installation Name :- Installation2    (2)                 |
    | License Type      :- Production                           |
    | Probe Id          :- RN189010                             |
    | Application Name  :- MQM                                  |
    | Component         :- rflIndex                             |
    | SCCS Info         :-
      /build/jslot0/p900_P/src/lib/client/amqrfila.c,           |
    | Program Name      :- amqrmppa                             |
    | Arguments         :- -m QME                               |
    | Addressing mode   :- 64-bit                               |
    | LANG              :- C                                    |
    | Process           :- 3895                                 |
    | Thread            :- 27    RemoteResponder                |
    | QueueManager      :- QME                                  |
    | UserApp           :- FALSE                                |
    | ConnId(1) IPCC    :- 187                                  |
    | ConnId(3) QM-P    :- 451                                  |
    | Last HQC          :- 1.0.0-3211776                        |
    | Last HSHMEMB      :- 0.0.0-0                              |
    | Last ObjectName   :-                                      |
    | Major Errorcode   :- rrcE_FILE_CORRUPT                    |
    | Minor Errorcode   :- OK                                   |
    | Probe Type        :- MSGAMQ9517                           |
    | Probe Severity    :- 2                                    |
    | Probe Description :- AMQ9517: File damaged.               |
    | FDCSequenceNumber :- 0                                    |
    
    MQM Function Stack
    ccxResponder
    rrxResponder
    rrxOpenSync
    rflOpen
    rflIndex
    xcsFFST
    
    The channel might also become unresponsive, with the symptoms
    documented in APAR IT40017:
    .
    AMQ9514 Channel is in use
    .
    ! Operating System  :- OS400 V7R5M0                         !
    ! PIDS              :- 5724H7251                            !
    ! LVLS              :- 9.2.0.11                             !
    ! Product Long Name :- IBM MQ for IBM i                     !
    ! Probe Id          :- XC308010                             !
    ! Application Name  :- MQM                                  !
    ! Component         :- xlsReleaseMutex                      !
    ! SCCS Info         :-
      /build/slot1/p920_P/src/lib/cs/amqxlfsa.c,                !
    ! Line Number       :- 3076                                 !
    ! Build Date        :- Apr 21 2023                          !
    ! Build Level       :- p920-011-230421                      !
    ! Job Name          :- 662118/QMQM/RUNMQCHL (RUNMQCHL)      !
    ! Job Description   :- QMQM/QMQMJOBD                        !
    ! Submitted By      :- 653643/QMQM/RUNMQCHI                 !
    ! Activation Group  :- 19 (QMQM) (QMQM/RUNMQCHL)            !
    ! Max File Handles  :- 2048                                 !
    ! Thread            :- 00000001    RunChannel               !
    ! UserApp           :- FALSE                                !
    ! Major Errorcode   :- xecL_W_LONG_LOCK_WAIT                !
    ! Minor Errorcode   :- OK                                   !
    ! Probe Type        :- MSGAMQ6150                           !
    ! Probe Severity    :- 3                                    !
    ! Probe Description :- AMQ6150W: IBM MQ resource busy.      !
    ! FDCSequenceNumber :- 0                                    !
    ! Arith1            :- 0                                    !
    ! Arith2            :- 2402 0x'962'                         !
    ! Comment1          :- SyncFile                             !
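
    To check whether an FDC found on a system matches the records described above, its header fields can be compared against the probe IDs and error code named in this APAR. The following Python sketch is illustrative only and is not an IBM-supplied tool. The function names parse_fdc_header and is_it43654_fdc are hypothetical. It assumes the header uses the "| Field :- value |" layout shown above (with "!" as the border on IBM i) and that the text passed in holds a single FDC record.

```python
import re

# Probe IDs and major error code named in this APAR.
APAR_PROBE_IDS = {"RN189010", "RN262010"}
APAR_MAJOR_ERROR = "rrcE_FILE_CORRUPT"

# Matches header lines such as "| Probe Id          :- RN189010     |".
# Assumption: the border character is "|" or "!" as in the samples above.
_FIELD_RE = re.compile(
    r"^\s*[|!]\s*(?P<key>[^:]+?)\s*:-\s*(?P<value>.*?)\s*[|!]?\s*$"
)


def parse_fdc_header(text):
    """Return a dict of header fields from the text of one FDC record."""
    fields = {}
    for line in text.splitlines():
        match = _FIELD_RE.match(line)
        if match:
            # Keep the first occurrence of each field name.
            fields.setdefault(match.group("key"), match.group("value"))
    return fields


def is_it43654_fdc(fields):
    """True if the parsed header has the probe ID and error code of this APAR."""
    return (
        fields.get("Probe Id") in APAR_PROBE_IDS
        and fields.get("Major Errorcode") == APAR_MAJOR_ERROR
    )
```

    A caller would read the text of an FDC file (these are normally written to the errors directory under the data path, for example /var/mqm/errors for the Data Path shown above), pass it to parse_fdc_header, and test the result with is_it43654_fdc. A file that holds several FDC records would need to be split into records first, which this sketch does not attempt.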
    

Local fix

Problem summary

  • ****************************************************************
    USERS AFFECTED:
    Users using MQ with APAR IT42194.
    
    
    Platforms affected:
    MultiPlatform
    
    ****************************************************************
    PROBLEM DESCRIPTION:
    In internal testing under very high load, an FDC record with
    probe ID RN189010 and error code rrcE_FILE_CORRUPT was generated
    because of a logic error in the code that updates the channel
    sync file. The logic error exposed a timing condition in which
    invalid data could be written into the channel sync file during
    concurrent updates. The invalid data was detected by the new sync
    file validation logic added by IT42194, although it was often
    not harmful to the queue manager.
    
    At MQ 9.2 and earlier, this could be seen when multiple
    channels were starting or stopping concurrently. In this scenario
    there was also a possibility that a further timing window could
    be encountered, within which the affected channel processes
    could enter a looping state, consuming additional CPU and
    potentially delaying other channels running within the same
    process (if the affected channel is being serviced by an
    amqrmppa channel pool process). The channel sync file is not
    used at version 9.3 and later, and so 9.3 is not susceptible to
    this issue.
    
    At all versions, including 9.3, it was theoretically possible to
    encounter the same FDC record if concurrent administrative
    modifications were made to a client channel definition table
    (CCDT). IBM is not aware of any cases of this latter scenario
    being encountered.
    

Problem conclusion

  • The timing window in which this FDC record could be generated
    when updating the channel sync file, and the possibility of the
    channel becoming unresponsive as a result, have been corrected
    for the affected 9.2 and earlier releases.
    
    The timing window to encounter this FDC record when updating a
    CCDT file has been corrected for the affected 9.3 and earlier
    releases.
    
    ---------------------------------------------------------------
    The fix is targeted for delivery in the following PTFs:
    
    Version    Maintenance Level
    v9.0 LTS   9.0.0.18
    v9.1 LTS   9.1.0.15
    v9.2 LTS   9.2.0.15
    v9.3 LTS   9.3.0.10
    v9.x CD    9.3.3
    
    The latest available maintenance can be obtained from
    'WebSphere MQ Recommended Fixes'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037
    
    If the maintenance level is not yet available, information on
    its planned availability can be found in 'WebSphere MQ
    Planned Maintenance Release Dates'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
    ---------------------------------------------------------------
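
    The maintenance levels listed above can be turned into a simple check of whether a given installation level already includes the fix. The following Python sketch is illustrative only. The function name has_it43654_fix is hypothetical, and the sketch assumes that a level whose third component is 0 is an LTS fix pack and that any other level is a CD release. The level string is whatever the installation reports, for example the LVLS field of an FDC or the output of dspmqver.

```python
# Fix levels copied from the table above:
# (version, release) -> first LTS fix pack that contains the fix.
LTS_FIX_LEVELS = {
    (9, 0): 18,  # 9.0.0.18
    (9, 1): 15,  # 9.1.0.15
    (9, 2): 15,  # 9.2.0.15
    (9, 3): 10,  # 9.3.0.10
}
# First CD release that contains the fix.
CD_FIX_LEVEL = (9, 3, 3)


def has_it43654_fix(level):
    """Return True or False for releases in the table, None if not listed."""
    parts = [int(p) for p in level.strip().split(".")]
    parts += [0] * (4 - len(parts))  # pad "9.3.3" to 9.3.3.0
    version, release, modification, fix_pack = parts[:4]
    if modification == 0:
        # Assumption: modification 0 means an LTS release.
        needed = LTS_FIX_LEVELS.get((version, release))
        if needed is None:
            return None  # release not listed in this APAR
        return fix_pack >= needed
    # Assumption: non-zero modification means a CD release.
    return (version, release, modification) >= CD_FIX_LEVEL
```

    For the two FDC samples in the error description, has_it43654_fix("9.0.0.16") and has_it43654_fix("9.2.0.11") both return False, which is consistent with those systems showing the problem.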
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT43654

  • Reported component name

    IBM MQ BASE M/P

  • Reported component ID

    5724H7261

  • Reported release

    900

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2023-04-27

  • Closed date

    2023-05-09

  • Last modified date

    2023-09-25

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    IBM MQ BASE M/P

  • Fixed component ID

    5724H7261

Applicable component levels


Document Information

Modified date:
26 September 2023