IBM Support

IT40017: IBM MQ channels might go unresponsive with high CPU usage in channel process if channel synchronization record is corrupted


APAR status

  • Closed as program error.

Error description

  • IBM MQ channels might become unresponsive, with high CPU usage in
    the channel process (amqrmppa or runmqchl), if the channel
    synchronization record is corrupted. If the problem affects
    channels on the sender side (e.g. SDR or CLUSSDR), the channels
    send no messages, and the "DIS CHS" output is likely to show no
    value in the SUBSTATE field.
    
    AMQ8417I: Display Channel Status details.
      CHANNEL(CLUSCHL1)      CHLTYPE(CLUSSDR)
      ...
      RQMNAME(RQM1)            STATUS(RUNNING)
      SUBSTATE( )
      XMITQ(SYSTEM.CLUSTER.TRANSMIT.CLUSCHL1)
    
    If the affected channels are on the receiver side (e.g. RCVR or
    CLUSRCVR), the channel process on the receiver side consumes
    high CPU and the corresponding SDR or CLUSSDR channel goes into
    retrying state.
    
    The top output for the affected channel process shows high CPU
    usage.
    
        PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
     133101 mqm       20   0  265412  13836  11584 R 106.2   0.4   2:08.97 runmqchl
     133101 mqm       20   0  265412  13836  11584 R  99.7   0.4   2:11.97 runmqchl
     133101 mqm       20   0  265412  13836  11584 R  99.3   0.4   2:14.97 runmqchl
     133101 mqm       20   0  265412  13836  11584 R  99.7   0.4   2:17.97 runmqchl
    
    IBM MQ trace shows the channel process repeatedly calling
    rflSeekBytes and rflReadBytes with the same pattern of
    comparison and file pointers.
    
    Example:
    
     19:54:22.511503   133101.1      RSESS:000001 NewFilePointer=716(0x000002cc) <---------
     19:54:22.511528   133101.1      RSESS:000001 NewFilePointer=1072(0x00000430)
     19:54:22.511552   133101.1      RSESS:000001 NewFilePointer=1428(0x00000594)
     19:54:22.511577   133101.1      RSESS:000001 NewFilePointer=1784(0x000006f8)
     19:54:22.511603   133101.1      RSESS:000001 NewFilePointer=2140(0x0000085c)
     19:54:22.511628   133101.1      RSESS:000001 NewFilePointer=2496(0x000009c0)
     19:54:22.511653   133101.1      RSESS:000001 NewFilePointer=2852(0x00000b24)
     19:54:22.511678   133101.1      RSESS:000001 NewFilePointer=716(0x000002cc)  <---------
     19:54:22.511704   133101.1      RSESS:000001 NewFilePointer=1072(0x00000430)
     19:54:22.511728   133101.1      RSESS:000001 NewFilePointer=1428(0x00000594)
     19:54:22.511753   133101.1      RSESS:000001 NewFilePointer=1784(0x000006f8)
     19:54:22.511778   133101.1      RSESS:000001 NewFilePointer=2140(0x0000085c)
     19:54:22.511806   133101.1      RSESS:000001 NewFilePointer=2496(0x000009c0)
     19:54:22.511831   133101.1      RSESS:000001 NewFilePointer=2852(0x00000b24)
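
The repeating pattern in the trace excerpt above can be checked mechanically. The following is a minimal Python sketch, not an IBM tool: it assumes the trace text contains `NewFilePointer=<decimal>(0x<hex>)` entries as shown in the example, and the function names are hypothetical.

```python
import re

# Matches the NewFilePointer value in a trace line, as in the excerpt above.
POINTER_RE = re.compile(r"NewFilePointer=(\d+)\(0x[0-9a-fA-F]+\)")


def extract_pointers(trace_text):
    """Return the NewFilePointer offsets in the order they appear."""
    return [int(m.group(1)) for m in POINTER_RE.finditer(trace_text)]


def find_repeating_cycle(pointers, min_repeats=2):
    """Return the shortest run of offsets that the whole sequence repeats.

    The sequence must be periodic from its first entry and contain at
    least `min_repeats` full repetitions; otherwise None is returned.
    """
    n = len(pointers)
    for length in range(1, n // min_repeats + 1):
        cycle = pointers[:length]
        if all(pointers[i] == cycle[i % length] for i in range(n)):
            return cycle
    return None
```

Applied to the excerpt above, `find_repeating_cycle(extract_pointers(text))` would return the seven offsets from 716 to 2852, which is the signature of the loop described in this APAR.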
    

Local fix

  • Stop the queue manager
    Back up the queue manager
    Rename the sync file AMQRSYNA.DAT
    Start the queue manager with the -ns option (strmqm -ns QM)
    Recreate the channel sync file (rcrmqobj -m QM -t syncfile)
    Stop the queue manager
    Start the queue manager
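
The steps above can be laid out as an ordered checklist. The Python sketch below only builds and prints the sequence; it runs nothing. The `strmqm -ns` and `rcrmqobj -m QM -t syncfile` invocations come from this APAR. Using `endmqm` to stop the queue manager, `mv` to rename the file, the `.corrupt` suffix, and the sync file path in the usage note are assumptions that should be confirmed for your platform and installation.

```python
def local_fix_steps(qmgr, sync_file):
    """Return the local-fix steps in order as (description, argv) pairs.

    argv is None where the APAR describes a step without naming a command.
    """
    renamed = sync_file + ".corrupt"  # hypothetical name for the renamed file
    return [
        ("Stop the queue manager", ["endmqm", qmgr]),          # assumed command
        ("Back up the queue manager", None),                   # site-specific
        ("Rename the sync file", ["mv", sync_file, renamed]),  # assumed command
        ("Start the queue manager with -ns", ["strmqm", "-ns", qmgr]),
        ("Recreate the channel sync file",
         ["rcrmqobj", "-m", qmgr, "-t", "syncfile"]),
        ("Stop the queue manager", ["endmqm", qmgr]),          # assumed command
        ("Start the queue manager", ["strmqm", qmgr]),
    ]


def render(steps):
    """Format the steps as numbered lines for an operator to review."""
    lines = []
    for number, (description, argv) in enumerate(steps, start=1):
        command = " ".join(argv) if argv else "(manual step)"
        lines.append(f"{number}. {description}: {command}")
    return "\n".join(lines)
```

For example, `print(render(local_fix_steps("QM1", "/var/mqm/qmgrs/QM1/@ipcc/AMQRSYNA.DAT")))` prints a seven-step plan. The path shown is an assumed location, not one stated in this APAR. Confirm that the queue manager has fully ended before renaming the file.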
    

Problem summary

  • ****************************************************************
    USERS AFFECTED:
    All users of IBM MQ distributed channels who have a corrupted
    channel synchronization record in the channel sync file.
    Corruption of this file is not an expected or typical usage
    pattern, and has not been observed as a result of any known
    product defect.
    
    The channel sync file is used by all queue manager channel types
    except SVRCONN/CLNTCONN and AMQP channels.
    
    
    Platforms affected:
    MultiPlatform
    
    ****************************************************************
    PROBLEM DESCRIPTION:
    The IBM MQ channel process did not detect corruption in the
    channel synchronization record, which caused an infinite loop
    and left the channel in an unresponsive state.
    

Problem conclusion

  • The IBM MQ code has been modified to prevent an infinite loop if
    the channel synchronization record is corrupted.
    
    This APAR does not address the corruption in the channel
    synchronization record itself, as the cause of the corruption at
    the time this issue was observed remains unknown.
    
    With the fix applied, if the queue manager detects an infinite
    loop when finding a channel record in the channel
    synchronization file, the queue manager generates the following
    error message and the channel goes into retrying state.
    
    ------------------------
    02/28/2022 10:37:07 PM - Process(181467.1) User(root) Program(runmqchl)
     Host(host1.ibm.com) Installation(Installation1)
     VRMF(9.1.0.7) QMgr(qm1)
     Time(2022-03-01T06:37:07.434Z)
     ArithInsert1(1017)
     CommentInsert1(AMQRSYNA.DAT)
    
    AMQ9516E: File error occurred for file 'AMQRSYNA.DAT'.
    
    EXPLANATION:
    The filesystem returned error code 1017 for file 'AMQRSYNA.DAT'.
    ACTION:
    Record the name of the file and tell the systems administrator, who should
    ensure that file is correct and available, for example that the current user
    has appropriate access to the file for reading or writing.
    ------------------------
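
The general idea of detecting a loop while searching a chain of records can be illustrated as follows. This is a minimal Python sketch of the technique only. The on-disk layout of AMQRSYNA.DAT and the actual logic in rflFindRecord are not described in this APAR, so the record structure, names, and exception here are hypothetical.

```python
class CorruptRecordChain(Exception):
    """Raised when following record offsets revisits an offset already seen."""


def find_record(records, start, channel_name):
    """Search a chain of records for channel_name.

    records maps an offset to a (channel_name, next_offset) pair, where
    next_offset is None at the end of the chain. This layout is purely
    illustrative and is not the real sync file format.
    """
    seen = set()
    offset = start
    while offset is not None:
        if offset in seen:
            # Without this check a corrupted chain would loop forever.
            raise CorruptRecordChain(f"offset {offset} visited twice")
        seen.add(offset)
        name, next_offset = records[offset]
        if name == channel_name:
            return offset
        offset = next_offset
    return None
```

With a chain whose last record points back to the first, a search for a name that is not present raises `CorruptRecordChain` instead of spinning. That is comparable in spirit to the channel reporting an error and going into retrying state rather than consuming CPU indefinitely.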
    
    The user needs to take appropriate action to resolve the issue,
    in this case by rebuilding the sync file with rcrmqobj. For the
    steps, see the Local fix section.
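
To check whether an error log already contains this message for the sync file, a scan along the following lines might help. It is a rough Python sketch that assumes the message and explanation text appear exactly as in the example above. The function name is hypothetical, and reading the log file is left to the caller.

```python
import re

# Patterns taken from the AMQ9516E example text shown above.
AMQ9516_RE = re.compile(r"AMQ9516E: File error occurred for file '([^']+)'")
CODE_RE = re.compile(r"returned error code (\d+) for file '([^']+)'")


def sync_file_error_codes(log_text, file_name="AMQRSYNA.DAT"):
    """Return the error codes reported in AMQ9516E entries for file_name.

    An empty list means no AMQ9516E message for that file was found.
    """
    if file_name not in AMQ9516_RE.findall(log_text):
        return []
    return [
        int(code)
        for code, name in CODE_RE.findall(log_text)
        if name == file_name
    ]
```

A result containing 1017 for AMQRSYNA.DAT corresponds to the situation described in this APAR.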
    
    The queue manager also generates the following failure data
    capture (FDC) record.
    
    AMQ184577.0.FDC 2022/03/01 17:37:07.740247-8 Installation1
    runmqchl 184577 1 RM738001 rflFindRecord Unknown(3F9)
    
    Probe Id :- RM738001
    Application Name :- MQM
    Component :- rflFindRecord
    Program Name :- runmqchl
    Arguments :- -c "CHL9 " -m "qm1
    Major Errorcode :- Unknown(3F9)
    
    ---------------------------------------------------------------
    The fix is targeted for delivery in the following PTFs:
    
    Version    Maintenance Level
    v9.0 LTS   9.0.0.16
    v9.1 LTS   9.1.0.12
    v9.2 LTS   9.2.0.7
    v9.x CD    9.3.2
    
    The latest available maintenance can be obtained from
    'WebSphere MQ Recommended Fixes'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037
    
    If the maintenance level is not yet available, information on
    its planned availability can be found in 'WebSphere MQ
    Planned Maintenance Release Dates'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
    ---------------------------------------------------------------
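
The table of targeted maintenance levels can be turned into a quick check of a queue manager's VRMF. The Python sketch below rests on two assumptions that are not stated in this APAR: that LTS levels have a modification number of 0 and CD levels a non-zero one, and that later levels in the same stream also contain the fix. Releases not listed in the table return None rather than a guess.

```python
# Targeted fix levels from the table above: (version, release) -> fix pack.
LTS_FIX_LEVELS = {(9, 0): 16, (9, 1): 12, (9, 2): 7}
# Targeted Continuous Delivery level from the table above.
CD_FIX_LEVEL = (9, 3, 2)


def includes_fix(vrmf):
    """Return True/False if vrmf is at/below the targeted level, else None.

    vrmf is a dotted string such as "9.1.0.7"; missing trailing parts
    are treated as 0, so "9.3.2" is read as 9.3.2.0.
    """
    parts = [int(p) for p in vrmf.split(".")]
    parts += [0] * (4 - len(parts))
    version, release, modification, fix_pack = parts[:4]
    if modification == 0:
        # Assumed to be an LTS level; only streams in the table are known.
        threshold = LTS_FIX_LEVELS.get((version, release))
        if threshold is None:
            return None
        return fix_pack >= threshold
    # Assumed to be a CD level; compare against the targeted CD release.
    return (version, release, modification) >= CD_FIX_LEVEL
```

For instance, the VRMF(9.1.0.7) shown in the error message example is below the 9.1.0.12 target, so `includes_fix("9.1.0.7")` returns False.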
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT40017

  • Reported component name

    IBM MQ BASE MP

  • Reported component ID

    5724H7271

  • Reported release

    910

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2022-02-19

  • Closed date

    2022-10-11

  • Last modified date

    2023-02-24

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    IT43171

Fix information

  • Fixed component name

    IBM MQ BASE MP

  • Fixed component ID

    5724H7271

Applicable component levels


Document Information

Modified date:
24 February 2023