IBM Support

IJ12492: DEADLOCK GROUPPROTOCOLDRIVERTHREAD 'RPC WAIT' FOR CCMSGGROUPLEAV

APAR status

  • Closed as program error.

Error description

  • After a logAssertFailed event and a GPFS restart, the node
    showed as arbitrating and could not rejoin the cluster. The
    cluster manager node reported deadlock messages.
    
    Reported in:
    Spectrum Scale 4.1.1.15 and 4.2.3.4 mixed cluster, x86_64
    Linux
    
    Known Impact:
    Node could not join cluster.
    
    Example error message:
    The cluster manager node showed a deadlock in:
    0x7F75EC0341A0 (   5871) waiting 1509.838507008 seconds,
    GroupProtocolDriverThread: on ThCond 0x7F767C1D93B8
    (0x7F767C1D93B8) (MsgRecordCondvar), reason 'RPC wait'
    for ccMsgGroupLeave
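
    The waiter message above can be checked for mechanically. The
    following is a minimal sketch, not an IBM tool: it assumes the
    waiter text has already been captured as a string, and the
    regular expression is modelled only on the single message quoted
    in this APAR, so other releases may need a different pattern. The
    names parse_waiter and is_group_leave_stall and the 300-second
    threshold are hypothetical.

```python
import re

# Sample copied from the error description in this APAR, line wraps kept.
SAMPLE = """\
0x7F75EC0341A0 (   5871) waiting 1509.838507008 seconds,
GroupProtocolDriverThread: on ThCond 0x7F767C1D93B8
(0x7F767C1D93B8) (MsgRecordCondvar), reason 'RPC wait'
for ccMsgGroupLeave
"""

# Assumption: the pattern is derived from the one message above only.
WAITER_RE = re.compile(
    r"waiting\s+(?P<seconds>\d+(?:\.\d+)?)\s+seconds,\s+"
    r"(?P<thread>\w+):.*?reason\s+'(?P<reason>[^']+)'"
    r"(?:\s+for\s+(?P<target>\w+))?"
)


def parse_waiter(text):
    """Return a dict describing the first waiter in text, or None."""
    # Collapse line wraps and repeated spaces so the pattern sees one line.
    flat = " ".join(text.split())
    match = WAITER_RE.search(flat)
    if match is None:
        return None
    return {
        "seconds": float(match.group("seconds")),
        "thread": match.group("thread"),
        "reason": match.group("reason"),
        "target": match.group("target"),
    }


def is_group_leave_stall(waiter, threshold_seconds=300.0):
    """True if the waiter matches the signature described in this APAR.

    threshold_seconds is an arbitrary illustrative value, not an IBM default.
    """
    return (
        waiter is not None
        and waiter["thread"] == "GroupProtocolDriverThread"
        and waiter["reason"] == "RPC wait"
        and waiter["target"] == "ccMsgGroupLeave"
        and waiter["seconds"] >= threshold_seconds
    )


if __name__ == "__main__":
    print(is_group_leave_stall(parse_waiter(SAMPLE)))
```

    A match only indicates that the symptom resembles this APAR; it
    does not confirm that mmlsfileset is the cause.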
    

Local fix

  • As a workaround, kill the file system manager node or run
    the 'mmcommon breakDeadlock' command.
    

Problem summary

  • The mmlsfileset command with the -J or -F option can interfere
    with the group protocol and delay or prevent a node from
    leaving or joining the cluster. This issue shows up as nodes
    staying in the arbitrating state.
    

Problem conclusion

  • The mmlsfileset command with the -J or -F option has been sped
    up, which reduces the chance of it interfering with the group
    protocol.
    

Temporary fix

Comments

APAR Information

  • APAR number

    IJ12492

  • Reported component name

    SPECTRUM SCALE

  • Reported component ID

    5725Q01AP

  • Reported release

    423

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2019-01-03

  • Closed date

    2019-01-03

  • Last modified date

    2019-06-28

  • APAR is sysrouted FROM one or more of the following:

    IJ07765

  • APAR is sysrouted TO one or more of the following:

    IJ12513

Fix information

  • Fixed component name

    SPECTRUM SCALE

  • Fixed component ID

    5725Q01AP

Applicable component levels

  • R423 PSY U885025

       19/06/28 I 1000

Document Information

Modified date:
28 June 2019