IBM Support

IV97266: CAA: SLOW GOSSIP TRANSMISSION ON BOOT MAY CAUSE PARTITIONED CLUST
APPLIES TO AIX 7200-01 17/09/29 PTF PECHANGE

A fix is available


APAR status

  • Closed as program error.

Error description

  • **************************************************************
    * USERS AFFECTED:
    * Systems running the AIX 7200-01 Technology Level
    * with bos.cluster.rte at the 7.2.1.0 or 7.2.1.1 level.
      **************************************************************
    * ERROR DESCRIPTION:
    * After rebooting a node in either a PowerHA or VIOS SSP cluster
    * using CAA, there is a chance that the node may create its own
    * cluster, causing a split-brain / partitioned cluster in the
    * CAA environment.
    *
    * This is more likely to be seen if the network is slow and
    * there is a delay in gossip packets being received by the
    * rebooted node.
    *
    * The effect of a split-brain / partitioned cluster can vary,
    * but in the worst cases: PowerHA may react by bringing
    * resources online at the same time on multiple nodes, and
    * the VIOS SSP pool can go down on one or more nodes.
      **************************************************************
    * RECOMMENDATION:
    * Install APAR IV97266.
    * Until the fix is available, an interim fix (ifix) can be
    * downloaded from either of the following locations:
    * ftp://aix.software.ibm.com/aix/ifixes/iv97266/
    * https://aix.software.ibm.com/aix/ifixes/iv97266/
    * The ifix can be installed using Live Update (LU).
    * If LU is not used, installation of the ifix requires a
    * reboot.
      **************************************************************
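
The "USERS AFFECTED" criterion above can be expressed as a small check. This is a minimal sketch, not an IBM tool: the function name is hypothetical, and it assumes the installed level of bos.cluster.rte has already been read by hand (for example from the output of lslpp -L bos.cluster.rte). It only compares the level against the two levels this APAR names and says nothing about whether the ifix is already installed.

```python
# Levels of bos.cluster.rte that this APAR lists as affected.
AFFECTED_LEVELS = {"7.2.1.0", "7.2.1.1"}


def is_affected(fileset_level: str) -> bool:
    """Return True if the given bos.cluster.rte level is one the APAR lists.

    Hypothetical helper: the level string is supplied by the caller.
    """
    return fileset_level.strip() in AFFECTED_LEVELS


if __name__ == "__main__":
    for level in ("7.2.1.0", "7.2.1.1", "7.2.2.0"):
        status = "listed as affected" if is_affected(level) else "not listed as affected"
        print(level, status)
```

A level outside the set is only "not listed" by this APAR; it may still need other fixes.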
    

Local fix

  • n/a
    

Problem summary

  • PROBLEM SUMMARY:
    After rebooting a node in either a PowerHA or VIOS SSP
    cluster using CAA, there is a chance that the node may
    create its own cluster, causing a split-brain / partitioned
    cluster in the CAA environment.
    This is more likely to be seen if the network is slow and
    there is a delay in gossip packets being received by the
    rebooted node.
    The effect of a split-brain / partitioned cluster can vary,
    but in the worst cases: PowerHA may react by bringing
    resources online at the same time on multiple nodes, and
    the VIOS SSP pool can go down on one or more nodes.
    

Problem conclusion

  • All initial cluster-wide lock requests pass through a gate.
    That gate should consider the count of nodes heartbeating to
    the repository in addition to those gossiping over the network.
    There was a hole in the gate, and the fix closes it.
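
The idea behind the gate can be illustrated with a toy model. This is a sketch under stated assumptions, not CAA source code: the class, the function names, and the decision rule (a booting node may take the initial lock alone only when it sees no other live node) are all hypothetical and chosen only to show why counting repository heartbeats matters when gossip is delayed.

```python
from dataclasses import dataclass, field


@dataclass
class BootView:
    """What a freshly booted node can see (hypothetical model, not CAA's structures)."""
    gossip_peers: set = field(default_factory=set)      # peers heard over the network
    repository_peers: set = field(default_factory=set)  # peers heartbeating to the repository


def gate_network_only(view: BootView) -> bool:
    """Flawed gate (assumed): allow the initial lock if no gossip has arrived."""
    return len(view.gossip_peers) == 0


def gate_with_repository(view: BootView) -> bool:
    """Gate that also counts nodes seen heartbeating to the repository."""
    live_peers = view.gossip_peers | view.repository_peers
    return len(live_peers) == 0


if __name__ == "__main__":
    # Slow network: no gossip received yet, but nodeB is alive on the repository.
    view = BootView(gossip_peers=set(), repository_peers={"nodeB"})
    print("network-only gate allows lock:", gate_network_only(view))
    print("repository-aware gate allows lock:", gate_with_repository(view))
```

In this model, the network-only gate lets the rebooted node proceed as if it were alone, which corresponds to the partitioned cluster described above, while the repository-aware gate holds it back.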
    

Temporary fix

  •   *********
      * HIPER *
      *********
    

Comments

APAR Information

  • APAR number

    IV97266

  • Reported component name

    AIX V7.2

  • Reported component ID

    5765CD200

  • Reported release

    720

  • Status

    CLOSED PER

  • PE

    Yes

  • HIPER

    Yes

  • Submitted date

    2017-06-16

  • Closed date

    2017-06-16

  • Last modified date

    2017-11-07

  • APAR is sysrouted FROM one or more of the following:

    IV97148

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    AIX V7.2

  • Fixed component ID

    5765CD200

Applicable component levels

  • R720 PSY U873440

       UP17/09/21 I 1000

PTF to Fileset Mapping


Document Information

Modified date:
07 November 2017