IBM Support

IJ31564: CLUSTER NODE AUTOMATICALLY REBOOTED DURING ROLLING MIGRATION



APAR status

  • Closed as program error.

Error description

  • An attempt was made to upgrade both the AIX OS and PowerHA.
    
    Below is the process that was followed:
    
    Node 1:
    
    1) Sync the cluster.
    
    2) Stop cluster services and CAA on the passive node.
    
    3) Apply OS patches on the passive node.
    
    4) Upgrade PowerHA on the passive node.
    
    5) Reboot the server.
    
    6) Start cluster services along with CAA.
    
    -- One week later --
    
    Node 2:
    
    7) Perform a resource group failover.
    
    8) Stop cluster services and CAA on the passive node.
    
    9) Apply OS patches on the passive node.
    
    10) Upgrade PowerHA on the passive node.
    
    11) Reboot the server.
    
    12) Start cluster services along with CAA.
    
    13) Once everything is up, sync the cluster.
    
    Question: why did the active cluster node (the one still running
    the down-level version) crash?
    
    After step 6 and before step 7, Node 2 (the node still running
    the down-level PowerHA version) crashed with the following error.
    This happened about 10-15 minutes after step 6 was completed and
    the cluster was left running.
    
    clstrmgrES[11075902]: 2021-01-19T13:32:57|clstrmgrES: received invalid rginfo request version
    PowerHA SystemMirror for AIX: clexit.rc : Unexpected termination of clstrmgrES.
    PowerHA SystemMirror for AIX: clexit.rc : Halting system immediately!!!
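When triaging a crash like the one above, it can help to check whether another node's logs carry the same signature. The following is a minimal Python sketch, not an IBM-provided tool: the function `scan` and the `Finding` record are hypothetical names, and the sketch assumes log lines have the same shape as the excerpt above. It does not name a specific log file; which log to feed it is left to the reader.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical helper: matches the clstrmgrES line format seen in this APAR,
#   clstrmgrES[PID]: TIMESTAMP|clstrmgrES: MESSAGE
# re.search (not match) tolerates any prefix a syslog daemon may add.
CLSTRMGR_RE = re.compile(
    r"clstrmgrES\[(?P<pid>\d+)\]: (?P<ts>[^|]+)\|clstrmgrES: (?P<msg>.*)$"
)

# Message fragments copied from the log excerpt in this APAR.
RGINFO_ERROR = "received invalid rginfo request version"
CLEXIT_HALT = "clexit.rc : Halting system immediately"


class Finding(NamedTuple):
    """Hypothetical result record for one scan."""
    pid: Optional[int]
    timestamp: Optional[str]
    halted: bool


def scan(lines):
    """Return a Finding if the lines contain the rginfo error, else None."""
    pid = ts = None
    seen_rginfo = False
    halted = False
    for raw in lines:
        line = raw.strip()
        m = CLSTRMGR_RE.search(line)
        if m and RGINFO_ERROR in m.group("msg"):
            # Remember the clstrmgrES PID and timestamp of the bad request.
            seen_rginfo = True
            pid, ts = int(m.group("pid")), m.group("ts")
        elif CLEXIT_HALT in line:
            # clexit.rc halted the node after clstrmgrES terminated.
            halted = True
    if not seen_rginfo:
        return None
    return Finding(pid, ts, halted)
```

Fed the three lines from the excerpt above, `scan` returns `Finding(pid=11075902, timestamp='2021-01-19T13:32:57', halted=True)`. A match only shows that the symptom is the same; it does not confirm that this APAR is the cause.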
    

Local fix

Problem summary

  • The resource group can be online on both nodes of the cluster in
    the middle of a rolling migration from 724 to 725.
    
    A node may also be rebooted during the migration.
    
    The same problem can occur when migrating from 723 to 724 or 725.
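The affected migration paths listed above can be expressed as a simple check on node levels. This Python sketch is illustrative only: `affected_window`, `AFFECTED_PATHS`, and the node names are hypothetical, and it assumes release levels are written in the same three-digit form this APAR uses (723, 724, 725). It does not query a live cluster; the caller must supply each node's installed level.

```python
from itertools import combinations

# Hypothetical table: (lower, higher) release pairs that the Problem summary
# names as affected during a rolling migration.
AFFECTED_PATHS = {("723", "724"), ("723", "725"), ("724", "725")}


def affected_window(node_levels):
    """Return True if the cluster is mixed-level on an affected path.

    node_levels maps a (hypothetical) node name to its installed PowerHA
    release, e.g. {"node1": "725", "node2": "724"}. During a rolling
    migration the nodes run different levels until the last node is
    upgraded; that mixed state is when this APAR's symptoms were reported.
    """
    # Sorting the distinct levels makes each pair come out as (lower, higher).
    levels = sorted(set(node_levels.values()))
    return any(pair in AFFECTED_PATHS for pair in combinations(levels, 2))
```

For example, a two-node cluster with one node at 725 and the other still at 724 (the state between steps 6 and 12 of the procedure in the Error description, assuming that upgrade was 724 to 725) gives `True`; once both nodes report 725 it gives `False`.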
    

Problem conclusion

  • The code was modified to support the new node structure, and
    unnecessary migration bits are now cleared during the migration
    process.
    

Temporary fix

  • Not available
    

Comments

APAR Information

  • APAR number

    IJ31564

  • Reported component name

    POWERHA SYSMIR

  • Reported component ID

    5765H3900

  • Reported release

    724

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Submitted date

    2021-03-15

  • Closed date

    2021-03-25

  • Last modified date

    2021-03-25

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    IJ31615

Fix information

  • Fixed component name

    POWERHA SYSMIR

  • Fixed component ID

    5765H3900

Applicable component levels

  • Product

    PowerHA SystemMirror Standard Edition for AIX (SSLM9V)

  • Platform

    Power Systems

  • Version

    724

  • Line of business

    Power

  • Business unit

    IBM Infrastructure w/TPS

Document Information

Modified date:
26 March 2021