IBM Support

Activation of a hot spare node may cause node warmstarts in another cluster

Flashes (Alerts)


Abstract

If an SVC cluster activates a Hot Spare Node, and a second SVC cluster or Storwize/FlashSystem controller is visible on the SAN, the remote system may experience node warmstarts and temporary loss of access to data.

This issue will be fixed in a future software version under APAR HU02213. Until then, it can be avoided by removing the hot spare node from the cluster before upgrading.

Content

APAR HU02213 is a software issue related to the SVC hot spare node feature.
When a hot spare node is activated on a cluster without the fix for APAR HU02213, a small timing window can cause the cluster to broadcast an incorrect message to other clusters on the SAN. This can trigger node warmstarts on the remote cluster.
If a hot spare node is configured, it is activated whenever another node goes offline, for example during a software upgrade or hardware maintenance. In these scenarios the probability of the issue occurring is low, but the risk can be eliminated either by isolating the cluster from others on the SAN or by removing the hot spare node from the cluster.
The risk of the issue occurring during maintenance on an individual node is much lower, because the hot spare node is only activated and deactivated once, as opposed to once per node for a software upgrade. 
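As a rough illustration of why the number of activations matters, suppose each hot spare activation independently hits the timing window with the same small probability p. Both the symbol p and the independence assumption are illustrative only and are not IBM figures. For maintenance on one node, compared with a software upgrade of an N-node cluster:

```latex
P_{\text{maintenance}} = p, \qquad
P_{\text{upgrade}} = 1 - (1 - p)^{N} \approx N p \quad (p \ll 1)
```

Under these assumptions, the exposure during an upgrade grows roughly in proportion to the number of nodes in the cluster.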
Procedure to isolate a remote copy partner cluster
If the SVC cluster has a remote copy partner but no Storwize / FlashSystem controllers, the following procedure is sufficient to prevent the issue:
1. Stop any remote copy partnerships. Use the GUI, or the CLI command chpartnership -stop <cluster_id>.
2. Disable SAN zoning between the cluster to be upgraded and any other SVC cluster.
3. Complete the software upgrade or node hardware maintenance.
4. Re-enable SAN zoning between the clusters.
5. Start any remote copy partnerships. Use the GUI, or the CLI command chpartnership -start <cluster_id>.
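The CLI parts of the steps above (steps 1 and 5) could be wrapped as in the following minimal sketch. The SVC_CMD variable and the function names are hypothetical and are not part of the product. SVC_CMD stands for whatever prefix reaches the cluster CLI in your environment, and it defaults to echo so that the functions only print the command they would run. The zoning changes in steps 2 and 4 depend on the SAN fabric and are not covered.

```shell
#!/bin/sh
# Sketch only. SVC_CMD is an assumed command prefix for reaching the cluster
# CLI (for example "ssh <user>@<cluster_ip>"). It defaults to "echo", so by
# default each function just prints the command instead of running it.
SVC_CMD="${SVC_CMD:-echo}"

# Step 1: stop the remote copy partnership with the given remote cluster.
stop_partnership() {
    $SVC_CMD chpartnership -stop "$1"
}

# Step 5: restart the partnership after SAN zoning has been re-enabled.
start_partnership() {
    $SVC_CMD chpartnership -start "$1"
}
```

With the default setting, calling stop_partnership <cluster_id> prints the chpartnership -stop command for review. Setting SVC_CMD to a real CLI prefix would run it instead.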
Procedure to remove the hot spare node from the cluster
If the SVC cluster has Storwize / FlashSystem controllers, the hot spare node must be removed from the cluster before upgrading. This prevents the issue.
To remove the hot spare node, use the GUI or the rmnode <node_id> CLI command.
After the upgrade has completed, re-add the hot spare node using the GUI or the addnode -panelname <panel_name> -spare CLI command.
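The two CLI commands above could be wrapped as in the following minimal sketch. The SVC_CMD variable and the function names are hypothetical and are not part of the product. SVC_CMD defaults to echo, so the functions only print the command they would run unless it is set to a real CLI prefix.

```shell
#!/bin/sh
# Sketch only. SVC_CMD is an assumed command prefix for reaching the cluster
# CLI (for example "ssh <user>@<cluster_ip>"). It defaults to "echo", so by
# default each function just prints the command instead of running it.
SVC_CMD="${SVC_CMD:-echo}"

# Before the upgrade: remove the hot spare node, identified by its node ID.
remove_hot_spare() {
    $SVC_CMD rmnode "$1"
}

# After the upgrade: re-add the node as a spare, identified by its panel name.
readd_hot_spare() {
    $SVC_CMD addnode -panelname "$1" -spare
}
```

With the default setting, remove_hot_spare <node_id> and readd_hot_spare <panel_name> print the commands so they can be checked before being run against the cluster.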


Document Information

Modified date:
09 November 2020

UID

ibm16356439