IBM Support

Compute node initialization might take longer than normal or might fail

Troubleshooting


Problem

IBM Cloud Pak System has two storage nodes that work together in high availability (HA) mode. If one of the storage nodes goes down, the compute nodes on the rack might take a long time (a few hours) to initialize, or the initialization might fail.

Symptom

On the Management console, a compute node can take several hours to move from the “Initializing” status to the “Available” status.
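If you record compute node status transitions yourself (for example, by noting what the Management console shows), a check like the following can flag nodes that stay in the “Initializing” status unusually long. This is a hypothetical sketch: the record fields (`name`, `status`, `status_since`), the `stuck_nodes` function, and the one-hour threshold are assumptions, because this document defines neither a normal initialization time nor a programmatic status interface.

```python
from datetime import datetime, timedelta

# Hypothetical threshold: this technote only says the delay can be "hours".
STUCK_AFTER = timedelta(hours=1)


def stuck_nodes(nodes, now, threshold=STUCK_AFTER):
    """Return names of compute nodes that have been "Initializing" longer than threshold.

    `nodes` is a list of dicts with hypothetical keys: name, status, status_since.
    """
    return [
        n["name"]
        for n in nodes
        if n["status"] == "Initializing" and now - n["status_since"] > threshold
    ]


# Illustrative records only; not real console output.
now = datetime(2020, 6, 12, 12, 0)
nodes = [
    {"name": "compute-1", "status": "Available", "status_since": datetime(2020, 6, 12, 8, 0)},
    {"name": "compute-2", "status": "Initializing", "status_since": datetime(2020, 6, 12, 9, 0)},
    {"name": "compute-3", "status": "Initializing", "status_since": datetime(2020, 6, 12, 11, 45)},
]
print(stuck_nodes(nodes, now))  # ['compute-2']
```

A node flagged this way is a prompt to check whether both storage nodes are online, not proof of the condition described here.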

Cause

IBM Cloud Pak System has two storage nodes, each with two ports. By design, compute nodes boot from port2 of each storage node, so when a storage node goes down, a compute node has only one port through which to reach its boot LUN. With one storage node down, the load on the remaining storage node can become high, depending on the workload environment. That load can cause network congestion that interferes with the compute node’s communication with the remaining storage node on port2. If a compute node attempts to initialize during that time, the remaining storage node might take a long time to respond.
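The boot-path reasoning above can be modeled in a few lines. This is a minimal sketch, assuming a simplified model in which each storage node is either online or offline; the names `StorageNode`, `boot_paths`, and `ha_intact` are hypothetical and are not part of any IBM Cloud Pak System interface. It only illustrates why losing one storage node leaves a compute node with a single boot path.

```python
from dataclasses import dataclass

# Per this technote, compute nodes boot from port2 of each storage node.
BOOT_PORT = "port2"


@dataclass(frozen=True)
class StorageNode:
    """Hypothetical model of a storage node: a name, an online flag, and two ports."""
    name: str
    online: bool
    ports: tuple = ("port1", "port2")


def boot_paths(storage_nodes):
    """Return the (storage node, port) pairs a compute node can use to reach its boot LUN."""
    return [(n.name, BOOT_PORT) for n in storage_nodes if n.online and BOOT_PORT in n.ports]


def ha_intact(storage_nodes):
    """HA holds only when every storage node still offers a boot path."""
    return len(boot_paths(storage_nodes)) == len(storage_nodes)


both_up = [StorageNode("storage-1", True), StorageNode("storage-2", True)]
one_down = [StorageNode("storage-1", True), StorageNode("storage-2", False)]

print(boot_paths(both_up))   # [('storage-1', 'port2'), ('storage-2', 'port2')]
print(boot_paths(one_down))  # [('storage-1', 'port2')]
print(ha_intact(one_down))   # False
```

The model deliberately ignores load: it shows only the loss of redundancy, while the congestion described above is what turns that single remaining path into a slow one.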

Diagnosing The Problem

Address the issues that cause a storage node to be unavailable or offline as soon as possible. Running the device or its workloads with access to only a single storage node puts normal device operations at risk.

Resolving The Problem

Restore the storage node HA as soon as possible.

Document Location

Worldwide

Product: IBM Cloud Pak System Software
Versions: 2.3.0, 2.3.1, 2.3.2
Platform: Platform Independent
Business Unit: Cloud & Data Platform
Line of Business: Automation

Document Information

Modified date:
12 June 2020

UID

ibm16225884