IBM Fusion Data Foundation service error scenarios

Use this troubleshooting information to identify the problem and workaround when you install or configure the IBM Fusion Data Foundation service.

Local storage operator unable to find candidate storage nodes

When you configure an IBM Fusion Data Foundation cluster, you do not find any candidate storage nodes.

Cause
When you configure an IBM Fusion Data Foundation cluster, only compute nodes with available disks (SSD/NVMe or HDD) are displayed in the Data Foundation page of the IBM Storage Fusion user interface. The following nodes are filtered out and are not displayed on the screen (see the inspection sketch after this list):
  • Nodes that have SSD/NVMe or HDD disks, but the disks are not in the available state
  • Nodes whose disks do not match the selected disk properties, such as disk size or disk type
  • Nodes whose total count of disks with the same disk size and disk type is less than 3
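To check which disks the local storage operator discovered on each node, and whether they are in the available state, you can inspect the discovery results directly. The following is a minimal sketch, assuming the jq command-line tool is available and that the field names match your Local Storage Operator version:

    oc get localvolumediscoveryresult -n openshift-local-storage -o json \
      | jq -r '.items[]
          | .spec.nodeName as $node
          | .status.discoveredDevices[]?
          | select(.status.state == "Available")
          | [$node, .path, .type, (.size | tostring)]
          | @tsv'

Each output row shows a node name followed by the path, type, and size in bytes of a disk in the available state; compare these rows against the three filter conditions in the list.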
Steps to verify whether you have the correct storage node candidates
  1. In Red Hat OpenShift Container Platform console, go to Operators > Installed Operators.
  2. Verify whether the LocalStorage operator is installed successfully.
  3. Run the following command to get all the worker nodes:
    oc get node -l node-role.kubernetes.io/worker=
  4. Run the following command to check whether discovery results are created for all worker nodes:
    oc get localvolumediscoveryresult -n openshift-local-storage
  5. Run the following command to confirm that none of the nodes have an IBM Fusion Data Foundation storage label:
    oc get node -l cluster.ocs.openshift.io/openshift-storage=
Note:
If all the preceding checks pass but the node still cannot be seen in the IBM Storage Fusion user interface, contact IBM support.
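If you prefer to run the verification from a terminal, the following sketch combines steps 3 through 5 into a single pass; it assumes that the oc CLI is installed and logged in to the cluster:

    # Step 3: list all worker nodes.
    oc get node -l node-role.kubernetes.io/worker=

    # Step 4: confirm that a discovery result exists for each worker node.
    oc get localvolumediscoveryresult -n openshift-local-storage

    # Step 5: list nodes that already carry the storage label. For a new
    # configuration, this command is expected to return no nodes.
    oc get node -l cluster.ocs.openshift.io/openshift-storage=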

IBM Fusion Data Foundation capacity cannot be loaded

If you encounter this issue in the Data Foundation page of the IBM Storage Fusion user interface, contact IBM support.

IBM Fusion Data Foundation cluster fails due to pending StorageClusterPreparing stage

In this stage, the PVC is not created and the odfcluster status shows the following:

conditions:
  - lastTransitionTime: "2022-12-01T15:09:47Z"
    message: storagecluster is not ready,install pending
    reason: StorageClusterPreparing
    status: "False"
    type: Ready
phase: InProgress
replica: 1
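To view this status on your own cluster, the following is a minimal sketch; it assumes that the odfcluster resource kind resolves on your cluster, and it searches all namespaces because the namespace of the resource can vary by release:

    # Show every odfcluster resource, including its status conditions.
    oc get odfcluster -A -o yaml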
To diagnose and fix the problem, do the following steps:
  1. Run the following command to open the storagecluster CR:
    oc get storageclusters.ocs.openshift.io -n openshift-storage ocs-storagecluster -o yaml
  2. Check whether the output of the command shows the following error message in the status:
    ConfigMap "ocs-kms-connection-details" not found
    Output example:
    
    status:
      conditions:
      - lastHeartbeatTime: "2023-03-29T08:01:10Z"
        lastTransitionTime: "2023-03-29T07:49:47Z"
        message: 'Error while reconciling: some StorageClasses were skipped while waiting
          for pre-requisites to be met: [ocs-storagecluster-cephfs,ocs-storagecluster-ceph-rbd]'
        reason: ReconcileFailed
        status: "False"
        type: ReconcileComplete
  3. If you notice the error message, check the rook-ceph operator logs with the following command:
    oc logs -n openshift-storage $(oc get pod -n openshift-storage -l app=rook-ceph-operator -o name)
    Example output:
    2023-03-29 07:55:41.297073 E | ceph-cluster-controller: failed to reconcile CephCluster "openshift-storage/ocs-storagecluster-cephcluster". failed to reconcile
    cluster "ocs-storagecluster-cephcluster": failed to configure local ceph cluster: failed to perform validation before cluster creation: failed to validate kms connection details: failed to get backend version: failed to list vault system mounts: Error making API
    request.
    URL: GET https://9.9.9.75:8200/v1/sys/mounts
    Code: 403. Errors: * permission denied
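The 403 permission denied response indicates that the request to the Vault key management service (KMS) was rejected. As a starting point for diagnosis, the following sketch inspects the KMS connection details that the error messages reference; the ConfigMap name is taken from the error in step 2, and the Vault CLI commands are assumptions to be run on the Vault side:

    # Inspect the KMS connection details that the operator reads.
    oc get configmap ocs-kms-connection-details -n openshift-storage -o yaml

    # On the Vault server, verify that the configured token and its policy
    # allow the operator to list system mounts (Vault CLI assumed):
    #   vault token lookup
    #   vault policy read <policy-name>

If the token or its policy lacks the required permissions, correct it in Vault so that the reconciliation can proceed.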