IBM Storage Fusion 2.8.0 Hotfix for the GDP OCP issues

IBM Storage Fusion 2.8.0 is impacted by an IBM Storage Scale container native failure on certain OpenShift Container Platform (OCP) versions.

A recent Linux kernel update that addresses the CVE-2024-25744 Linux security vulnerability causes the mmbuildgpl command to fail while building the IBM Storage Scale kernel portability layer. As a result, IBM Storage Scale cannot get into an active state on a node with the updated kernel.
For more information about the kernel changes, see the Red Hat support page.

RHEL 9.2 kernel versions 5.14.0-284.66.1.el9_2.x86_64 and higher impact IBM Storage Scale.

OpenShift levels that contain the impacted kernels (x86_64 only) for IBM Storage Scale Container Native:

  • 4.15.13 or higher versions
  • 4.14.25 or higher versions
  • 4.13.42 or higher versions
     
Apply this hotfix proactively to avoid IBM Storage Scale and OpenShift Container Platform version compatibility issues.

How to identify the problem

  • The following example steps show how to identify the error and recover from a failure of IBM Storage Scale Container Native after an OpenShift upgrade.
  • They are applicable when IBM Storage Scale Container Native 5.2.0.0 is upgraded to any of the following OpenShift levels:
    4.13.42 or higher versions
    4.14.25 or higher versions
    4.15.13 or higher versions
  1. Check whether the OpenShift version is 4.13.42, 4.14.25, 4.15.13, or a higher version. If the OpenShift version is at a lower level, you might not face this issue.
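
    For example, run the following command to check the current OpenShift version:

    oc get clusterversion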

  2. Run the following command to check whether a worker node is set to the SchedulingDisabled state. It indicates the next worker node that is scheduled for a Red Hat OpenShift machine config upgrade rollout.

    oc get nodes -o wide
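
    For example, to list only the nodes that are currently cordoned (a minimal filter):

    oc get nodes | grep SchedulingDisabled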
  3. Run the following command to check whether at least one scale-core pod is in the Init:CrashLoopBackOff state.

    oc get pods -o wide
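
    To filter directly for pods in this state (a minimal filter, assuming the scale pods run in the ibm-spectrum-scale namespace):

    oc get pods -n ibm-spectrum-scale | grep CrashLoopBackOff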
  4. Run the following command to check the logs from the mmbuildgpl container of the scale core pod that is in the Init:CrashLoopBackOff state, and search for the __st_ino error, which is a signature of this issue.

    oc logs worker2 -c mmbuildgpl
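
    To search the log directly for the error signature (a minimal filter, assuming the same pod and container names):

    oc logs worker2 -c mmbuildgpl | grep st_ino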
    • The example output shows that the worker2 pod is in an Init:CrashLoopBackOff state because mmbuildgpl fails to compile the portability layer that is used as the kernel tie-in for IBM Storage Scale Container Native. The mmbuildgpl command failed due to a defect that created an incompatibility with RHCOS 9 EUS kernel level 5.14.0-284.66.1.el9_2 and higher versions.
    • The OpenShift Machine Config Operator (MCO) rolled out a new configuration on the underlying worker2 node but could not progress to the next node, because the cluster integrity protection of IBM Storage Scale prevented the draining of the next scale core pod. As a result, the Red Hat OpenShift upgrade stalled because the MCO rollout was held for a long time.

    Example output:
    ....
    Invoking Kbuild...
    /usr/bin/make -C /usr/src/kernels/5.14.0-284.66.1.el9_2.x86_64 ARCH=x86_64 M=/usr/lpp/mmfs/src/gpl-linux CONFIGDIR=/usr/lpp/mmfs/src/config ; \

    if [ $? -ne 0 ]; then \
    exit 1;\
    fi
    make[2]: Entering directory '/usr/src/kernels/5.14.0-284.66.1.el9_2.x86_64'
    CC [M] /usr/lpp/mmfs/src/gpl-linux/tracelin.o
    CC [M] /usr/lpp/mmfs/src/gpl-linux/tracedev-ksyms.o
    CC [M] /usr/lpp/mmfs/src/gpl-linux/ktrccalls.o
    CC [M] /usr/lpp/mmfs/src/gpl-linux/relaytrc.o
    LD [M] /usr/lpp/mmfs/src/gpl-linux/tracedev.o
    CC [M] /usr/lpp/mmfs/src/gpl-linux/mmfsmod.o
    LD [M] /usr/lpp/mmfs/src/gpl-linux/mmfs26.o
    CC [M] /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o
    In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:61,
                     from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:54:
    /usr/lpp/mmfs/src/gpl-linux/kx.c: In function 'vstat':
    /usr/lpp/mmfs/src/gpl-linux/kx.c:238:12: error: 'struct stat' has no member named '__st_ino'; did you mean 'st_ino'?
      238 | statbuf->__st_ino = vattrp->va_ino;
          | ^~~~~~~~
          | st_ino
    make[3]: *** [scripts/Makefile.build:321: /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o] Error 1
    make[2]: *** [Makefile:1923: /usr/lpp/mmfs/src/gpl-linux] Error 2
    make[2]: Leaving directory '/usr/src/kernels/5.14.0-284.66.1.el9_2.x86_64'
    make[1]: *** [makefile:140: modules] Error 1
    make[1]: Leaving directory '/usr/lpp/mmfs/src/gpl-linux'
    make: *** [makefile:145: Modules] Error 1
    --------------------------------------------------------
    mmbuildgpl: Building GPL module failed at Mon May 20 17:27:51 UTC 2024.
    --------------------------------------------------------
    mmbuildgpl: Command failed. Examine previous error messages to determine cause.
    cleanup run

     

Resolution

    1. Recover from the failure state where a single scale-core pod is in the Init:CrashLoopBackOff state as follows (see the example commands after this list):
      • For HCI:
        1. Add the field enableManualInstallation: true to the scalemanager CR spec.
        2. Replace the isf-storage-operator-controller-manager image with the following new image in the installed operator CSV.
          isf-storage-operator: cp.icr.io/cp/isf/isf-storage-operator@sha256:df694029549fce04cec5236832970c4a537b85eec6a7289e27a9ae21c9d4f5de
        3. Replace the isf-cns-operator image in the cr-version-cm config map of the ibm-spectrum-fusion-ns namespace.
          isf-cns-operator: cp.icr.io/cp/isf/isf-cns-operator@sha256:5023fc0c078a832d8c14b5ab5c6a02e7c1f6370f9d39cb6567db7557fbe05a27
        4. Replace the isf-storage-services image in the cr-version-cm config map of the ibm-spectrum-fusion-ns namespace.
          isf-storage-services: cp.icr.io/cp/isf/isf-storage-services@sha256:5d980a0d311b6eccfbd9fddb5af07395bbe52c066c925e4542e281f5045b9c53
      • For SDS:
        1. Replace the isf-cns-operator image with the following new image in the installed operator CSV.
          isf-cns-operator: cp.icr.io/cp/isf-sds/isf-cns-operator@sha256:734d34758f579fcbb22a15500fc40cbcf84d42fba32e0d9f233c263a7df3a37d
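
      The following example commands show one way to apply these replacements. The installed operator CSV name and the scalemanager CR instance name are placeholders that depend on your installation, and the scalemanager resource name in the last command is an assumption based on the CR referenced above:

        oc get csv -n ibm-spectrum-fusion-ns
        oc edit csv <installed operator CSV name> -n ibm-spectrum-fusion-ns
        oc edit configmap cr-version-cm -n ibm-spectrum-fusion-ns
        oc edit scalemanager <scalemanager CR name> -n ibm-spectrum-fusion-ns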

    2. Run the following commands to make sure that all the pods are running.
      oc get pods -n ibm-spectrum-fusion-ns | grep isf-storage-service-dep
      oc get pods -n ibm-spectrum-fusion-ns | grep isf-storage-operator-controller-manager
      oc get pods -n ibm-spectrum-fusion-ns | grep isf-cns-operator-controller-manager  


      Example output:
      ~ % oc get pods -n ibm-spectrum-fusion-ns | grep isf-storage-service-dep
      isf-storage-service-dep-6c874667f-km4xv                           1/1     Running   0          86m
      ~ % oc get pods -n ibm-spectrum-fusion-ns | grep isf-storage-operator-controller-manager
      isf-storage-operator-controller-manager-86ccc69c4d-hjh65          2/2     Running   0          87m
      ~ % oc get pods -n ibm-spectrum-fusion-ns | grep isf-cns-operator-controller-manager    
      isf-cns-operator-controller-manager-6748dcd68f-ndrjt              2/2     Running   0          87m
      
    3. For SDS, click the Upgrade button on the IBM Storage Fusion user interface to upgrade the Global Data Platform (GDP) service.

    4. For HCI, run the following command to set triggerUpdate to true in the Scale CR.

      oc patch Scale storagemanager -n ibm-spectrum-fusion-ns --type='json' -p='[{"op": "replace", "path": "/spec/triggerUpdate", "value": true}]'
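
      To confirm that the field is set, you can read it back (an optional check):

        oc get Scale storagemanager -n ibm-spectrum-fusion-ns -o jsonpath='{.spec.triggerUpdate}'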
    5. For HCI, after a new isf-storage-service-dep pod comes up with the latest image, run the command curl -k https://isf-scale-svc/api/v1/upgradeWithOperator in the isf-storage-service-dep pod terminal.

      ~ % oc project ibm-spectrum-fusion-ns
      Now using project "ibm-spectrum-fusion-ns"
      ~ % oc rsh <isf-storage-service-dep pod name>
      sh-4.4# curl -k https://isf-scale-svc/api/v1/upgradeWithOperator
      {"status":"Deployed ECE and CSI upgrade yaml files on OCP cluster successfully"}sh-4.4# exit
      exit
      
    6. Verify the upgrade status as follows:

      • In the ibm-spectrum-scale-operator namespace, verify whether the coreECE image is updated in the ibm-spectrum-scale-manager-config config map (see the example commands after this list).

        coreECE: cp.icr.io/cp/spectrum/scale/erasure-code/ibm-spectrum-scale-daemon@sha256:9adcab69b470572b1dd3ef2d965d9e3873165612ac3b8e089374d8d53d979841

      • In the ibm-spectrum-scale-operator namespace, verify whether the new pod is in a Running state.

      • In the ibm-spectrum-scale-csi namespace, verify whether the new pod is in a Running state.
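
      The following example commands show one way to run these checks:

        oc get configmap ibm-spectrum-scale-manager-config -n ibm-spectrum-scale-operator -o yaml | grep coreECE
        oc get pods -n ibm-spectrum-scale-operator
        oc get pods -n ibm-spectrum-scale-csi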

    7. Monitor the upgrade status as follows:

      • Nodes start rebooting one by one after successful patching.

      • The new scale pods come up in the ibm-spectrum-scale namespace after all the nodes are restarted.

      • Run the following command to get the pod details in the ibm-spectrum-scale namespace.

        oc get pods -n ibm-spectrum-scale
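
      • To watch the pods continuously as the nodes restart, you can add the -w flag:

        oc get pods -n ibm-spectrum-scale -w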
    8. Run the following command to log in to any scale core pod.

      oc rsh <podname>
    9. Run the following command to check the scale service state on all scale core pods.

      mmgetstate -a
    10. Run the following command to check whether the filesystem is mounted on all the nodes.

      mmlsmount all
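
      To also list the nodes on which each file system is mounted, you can add the -L option:

      mmlsmount all -L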
      • It is alright to proceed when only a single scale-core pod is in the Init:CrashLoopBackOff state and all other scale-core pods are in a Running state.
      • The scale-core pod that is already in an Init:CrashLoopBackOff state can continue to remain in the same state even after the scale-core pods are updated. In such a case, delete the single scale-core pod that is in the Init:CrashLoopBackOff state (see the example commands after this list).

      • The deletion causes the pod to recycle and reach a Running state, and IBM Storage Scale no longer blocks the Red Hat OpenShift Machine Config Operator (MCO).

      • Run the oc get mcp command to check the machine config pools, and update the rest of the nodes to complete the Red Hat OpenShift upgrade. Follow the upgrade instructions to validate the upgrade status and ensure that all pods are in a Running state.
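
      A minimal example of the deletion and the MCO check, assuming the stuck pod runs in the ibm-spectrum-scale namespace:

        oc delete pod <scale-core pod name> -n ibm-spectrum-scale
        oc get mcp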

    11. All core pods must be in a Running state with new versions after you complete the upgrade steps.

      For offline mirroring, add the required images listed above to your offline registry along with the IBM Storage Scale 5.2.0.1 images. To get the IBM Storage Scale 5.2.0.1 images, contact IBM Support.

[{"Type":"MASTER","Line of Business":{"code":"LOB69","label":"Storage TPS"},"Business Unit":{"code":"BU048","label":"IBM Software"},"Product":{"code":"SSFETU","label":"IBM Storage Fusion"},"ARM Category":[{"code":"a8m3p0000000rXCAAY","label":"SW"}],"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"All Versions"},{"Type":"MASTER","Line of Business":{"code":"LOB69","label":"Storage TPS"},"Business Unit":{"code":"BU048","label":"IBM Software"},"Product":{"code":"SSXEFDS","label":"IBM Storage Fusion HCI Appliance Software"},"ARM Category":[{"code":"a8m3p0000000rX7AAI","label":"HW"}],"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"All Versions"}]

Document Information

Modified date:
27 June 2024

UID

ibm17157065