A recent Linux kernel update that addresses the CVE-2024-25744 security vulnerability causes the mmbuildgpl command to fail while it builds the IBM Storage Scale kernel portability layer. As a result, IBM Storage Scale cannot reach an active state on a node that runs the updated kernel.
IBM Support page: https://www.ibm.com/support/pages/node/7155787
OpenShift levels containing the kernels (x86_64 only) that impact the IBM Storage Scale Container Native:
- 4.13.42 or higher versions
- 4.14.25 or higher versions
The following example steps show how to identify the error and recover from a failure of the IBM Storage Scale Container Native after an OpenShift upgrade.
How to identify the problem
- Run oc get nodes -o wide and check whether the KERNEL-VERSION column lists the kernel 5.14.0-284.66.1 or a higher version. If the kernel version is lower than 5.14.0-284.66.1, then you might not face this issue.
- Run the following command to check whether any node is in the SchedulingDisabled state. That state indicates the next compute node that is scheduled for a Red Hat OpenShift machine config upgrade rollout.
oc get nodes
Example output:
NAME                  STATUS
compute-1-ru5.rack1   Ready
compute-1-ru6.rack1   Ready
compute-1-ru7.rack1   Ready,SchedulingDisabled
control-1-ru2.rack1   Ready
control-1-ru3.rack1   Ready
control-1-ru4.rack1   Ready
- Run the following command to check whether at least one scale-core pod is in the Init: CrashLoopBackOff state.
oc get pods -n ibm-spectrum-scale
Example output:
$ oc get pods -n ibm-spectrum-scale
NAME                               STATUS
compute-1-ru5                      Running
compute-1-ru6                      Running
compute-1-ru7                      Init: CrashLoopBackOff
control-1-ru2                      Running
control-1-ru3                      Running
control-1-ru4                      Running
ibm-spectrum-scale-gui-0           Running
ibm-spectrum-scale-gui-1           Running
ibm-spectrum-scale-pmcollector-0   Running
ibm-spectrum-scale-pmcollector-1   Running
- Run the following command to check the logs from the mmbuildgpl init container of the scale-core pod that is in the Init: CrashLoopBackOff state. Search for the __st_ino error, which is the signature of this issue.
oc logs compute-1-ru7 -c mmbuildgpl -n ibm-spectrum-scale
- The example output shows that the compute-1-ru7 node is in an Init: CrashLoopBackOff state because mmbuildgpl fails to compile the portability layer that is used as the kernel tie-in for IBM Storage Scale Container Native. The mmbuildgpl command failed due to a defect that created an incompatibility with the RHCOS 9 EUS kernel level 5.14.0-284.66.1.el9_2 and higher versions.
- The OpenShift Machine Config Operator (MCO) rolled out a new configuration on the underlying compute-1-ru7 node. It then tried to progress to the next node, but IBM Storage Scale protects the cluster integrity and prevented the draining of the next scale-core pod. As a result, the MCO rollout is held and the Red Hat OpenShift upgrade does not complete.
Example output:
....
Invoking Kbuild...
/usr/bin/make -C /usr/src/kernels/5.14.0-284.66.1.el9_2.x86_64 ARCH=x86_64 M=/usr/lpp/mmfs/src/gpl-linux CONFIGDIR=/usr/lpp/mmfs/src/config ; \
if [ $? -ne 0 ]; then \
exit 1;\
fi
make[2]: Entering directory '/usr/src/kernels/5.14.0-284.66.1.el9_2.x86_64'
CC [M] /usr/lpp/mmfs/src/gpl-linux/tracelin.o
CC [M] /usr/lpp/mmfs/src/gpl-linux/tracedev-ksyms.o
CC [M] /usr/lpp/mmfs/src/gpl-linux/ktrccalls.o
CC [M] /usr/lpp/mmfs/src/gpl-linux/relaytrc.o
LD [M] /usr/lpp/mmfs/src/gpl-linux/tracedev.o
CC [M] /usr/lpp/mmfs/src/gpl-linux/mmfsmod.o
LD [M] /usr/lpp/mmfs/src/gpl-linux/mmfs26.o
CC [M] /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o
In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:61,
from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:54:
/usr/lpp/mmfs/src/gpl-linux/kx.c: In function 'vstat':
/usr/lpp/mmfs/src/gpl-linux/kx.c:238:12: error: 'struct stat' has no member named '__st_ino'; did you mean 'st_ino'?
238 | statbuf->__st_ino = vattrp->va_ino;
| ^~~~~~~~
| st_ino
make[3]: *** [scripts/Makefile.build:321: /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o] Error 1
make[2]: *** [Makefile:1923: /usr/lpp/mmfs/src/gpl-linux] Error 2
make[2]: Leaving directory '/usr/src/kernels/5.14.0-284.66.1.el9_2.x86_64'
make[1]: *** [makefile:140: modules] Error 1
make[1]: Leaving directory '/usr/lpp/mmfs/src/gpl-linux'
make: *** [makefile:145: Modules] Error 1
--------------------------------------------------------
mmbuildgpl: Building GPL module failed at Mon May 20 17:27:51 UTC 2024.
--------------------------------------------------------
mmbuildgpl: Command failed. Examine previous error messages to determine cause.
cleanup run
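The first and last checks above can be scripted. The following is a minimal sketch, not part of the IBM procedure: the function names kernel_affected, report_nodes, and has_st_ino_signature are hypothetical, and the version comparison assumes GNU sort with the -V option is available. It only uses the threshold kernel 5.14.0-284.66.1 and the __st_ino compiler error that this page describes.

```shell
#!/usr/bin/env bash
# Threshold kernel taken from this page: 5.14.0-284.66.1 and higher are affected.
THRESHOLD="5.14.0-284.66.1"

# kernel_affected KERNEL_VERSION
# Returns 0 when KERNEL_VERSION is at or above THRESHOLD, 1 otherwise.
kernel_affected() {
  local kver="$1"
  # Drop the distribution suffix (for example ".el9_2.x86_64") before comparing.
  local base="${kver%%.el*}"
  local lowest
  # sort -V orders version strings; if THRESHOLD sorts first, base is >= THRESHOLD.
  lowest="$(printf '%s\n%s\n' "$THRESHOLD" "$base" | sort -V | head -n 1)"
  [ "$lowest" = "$THRESHOLD" ]
}

# report_nodes: reads "NODE KERNEL_VERSION" pairs on stdin and labels each one.
report_nodes() {
  local name kver
  while read -r name kver; do
    if kernel_affected "$kver"; then
      echo "$name $kver affected"
    else
      echo "$name $kver not-affected"
    fi
  done
}

# has_st_ino_signature: reads an mmbuildgpl log on stdin and returns 0 when
# the compiler error shown in the example output is present.
has_st_ino_signature() {
  grep -q "has no member named '__st_ino'"
}
```

One possible way to feed these functions, assuming you are logged in to the cluster with oc:

oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.nodeInfo.kernelVersion}{"\n"}{end}' | report_nodes
oc logs compute-1-ru7 -c mmbuildgpl -n ibm-spectrum-scale | has_st_ino_signature && echo "signature found"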
Resolution
- For an offline installation, mirror the required images by using the following steps:
- Log in to the IBM Entitled Container Registry by using the IBM entitlement key. For the procedure, see step 1 in the IBM Fusion HCI documentation.
- Run the following command to log in to the Docker registry with your enterprise registry credentials.
docker login $LOCAL_ISF_REGISTRY -u <your enterprise registry username> -p <your enterprise registry password>
- From the mirroring host, run the following commands to copy the required images to the artifactory:
skopeo copy --all --preserve-digests docker://cp.icr.io/cp/spectrum/scale/erasure-code/ibm-spectrum-scale-daemon@sha256:63bba2ee49ba14a7c38f94c4bf4f92366ebec33e30226dde08ede24a43dce573 docker://$TARGET_PATH/erasure-code/ibm-spectrum-scale-daemon@sha256:63bba2ee49ba14a7c38f94c4bf4f92366ebec33e30226dde08ede24a43dce573
skopeo copy --all --preserve-digests docker://cp.icr.io/cp/spectrum/scale/ibm-spectrum-scale-core-init@sha256:e7366f5fa4ca7dbcd71b2e8966a4e795d78c0c9c2167b33088b5565f29ab591d docker://$TARGET_PATH/ibm-spectrum-scale-core-init@sha256:e7366f5fa4ca7dbcd71b2e8966a4e795d78c0c9c2167b33088b5565f29ab591d
skopeo copy --all --preserve-digests docker://cp.icr.io/cp/spectrum/scale/ibm-spectrum-scale-gui@sha256:31c2aebcd5f95c99ae3c03b41af17dc4e6523caca12635cc5f161d51dba107a6 docker://$TARGET_PATH/ibm-spectrum-scale-gui@sha256:31c2aebcd5f95c99ae3c03b41af17dc4e6523caca12635cc5f161d51dba107a6
skopeo copy --all --preserve-digests docker://cp.icr.io/cp/spectrum/scale/postgres@sha256:b2f06ce12103bedbc0a49ae4ffff062d90824e0f45462de712f66952679f7670 docker://$TARGET_PATH/postgres@sha256:b2f06ce12103bedbc0a49ae4ffff062d90824e0f45462de712f66952679f7670
skopeo copy --all --preserve-digests docker://cp.icr.io/cp/spectrum/scale/ubi-minimal@sha256:582e18f13291d7c686ec4e6e92d20b24c62ae0fc72767c46f30a69b1a6198055 docker://$TARGET_PATH/ubi-minimal@sha256:582e18f13291d7c686ec4e6e92d20b24c62ae0fc72767c46f30a69b1a6198055
skopeo copy --all --preserve-digests docker://cp.icr.io/cp/spectrum/scale/ibm-spectrum-scale-pmcollector@sha256:79237b6ad3076722520e7743841e87148de241e32a0bc7e7cc2bc5b6a4e52fff docker://$TARGET_PATH/ibm-spectrum-scale-pmcollector@sha256:79237b6ad3076722520e7743841e87148de241e32a0bc7e7cc2bc5b6a4e52fff
skopeo copy --all --preserve-digests docker://cp.icr.io/cp/spectrum/scale/ibm-spectrum-scale-monitor@sha256:a5996e1d0eb2bcdac2009c696c4f9f23e9e40273fc305562ec077463ecd18a99 docker://$TARGET_PATH/ibm-spectrum-scale-monitor@sha256:a5996e1d0eb2bcdac2009c696c4f9f23e9e40273fc305562ec077463ecd18a99
skopeo copy --all --preserve-digests docker://cp.icr.io/cp/spectrum/scale/ibm-spectrum-scale-grafana-bridge@sha256:fca0c3bdfb2e3161b134548e9daa66df71289ab048ce52328fbfa50b3c8ed56e docker://$TARGET_PATH/ibm-spectrum-scale-grafana-bridge@sha256:fca0c3bdfb2e3161b134548e9daa66df71289ab048ce52328fbfa50b3c8ed56e
skopeo copy --all --preserve-digests docker://cp.icr.io/cp/spectrum/scale/ibm-spectrum-scale-coredns@sha256:1ba4d51e896607c6f968f8df8e04ccfe7a71babd778838c9de040beda6bf1ff7 docker://$TARGET_PATH/ibm-spectrum-scale-coredns@sha256:1ba4d51e896607c6f968f8df8e04ccfe7a71babd778838c9de040beda6bf1ff7
skopeo copy --all --preserve-digests docker://icr.io/cpopen/ibm-spectrum-scale-must-gather@sha256:05948ccd999cfa4646cb022e2da0185dd0c46f1d1945ceab569857e213cb256f docker://$TARGET_PATH/ibm-spectrum-scale-must-gather@sha256:05948ccd999cfa4646cb022e2da0185dd0c46f1d1945ceab569857e213cb256f
skopeo copy --all --preserve-digests docker://cp.icr.io/cp/spectrum/scale/ibm-spectrum-scale-ganesha@sha256:46483e50df399149b5200de5d0f0acf702c677d9b99da401bf56d6d701b388d0 docker://$TARGET_PATH/ibm-spectrum-scale-ganesha@sha256:46483e50df399149b5200de5d0f0acf702c677d9b99da401bf56d6d701b388d0
skopeo copy --all --preserve-digests docker://cp.icr.io/cp/spectrum/scale/ibm-spectrum-scale-stunnel@sha256:5f4d2d93920531617b5adc42b4b7585cc51ab18789cc763e6b1ef9747e921399 docker://$TARGET_PATH/ibm-spectrum-scale-stunnel@sha256:5f4d2d93920531617b5adc42b4b7585cc51ab18789cc763e6b1ef9747e921399
skopeo copy --all --preserve-digests docker://cp.icr.io/cp/spectrum/scale/ibm-spectrum-scale-pmsensors@sha256:798ab98524033f3499dad667a93547424ca62c882f588091e56ee1ea7fcb0c90 docker://$TARGET_PATH/ibm-spectrum-scale-pmsensors@sha256:798ab98524033f3499dad667a93547424ca62c882f588091e56ee1ea7fcb0c90
skopeo copy --all --preserve-digests docker://cp.icr.io/cp/spectrum/scale/csi/csi-snapshotter@sha256:becc53e25b96573f61f7469923a92fb3e9d3a3781732159954ce0d9da07233a2 docker://$TARGET_PATH/csi/csi-snapshotter@sha256:becc53e25b96573f61f7469923a92fb3e9d3a3781732159954ce0d9da07233a2
skopeo copy --all --preserve-digests docker://cp.icr.io/cp/spectrum/scale/csi/csi-attacher@sha256:4eb73137b66381b7b5dfd4d21d460f4b4095347ab6ed4626e0199c29d8d021af docker://$TARGET_PATH/csi/csi-attacher@sha256:4eb73137b66381b7b5dfd4d21d460f4b4095347ab6ed4626e0199c29d8d021af
skopeo copy --all --preserve-digests docker://cp.icr.io/cp/spectrum/scale/csi/csi-provisioner@sha256:d078dc174323407e8cc6f0f9abd4efaac5db27838f1564d0253d5e3233e3f17f docker://$TARGET_PATH/csi/csi-provisioner@sha256:d078dc174323407e8cc6f0f9abd4efaac5db27838f1564d0253d5e3233e3f17f
skopeo copy --all --preserve-digests docker://cp.icr.io/cp/spectrum/scale/csi/livenessprobe@sha256:4dc0b87ccd69f9865b89234d8555d3a614ab0a16ed94a3016ffd27f8106132ce docker://$TARGET_PATH/csi/livenessprobe@sha256:4dc0b87ccd69f9865b89234d8555d3a614ab0a16ed94a3016ffd27f8106132ce
skopeo copy --all --preserve-digests docker://cp.icr.io/cp/spectrum/scale/csi/csi-node-driver-registrar@sha256:f6717ce72a2615c7fbc746b4068f788e78579c54c43b8716e5ce650d97af2df1 docker://$TARGET_PATH/csi/csi-node-driver-registrar@sha256:f6717ce72a2615c7fbc746b4068f788e78579c54c43b8716e5ce650d97af2df1
skopeo copy --all --preserve-digests docker://cp.icr.io/cp/spectrum/scale/csi/csi-resizer@sha256:2e2b44393539d744a55b9370b346e8ebd95a77573064f3f9a8caf18c22f4d0d0 docker://$TARGET_PATH/csi/csi-resizer@sha256:2e2b44393539d744a55b9370b346e8ebd95a77573064f3f9a8caf18c22f4d0d0
skopeo copy --all --preserve-digests docker://cp.icr.io/cp/spectrum/scale/csi/ibm-spectrum-scale-csi-driver@sha256:34925dffe24be39e19fda24339bed15c7e9e10110285b11aab304df6bf40a0ec docker://$TARGET_PATH/csi/ibm-spectrum-scale-csi-driver@sha256:34925dffe24be39e19fda24339bed15c7e9e10110285b11aab304df6bf40a0ec
skopeo copy --all --preserve-digests docker://icr.io/cpopen/ibm-spectrum-scale-csi-operator@sha256:ca90553412d96a2a6c3ceb4161a5c29facb3bb5d61ad96519ce6ad9e37627ed6 docker://$TARGET_PATH/ibm-spectrum-scale-csi-operator@sha256:ca90553412d96a2a6c3ceb4161a5c29facb3bb5d61ad96519ce6ad9e37627ed6
skopeo copy --all --preserve-digests docker://icr.io/cpopen/ibm-spectrum-scale-operator@sha256:aecf593217b6ba39bf6e9f99adccb50a130ce2debbdd2e90cf82afc35eb08ef3 docker://$TARGET_PATH/ibm-spectrum-scale-operator@sha256:aecf593217b6ba39bf6e9f99adccb50a130ce2debbdd2e90cf82afc35eb08ef3
- After all the skopeo commands complete successfully, update the existing ICSP for scale and provide the new $TARGET_PATH for both sources, cp.icr.io/cp/spectrum/scale and icr.io/cpopen.
For example:
- mirrors:
  - old mirror path
  - $TARGET_PATH    <---provide new path
  source: cp.icr.io/cp/spectrum/scale
- mirrors:
  - old mirror path
  - $TARGET_PATH    <---provide new path
  source: icr.io/cpopen
- Wait until the MCP rollout is completed. You can check MCP rollout status using the following command.
oc get mcp
- After the MCO rollout is completed, mirror the isf-storage-operator image.
skopeo copy --insecure-policy --preserve-digests --all docker://cp.icr.io/cp/isf/isf-storage-operator@sha256:4c56ddb5a745a196e24b3ba7d7af166c64e67cac302ec859e4ce3c556ac5625c docker://$TARGET_PATH/isf-storage-operator@sha256:4c56ddb5a745a196e24b3ba7d7af166c64e67cac302ec859e4ce3c556ac5625c
- For the isf-storage-operator image, update the ICSP for IBM Storage Fusion images and provide the new $TARGET_PATH to the source cp.icr.io/cp.
For example:
- mirrors:
  - old mirror path
  - $TARGET_PATH    <---provide new path
  source: cp.icr.io/cp
- After you complete all the steps, wait until the MCP rollout is completed. You can check the MCP rollout status by using the following command.
oc get mcp
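The IBM Storage Scale skopeo commands in the mirroring step all follow one pattern: the destination keeps whatever follows the cp.icr.io/cp/spectrum/scale/ or icr.io/cpopen/ prefix and places it under $TARGET_PATH. The sketch below generates such commands from a list of source references instead of typing each one. The function names mirror_dest and print_mirror_commands are hypothetical, and the script only prints the commands so that you can review them before running anything. It deliberately rejects other sources, such as the isf-storage-operator image, which uses a different prefix and the additional --insecure-policy flag.

```shell
#!/usr/bin/env bash
# mirror_dest SOURCE_REF TARGET_PATH
# Prints the destination reference, keeping the path below the known registry
# prefix (for example "csi/csi-attacher@sha256:...") as the commands above do.
mirror_dest() {
  local src="$1" target="$2" rest
  case "$src" in
    cp.icr.io/cp/spectrum/scale/*) rest="${src#cp.icr.io/cp/spectrum/scale/}" ;;
    icr.io/cpopen/*)               rest="${src#icr.io/cpopen/}" ;;
    *) echo "unrecognized source: $src" >&2; return 1 ;;
  esac
  printf '%s/%s\n' "$target" "$rest"
}

# print_mirror_commands TARGET_PATH
# Reads one source reference per line on stdin and prints (does not run)
# the matching skopeo copy command for each.
print_mirror_commands() {
  local target="$1" src dest
  while read -r src; do
    [ -n "$src" ] || continue            # skip blank lines
    dest="$(mirror_dest "$src" "$target")" || return 1
    printf 'skopeo copy --all --preserve-digests docker://%s docker://%s\n' "$src" "$dest"
  done
}
```

For example, with the source references saved one per line in a file (images.txt is an assumed name), print_mirror_commands "$TARGET_PATH" < images.txt prints the command list for review.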
- Enabling manual Storage Scale CNSA installation:
- Log in to the OpenShift web console.
- Go to Administration > CustomResourceDefinitions.
- Select ScaleManager CR and go to Instances tab.
- Under the Instances tab, select scalemanager and go to YAML tab.
- Add a field enableManualInstallation: true to the scalemanager CR as follows:
apiVersion: cns.isf.ibm.com/v1
kind: ScaleManager
metadata:
  name: scalemanager
  namespace: ibm-spectrum-fusion-ns
spec:
  creator: Fusion
  enableManualInstallation: true
- To recover from the failure state where a single scale-core pod is in the Init: CrashLoopBackOff state, upgrade IBM Storage Scale from 5.1.9.1 - 5.1.9.3 to version 5.1.9.4. Note that:
- The upgrade documentation states not to proceed unless all pods are Running. However, it is acceptable to proceed when only a single scale-core pod is in the Init: CrashLoopBackOff state and all other scale-core pods are in a Running state.
- The scale-core pod that is already in an Init: CrashLoopBackOff state can remain in that state even after the scale-core pods are updated. In such a case, delete the single scale-core pod that is in the Init: CrashLoopBackOff state.
- The deletion causes the pod to recycle and achieve a running state, and the IBM Storage Scale no longer blocks the Red Hat OpenShift Machine Config Operator (MCO).
- Run the oc get mcp command to check the Machine Config Operator, and update the rest of the nodes to complete the Red Hat OpenShift upgrade. Follow the upgrade instructions to validate the upgrade status and ensure that all pods are in a Running state.
- Follow the steps to upgrade Storage Scale CNSA to the version 5.1.9.4:
- Stop the running operator pod by setting the replicas in the deployment to 0.
oc scale deployment ibm-spectrum-scale-controller-manager -n ibm-spectrum-scale-operator --replicas=0
- Delete the old security context constraint.
oc delete scc ibm-spectrum-scale-privileged
- Delete the old role binding for privilege.
oc delete rolebinding -n ibm-spectrum-scale ibm-spectrum-scale-privileged --ignore-not-found
- Delete the MutatingWebhookConfiguration and ValidatingWebhookConfiguration.
oc delete MutatingWebhookConfiguration ibm-spectrum-scale-mutating-webhook-configuration
oc delete ValidatingWebhookConfiguration ibm-spectrum-scale-validating-webhook-configuration
- Apply the new manifests.
oc apply -f https://raw.githubusercontent.com/IBM/ibm-spectrum-scale-container-native/v5.1.9.4/generated/scale/install.yaml
- Check whether all scale pods on the current OpenShift or MetroDR site are in a "Running" state with the new version (5.1.9.4) after completing the upgrade:
oc get pods -n ibm-spectrum-scale
oc exec pod/<any_running_scale_pod> -n ibm-spectrum-scale -- mmdsh -N all /usr/lpp/mmfs/bin/mmdiag --version | grep build
Example output:
$ oc get pods -n ibm-spectrum-scale
NAME READY STATUS RESTARTS AGE
compute-1-ru5 2/2 Running 2 50d
compute-1-ru6 2/2 Running 2 50d
compute-1-ru7 2/2 Running 2 50d
control-1-ru2 2/2 Running 2 50d
control-1-ru3 2/2 Running 2 50d
control-1-ru4 2/2 Running 2 50d
ibm-spectrum-scale-gui-0 4/4 Running 6 50d
ibm-spectrum-scale-gui-1 4/4 Running 6 50d
ibm-spectrum-scale-pmcollector-0 2/2 Running 3 (9d ago) 49d
ibm-spectrum-scale-pmcollector-1 2/2 Running 7 (9d ago) 50d
$ oc exec -n ibm-spectrum-scale pod/compute-1-ru5 -- mmdsh -N all /usr/lpp/mmfs/bin/mmdiag --version | grep build
Defaulted container "gpfs" out of: gpfs, logs, mmbuildgpl (init), config (init)
compute-1-ru6.admin.ibm-spectrum-scale.abc.rack1.: Current GPFS build: "5.1.9.4 ".
compute-1-ru5.admin.ibm-spectrum-scale.abc.rack1.: Current GPFS build: "5.1.9.4 ".
control-1-ru2.admin.ibm-spectrum-scale.abc.rack1.: Current GPFS build: "5.1.9.4 ".
compute-1-ru7.admin.ibm-spectrum-scale.abc.rack2.: Current GPFS build: "5.1.9.1 ".
control-1-ru2.admin.ibm-spectrum-scale.abc.rack2.: Current GPFS build: "5.1.9.1 ".
compute-1-ru6.admin.ibm-spectrum-scale.abc.rack1.: Current GPFS build: "5.1.9.4 ".
control-1-ru3.admin.ibm-spectrum-scale.abc.rack2.: Current GPFS build: "5.1.9.1 ".
control-1-ru3.admin.ibm-spectrum-scale.abc.rack1.: Current GPFS build: "5.1.9.4 ".
compute-1-ru7.admin.ibm-spectrum-scale.abc.rack1.: Current GPFS build: "5.1.9.4 ".
compute-1-ru5.admin.ibm-spectrum-scale.abc.rack2.: Current GPFS build: "5.1.9.1 ".
control-1-ru4.admin.ibm-spectrum-scale.abc.rack1.: Current GPFS build: "5.1.9.4 ".
control-1-ru4.admin.ibm-spectrum-scale.abc.rack2.: Current GPFS build: "5.1.9.1 ".
gpfs-tiebreaker: Current GPFS build: "5.1.9.1 ".
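With many nodes, it can be easier to list only the nodes that do not yet report the expected build. The following sketch filters the mmdiag output shown above. The function name nodes_not_at is hypothetical, and the parsing assumes that each line keeps the format of the example output, with the build string in double quotes.

```shell
#!/usr/bin/env bash
# nodes_not_at EXPECTED_VERSION
# Reads "mmdsh ... mmdiag --version" output on stdin and prints
# "NODE VERSION" for every node whose build differs from EXPECTED_VERSION.
nodes_not_at() {
  local want="$1"
  awk -v want="$want" -F'"' '
    /Current GPFS build:/ {
      ver = $2
      gsub(/[[:space:]]+/, "", ver)      # the quoted build string has trailing spaces
      node = $1
      sub(/:[[:space:]]*Current GPFS build:.*/, "", node)
      if (ver != want) print node, ver
    }'
}
```

Appending | nodes_not_at 5.1.9.4 to the oc exec command above, in place of grep build, would list the nodes that are still at an older level. Applied to the example output, it would list the rack2 nodes and gpfs-tiebreaker, which still report 5.1.9.1.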
- To resolve this issue, replace the isf-storage-operator-controller-manager image with a new image in the installed operator CSV of the ibm-spectrum-fusion-ns:
- Log in to the OpenShift web console.
- Go to Operators > Installed Operators.
- Select IBM Storage Fusion.
- Go to the YAML tab, locate the deployment details for isf-storage-operator, and update the isf-storage-operator@<sha> image for isf-storage-operator-controller-manager in two places inside the YAML with:
cp.icr.io/cp/isf/isf-storage-operator@sha256:4c56ddb5a745a196e24b3ba7d7af166c64e67cac302ec859e4ce3c556ac5625c
- Save the file.
- Wait until the new pod is up and run the following command to validate the status of the new pod.
oc get pods -n ibm-spectrum-fusion-ns | grep isf-storage-operator-controller-manager
Example output:
NAME                                                       STATUS
isf-storage-operator-controller-manager-86ccc69c4d-hjh65   Running
- For a MetroDR system, repeat Resolution steps 1 through 4 on the secondary site only after all of them have finished successfully on the primary site.
- After both MetroDR sites are upgraded, also consider upgrading IBM Storage Scale on the tiebreaker node:
- Download the Storage_Scale_Data_Management-5.1.9.4-x86_64-Linux installation package from IBM Fix Central.
- Upload the package to the tiebreaker node and execute it.
- In the extracted location, find the Installation Toolkit folder (by default: /usr/lpp/mmfs/5.1.9.4/ansible-toolkit).
- Follow steps 4 through 9 in the IBM Fusion HCI documentation.
Document Information
Modified date: 03 December 2024
UID: ibm17156911