Flashes (Alerts)
Abstract
A recent Linux kernel update that addresses the CVE-2024-25744 security vulnerability causes the mmbuildgpl command to fail when building the IBM Storage Scale kernel portability layer. On a node running the updated kernel, IBM Storage Scale cannot reach an active state.
The problem affects IBM Storage Scale releases 5.1.8.0 through 5.1.9.3 and 5.2.0.0. The fix is included in 5.1.9.4 and 5.2.0.1.
This is the Red Hat distro page tracking the kernel change: https://access.redhat.com/security/cve/CVE-2024-25744
Based on the above link, the following kernels (x86_64 only) impact IBM Storage Scale:
- RHEL9.2 5.14.0-284.66.1.el9_2.x86_64 and higher
OpenShift levels containing kernels (x86_64 only) that impact IBM Storage Scale Container Native:
- 4.15.13 and higher
- 4.14.25 and higher
- 4.13.42 and higher
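The kernel level listed above can be compared against a node's running kernel with a small shell helper. This is a minimal sketch, not an IBM-provided tool: the function name kernel_at_or_above_first_affected is hypothetical, the comparison relies on GNU sort's version ordering (sort -V), and it checks only the version string and the x86_64 suffix, so the result should still be confirmed against the Red Hat page.

```shell
#!/bin/sh
# First affected RHEL 9.2 kernel level, taken from the list above.
FIRST_AFFECTED="5.14.0-284.66.1.el9_2.x86_64"

# Hypothetical helper: returns 0 when the given kernel release is an
# x86_64 kernel that sorts at or above FIRST_AFFECTED, 1 otherwise.
kernel_at_or_above_first_affected() {
    kernel="$1"
    # The flash lists x86_64 kernels only.
    case "$kernel" in
        *.x86_64) ;;
        *) return 1 ;;
    esac
    # sort -V orders version strings numerically per component; if the
    # first affected level sorts first (or is equal), the given kernel
    # is at or above it.
    lowest=$(printf '%s\n%s\n' "$FIRST_AFFECTED" "$kernel" | sort -V | head -n 1)
    [ "$lowest" = "$FIRST_AFFECTED" ]
}

# Example use on a node:
#   kernel_at_or_above_first_affected "$(uname -r)" && echo "kernel is at an affected level"
```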
Content
...
Invoking Kbuild...
/usr/bin/make -C /usr/src/kernels/5.14.0-284.66.1.el9_2.x86_64 ARCH=x86_64 M=/usr/lpp/mmfs/src/gpl-linux CONFIGDIR=/usr/lpp/mmfs/src/config ; \
if [ $? -ne 0 ]; then \
exit 1;\
fi
make[2]: Entering directory '/usr/src/kernels/5.14.0-284.66.1.el9_2.x86_64'
CC [M] /usr/lpp/mmfs/src/gpl-linux/tracelin.o
CC [M] /usr/lpp/mmfs/src/gpl-linux/tracedev-ksyms.o
CC [M] /usr/lpp/mmfs/src/gpl-linux/ktrccalls.o
CC [M] /usr/lpp/mmfs/src/gpl-linux/relaytrc.o
LD [M] /usr/lpp/mmfs/src/gpl-linux/tracedev.o
CC [M] /usr/lpp/mmfs/src/gpl-linux/mmfsmod.o
LD [M] /usr/lpp/mmfs/src/gpl-linux/mmfs26.o
CC [M] /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o
In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:61,
from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:54:
/usr/lpp/mmfs/src/gpl-linux/kx.c: In function 'vstat':
/usr/lpp/mmfs/src/gpl-linux/kx.c:238:12: error: 'struct stat' has no member named '__st_ino'; did you mean 'st_ino'?
238 | statbuf->__st_ino = vattrp->va_ino;
| ^~~~~~~~
| st_ino <----------------------- signature of this problem
make[3]: *** [scripts/Makefile.build:321: /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o] Error 1
make[2]: *** [Makefile:1923: /usr/lpp/mmfs/src/gpl-linux] Error 2
make[2]: Leaving directory '/usr/src/kernels/5.14.0-284.66.1.el9_2.x86_64'
make[1]: *** [makefile:140: modules] Error 1
make[1]: Leaving directory '/usr/lpp/mmfs/src/gpl-linux'
make: *** [makefile:145: Modules] Error 1
--------------------------------------------------------
mmbuildgpl: Building GPL module failed at Thu May 16 19:18:34 UTC 2024.
--------------------------------------------------------
mmbuildgpl: Command failed. Examine previous error messages to determine cause.
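The error signature in the build output above can also be checked for mechanically. The following is a sketch under stated assumptions: the function name has_st_ino_signature and the log file path in the usage comment are hypothetical, and the pattern matches only the compiler message text shown above.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 when the given build log contains the
# compiler error that identifies this problem, 1 otherwise.
# ".*" is used instead of a literal quote so that the match does not
# depend on which quote character the compiler prints.
has_st_ino_signature() {
    grep -q "has no member named.*__st_ino" "$1"
}

# Example use (the log path is an arbitrary choice):
#   /usr/lpp/mmfs/bin/mmbuildgpl > /tmp/mmbuildgpl.log 2>&1
#   has_st_ino_signature /tmp/mmbuildgpl.log && echo "build failed with the __st_ino signature"
```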
- First check that the KERNEL-VERSION column lists kernel 5.14.0-284.66.1 or higher. If the kernel level is lower, the cluster is not hitting this issue.
- Also confirm that a worker node is set to SchedulingDisabled. This is the next worker node that OpenShift has selected for rollout of the upgrade machine config. Note it for when the recovery procedure begins.
# oc get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master0.cp.fyre.ibm.com Ready control-plane,master 3d3h v1.28.9+8ca71f7 10.17.105.70 <none> Red Hat Enterprise Linux CoreOS 415.92.202405070140-0 (Plow) 5.14.0-284.66.1.el9_2.x86_64 cri-o://1.28.6-5.rhaos4.15.gita02fb1e.el9
master1.cp.fyre.ibm.com Ready control-plane,master 3d3h v1.28.9+8ca71f7 10.17.106.60 <none> Red Hat Enterprise Linux CoreOS 415.92.202405070140-0 (Plow) 5.14.0-284.66.1.el9_2.x86_64 cri-o://1.28.6-5.rhaos4.15.gita02fb1e.el9
master2.cp.fyre.ibm.com Ready control-plane,master 3d3h v1.28.9+8ca71f7 10.17.106.236 <none> Red Hat Enterprise Linux CoreOS 415.92.202405070140-0 (Plow) 5.14.0-284.66.1.el9_2.x86_64 cri-o://1.28.6-5.rhaos4.15.gita02fb1e.el9
worker0.cp.fyre.ibm.com Ready worker 3d3h v1.28.7+f1b5f6c 10.17.108.41 <none> Red Hat Enterprise Linux CoreOS 415.92.202403270524-0 (Plow) 5.14.0-284.59.1.el9_2.x86_64 cri-o://1.28.4-8.rhaos4.15.git24f50b9.el9
worker1.cp.fyre.ibm.com Ready worker 3d3h v1.28.7+f1b5f6c 10.17.113.236 <none> Red Hat Enterprise Linux CoreOS 415.92.202403270524-0 (Plow) 5.14.0-284.59.1.el9_2.x86_64 cri-o://1.28.4-8.rhaos4.15.git24f50b9.el9
worker2.cp.fyre.ibm.com Ready worker 3d3h v1.28.9+8ca71f7 10.17.120.85 <none> Red Hat Enterprise Linux CoreOS 415.92.202405070140-0 (Plow) 5.14.0-284.66.1.el9_2.x86_64 cri-o://1.28.6-5.rhaos4.15.gita02fb1e.el9
worker3.cp.fyre.ibm.com Ready,SchedulingDisabled worker 3d3h v1.28.7+f1b5f6c 10.17.124.102 <none> Red Hat Enterprise Linux CoreOS 415.92.202403270524-0 (Plow) 5.14.0-284.59.1.el9_2.x86_64 cri-o://1.28.4-8.rhaos4.15.git24f50b9.el9
worker4.cp.fyre.ibm.com Ready worker 3d3h v1.28.7+f1b5f6c 10.17.124.158 <none> Red Hat Enterprise Linux CoreOS 415.92.202403270524-0 (Plow) 5.14.0-284.59.1.el9_2.x86_64 cri-o://1.28.4-8.rhaos4.15.git24f50b9.el9
- Confirm that a single scale-core pod (worker2 in this example) is in the Init:CrashLoopBackOff state:
# oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ibm-spectrum-scale-gui-0 4/4 Running 7 (26h ago) 2d19h 10.254.20.11 worker1.cp.fyre.ibm.com <none> <none>
ibm-spectrum-scale-gui-1 4/4 Running 5 (18h ago) 2d19h 10.254.28.9 worker4.cp.fyre.ibm.com <none> <none>
ibm-spectrum-scale-pmcollector-0 2/2 Running 0 2d19h 10.254.20.12 worker1.cp.fyre.ibm.com <none> <none>
ibm-spectrum-scale-pmcollector-1 2/2 Running 0 26m 10.254.16.4 worker2.cp.fyre.ibm.com <none> <none>
worker0 2/2 Running 0 2d19h 10.17.108.41 worker0.cp.fyre.ibm.com <none> <none>
worker1 2/2 Running 1 (2d19h ago) 2d19h 10.17.113.236 worker1.cp.fyre.ibm.com <none> <none>
worker2 0/2 Init:CrashLoopBackOff 9 (47s ago) 25m 10.17.120.85 worker2.cp.fyre.ibm.com <none> <none>
worker3 2/2 Running 1 (2d19h ago) 2d19h 10.17.124.102 worker3.cp.fyre.ibm.com <none> <none>
worker4 2/2 Running 0 2d19h 10.17.124.158 worker4.cp.fyre.ibm.com <none> <none>
- Check the logs of the mmbuildgpl container in the pod that is in Init:CrashLoopBackOff above. Look for the st_ino error, which is the signature of this issue:
# oc logs worker2 -c mmbuildgpl
....
Invoking Kbuild...
/usr/bin/make -C /usr/src/kernels/5.14.0-284.66.1.el9_2.x86_64 ARCH=x86_64 M=/usr/lpp/mmfs/src/gpl-linux CONFIGDIR=/usr/lpp/mmfs/src/config ; \
if [ $? -ne 0 ]; then \
exit 1;\
fi
make[2]: Entering directory '/usr/src/kernels/5.14.0-284.66.1.el9_2.x86_64'
CC [M] /usr/lpp/mmfs/src/gpl-linux/tracelin.o
CC [M] /usr/lpp/mmfs/src/gpl-linux/tracedev-ksyms.o
CC [M] /usr/lpp/mmfs/src/gpl-linux/ktrccalls.o
CC [M] /usr/lpp/mmfs/src/gpl-linux/relaytrc.o
LD [M] /usr/lpp/mmfs/src/gpl-linux/tracedev.o
CC [M] /usr/lpp/mmfs/src/gpl-linux/mmfsmod.o
LD [M] /usr/lpp/mmfs/src/gpl-linux/mmfs26.o
CC [M] /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o
In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:61,
from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:54:
/usr/lpp/mmfs/src/gpl-linux/kx.c: In function 'vstat':
/usr/lpp/mmfs/src/gpl-linux/kx.c:238:12: error: 'struct stat' has no member named '__st_ino'; did you mean 'st_ino'?
238 | statbuf->__st_ino = vattrp->va_ino;
| ^~~~~~~~
| st_ino
make[3]: *** [scripts/Makefile.build:321: /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o] Error 1
make[2]: *** [Makefile:1923: /usr/lpp/mmfs/src/gpl-linux] Error 2
make[2]: Leaving directory '/usr/src/kernels/5.14.0-284.66.1.el9_2.x86_64'
make[1]: *** [makefile:140: modules] Error 1
make[1]: Leaving directory '/usr/lpp/mmfs/src/gpl-linux'
make: *** [makefile:145: Modules] Error 1
--------------------------------------------------------
mmbuildgpl: Building GPL module failed at Mon May 20 17:27:51 UTC 2024.
--------------------------------------------------------
mmbuildgpl: Command failed. Examine previous error messages to determine cause.
cleanup run
- What the above shows:
- Worker2 is in Init:CrashLoopBackOff because mmbuildgpl fails to compile the portability layer that Storage Scale Container Native uses as its kernel tie-in. mmbuildgpl fails because of a defect that makes it incompatible with RHCOS 9 EUS kernel level 5.14.0-284.66.1.el9_2 and higher. The OpenShift Machine Config Operator (MCO) has successfully rolled out a new config on the underlying worker2 RHCOS node, but it cannot progress to the next worker node: Scale is protecting cluster integrity by preventing the next scale-core pod from being drained. This holds up the OpenShift MCO rollout and, ultimately, the OpenShift upgrade itself.
- To recover from this failure state, where a single scale-core pod is in Init:CrashLoopBackOff and the OpenShift Machine Config Operator cannot progress to the next worker and finish the OpenShift upgrade:
- If at 5.2.0.0, follow the 5.2.0 upgrade steps in the Storage Scale Container Native documentation here: https://www.ibm.com/docs/en/scalecontainernative/5.2.0?topic=upgrading-storage-scale-container-native. See details below before proceeding with upgrade.
- If at 5.1.9.1 or 5.1.9.3, follow the 5.1.9 upgrade steps in the Storage Scale Container Native documentation here: https://www.ibm.com/docs/en/scalecontainernative/5.1.9?topic=upgrading-storage-scale-container-native. See details below before proceeding with upgrade.
- All Storage Scale Container Native upgrade instructions point to an install.yaml in the 5.2.0.x or 5.1.9.x branch of the public GitHub repo. Each branch is always updated with the latest fixpack or efix, so upgrading to it applies the fix for this problem (the branches were updated with the fix on May 22, 2024). The upgrade and install processes for efixes and fixpacks are identical to those for major releases.
- Follow the entirety of the upgrade instructions, with one exception. The upgrade documentation states not to proceed unless all pods are up. In this case, it is OK to proceed as long as only a single scale-core pod is in Init:CrashLoopBackOff and all other scale-core pods are in a Running state. After the scale-core pods update, the pod that was in Init:CrashLoopBackOff may remain in that state. If so, and only if a single scale-core pod is in this state, delete that pod. Deletion causes the pod to recycle and reach a Running state. Once this occurs, Storage Scale no longer blocks the OpenShift Machine Config Operator (MCO) from continuing to update nodes. Watch the Machine Config Operator (oc get mcp) update the rest of the nodes and complete the OpenShift upgrade. Then follow the remaining upgrade instructions to validate that the upgrade succeeded and that all pods are in a Running state.
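The "only if a single pod is in this state" condition above can be checked before deleting anything. The sketch below is illustrative only: the function name single_crashloop_pod is hypothetical, the namespace ibm-spectrum-scale in the usage comment is an assumption about the installation, and the filter counts every pod in the listing that shows Init:CrashLoopBackOff, so confirm that the pod it reports is a scale-core pod.

```shell
#!/bin/sh
# Hypothetical helper: reads `oc get pods` output on stdin. Prints the
# name of the pod whose STATUS column is Init:CrashLoopBackOff and
# returns 0 only when exactly one pod is in that state; otherwise it
# prints nothing and returns 1.
single_crashloop_pod() {
    awk '
        $3 == "Init:CrashLoopBackOff" { name = $1; n++ }
        END {
            if (n == 1) { print name; exit 0 }
            exit 1
        }
    '
}

# Example use (the namespace is an assumption; adjust as needed):
#   pod=$(oc get pods -n ibm-spectrum-scale --no-headers | single_crashloop_pod) \
#       && oc delete pod "$pod" -n ibm-spectrum-scale
#   oc get mcp
```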
- IBM Storage Scale
- Affected customers using the 5.1.9 EUS stream, or lower, should upgrade to IBM Storage Scale V5.1.9.4 or higher https://www.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%20defined%20storage&product=ibm/StorageSoftware/IBM+Storage+Scale&release=5.1.9&platform=All&function=all
- Affected customers using the 5.2.0.0 release should upgrade to IBM Storage Scale V5.2.0.1 or higher https://www.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%20defined%20storage&product=ibm/StorageSoftware/IBM+Storage+Scale&release=5.2.0&platform=All&function=all
- If recovering from a cloudkit image generation failure, upgrade the repository per the cloudkit instructions, and re-run cloudkit create image.
- IBM Storage Scale Container Native
- Affected customers using the 5.1.9 EUS stream should follow documented 5.1.9 upgrade instructions to upgrade to the latest fixed level before upgrading OpenShift to the affected levels. See example 2 for recovery steps. Because the 5.1.9 upgrade instructions point to a 5.1.9.x branch, the resulting upgrade will always be the latest 5.1.9 EUS efix or fixpack available at the time of the upgrade.
- Affected customers using the 5.2.0.0 release should follow documented 5.2.0 upgrade instructions to upgrade to the latest fixed level before upgrading OpenShift to the affected levels. See example 2 for recovery steps. Because the 5.2.0 upgrade instructions point to a 5.2.0.x branch, the resulting upgrade will always be the latest 5.2.0 efix or fixpack available at the time of the upgrade.
Step 1: Open the file /usr/lpp/mmfs/src/gpl-linux/kx.c in a text editor.
Step 2: Find the following code block:
#ifdef STAT64_HAS_BROKEN_ST_INO
/* Linux has 2 struct stat64 definitions:
1) /usr/include/asm/stat.h
2) /usr/include/bits/stat.h
Of course, they differ
1) st_dev & st_rdev is 2 bytes in (1) and 8 bytes in (2),
but the 2 definitions overlap.
2) st_ino is 8 bytes in (1) and 4 bytes in (2)
and they are in different places!
Fortunately, (1) defines an ifdef telling us to assign st_ino
to a second variable which just happens to exactly match the
definition in (2). */
statbuf->__st_ino = vattrp->va_ino;
#endif
Step 3: Remove the line statbuf->__st_ino = vattrp->va_ino; from that block.
Step 4: Save the file.
Step 5: Run mmbuildgpl.
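The manual edit above could also be scripted. The following is a minimal sketch, assuming a sed that accepts -i with a backup suffix (GNU sed does); the function name remove_st_ino_assignment is hypothetical. It deletes only the one assignment line and leaves the surrounding #ifdef block in place.

```shell
#!/bin/sh
# Hypothetical helper: deletes the __st_ino assignment from the given
# kx.c file in place, keeping the original as <file>.bak.
# The pattern tolerates varying whitespace around "=".
remove_st_ino_assignment() {
    sed -i.bak '/statbuf->__st_ino[[:space:]]*=[[:space:]]*vattrp->va_ino;/d' "$1"
}

# Example use on a Storage Scale node:
#   remove_st_ino_assignment /usr/lpp/mmfs/src/gpl-linux/kx.c
#   /usr/lpp/mmfs/bin/mmbuildgpl
```

Keeping the .bak copy makes it easy to restore the original source before applying the fixed release.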
Document Information
Modified date:
14 June 2024
UID
ibm17155787