When you upgrade the ESS EMS to 6.2.0.1, you are prompted to run a script after the update
to complete some background tasks. If you are upgrading from any version earlier than 6.2.0.0, you
will see an error when running the script.
root@utility1-vm1# /opt/ibm/ess/tools/samples/reloadEms.sh
/etc/reloadEms file does not exists, won't run script.
Product
- IBM Storage Scale System
6000
- Utility Node Protocol
- Utility Node EMS
- ESS 3000
- ESS 3200
- ESS 3500
- ESS 5000
|
- Touch the /etc/reloadEms file on the EMS before you start the code update.
root@utility1-vm1# touch /etc/reloadEms
- If the file was not touched on the EMS before the upgrade began, the error appears. If this
happens, touch the file on the EMS and re-run the script. No other action is necessary.
root@utility1-vm1# touch /etc/reloadEms
root@utility1-vm1# /opt/ibm/ess/tools/samples/reloadEms.sh
Note: If the EMS is not part
of the cluster, touching the file is unnecessary.
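A minimal pre-upgrade guard is sketched below; it assumes the marker file and script paths shown above, creates the marker file only if it is missing, and then runs the reload script.
# Create the /etc/reloadEms marker if it does not exist, then run the reload script.
[ -e /etc/reloadEms ] || touch /etc/reloadEms
/opt/ibm/ess/tools/samples/reloadEms.sh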
|
The repair of an FCM4 drive might fail on the first attempt. The replacement of an FCM4
drive involves reformatting and updating the firmware. The replace command might encounter a false
error with the format that causes the process to
fail.
Example:
[root@s6ka ~]# mmvdisk pdisk replace --recovery-group rg1 --pdisk 'e1s04'
mmvdisk:
mmvdisk: mmchcarrier : [I] Preparing a new pdisk for use may take many minutes.
mmvdisk:
mmvdisk: Callback: /usr/lpp/mmfs/bin/tspreparenewpdiskforuse /dev/nvme0n1.
mmvdisk: Success formatting namespace:ffffffff
mmvdisk: Attempting to update firmware if necessary. Failure will not prevent drive replacement.
mmvdisk: Command: mmchfirmware --type drive --serial-number 03JN353YS12BG41Y0HS --new-pdisk
mmvdisk: NVMe status: Firmware Activation Requires NVM Subsystem Reset: The firmware commit was successful,
however, activation of the firmware image requires an NVM Subsystem(0x4110)
mmvdisk: Command: err 1: mmchfirmware --type drive --serial-number 03JN353YS12BG41Y0HS --new-pdisk
mmvdisk:
mmvdisk: The following pdisks will be formatted on node s6ka:
mmvdisk: //s6ka-hs/dev/nvme0n1,//s6kb-hs/dev/nvme0n1
mmvdisk: Error formatting pdisk e1s04 on device //s6ka-hs/dev/nvme0n1,//s6kb-hs/dev/nvme0n1.
mmvdisk: No such process
mmvdisk: Resuming pdisk e1s04 of RG rg1.
mmvdisk: Carrier resumed.
mmvdisk:
mmvdisk:
mmvdisk: Command failed. Examine previous error messages to determine cause.
[root@s6ka ~]#
Product
- IBM Storage Scale System
6000
|
Re-run the replace command (do not re-seat the drive).
[root@s6ka ~]# mmvdisk pdisk replace --recovery-group rg1 --pdisk 'e1s04'
|
After an FCM4 drive repair, the firmware version might not be displayed properly when you use
the nvme list command.
Product
- IBM Storage Scale System
6000 with FCM4 and NVMe drives
|
If the firmware version is not displayed correctly when you use the nvme list
command, use one of the following reliable methods to verify the actual level:
mmlsfirmware --type drive
or tslsenclslot_nvme
|
For the initial setup of IBM Storage Scale System
6000 with
FCM4 drives, some drive slots might not show up properly.
Product
- IBM Storage Scale System
6000 with FCM4 and NVMe drives
|
- Ensure that all NVMe drives are inserted properly and the green LEDs are lit.
- If the LEDs are not lit, re-insert the FCM4 drive.
- Power cycle both canisters simultaneously.
- Recheck drive slots.
|
DIMM errors might not show up as events in the GUI.
Product
- IBM Storage Scale System
6000
|
If DIMM errors are suspected, monitor the mmhealth command output for
problems.
Example: mmhealth node show and mmhealth node eventlog
|
For 5147-102 enclosures, the enclosure
firmware update might not complete for all enclosures on the first attempt.
Product
- IBM Storage
Scale System 3500
5147-102
|
- Update the storage enclosure firmware.
mmchfirmware --type storage-enclosure
- Reboot the canister.
- Check whether all enclosures are updated.
mmlsfirmware --type storage-enclosure
Example: mmlsfirmware --type storage-enclosure
enclosure firmware available
type product id serial number level firmware location
---- ---------- ------------- -------- -------- --------
enclosure 5141-FN2 78E4XXX E11U E11U ess35001a
enclosure 5147-102 XXXXXXX 4E2A,5460 *5460 ess35001a
If the ‘available firmware’ is prefaced with a ‘*’, run the mmchfirmware
command again. Follow the instructions and check by running the mmlsfirmware
command when complete.
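The check can be scripted; the following sketch assumes the mmlsfirmware output format shown above, where an asterisk before the available firmware level marks an enclosure that still needs another update pass.
# If any enclosure still reports a '*'-prefixed available firmware level, run the update again.
if mmlsfirmware --type storage-enclosure | grep -q '\*'; then
    mmchfirmware --type storage-enclosure
fi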
|
In the hardware panel of a system that is using a utility node EMS, a node might be displayed as
Unknown.
Product
- Utility Node Protocol
- Utility Node EMS
|
- Check whether historical data is stored.
- Clear the GUI DB and cached data (from the utility node).
systemctl stop gpfsgui
psql postgres postgres -c "drop schema fscc cascade;"
mmccr fdel _gui.settings
mmccr fdel _gui.user.repo
mmccr fdel _gui.keystore_settings
mmccr fdel _gui.policysettings
mmccr fdel _gui.dashboards
mmccr fdel _gui.notification
mmccr fdel gui_jobs
mmccr fdel gui
rm -f /var/lib/mmfs/gui/*.json*
rm -rf /var/log/cnlog/mgtsrv/*
mmchnode --noperfmon -N all
mmperfmon config delete --all
- Recreate the GUI configuration by using the essrun command.
essrun -N <all nodes> config load <-p>
essrun -N <all nodes> gui --configure
- Rerun the GUI setup wizard.
|
If the IBM Storage Scale
System is not in pristine
condition before upgrade, the precheck might not allow an update to continue.
Product
- IBM Storage Scale System
6000
- Utility Node Protocol
- Utility Node EMS
- ESS 3000
- ESS 3200
- ESS 3500
- ESS 5000
|
- Check all displayed errors. If you determine that the update can proceed even with the problems
that are called out, you can run the following command to bypass the
errors:
essrun -N <nodes to update> update --precheck --no-health-check
- Update the node.
essrun -N <nodes to update> update
|
The filter on the Hardware Details page of the GUI might not always work as
expected.
Product
- IBM Storage Scale System
6000
- Utility Node Protocol
- Utility Node EMS
- ESS 3000
- ESS 3200
- ESS 3500
- ESS 5000
|
Navigate to the Hardware Details page of the GUI. Use the filter box to
search for items on the page. The page might take time to refresh. If the refresh happens but
nothing is shown, you might need to refresh the entire window and retry.
|
In the NAS Workload panel, you might encounter widgets that display ‘Performance chart
preset not found’.
Product
- IBM Storage Scale System
6000
- Utility Node Protocol
- Utility Node EMS
- ESS 3000
- ESS 3200
- ESS 3500
- ESS 5000
|
Click the 3 dots on the top right side of the NAS Workload panel and select
Edit Widgets. For the widgets that you want to edit, use one of the following options:
|
If a disk repair sequence by using the GUI takes an extended period of time, the window might
time out and exit the repair.
Product
- IBM Storage Scale System
6000
|
If this occurs after the disk is physically replaced, do not re-enter the GUI and attempt the
drive repair. To continue the repair operation, issue the following
command: mmvdisk pdisk replace --rg '<recovery group>' --pdisk '<eXsXXX>'
|
Adding a node by using the following command fails if the add action will violate the quorum
rules of the existing cluster.
essrun --add-node or --add-ems
Product
- IBM Storage Scale System
6000
- Utility Node Protocol
- Utility Node EMS
- ESS 3000
- ESS 3200
- ESS 3500
- ESS 5000
|
|
During a disk repair on the IBM Storage Scale System
6000 node,
the amber LED might not illuminate, which makes it difficult to locate the drive.
Product
- IBM Storage Scale System
6000
|
If this issue occurs during the repair procedure, use one of the following options to identify the
disk that needs to be replaced.
- Option 1
- All LEDs of the faulty drive will be off at this point. You can identify the faulty drive by
noting the lack of any lit LED.
- Option 2
- Use the s6k_ledmgmt_v03.sh script in the samples directory to turn on the drive fault LED (see the drive fault LED entry later in this document).
|
esscallhomeconf fails on the P9 EMS in a co-existence environment.
Product
- IBM Storage Scale System
6000
|
In coexistence environments, the call home for IBM Storage Scale System
6000 is managed exclusively by the Utility Node EMS at this time.
|
Unified call home enablement might not be consistent on the Utility Protocol node.
|
After configuration of the call home from the EMS, check whether the unified call home is enabled on
all ESS 3500, IBM Storage Scale System
6000, and Utility Node nodes.
- VMs:
[root@esstest-emsvm ~]# mmdsh -N all mmsysmonc callhome isHolisticCallHomeEnabled
esstest-emsvm-hs.tms.stglabs.ibm.com: yes
lothal-qa6-2-hs.tms.stglabs.ibm.com: yes
lothal-qa6-1-hs.tms.stglabs.ibm.com: yes
esstest-utilitynode2-essvmhs.tms.stglabs.ibm.com: no
- If the protocol node is not enabled, apply the following workaround:
mmchconfig mmhealth-callhomeis_holistic_allowed_for_all_ess=yes --force -N esstest-utilitynode2-essvmhs.tms.stglabs.ibm.com
|
After updating to ESS 6.1.9.1, you might encounter the call home degraded error:
‘callhome_ptfupdates_failed’ when running ‘mmhealth node show’
Product
- IBM Storage Scale System
6000
- Utility Node Protocol
- Utility Node EMS
- ESS 3000
- ESS 3200
- ESS 3500
- ESS 5000
|
The error seen in this case is:
[root@tucutil1vm]# mmhealth node show | grep CALL
CALLHOME DEGRADED 3 days ago callhome_ptfupdates_failed
This error can be ignored for ESS 6.1.9.1. If you want to resolve the error, the following
configuration parameter can be used:
[root@tucutil1vm]# mmchconfig mmhealthPtfUpdatesMonitorEnabled=no
A sample output is as follows:
Thu Dec 21 19:19:39 EST 2023
mmchconfig: Command successfully completed
mmchconfig: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
[root@tucutil1vm]#
|
After updating to ESS 6.1.9.1, when navigating in the GUI to the Storage
section and then to Declustered Arrays, you might encounter the following error: The
performance collector did not return any data.
Product
- IBM Storage Scale System
6000
- Utility Node Protocol
- Utility Node EMS
- ESS 3000
- ESS 3200
- ESS 3500
- ESS 5000
|
If this error occurs, restart pmsensors on all the nodes in the cluster, and restart pmcollector
on the EMS node (see the sketch after these steps):
- On all nodes, run the following command:
systemctl restart pmsensors
- On the EMS, run the following command:
systemctl restart pmcollector
- Refresh the states.
mmsysmoncontrol restart
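If mmdsh is available on the EMS (as used elsewhere in this document) and passwordless SSH is configured, the restarts can be issued from one place; this is a sketch only.
# From the EMS: restart the sensors on every node, then the collector and the health monitor locally.
mmdsh -N all systemctl restart pmsensors
systemctl restart pmcollector
mmsysmoncontrol restart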
|
After updating to ESS 6.1.9.1, when you are checking pmsensors and pmcollector, you might
encounter an error, such as the hostname is not set.
Product
- IBM Storage Scale System
6000
- Utility Node Protocol
- Utility Node EMS
- ESS 3000
- ESS 3200
- ESS 3500
- ESS 5000
|
This is the error that occurs in this case:
[root@ess3500 ~]# systemctl status pmsensors
A sample output is as follows:
● pmsensors.service - zimon sensor daemon
Loaded: loaded (/usr/lib/systemd/system/pmsensors.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2023-12-22 09:30:08 MST; 14min ago
Process: 1272577 ExecStartPre=/bin/sh -c if [ -z "ess3500rw6a.test.net" -o "ess3500rw6a.test.net" = "localhost*" ] ;
then /usr/bin/logger "pmsensors: ERROR: hostname is not set <>
This error is a false positive and can be ignored.
|
After a host-side update of a Utility node, the VM does not start after you reboot it. An error
occurs when shared libraries are loading.
librdmacm.so.1: cannot open shared object file: No such file or directory
|
Install a few RPMs on the Utility Node host by using the SSR package:
- rm -rf /etc/yum.repos.d/*
- mount -o loop /serv/tmp/code20/payload_OS.iso /serv/mnt/payload_OS
- yum localinstall /serv/mnt/payload_OS/BaseOs/Packages/libibverbs-44.0-2.el9.x86_64.rpm /serv/mnt/payload_OS/BaseOs/Packages/librdmacm-44.0-2.el9.x86_64.rpm
- umount /serv/mnt/payload_OS
|
For the Utility Node protocol node, the
mmhealth command does not monitor HAL.
|
- Manually copy the gpfs.gnr and gpfs.gnr.base RPMs to the Utility Node and install them.
- In the container, navigate to the following directory:
cd /install/ess/otherpkgs/rhels9/x86_64/gpfs
- Transfer the RPMs to the Utility Protocol node.
sftp gpfs.gnr.base-1.0.0-0.x86_64.rpm and gpfs.gnr-5.1.9-1.5.x86_64.rpm
- After the files are copied, install the RPMs on the Utility Protocol node.
ssh root@<Utility Protocol node mgmt IP>
cd /tmp ; yum -y install gpfs.gnr.base-1.0.0-0.x86_64.rpm gpfs.gnr-5.1.9-1.5.x86_64.rpm
|
ipmitool sel elist reports PCI errors for many hours.
Product
- IBM Storage Scale System
6000
|
The errors shown are intermittent and recoverable. They are not suppressed by the system, but they can
be safely ignored.
No action required.
|
GUI configuration: You are asked to create a second user for GUI on Utility Node even though one
user is created from the Power9 EMS node.
|
Configure a second user with a new name (co-existence).
While configuring the GUI in the Utility Node, you are asked to create a GUI user. Ensure that
the user does not already exist. GUI users are shared across GUIs, which means the user can log in to the
new GUI with an existing user. For more information about GUI configuration, see Utility Node co-existence with POWER9 EMS.
|
Not all events of the pair canister are surfacing.
Product
- IBM Storage Scale System
6000
|
Other events will point you to the problem.
No known issue
|
The enclosure and fan LEDs are not turning on.
Product
- IBM Storage Scale System
6000
|
The IBM Storage Scale System
6000 LEDs might not turn on as
expected. The s6k_ledmgmt_v03.sh script in the samples directory might be used
to control the LEDs.
No known issue
|
The selinux enable command fails on IBM Storage Scale System
6000 nodes.
Product
- IBM Storage Scale System
6000
- Utility Node
|
Due to changes in Red Hat 9.x, deployment currently restricts enablement of SELinux, whether by using
the essrun command or manually, in an ESS environment. This restriction applies only to
RH9.x ESS nodes: Utility Node (EMS/Protocol) and IBM Storage Scale System
6000.
This restriction will be removed in an upcoming release.
|
Drive fault LED (amber LED) might not light during replacement of the drive.
Product
- IBM Storage Scale System
6000
|
Determine the physical slot location of the pdisk that is being replaced by using
the mmvdisk pdisk list command.
Example: mmvdisk pdisk list --rg all --replace
A sample output is as follows:
recovery group pdisk priority FRU (type) location
-------------- ----- -------- ---------- --------
rg1 e1s24 9.83 3.84TB NVMe Tie Rack yyyyy U13-16, Enclosure 5149-F48-78L00XX Drive 24
In this example, pdisk e1s24, which is being replaced, is in slot 24 of enclosure 78L00XX.
Initiate the drive replacement task
by using mmvdisk pdisk replace --prepare and, when instructed to physically replace
the drive, turn on the fault LED by using the following steps (note that the green power LED of the
drive slot will be off during the replacement activity until the new replaced drive is powered on).
- ssh to one of the canisters of the enclosure, for example, 78L00XX where the drive is located.
- Activate the fault LED of the slot by using the LED management helper script.
/opt/ibm/ess/tools/samples/s6k_ledmgmt_v03.sh -e D -i <slot number> -a F
where,
- -e
- Is the endpoint selection.
- D
- Is for Drive.
- -i
- Is the element id selection, in this example, it is 24.
- -a
- Is the action to be performed.
- F
- Sets fault LED (turn on fault LED).
- R
- Resets the fault LED (turn off Fault LED).
- Complete the replacement.
mmvdisk pdisk replace
- After the replacement is complete, turn off the corresponding fault LED by resetting the
fault.
/opt/ibm/ess/tools/samples/s6k_ledmgmt_v03.sh -e D -i <slot number> -a R
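The two LED actions can be wrapped in a small helper; this sketch assumes the s6k_ledmgmt_v03.sh path shown above, and the wrapper name fault_led.sh is hypothetical. The first argument is the drive slot number and the second is the action (F to set or R to reset the fault LED).
#!/bin/bash
# fault_led.sh (hypothetical helper): set (F) or reset (R) the drive fault LED for a slot.
# Usage: ./fault_led.sh <slot number> <F|R>
SLOT="$1"
ACTION="$2"
/opt/ibm/ess/tools/samples/s6k_ledmgmt_v03.sh -e D -i "$SLOT" -a "$ACTION"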
|
When you are creating a Utility Node
Protocol VM in a Utility Node system by
using the essrun command, you might encounter a shared connection failure.
This failure is seen during execution of the following command during Utility Node Protocol VM
setup: essrun -N <node> cesvm --create --vm-name <vm name>
Product
- Utility Node Protocol
- Utility Node EMS
|
- Run the following sed command in the Utility Node EMS container, and then run the
essrun command:
sed -i '/fatal_error = False/a\
    small_setup = False' /opt/ibm/ess/deploy/ansible/roles/kvmCreation/files/hostCheck
- Ensure that there are four spaces between "a\" and "small_setup".
|
The GUI setup wizard might fail due to an empty drop-down selection.
During the GUI setup wizard portion of a newly installed system with a Utility Node EMS, you might encounter an empty drop-down
box for the Management Server. If this is seen, you will not be able to continue the wizard.
Product
- Utility Node and Power9 EMS
|
- Exit the GUI.
- Access the command line on the Utility Node EMS and run the following
command:
mmaddcompspec default --replace
- Re-enter the GUI and complete the wizard setup.
|
When creating the Utility Node Protocol VM
you might encounter the following error:
[ERROR] Failed to create a bridge named on node localhost
|
No solution available in this release. If a bridge already exists, ignore the error and
continue. |
When attempting to start a VM image you might see the following message
(example): The path /emsvm has 282.49 GB free excluding reserved space
which is less than the required 575 GB
|
This issue occurs because, when the VM is operational, the QCOW image grows as needed (packages installed, and so on).
The amount of space that the QCOW image consumes might exceed the minimum space that EMSVM requires. The
workaround is as follows:
- Determine the name of the VM.
virsh list --all
- Start the VM.
virsh start <EMSVM name from first command>
- Instead of using EMSVM to start the VM, use the underlying virsh commands.
|
When attempting to set up Utility Node Protocol services by using the IBM Storage Scale toolkit, you
might see the following
error:
[ INFO ] ERROR! couldn't resolve module/action 'selinux'.
This often indicates a misspelling, missing collection,
or incorrect module path. [ INFO ]
The error appears to be in
'/usr/lpp/mmfs/5.1.9.1/ansible-toolkit/ansible/collections/ansible_collections
/ibm/spectrum_scale/roles/core_prepare/tasks/prepare.yml':
line 3, column 3, but may [ INFO ] be elsewhere in the file depending on the exact
syntax problem.
|
- Install the Ansible collections that are part of the Ansible Galaxy suite.
- On the EMSVM, run the following commands:
scp ansible-collections.tar.gz utilityVM:/root/.
tar -xvf ansible-collections.tar.gz
ansible-galaxy collection list
To obtain ansible-collections.tar.gz you must contact IBM Service.
|
After deploying the Utility Node Protocol VM,
you are instructed to set the IP address of the management bridge by using the
ess_ssr_setup script. You might encounter the following issue while executing
this command:
[root@esstest-utilitynode2-essvm ~]# ess_ssr_setup
2023-12-07 20:56:28,429 INFO: This is version 122
2023-12-07 20:56:28,451 ERROR: Cannot parse lscpu output
Exception happened: 1
|
If you encounter this issue, run the following command before
retrying: sed -i "s/{1}/{2}/g" /opt/ibm/ess/tools/bin/ess_ssr_setup
This
will patch the ess_ssr_setup script and allow the user to re-run the command
successfully. |
The firmware update command in ESS 3500 Hybrid or Capacity models might fail with the following
error:
‘diagnostic results: pass-through os error: Cannot allocate memory.’
|
If this error is seen, update the firmware on each enclosure individually and serially.
- Determine the serial number of each enclosure by issuing the following command:
mmlsfirmware --type storage-enclosure
A sample output is as follows:
enclosure firmware available
type product id serial number level firmware location
---- ---------- ------------- -------- -------- --------
enclosure 5141-FN2 78E4007 E11S *E11T essio71-ce Rack ess7 U17-18
enclosure 5147-102 00C2001 4E2A,4E29 *4E2A essio71-ce Rack ess7 U13-16
enclosure 5147-102 00C2002 4E29,4E2A *4E2A essio71-ce Rack ess7 U09-12
enclosure 5147-102 00C2003 4E2A,4E29 *4E2A essio71-ce Rack ess7 U05-08
enclosure 5147-102 00C2004 4E2A,4E29 *4E2A essio71-ce Rack ess7 U01-04
enclosure 5147-102 00C2006 4E2A,4E29 *4E2A essio71-ce Rack ess7 U26-29
enclosure 5147-102 00C2007 4E29,4E2A *4E2A essio71-ce Rack ess7 U22-25
- For every 5147-102 enclosure, issue the following command:
mmchfirmware --type storage-enclosure --serial-number <serial number>
Wait for the command to complete before moving to the next enclosure.
Example: mmchfirmware --type storage-enclosure --serial-number 00C2001
- Verify the update.
mmlsfirmware --type storage-enclosure
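The per-enclosure updates can be serialized in a short loop; this is a sketch only, using the 5147-102 serial numbers from the sample output above. Substitute the serial numbers reported for your system.
# Update each 5147-102 enclosure one at a time; mmchfirmware returns when that enclosure completes.
for serial in 00C2001 00C2002 00C2003 00C2004 00C2006 00C2007; do
    mmchfirmware --type storage-enclosure --serial-number "$serial"
done
# Verify the result.
mmlsfirmware --type storage-enclosure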
|
If you update a Utility Node EMS
from 6.1.8.x to 6.1.9.2 and later use a newer version of the emsvm --connect-EMS
command to connect to your VM, you might get an error message and fail to connect.
ERROR: We cannot connect or check of EMSVM-23E because it is not created
|
If this happens, you can start your VM by using the standard virsh commands
from the Utility Node host.
- Determine the name of the VM.
virsh list --all
- Start the VM.
virsh start <EMSVM name from first command>
|
When updating the Utility Node EMS host by
using the essrun command, you might encounter a failure when connecting from the
EMS VM to the EMS host.
|
This failure is due to incorrect settings in the connections between the EMS host and the EMS
VM.
Attention: Run separate commands against the EMS host and the EMS VM.
- Utility Node EMS host
- Run the following command in Utility Node
EMS host:
virsh net-edit ess_vm_host_only_network
- Change the following configuration from:
<ip address='192.168.110.1' netmask='255.255.255.252'>
<dhcp>
<range start='192.168.110.1' end='192.168.110.2'/>
</dhcp>
</ip>
to the following:
<ip address='10.23.16.1' netmask='255.255.255.248'>
<dhcp>
<range start='10.23.16.1' end='10.23.16.6'/>
</dhcp>
</ip>
- Save and quit by using the :wq command.
- Issue the following command in Utility Node EMS host:
virsh net-destroy ess_vm_host_only_network
- Issue the following command in Utility Node EMS host:
virsh net-start ess_vm_host_only_network
- Utility Node EMS VM
- Issue the following command in Utility Node EMSVM:
nmcli d disconnect ras
- Issue the following command in Utility Node EMSVM:
nmcli c del ras
- If the EMSVM is part of a cluster, verify that your cluster will have sufficient quorum nodes
available when the EMS is shut down. Then, issue the following command in Utility Node
EMSVM:
nmcli d disconnect ras
- Utility Node EMS host
- Issue the following command in the Utility Node EMS
Host:
virsh destroy EMSVM-23E
- Issue the following command in the Utility Node EMS
host:
systemctl start libvirtd
- Issue the following command in the Utility Node EMS
host:
virsh start EMSVM-23E
-
Utility Node EMSVM:
- Issue the following command in Utility Node EMSVM:
nmcli d connect ras
- Issue the following command in Utility Node EMSVM:
ip a show ras
- Verify that the RAS interface looks similar to this in the Utility Node EMSVM by using the ip a show
ras command:
2: ras: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
qdisc fq_codel state UP group default qlen 1000
link/ether 52:54:00:da:36:39 brd ff:ff:ff:ff:ff:ff
altname enp0s9
inet 10.23.16.6/29 brd 10.23.16.7 scope global dynamic noprefixroute ras
valid_lft 2377sec preferred_lft 2377sec
inet6 fe80::9b89:47ea:5162:31e/64 scope link noprefixroute
valid_lft forever preferred_lft forever
|
When accessing the Dual-Pane GUI statistics page you might be unable to display data on the
second pane.
Product
- ESS 3000
- ESS 3200
- ESS 3500
- ESS 5000
- Utility Node EMS
- Utility Node Protocol
|
If this is encountered, you will see Na:Na:Na at the bottom left portion of the
second pane.
To fix this, clear the cache of your browser, and re-enter the GUI.
You may
also need to select the 5-minute interval while editing the pane. |
The deployment or restarting of the container will fail if SELinux is enabled on the EMS.
If SELinux is enabled on the EMS, the container will fail with an error as follows:
Error: failed to mount shm tmpfs
"/var/lib/containers/storage/overlay-containers
/8d4329e232d61495d37ad0b818d74a29262465fc5a3df1323e048ecd182ce806/userdata/shm":
invalid argument
|
To avoid this issue, disable SELinux before starting the deployment container:
- setenforce 0
- sed -i "s/SELINUX=.*/SELINUX=disabled/g" /etc/selinux/config
- systemctl reboot
|
The firmware update for ESS 3000 might fail if it is coming from an N-2 level.
|
When running mmchfirmware --type storage-enclosure on ESS 3000, you might see
this error:
BIOS: Fail! Required uas driver module failed to load
This can be fixed by running the following command before the mmchfirmware --type
storage-enclosure command:
insmod /lib/modules/$(uname -r)/kernel/drivers/usb/storage/usb-storage.ko.xz
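A hedged pre-check is sketched below; it assumes the module path shown above and loads the usb-storage module only if it is not already loaded before running the enclosure firmware update.
# Load the usb-storage kernel module if it is missing, then update the enclosure firmware.
lsmod | grep -q '^usb_storage' || insmod /lib/modules/$(uname -r)/kernel/drivers/usb/storage/usb-storage.ko.xz
mmchfirmware --type storage-enclosure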
|
You might encounter false positive 4U102 5 volt regulator failures in the
mmlsenclosure and mmhealth node event log.
It is also
possible that a call home will be generated for the 4U102 5 volt regulator. In the
mmhealth node event log:
power_supply_failed WARNING Power supply 5vra_id2 is FAILED.
power_supply_failed WARNING Power supply 5vra_id2 is FAILED.
|
No fix available in this release. If this is seen, contact support to verify the false
positive condition.
|
For the ESS 5000 4u106 enclosures, it might take longer to gather firmware information than in
previous releases.
|
Due to changes in tools, the mmlsfirmware command might take longer than expected
when gathering information.
This is normal in this release and not a
reason for concern. |
In the Hardware panel of a system using a utility-node EMS, a node might show up as
Unknown.
Product
- IBM® Storage Scale System Utility
Node
|
If this is detected, do the following:
- Run the mmsysmoncontrol restart command on the nodes showing up as
Unknown.
- Run the mmhealth node show --refresh command on all nodes.
- Check hardware states in the GUI.
|
In a utility node that uses InfiniBand (IB) high-speed networking, the typical use of
essgennetworks to create the network bond will fail.
Product
- IBM Storage Scale System Utility
Node
|
For IB systems you must use other means to create the network bonds on the Utility-Node EMS. The
following tools will work: nmtui, nmcli.
|
After updating to 6.1.8.1, essinstallcheck might show an error saying that
mmvdisk settings do not match best practices.
The following configuration parameters are set at the cluster level:
- nsdRAIDDiskPerformanceMinLimitPct
- nsdRAIDDiskPerformanceUpdateInterval
- nsdRAIDEventLogToConsole
- nsdRAIDSSDPerformanceMinLimitPct
|
- To resolve this issue, issue the following command to remove these 4
configuration parameters:
mmchconfig nsdRAIDDiskPerformanceMinLimitPct=DELETE,nsdRAIDDiskPerformanceUpdateInterval=DELETE,nsdRAIDEventLogToConsole=DELETE,nsdRAIDSSDPerformanceMinLimitPct=DELETE
- Issue server configure --update again:
mmvdisk server configure --update --node-class <mmvdisk node class> --recycle one
|
Updating storage enclosure firmware on a daisy-chain system by using the following
command might fail: mmchfirmware --type storage-enclosure
Product
- ESS 3500
- ESS 5147-102 daisy-chain
|
When updating enclosure firmware on a daisy-chained ESS 3500 with 5 or more 4U102 enclosures, you
must update each enclosure individually by using the serial number instead of the normal parallel
method.
For example:
- Gather the enclosure serial numbers by using the following command:
# mmlsenclosure all -N ess3500rw6a-hs
A sample output is as follows:
serial number product id firmware level service nodes
------------- ---------- -------------- ------- ------
78E400Q 5141-FN2 E11Q yes ess3500rw6a-hs.test.net
78T2468 5147-102 4E29,4E29 no ess3500rw6a-hs.test.net
78T246A 5147-102 4E2A,4E29 no ess3500rw6a-hs.test.net
78T246C 5147-102 4E29,4E29 no ess3500rw6a-hs.test.net
78T246E 5147-102 4E29,4E29 no ess3500rw6a-hs.test.net
78T254a 5147-102 4E29,4E29 no ess3500rw6a-hs.test.net
- By using this serial number list, update each enclosure
individually.
# mmchfirmware --type storage-enclosure --serial-number <serial number>
|
An error occurs during the precheck portion of an upgrade, if the ESS 3500 canister
contains a mix of Micron 7300 and 7450 boot drives.
[ERROR] Expected quantity of NVME model: Micron_7450_MTFDKBA960TFR is 2,
but only found: 1
[ERROR] Expected quantity of NVME model: Micron_7300_MTFDHBA960TDF is 2,
but only found: 1
|
This is an acceptable configuration and the error can be ignored. |
Under certain conditions, such as after a cable failure or a storage firmware update,
the ESS 3500 4U102 enclosures might show a queue_depth other than the expected queue_depth of
1.
|
No action is required if the queue_depth is 2 or less. However, if queue_depth is > 2, the
following script can be used to update the parameter properly from each I/O node that exhibits the
issue:
#!/bin/bash
# fix_queue_depth for ess4u102
lsscsi -g | awk '/5147-102/{print substr($7,6)}' | while read line
do
echo 1 > /sys/class/scsi_generic/${line}/device/queue_depth
done
exit
Run essstoragequickcheck from each I/O node to verify the
results.
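To confirm the setting, the same lsscsi filter used in the script above can print the current queue_depth of each 5147-102 device; this is a sketch under the same assumptions as that script.
# Print the current queue_depth of each 5147-102 SCSI generic device.
lsscsi -g | awk '/5147-102/{print substr($7,6)}' | while read dev
do
echo "${dev}: $(cat /sys/class/scsi_generic/${dev}/device/queue_depth)"
done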
|
Enablement of IPv6 for RoCE by using the Service Network includes a problematic entry in
/etc/sysconfig/network-scripts/ifcfg-bond-bond0.
Product
- ESS 3200
- ESS 3500
- ESS 5147-102
- ESS 5000
|
- Create a bond by using the essgennetworks utility.
- Remove the ‘comment out’ line from the
/etc/sysconfig/network-scripts/ifcfg-bond-bond0 file of the node.
- Run nmcli con reload.
- Run nmcli con down bond-bond0; nmcli con up bond-bond0.
- Run all other essgennetworks commands to enable RoCE as per the procedure.
Note: With this workaround the last RoCE enablement command should work as
designed.
Example: essgennetworks -N essio51 --suffix=-ce --interface
enP48p1s0f0,enP48p1s0f1 --bond bond0 --enableRoCE --mtu
9000
[INFO] Starting network generation...
[INFO] nodelist: essio51
[INFO] suffix used for network hostname:-ce
[INFO] Interface(s) available on node essio51-ce
[INFO] Considered interface(s) of node essio51-ce are
['enP48p1s0f0', 'enP48p1s0f1', 'bond0'] with RDMA Port
['mlx5_2', 'mlx5_3', 'mlx5_bond_0'] for this operation
[INFO] Supported Mellanox RoCE card found at node
essio51
[INFO] Supported version of Mellanox OFED found at node
essio51-ce
[INFO] Bond validation passed and found bonds bond0 has
been created using same physical network adapter at node
essio51-ce
[INFO] Bond MTU validation passed and found bonds MTU
set to 9000 at node essio51-ce
[INFO] Interface bond0 have the IPv4 Address assigned at
node essio51-ce
[INFO] Interface bond0 have the IPv6 Address assigned at
node essio51-ce
[INFO] Interface MTU also set to 9000 at node essio51-ce
[INFO] Interface enP48p1s0f0 have the IPv4 Address
assigned at node essio51-ce
[INFO] Interface enP48p1s0f1 have the IPv4 Address
assigned at node essio51-ce
[INFO] Interface enP48p1s0f0 have the IPv6 Address
assigned at node essio51-ce
[INFO] Interface enP48p1s0f1 have the IPv6 Address
assigned at node essio51-ce
[INFO] Enabling RDMA for Ports ['mlx5_bond_0', 'mlx5_2',
'mlx5_3']
[INFO] Enabled RDMA i.e. RoCE using bond bond0
[INFO] Enabled RDMA i.e. RoCE using interfaces
enP48p1s0f0,enP48p1s0f1
[INFO] Please recycle the GPFS daemon on those nodes
where RoCE has been enabled.
|
During the GUI setup in a dual-EMS environment, the backup EMS is shown twice on location
specification.
Product
- ESS 3200
- ESS 3500
- ESS 5147-102
- ESS 5000
|
- During GUI wizard setup or during GUI edit components, you will be prompted twice for specifying
rack location of the backup EMS. Do the following steps to resolve this issue:
- During the first prompt for the rack location of the management servers, leave the location
blank for the backup EMS.
- Click Next to go to the panel specifying the location for other nodes and
choose the rack location for the backup EMS in this panel.
- During the GUI wizard setup or during GUI edit components, when actions are run via the GUI
after users specify the rack locations, the GUI action could fail due to the backup EMS being added
twice into the component database. Do the following steps to resolve this issue:
- Ignore the error and click the Close button to close the window.
- Click Finish to continue.
|
The mmhealth command does not report a failed cable (cable missing)
between the ESS 3500 HBA and the 5147-102 IOM.
|
If a SAS cable is pulled between the HBA adapter and the IOM of an enclosure, the mmhealth node
show command will not flag any error. Do the following steps to resolve this issue:
- Monitor the mmfs.log.latest file.
- If any suspicious errors are printed indicating that disk paths or the IOM are missing, run
mmgetpdisktopology and pass the output to topsummary to find out which paths are missing.
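A typical invocation is sketched below; it assumes the topsummary sample script is in the standard GNR samples directory and uses a temporary output file name, both of which may differ on your system.
# Capture the disk topology and summarize it to find the missing paths.
mmgetpdisktopology > /tmp/node.top
/usr/lpp/mmfs/samples/vdisk/topsummary /tmp/node.top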
|
During Dual EMS GUI setup, the backup EMS is no longer shown in the hardware
panel.
Product
- ESS 3200
- ESS 3500
- ESS 5147-102
- ESS 5000
|
- Run the mmlscomp command from EMS to find the component ID for the backup
EMS.
- Run the mmdelcomp <component_id> command (component_id is the component ID
obtained in step 1).
- Return to the GUI Hardware panel and click Edit Rack
Components, which is at the top left of the server list on the right side.
- Choose Yes, discover new servers and enclosures first. This takes many minutes.
- Click Next to follow the screen prompt to complete the Hardware Components setup.
|
Call home deployment will fail if the gpfsadmin username is created or exists as
part of SUDO enablement before the call home deployment.
|
- If SUDO is already configured before deploying esscallhomeconf, then the workaround is to:
- Disable SUDO (on the relevant nodes where it is enabled).
- Deploy the call home configuration (in the default root mode).
- If SUDO is not enabled yet, then the recommendation is to:
- Deploy call home first (this can be achieved through the GUI setup or manually via esscallhomeconf; for
more information, see the ESS documentation).
- Enable SUDO on the cluster node(s).
|
For RoCE enablement, the bond interface creation may have a problematic entry in
/etc/sysconfig/network-scripts/ifcfg-bond-bond0. The identified parameter is:
IPV6_ADDR_GEN_MODE.
|
- Create a bond by using the essgennetworks utility.
- Remove the ‘comment out’ line in the
/etc/sysconfig/network-scripts/ifcfg-bond-bond0 file of the specific node.
- Run nmcli con reload.
- Run nmcli con down bond-bond0; nmcli con up bond-bond0.
- Run all other essgennetworks commands to enable RoCE as per the procedure.
Note: With this workaround
the last RoCE enablement command should work as
designed.
Example: essgennetworks -N essio51 --suffix=-ce --interface
enP48p1s0f0,enP48p1s0f1 --bond bond0 --enableRoCE
--mtu 9000
[INFO] Starting network generation...
[INFO] nodelist: essio51
[INFO] suffix used for network hostname:-ce
[INFO] Interface(s) available on node essio51-ce
[INFO] Considered interface(s) of node essio51-ce are
['enP48p1s0f0', 'enP48p1s0f1', 'bond0'] with RDMA Port
['mlx5_2', 'mlx5_3', 'mlx5_bond_0'] for this operation
[INFO] Supported Mellanox RoCE card found at node
essio51
[INFO] Supported version of Mellanox OFED found at
node essio51-ce
[INFO] Bond validation passed and found bonds bond0
has been created using same physical network adapter
at node essio51-ce
[INFO] Bond MTU validation passed and found bonds
MTU set to 9000 at node essio51-ce
[INFO] Interface bond0 have the IPv4 Address
assigned at node essio51-ce
[INFO] Interface bond0 have the IPv6 Address
assigned at node essio51-ce
[INFO] Interface MTU also set to 9000 at node essio51-ce
[INFO] Interface enP48p1s0f0 have the IPv4 Address
assigned at node essio51-ce
[INFO] Interface enP48p1s0f1 have the IPv4 Address
assigned at node essio51-ce
[INFO] Interface enP48p1s0f0 have the IPv6 Address
assigned at node essio51-ce
[INFO] Interface enP48p1s0f1 have the IPv6 Address
assigned at node essio51-ce
[INFO] Enabling RDMA for Ports ['mlx5_bond_0',
'mlx5_2','mlx5_3']
[INFO] Enabled RDMA i.e. RoCE using bond bond0
[INFO] Enabled RDMA i.e. RoCE using interfaces
enP48p1s0f0,enP48p1s0f1
[INFO] Please recycle the GPFS daemon on those
nodes where RoCE has been enabled.
|
The mmvdisk sed enroll command might proceed when it is issued after
creating user vdisk sets, instead of being blocked.
|
- Issue the mmvdisk sed enroll command after creating a recovery group and
before creating user vdisk sets.
- Contact IBM Support, if you issued the mmvdisk sed enroll command after
creating user vdisk sets.
|
Occasionally, some drive paths on a few drives in an ESS 3500 enclosure were found missing
when the essfindmissingdisks command was issued.
# essfindmissingdisks
[INFO] Start find missing disk paths
[INFO] nodelist: localhost
[INFO] May take long time to complete search of all drive paths
[INFO] Checking node: localhost
[INFO] Checking missing disk paths from node localhost
[INFO] GNR server: name ess3500a-hs.test.net arch
x86_64 model ESS3500-5141-FN2 serial 78E400XA
[INFO] Enclosure 78E40XA sees 22 disks (22 SSDs, 0 HDDs)
[INFO] Enclosure 78T254A sees 102 disks (0 SSDs, 102 HDDs)
[INFO] Enclosure 78T246A sees 102 disks (0 SSDs, 102 HDDs)
[ERROR] GNR server disk topology: ESS 3500 H2
(2 HBA 24 NVMe 2 Full 4U102) (match: 96/100)
[INFO] GNR configuration: 3 enclosures, 22 SSDs, 2 empty slots,
220 disks total, 0 NVRAM partitions
[ERROR] Location 78E40XA-1 appears empty but should have an SSD
[ERROR] Location 78E40XA-2 appears empty but should have an SSD
[[ERROR] essfindmissingdisks detected error in system.
Please review output carefully.
[root@ess3500a ~]#
This error indicates that one path to the drive might be in a
‘stuck-state’ condition whereas the other path is healthy.
|
Run the following script from the primary canister:
[root@ess3500a ~]# /opt/ibm/ess/tools/samples/fix_stuck_drive_slots.sh
|
The ESS system reboots unexpectedly because mpt3sas messages fill the logs.
The following error appears:
System crashed with 'swiotlb buffer is full' then 'scsi_dma_map failed' errors
|
To resolve this error, see the Red Hat known issues. The .44 mpt3sas driver will be available
in ESS 6.1.8.
|
Customers might encounter false positive intermittent fan module failures in /var/log/messages. It
is also possible that a call home will be generated for each fan module.
Errors typically seen:
mmsysmon[7819]: [W] Event raised: Fan fan_module1_id4 has a fault.
mmsysmon[7819]: [W] Event raised: Fan fan_module1_id4 state is FAILED.
|
If this is seen, contact IBM support to verify the false positive condition.
|
Running essrun ONLINE update might fail on the mmchfirmware -N
localhost --type drive section.
|
Manually issue the mmchfirmware command after the deployment. |
In ESS 5000, if Y-type cables (used in HDR switches) are used for running the high-speed network, the
mmhealth node show -N ioNode1-ib,ioNode2-ib NETWORK command might show
ib_rdma_port_width_low(mlx5_0/1, mlx5_1/1, mlx5_4/1).
|
- Check for existing anomalies in the HDR Y-cables.
- Contact IBM Support for help to update /usr/lpp/mmfs
/lib/mmsysmon/NetworkService.py and
/usr/lpp/mmfs/lib/mmsysmon/network.json with appropriate code.
- After patching, restart mmsysmon to apply changes. Example:
systemctl restart mmsysmon.
- Issue the mmhealth command to verify whether the condition is alleviated.
|
When the EMS is updated from previous releases to ESS 6.1.5.0, setting up SELinux on the EMS by issuing
the essrun selinux enable command in a container fails. The following error
appears: Failed to resolve typeattributeset statement
at /var/lib/selinux/targeted/tmp/modules/400/pcpupstream/cil:42
The issue may be related to a bug in the Red Hat 8.6 kernel.
Product
- ESS 3000
- ESS 5000
- ESS 3200
- ESS 3500
- ESS 3500 (4u102)
|
- Reboot EMS.
- Restart the container.
- Ensure that selinux-policy is up-to-date by issuing the yum update
selinux-policy command.
- Reinstall pcp-selinux by issuing the yum reinstall pcp-selinux command.
|
When creating additional file systems in a tiered storage environment, you might encounter a
MIGRATION callback error.
mmaddcallback: Callback identifier "MIGRATION"
already exists or was specified multiple times.
If a callback exists, file system
creation will fail.
Product
- ESS 3000
- ESS 3200
- ESS 5000
- ESS 3500
- ESS 3500 (4u102)
|
Delete the callback and create the file system again.
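A minimal sketch of the workaround follows, assuming the standard mmlscallback and mmdelcallback commands; the file system name and stanza file in the final step are hypothetical placeholders for your original creation command.
# Confirm that the MIGRATION callback exists, then remove it.
mmlscallback MIGRATION
mmdelcallback MIGRATION
# Re-run your original file system creation command, for example:
# mmcrfs fs1 -F /tmp/vdisk.stanza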
|