The POWER8 firmware level that is included in the container does not match the level that
essinstallcheck verifies. essinstallcheck checks whether
the POWER8 firmware is updated to the latest recommended level. The following error was found:
[ERROR] System Firmware: found FW860.B0 (SV860_240) expected min FW860.B3 (SV860_245)
The .img file that is included in the container does not bring systems to this
level. The level in the container is: 01SV860_243_165.
|
|
Not all MD devices stop when the utility node is rebooted to stop them.
Product
- IBM® Storage Scale System Utility Node
|
Power cycle the node by using the ipmitool command.
|
In the Hardware panel of a system that uses a utility-node EMS, a node might show up as
Unknown.
Product
- IBM Storage Scale System Utility Node
|
If this is detected, do the following:
- Run the mmsysmoncontrol restart command on the nodes that show up as Unknown.
- Run the mmhealth node show --refresh command on all nodes.
- Check hardware states in the GUI.
|
On a utility node that uses InfiniBand (IB) high-speed networking, the typical use of
essgennetworks to create the network bond fails.
Product
- IBM Storage Scale System Utility Node
|
For IB systems, you must use other means to create the network bonds on the utility-node EMS. The
following tools work: nmtui, nmcli.
|
After updating to 6.1.8.3, essinstallcheck might show an error saying that
mmvdisk settings do not match best practices.
The following configuration parameters are set at the cluster level:
- nsdRAIDDiskPerformanceMinLimitPct
- nsdRAIDDiskPerformanceUpdateInterval
- nsdRAIDEventLogToConsole
- nsdRAIDSSDPerformanceMinLimitPct
|
- To resolve this issue, issue the following command to remove these four
configuration parameters:
mmchconfig nsdRAIDDiskPerformanceMinLimitPct=DELETE,nsdRAIDDiskPerformanceUpdateInterval=DELETE,nsdRAIDEventLogToConsole=DELETE,nsdRAIDSSDPerformanceMinLimitPct=DELETE
- Issue the server configure --update command again:
mmvdisk server configure --update --node-class <mmvdisk node class> --recycle one
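The mmchconfig argument in the first step is long and easy to mistype. The following is a minimal sketch, not part of the product, that assembles the argument from the four parameter names listed in this issue and prints the resulting command for review. The helper name build_delete_args is hypothetical, and the sketch does not run anything against the cluster.

```shell
#!/bin/bash
# Hypothetical helper: join parameter names into "name=DELETE,name=DELETE,...".
build_delete_args() {
    joined=""
    for name in "$@"; do
        # Prefix a comma only when something has already been collected.
        joined="${joined:+$joined,}${name}=DELETE"
    done
    printf '%s\n' "$joined"
}

# The four cluster-level parameters named in this issue.
args=$(build_delete_args \
    nsdRAIDDiskPerformanceMinLimitPct \
    nsdRAIDDiskPerformanceUpdateInterval \
    nsdRAIDEventLogToConsole \
    nsdRAIDSSDPerformanceMinLimitPct)

# Print the command for review rather than executing it.
echo "mmchconfig $args"
```

Copy the printed line into the shell once it has been checked against the parameter list.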
|
Updating storage enclosure firmware on a daisy-chained system by using the following
command might fail: mmchfirmware --type storage-enclosure
Product
- ESS 3500
- ESS 5147-102 daisy-chain
|
When updating enclosure firmware on a daisy-chained ESS 3500 with 5 or more 4U102 enclosures, you
must update each enclosure individually by using the serial number instead of the normal parallel
method.
For example:
- Gather the enclosure serial numbers by using the following command:
# mmlsenclosure all -N ess3500rw6a-hs
A sample output is as follows:
serial number  product id  firmware level  service  nodes
-------------  ----------  --------------  -------  ------
78E400Q        5141-FN2    E11Q            yes      ess3500rw6a-hs.test.net
78T2468        5147-102    4E29,4E29       no       ess3500rw6a-hs.test.net
78T246A        5147-102    4E2A,4E29       no       ess3500rw6a-hs.test.net
78T246C        5147-102    4E29,4E29       no       ess3500rw6a-hs.test.net
78T246E        5147-102    4E29,4E29       no       ess3500rw6a-hs.test.net
78T254a        5147-102    4E29,4E29       no       ess3500rw6a-hs.test.net
- By using this serial number list, update each enclosure
individually.
# mmchfirmware --type storage-enclosure --serial-number <serial number>
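With five or more enclosures, typing each serial number by hand is error prone. The following is a minimal sketch that derives one mmchfirmware command per 5147-102 enclosure from the mmlsenclosure output. It assumes that the serial number is the first column and the product ID is the second column, as in the sample output in this issue. The helper names list_4u102_serials and print_update_commands are hypothetical, and the commands are only printed, not run.

```shell
#!/bin/bash
# Hypothetical helper: read `mmlsenclosure all` output on stdin and print the
# serial number (column 1) of every row whose product ID (column 2) is 5147-102.
# Header and separator rows do not match and are skipped.
list_4u102_serials() {
    awk '$2 == "5147-102" { print $1 }'
}

# Hypothetical helper: turn serial numbers on stdin into one update command each.
print_update_commands() {
    while read -r serial; do
        echo "mmchfirmware --type storage-enclosure --serial-number $serial"
    done
}

# Usage on an I/O node (prints the commands for review; nothing is executed):
#   mmlsenclosure all -N ess3500rw6a-hs | list_4u102_serials | print_update_commands
```

Run the printed commands one at a time so that each enclosure finishes updating before the next one starts.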
|
An error occurs during the precheck portion of an upgrade, if the ESS 3500 canister
contains a mix of Micron 7300 and 7450 boot drives.
[ERROR] Expected quantity of NVME model: Micron_7450_MTFDKBA960TFR is 2, but only found: 1
[ERROR] Expected quantity of NVME model: Micron_7300_MTFDHBA960TDF is 2, but only found: 1
|
This is an acceptable configuration and the error can be ignored.
|
In a Bring Your Own EMS (BYOE) environment, when you enter the container EMS value,
contradictory statements are shown.
|
You are asked to enter a ‘resolvable’ container short name in the BYOE environment. However,
the command fails if the name can be resolved through the /etc/hosts file. In the BYOE environment,
you must remove the resolvable name from the /etc/hosts file or use a different
name that is not resolvable for the command to complete successfully.
An example of failure:
EMS hostname must be the Management network (also called Xcat). Other networks can be aliases (A) or canonical names (CNAME) on DNS and/or on hosts file.
Is the current EMS FQDN tucbyoe1vm.test.net correct? (y/n):y
Remember NOT to add the domain name test.net to the input
Please type the desired and resolvable container short hostname [changeme] : cemsbyo
2023-06-14 21:23:20,845 ERROR: The container short name cemsbyo resolves to IP address 10.88.0.5
2023-06-14 21:23:20,845 ERROR: The container long name cemsbyo.test.net resolves to IP address 10.88.0.5
2023-06-14 21:23:20,845 ERROR: Container name can be resolved. This is not supported in this EMS. Do not add the container to /etc/hosts nor DNS, and try again. Or use a different container name that cannot be resolved.
|
Under certain conditions, such as after a cable failure or a storage firmware update, the
ESS 3500 4U102 enclosures might show a queue_depth other than the expected queue_depth of 1.
|
No action is required if the queue_depth is 2 or less. However, if the queue_depth is greater than 2, the
following script can be used to update the parameter properly from each I/O node that exhibits the
issue:
#!/bin/bash
# fix_queue_depth for ess4u102
lsscsi -g | awk '/5147-102/{print substr($7,6)}' | while read line
do
    echo 1 > /sys/class/scsi_generic/${line}/device/queue_depth
done
exit
Run essstoragequickcheck from each I/O node to verify the results.
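Before changing anything, it can help to see which devices actually exceed the threshold. The following is a minimal, report-only sketch that reuses the lsscsi and awk selection from the fix script in this issue and prints only the devices whose queue_depth is greater than 2. The helper name needs_fix is hypothetical, and the sketch assumes the same lsscsi column layout as the fix script.

```shell
#!/bin/bash
# Hypothetical helper: succeed when a queue_depth value needs correction (> 2).
needs_fix() {
    [ "$1" -gt 2 ]
}

# Report-only pass over the 5147-102 devices; skipped where lsscsi is not installed.
if command -v lsscsi >/dev/null 2>&1; then
    lsscsi -g | awk '/5147-102/{print substr($7,6)}' | while read -r sg; do
        f="/sys/class/scsi_generic/${sg}/device/queue_depth"
        # Skip devices whose sysfs entry cannot be read.
        [ -r "$f" ] || continue
        depth=$(cat "$f")
        if needs_fix "$depth"; then
            echo "${sg}: queue_depth=${depth} (needs correction)"
        fi
    done
fi
```

If the sketch prints nothing on an I/O node, no device on that node exceeds the threshold and no action is required there.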
|
Enabling IPv6 for RoCE by using the service network introduces a problematic entry in
/etc/sysconfig/network-scripts/ifcfg-bond-bond0.
Product
- ESS 3200
- ESS 3500
- ESS 5147-102
- ESS 5000
|
- Create a bond by using the essgennetworks utility.
- Remove the ‘comment out’ line from the
/etc/sysconfig/network-scripts/ifcfg-bond-bond0 file of the node.
- Run nmcli con reload.
- Run nmcli con down bond-bond0; nmcli con up bond-bond0.
- Run all other essgennetworks commands to enable RoCE as per the procedure.
Note: With this workaround, the last RoCE enablement command should work as designed.
Example:
essgennetworks -N essio51 --suffix=-ce --interface enP48p1s0f0,enP48p1s0f1 --bond bond0 --enableRoCE --mtu 9000
[INFO] Starting network generation...
[INFO] nodelist: essio51
[INFO] suffix used for network hostname:-ce
[INFO] Interface(s) available on node essio51-ce
[INFO] Considered interface(s) of node essio51-ce are ['enP48p1s0f0', 'enP48p1s0f1', 'bond0'] with RDMA Port ['mlx5_2', 'mlx5_3', 'mlx5_bond_0'] for this operation
[INFO] Supported Mellanox RoCE card found at node essio51
[INFO] Supported version of Mellanox OFED found at node essio51-ce
[INFO] Bond validation passed and found bonds bond0 has been created using same physical network adapter at node essio51-ce
[INFO] Bond MTU validation passed and found bonds MTU set to 9000 at node essio51-ce
[INFO] Interface bond0 have the IPv4 Address assigned at node essio51-ce
[INFO] Interface bond0 have the IPv6 Address assigned at node essio51-ce
[INFO] Interface MTU also set to 9000 at node essio51-ce
[INFO] Interface enP48p1s0f0 have the IPv4 Address assigned at node essio51-ce
[INFO] Interface enP48p1s0f1 have the IPv4 Address assigned at node essio51-ce
[INFO] Interface enP48p1s0f0 have the IPv6 Address assigned at node essio51-ce
[INFO] Interface enP48p1s0f1 have the IPv6 Address assigned at node essio51-ce
[INFO] Enabling RDMA for Ports ['mlx5_bond_0', 'mlx5_2', 'mlx5_3']
[INFO] Enabled RDMA i.e. RoCE using bond bond0
[INFO] Enabled RDMA i.e. RoCE using interfaces enP48p1s0f0,enP48p1s0f1
[INFO] Please recycle the GPFS daemon on those nodes where RoCE has been enabled.
|
During the GUI setup in a dual-EMS environment, the backup EMS is shown twice in the location
specification.
Product
- ESS 3200
- ESS 3500
- ESS 5147-102
- ESS 5000
|
- During GUI wizard setup or during GUI edit components, you are prompted twice to specify the
rack location of the backup EMS. Do the following steps to resolve this issue:
  - During the first prompt for the rack location of the management servers, leave the location
blank for the backup EMS.
  - Click Next to go to the panel that specifies the location for other nodes, and
choose the rack location for the backup EMS in this panel.
- During GUI wizard setup or during GUI edit components, when actions are run through the GUI
after users specify the rack locations, the GUI action might fail because the backup EMS is added
twice to the component database. Do the following steps to resolve this issue:
  - Ignore the error and click Close to close the window.
  - Click Finish to continue.
|
The mmhealth command does not report a failed (missing) cable
between the ESS 3500 HBA and the 5147-102 IOM.
|
If a SAS cable is pulled between the HBA adapter and the IOM of an enclosure, the mmhealth node
show command does not flag any error. Do the following steps to resolve this issue:
- Monitor the mmfs.log.latest file.
- If any suspicious errors are printed that indicate that disk paths or IOMs are missing, run
mmgetpdisktopology and pass the output to topsummary to find out which paths
are missing.
|
During dual-EMS GUI setup, the backup EMS is no longer shown in the Hardware
panel.
Product
- ESS 3200
- ESS 3500
- ESS 5147-102
- ESS 5000
|
- Run the mmlscomp command from the EMS to find the component ID for the backup
EMS.
- Run the mmdelcomp <component_id> command, where component_id is the component ID
obtained in step 1.
- Return to the GUI Hardware panel and click Edit Rack
Components, which is at the top left of the server list on the right side.
- Choose Yes, discover new servers and enclosures first. This takes many
minutes.
- Click Next and follow the screen prompts to complete the Hardware Components
setup.
|
Call home deployment fails if the gpfsadmin user name was created as
part of SUDO enablement before the call home deployment.
|
- If SUDO is already configured before deploying esscallhomeconf, the workaround is as follows:
  - Disable SUDO (on the relevant nodes where it is enabled).
  - Deploy the call home configuration (in the default root mode).
- If SUDO is not enabled yet, the recommendation is as follows:
  - Deploy call home first. This can be done through the GUI setup or manually by using esscallhomeconf. For
more information, see the ESS documentation.
  - Enable SUDO on the cluster nodes.
|
For RoCE enablement, the bond interface creation might add a problematic entry to
/etc/sysconfig/network-scripts/ifcfg-bond-bond0. The identified parameter is
IPV6_ADDR_GEN_MODE.
|
- Create a bond by using the essgennetworks utility.
- Remove the ‘comment out’ line in the
/etc/sysconfig/network-scripts/ifcfg-bond-bond0 file of the specific node.
- Run nmcli con reload.
- Run nmcli con down bond-bond0; nmcli con up bond-bond0.
- Run all other essgennetworks commands to enable RoCE as per the procedure.
Note: With this workaround, the last RoCE enablement command should work as designed.
Example:
essgennetworks -N essio51 --suffix=-ce --interface enP48p1s0f0,enP48p1s0f1 --bond bond0 --enableRoCE --mtu 9000
[INFO] Starting network generation...
[INFO] nodelist: essio51
[INFO] suffix used for network hostname:-ce
[INFO] Interface(s) available on node essio51-ce
[INFO] Considered interface(s) of node essio51-ce are ['enP48p1s0f0', 'enP48p1s0f1', 'bond0'] with RDMA Port ['mlx5_2', 'mlx5_3', 'mlx5_bond_0'] for this operation
[INFO] Supported Mellanox RoCE card found at node essio51
[INFO] Supported version of Mellanox OFED found at node essio51-ce
[INFO] Bond validation passed and found bonds bond0 has been created using same physical network adapter at node essio51-ce
[INFO] Bond MTU validation passed and found bonds MTU set to 9000 at node essio51-ce
[INFO] Interface bond0 have the IPv4 Address assigned at node essio51-ce
[INFO] Interface bond0 have the IPv6 Address assigned at node essio51-ce
[INFO] Interface MTU also set to 9000 at node essio51-ce
[INFO] Interface enP48p1s0f0 have the IPv4 Address assigned at node essio51-ce
[INFO] Interface enP48p1s0f1 have the IPv4 Address assigned at node essio51-ce
[INFO] Interface enP48p1s0f0 have the IPv6 Address assigned at node essio51-ce
[INFO] Interface enP48p1s0f1 have the IPv6 Address assigned at node essio51-ce
[INFO] Enabling RDMA for Ports ['mlx5_bond_0', 'mlx5_2','mlx5_3']
[INFO] Enabled RDMA i.e. RoCE using bond bond0
[INFO] Enabled RDMA i.e. RoCE using interfaces enP48p1s0f0,enP48p1s0f1
[INFO] Please recycle the GPFS daemon on those nodes where RoCE has been enabled.
|
The mmvdisk sed enroll command might proceed, instead of being blocked, when it is issued after
creating user vdisk sets.
|
- Issue the mmvdisk sed enroll command after creating a recovery group and
before creating user vdisk sets.
- Contact IBM Support, if you issued the mmvdisk sed enroll command after
creating user vdisk sets.
|
Occasionally, some drive paths on a few drives in an ESS 3500 enclosure were found missing,
when the essfindmissingdisks command was issued.
# essfindmissingdisks
[INFO] Start find missing disk paths
[INFO] nodelist: localhost
[INFO] May take long time to complete search of all drive paths
[INFO] Checking node: localhost
[INFO] Checking missing disk paths from node localhost
[INFO] GNR server: name ess3500a-hs.test.net arch x86_64 model ESS3500-5141-FN2 serial 78E400XA
[INFO] Enclosure 78E40XA sees 22 disks (22 SSDs, 0 HDDs)
[INFO] Enclosure 78T254A sees 102 disks (0 SSDs, 102 HDDs)
[INFO] Enclosure 78T246A sees 102 disks (0 SSDs, 102 HDDs)
[ERROR] GNR server disk topology: ESS 3500 H2 (2 HBA 24 NVMe 2 Full 4U102) (match: 96/100)
[INFO] GNR configuration: 3 enclosures, 22 SSDs, 2 empty slots, 220 disks total, 0 NVRAM partitions
[ERROR] Location 78E40XA-1 appears empty but should have an SSD
[ERROR] Location 78E40XA-2 appears empty but should have an SSD
[ERROR] essfindmissingdisks detected error in system. Please review output carefully.
[root@ess3500a ~]#
This error indicates that one path to the drive might be in a
‘stuck-state’ condition whereas the other path is healthy.
|
Run the following script from the primary canister:
[root@ess3500a ~]# /opt/ibm/ess/tools/samples/fix_stuck_drive_slots.sh
|
The ESS system reboots unexpectedly because mpt3sas messages fill the logs.
The following error appears:
System crashed with 'swiotlb buffer is full' then 'scsi_dma_map failed' errors
|
To resolve this error, go to the Red Hat known issues..44 mpt3sas driver will be available
in ESS 6.1.8.
|
Customers might encounter intermittent false-positive fan module failures in /var/log/messages. It
is also possible that a call home is generated for each fan module.
Errors typically seen:
mmsysmon[7819]: [W] Event raised: Fan fan_module1_id4 has a fault.
mmsysmon[7819]: [W] Event raised: Fan fan_module1_id4 state is FAILED.
|
If these errors are seen, contact IBM Support to verify the false-positive condition.
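When contacting IBM Support, it can be useful to know how often each fan module was reported. The following is a minimal sketch that tallies the fan events per module, assuming the message format shown in this issue (the module name follows the word Fan). The helper name count_fan_events is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read log lines on stdin and print "<fan module> <count>"
# for every line that contains "Event raised: Fan <module> ...".
count_fan_events() {
    awk '/Event raised: Fan / {
        # The module name is the field that follows the word "Fan".
        for (i = 1; i < NF; i++) if ($i == "Fan") { n[$(i + 1)]++; break }
    }
    END { for (f in n) print f, n[f] }' | sort
}

# Usage on the affected node:
#   count_fan_events < /var/log/messages
```

A module that appears repeatedly while the hardware shows no fault is consistent with the intermittent pattern described in this issue, but only IBM Support can confirm the false-positive condition.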
|
Running essrun ONLINE update might fail on the mmchfirmware -N
localhost --type drive section.
|
Manually issue the mmchfirmware command after the deployment.
|
In ESS 5000, if Y-type cables (used with HDR switches) are used for the high-speed network, the
mmhealth node show -N ioNode1-ib,ioNode2-ib NETWORK command might show
ib_rdma_port_width_low(mlx5_0/1, mlx5_1/1, mlx5_4/1).
|
- Check for an existing anomaly in the HDR Y cables.
- Contact IBM Support for help to update /usr/lpp/mmfs/lib/mmsysmon/NetworkService.py and
/usr/lpp/mmfs/lib/mmsysmon/network.json with appropriate code.
- After patching, restart mmsysmon to apply the changes. Example:
systemctl restart mmsysmon
- Issue the mmhealth command to verify whether the condition is alleviated.
|
When the EMS is updated from previous releases to ESS 6.1.5.0, setting up SELinux on the EMS by issuing
the essrun selinux enable command in a container fails. The following error
appears:
Failed to resolve typeattributeset statement at /var/lib/selinux/targeted/tmp/modules/400/pcpupstream/cil:42
The issue might be related to a bug in Red Hat kernel 8.6.
Product
- ESS legacy
- ESS 3000
- ESS 5000
- ESS 3200
- ESS 3500
- ESS 3500 (4u102)
|
- Reboot EMS.
- Restart the container.
- Ensure that selinux-policy is up-to-date by issuing the yum update
selinux-policy command.
- Reinstall pcp-selinux by issuing the yum reinstall pcp-selinux command.
|
When creating additional file systems in a tiered storage environment, you might encounter a
MIGRATION callback error.
mmaddcallback: Callback identifier "MIGRATION" already exists or was specified multiple times.
If a callback exists, file system creation fails.
Product
- ESS 3000
- ESS 3200
- ESS 5000
- ESS 3500
- ESS 3500 (4u102)
|
Delete the callback and create the file system again.
|