ESS known issues

For information about ESS 5.3.7.x known issues, see Known issues in ESS 5.3.7.x Quick Deployment Guide.

The following entries describe the known issues in IBM Elastic Storage® System (ESS), the products that they affect, and how to resolve these issues.
Issue: The mmvdisk sed enroll command might proceed when it is issued after user vdisk sets have been created, instead of being blocked.
Product:
  • ESS 3500
Resolution or action:
  • Issue the mmvdisk sed enroll command after creating the recovery group and before creating any user vdisk sets, as sketched below.
  • If you issued the mmvdisk sed enroll command after creating user vdisk sets, contact IBM Support.
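The required ordering is sketched below with placeholder arguments; the exact options depend on your configuration, so treat this as an outline rather than exact syntax (see the mmvdisk man page):

mmvdisk recoverygroup create ...   # 1. create the recovery group first
mmvdisk sed enroll ...             # 2. enroll SED drives while no user vdisk sets exist
mmvdisk vdiskset define ...        # 3. define user vdisk sets only after enrollment
mmvdisk vdiskset create ...        # 4. create the defined vdisk sets
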
Issue: The following false positive error occurs in mmhealth for the bootdrive_endurance_unknown event during periods of high stress on the ESS 3500 system:
NATIVE_RAID     HEALTHY       1 hour ago        [bootdrive_endurance_unknown](Canister2:Boot1, Canister2:Boot2)
Product:
  • ESS 3500 (4u102)
Resolution or action: This error resolves automatically. If it does not resolve on its own, issue the mmsysmoncontrol restart command to temporarily clear it.
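A minimal sketch of the workaround, run on the affected canister; the mmhealth check afterward is a suggested verification step:

mmsysmoncontrol restart            # restart the system health monitor
mmhealth node show NATIVE_RAID     # confirm that the event has cleared
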
Issue: Occasionally, some drive paths on a few drives in an ESS 3500 enclosure are reported missing when the essfindmissingdisks command is issued:
# essfindmissingdisks
[INFO]  Start find missing disk paths
[INFO]  nodelist: localhost
[INFO]  May take long time to complete search of all drive paths
[INFO]  Checking node:  localhost
[INFO]  Checking missing disk paths from node localhost
[INFO]  GNR server: name ess3500a-hs.test.net arch x86_64 model ESS3500-5141-FN2 serial 78E400XA
[INFO]  Enclosure 78E40XA sees 22 disks (22 SSDs, 0 HDDs)
[INFO]  Enclosure 78T254A sees 102 disks (0 SSDs, 102 HDDs)
[INFO]  Enclosure 78T246A sees 102 disks (0 SSDs, 102 HDDs)
[ERROR] GNR server disk topology: ESS 3500 H2 (2 HBA 24 NVMe 2 Full 4U102) (match: 96/100)
[INFO]  GNR configuration: 3 enclosures, 22 SSDs, 2 empty slots, 220 disks total, 0 NVRAM partitions
[ERROR] Location 78E40XA-1 appears empty but should have an SSD
[ERROR] Location 78E40XA-2 appears empty but should have an SSD
[ERROR] essfindmissingdisks detected error in system. Please review output carefully.
[root@ess3500a ~]#

This error indicates that one path to the drive might be in a 'stuck' state while the other path is healthy.

Product:
  • ESS 3200
  • ESS 3500
Resolution or action: Run the following script from the primary canister:
[root@ess3500a ~]# /opt/ibm/ess/tools/samples/fix_stuck_drive_slots.sh
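After the script completes, you can rerun the check to confirm that all drive paths are visible again:

[root@ess3500a ~]# essfindmissingdisks
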
Issue: The ESS system reboots unexpectedly because mpt3sas messages fill the logs. The following error appears:
System crashed with 'swiotlb buffer is full' then 'scsi_dma_map failed' errors
Product:
  • ESS 3500
Resolution or action: To resolve this error, see the Red Hat known issues documentation.
Issue: When you use the GUI to unmount a file system from all nodes of the home cluster and the remote cluster, the GUI unmounts the file system only from the remote (all_remote) nodes. Even when both the option to unmount from all nodes of the cluster and the option to unmount from all_remote nodes are selected, the file system remains mounted on the server nodes of the local cluster.
Product:
  • ESS 3500
Resolution or action: Select one unmount option at a time, even though the GUI allows multiple unmount options to be selected.
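Alternatively, the unmount can be done from the CLI; a sketch, assuming a file system named fs1 (run the command from a node in each cluster that has the file system mounted):

mmumount fs1 -a    # unmount fs1 on all nodes of the cluster where the command is run
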
Issue: When you create a file system by using the GUI, the Declustered Array is not visible because the panel requires horizontal scrolling.
Product:
  • ESS 3500
Resolution or action: Move the cursor over the DA area and scroll the screen left or right; the scroll bar then appears. If the cursor is moved back to the Back button or the Next button, the scroll bar disappears again.
Issue: False positive intermittent fan module failures are recorded in /var/log/messages. Call home might generate a service request for each fan module. The following errors appear:
mmsysmon[7819]: [W] Event raised: Fan fan_module1_id4 has a fault.
mmsysmon[7819]: [W] Event raised: Fan fan_module1_id4 state is FAILED.
Product:
  • ESS 3500
Resolution or action: No fix is available. Contact IBM Support to verify the false positive condition.
Issue: After a GPFS upgrade, the enclosure queue depth is left at a value other than 1 (for example, 64), and errors such as the following appear during the storage quick check:
[ERROR] Enclosure /dev/sg103 contains queue_depth=2, must be queue_depth=1
[ERROR] Enclosure /dev/sg6 contains queue_depth=2, must be queue_depth=1
Product:
  • ESS 3500
Resolution or action: Reboot each ESS 3500 canister one more time so that queue_depth is set to 1 according to the udev rule.
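After the reboot, the setting can be verified through sysfs; a sketch, using the sg6 device from the example output above (substitute your own enclosure devices):

cat /sys/class/scsi_generic/sg6/device/queue_depth    # expected output: 1
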
Issue: For ESS 5000, running an essrun ONLINE update might fail in the mmchfirmware -N localhost --type drive section.
Product:
  • ESS 5000
Resolution or action: SSRs and customers must manually run the mmchfirmware command after the deployment completes, as shown below.
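The command to run manually on each affected node, taken from the failing section:

mmchfirmware -N localhost --type drive
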
Issue: In ESS 5000, if Y-type cables (used with HDR switches) are used for the high-speed network, the mmhealth node show -N ioNode1-ib,ioNode2-ib NETWORK command might show ib_rdma_port_width_low(mlx5_0/1, mlx5_1/1, mlx5_4/1).
Product:
  • ESS 5000
Resolution or action:
  1. Check for an existing anomaly in the HDR Y-cables.
  2. Contact IBM Support for help to update /usr/lpp/mmfs/lib/mmsysmon/NetworkService.py and /usr/lpp/mmfs/lib/mmsysmon/network.json with the appropriate code.
  3. After patching, restart mmsysmon to apply the changes.
  4. Issue the mmhealth command to verify whether the condition is alleviated. Steps 3 and 4 are sketched after this list.
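A sketch of steps 3 and 4; the node names are taken from the example above:

systemctl restart mmsysmon
mmhealth node show -N ioNode1-ib,ioNode2-ib NETWORK
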
Issue: When the EMS is updated from a previous release to ESS 6.1.5.1, setting up SELinux on the EMS by issuing the essrun selinux enable command in the container fails. The following error appears:
Failed to resolve typeattributeset statement at /var/lib/selinux/targeted/tmp/modules/400/pcpupstream/cil:42
The issue might be related to a bug in Red Hat kernel 8.6.
Product:
  • ESS legacy
  • ESS 3000
  • ESS 5000
  • ESS 3200
  • ESS 3500
  • ESS 3500 (4u102)
Resolution or action:
  1. Reboot the EMS.
  2. Restart the container.
  3. Ensure that selinux-policy is up to date by issuing the yum update selinux-policy command.
  4. Reinstall pcp-selinux by issuing the yum reinstall pcp-selinux command. Steps 3 and 4 are sketched after this list.
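A sketch of steps 3 and 4, run on the EMS:

yum update selinux-policy
yum reinstall pcp-selinux
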
Issue: In Monitoring > Hardware > Editing Rack Components, an edited rack location is not canceled after the Cancel button is clicked. In the GUI, editing rack components works; however, clicking Cancel does not abort the operation. The purpose of the Close button is to close the window after the operation (rack component location editing or component discovery) finishes.
Product:
  • ESS legacy
  • ESS 3000
  • ESS 5000
  • ESS 3200
  • ESS 3500
  • ESS 3500 (4u102)
Resolution or action: Click Close only after the operation finishes.
Issue: Running ess_ssr_setup hangs if recovery group descriptors exist. The command simply hangs and does not indicate that descriptors were found. An SSR might see the following output:

Do you want to continue and perform changes and tests in this node? (y/n): y
2022-08-19 09:38:44,234 INFO: Going to set the root user password of this node to the password typed before
2022-08-19 09:38:44,433 INFO: Run 'Root_password_set' completed successfully
2022-08-19 09:38:44,602 INFO: Run 'Passwordless root SSH localhost' completed successfully
2022-08-19 09:38:44,602 INFO: Going to perform storage tests on this node
2022-08-19 09:38:44,602 INFO: Going to run 'Quick storage configuration check'
2022-08-19 09:38:45,484 INFO: Run 'Quick storage configuration check' completed successfully
2022-08-19 09:38:45,484 INFO: Going to run 'Check enclosure cabling and paths to disks'
2022-08-19 09:39:38,432 INFO: Run 'Check enclosure cabling and paths to disks' completed successfully
2022-08-19 09:39:38,432 INFO: Going to run 'Check disks for IO operations'
Product:
  • ESS 5000
  • ESS 3500 (4u102)
Resolution or action: The SSR should contact IBM Service for help.
Issue: When the essinstallcheck command is run, an error might occur. Customers might face this issue when they run the essrun healthcheck command (which runs the essinstallcheck command) against multiple nodes or a node group. The error is caused by mmvdisk locking other nodes while it is querying.
Product:
  • ESS 3000
  • ESS 3200
  • ESS 5000
  • ESS 3500
  • ESS 3500 (4u102)
Resolution or action:
  • Check the Ansible logs before the mmvdisk command (in essinstallcheck) is run, and retry when ready.
  • In the field, if the error is seen, run the healthcheck on individual nodes, as sketched after this list.
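A sketch of running the healthcheck one node at a time; the node name essio1 is a hypothetical example, and the invocation assumes the -N node-list option that essrun commands accept, so verify the syntax for your release:

essrun -N essio1 healthcheck    # run against a single node instead of a group
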
Issue: The essrun update command might hang with 'waiting for free locks'. A running mmapplypolicy job causes the locks.

Friday 12 August 2022  21:02:23 +0000 (0:00:00.536)       0:34:26.503 *********
FAILED - RETRYING: Waiting for free Locks (100 retries left).
FAILED - RETRYING: Waiting for free Locks (99 retries left).
Product:
  • ESS 3000
  • ESS 3200
  • ESS 5000
  • ESS 3500
  • ESS 3500 (4u102)
Resolution or action:
  1. Check whether a policy is being applied and wait until it finishes before you run the update.
  2. Run the mmcommon showlocks command to check what is causing the lock, as shown below.

For more information about locks, see the IBM Spectrum Scale Administration Guide.
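A sketch of step 2; the command is taken from the resolution text:

mmcommon showlocks    # show the current locks and what is holding them
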
Issue: GUI: the setup wizard cannot move past the Rack Locations step. Moving backward through the GUI and then trying to move forward is blocked.
Product:
  • ESS 3000
  • ESS 3200
  • ESS 5000
  • ESS 3500
  • ESS 3500 (4u102)
Resolution or action:
  1. Go back to the Location information page and re-enter the data.
  2. Clean up the GUI database and start the setup wizard again.
Issue: When you create additional file systems in a tiered storage environment, you might encounter a MIGRATION callback error:
mmaddcallback: Callback identifier "MIGRATION" already exists or was specified multiple times.
If the callback exists, file system creation fails.
Product:
  • ESS 3000
  • ESS 3200
  • ESS 5000
  • ESS 3500
  • ESS 3500 (4u102)
Resolution or action: Delete the callback and create the file system again, as sketched below.
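A sketch of the workaround; mmlscallback and mmdelcallback are standard IBM Spectrum Scale commands:

mmlscallback MIGRATION    # confirm that the callback exists
mmdelcallback MIGRATION   # delete it, then retry the file system creation
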
Issue: Many call home events are generated for failed temperature sensors, such as canister1_inlet_id1:
TS008179389    2022-08-05 02:16:50  New Case Opened  78E4007:canister:78E4007A/1:canister1_inlet_id1:Temperature sensor canister1_inlet_id1 is failed
TS008179399    2022-08-05 03:14:19  New Case Opened  78E4007:canister:78E4007A/1:canister1_inlet_id1:Temperature sensor canister1_inlet_id1 is failed
TS008179432    2022-08-05 07:15:33  New Case Opened  78E4007:canister:78E4007B/0:canister2_inlet_id0:Temperature sensor canister2_inlet_id0 is failed
Product:
  • ESS 3500
Resolution or action: If this issue occurs in the field, the customer or the CE must lower the ambient temperature, for example by turning up the air conditioning.
Issue: Both ESS 3500 power supplies blink a red-orange LED once every second during I/O load.
Product:
  • ESS 3500
Resolution or action: If this problem occurs, reseat the power supply.