ESS known issues

For information about ESS 5.3.7.x known issues, see Known issues in ESS 5.3.7.x Quick Deployment Guide.

The following entries describe the known issues in IBM Elastic Storage® System (ESS), the products that they affect, and how to resolve these issues.
Issue: The mmvdisk sed enroll command might proceed when it is issued after user vdisk sets have been created, instead of being blocked.
Product:
  • ESS 3500
Resolution or action:
  • Issue the mmvdisk sed enroll command after creating the recovery group and before creating any user vdisk sets, as sketched below.
  • If you issued the mmvdisk sed enroll command after creating user vdisk sets, contact IBM Support.
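The required ordering is sketched below with placeholder arguments; the exact options depend on your configuration, so treat this as an outline rather than exact syntax (see the mmvdisk man page):

mmvdisk recoverygroup create ...   # 1. create the recovery group first
mmvdisk sed enroll ...             # 2. enroll SED drives while no user vdisk sets exist
mmvdisk vdiskset define ...        # 3. define user vdisk sets only after enrollment
mmvdisk vdiskset create ...        # 4. create the defined vdisk sets
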
Issue: The following false positive error occurs in mmhealth for the bootdrive_endurance_unknown event during periods of high stress on the ESS 3500 system:
NATIVE_RAID     HEALTHY       1 hour ago        [bootdrive_endurance_unknown](Canister2:Boot1, Canister2:Boot2)
Product:
  • ESS 3500 (4u102)
Resolution or action: This error resolves automatically. If it does not resolve on its own, issue the mmsysmoncontrol restart command to temporarily clear it.
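A minimal sketch of the workaround, run on the affected canister; the mmhealth check afterward is a suggested verification step:

mmsysmoncontrol restart            # restart the system health monitor
mmhealth node show NATIVE_RAID     # confirm that the event has cleared
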
Issue: Occasionally, some drive paths on a few drives in an ESS 3500 enclosure are reported missing when the essfindmissingdisks command is issued:
# essfindmissingdisks
[INFO]  Start find missing disk paths
[INFO]  nodelist: localhost
[INFO]  May take long time to complete search of all drive paths
[INFO]  Checking node:  localhost
[INFO]  Checking missing disk paths from node localhost
[INFO]  GNR server: name ess3500a-hs.test.net arch x86_64 model ESS3500-5141-FN2 serial 78E400XA
[INFO]  Enclosure 78E40XA sees 22 disks (22 SSDs, 0 HDDs)
[INFO]  Enclosure 78T254A sees 102 disks (0 SSDs, 102 HDDs)
[INFO]  Enclosure 78T246A sees 102 disks (0 SSDs, 102 HDDs)
[ERROR] GNR server disk topology: ESS 3500 H2 (2 HBA 24 NVMe 2 Full 4U102) (match: 96/100)
[INFO]  GNR configuration: 3 enclosures, 22 SSDs, 2 empty slots, 220 disks total, 0 NVRAM partitions
[ERROR] Location 78E40XA-1 appears empty but should have an SSD
[ERROR] Location 78E40XA-2 appears empty but should have an SSD
[ERROR] essfindmissingdisks detected error in system. Please review output carefully.
[root@ess3500a ~]#

This error indicates that one path to the drive might be in a 'stuck' state while the other path is healthy.

Product:
  • ESS 3200
  • ESS 3500
Resolution or action: Run the following script from the primary canister:
[root@ess3500a ~]# /opt/ibm/ess/tools/samples/fix_stuck_drive_slots.sh
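After the script completes, you can rerun the check to confirm that all drive paths are visible again:

[root@ess3500a ~]# essfindmissingdisks
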
Issue: The ESS system reboots unexpectedly because mpt3sas messages fill the logs. The following error appears:
System crashed with 'swiotlb buffer is full' then 'scsi_dma_map failed' errors
Product:
  • ESS 3500
Resolution or action: To resolve this error, see the Red Hat known issues documentation.
Issue: When you use the GUI to unmount a file system from all nodes of the home cluster and the remote cluster, the GUI unmounts the file system only from the remote (all_remote) nodes. Even when both the option to unmount from all nodes of the cluster and the option to unmount from all_remote nodes are selected, the file system remains mounted on the server nodes of the local cluster.
Product:
  • ESS 3500
Resolution or action: Select one unmount option at a time, even though the GUI allows multiple unmount options to be selected.
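Alternatively, the unmount can be done from the CLI; a sketch, assuming a file system named fs1 (run the command from a node in each cluster that has the file system mounted):

mmumount fs1 -a    # unmount fs1 on all nodes of the cluster where the command is run
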
Issue: When you create a file system by using the GUI, the Declustered Array is not visible because the panel requires horizontal scrolling.
Product:
  • ESS 3500
Resolution or action: Move the cursor over the DA area and scroll the screen left or right; the scroll bar then appears. If the cursor is moved back to the Back button or the Next button, the scroll bar disappears again.
Issue: False positive intermittent fan module failures are recorded in /var/log/messages. Call home might generate a service request for each fan module. The following errors appear:
mmsysmon[7819]: [W] Event raised: Fan fan_module1_id4 has a fault.
mmsysmon[7819]: [W] Event raised: Fan fan_module1_id4 state is FAILED.
Product:
  • ESS 3500
Resolution or action: No fix is available. Contact IBM Support to verify the false positive condition.
Issue: After a GPFS upgrade, the enclosure queue depth is left at a value other than 1 (for example, 64), and errors such as the following appear during the storage quick check:
[ERROR] Enclosure /dev/sg103 contains queue_depth=2, must be queue_depth=1
[ERROR] Enclosure /dev/sg6 contains queue_depth=2, must be queue_depth=1
Product:
  • ESS 3500
Resolution or action: Reboot each ESS 3500 canister one more time so that queue_depth is set to 1 according to the udev rule.
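After the reboot, the setting can be verified through sysfs; a sketch, using the sg6 device from the example output above (substitute your own enclosure devices):

cat /sys/class/scsi_generic/sg6/device/queue_depth    # expected output: 1
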
Issue: For ESS 5000, running an essrun ONLINE update might fail in the mmchfirmware -N localhost --type drive section.
Product:
  • ESS 5000
Resolution or action: SSRs and customers must manually run the mmchfirmware command after the deployment completes, as shown below.
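The command to run manually on each affected node, taken from the failing section:

mmchfirmware -N localhost --type drive
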
Issue: In ESS 5000, if Y-type cables (used with HDR switches) are used for the high-speed network, the mmhealth node show -N ioNode1-ib,ioNode2-ib NETWORK command might show ib_rdma_port_width_low(mlx5_0/1, mlx5_1/1, mlx5_4/1).
Product:
  • ESS 5000
Resolution or action:
  1. Check for an existing anomaly in the HDR Y-cables.
  2. Contact IBM Support for help to update /usr/lpp/mmfs/lib/mmsysmon/NetworkService.py and /usr/lpp/mmfs/lib/mmsysmon/network.json with the appropriate code.
  3. After patching, restart mmsysmon to apply the changes.
  4. Issue the mmhealth command to verify whether the condition is alleviated. Steps 3 and 4 are sketched after this list.
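A sketch of steps 3 and 4; the node names are taken from the example above:

systemctl restart mmsysmon
mmhealth node show -N ioNode1-ib,ioNode2-ib NETWORK
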
Issue: When the EMS is updated from a previous release to ESS 6.1.5.1, setting up SELinux on the EMS by issuing the essrun selinux enable command in the container fails. The following error appears:
Failed to resolve typeattributeset statement at /var/lib/selinux/targeted/tmp/modules/400/pcpupstream/cil:42
The issue might be related to a bug in Red Hat kernel 8.6.
Product:
  • ESS legacy
  • ESS 3000
  • ESS 5000
  • ESS 3200
  • ESS 3500
  • ESS 3500 (4u102)
Resolution or action:
  1. Reboot the EMS.
  2. Restart the container.
  3. Ensure that selinux-policy is up to date by issuing the yum update selinux-policy command.
  4. Reinstall pcp-selinux by issuing the yum reinstall pcp-selinux command. Steps 3 and 4 are sketched after this list.
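A sketch of steps 3 and 4, run on the EMS:

yum update selinux-policy
yum reinstall pcp-selinux
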
Issue: In Monitoring > Hardware > Editing Rack Components, an edited rack location is not canceled after the Cancel button is clicked. In the GUI, editing rack components works; however, clicking Cancel does not abort the operation. The purpose of the Close button is to close the window after the operation (rack component location editing or component discovery) finishes.
Product:
  • ESS legacy
  • ESS 3000
  • ESS 5000
  • ESS 3200
  • ESS 3500
  • ESS 3500 (4u102)
Resolution or action: Click Close only after the operation finishes.
Issue: Running ess_ssr_setup hangs if recovery group descriptors exist. The command simply hangs and does not indicate that descriptors were found. An SSR might see the following output:

Do you want to continue and perform changes and tests in this node? (y/n): y
2022-08-19 09:38:44,234 INFO: Going to set the root user password of this node to the password typed before
2022-08-19 09:38:44,433 INFO: Run 'Root_password_set' completed successfully
2022-08-19 09:38:44,602 INFO: Run 'Passwordless root SSH localhost' completed successfully
2022-08-19 09:38:44,602 INFO: Going to perform storage tests on this node
2022-08-19 09:38:44,602 INFO: Going to run 'Quick storage configuration check'
2022-08-19 09:38:45,484 INFO: Run 'Quick storage configuration check' completed successfully
2022-08-19 09:38:45,484 INFO: Going to run 'Check enclosure cabling and paths to disks'
2022-08-19 09:39:38,432 INFO: Run 'Check enclosure cabling and paths to disks' completed successfully
2022-08-19 09:39:38,432 INFO: Going to run 'Check disks for IO operations'
Product:
  • ESS 5000
  • ESS 3500 (4u102)
Resolution or action: The SSR should contact IBM Service for help.
Issue: When the essinstallcheck command is run, an error might occur. Customers might face this issue when they run the essrun healthcheck command (which runs the essinstallcheck command) against multiple nodes or a node group. The error is caused by mmvdisk locking other nodes while it is querying.
Product:
  • ESS 3000
  • ESS 3200
  • ESS 5000
  • ESS 3500
  • ESS 3500 (4u102)
Resolution or action:
  • Check the Ansible logs before the mmvdisk command (in essinstallcheck) is run, and retry when ready.
  • In the field, if the error is seen, run the healthcheck on individual nodes, as sketched after this list.
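A sketch of running the healthcheck one node at a time; the node name essio1 is a hypothetical example, and the invocation assumes the -N node-list option that essrun commands accept, so verify the syntax for your release:

essrun -N essio1 healthcheck    # run against a single node instead of a group
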
Issue: The essrun update command might hang with 'waiting for free locks'. A running mmapplypolicy job causes the locks.

Friday 12 August 2022  21:02:23 +0000 (0:00:00.536)       0:34:26.503 *********
FAILED - RETRYING: Waiting for free Locks (100 retries left).
FAILED - RETRYING: Waiting for free Locks (99 retries left).
Product:
  • ESS 3000
  • ESS 3200
  • ESS 5000
  • ESS 3500
  • ESS 3500 (4u102)
Resolution or action:
  1. Check whether a policy is being applied and wait until it finishes before you run the update.
  2. Run the mmcommon showlocks command to check what is causing the lock, as shown below.

For more information about locks, see the IBM Spectrum Scale Administration Guide.
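A sketch of step 2; the command is taken from the resolution text:

mmcommon showlocks    # show the current locks and what is holding them
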
Issue: GUI: the setup wizard cannot move past the Rack Locations step. Moving backward through the GUI and then trying to move forward is blocked.
Product:
  • ESS 3000
  • ESS 3200
  • ESS 5000
  • ESS 3500
  • ESS 3500 (4u102)
Resolution or action:
  1. Go back to the Location information page and re-enter the data.
  2. Clean up the GUI database and start the setup wizard again.
Issue: When you create additional file systems in a tiered storage environment, you might encounter a MIGRATION callback error:
mmaddcallback: Callback identifier "MIGRATION" already exists or was specified multiple times.
If the callback exists, file system creation fails.
Product:
  • ESS 3000
  • ESS 3200
  • ESS 5000
  • ESS 3500
  • ESS 3500 (4u102)
Resolution or action: Delete the callback and create the file system again, as sketched below.
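A sketch of the workaround; mmlscallback and mmdelcallback are standard IBM Spectrum Scale commands:

mmlscallback MIGRATION    # confirm that the callback exists
mmdelcallback MIGRATION   # delete it, then retry the file system creation
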
Issue: Many call home events are generated for failed temperature sensors, such as canister1_inlet_id1:
TS008179389    2022-08-05 02:16:50  New Case Opened  78E4007:canister:78E4007A/1:canister1_inlet_id1:Temperature sensor canister1_inlet_id1 is failed
TS008179399    2022-08-05 03:14:19  New Case Opened  78E4007:canister:78E4007A/1:canister1_inlet_id1:Temperature sensor canister1_inlet_id1 is failed
TS008179432    2022-08-05 07:15:33  New Case Opened  78E4007:canister:78E4007B/0:canister2_inlet_id0:Temperature sensor canister2_inlet_id0 is failed
Product:
  • ESS 3500
Resolution or action: If this issue occurs in the field, the customer or the CE must lower the ambient temperature, for example by turning up the air conditioning.
Issue: Both ESS 3500 power supplies blink a red-orange LED once every second during I/O load.
Product:
  • ESS 3500
Resolution or action: If this problem occurs, reseat the power supply.