ESS known issues

Known issues in ESS

For information about ESS 5.3.7.x known issues, see Known issues in ESS 5.3.7.x Quick Deployment Guide.

The following table describes the known issues in IBM Elastic Storage® System (ESS) and how to resolve these issues. Each entry lists the issue, the products it applies to, and the resolution or action.
After initial deployment, the EMS may show SERVER2U instead of 5105-22E as the MTM.
Product
  • ESS Legacy
  • ESS 3000
  • ESS 3200
  • ESS 3500
  • ESS 5000
  • Navigate to the Hardware Panel of the GUI and select the image of the EMS to verify that the MTM shows as SERVER2U.
  • If the condition is seen, log in as root on the EMS command line and list the server components (for example, with the mmlscomp command) to determine the Component ID of the EMS node. In the listing, the EMS MTM is shown as SERVER2U.
  • Delete the EMS server by using either the Comp ID, Serial Number, or Name (this example uses the Comp ID):
    Example:
    [root@ems ~]# mmdelcomp 12
    INFO: Deleting component 12
    mmcomp: Propagating the cluster configuration data to all 
    affected nodes. This is an asynchronous process. 
  • Return to the Hardware Panel in the GUI and click the Edit Component tab. The Edit Rack Components Wizard appears. Select the following option:
    • Yes, discover new servers and enclosure first. This discovery might take several minutes.

    Continue to click Next without changing any parameters until you reach the Rack Locations section of the Edit Rack Components page. Be sure to respecify the location of the EMS node on that page, and then click Next.

    Go to the final page and click Finish. ESS applies the change.

The Ansible-based essrun tool cannot add more than one building block at a time to a cluster.
Product
  • ESS Legacy
  • ESS 3000
  • ESS 3200
  • ESS 3500
  • ESS 5000
If it is necessary to add more than one building block in a cluster, the following two options are available:
  • Use the essrun command and add each building block individually.
  • Use the mmvdisk command to add the building blocks.
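If you choose the mmvdisk route, the following outline is a minimal sketch of adding one building block, assuming the new servers are already members of the GPFS cluster; the node, node class, and recovery group names (essio3, essio4, ess5k_nc2, rg3, rg4) are hypothetical, and the exact parameters for your hardware should be taken from the mmvdisk documentation:
    # Create a node class for the two I/O servers of the new building block
    mmvdisk nodeclass create --node-class ess5k_nc2 -N essio3,essio4
    # Apply the recommended server configuration to the new node class
    mmvdisk server configure --node-class ess5k_nc2 --recycle one
    # Create the paired recovery groups for the new building block
    mmvdisk recoverygroup create --recovery-group rg3,rg4 --node-class ess5k_nc2
After the recovery groups are created, define and create vdisk sets for them as usual.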
During an upgrade, if the container unexpectedly loses its connection with the target canisters, there might be a timeout of up to 2 hours in the Ansible® update task.
Product
  • ESS Legacy
  • ESS 3000
  • ESS 3200
  • ESS 3500
  • ESS 5000
Wait for the timeout and retry the essrun update task.
When running essrun commands, you might see messages such as these:
Thursday 16 April 2020 20:52:44 +0000
(0:00:00.572) 0:13:19.792 ********
Thursday 16 April 2020 20:52:45 +0000
(0:00:00.575) 0:13:20.367 ********
Thursday 16 April 2020 20:52:46 +0000
(0:00:00.577) 0:13:20.944 ********
Product
  • ESS Legacy
  • ESS 3000
  • ESS 3200
  • ESS 3500
  • ESS 5000
This is a restriction in the Ansible timestamp module. It shows timestamps even for the “skipped” tasks. If you want to remove timestamps from the output, change the ansible.cfg file inside the container as follows:
  1. vim /etc/ansible/ansible.cfg
  2. Remove ,profile_tasks on line 7.
  3. Save and quit: esc + :wq
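As an illustration only (the exact contents of line 7 in the container's ansible.cfg might differ between releases), the callback line before and after the edit could look like this, assuming the timestamps come from the profile_tasks callback plugin:
    # /etc/ansible/ansible.cfg inside the container (hypothetical excerpt)
    [defaults]
    # Before the edit: per-task timestamps are printed, even for skipped tasks
    callback_whitelist = timer,profile_tasks
    # After removing ",profile_tasks": timestamp lines are no longer printed
    callback_whitelist = timer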
After a reboot of an ESS 5000 node, systemd might be loaded incorrectly.
Users might see the following error when trying to start GPFS:
Failed to activate service 'org.freedesktop.systemd1':
 timed out
Product
  • ESS 5000
Power off the system and then power it on again.
  1. Run the following command from the container:
    rpower <node name> off
  2. Wait for at least 30 seconds and run the following command to verify that the system is off:
    rpower <node name> status
  3. Restart the system with the following command:
    rpower <node name> on
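For example, with a hypothetical node name of essio1, the power-cycle sequence from the container is:
    rpower essio1 off
    sleep 30               # wait at least 30 seconds before checking
    rpower essio1 status   # expect the node to report off
    rpower essio1 on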
In the ESS 5000 SLx series, if a hard drive is pulled out long enough for the drive to finish draining, the drive might not be recovered when it is reinserted.
Product
  • ESS 5000
Run the following command from EMS or IO node to revive the drive:
mmvdisk pdisk change --rg RGName --pdisk PdiskName --revive

Where RGName is the recovery group that the drive belongs to and PdiskName is the drive's pdisk name.
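For example, assuming a hypothetical recovery group rg_essio5 and pdisk e1s07, you can first identify the drained pdisk and then revive it:
    # List pdisks that are not in a healthy state to find the affected drive
    mmvdisk pdisk list --recovery-group all --not-ok
    # Revive the reinserted drive
    mmvdisk pdisk change --recovery-group rg_essio5 --pdisk e1s07 --revive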

After the deployment is complete, if the firmware on the enclosure, drive, or HBA adapter does not match the expected level and you run essinstallcheck, the following mmvdisk settings-related error message is displayed:
[ERROR] mmvdisk settings do NOT match best practices. 
Run mmvdisk server configure --verify --node-class 
ess5k_ppc64le_mmvdisk to debug.  
Product
  • ESS Legacy
  • ESS 3000
  • ESS 3200
  • ESS 3500
  • ESS 5000

The error about mmvdisk settings can be ignored. The resolution is to update the mismatched firmware levels on the enclosure, drive, or HBA adapter to the correct levels.

To confirm the mismatch, run the mmvdisk configuration check command: mmvdisk server configure --verify --node-class <nodeclass>

To list the mmvdisk node classes in the cluster, run: mmvdisk nc list
Note: essinstallcheck detects inconsistencies from mmvdisk best practices for all node classes in the cluster and stops immediately if an issue is found.
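For example, using the node class name from the error message above (substitute the node class reported by mmvdisk nc list in your cluster):
    # List all mmvdisk node classes in the cluster
    mmvdisk nc list
    # Verify the server configuration of one node class against best practices
    mmvdisk server configure --verify --node-class ess5k_ppc64le_mmvdisk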
When running essinstallcheck you might see an error message similar to:
System Firmware could not be obtained 
which will lead to a false-positive 
PASS message when the script completes.
Product
  • ESS Legacy
  • ESS 5000

Run vpdupdate on each I/O node.

Rerun essinstallcheck, which should now properly query the firmware level.
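As a sketch, assuming hypothetical I/O node names essio1 and essio2, you can refresh the VPD data on each I/O node from the EMS before rerunning essinstallcheck:
    # Run vpdupdate on each I/O node over ssh (node names are examples)
    for node in essio1 essio2; do
        ssh $node vpdupdate
    done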
During command-less disk replacement, there is a limit on how many disks can be replaced at one time.
Product
  • ESS 3000
  • ESS 3200
  • ESS 3500
  • ESS 5000
With command-less disk replacement, replace only up to two disks at a time. If command-less disk replacement is enabled and more than two disks are replaceable, replace the first two disks, and then use the disk replacement commands to replace the third and subsequent disks.
Issue reported with command-less disk replacement warning LEDs.
Product
  • ESS 5000
The replaceable disk has the amber LED on, but not blinking. Disk replacement should still succeed.
After an ESS node is upgraded, the pmsensors service needs to be started manually.
Product
  • ESS 3000
  • ESS 3200
  • ESS 3500
After the ESS upgrade is complete, the pmsensors service does not automatically start. You must manually start the service for performance monitoring to be restored. On each ESS node, run the following command:
systemctl start pmsensors
To check the status of the service, run the following command:
systemctl status --no-pager pmsensors
The canister_failed event does not turn on the amber LED on the canister or on the enclosure front panel.
Product
  • ESS 3200
  • ESS 3500
Root cause: The failed canister is not the master canister, and the other canister is not up/running.

Action required: No


Migration from ESS Legacy releases (5.3.7.x) to the container version (ESS 6.1.x.x) might revert mmvdisk values to their default settings.

Product
  • ESS Legacy

For more information about this issue, see IBM Support.

Node call home might not work for nodes that are designated as protocol nodes. If a power supply is damaged or pulled (or any other Opal-related node problem occurs), a call home will not be available in the Salesforce system.

Opal PRD might not log the error from the FSP that is causing this issue.

Product
  • ESS 5000
Determine whether there is a power supply problem by manually inspecting the ASMI error/event logs by using the FSP, and open a problem with IBM Support if required.
If the essrun gui --configure command is run after the GUI and performance monitoring are already set up, you might get an error prompting you to remove any existing GUI configuration before continuing.
Product
  • ESS 3200
  • ESS 3500
If the GUI is already set up, it is not necessary to remove the existing GUI configuration.
Exit the container.
  1. Run the mmhealth node show gui -a command.

    Verify that performance sensors and collectors are healthy.

  2. Verify that the gui daemon is started.
    systemctl status gpfsgui
  3. Access the GUI.
    https://GUI_Node_IP

    Verify that performance monitoring is active and all nodes are seen properly using the GUI.

The mmcallhome ticket list reports multiple tickets opened for the same issue.
Product
  • ESS 3500
On the EMS, check whether there are duplicate call home events in the queue to be sent to IBM by issuing the following command:
ls /tmp/mmfs/callhome/incomingFTDC2CallHome/
If this directory contains duplicate call home event entries:
  1. Stop mmsysmon on EMS.
    mmsysmoncontrol stop
  2. Clear staging area on EMS.
    rm /tmp/mmfs/callhome/incomingFTDC2CallHome/*
  3. Start mmsysmon on EMS.
    mmsysmoncontrol start
The mmcallhome ticket list still reports “New Case Opened” after the PMR is closed by IBM.
Product
  • ESS 3500
Remove the ticket.
mmcallhome ticket delete <ticket number TSxxxxxxx>

After deploying the protocol VM on an ESS 3500 canister, the Mellanox OFED driver is not installed.

Example:
ofed_info -s
-bash: ofed_info: command not found
Product
  • ESS 3500
  1. After the VM is deployed, log in to the VM and manually run the ess_ofed postscript to install the driver:
    /opt/ibm/ess/tools/postscripts/ess_ofed.essvm
  2. After the installation, verify that the driver is installed (ofed_info -s) and reboot the VM (sync; systemctl reboot).

The CES file system cannot be created by using the essrun command if the I/O nodes were deployed with versions earlier than 6.1.2.0.

Example of old naming convention:
  • ess5k_7894DBA
  • ess5k_7894E4A
Product
  • ESS 3200
  • ESS 3500
  • ESS 5000
Ansible tries to gather the RG by using the new name format. Example:
ess5k_essio1_ib_essio2_ib
Create the CES file system by using the mmvdisk command directly on the EMS or on any I/O node in the cluster.
  1. Gather the desired RG name(s).
    mmvdisk nc list 
  2. Define the vdisk set:
    mmvdisk vs define \
    --vs vs_cesSharedRoot_essio1_hs_essio2_hs \
    --rg ess5k_7894DBA,ess5k_7894E4A \
    --code 8+2p --bs 4M --ss 20G
  3. Create the vdisk set:
    mmvdisk vs create \
    --vs vs_cesSharedRoot_essio1_hs_essio2_hs
  4. Create a file system.
    mmvdisk fs create \
    --fs cesSharedRoot --vs vs_cesSharedRoot_essio1_hs_essio2_hs \
    --mmcrfs -T /gpfs/cesSharedRoot
  5. Mount the file system.
    mmmount cesSharedRoot -a
During file system creation in mixed environments (ESS 5000 and ESS 3500), the following error can appear:
TASK [/opt/ibm/ess/deploy/ansible/roles/mmvdiskcreate:
 Define Vdiskset] **************

An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ZeroDivisionError: division by zero

Incorrect access to a specific variable of the ESS 5000 I/O node causes this issue.

Product
  • ESS 3200
  • ESS 3500
  • ESS 5000
  1. Issue the following command one time only in the container.
    sed -i "s/enclQty/hostvars[item].enclQty/g" \
    /opt/ibm/ess/deploy/ansible/roles/mmvdiskcreate/tasks/create_filesystem_mixed.yml
  2. Continue with the essrun -N <node list> filesystem command.

The BMC network might become unresponsive when configured with a VLAN.

The VLAN configuration fails to activate properly in the BMC network stack.

Product
  • ESS 3500
  1. Log in to the canister corresponding to the BMC.
  2. Unconfigure the VLAN.
    ipmitool lan set 1 vlan id off
  3. Reconfigure the VLAN.
    ipmitool lan set 1 vlan id <vlan id>
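For example, with a hypothetical VLAN ID of 100, the sequence and a follow-up check look like this:
    ipmitool lan set 1 vlan id off
    ipmitool lan set 1 vlan id 100
    # Confirm that the VLAN ID is now reported by the BMC
    ipmitool lan print 1 | grep -i vlan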
The amber LED on the power supply might flash or turn solid without any amber LED on the front of the enclosure.

The power supply might incorrectly detect out-of-range operating parameters, such as incoming voltage or power supply temperature.

Product
  • ESS 3500
  1. Contact IBM Service if the power supply presents a false positive status.

  2. Run the mmhealth command.
    mmhealth node show NATIVE_RAID

    If the failure is real, the output shows NATIVE_RAID->ENCLOSURE as DEGRADED.

  3. Review the mmhealth command output for power supply related issues.
    mmhealth node show NATIVE_RAID -v | grep psu
  4. Follow normal service procedures if necessary.
The esscallhomeconf command might not be able to automatically create the call home group. It might present the following message:
[ERROR]    Unable to create auto group for callhome
Product
  • ESS 3500
From the EMS:
  1. Back up the call home staging area.
    mkdir /tmp/unifiedCallhome
    
    mmsysmoncontrol stop
    
    mv /tmp/mmfs/callhome/incomingFTDC2CallHome/* /tmp/unifiedCallhome
    
    mmsysmoncontrol start
  2. Rerun the esscallhomeconf command.
  3. Restore the call home staging area.
    mv /tmp/unifiedCallhome/* /tmp/mmfs/callhome/incomingFTDC2CallHome