ESS known issues

Known issues in ESS

For information about ESS 5.3.7.x known issues, see Known issues in ESS 5.3.7.x Quick Deployment Guide.

The following table describes the known issues in IBM Elastic Storage® System (ESS) and how to resolve these issues. Each entry lists the issue, the products it applies to, and the resolution or action.
After initial deployment, the EMS may show SERVER2U instead of 5105-22E as the MTM.
Product
  • ESS Legacy
  • ESS 3000
  • ESS 3200
  • ESS 3500
  • ESS 5000
  • Navigate to the Hardware Panel of the GUI and select the image of the EMS to verify that the MTM shows as SERVER2U.
  • If the condition is seen, log in as root on the EMS command line and list the server components (for example, with the mmlscomp command) to determine the Component ID of the EMS node. In the listing, the EMS MTM is shown as SERVER2U.
  • Delete the EMS server by using either the Comp ID, Serial Number, or Name (this example uses the Comp ID):
    Example:
    [root@ems ~]# mmdelcomp 12
    INFO: Deleting component 12
    mmcomp: Propagating the cluster configuration data to all 
    affected nodes. This is an asynchronous process. 
  • Return to the Hardware Panel in the GUI and click the Edit Component tab. The Edit Rack Components Wizard appears. Select the following option:
    • Yes, discover new servers and enclosure first. This discovery might take several minutes.

    Continue to click Next without changing any parameters until you reach the Rack Locations section of the Edit Rack Components page. Be sure to respecify the location of the EMS node on that page, and then click Next.

    Go to the final page and click Finish. ESS applies the change.

The Ansible-based essrun tool cannot add more than one building block at a time to a cluster.
Product
  • ESS Legacy
  • ESS 3000
  • ESS 3200
  • ESS 3500
  • ESS 5000
If it is necessary to add more than one building block in a cluster, the following two options are available:
  • Use the essrun command and add each building block individually.
  • Use the mmvdisk command to add the building blocks.
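If you choose the mmvdisk route, the following outline is a minimal sketch of adding one building block, assuming the new servers are already members of the GPFS cluster; the node, node class, and recovery group names (essio3, essio4, ess5k_nc2, rg3, rg4) are hypothetical, and the exact parameters for your hardware should be taken from the mmvdisk documentation:
    # Create a node class for the two I/O servers of the new building block
    mmvdisk nodeclass create --node-class ess5k_nc2 -N essio3,essio4
    # Apply the recommended server configuration to the new node class
    mmvdisk server configure --node-class ess5k_nc2 --recycle one
    # Create the paired recovery groups for the new building block
    mmvdisk recoverygroup create --recovery-group rg3,rg4 --node-class ess5k_nc2
After the recovery groups are created, define and create vdisk sets for them as usual.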
During an upgrade, if the container unexpectedly loses its connection with the target canisters, there might be a timeout of up to 2 hours in the Ansible® update task.
Product
  • ESS Legacy
  • ESS 3000
  • ESS 3200
  • ESS 3500
  • ESS 5000
Wait for the timeout and retry the essrun update task.
When running essrun commands, you might see messages such as these:
Thursday 16 April 2020 20:52:44 +0000
(0:00:00.572) 0:13:19.792 ********
Thursday 16 April 2020 20:52:45 +0000
(0:00:00.575) 0:13:20.367 ********
Thursday 16 April 2020 20:52:46 +0000
(0:00:00.577) 0:13:20.944 ********
Product
  • ESS Legacy
  • ESS 3000
  • ESS 3200
  • ESS 3500
  • ESS 5000
This is a restriction in the Ansible timestamp module. It shows timestamps even for the “skipped” tasks. If you want to remove timestamps from the output, change the ansible.cfg file inside the container as follows:
  1. vim /etc/ansible/ansible.cfg
  2. Remove ,profile_tasks on line 7.
  3. Save and quit: esc + :wq
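As an illustration only (the exact contents of line 7 in the container's ansible.cfg might differ between releases), the callback line before and after the edit could look like this, assuming the timestamps come from the profile_tasks callback plugin:
    # /etc/ansible/ansible.cfg inside the container (hypothetical excerpt)
    [defaults]
    # Before the edit: per-task timestamps are printed, even for skipped tasks
    callback_whitelist = timer,profile_tasks
    # After removing ",profile_tasks": timestamp lines are no longer printed
    callback_whitelist = timer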
After a reboot of an ESS 5000 node, systemd might be loaded incorrectly.
Users might see the following error when trying to start GPFS:
Failed to activate service 'org.freedesktop.systemd1':
 timed out
Product
  • ESS 5000
Power off the system and then power it on again.
  1. Run the following command from the container:
    rpower <node name> off
  2. Wait for at least 30 seconds and run the following command to verify that the system is off:
    rpower <node name> status
  3. Restart the system with the following command:
    rpower <node name> on
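For example, with a hypothetical node name of essio1, the power-cycle sequence from the container is:
    rpower essio1 off
    sleep 30               # wait at least 30 seconds before checking
    rpower essio1 status   # expect the node to report off
    rpower essio1 on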
In the ESS 5000 SLx series, if a hard drive is pulled out long enough for the drive to finish draining, the drive might not be recovered when it is reinserted.
Product
  • ESS 5000
Run the following command from EMS or IO node to revive the drive:
mmvdisk pdisk change --rg RGName --pdisk PdiskName --revive

Where RGName is the recovery group that the drive belongs to and PdiskName is the drive's pdisk name.
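For example, assuming a hypothetical recovery group rg_essio5 and pdisk e1s07, you can first identify the drained pdisk and then revive it:
    # List pdisks that are not in a healthy state to find the affected drive
    mmvdisk pdisk list --recovery-group all --not-ok
    # Revive the reinserted drive
    mmvdisk pdisk change --recovery-group rg_essio5 --pdisk e1s07 --revive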

After the deployment is complete, if the firmware on the enclosure, drive, or HBA adapter does not match the expected level and you run essinstallcheck, the following mmvdisk settings-related error message is displayed:
[ERROR] mmvdisk settings do NOT match best practices. 
Run mmvdisk server configure --verify --node-class 
ess5k_ppc64le_mmvdisk to debug.  
Product
  • ESS Legacy
  • ESS 3000
  • ESS 3200
  • ESS 3500
  • ESS 5000

The error about mmvdisk settings can be ignored. The resolution is to update the mismatched firmware levels on the enclosure, drive, or HBA adapter to the correct levels.

To confirm the mismatch, run the mmvdisk configuration check command: mmvdisk server configure --verify --node-class <nodeclass>

To list the mmvdisk node classes in the cluster, run: mmvdisk nc list
Note: essinstallcheck detects inconsistencies from mmvdisk best practices for all node classes in the cluster and stops immediately if an issue is found.
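For example, using the node class name from the error message above (substitute the node class reported by mmvdisk nc list in your cluster):
    # List all mmvdisk node classes in the cluster
    mmvdisk nc list
    # Verify the server configuration of one node class against best practices
    mmvdisk server configure --verify --node-class ess5k_ppc64le_mmvdisk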
When running essinstallcheck you might see an error message similar to:
System Firmware could not be obtained 
which will lead to a false-positive 
PASS message when the script completes.
Product
  • ESS Legacy
  • ESS 5000

Run vpdupdate on each I/O node.

Rerun essinstallcheck, which should now properly query the firmware level.
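As a sketch, assuming hypothetical I/O node names essio1 and essio2, you can refresh the VPD data on each I/O node from the EMS before rerunning essinstallcheck:
    # Run vpdupdate on each I/O node over ssh (node names are examples)
    for node in essio1 essio2; do
        ssh $node vpdupdate
    done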
During command-less disk replacement, there is a limit on how many disks can be replaced at one time.
Product
  • ESS 3000
  • ESS 3200
  • ESS 3500
  • ESS 5000
With command-less disk replacement, replace only up to two disks at a time. If command-less disk replacement is enabled and more than two disks are replaceable, replace the first two disks, and then use the disk replacement commands to replace the third and subsequent disks.
Issue reported with command-less disk replacement warning LEDs.
Product
  • ESS 5000
The replaceable disk has the amber LED on, but not blinking. Disk replacement should still succeed.
After an ESS node is upgraded, the pmsensors service needs to be started manually.
Product
  • ESS 3000
  • ESS 3200
  • ESS 3500
After the ESS upgrade is complete, the pmsensors service does not automatically start. You must manually start the service for performance monitoring to be restored. On each ESS node, run the following command:
systemctl start pmsensors
To check the status of the service, run the following command:
systemctl status --no-pager pmsensors
The canister_failed event does not turn on the amber LED on the canister or on the enclosure front panel.
Product
  • ESS 3200
  • ESS 3500
Root cause: The failed canister is not the master canister, and the other canister is not up/running.

Action required: No


Migration from ESS Legacy releases (5.3.7.x) to the container version (ESS 6.1.x.x) might revert mmvdisk values to their default settings.

Product
  • ESS Legacy

For more information about this issue, see IBM Support.

Node call home might not work for nodes that are designated as protocol nodes. If a power supply is damaged or pulled (or any other Opal-related node problem occurs), a call home will not be available in the Salesforce system.

Opal PRD might not log the error from the FSP that is causing this issue.

Product
  • ESS 5000
Determine whether there is a power supply problem by manually inspecting the ASMI error/event logs by using the FSP, and open a problem with IBM Support if required.
If the essrun gui --configure command is run after the GUI and performance monitoring are already set up, you might get an error prompting you to remove any existing GUI configuration before continuing.
Product
  • ESS 3200
  • ESS 3500
If the GUI is already set up, it is not necessary to remove the existing GUI configuration.
Exit the container.
  1. Run the mmhealth node show gui -a command.

    Verify that performance sensors and collectors are healthy.

  2. Verify that the gui daemon is started.
    systemctl status gpfsgui
  3. Access the GUI.
    https://GUI_Node_IP

    Verify that performance monitoring is active and all nodes are seen properly using the GUI.

The mmcallhome ticket list reports multiple tickets opened for the same issue.
Product
  • ESS 3500
On the EMS, check whether there are duplicate call home events in the queue to be sent to IBM by issuing the following command:
ls /tmp/mmfs/callhome/incomingFTDC2CallHome/
If this directory contains duplicate call home event entries:
  1. Stop mmsysmon on EMS.
    mmsysmoncontrol stop
  2. Clear staging area on EMS.
    rm /tmp/mmfs/callhome/incomingFTDC2CallHome/*
  3. Start mmsysmon on EMS.
    mmsysmoncontrol start
The mmcallhome ticket list still reports “New Case Opened” after the PMR is closed by IBM.
Product
  • ESS 3500
Remove the ticket.
mmcallhome ticket delete <ticket number TSxxxxxxx>

After deploying the protocol VM on an ESS 3500 canister, the Mellanox OFED driver is not installed.

Example:
ofed_info -s
-bash: ofed_info: command not found
Product
  • ESS 3500
  1. After the VM is deployed, log in to the VM and manually run the ess_ofed postscript to install the driver:
    /opt/ibm/ess/tools/postscripts/ess_ofed.essvm
  2. After the installation, verify that the driver is installed (ofed_info -s) and reboot the VM (sync; systemctl reboot).

The CES file system cannot be created by using the essrun command if the I/O nodes were deployed with versions earlier than 6.1.2.0.

Example of old naming convention:
  • ess5k_7894DBA
  • ess5k_7894E4A
Product
  • ESS 3200
  • ESS 3500
  • ESS 5000
Ansible tries to gather the RG by using the new name format. Example:
ess5k_essio1_ib_essio2_ib
Create the CES file system by using the mmvdisk command directly on the EMS or on any I/O node in the cluster.
  1. Gather the desired RG name(s).
    mmvdisk nc list 
  2. Define the vdisk set:
    mmvdisk vs define \
    --vs vs_cesSharedRoot_essio1_hs_essio2_hs \
    --rg ess5k_7894DBA,ess5k_7894E4A \
    --code 8+2p --bs 4M --ss 20G
  3. Create the vdisk set:
    mmvdisk vs create \
    --vs vs_cesSharedRoot_essio1_hs_essio2_hs
  4. Create a file system.
    mmvdisk fs create \
    --fs cesSharedRoot --vs vs_cesSharedRoot_essio1_hs_essio2_hs \
    --mmcrfs -T /gpfs/cesSharedRoot
  5. Mount the file system.
    mmmount cesSharedRoot -a
During file system creation in mixed environments (ESS 5000 and ESS 3500), the following error can appear:
TASK [/opt/ibm/ess/deploy/ansible/roles/mmvdiskcreate:
 Define Vdiskset] **************

An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ZeroDivisionError: division by zero

Incorrect access to a specific variable of the ESS 5000 I/O node causes this issue.

Product
  • ESS 3200
  • ESS 3500
  • ESS 5000
  1. Issue the following command one time only in the container.
    sed -i "s/enclQty/hostvars[item].enclQty/g" \
    /opt/ibm/ess/deploy/ansible/roles/mmvdiskcreate/tasks/create_filesystem_mixed.yml
  2. Continue with the essrun -N <node list> filesystem command.

The BMC network might become unresponsive when configured with a VLAN.

The VLAN configuration fails to activate properly in the BMC network stack.

Product
  • ESS 3500
  1. Log in to the canister corresponding to the BMC.
  2. Unconfigure the VLAN.
    ipmitool lan set 1 vlan id off
  3. Reconfigure the VLAN.
    ipmitool lan set 1 vlan id <vlan id>
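For example, with a hypothetical VLAN ID of 100, the sequence and a follow-up check look like this:
    ipmitool lan set 1 vlan id off
    ipmitool lan set 1 vlan id 100
    # Confirm that the VLAN ID is now reported by the BMC
    ipmitool lan print 1 | grep -i vlan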
The amber LED on the power supply might flash or turn solid without any amber LED on the front of the enclosure.

The power supply might incorrectly detect out-of-range operating parameters, such as incoming voltage or power supply temperature.

Product
  • ESS 3500
  1. Contact IBM Service if the power supply presents a false positive status.

  2. Run the mmhealth command.
    mmhealth node show NATIVE_RAID

    If the failure is real, the output shows NATIVE_RAID->ENCLOSURE as DEGRADED.

  3. Review the mmhealth command output for power supply related issues.
    mmhealth node show NATIVE_RAID -v | grep psu
  4. Follow normal service procedures if necessary.
The esscallhomeconf command might not be able to automatically create the call home group. It might present the following message:
[ERROR]    Unable to create auto group for callhome
Product
  • ESS 3500
From the EMS:
  1. Back up the call home staging area.
    mkdir /tmp/unifiedCallhome
    
    mmsysmoncontrol stop
    
    mv /tmp/mmfs/callhome/incomingFTDC2CallHome/* /tmp/unifiedCallhome
    
    mmsysmoncontrol start
  2. Rerun the esscallhomeconf command.
  3. Restore the call home staging area.
    mv /tmp/unifiedCallhome/* /tmp/mmfs/callhome/incomingFTDC2CallHome