ESS known issues
Known issues in ESS version 6.0.2.x
For information about ESS 5.3.7.x known issues, see Known issues in ESS 5.3.7.x Quick Deployment Guide.
Issue: The presence of the xcat repository files (xcat-otherpkgs{0,1,..X}) might cause update issues. If a PXE deployment was done recently, these repository files might still exist and subsequently cause failures when you upgrade a node from the container by using the essrun command.

Resolution or action: To fix this issue, complete the following steps:
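The original fix-up steps are not reproduced above. A minimal sketch of one plausible cleanup follows; the scratch directory stands in for the repository directory on a real node (typically /etc/yum.repos.d/, which, like the file names, is an assumption to verify before deleting anything):

```shell
# Scratch directory stands in for the node's yum repository directory (assumption).
repodir=/tmp/xcat-demo
mkdir -p "$repodir"
touch "$repodir/xcat-otherpkgs0.repo" "$repodir/xcat-otherpkgs1.repo"

# Remove the stale xcat-otherpkgs repository files; the essrun upgrade
# can then be retried.
rm -f "$repodir"/xcat-otherpkgs*.repo

ls "$repodir"
```

On a real node you would point repodir at the actual repository directory after confirming the stale files are the ones causing the failure.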
Issue: The IBM.ESAGENT subsystem fails to start because of a wrong JAVA_HOME value, which can cause the ESA startup to fail. This happens when the java symbolic link points to the wrong Java™ location.

Resolution or action: Remove the current java symbolic link, update the java pointer to the correct location, and then retry the ESA activation.
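The relink procedure above can be sketched as follows. The scratch paths stand in for the real java link (typically /usr/bin/java) and the JRE locations, all of which are assumptions to verify on the node before changing anything:

```shell
# Scratch paths stand in for the real java link and JRE locations (assumptions).
mkdir -p /tmp/esa-demo/good-jre/bin /tmp/esa-demo/bad-jre/bin
touch /tmp/esa-demo/good-jre/bin/java /tmp/esa-demo/bad-jre/bin/java
ln -sf /tmp/esa-demo/bad-jre/bin/java /tmp/esa-demo/java  # java points to the wrong JRE

rm /tmp/esa-demo/java                                     # remove the stale symbolic link
ln -s /tmp/esa-demo/good-jre/bin/java /tmp/esa-demo/java  # repoint java at the correct JRE

readlink /tmp/esa-demo/java
```

After repointing the real link, retry the ESA activation.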
Issue: The hardware CPU validation GPFS callback is active on only one node in the cluster. This callback prevents GPFS from starting if a CPU socket is missing.

Resolution or action: No action is required.
Issue: During a rolling upgrade (updating one ESS I/O node at a time while maintaining quorum), mmhealth might show the error local_exported_fs_unavail even though the file system is still mounted.

Resolution or action: This message is erroneous; the exported file system remains mounted.
Issue: During upgrade, if the container unexpectedly loses its connection with the target canisters, the Ansible® update task might time out after up to 2 hours.

Resolution or action: Wait for the timeout, and then retry the essrun update task.
Issue: During a storage MES upgrade, you are required to update the drive firmware to complete the task. Some of the drives might not update on the first pass of running the command.

Resolution or action: Rerun the mmchfirmware --type drive command; this should update the remaining drives.
Issue: When you run essrun commands, you might see unexpected timestamp messages in the output.

Resolution or action: This is a restriction in the Ansible timestamp module, which shows timestamps even for skipped tasks. If you want to remove timestamps from the output, change the ansible.cfg file inside the container.
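The exact ansible.cfg change is not reproduced above. Assuming the timestamps come from a per-task profiling callback plugin (both the file location inside the container and the callback name are assumptions to verify against your container image), the edit would look something like this fragment:

```ini
# ansible.cfg inside the container (location is an assumption,
# for example /etc/ansible/ansible.cfg)
[defaults]
# Remove or comment out the callback entry that prints per-task
# timestamps, for example a line such as:
# callbacks_enabled = profile_tasks
```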
Issue: When you run the essrun config load command, you might see a failure.

Resolution or action: This failure means that the pems module is not running on the canister. To fix this, do the following:
Issue: Running the essrun -N node1,node2,… config load command with high-speed names causes issues with the upgrade task that uses the -G flag.

Resolution or action: The essrun config load command is an Ansible wrapper that attempts to discover the ESS 3000 canister node positions, place them into groups, and fix the SSH keys between the servers. This command must always be run using the low-speed or management names; you must not use the high-speed or cluster names. For example:

essrun -N ess3k1a,ess3k1b config load

Example of what not to do:

essrun -N ess3k1a-hs,ess3k1b-hs config load

To confirm that the configuration is set up correctly, use the lsdef command. This command returns only the low-speed or management names defined in /etc/hosts.
Issue: After a reboot of an ESS 5000 node, systemd might be loaded incorrectly, and you might see an error when trying to start GPFS.

Resolution or action: Power off the system, and then power it on again.
Issue: In the ESS 5000 SLx series, if a hard drive is pulled out long enough for the drive to finish draining, the drive cannot be recovered when you re-insert it.

Resolution or action: Run the revive command from the EMS or an I/O node, where RGName is the recovery group that the drive belongs to and PdiskName is the drive's pdisk name.
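The exact command is not reproduced above. One GPFS command that can revive a drained pdisk is mmchpdisk; treat this as an assumption and verify it against the documentation for your ESS level before running it:

```shell
# Assumption: mmchpdisk with the --revive option applies here.
# Substitute your recovery group name and the drive's pdisk name.
mmchpdisk RGName --pdisk PdiskName --revive
```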
Issue: After the deployment is complete, if the firmware on the enclosure, drive, or HBA adapter does not match the expected level, running essinstallcheck displays an error message stating that the mmvdisk settings do not match best practices.

Resolution or action: The error about the mmvdisk settings can be ignored. Update the mismatched enclosure, drive, or HBA adapter firmware to the correct levels. To confirm the mmvdisk configuration, run:

mmvdisk server configure --verify --node-class <nodeclass>

To list the mmvdisk node classes, run:

mmvdisk nc list

Note: essinstallcheck detects inconsistencies from mmvdisk best practices for all node classes in the cluster and stops immediately if an issue is found.
Issue: When you run essinstallcheck, you might see an error message about querying the firmware level.

Resolution or action: Run vpdupdate on each I/O node, and then rerun essinstallcheck, which should properly query the firmware level.
Issue: When you run the essrun -N Node healthcheck command, the essinstallcheck script might fail because of incorrect error verification, which can give the impression that there is a problem where there is none.

Resolution or action: This health check command (essrun -N Node healthcheck) is removed from the ESS documentation. Use the manual commands to verify system health after deployment.
Issue: During command-less disk replacement, there is a limit on how many disks can be replaced at one time.

Resolution or action: Replace no more than two disks at a time without commands. If command-less disk replacement is enabled and more than two disks are replaceable, replace the first two disks without commands, and then use the commands to replace the third and subsequent disks.
Issue: An issue is reported with the command-less disk replacement warning LEDs.

Resolution or action: The replaceable disk has its amber LED on, but not blinking. Disk replacement should still succeed.
Issue: After upgrading an ESS 3000 node to version 6.0.2.6, the pmsensors service needs to be manually started.

Resolution or action: After the ESS 3000 upgrade is complete, the pmsensors service does not start automatically. You must manually start the service on each ESS 3000 canister for performance monitoring to be restored, and then check the status of the service.
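The exact commands are not reproduced above. Assuming pmsensors is managed as a standard systemd unit on the canister (an assumption to verify), starting and checking it would look like:

```shell
# Assumption: pmsensors runs as a systemd service on the canister.
systemctl start pmsensors    # start the performance monitoring sensors
systemctl status pmsensors   # verify that the service is active
```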
Issue: ESS commands such as essstoragequickcheck and essinstallcheck must be run by using -N localhost. If a hostname such as -N ess3k1a is used, an error occurs.

Resolution or action: There is currently an issue with running the ESS deployment commands by using the hostname of a node. The workaround is to run the checks locally on each node by using localhost. For example, instead of essstoragequickcheck -N ess3k1a, use:

essstoragequickcheck -N localhost
Issue: Hyperthreading might be enabled on an ESS 3000 system because of an incorrect kernel grub flag being set.

Resolution or action: Hyperthreading must be disabled on ESS 3000 systems. Note: After you make the change, reboot the node.
Issue: The ESS 3000 container contains the rhels8.2-ppc64le-install-ces image. However, PXE cannot be installed by using this image because it does not create a repo in the container.

Resolution or action: This issue is resolved in the 6.1.1.1 build.
Issue: P8 protocol node update is not supported.

Resolution or action: This issue is resolved in the 6.1.1.1 build.
Issue: With the ESS 5000 container and a P9 I/O node, PXE install is not supported.

Resolution or action: This issue is resolved in the 6.1.1.1 build.
Issue: For the ESS 3000 container on a P9 EMS node, PXE install is not supported on a P9 protocol node.

Resolution or action: This issue is resolved in the 6.1.1.1 build.
Issue: In an existing cluster with no more than seven quorum nodes, the addition of new nodes fails irrespective of the firmware level.

Resolution or action: This is not considered a problem; no workaround is needed.