Known issues
This topic describes known issues for ESS.
ESS 5.3.1.1 issues
Issue | Environment affected | Description | Resolution or action |
---|---|---|---|
The gssgennetworks script requires high-speed host names to be derived from I/O server (xCAT) host names using a suffix, a prefix, or both. | High-speed network generation Type: Install Version: All Arch: All Affected nodes: I/O server and EMS nodes |
gssgennetworks requires that the target host names provided with the -N or -G option are reachable in order to create the high-speed network on the target node. If the xCAT node name does not contain the same base name as the high-speed name, you might be affected by this issue. A typical deployment scenario is: gssio1 (xCAT name), gssio1-hs (high-speed name). An issue scenario is: gssio1 (xCAT name), foo1abc-hs (high-speed name). |
Create entries in /etc/hosts with node names that are reachable over the management network, such that the high-speed host names can be derived from them using some combination of suffix and prefix. For example, if the high-speed host names are foo1abc-hs and goo1abc-hs:
// Before
<IP> <Long Name> <Short Name>
192.168.40.21 gssio1.gpfs.net gssio1
192.168.40.22 gssio2.gpfs.net gssio2
X.X.X.X foo1abc-hs.gpfs.net foo1abc-hs
X.X.X.Y goo1abc-hs.gpfs.net goo1abc-hs
// Fix
192.168.40.21 gssio1.gpfs.net gssio1 foo1
192.168.40.22 gssio2.gpfs.net gssio2 goo1
X.X.X.X foo1abc-hs.gpfs.net foo1abc-hs
X.X.X.Y goo1abc-hs.gpfs.net goo1abc-hs
Then run: gssgennetworks -N foo1,goo1 --suffix=abc-hs --create-bond |
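The suffix-based name derivation that gssgennetworks performs can be illustrated with a small sketch (a hypothetical illustration, not part of the tool itself):

```shell
# Hypothetical illustration of the suffix-based derivation used by
# gssgennetworks: each base name passed with -N, combined with the
# --suffix value, must resolve to a reachable high-speed host name.
suffix="abc-hs"
for node in foo1 goo1; do
  echo "${node}${suffix}"   # prints foo1abc-hs, then goo1abc-hs
done
# With the /etc/hosts entries above in place, the actual invocation is:
# gssgennetworks -N foo1,goo1 --suffix=abc-hs --create-bond
```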
Running gssutils over PuTTY might show horizontal lines as “qqq” and vertical lines as “xxx”. | ESS Install and Deployment Toolkit Type: Install or Upgrade Version: All Arch: All Affected nodes: EMS and I/O server nodes |
PuTTY's default remote character set (UTF-8) might not translate the horizontal-line and vertical-line characters correctly. | 1. In the PuTTY configuration, under Window > Translation, change Remote character set from UTF-8 to ISO-8859-1:1998 (Latin-1, West Europe); this should be the first option after UTF-8. 2. Open the session. |
gssinstallcheck might flag an error regarding pagepool size in multi-building-block configurations if the physical memory sizes differ. | Software Validation Type: Install or Upgrade Arch: Big Endian or Little Endian Version: All Affected nodes: I/O server nodes |
gssinstallcheck is a tool, introduced in ESS 3.5, that helps validate software, firmware, and configuration settings. If you add (or install) building blocks with a different memory footprint, gssinstallcheck flags this as an error. Best practice is that all of your I/O servers have the same memory footprint, and thus the same pagepool value. The pagepool is currently set to approximately 60% of the physical memory of each I/O server node. Example from gssinstallcheck: [ERROR] pagepool: found 142807662592 expected range 147028338278 - 179529339371 |
1. Confirm each I/O server node's individual memory footprint. From the EMS, run the following command against your I/O xCAT group: xdsh gss_ppc64 "cat /proc/meminfo | grep MemTotal" Note: This value is in KB. If the physical memory varies between servers or building blocks, consider adding memory and recalculating the pagepool to ensure consistency. 2. Validate the pagepool settings in IBM Spectrum Scale™: mmlsconfig | grep -A 1 pagepool Note: This value is in MB. If the pagepool setting is not roughly 60% of physical memory, consider recalculating and setting an updated value. For information about how to update the pagepool value, see the IBM Spectrum Scale documentation on IBM® Knowledge Center. |
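The 60% expectation can be recomputed with a short sketch; the MemTotal figure below is an example value, not a measured number:

```shell
# Hedged sketch: recompute the expected pagepool as ~60% of physical
# memory. MemTotal is in KB (as reported by /proc/meminfo); substitute
# the value reported by your own I/O server nodes.
mem_total_kb=267730944
pagepool_mib=$(( mem_total_kb * 60 / 100 / 1024 ))   # mmlsconfig reports MB
echo "expected pagepool: approximately ${pagepool_mib} MiB"
```

Comparing this figure against `mmlsconfig | grep -A 1 pagepool` on each node shows whether the setting is consistent across building blocks.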
The GUI might display the long-waiters warning: Spectrum Scale long-waiters monitoring returned unknown result | GUI Type: Upgrade Arch: Big Endian Version: All Affected nodes: All |
Upon new installations of (or upgrades to) ESS 5.3.1, the GUI might show an error due to a bad return code from mmhealth when it queries long-waiters information: /usr/lpp/mmfs/bin/mmdiag --deadlock Failed to connect to file system daemon: No such process RC=50 |
There is no current workaround, but it is advised to verify on the command line that no long waiters exist. If the system is free of this symptom, mark the event as read in the GUI by clicking under the Action column. Doing so clears the event. |
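A command-line check along the following lines could be used before dismissing the GUI event; the mmdiag path and the grep pattern are assumptions based on standard Spectrum Scale tooling, so adjust them to your environment:

```shell
# Hedged sketch: count waiters reported by mmdiag before marking the
# GUI event as read. The 'Waiting' pattern is an assumption about the
# output format; verify it against your mmdiag output.
waiters=$(/usr/lpp/mmfs/bin/mmdiag --waiters 2>/dev/null | grep -c 'Waiting' || true)
if [ "${waiters:-0}" -eq 0 ]; then
  echo "no waiters reported; safe to mark the GUI event as read"
fi
```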
Creating small file systems (below 16 GB) in the GUI results in incorrect sizes | GUI Type: Install or Upgrade Arch: Big Endian or Little Endian Version: All Affected nodes: All |
When you create a file system smaller than 16 GB in the GUI (usually done to create CES_ROOT for protocol nodes), the resulting size is larger than expected. |
There is currently no resolution. The smallest size you can reliably create is 16 GB. Experienced users might consider creating a custom vdisk.stanza file for the specific sizes they require. |
Creating file systems in the GUI might immediately result in lack of capacity data | GUI Type: Install or Upgrade Arch: Big Endian or Little Endian Version: All Affected nodes: All |
When creating file systems in the GUI, you might not immediately see the capacity data. | Wait up to 24 hours for the capacity data to display, or use the command line, which accurately shows the file system size. |
The GUI might show ‘unknown’ hardware states for storage enclosures and Power® 8 servers in the ESS building block. Part info and firmware levels under the Hardware Details panel might also be missing. Upon adding ESS PPC64LE building-blocks to an existing PPC64BE environment, you might encounter this same issue. |
GUI Type: Upgrade Arch: Big Endian Version: All Affected nodes: All |
The ESS GUI (running on the EMS) might show ‘unknown’ under the Hardware panel for the ESS
building block members. The ESS GUI might also be missing information under Part Info and Firmware version within the Hardware Details panel. |
The workaround for this issue is as follows, where CLUSTER is either the cluster name or the cluster ID, which can be determined by using the mmlscluster command. After running the workaround, the GUI should refresh with the issues resolved. Note: If this issue is encountered when adding ESS PPC64LE building-blocks to an existing PPC64BE environment, there is no current workaround because the GUI does not support multiple xCAT instances.
|
Canceling disk replacement through GUI leaves original disk in unusable state | GUI Type: Install or Upgrade Arch: Big Endian or Little Endian Version: All Affected nodes: I/O server nodes |
Canceling a disk replacement can lead to an unstable system state and must not be performed. If you did perform this operation, use the provided workaround. | Do not cancel disk replacement from the GUI. If you did, use the following command to return the disk to a usable state: mmchpdisk <RG> --pdisk <pdisk> --resume |
Upon upgrades to ESS 5.3.1.x, you might notice missing groups and users in the GUI panel. | GUI Type: Upgrade Arch: All Version: All Affected nodes: N/A |
You might notice one or more missing pools or users in the GUI panel after upgrading to ESS 5.3.1.x. You might also see missing capacity and throughput data in the GUI panel. |
There is currently no resolution or workaround. Try waiting 24 hours for the GUI to refresh. You can also try clicking Refresh. |
Upon upgrades to ESS 5.3.1.1, you might see several Mellanox OFED weak-updates and unknown symbols messages on the console during gss_updatenode. | OFED Type: Upgrade Arch: Big Endian and Little Endian Version: All Affected nodes: N/A |
When building the new OFED driver against the kernel, you might see many messages such as weak-updates and unknown symbols. | There is currently no resolution or workaround. These messages can be ignored. |
During firmware upgrades on PPC64LE, update_flash might show the following
warning: Unit kexec.service could not be found. |
Firmware Type: Installation or Upgrade Arch: Little Endian Version: All Affected nodes: N/A |
This warning can be ignored. | |
Setting target node names within gssutils might not persist for all panels. The default host names, such as ems1, might still show. | Deployment Type: Install or Upgrade Arch: Big Endian or Little Endian Version: All Affected nodes: All |
gssutils allows users to conveniently deploy, upgrade, or manage systems within a GUI-like interface. If you run gssutils -N NODE, it should store that node name and use it throughout the menu system. A bug might prevent this from working as designed. | Use one of the following resolutions:
|
The GUI wizard might fail due to an error when issuing mmaddcomp. | GUI Type: Install Arch: Big Endian or Little Endian Version: All Affected nodes: N/A |
During the GUI wizard setup, users might hit an error similar to the following: ERROR: column name is not unique | Run the final wizard setup step again. After doing this, the error does not occur and you can proceed to the GUI login. |
The GUI does not display the firmware levels for drives. | GUI Type: Upgrade Arch: Big Endian Version: All |
This behavior is seen during upgrade. | Use the mmlsfirmware command to view this information. |
The 1Gb links show as unknown or unhealthy. | GUI Type: Install and Upgrade Arch: Big Endian or Little Endian Version: All |
This behavior is seen during installation or upgrade. | mmhealth does not monitor the health state of IP interfaces that are not used by IBM Spectrum Scale. These are the IP interfaces that show the value None in the Networks grid column. |
The mmhealth command shows the status as degraded for an empty slot. (DCS3700 only – 5U84) | GUI / mmhealth Type: Install and Upgrade Arch: Big Endian or Little Endian Version: All |
The handling of mmlsfirmware now marks empty slots (DCS3700 – 5U84 only). For example, running mmlsfirmware --serial-number enclosure_serial results in: drive EMPTYSLOT <enclosure serial> not_available not_available. Slots that have no drive inserted are marked not_available by design. |
Currently there is no workaround for this issue. It is limited to DCS3700 – 5U84 enclosures. |
The md5sum command works only when run from the folder where the binaries are located. | Type: Install and Upgrade Arch: Big Endian or Little Endian Version: All |
Upon running this command: md5sum -c /home/deploy/gss_install-5.3.1.1_ppc64le_datamanagement_20180617T125746Z.md5 The following error occurs: md5sum: gss_install-5.3.1.1_ppc64le_datamanagement_20180617T125746Z: No such file or directory gss_install-5.3.1.1_ppc64le_datamanagement_20180617T125746Z: FAILED open or read md5sum: WARNING: 1 listed file could not be read |
The md5sum -c command must be run from the CLI and from the folder in which the binary resides. For example: cd /home/deploy && md5sum -c gss_install-5.3.1.1_ppc64le_datamanagement_20180617T125746Z.md5 |
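The directory dependence can be demonstrated with a self-contained sketch (temporary files only; the real installer file name is not used):

```shell
# Hedged illustration: an .md5 file lists a relative file name, so
# md5sum -c resolves that name against the current working directory.
# This is why the check must run from the folder holding the binary.
workdir=$(mktemp -d)
echo "payload" > "$workdir/gss_install.bin"          # stand-in for the installer
( cd "$workdir" && md5sum gss_install.bin > gss_install.bin.md5 )
# Running from the directory that contains the file succeeds:
( cd "$workdir" && md5sum -c gss_install.bin.md5 )
```

Running the same `md5sum -c` from any other directory reproduces the "No such file or directory" failure shown above.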
InfiniBand with multiple fabrics is not supported. | Type: Install and Upgrade Arch: Big Endian or Little Endian Version: All |
In a multiple-fabric network, the InfiniBand fabric ID might not be properly appended to the verbsPorts configuration statement during cluster creation. An incorrect verbsPorts setting can cause an outage of the IB network. | It is advised to do the following to ensure that the verbsPorts setting is accurate. In the example, the adapter mlx5_0, port 1 is connected to fabric 4 and the adapter mlx5_1, port 1 is connected to fabric 7. Run mmlsconfig and ensure that the verbsPorts settings are correctly configured in the GPFS cluster. If the output shows that the fabric has not been configured even though IB was configured with multiple fabrics, you have hit this known issue. In that case, use mmchconfig to modify the verbsPorts setting for each node or node class so that it takes the subnet into account; the node can be any GPFS node or node class. Once the verbsPorts setting is changed, make sure that the new, correct verbsPorts setting is listed in the output of the mmlsconfig command.
|
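The correction step can be sketched as follows. The verbsPorts value uses the adapter/port/fabric form, with the adapter names and fabric numbers taken from the example above; the node class name gss_ppc64 is a placeholder for your own node or node class:

```shell
# Hedged sketch: a verbsPorts value with explicit fabric numbers.
# Format is adapter/port/fabric; values follow the example above
# (mlx5_0 port 1 on fabric 4, mlx5_1 port 1 on fabric 7).
verbs_ports="mlx5_0/1/4 mlx5_1/1/7"
echo "setting verbsPorts to: $verbs_ports"
# Apply to a node or node class, then verify (run on the cluster):
# mmchconfig verbsPorts="$verbs_ports" -N gss_ppc64
# mmlsconfig | grep verbsPorts
```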
During an ESS upgrade, part information and firmware levels under the Hardware Details panel might be missing. | GUI Type: Upgrade Arch: Big Endian or Little Endian Version: All Affected nodes: N/A |
The ESS GUI might be missing information under Part Info and Firmware version within the Hardware Details panel. | There are two workarounds, where CLUSTER is either the cluster name or the cluster ID, which can be determined by using the mmlscluster command. After running these tasks, the GUI should refresh with the issues resolved. |
During file system creation in the ESS GUI, several inputs under Configure Properties are ignored. | GUI Arch: Big Endian or Little Endian Version: 5.3.1.x Affected nodes: N/A |
When creating file systems in the ESS GUI, several properties can be set under Configure Properties. The GUI ignores these input fields and passes only default values to the mmcrfs command. |
You can use the following workarounds:
|
ESS GUI System Setup wizard fails on the Verify Installation screen in the IBM Spectrum Scale active check. | Type: GUI Arch: Big Endian or Little Endian Version: 5.3.1.1 Affected nodes: N/A |
ESS System Setup wizard fails on the Verify Installation screen. The displayed error message is: Health monitoring is not active on ‘X’ nodes. Run ‘mmhealth node show GPFS -N all’ command to check why mmhealth does not provide health information for those nodes. |
Click Verify again. The error should clear after that. |
Syslog /var/log/messages is not properly redirecting to the EMS node. The log only shows up on each node locally. | Type: RAS Arch: Big Endian or Little Endian Version: ESS 5.3.1.x Affected Nodes: N/A |
There is an issue with the rsyslog daemon redirecting /var/log/messages to the EMS node. In an ESS environment, syslog is typically centralized on the EMS for all nodes. | There is currently no workaround for this issue. Note: There is no centralization of syslog in this release (likely due to a RHEL bug). When gathering debug data, you need to log in to each node individually and access /var/log/messages to investigate system-level issues.
|
gssupg531.sh might fail and gssinstallcheck might show errors regarding the GPFS best practice settings. | Type: Upgrade Arch: Both Version: ESS 5.3.1.x Affected Nodes: Server |
During upgrades, gssupg531.sh might fail because maxblocksize must be set independently, or be set while GPFS is down across the cluster. This prevents a successful update of the GPFS best-practice settings, resulting in gssinstallcheck failures when checking them. | Manually set the GPFS best-practice settings by using gssServerConfig.sh and gssServerConfig531.sh. You need to run these without maxblocksize being set. |