Troubleshooting by symptom

You might encounter some common problems while using the IBM® Pattern for IBM Spectrum Scale.

Spectrum Scale active node might show as "Passive Node" type after restart

After Spectrum Scale nodes restart, the Active Primary node might show up as a Passive node. This behavior can cause issues when you use the nodes or add new nodes to the cluster.

Symptom: Actions such as adding a new member fail on the node that shows the Passive node type.

Resolution: Do the following actions to get the nodes back into Active Primary type:
  1. Check whether the IBM Spectrum Scale daemon service is up and running. If not, then run the following command to start the daemon service:
    su - gpfsprod -c 'sudo /usr/lpp/mmfs/bin/mmstartup'
  2. After the services start successfully, make sure that the GPFS file system is mounted by using the df -hk or mmlsmount command.
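
The mount check in step 2 can also be scripted. The following is a minimal sketch, assuming a Linux node where mounted GPFS file systems are listed with the file system type gpfs in /proc/mounts. The gpfs_mounts helper is a hypothetical name and is not part of the product.

```shell
#!/bin/sh
# Hypothetical helper: print the mount point of every GPFS file system
# that is listed in a mounts table.
# $1: path to the mounts table (defaults to /proc/mounts)
# Assumption: GPFS mounts are reported with the file system type "gpfs".
gpfs_mounts() {
    awk '$3 == "gpfs" { print $2 }' "${1:-/proc/mounts}"
}

# Possible usage on a node (not run here):
#   if [ -n "$(gpfs_mounts)" ]; then echo "mounted"; else echo "not mounted"; fi
```

An empty result suggests that no GPFS file system is mounted on the node, in which case the mmlsmount command gives the authoritative answer.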

IBM Spectrum Scale Server block volume attachment fails with errors for mmdsh

Symptom: The IBM Spectrum Scale Server block volume attachment fails with one of the following mmdsh errors:
Block volume attachment failed with error : mmdsh: Invalid or missing remote shell command: /usr/bin/sshwrap.pl
Block volume attachment failed with error : mmdsh: Invalid or missing remote shell command: /usr/bin/scpwrap.pl
Resolution: Copy the wrapper script that is named in the error message from /usr/lpp/mmfs/bin to /usr/bin/:
  • For sshwrap.pl:
    cp /usr/lpp/mmfs/bin/sshwrap.pl /usr/bin/
  • For scpwrap.pl:
    cp /usr/lpp/mmfs/bin/scpwrap.pl /usr/bin/
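
Both copies can be done in one pass that skips a wrapper that is already in place. This is a sketch only: the copy_mmdsh_wrappers function is a hypothetical helper, and the two directory arguments default to the paths given above.

```shell
#!/bin/sh
# Hypothetical helper: copy sshwrap.pl and scpwrap.pl into the destination
# directory when they are missing there.
# $1: source directory (defaults to /usr/lpp/mmfs/bin)
# $2: destination directory (defaults to /usr/bin)
copy_mmdsh_wrappers() {
    src_dir=${1:-/usr/lpp/mmfs/bin}
    dst_dir=${2:-/usr/bin}
    for wrapper in sshwrap.pl scpwrap.pl; do
        if [ -e "$dst_dir/$wrapper" ]; then
            echo "present: $dst_dir/$wrapper"
        elif cp "$src_dir/$wrapper" "$dst_dir/"; then
            echo "copied: $dst_dir/$wrapper"
        else
            echo "failed: $dst_dir/$wrapper" >&2
            return 1
        fi
    done
}

# Possible usage as root on the server node (not run here):
#   copy_mmdsh_wrappers
```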

Download of client private key and client key from mirror node might fail

Symptom: Retrieving a client private key or client key from the mirror node with the Retrieve key operation might fail with the following error message:
Retrieve Client Key: The Client key was not found for this configuration.

Resolution: Retrieve the client private key and the client key from the primary node.

Network Shared Disk (NSD) on node goes down after IBM Spectrum Scale auto revert

Symptom: After the IBM Spectrum Scale auto revert, Network Shared Disk (NSD) on the node goes down.

Resolution: Restart NSD on that node by running the following command:
su - gpfsprod -c 'sudo /usr/lpp/mmfs/bin/mmchdisk <Name of file system> start -d <Name of NSD>'
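
To find which NSDs need to be restarted, the output of the mmlsdisk command can be filtered for disks whose availability is down. The following is a minimal sketch. The down_disks helper is hypothetical, and it assumes that mmlsdisk prints one row per disk with the disk name in the first column and the word down in the availability column.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every disk whose row contains the
# word "down". Reads mmlsdisk-style output on standard input.
# Assumption: the disk name is the first column of each row.
down_disks() {
    awk '{ for (i = 2; i <= NF; i++) if ($i == "down") { print $1; break } }'
}

# Possible usage on a node (not run here):
#   su - gpfsprod -c 'sudo /usr/lpp/mmfs/bin/mmlsdisk <Name of file system>' | down_disks
```

Each name that is printed can then be passed to the mmchdisk command shown above.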

Troubleshooting issues in GPFS/IBM Spectrum Scale pattern type

When you upgrade the GPFS / IBM Spectrum Scale pattern type, the cluster might hang indefinitely while it waits for GPFS / IBM Spectrum Scale to reach the active state. The issue can occur whenever you upgrade the kernel version without changing the versions of the other kernel packages. For more details about the kernel and kernel packages, see Building IBM Spectrum Scale portability layer after Linux kernel updates. To recover the cluster from the hung state and start the auto revert, complete the following manual steps:
  1. Compile GPFS portability layer for this kernel version in a different virtual machine by using the steps mentioned in the Building IBM Spectrum Scale portability layer after Linux kernel updates topic.
    Note: In the Building IBM Spectrum Scale portability layer after Linux kernel updates topic, you can skip the sub-steps of step 3 to start the node and check for node active state.
  2. Copy the content of the /lib/modules/<upgraded kernel version>/extra folder from the system where the GPFS portability layer was built successfully to the /lib/modules/<upgraded kernel version>/extra folder of the virtual machine where the upgrade failed.
  3. Run the following command to start GPFS:
    su - gpfsprod -c 'sudo /usr/lpp/mmfs/bin/mmstartup'
  4. Run the following command to check whether all the nodes are in active state:
    su - gpfsprod -c 'sudo /usr/lpp/mmfs/bin/mmgetstate -aL'
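
The check in step 4 can be automated by scanning the mmgetstate output. The sketch below uses a hypothetical all_nodes_active helper and assumes that each node row starts with the node number and contains the word active when the daemon is up. Verify that assumption against the output on your system.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every node row read from standard
# input reports the state "active".
# Assumption: node rows begin with a numeric node number; header and
# separator lines do not.
all_nodes_active() {
    awk '
        $1 ~ /^[0-9]+$/ {
            rows++
            active = 0
            for (i = 2; i <= NF; i++) if ($i == "active") active = 1
            if (!active) bad++
        }
        END { exit ((rows > 0 && bad == 0) ? 0 : 1) }
    '
}

# Possible usage on a node (not run here):
#   su - gpfsprod -c 'sudo /usr/lpp/mmfs/bin/mmgetstate -aL' | all_nodes_active \
#       && echo "all nodes active"
```

The helper also fails when it sees no node rows at all, so an empty or unexpected output is not mistaken for a healthy cluster.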

Mixing of IP address formats

Never mix instances of IPv4 and IPv6 IBM Spectrum Scale Pattern deployments, whether they be client deployments, primary, mirror, tiebreaker or passive deployments. This scenario is not supported.

IBM Spectrum Scale Client - mmauth credentials are not deleted from the server

If you delete an IBM Spectrum Scale Client when it is in a stopped state, the mmauth credentials are not removed from the server. Delete those credentials manually from the server by using the mmauth command. For more information, see https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adm_mmauth.htm.
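
One way to see which remote clusters still hold credentials is to filter the output of mmauth show all. This is a sketch under assumptions: the remote_clusters helper is hypothetical, and it assumes that each entry starts with a "Cluster name:" line and that the local cluster is marked with "(this cluster)".

```shell
#!/bin/sh
# Hypothetical helper: print the names of remote clusters found in
# "mmauth show"-style output read from standard input.
# Assumption: entries start with "Cluster name:" and the local cluster line
# ends with "(this cluster)".
remote_clusters() {
    awk -F': *' '/^Cluster name:/ && $0 !~ /\(this cluster\)/ { print $2 }'
}

# Possible usage on the server (not run here):
#   /usr/lpp/mmfs/bin/mmauth show all | remote_clusters
# After you confirm that a listed cluster belongs to the deleted client:
#   /usr/lpp/mmfs/bin/mmauth delete <cluster name>
```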

IBM Spectrum Scale Client - File set names

Do not include blank spaces in file set names.

IBM Spectrum Scale Client - Link directories

Do not include blank spaces in link directory names.
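
File set names and link directory names can be checked for blank spaces before they are entered in a deployment. A minimal sketch follows; check_name is a hypothetical helper and tests only for the blank-space rule stated above.

```shell
#!/bin/sh
# Hypothetical helper: report whether a file set name or link directory
# name is empty or contains a blank space.
check_name() {
    case $1 in
        "")    echo "invalid: empty" ;;
        *" "*) echo "invalid: contains a blank space" ;;
        *)     echo "ok" ;;
    esac
}

# Possible usage:
#   check_name "web_data"
#   check_name "web data"
```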

IBM Spectrum Scale Client - Page pool memory not available

If /var/adm/ras/mmfs.log.latest contains a message saying that the page pool memory cannot be obtained, the virtual machine does not have enough memory to support IBM Spectrum Scale. Allocate more memory to the client pattern in its configuration values. Ensure that the allocated memory is at least 4 GB.

If your client fails to deploy, run the Status operation. You might see an IBM Spectrum Scale error that prevents the page pool from being allocated. Correct any errors and try the deployment again.
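
The memory that the virtual machine actually sees can be compared with the 4 GB minimum. The following sketch assumes a Linux guest that exposes /proc/meminfo. The helper names and the MIN_MEM_KB variable are hypothetical.

```shell
#!/bin/sh
# 4 GB expressed in kB, the unit that /proc/meminfo uses.
MIN_MEM_KB=$((4 * 1024 * 1024))

# Hypothetical helper: print the MemTotal value (in kB) from a
# meminfo-style file. $1 defaults to /proc/meminfo.
mem_total_kb() {
    awk '$1 == "MemTotal:" { print $2 }' "${1:-/proc/meminfo}"
}

# Hypothetical helper: print "ok" when the given kB value meets the
# 4 GB minimum, otherwise "low".
memory_verdict() {
    if [ "$1" -ge "$MIN_MEM_KB" ]; then echo "ok"; else echo "low"; fi
}

# Possible usage on the client virtual machine (not run here):
#   memory_verdict "$(mem_total_kb)"
```

MemTotal is usually slightly below the configured size because the kernel reserves some memory, so a virtual machine configured with exactly 4 GB can still report low. Treat the result as a rough indicator.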

IBM Spectrum Scale Client - File set already exists

Check whether a file set name used by your client deployment already exists. If it does, unintentional file overwriting might occur. Use the Cluster status operation on the server to list the existing file sets.

IBM Spectrum Scale Client - File set quota size is not as expected

If you find that the quota size is not what you expected, use the Cluster status operation on the server to list the existing file sets. If the size is not what the client expects, the reason most likely is that some other client created the file set. If you need a different value, contact the original owner. If a change is agreed to, run the Change File Set operation on the Primary IBM Spectrum Scale instance to change the size of the quota.

IBM Spectrum Scale Client - Quota Size Constraints on the file set are ignored

Only non-root users are affected by the file set quota settings.

IBM Spectrum Scale Client - Connect to server operation fails to update the remote file system information

IBM Spectrum Scale might not be able to determine if the file system is mounted. If the file system is not mounted, the Connect to server operation might fail, resulting in an error message similar to the following example:
Web_Application-was.11406729401441.GPFSClient: Connect to server: Failed to update the remote file system information for kent ['/usr/lpp/mmfs/bin/mmremotefs', 'update', 'kent', '-f', 'kent', '-C', 'testClusterPassive_pass.purescale.raleigh.ibm.com', '-A', 'yes', '-T', '/gpfs/kent']
This error indicates that the remote file system information was not updated successfully. The IBM Spectrum Scale Client trace.log will include multiple instances of this message:
mmremotefs: Command was unable to determine whether file system is mounted.

The IBM Spectrum Scale product documentation notes that when this type of problem occurs, message 6027-1996 is issued with similar wording.

If you encounter this message, perform problem determination, resolve the problem, and reissue the command. If you cannot determine or resolve the problem, you might be able to run the command successfully by first shutting down the IBM Spectrum Scale daemon on all nodes of the cluster (using mmshutdown -a), ensuring that the file system is not mounted.

If you still cannot resolve the problem, complete the following steps:
  1. Log in to the IBM Spectrum Scale Client virtual machine instance.
  2. Navigate to /usr/lpp/mmfs/bin/ and run the mmshutdown -a command.
  3. Run the mmstartup command.
  4. Perform the Connect to server operation again.

IBM Spectrum Scale Client - Connect to server operation fails to unmount the file system

IBM Spectrum Scale might not be able to unmount a file system if the resource is busy. If the file system is not unmounted, the Connect to server operation might fail, resulting in an error message indicating that the device or resource is busy, similar to the following example:
Web_Application-was.11407238943746.GPFSClient: Connect to server: Failed to unmount the testFSys file system ['/usr/lpp/mmfs/bin/mmumount', 'testFSys', '-f'] umount2: Device or resource busy umount: /gpfs/testFSys: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) umount2: Device or resource busy
The IBM Spectrum Scale Client trace.log will include a message similar to the following example:
umount: /gpfs/testFSys: device is busy.
(In some cases useful info about processes that use
the device is found by lsof(8) or fuser(1))
umount2: Device or resource busy

Refer to the IBM Spectrum Scale Problem Determination Guide for actions to take when the file system will not unmount.

Ensure that all processes finish accessing the file system, then run the Connect to server operation again.
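
The error message points to lsof(8) and fuser(1) for finding the processes that keep the file system busy. Where neither tool is installed, the same information can be approximated from /proc. The sketch below is an assumption-laden alternative: procs_using_dir is a hypothetical helper, it inspects only working directories and open file descriptors (not memory-mapped files), and it must run as root to see processes of other users.

```shell
#!/bin/sh
# Hypothetical helper: print the PID of every process whose working
# directory or an open file descriptor is inside the given directory.
# $1: mount point to inspect, for example /gpfs/testFSys
procs_using_dir() {
    target=${1%/}
    for proc_dir in /proc/[0-9]*; do
        for link in "$proc_dir/cwd" "$proc_dir"/fd/*; do
            # Unreadable or vanished links are skipped silently.
            path=$(readlink "$link" 2>/dev/null) || continue
            case $path in
                "$target"|"$target"/*)
                    echo "${proc_dir#/proc/}"
                    break
                    ;;
            esac
        done
    done
}

# Possible usage on the client virtual machine (not run here):
#   procs_using_dir /gpfs/testFSys
```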

IBM Spectrum Scale Server - Disk volume limit exceeded

A maximum of 14 storage volumes can be added to any IBM Spectrum Scale configuration.

IBM Spectrum Scale Server - Disk volume not in list

Ensure that the correct storage volume is attached.

IBM Spectrum Scale Server - sudo error: sorry, you must have a tty to run sudo

Ensure that the requiretty option is disabled on the virtual machine. requiretty is an option in the /etc/sudoers file, which prevents sudo operations from non-TTY sessions. The IBM Spectrum Scale nodes must be able to run sudo commands from scripts.
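
To see whether requiretty is currently enabled, the sudoers file can be searched for active Defaults lines that set the option. This is a sketch; requiretty_lines is a hypothetical helper, and it inspects a single file only, so entries in included files such as those under /etc/sudoers.d are not covered.

```shell
#!/bin/sh
# Hypothetical helper: print (with line numbers) every uncommented
# Defaults line that enables requiretty. Lines that negate the option
# with "!requiretty" are not reported.
# $1: sudoers file to inspect (defaults to /etc/sudoers)
requiretty_lines() {
    grep -nE '^[[:space:]]*Defaults[^#]*[^!]requiretty' "${1:-/etc/sudoers}"
}

# Possible usage as root (not run here):
#   requiretty_lines || echo "requiretty is not enabled in /etc/sudoers"
```

If a line is reported, edit the file with visudo and either remove the option or negate it as !requiretty.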

IBM shared service for IBM Spectrum Scale - Not deployed before IBM Spectrum Scale clients

The IBM shared service for IBM Spectrum Scale must be deployed to a cloud group before you deploy any IBM Spectrum Scale Clients to that same cloud group, unless you specified an IBM Spectrum Scale server at deployment or through the Connect to server operation for virtual application patterns and virtual system patterns.

If an IBM Spectrum Scale Client is deployed to a cloud group that does not have an instance of the IBM shared service for IBM Spectrum Scale, and no IBM Spectrum Scale server was specified in either of those ways, the deployment ends with an error.

The GPFS > .../GPFS Pattern/gpfs_logs > gpfs_pattern_install.log displays the following messages indicating that the shared service is not deployed to the cloud group:
[2015-02-26 14:22:23.243192] GPFSAgent - Retrieve Manager Info from shared service
[2015-02-26 14:22:23.705671] Failed to retrieve values from the IBM Shared Service for GPFS. Ensure that the IBM Shared Service for GPFS is deployed in the same cloud group with this deployment. If IBM Shared Service for GPFS is deployed, ensure that the input value is valid.

IBM Spectrum Scale portability failures are not reported promptly on Linux

During the IBM Spectrum Scale installation process, the build of the IBM Spectrum Scale portability layer might fail. You usually encounter IBM Spectrum Scale portability failures if the base image that is used to deploy your IBM Spectrum Scale instance does not have all of the required IBM Spectrum Scale dependencies.

This problem can occur when you deploy an IBM Spectrum Scale Client or an IBM Spectrum Scale Primary or Passive configuration, or when you attach an IBM Spectrum Scale Mirror or Tiebreaker instance to an IBM Spectrum Scale Primary configuration.

When this failure occurs, the error is reported in the IBM Spectrum Scale logs, but execution is not aborted. The installation or add member operation continues and eventually fails because the portability layer build failure left IBM Spectrum Scale improperly configured.

To identify any IBM Spectrum Scale portability failures after you deploy your instance or add new members to the cluster, run the Get Cluster Status operation and verify that all IBM Spectrum Scale nodes and NSDs are reported to be up and running.

To help debug the problem and identify the root cause, open the IBM Spectrum Scale trace log (IWD trace.log for the GPFSMainServer role or GPFSClient role) and search for a Build GPFS portability FAILED message.
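
The search of the trace log can be done from the command line once the log is available on a system with a shell. A minimal sketch follows; count_portability_failures is a hypothetical helper, and the path to trace.log is left as an argument because its location depends on your deployment.

```shell
#!/bin/sh
# Hypothetical helper: print how many lines of the given log file contain
# the portability build failure message.
# $1: path to the IWD trace.log of the GPFSMainServer or GPFSClient role
count_portability_failures() {
    grep -c 'Build GPFS portability FAILED' "$1"
}

# Possible usage (not run here):
#   count_portability_failures <path to trace.log>
```

A count greater than zero indicates that at least one portability layer build failed. Running grep -n with the same message shows where in the log to start reading.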

Primary instance remains in maintenance mode after an auto-revert operation

Problem: After a primary instance completes an auto-revert operation, it remains in maintenance mode.

Resolution: Manually resume the instance from the Instance management page to bring it to a Running state. You can then run other IBM Spectrum Scale operations on that primary instance.

Some IBM Spectrum Scale operations show up in languages other than English

Symptom: When you use IBM Spectrum Scale, some operations show up in languages other than English.

Resolution: Set the locale to en_US so that the operations show up in English. Use the following commands on the IBM Spectrum Scale Server and manager instances of the IBM Spectrum Scale Server cluster.
  1. Check the language value with the following command.
    bash-4.2# echo $LANG
  2. Check locale on the instance with the following command.
    bash-4.2# locale
  3. Check environment on the instance with the following command.
    bash-4.2# env |grep -e LANG -e LC
  4. Change the locale as described in the next two steps.
  5. Change the LANG value on the instance with the following command.
    bash-4.2# export LANG=en_US.UTF-8
  6. Change the LC_ALL value on the instance with the following command.
    bash-4.2# export LC_ALL="en_US.UTF-8"
  7. Check the locale.conf file on the instance with the following command. Ensure that the file has an entry for the en_US locale.
    bash-4.2# cat /etc/locale.conf
  8. Modify the bash_profile file on the instance. Add the following statements at the end of the file to set and export the LANG value on the instances.
    LANG=en_US.UTF-8
    export LANG
  9. Restart the instances.
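
The profile change in step 8 can be applied so that repeated runs do not add duplicate lines. The following is a sketch only: ensure_lang_export is a hypothetical helper, and the profile path is passed as an argument because the exact file location is not stated above.

```shell
#!/bin/sh
# Hypothetical helper: append the LANG statements to a profile file unless
# the LANG=en_US.UTF-8 line is already present.
# $1: path to the profile file to update
ensure_lang_export() {
    profile=$1
    if grep -qx 'LANG=en_US.UTF-8' "$profile" 2>/dev/null; then
        echo "unchanged"
    else
        printf '%s\n' 'LANG=en_US.UTF-8' 'export LANG' >> "$profile"
        echo "updated"
    fi
}

# Possible usage on the instance (not run here):
#   ensure_lang_export <path to the bash_profile file>
```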

Client key is not accepted on the IBM Cloud Pak System user interface when installing the IBM Spectrum Scale client on RHEL 8.4

Symptom: On Red Hat® Enterprise Linux (RHEL) 8.4, the client keys are generated in OPENSSH format, but the IBM Cloud Pak System user interface requires an RSA key.

Resolution: Retrieve the client key from the Manage > Operations page on the IBM Cloud Pak System user interface. Extract and convert it into an RSA key by using the following command:
ssh-keygen -p -m PEM -f <opensshkeyfile>
Provide that converted key file in the IBM Spectrum Scale Manager IP and Client Key field along with the IP address of the manager node.
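
Because ssh-keygen -p rewrites the key file in place, it can help to work on a copy and to confirm the format before and after the conversion. The sketch below uses a hypothetical key_format helper that reads only the first line of the file. It assumes that an RSA key in PEM format starts with the "BEGIN RSA PRIVATE KEY" header.

```shell
#!/bin/sh
# Hypothetical helper: report the private key format based on the first
# line of the key file.
# $1: path to the private key file
key_format() {
    IFS= read -r first_line < "$1" || [ -n "$first_line" ]
    case $first_line in
        "-----BEGIN OPENSSH PRIVATE KEY-----") echo "OPENSSH" ;;
        "-----BEGIN RSA PRIVATE KEY-----")     echo "RSA" ;;
        *)                                     echo "unknown" ;;
    esac
}

# Possible usage (not run here):
#   cp <opensshkeyfile> <opensshkeyfile>.pem
#   key_format <opensshkeyfile>.pem
#   ssh-keygen -p -m PEM -f <opensshkeyfile>.pem
#   key_format <opensshkeyfile>.pem
```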