GDPC and IBM Spectrum Scale replication FAQ
This FAQ provides you with answers to common questions about geographically dispersed Db2® pureScale® cluster (GDPC) and IBM Spectrum Scale replication environment problems.
What do I do when I cannot bring the disks online after a storage failure on one site was rectified?
If nodes come online before the storage device, you must ensure that the disk configurations are defined and available before you try to restart the failed disk. If the Device and DevType fields are marked by a - when you list the network shared disks (NSDs) with the mmlsnsd -X command, you must ensure that the device configurations for the disk services are available before attempting to restart the disks. Consult the operating system and device driver manuals for the exact steps to configure the device. On AIX® platforms, you can run the cfgmgr command to automatically configure devices that have been added since the system was last rebooted.
What do I do if a computer's IP address, used for the IB interface, cannot be pinged after a reboot?
Ensure the InfiniBand (IB) related devices are available:
root> lsdev -C | grep ib
ib0 Available IP over InfiniBand Network Interface
iba0 Available InfiniBand host channel adapter
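The availability check above can be scripted. The following is a minimal sketch, not an IBM-provided tool: the function name unavailable_ib_devices is hypothetical, and it assumes that the device name is the first column and the state is the second column, as in the sample output above. Unlike grep ib, it matches only device names that begin with ib.

```shell
# Print IB-related devices (names beginning with "ib") whose state column
# is not "Available". Reads `lsdev -C` style output on standard input.
# Assumption: column 1 is the device name and column 2 is the state.
unavailable_ib_devices() {
    awk '$1 ~ /^ib/ && $2 != "Available" { print $1 }'
}

# Typical use on an AIX host:
#   lsdev -C | unavailable_ib_devices
```

If the function prints nothing, every listed IB device is in the Available state.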
If the devices are not available, bring them online with chdev:
chdev -l ib0 -a state=up
Ensure that the ib0, icm, and iba0 properties are set correctly, that ib0 references an IB adapter such as iba0, and that properties are persistent across reboots. Use the -P option of chdev to make changes persistent across reboots.
What do I do if access to the IBM Spectrum Scale file systems hangs for a long time on a storage controller failure?
Ensure the device driver parameters are set properly on each machine in the cluster.
What do I do if the cluster comes down following a site failure?
Check the system logs on the surviving site to see if IBM® Spectrum Scale has triggered a kernel panic due to outstanding I/O requests:
GPFS Deadman Switch timer has expired and there are still outstanding I/O requests
If this is the case, ensure that the device driver parameters have been properly set.
What happens when one site loses Ethernet connectivity and the LPARs on that site are expelled from the IBM Spectrum Scale cluster?
If the IBM Spectrum Scale cluster manager is on the tiebreaker site, this behavior is expected, because the cluster manager does not have IB or Remote Direct Memory Access (RDMA) over Converged Ethernet (RoCE) connectivity and can no longer talk to the site that has lost Ethernet connectivity. If the IBM Spectrum Scale cluster manager is not on the tiebreaker, but is on the site that retains Ethernet connectivity, ensure that the tiebreaker site is an IBM Spectrum Scale quorum-client, not a quorum-manager, as per the mmaddnode command. If the tiebreaker host is a quorum-manager, its status can be changed to client with the following command:
/usr/lpp/mmfs/bin/mmchnode --client -N hostT
The status of the nodes as managers or clients can be verified with the /usr/lpp/mmfs/bin/mmlscluster command.
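The verification step can be sketched as a small filter over the mmlscluster output. This is an illustration under stated assumptions, not part of the product: the function name node_designation is hypothetical, and it assumes that each node row has the form "number, daemon node name, IP address, admin node name, designation", with the designation column empty for nodes that have no special role. Check the layout against the output of your own installation before relying on it.

```shell
# Print the designation of one node, reading `mmlscluster` style output
# on standard input.
# Assumption: node rows look like
#   <number> <daemon node name> <IP address> <admin node name> [<designation>]
# and a row with no designation has only four fields.
node_designation() {
    awk -v host="$1" '
        $1 ~ /^[0-9]+$/ && ($2 == host || $4 == host) {
            print (NF >= 5 ? $5 : "(none)")
        }'
}

# Typical use, with hostT as the tiebreaker host:
#   /usr/lpp/mmfs/bin/mmlscluster | node_designation hostT
```

A result that includes manager for the tiebreaker host indicates that the mmchnode change described above is still needed.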
Also ensure that the IBM Spectrum Scale subnets parameter has been set properly to refer to the IP subnet that uses the IB or RoCE interfaces. The /usr/lpp/mmfs/bin/mmlsconfig command can be used to verify that the subnets parameter has been set correctly.
What do I do when one site loses Ethernet connectivity and the members on that site remain stuck in STOPPED state instead of doing a restart light and going to WAITING_FOR_FAILBACK state?
Ensure that LSR has been disabled.
How can I remove unused IBM Spectrum Scale Network Shared Disks (NSDs)?
Scenarios that can lead to the need to manually remove unused NSDs:
- User-driven or abnormal termination of the db2cluster CREATE FILESYSTEM or ADD DISK command.
- The unused NSDs were created manually at some point earlier but left in the system.
Note: Run all the following commands on the same host.
- Run mmlsnsd -XF to list the free NSDs and their corresponding device names.
root@coralpib21a:/> mmlsnsd -XF
 Disk name    NSD volume ID      Device         Devtype  Node name                    Remarks
---------------------------------------------------------------------------------------------------
 gpfs2118nsd  09170151FFFFD473   /dev/hdisk7    hdisk    coralpib21a.torolab.ibm.com
- Find the NSD name that matches the target device to be removed.
gpfs2118nsd
- Run mmdelnsd <NSD name> to remove the desired unused NSD.
root@coralpib21a:/> mmdelnsd gpfs2118nsd
mmdelnsd: Processing disk gpfs2118nsd
mmdelnsd: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process.
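The lookup in the second step can be sketched as a filter over the mmlsnsd -XF output. This is a minimal illustration only: the function name free_nsd_for_device is hypothetical, and it assumes the column order shown in the sample output above (disk name first, device third). It only prints the matching NSD name and deliberately does not call mmdelnsd, so that the name can be reviewed before anything is deleted.

```shell
# Print the name of the free NSD that is backed by the given device.
# Reads `mmlsnsd -XF` style output on standard input.
# Assumption: column 1 is the disk (NSD) name and column 3 is the device.
free_nsd_for_device() {
    awk -v dev="$1" '$3 == dev { print $1 }'
}

# Typical use, run on the same host as the other commands:
#   mmlsnsd -XF | free_nsd_for_device /dev/hdisk7
# Review the printed name, then pass it to mmdelnsd by hand.
```

With the sample output shown earlier, looking up /dev/hdisk7 yields gpfs2118nsd, which is the name passed to mmdelnsd in the last step.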