Installation and upgrade related information and checklists
Review the following installation and upgrade related information before starting the installation or upgrade of Elastic Storage Server (ESS).
- New features and enhancements
- Component versions for this release
- Supported editions on each architecture
- ESS best practices and support statements
- Obtaining the required Red Hat Enterprise Linux and ESS code
- Support for signed RPMs
- Supported upgrade paths
- Mixed environment recommendations
- ESS 3000 considerations (POWER8 EMS)
- Security law changes
- Support for hardware call home
- Pre-installation or pre-upgrade checklist
- Post-installation or post-upgrade checklist
- Other topics
- Sample installation and upgrade flow
New features and enhancements
Release | Changes |
---|---|
5.3.7.6 | |
5.3.7.4 | |
5.3.7.3 | |
5.3.7.2 | |
5.3.6.2 | |
5.3.6.1 | |
5.3.6 | |
5.3.5.2 | |
5.3.5.1 | |
5.3.5 | |
5.3.4.2 | |
5.3.4.1 | |
5.3.4 | |
Component versions for this release
- Supported architecture: PPC64LE
- IBM Spectrum Scale: 5.0.5.14
- xCAT: 2.16.3
- HMC: V9R2M951
- System firmware: SV860_240 (FW860.B0)
- Red Hat Enterprise Linux: 7.9 (PPC64LE)
- Kernel: 3.10.0-1160.62.1.el7.ppc64le
- Systemd: 219-78.el7_9.5
- Network Manager: 1.18.8-2.el7_9
- mpt3sas: 34.00.00.00
- mpt2sas: 20.00.04.00
- IPR: 19512c00
- SAS adapter driver: 16.00.11.00
- Support RPMs:
  - gpfs.gnr.support-ess3000-1.0.0-3.noarch.rpm
  - gpfs.gnr.support-essbase-1.0.0-3.noarch.rpm
  - gpfs.gnr.support-ess5000-1.0.0-3.noarch.rpm
  - gpfs.gnr.support-ess3200-1.0.0-2.noarch.rpm
- Firmware RPM: gpfs.ess.firmware-6.0.0-23.ppc64le.rpm
- ESA: 4.5.7-0
- Enclosure firmware (PPC64LE):
  - 2U24: 4230
  - 5U84: 4087
  - 4U106: 5266
- OFED (PPC64LE): MLNX_OFED_LINUX-4.9-4.1.7.2
- OFED firmware levels:
  - MT27500: 10.16.1200
  - MT4099: 2.42.5000
  - MT26448: 2.9.1326
  - MT4103: 2.42.5000
  - MT4113: 10.16.1200
  - MT4115: 12.28.2006
  - MT4117: 14.31.1014
  - MT4119: 16.31.1014
  - MT4120: 16.31.1014
  - MT4121: 16.31.1014
  - MT4122: 16.31.1014
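The levels above can be spot-checked on a node before opening a service call. The following is a minimal sketch, not an ESS tool: the check_level helper is our own, and the expected kernel string is copied from this table. gssinstallcheck remains the authoritative check.

```shell
# Expected kernel for this release, copied from the table above.
expected_kernel="3.10.0-1160.62.1.el7.ppc64le"

# Hypothetical helper: compare an expected level against an installed one.
check_level() {
    name="$1"; expected="$2"; installed="$3"
    if [ "$expected" = "$installed" ]; then
        echo "$name: OK ($installed)"
    else
        echo "$name: MISMATCH (expected $expected, found $installed)"
    fi
}

# Compare the running kernel against the release table.
check_level kernel "$expected_kernel" "$(uname -r)"
```

The same helper can be reused for systemd (`rpm -q systemd`) or Network Manager levels by pasting in the values from this section.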
Supported editions on each architecture
The following ESS editions are supported on the available architecture (PPC64LE):
- Data Access Edition
- Data Management Edition
ESS best practices and support statements
- If you are running a stretch cluster, you must ensure that each node has a unique hostid. The hostid might be non-unique if the same IP addresses and host names are being used on both sides of the stretch cluster. Run gnrhealthcheck before creating recovery groups when adding nodes in a stretch cluster environment. You can manually check the hostid on all nodes as follows:
  mmdsh -N { NodeClass | CommaSeparatedListofNodes } hostid
  If the hostid on any node is not unique, you must fix it by running genhostid. These steps must be done when creating a recovery group in a stretch cluster.
- versionlocks are not enabled in ESS. In the past, versionlocks were used to protect against unwarranted kernel and OFED updates. Although versionlocks no longer exist, the same rules apply to ESS packages verified by gssinstallcheck or gssinstall. You must get specific approval and guidance from L2 Service to make any changes to an ESS configuration. The only exceptions are RHEL packages other than the kernel, systemd, and Network Manager packages; those packages are the customer's responsibility and might be updated for security purposes.
- If you are upgrading to ESS 5.3.7.6, you must convert the environment to mmvdisk after the upgrade is completed.
- It is advised that you set autoload to on to enable GPFS to recover automatically in case of a daemon problem. Deployment automatically enables this on new installations, but you should disable autoload for an upgrade and re-enable it after the upgrade.
  To disable autoload, issue the following command:
  mmchconfig autoload=no
  Once the maintenance operation or upgrade is complete, re-enable autoload:
  mmchconfig autoload=yes
- Do not mount the file system on the ESS I/O server nodes.
- It is advised that you disable automount for file systems when performing an upgrade to ESS 5.3.1 or later:
  mmchfs Device -A no
  Device is the device name of the file system.
  Automount should automatically be disabled when creating new file systems with gssgenvdisks.
  Remember: Mount the file system only on the EMS node where the GUI and the PM collector run.
- Do not configure more than 5 failure groups in a single file system.
- Consider moving all supported InfiniBand devices to Datagram mode (CONNECTED_MODE=no). For more information, see ESS networking considerations.
- Running any additional service or protocols on any ESS node is not supported. This includes installing any additional RPMs, running any protocols (or any other type of service), or mounting the file system on any ESS I/O server node. This also applies to the EMS node, although you must mount the file system to support the IBM Spectrum Scale GUI.
- RoCE (RDMA over Ethernet) is not supported in ESS.
- Consider moving quorum, cluster, and file system management responsibilities from the ESS nodes to other server license nodes within the cluster.
- It is not required, though it is highly recommended, that the code levels match during a building block addition. Be mindful of changing the release and file system format in mixed IBM Spectrum Scale environments.
- You must take down the GPFS cluster to run firmware updates in parallel.
- Do not independently update IBM Spectrum Scale (or any component) on any ESS node unless specifically advised from the L2 service. Normally this is only needed to resolve an issue. Under normal scenarios it is advised to only upgrade in our tested bundles.
- It is acceptable for LBS or customers to update any security errata available from Red Hat Network (RHN). Only components checked and protected by ESS (for example, kernel, Network Manager, and systemd) must not be modified unless advised by IBM® service. For more information on applying security errata, see https://access.redhat.com/solutions/10021.
- Client node deployment is not supported from the ESS management node.
- You must deploy or add building blocks from an EMS with the same architecture. There must be a dedicated EMS for the architecture (PPC64LE).
- If running in a mixed architecture environment, the GUI and PM collector are recommended to run on the PPC64LE EMS node.
- Modifying any ESS nodes as a proxy server is not supported.
- Multiple building blocks are ideal as ESS now by default uses file system level metadata replication. If a single building block is used, by default gssgenvdisks uses one failure group and only IBM Spectrum Scale RAID level metadata replication.
- It is recommended to use the highest available block size when creating vdisks or NSDs. The default block size is 16M (current maximum). If the customer primarily generates many tiny files (metadata heavy), consider splitting metadata and data NSDs and using smaller block sizes.
- It is recommended that all nodes in a cluster run the same version of Mellanox OFED.
- Automatic EMS failover is not supported. For help in setting up a redundant, standby EMS, contact the L2 service.
- 4K MTU (InfiniBand) and 9000 MTU (Ethernet) are recommended. Changing to these MTU values requires associated switch-side changes.
- Stretch clusters are supported in various configurations. Contact development or service for guidance.
- If using a PPC64LE building block (8247), note that HMC is not used in that configuration. HMC is applicable for PPC64BE only.
- ConnectX-2 (ConnectX-EN) adapters are still supported by ESS.
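The stretch-cluster hostid requirement at the top of this list can be checked with a short pipeline. This is a sketch using sample data with hypothetical node names; on a real cluster, the input would come from mmdsh as shown earlier in this section.

```shell
# Sample mmdsh-style output (hypothetical node names and hostids).
# On a real cluster, replace this with:  mmdsh -N <NodeClass> hostid
sample_output="essio1: 007f0100
essio2: 11d3a6c2
essio3: 007f0100"

# Any hostid printed here appears on more than one node and must be
# regenerated with genhostid on all but one of those nodes.
duplicates=$(printf '%s\n' "$sample_output" | awk '{print $2}' | sort | uniq -d)

if [ -n "$duplicates" ]; then
    echo "non-unique hostid(s): $duplicates"
else
    echo "all hostids are unique"
fi
```

With the sample data above, the pipeline reports 007f0100 as non-unique, which is exactly the condition that must be fixed before creating recovery groups.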
Obtaining the required Red Hat Enterprise Linux and ESS code
- Red Hat Enterprise Linux 7.9 ISO:
  9893045dbb02ed9439bf571d24202935d37e74de74a07aa65a2827c3bc193335 rhels-7.9-server-ppc64le.iso
- Network Manager version: 1.18.8-2.el7_9
  5e3b54a031f0bd9f04c89ebfbfe722c797ba7175c9deeaa6bf523a8192ea51e2 netmgr_5376_LE.tgz
- Systemd version: 219-78.el7_9.5
  14f289572e7f7d35fd8452d4e6eff83918a8c260d9ad9b63d474aabe53b59883 systemd_5376_LE.tgz
- Kernel version: 3.10.0-1160.62.1
  af8eba6b1588e3dc1d2e1e424351fc102590f24a0fc7d396c9381eaf66f8d909 kernel_5376_LE.tgz
- Power 8 OPAL patch:
  ea9c602234f446f009009eaba9634f40750da4071523d0e2c59dc646a35a2766 opal-patch-le.tar.gz
On ESS 5.3.7.6 systems shipped from manufacturing, these items can be found on the management server node in the /home/deploy directory:
- ESS_DA_BASEIMAGE-5.3.7.6-ppc64le-Linux.tgz
- ESS_DM_BASEIMAGE-5.3.7.6-ppc64le-Linux.tgz
ESS 5.3.7.6 can also be downloaded from IBM Fix Central. Extract the base image, for example:
tar -zxvf ESS_DA_BASEIMAGE-5.3.7.6-ppc64le-Linux.tgz
From the BASEIMAGE tar file, the preceding command extracts files such as the following:
- ESS_5.3.7.6_ppc64le_Release_note_Data_Access.txt: This file contains the release notes for the latest code.
- gss_install-5.3.7.6_ppc64le_dataaccess_20220503T134712Z.tgz: This .tgz file contains the ESS code.
- gss_install-5.3.7.6_ppc64le_dataaccess_20220503T134712Z.sha256sum: This .sha256sum file is used to check the integrity of the .tgz file.
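Before extracting, the downloaded .tgz should be checked against the shipped .sha256sum file. The sketch below uses a scratch stand-in file so it is safe to run anywhere; substitute the real gss_install file names when verifying an actual download.

```shell
# Work in a scratch directory with a stand-in archive; in practice you
# would cd to the download directory and use the real gss_install files.
tmpdir=$(mktemp -d)
cd "$tmpdir"
echo "example payload" > archive.tgz
sha256sum archive.tgz > archive.tgz.sha256sum

# sha256sum -c re-hashes the file and compares it to the recorded sum;
# a corrupt or truncated download fails this check.
if sha256sum -c archive.tgz.sha256sum; then
    echo "checksum verified"
else
    echo "checksum mismatch - do not use this archive" >&2
fi
```

The ISO and the kernel, systemd, and Network Manager tarballs listed earlier can be verified the same way against the checksums printed in this section.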
Support for signed RPMs
ESS and IBM Spectrum Scale RPMs are signed by IBM. The public key file is shipped on the EMS node:
-rw-r-xr-x 1 root root 907 Dec 1 07:45 SpectrumScale_public_key.pgp
- Import the PGP key:
  rpm --import /opt/ibm/gss/tools/conf/SpectrumScale_public_key.pgp
- Verify an RPM:
  rpm -K RPMFile
Supported upgrade paths
- RHEL 7.5 -> RHEL 7.6 upgrade can be done in one hop
Mixed environment recommendations
- Nodes within a building block must be at the same levels.
- Nodes between building blocks should not be greater than N-2 (OFED 4.4 and OFED 4.6, for example).
ESS 3000 considerations (POWER8 EMS)
- If your system came racked with an EMS but no ESS 3000 (any supported legacy node and an EMS node are in the order): use the installation flow in this document (gsschenv).
- If your system came in any other configuration but without an ESS 3000: refer to the legacy deployment instructions.
- If your system comes in any configuration with an ESS 3000: refer to the IBM ESS 3000 Version 6.0.1.x documentation.
- Your EMS node must be at version 5.3.5 or later, with podman installed and a C10-T2 connection, to support ESS 3000.
- You do not have to upgrade to ESS 5.3.6 to support ESS 3000.
- If EMS + ESS 3000: the minimum configuration is an EMS at version 5.3.5 with podman, a C10-T2 connection, and IBM Spectrum Scale 5.0.5.1 (updated from the container).
- If EMS + ESS 3000 + ESS: it is advised to upgrade the EMS and ESS first to version 5.3.7.6. This is not a hard requirement, but the version must be 5.3.5 or later.
Security law changes
- New systems and switches shipped from manufacturing now have either an expired password or one set to the serial number of the component.
- You must take input from the customer before deployment starts and change the passwords as desired.
- The default root password for the OS is ibmesscluster. You are required to change it upon first login. This password must be set the same on each node.
- The default ASMI passwords (login, IPMI, HMC, and so on) are set to the serial number of the server. The IPMI password must be the same on each node.
- If the 1Gb Cumulus switch is shipped racked, the default password is the serial number (S11 number; the label is found on the back of the switch). If the switch is shipped unracked, the default password is CumulusLinux! and you are prompted to change it upon first login. If you have any issues logging in, or you need help setting up a VLAN with the switch, consult the switch documentation.
- You must set all required passwords before the deployment begins.
Support for hardware call home
Call home function | PPC64LE |
---|---|
Call home when disk needs to be replaced | Supported |
Enclosure call home | Unsupported |
Server call home | Supported |
Pre-installation or pre-upgrade checklist
- Obtain the kernel, systemd, Network Manager, RHEL ISO, and Power8 OPAL patch (provided by ESS development or L2 Service), and the ESS tarball (Fix Central). Verify that the checksums match what is listed in this document. Also, ensure that you have the correct architecture packages (PPC64LE). To obtain these items from the internal folder, you must be an IBM employee; business partners or customers cannot access this data directly.
- Ensure that you read all the information in the ESS Quick Deployment Guide. Make sure that you have the latest copy from IBM Documentation and that the version matches accordingly. You should also refer to the related ESS 5.3.7 documentation.
- Obtain the customer RHEL license.
- Contact the local SSR and ensure that all hardware checks have been completed. Make sure that all hardware found to have any issues has been replaced.
- If the 1Gb switch is not included in the order, contact the local network administrator to ensure that isolated xCAT and FSP VLANs are in place.
- Develop an inventory and a plan for how to upgrade, install, or tune the client nodes.
- Consider talking to the local network administrator about ESS switch best practices, especially the prospect of upgrading the high-speed switch firmware at some point before moving the system into production, or before an upgrade is complete. For more information, see Customer networking considerations.
- Review Elastic Storage Server: Command Reference.
- Review the ESS FAQ and ESS best practices.
- Review the ESS 5.3.7 known issues.
- Ensure that all client node levels are compatible with the ESS version. If needed, prepare to update the client node software on site, and possibly other items such as the kernel and the network firmware or driver.
- Power down the storage enclosures, or remove the SAS cables, until the gssdeploy -x operation is completed. Note: You would only use gssdeploy -x if the legacy installation sequence is used.
- If installing or upgrading protocol nodes, carefully review Elastic Storage Server: Protocol Nodes Quick Deployment Guide.
- Carefully study the network diagram for the architecture used. For more information, see ESS networking considerations and the 5148-22L protocol node diagrams in Elastic Storage Server: Protocol Nodes Quick Deployment Guide.
- It is recommended to use a larger block size with IBM Spectrum Scale 5.0.0 or later, even for small I/O tasks. Consult the documentation carefully.
- Ensure that the correct edition of ESS is to be deployed. For example, do not install the Data Management Edition if the Data Access Edition is on the order. This must be verified even before Plug-N-Play is attempted.
- Determine the supported high-speed switch bonding mode and, if InfiniBand, determine which MTU will be used. The default MTU is now 2048, but it can be changed to 4092.
- Consult the local network team to see whether a fabric diagnostic has taken place. The use of ibdiagnet is one way to debug an unhealthy network environment.
- Develop a plan to tune the client nodes. Deployment offers a template, but depending on the workload or application type, you might need to make many adjustments. Consult the IBM Spectrum Scale tuning guide.
- Discuss with the customer which password changes are required. Before starting, you must be prepared to set the desired passwords for the customer for the various components.
Post-installation or post-upgrade checklist
- Hardware and software call home have been set up and tested. If applicable, consider postponing the call home setup until the protocol nodes are deployed.
- The GUI has been set up and demonstrated to the customer. If applicable, consider postponing the GUI setup until the protocol nodes are deployed.
- GUI SNMP and SMTP alerts have been set up, if desired.
- The customer RHEL license is registered and active.
- No issues have been found with mmhealth, the GUI, gnrhealthcheck, gssinstallcheck, or serviceable events.
- Client nodes are properly tuned. For more information, see Adding IBM Spectrum Scale nodes to an ESS cluster.
- It is advised that you turn autoload on to enable GPFS to recover automatically in case of a daemon problem.
- Connect all nodes to Red Hat Network (RHN).
- Update any security related errata from RHN if the customer desires (yum -y update --security). Do not update any kernel, systemd, or Network Manager errata.
- Ensure that you have saved a copy of the xCAT database to a secure location.
- Install or upgrade the protocols. For more information, see Elastic Storage Server: Protocol Nodes Quick Deployment Guide.
- Ensure (if possible) that all network switches have had their firmware updated.
- The IBM Spectrum Scale release level and file system format have been updated, if applicable.
- If there is more than one building block, make sure that multiple failure groups are used in a file system and that metadata replication is turned on (-m 2 and -M 2). Do not exceed 5 failure groups or more than 2 metadata replicas.
- Upgrade the file system format and move the release level to LATEST, depending on IBM Spectrum Scale client node compatibility. Move the recovery group format to LATEST.
- Consider offering your clients the available trainings to better enable the proper administration of an IBM Spectrum Scale cluster and ESS storage.
- Take a CCR backup and save it to a secure location in case you need to restore the cluster. Consider implementing the mmsdrbackup user exit. For more information, see the mmsdrbackup user exit documentation.
- New deployments of ESS 5.3.7.6 are under mmvdisk management by default. After upgrading to ESS 5.3.7.6, you must convert the recovery groups to mmvdisk as well. Convert to mmvdisk if currently in the legacy mode.
- Ensure that the Mellanox performance script is run before the system is placed into production (post deployment or upgrade). This script is run automatically on new deployments. For upgrades, run the script after the OFED is installed.
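A first pass over the health-related items in this checklist can be scripted. The sketch below only reports whether the standard check tools are present on the node; it deliberately does not invoke them, and the tool list is an assumption drawn from this checklist.

```shell
# Report which of the usual ESS health-check commands are installed.
# Presence check only; run each tool by hand afterward (mmhealth, for
# example, is normally invoked as: mmhealth node show).
report=""
for cmd in gnrhealthcheck gssinstallcheck mmhealth; do
    if command -v "$cmd" >/dev/null 2>&1; then
        line="$cmd: available"
    else
        line="$cmd: not found on this node"
    fi
    echo "$line"
    report="$report$line
"
done
```

On an ESS I/O node, all three should report as available; any "not found" line indicates an incomplete installation that should be investigated before sign-off.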
Other topics
- Restoring a management server
- Part upgrades or replacements
- VLAN reconfiguration on the 1Gb switch
- Extending the 1Gb Cumulus management switch
- Stretch cluster considerations
Sample installation and upgrade flow
New installations go through manufacturing CSC. The system is fully installed with ESS 5.3.7.6 and tested, malfunctioning parts are replaced, and the required RHEL pieces are shipped in /home/deploy.
Installation
- SSR checkout complete
- LBS arrival on site
- Plug-n-Play mode demonstrated
- Decisions made on file system names and sizes, block size, host names, IP addresses, and so on
- Check high-speed switch settings and firmware
- Deploy EMS and building block
- Network bonds created
- Cluster created
- Recovery groups, NSDs, file system created
- Stress test performed
- Final checks performed
- GUI setup (w/SNMP alerts if desired)
- Call home setup
- Nodes attached to RHN and security updates applied
Upgrade
- Check high speed switch settings and firmware
- Ensure that there are no hardware issues
- Ensure client / protocol node compatibility
- Ensure no heavy I/O operations are being performed
- Upgrade ESS (rolling upgrade or with cluster down)
- Always ensure you have quorum (if rolling upgrade)
- Always carefully balance the recovery groups and scale management functions as you upgrade each node (if rolling upgrade)
- Move the release level and the file system format, if applicable. Move the recovery group format to LATEST.
. - Final checks are performed
- If applicable, upgrade the ESS protocol nodes
- Ensure that call home and GUI are still working as expected
- Use yum to apply any security related errata (yum -y update --security). Do not update any kernel, systemd, or Network Manager errata.