ESS software deployment preparation

Install an ESS software package and deploy the storage servers by using the following information. The goal is to create a cluster that allows client or protocol nodes to access the file systems.

Changes in this release

  • Support for IBM Storage Scale 5.2.0
  • ESS 3500 BMC update (12.68)
  • Mellanox OFED 24.01-0.3.3.5
  • Red Hat 8.8 (3500/3200/5000)
  • Red Hat 9.2 (Utility Node)
  • Security Updates
  • Mpt3sas driver 46.00.01.00 (Red Hat 8.8)
  • Bug fixes, Security fixes, general improvements
  • POWER 9 Firmware FW950.A0 (VL950_145)
Note: POWER8 or ESS Legacy systems are not supported in IBM Storage Scale System 6.2.x.x. If you have a POWER8 or ESS Legacy system in a cluster, remain on the LTS stream, that is, IBM Storage Scale System 6.1.9.x.

Support matrix

Release: ESS Utility Node
OS: Red Hat® Enterprise Linux 9.2 (PPC64LE)
Runs on: N/A
Can upgrade or deploy:
  • ESS 3500 nodes
  • ESS Utility Node
  • ESS Utility Protocol Node

Release: ESS 3500 6.2.0.1
OS: Red Hat Enterprise Linux® 8.8 (x86_64)
Runs on:
  • POWER9™ EMS
  • x86 EMS (BYOE)1 from 6.2.0
  • Utility node
Can upgrade or deploy:
  • ESS 3500 nodes
  • POWER9 EMS
  • POWER9 protocol nodes

Release: ESS 3200 6.2.0.1
OS: Red Hat Enterprise Linux 8.8 (x86_64)
Runs on:
  • POWER9 EMS
Can upgrade or deploy:
  • ESS 3200 nodes
  • POWER9 EMS
  • POWER9 protocol nodes

Release: ESS 3000 6.2.0.1
OS: Red Hat Enterprise Linux 8.8 (x86_64)
Runs on:
  • POWER9 EMS
Can upgrade or deploy:
  • ESS 3000 nodes
  • POWER9 EMS
  • POWER9 protocol nodes

Release: ESS 5000 6.2.0.1
OS: Red Hat Enterprise Linux 8.8 (PPC64LE)
Runs on:
  • POWER9 EMS
Can upgrade or deploy:
  • ESS 5000 nodes
  • POWER9 EMS
  • POWER9 protocol nodes

1 x86 EMS (BYOE) can only upgrade or deploy the ESS 3500 node(s) and the VM image itself.

Prerequisites

  • This document (ESS Software Quick Deployment Guide)
  • SSR completes physical hardware installation and code 20.
    • SSR uses Worldwide Customized Installation Instructions (WCII) for racking, cabling, and disk placement information.
    • SSR uses the respective ESS Hardware Guide (ESS 5000, ESS 3500, ESS Utility Node) for hardware checkout and setting IP addresses.
  • Worksheet notes from the SSR
  • Latest ESS tar.xz package downloaded to the EMS node from Fix Central (if a newer version is available).
    • Data Access Edition or Data Management Edition: must match the order. If the edition does not match your order, open a ticket with IBM® Service.
  • High-speed switch and cables have been run and configured.
  • Low-speed host names are ready to be defined based on the IP addresses that the SSR has configured.
  • High-speed host names (suffix of low speed) and IP addresses are ready to be defined.
  • Host and domain name (FQDN) are defined in the /etc/hosts file.
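For reference, a minimal /etc/hosts layout that satisfies these prerequisites might look as follows. All host names, IP addresses, and the test.net domain are examples only; they are not fixed values.

```
# Management (low-speed) network -- example addresses and names
192.168.45.20   ems1.test.net       ems1
192.168.45.21   essio1.test.net     essio1
192.168.45.22   essio2.test.net     essio2
# High-speed network -- same names with the -hs suffix
10.0.11.1       essio1-hs.test.net  essio1-hs
10.0.11.2       essio2-hs.test.net  essio2-hs
```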

What is in the /home/deploy directory on the EMS node?

Unified container (might not be the latest version; contains support for all platforms)

Support for signed RPMs

ESS or IBM Storage Scale RPMs are signed by IBM.

The PGP key is located in /opt/ibm/ess/tools/conf.
-rw-r-xr-x 1 root root 907 Dec 1 07:45 SpectrumScale_public_key.pgp
You can check whether an ESS or IBM Storage Scale RPM is signed by IBM as follows.
  1. Import the PGP key.
    rpm --import  /opt/ibm/ess/tools/conf/SpectrumScale_public_key.pgp
  2. Verify the RPM.
    rpm -K RPMFile
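To check a directory of packages in one pass, the two steps above can be wrapped in a small script. This is a sketch, not part of the ESS tooling; the directory path in the usage note is a placeholder, and because the `rpm -K` output format varies between RPM versions, the match on the OK verdict is a heuristic.

```shell
# Heuristic check of one line of `rpm -K` output, e.g.
# "gpfs.base.rpm: digests signatures OK" (newer rpm) or
# "gpfs.base.rpm: rsa sha1 (md5) pgp md5 OK" (older rpm).
check_rpm_signed() {
  case "$1" in
    *"signatures OK"*|*"pgp"*"OK"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Verify every RPM in a directory; report and fail on unsigned packages.
verify_dir() {
  dir="$1"; rc=0
  for f in "$dir"/*.rpm; do
    [ -e "$f" ] || continue
    if check_rpm_signed "$(rpm -K "$f" 2>/dev/null)"; then
      echo "SIGNED: $f"
    else
      echo "UNSIGNED or BAD: $f"
      rc=1
    fi
  done
  return $rc
}

# Usage (after importing the key as shown above; the directory is an example):
# rpm --import /opt/ibm/ess/tools/conf/SpectrumScale_public_key.pgp
# verify_dir /tmp/ess-rpms
```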

ESS networking requirements

In any scenario you must have an EMS node and a management switch. The management switch must be split into two VLANs.
  • Management VLAN
  • Service/FSP VLAN
    Note: To future-proof your environment for ESS 3200, modify any existing management switches to the new VLAN configuration. For more information, see Switch VLAN configuration instructions.
You also need a high-speed switch (IB or Ethernet) for cluster communication.

ESS 3000

POWER9 EMS

It is recommended to have a POWER9 EMS with ESS 3000. If you have a legacy environment (POWER8), you cannot update to 6.2.x.x.
  • If you are using an ESS 3000 with a POWER9 EMS:
    • C11-T1 must be connected on the EMS to the management VLAN.
    • Port 1 on each ESS 3000 canister must be connected to the management VLAN.
    • C11-T2 must be connected on the EMS to the FSP VLAN.
    • HMC1 must be connected on the EMS to the FSP VLAN.
Note: It is highly recommended that you connect C11-T3 to a campus connection or run an additional management connection. This is not a requirement but allows for a simple way to access the EMS and maintain a connection when the container starts.

ESS 5000 or ESS 3200

POWER9 EMS support only

EMS must have the following connections:
  • C11-T1 to the management VLAN
  • C11-T2 to the FSP VLAN
  • C11-T3 to the campus network (recommended)
  • HMC1 to the FSP VLAN
ESS 5000 nodes must have the following connections:
  • C11-T1 to the management VLAN
  • HMC1 to the FSP VLAN
ESS 3200 nodes must have the following connections:
  • Single management connection per canister:
    • Each connection is split between 2 MAC addresses:
      1. BMC
      2. Operating system
    • The BMC connection requires a VLAN tag to be set for proper communication with the EMS node.
ESS 3200 requirements
  • Management connections
    • Shared management port (visible to OS)
  • BMC connection
    • Shared management port (visible to BMC)
    • BMC traffic routed via VLAN 101
  • High-speed connections
    • InfiniBand or Ethernet
Management switch
  • Typically, a 48-port switch
  • Two VLANs required
    • Management VLAN (VLAN 102)
    • FSP/BMC VLAN (VLAN 101)
  • ESS 3200/ESS 3500 dedicated trunk ports
    • Routes BMC traffic to VLAN 101
Note: The VLANs shown here are default for the IBM Cumulus switch. The VLAN value can be modified according to your environment.
Figure 1. ESS 3200 container networking
Figure 2. ESS 3200 network diagram
Figure 3. ESS 3200 Ethernet ports and switch
The ports highlighted in green are the ESS 3200 trunk ports. These are special ports that are for the ESS 3200 only. The reason for these ports is that each ESS 3200 canister has a single interface for both the BMC and the OS but unique MAC addresses. By using a VLAN tag, canister BMC MAC addresses are routed to the BMC/FSP/Service VLAN (Default is 101).
IBM racked orders have the switch preconfigured; only the VLAN tag needs to be set. If you have an existing IBM Cumulus switch or a customer-supplied switch, it must be modified to accommodate the ESS 3200 trunk port requirement. For more information, see Switch VLAN configuration instructions.
Note: It is mandatory that you connect C11-T3 to a campus connection or run an additional management connection. If you do not do this step, you will lose the connection to the EMS node when the container starts.
ESS 3500 network requirements
Figure 4. ESS 3500 network
ESS Utility Node
Figure 5. ESS Utility Node network

Code version

All supported ESS nodes are available in two editions: Data Management Edition and Data Access Edition. Example package names are as follows:
ess_6.2.0.1_0604-21_dme_ppc64le.tar.xz
ess_6.2.0.1_0604-21_dae_ppc64le.tar.xz
ess_6.2.0.1_0604-21_dme_x86_64.tar.xz
ess_6.2.0.1_0604-21_dae_x86_64.tar.xz
Note:
  • The x86 packages run on ESS Utility Node EMS or BYOE.
  • The versions shown here might not be the GA version available on IBM Fix Central. It is recommended to go to IBM Fix Central and download the latest code.
  • ppc64le in the package name implies that each container runs on a POWER®-based EMS. For details about functions supported by respective containers, see Support matrix.
You can download the latest 6.2.x.x code (6.2.0.1 is the latest) from IBM Fix Central.
A unified container is offered in two editions (Data Management and Data Access). Example package names for each container are as follows:
Scale_System_DME_UNIFIED-6.2.0.1-x86_64-EMS.tgz
Scale_System_DAE_UNIFIED-6.2.0.1-x86_64-EMS.tgz
Scale_System_DME_UNIFIED-6.2.0.1-ppc64LE-EMS.tgz
Scale_System_DAE_UNIFIED-6.2.0.1-ppc64LE-EMS.tgz

Remote management considerations

Data center access has become more restrictive. Here are some considerations to enable remote support:
  • Consider adding campus connections to the HMC2 ports on all POWER servers (ESS 5000 or POWER9 EMS). Consider cabling this port to a public network and setting a campus IP. This allows remote recovery or debugging of the EMS in case of an outage.
  • Consider adding campus connections to C11-T3 (POWER9 nodes).
  • Consult with IBM Service about adding a USB-to-Ethernet dongle to enable campus connections on the ESS 3200 system.
  • Add a campus connection to a free port on each ESS 3000 canister. Also consider adding SMART PDUs on ESS 3000 frames to help remotely power cycle the system.
  • For the ESS Utility Node, campus connections are optional but recommended.

POWER9 considerations

  • Only a single instance of all management services is supported (that is, a single EMS).
  • It is recommended that all nodes in the storage cluster contain the same ESS release and IBM Storage Scale version.
  • It is recommended that you upgrade to the latest level before adding a building block.

Other notes

  • The following tasks must be complete before starting a new installation (tasks done by manufacturing and the SSR):
    • SSR has ensured all hardware is clean, and IP addresses are set and pinging over the proper networks (through the code 20 operation).
    • /etc/hosts is blank.
    • The ESS tgz file (for the correct edition) is in the /home/deploy directory. If upgrade is needed, download from Fix Central and replace.
    • Network bridges are cleared.
    • Images and containers are removed.
    • SSH keys are cleaned up and regenerated.
    • All code levels are at the latest at time of manufacturing ship.
  • Customer must make sure that the high-speed connections are cabled and the switch is ready before starting.
  • All node names and IP addresses in this document are examples.
  • The root password should be the same on each node, if possible. The default password is ibmesscluster. It is recommended to change the password after deployment is completed.
  • Each server's IPMI (x86) and ASMI passwords (POWER nodes) are set to the server serial number. Consider changing these passwords when the deployment is complete.
  • Check whether the SSSD service is running on the EMS and other nodes. Shut down the SSSD service on those nodes manually before you upgrade the nodes.
  • RHEL server nodes might be communicating with root DNS directly instead of being routed through an internal DNS. If this is not permitted in the environment, you can override the default service configuration or disable it. For more information about background and resolution options, see https://access.redhat.com/solutions/3553031.

ESS best practices

  • ESS 6.x.x.x uses a new embedded license. It is important to know that installation of any Red Hat packages outside of the deployment upgrade flow is not supported. The container image provides everything required for a successful ESS deployment. If additional packages are needed, contact IBM for possible inclusion in future versions.
  • For ESS 3000, consider enabling TRIM support. This is outlined in detail in IBM Storage Scale RAID Administration. By default, ESS 3000 only allocates 80% of available space. Consult with IBM development about whether going beyond 80% makes sense for your environment, that is, if you are not concerned about the performance implications of this change.
  • It is recommended to set up a campus or additional management connection before deploying the container.
  • You must run the essrun config load command against all the storage nodes (including EMS and protocol nodes) in the cluster before enabling admin mode central or deploying the protocol nodes by using the installation toolkit. For more information, see Deploying protocols.
  • If you are running a stretch cluster, you must ensure that each node has a unique hostid. The hostid might be non-unique if the same IP addresses and host names are being used on both sides of the stretch cluster. Run gnrhealthcheck before creating recovery groups when adding nodes in a stretch cluster environment. You can manually check the hostid on all nodes as follows:
    mmdsh -N { NodeClass | CommaSeparatedListofNodes } hostid

    If the hostid on any node is not unique, fix it by running genhostid. These steps must be done before creating a recovery group in a stretch cluster.
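    A sketch of this duplicate-hostid check follows. It assumes the mmdsh output format of "nodename: value" lines; the node names in the usage note are examples, and the helper itself is not part of the ESS tooling.

```shell
# Read "node: hostid" lines (as produced by `mmdsh -N <nodes> hostid`)
# from stdin and print each hostid value that appears more than once.
find_dup_hostids() {
  awk '{print $2}' | sort | uniq -d
}

# Usage (node names are examples):
# mmdsh -N essio1,essio2,essio3,essio4 hostid | find_dup_hostids
# For any node reporting a duplicate, regenerate and re-check:
# ssh essio3 genhostid && ssh essio3 hostid
```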

  • Consider placing your protocol nodes in file system maintenance mode before upgrades. This is not a requirement but you should strongly consider doing it. For more information, see File system maintenance mode.
  • Do not try to update the EMS node while you are logged in over the high-speed network. Update the EMS node only through a separate management or the campus connection (or virtual console).
  • After adding an I/O node to the cluster, run the gnrhealthcheck command to ensure that there are no issues before creating vdisk sets. For example, duplicate host IDs. Duplicate host IDs cause issues in the ESS environment.
  • Run the container from a direct SSH connection (or virtual console). Do not SSH from an I/O node or any node that might be rebooted by the container.
  • Do not log in and run the container over the high-speed network. It is recommended to log in through the campus connection, additional management connection, or virtual console.
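    One hedged way to catch this before starting the container is to inspect the SSH session itself. The sketch below assumes a login shell where $SSH_CONNECTION is set (format: "clientIP clientPort serverIP serverPort"); the 10.0.11. high-speed subnet prefix is an example, not a fixed value.

```shell
# Return success if the server-side IP of the SSH session ($1, in
# SSH_CONNECTION format) falls under the given subnet prefix ($2).
session_on_subnet() {
  server_ip=$(echo "$1" | awk '{print $3}')
  case "$server_ip" in
    "$2"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage before starting the container (example subnet prefix):
# if session_on_subnet "$SSH_CONNECTION" "10.0.11."; then
#   echo "WARNING: logged in over the high-speed network;"
#   echo "reconnect via the campus or management connection."
# fi
```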
  • You must stop IBM Storage Scale tracing (mmtrace | mmtracectl) before starting the container or deploying any node. The container attempts to block if tracing is detected, but it is recommended to manually inspect each ESS node before attempting to deploy.
  • Heavy IBM Storage Scale and I/O operations must be suspended before upgrading ESS.
    Wait for any of the following commands that are performing file system maintenance tasks to complete:
    • mmadddisk
    • mmapplypolicy
    • mmcheckquota
    • mmdeldisk
    • mmfsck
    • mmlssnapshot
    • mmrestorefs
    • mmrestripefile
    • mmrestripefs
    • mmrpldisk

    Stop the creation and deletion of snapshots by using the mmcrsnapshot and mmdelsnapshot commands during the upgrade.
  • High-speed node names must contain a suffix of the management network names. An example is as follows:
    • Management network: essio1.test.net
    • High-speed network: essio1-hs.test.net
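    The suffix convention can be expressed as a small helper that derives the expected high-speed name from a management name. This is an illustrative sketch only; the -hs suffix and test.net domain are the examples used in this guide, not fixed values.

```shell
# Derive the high-speed name from a management FQDN by inserting the
# given suffix before the domain part (or appending it to a short name).
hs_name() {
  # $1 = management name, $2 = suffix (e.g. "-hs")
  host="${1%%.*}"
  domain="${1#*.}"
  if [ "$domain" = "$1" ]; then
    echo "${host}$2"            # short name, no domain part
  else
    echo "${host}$2.${domain}"
  fi
}

# Usage:
# hs_name essio1.test.net -hs    # prints essio1-hs.test.net
```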
  • The use of ciphers (vs AUTHONLY) in an ESS environment is still under investigation. This is especially true for the campus connections that are required for the EMS/EMSVM. You must be aware of any security implications that might be of concern in a customer environment due to the ciphers that are currently being used, and of the added exposure over the campus connection.
  • When you are updating IBM Storage Scale CES on an ESS protocol node, use the installation toolkit to upgrade the IBM Storage Scale code, and then update the node from the deployment container (for items such as MOFED and the kernel). This practice ensures that IBM Storage Scale is compatible with the underlying operating system.
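As an illustration, the pre-upgrade quiesce check described above can be approximated with a small script. Matching command names in a process listing is a heuristic sketch, not an official ESS check; the way you gather the listing (for example, via mmdsh with an example node class) is an assumption.

```shell
# File system maintenance commands that must finish before an ESS upgrade.
MAINT_CMDS="mmadddisk|mmapplypolicy|mmcheckquota|mmdeldisk|mmfsck|mmlssnapshot|mmrestorefs|mmrestripefile|mmrestripefs|mmrpldisk"

# Count lines of a process listing ($1, one command line per line) that
# invoke one of the maintenance commands.
maint_cmds_running() {
  echo "$1" | grep -Ec "(^|/)($MAINT_CMDS)( |$)"
}

# Usage (node class is an example):
# listing=$(mmdsh -N gss_ppc64 'ps -eo args=' 2>/dev/null)
# if [ "$(maint_cmds_running "$listing")" != "0" ]; then
#   echo "Maintenance commands still running; wait before upgrading."
# fi
```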

Support notes and rules

  • You can mix IBM Storage Scale server levels in the same cluster, but this is not recommended. Best practice is to have all levels the same. This also includes the client cluster.
  • ESA must be installed and configured properly for call home to work. ESA runs on the POWER EMS.
  • If possible, run the client nodes in a separate cluster from the storage cluster.
  • The essrun (ESS deployment Ansible wrapper tool run within the container) tool does not use the GPFS admin network. It uses the management network only to communicate from the container to each of the nodes.
  • When you update ESS to 6.1.2.x for the first time, you must consider the implications of moving to MOFED 5.x. Review the following flash carefully for more information Mellanox OFED 5.x considerations in IBM ESS V6.1.2.x.
  • IBM Storage Fusion, IBM Storage Scale Container Native, and IBM Storage Scale CSI utilize the GUI rest-api server for provisioning of storage to container applications. Persistent Volume (PV) provisioning will halt when the ESS GUI is shut down and remain halted for the duration of the ESS upgrade, until the GUI is restarted. Ensure that the OpenShift and Kubernetes administrators are aware of this impact before proceeding.
  • For ESS 3500, you must keep 1.5 TB or more space free if future capacity MES is planned (performance to hybrid). Thus, it is recommended to not use all available space when you create a file system for the performance model. The default allocation is 80% of available space when you use the essrun filesystem command (for x86 nodes).
  • Disable proxy server connections during fresh deployment or upgrade operations from the ESS container.
  • If you are enabling SED, the recovery group must be enrolled (for SED) before creating any user vdisk.
  • IBM Storage Scale Object is not supported on ESS 6.1.5.x.
  • The protocol node deployment (adding a protocol node to a cluster, upgrading IBM Storage Scale RPMs, and so on) is handled by using the IBM Storage Scale installation toolkit. A protocol node is not deployed by using the ESS deployment container.
  • You can use DNS instead of /etc/hosts as long as the container can correctly resolve all the nodes during the deployment.
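Whichever of DNS or /etc/hosts is in use, a quick resolution sanity check can be run from the EMS or container before deployment. This is a sketch; the node names in the usage note are examples, and getent simply follows the system's standard resolver order.

```shell
# Return success if a name resolves through the standard resolver
# order (/etc/hosts, DNS, ... per nsswitch.conf).
resolvable() {
  getent hosts "$1" > /dev/null
}

# Check a list of node names, reporting each result; fail if any miss.
check_all() {
  rc=0
  for n in "$@"; do
    if resolvable "$n"; then
      echo "OK: $n"
    else
      echo "FAIL: cannot resolve $n"
      rc=1
    fi
  done
  return $rc
}

# Usage (node names are examples):
# check_all ems1 essio1 essio2 essio1-hs essio2-hs || echo "fix name resolution first"
```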

Stretch cluster considerations

Consider A and B sites of the same cluster.
  • All management networks (FSP/BMC/MGMT) and the high-speed network (InfiniBand/Ethernet) must be flat, single-namespace networks.
  • All nodes must have unique IP addresses and hostnames for the management networks and the high-speed network.
  • All nodes must be reachable to each other over all networks.
  • A common /etc/hosts file (or properly configured DNS) must be defined as per the best practices, and copied by using the deployment tools to all ESS nodes in the cluster.

Client nodes

Client nodes must be at MOFED 4.9.x or later and converted to verbs RDMA core libraries after the ESS cluster is moved to 6.1.2.x or later. Moving to the RDMA core libraries is especially important if verbs RDMA is in use in the storage cluster.


Upgrade guidance

Review the IBM Storage Scale System (https://www.ibm.com/support/pages/ibmsearch?tc=STHMCM&dc=D600&&sortby=desc) and IBM Storage Scale (https://www.ibm.com/support/pages/ibmsearch?tc=STXKQY&dc=D600&&sortby=desc) flashes and advisories before an upgrade to decide which version of IBM Storage Scale System to upgrade to.

Note:
  • Upgrades to ESS 6.1.2.x follow the N-2 rule. You can upgrade from ESS 6.1.2.x, 6.1.1.x (that is, 6.1.1.2), or 6.1.0.x.
  • Upgrades to ESS 6.1.5.x follow the N-3 rule. You can upgrade from 6.1.2.x, 6.1.3.x, and 6.1.4.x.
  • Starting with ESS 6.1.5.x, further jumps adhere to the N-3 rule.