Storage considerations

To install IBM® Cloud Pak for Data, you must have a supported file storage system on your Red Hat® OpenShift® cluster.

Storage providers

Cloud Pak for Data supports and is optimized for several types of persistent storage:

Red Hat OpenShift Container Storage
Version: 4.6 or later fixes
Available in the IBM Storage Suite for IBM Cloud® Paks.
IBM Spectrum® Fusion
Version: 2.1.2 or later fixes
IBM Spectrum Scale Container Native storage
IBM Spectrum Scale Container Native Storage Access Version: 5.1.1.3 or later fixes
Container Storage Interface Version: 2.3.0 or later fixes
Available in the IBM Storage Suite for IBM Cloud Paks.
Network File System (NFS)
Version: 4
Portworx
Version: 2.7.0 or later fixes
IBM Cloud File Storage
Version: Not applicable
Tip: The preceding storage options have been evaluated by IBM. You can run the Cloud Pak for Data storage validation tool to assess storage that is provided by other vendors. However, this tool does not guarantee support for other types of storage. You can use other storage environments at your own risk.
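
The version minimums in the preceding list can be checked programmatically before installation. The following Python sketch is illustrative only: the helper names are hypothetical, and it assumes that "X or later fixes" means the same major.minor release as X at an equal or higher fix level. That reading is an assumption, not a statement of IBM support policy.

```python
# Minimum versions copied from the storage provider list above.
# NFS (version 4) and IBM Cloud File Storage (not applicable) are omitted.
MINIMUM_VERSIONS = {
    "Red Hat OpenShift Container Storage": "4.6",
    "IBM Spectrum Fusion": "2.1.2",
    "IBM Spectrum Scale Container Native Storage Access": "5.1.1.3",
    "IBM Spectrum Scale Container Storage Interface": "2.3.0",
    "Portworx": "2.7.0",
}


def parse_version(text):
    """Turn a dotted version string such as '2.7.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def pad(version, width):
    """Right-pad a version tuple with zeros so tuples of unequal length compare cleanly."""
    return version + (0,) * (width - len(version))


def meets_minimum(provider, installed):
    """Return True if `installed` satisfies the listed minimum for `provider`.

    ASSUMPTION: 'or later fixes' is read as 'same major.minor release,
    equal or higher fix level'. Adjust if your support terms differ.
    """
    minimum = parse_version(MINIMUM_VERSIONS[provider])
    candidate = parse_version(installed)
    same_release = candidate[:2] == minimum[:2]
    width = max(len(minimum), len(candidate))
    return same_release and pad(candidate, width) >= pad(minimum, width)
```

For example, `meets_minimum("Portworx", "2.7.1")` passes under this reading, while `meets_minimum("Portworx", "2.6.3")` does not.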

Storage comparison

The following table can help you decide which storage solution is right for you.

As you plan your system, remember that not all services support all types of storage. For complete information on the storage types supported by each service, see Storage requirements.

If the services that you want to install don't support the same type of storage, you can have a mixture of different storage types on your cluster.

Columns, in order: OpenShift Container Storage, IBM Spectrum, NFS, Portworx, IBM Cloud File Storage. Each row lists one detail for these storage types.
Deployment environments
  • On-premises deployments on VMware or bare metal

    For details, see the OpenShift Container Storage Infrastructure requirements.

  • Amazon Web Services

    Self-managed Red Hat OpenShift only.

  • Microsoft Azure

    Self-managed Red Hat OpenShift only.

  • Google Cloud

    Self-managed Red Hat OpenShift only.

  • On-premises deployments
  • IBM Cloud

    Self-managed Red Hat OpenShift only.

  • Microsoft Azure locally redundant Premium SSD

    Self-managed Red Hat OpenShift only.

  • Google Cloud Cloud Filestore

    Self-managed Red Hat OpenShift only.

  • On-premises deployments
  • IBM Cloud

    Managed Red Hat OpenShift

  • Amazon Web Services

    Self-managed Red Hat OpenShift only.

  • Microsoft Azure

    Self-managed Red Hat OpenShift only.

  • Google Cloud

    Self-managed Red Hat OpenShift only.

Red Hat OpenShift 4.8
Supported for all storage types.
Red Hat OpenShift 4.6
Supported for all storage types except IBM Spectrum.
x86-64
Supported for all storage types.
POWER®
Supported for NFS only, and only on Red Hat OpenShift 4.6. Not supported for the other storage types.
IBM Z®
Supported for OpenShift Container Storage and NFS. Not supported for IBM Spectrum, Portworx, or IBM Cloud File Storage.
License requirements
OpenShift Container Storage
Cloud Pak for Data customers are entitled to use Red Hat OpenShift Data Foundation (previously Red Hat OpenShift Container Storage), fully supported by IBM in production environments (Level 1 and Level 2), with the following limitations:
  • You can use up to 12 TB of Red Hat OpenShift Data Foundation Advanced storage.
  • You can use Red Hat OpenShift Data Foundation Advanced storage for up to 36 months.

If you exceed these terms, you must purchase a separate license. For more information, see Red Hat OpenShift Data Foundation overview.

IBM Spectrum Fusion
A separate license is required. For more information on IBM Spectrum Fusion, see the IBM Spectrum Fusion documentation.
IBM Spectrum Scale Container Native
Cloud Pak for Data customers are entitled to use IBM Spectrum Scale Container Native.

You can use up to 12 TB of IBM Spectrum Scale Container Native, fully supported by IBM in production environments (Level 1 and Level 2), for up to 36 months.

If you exceed these terms, a separate license is required. For details, see the IBM Storage Suite for IBM Cloud Paks documentation.

NFS
No license required.

Portworx
A separate license is required. For details, see Portworx Enterprise.

IBM Cloud File Storage
No separate license required.

For details about the amount of storage you can use, see How many volumes can be ordered.

Storage classes
OpenShift Container Storage
The required storage classes are automatically created when you install OpenShift Container Storage.

Cloud Pak for Data uses the following storage classes:

  • ocs-storagecluster-cephfs
  • ocs-storagecluster-ceph-rbd

IBM Spectrum
ibm-spectrum-scale-sc, with the permissions value in the StorageClass set to "777" (permissions: "777").

For more details, see Setting up IBM Spectrum storage.

NFS
NFS storage classes are user-defined.

Use a storage class with ReadWriteMany (RWX) access.

Portworx
The required storage classes are listed in Creating Portworx storage classes.

You can run the provided script to create the storage classes.

IBM Cloud File Storage
ibmc-file-gold-gid
Data replication for high availability
OpenShift Container Storage
Supported.

By default, all services use multiple replicas for high availability. OpenShift Container Storage maintains each replica in a distinct availability zone.

IBM Spectrum
Supported.

Replication can be enabled within the Spectrum Scale Storage Cluster in a variety of ways. For details, see Data Mirroring and Replication in the IBM Spectrum Scale documentation.

NFS
Replication support depends on your NFS server.

Portworx
Supported.

By default, most services use a storage class that supports 3 replicas.

For details about the replicas for each storage class, see Creating Portworx storage classes.

For details about the storage classes required for each service, see Storage requirements.

IBM Cloud File Storage
Supported, but not enabled by default.

You can enable replication from the IBM Cloud console. For details, see Replicating data.

Backup and restore
OpenShift Container Storage
Container Storage Interface support for snapshots and clones.

Tight integration with Velero CSI plugin for Red Hat OpenShift Container Platform backup and recovery.

IBM Spectrum
IBM Spectrum Protect Plus is not supported for application-consistent backup and restore.

For storage level backup:

IBM Spectrum Fusion
See Back up and restore in the IBM Spectrum Fusion documentation.
IBM Spectrum Scale Container Native
Use the IBM Spectrum Scale Container Storage Interface Volume snapshot as the primary backup and restore method and combine it with Container Backup Support provided by IBM Spectrum Protect Plus.

Additionally, there are multiple methods that you can use to back up the Spectrum Scale Storage Cluster. For details, see Data protection and disaster recovery in the IBM Spectrum Scale documentation.

NFS
Limited support.

Portworx
On-premises
Limited support.

For details, see Backing up the Cloud Pak for Data file system on Portworx.

IBM Cloud
Supported with the Portworx Enterprise Disaster Recovery plan.

IBM Cloud File Storage
Supported, but not enabled by default.

For details, see Backing up and restoring data.

Encryption of data at rest
OpenShift Container Storage
Supported.

OpenShift Container Storage uses Linux Unified Key Setup (LUKS) version 2 based encryption with a key size of 512 bits and the aes-xts-plain64 cipher.

You must enable encryption for your whole cluster during cluster deployment to ensure encryption of data at rest. Encryption is disabled by default. Working with encrypted data incurs a small performance penalty.

Support for FIPS cryptography
If you store all data in volumes that use RHEL-provided disk encryption and enable FIPS mode for your cluster, both data at rest and data in motion (network data) are protected by FIPS Validated Modules in Process encryption. You can configure your cluster to encrypt the root filesystem of each node, as described in Customizing nodes.

IBM Spectrum
Supported.

For details, see Encryption in the IBM Spectrum Scale documentation.

NFS
Check with your storage vendor on the steps to enable encryption of data at rest.

Portworx
Supported with Portworx Enterprise for IBM only.

Portworx uses the LUKS format of dm-crypt and AES-256 as the cipher with xts-plain64 as the cipher mode.

On-premises deployments
Refer to Enabling Portworx volume encryption in the Portworx documentation.
IBM Cloud deployments
To protect the data in your Portworx volumes, encrypt the volumes with IBM Key Protect or Hyper Protect Crypto Services.

IBM Cloud File Storage
Supported.
Network requirements
OpenShift Container Storage
Your network must support a minimum of 10 Gbps.

IBM Spectrum
You must have sufficient network performance to meet the storage I/O requirements.

NFS
You must have sufficient network performance to meet the storage I/O requirements.

Portworx
Your network must support a minimum of 10 Gbps.

For details, see Prerequisites.

IBM Cloud File Storage
You must have sufficient network performance to meet the storage I/O requirements.

For details, see Network connection.

I/O requirements
OpenShift Container Storage
Each node must have at least one enterprise-grade SSD or NVMe device that meets the Disk requirements in the system requirements.

For more information, see Planning your deployment.

If SSD or NVMe devices aren't supported in your deployment environment, use an equivalent or better device.

IBM Spectrum
For details, see Disk requirements in the system requirements.

NFS
For details, see Disk requirements in the system requirements.

Portworx
For details, see Disk requirements in the system requirements.

For details on performance, see FIO performance in the Portworx documentation.

IBM Cloud File Storage
For details, see Disk requirements in the system requirements.

The default I/O settings are typically lower than the minimums specified in the Disk requirements section.

To improve the I/O performance for production environments, you must adjust the I/O settings. Contact IBM Software Support for guidance on how to adjust the settings according to Changing the size and IOPS of your existing storage device.

Minimum amount of storage
OpenShift Container Storage
A minimum of three nodes.

On each node, you must have at least one SSD or NVMe device. Each device should have at least 1 TB of available storage.

For details, see Storage device requirements.

IBM Spectrum
1 TB or more of available space

NFS
1 TB or more of available space

Portworx
A minimum of three storage nodes.
On each storage node, you must have:
  • A minimum of 1 TB of raw, unformatted disk
  • An additional 100 GB of raw, unformatted disk for a key-value database

IBM Cloud File Storage
500 GB or more

Storage is not automatically expanded and is created in smaller chunks.

Increasing the size of the volumes improves I/O performance for production environments. Contact IBM Software Support as indicated in the preceding row.

Minimum amount of vCPU
OpenShift Container Storage
  • 10 vCPU per node on three initial nodes.
  • 2 vCPU per node on any additional nodes.

For details, see Resource requirements.

IBM Spectrum
8 vCPU on each worker node to deploy IBM Spectrum Scale Container Native and IBM Spectrum Scale Container Storage Interface Driver.
IBM Spectrum Fusion
See the IBM Spectrum Scale Container Native hardware requirements.
IBM Spectrum Scale Container Native
See the IBM Spectrum Scale Container Native requirements.

NFS
8 vCPU on the NFS server

Portworx
On-premises
4 vCPU on each storage node
IBM Cloud
For details, see the following sections of Storing data on software-defined storage (SDS) with Portworx:
  • What worker node flavor in Red Hat OpenShift on IBM Cloud is the right one for Portworx?
  • What if I want to run Portworx in a classic cluster with non-SDS worker nodes?

IBM Cloud File Storage
Not applicable for managed services.
Minimum amount of memory
OpenShift Container Storage
  • 24 GB of RAM on initial three nodes.
  • 5 GB of RAM on any additional nodes.

For details, see Resource requirements.

IBM Spectrum
16 GB of RAM on each worker node to deploy IBM Spectrum Scale Container Native and IBM Spectrum Scale Container Storage Interface Driver.
IBM Spectrum Fusion
See the IBM Spectrum Scale Container Native hardware requirements.
IBM Spectrum Scale Container Native
See the IBM Spectrum Scale Container Native requirements.

NFS
32 GB of RAM on the NFS server

Portworx
4 GB of RAM on each storage node

IBM Cloud File Storage
Not applicable for managed services.
Installation documentation
OpenShift Container Storage
Product documentation for Red Hat OpenShift Container Storage 4.5

IBM Spectrum Fusion
See the IBM Spectrum Fusion Installation documentation.
IBM Spectrum Scale Container Native
See the IBM Spectrum Scale Container Native Installation documentation.

NFS
Kubernetes NFS-Client Provisioner

Portworx
Install Portworx Enterprise for IBM on OpenShift

IBM Cloud File Storage
Installed by default when you install managed Red Hat OpenShift on IBM Cloud. For details, see Storing data on classic IBM Cloud File Storage.
Troubleshooting documentation
OpenShift Container Storage
Product documentation for Troubleshooting OpenShift Container Storage 4.5

IBM Spectrum Fusion
See the IBM Spectrum Fusion Troubleshooting documentation.
IBM Spectrum Scale
See the IBM Spectrum Scale Container Native documentation.
IBM Spectrum Scale Container Storage Interface
See the IBM Spectrum Scale Container Storage Interface documentation.

NFS
Refer to the documentation from your NFS provider.

Portworx
Troubleshoot Portworx on Kubernetes

IBM Cloud File Storage
Troubleshooting persistent storage
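
The minimum vCPU and memory figures that the comparison lists for OpenShift Container Storage (10 vCPU and 24 GB of RAM on the three initial nodes, 2 vCPU and 5 GB of RAM on any additional nodes) lend themselves to a quick sizing calculation. The following Python sketch is a rough aid, not an official sizing tool. The function name is hypothetical, and it assumes that the memory figures are per node, in parallel with the per-node vCPU figures.

```python
# Figures copied from the comparison above for OpenShift Container Storage.
INITIAL_NODES = 3
VCPU_PER_INITIAL_NODE = 10
VCPU_PER_ADDITIONAL_NODE = 2
RAM_GB_PER_INITIAL_NODE = 24      # ASSUMPTION: read as a per-node value
RAM_GB_PER_ADDITIONAL_NODE = 5    # ASSUMPTION: read as a per-node value


def ocs_minimum_resources(node_count):
    """Return the total minimum vCPU and RAM (GB) for a given number of storage nodes."""
    if node_count < INITIAL_NODES:
        raise ValueError("OpenShift Container Storage needs a minimum of three nodes")
    additional = node_count - INITIAL_NODES
    return {
        "vcpu": INITIAL_NODES * VCPU_PER_INITIAL_NODE
        + additional * VCPU_PER_ADDITIONAL_NODE,
        "ram_gb": INITIAL_NODES * RAM_GB_PER_INITIAL_NODE
        + additional * RAM_GB_PER_ADDITIONAL_NODE,
    }


print(ocs_minimum_resources(3))  # {'vcpu': 30, 'ram_gb': 72}
print(ocs_minimum_resources(5))  # {'vcpu': 34, 'ram_gb': 82}
```

These totals cover the storage layer only. The resources that Cloud Pak for Data services need are described in Storage requirements and the system requirements.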

Storage configuration and provisioning

Cloud Pak for Data supports dynamic storage provisioning. A Red Hat OpenShift cluster administrator must properly configure the storage before Cloud Pak for Data is installed. The person who installs Cloud Pak for Data and the services on the cluster must know which storage classes to use during installation.

For guidance on how to configure your storage for use with Cloud Pak for Data, see Setting up shared persistent storage.
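
Because the installer must know which storage classes to use, it can help to keep the names from the storage comparison in one place. The following Python sketch is one possible way to do that. The dictionary and function names are hypothetical, and only storage types with fixed class names in this topic are included: NFS storage classes are user-defined, and the Portworx classes are listed in Creating Portworx storage classes.

```python
# Storage class names copied from the storage comparison in this topic.
STORAGE_CLASSES = {
    "OpenShift Container Storage": [
        "ocs-storagecluster-cephfs",
        "ocs-storagecluster-ceph-rbd",
    ],
    "IBM Spectrum": ["ibm-spectrum-scale-sc"],
    "IBM Cloud File Storage": ["ibmc-file-gold-gid"],
}


def storage_classes_for(storage_type):
    """Return the storage class names that this topic lists for a storage type."""
    try:
        return STORAGE_CLASSES[storage_type]
    except KeyError:
        # NFS and Portworx land here: their class names are not fixed in this topic.
        raise LookupError(
            f"No fixed storage class names are listed for {storage_type!r}; "
            "check the setup documentation for that storage type."
        ) from None


print(storage_classes_for("OpenShift Container Storage"))
```

Raising an error for NFS and Portworx, rather than returning an empty list, makes a missing storage class decision visible before installation starts.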

Requirements

For information about the minimum amount of storage that is required for your environment, see Storage requirements.

Important: Work with your IBM Sales representative to ensure that you have sufficient storage for the services that you plan to run on Cloud Pak for Data and for your expected workload.
There might be additional requirements based on the type of storage that you plan to use or your environment:
  • If you are using Portworx storage, your Red Hat OpenShift Container Platform cluster must use the CRI-O container runtime.
  • If you are running the Prometheus Cluster Monitoring stack on IBM Cloud, you might notice that pods consume more local storage. You can reduce the retention periods of your logs or you can configure logs to be saved in persistent storage instead of local storage. For more information, see Configuring the monitoring stack. To troubleshoot issues, see Worker nodes show status of disk pressure.
Best practice: Run the Cloud Pak for Data storage validation tool on your Red Hat OpenShift cluster to evaluate whether the storage on your cluster is sufficient for use with Cloud Pak for Data.

As mentioned in Storage providers, you can use this tool to assess storage that is provided by other vendors; however, this tool does not guarantee support for storage that has not been evaluated by IBM. You can use other storage environments at your own risk.

Recommended disks

For optimal performance, the following storage disks are recommended:
On-premises deployments
  • SSD drives
  • NVMe drives
Amazon Web Services deployments
  • GP2 disks
  • IO1 disks or better

For details, see Amazon EBS volume types.

Microsoft Azure deployments
Ultra disks or better