IBM Software Hub online backup and restore to a different cluster with NetApp Trident protect

A Red Hat® OpenShift® Container Platform cluster administrator can create an online backup and restore it to a different cluster with NetApp Trident protect.

Before you begin

Do the following tasks before you back up and restore an IBM Software Hub deployment.

  1. Check whether the services that you are using support platform backup and restore by reviewing Services that support backup and restore. You can also run the following command:
    cpd-cli oadp service-registry check \
    --tenant-operator-namespace ${PROJECT_CPD_INST_OPERATORS} \
    --verbose \
    --log-level debug

    If a service is not supported, check whether an alternative backup method is available for that service.

  2. On the source cluster, install the software that is needed to back up and restore IBM Software Hub with NetApp Trident protect.

    For more information, see Installing backup and restore software.

  3. Check that your IBM Software Hub deployment meets the following requirements:
    • The minimum deployment profile of IBM Cloud Pak foundational services is Small.

      For more information about sizing IBM Cloud Pak foundational services, see Hardware requirements and recommendations for foundational services.

    • All services are installed at the same IBM Software Hub release.

      You cannot back up and restore a deployment that is running service versions from different IBM Software Hub releases.

    • The control plane is installed in a single project (namespace).
    • The IBM Software Hub instance is installed in zero or more tethered projects.
    • IBM Software Hub operators and the IBM Software Hub instance are in a good state.

Overview

Backing up an IBM Software Hub deployment and restoring it to a different cluster involves the following high-level steps:

  1. Preparing to back up IBM Software Hub
  2. Creating an online backup
  3. Preparing to restore IBM Software Hub
  4. Restoring IBM Software Hub
  5. Completing post-restore tasks

1. Preparing to back up IBM Software Hub

Complete the following prerequisite tasks before you create an online backup. Some tasks are service-specific, and need to be done only when those services are installed.

1.1 Creating environment variables

Create the following environment variables so that you can copy commands from the documentation and run them without making any changes.

Environment variable Description
OC_LOGIN Shortcut for the oc login command.
CPDM_OC_LOGIN Shortcut for the cpd-cli manage login-to-ocp command.
PROJECT_CPD_INST_OPERATORS The project where the IBM Software Hub instance operators are installed.
PROJECT_CPD_INST_OPERANDS The project where IBM Software Hub control plane and services are installed.
PROJECT_SCHEDULING_SERVICE The project where the scheduling service is installed.

This environment variable is needed only when the scheduling service is installed.

OADP_PROJECT The project where the OADP operator is installed.
BACKUP_NAME The name of your IBM Software Hub backup.
APPVAULT_NAME The name of the NetApp Trident protect AppVault custom resource.
RESTORE_NAME The name of your IBM Software Hub restore.
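For reference, the variables in the table can be exported in your shell before you run the commands in this procedure. Every value in the following sketch is an example placeholder; substitute your own cluster URL, credentials, and project and resource names:

```shell
# Example placeholder values only; replace each with your own names.
export OC_LOGIN="oc login <cluster-api-url> -u <username> -p <password>"
export CPDM_OC_LOGIN="cpd-cli manage login-to-ocp --server <cluster-api-url> -u <username> -p <password>"
export PROJECT_CPD_INST_OPERATORS=cpd-operators
export PROJECT_CPD_INST_OPERANDS=cpd-instance
export PROJECT_SCHEDULING_SERVICE=cpd-scheduler   # only when the scheduling service is installed
export OADP_PROJECT=oadp-operator
export BACKUP_NAME=cpd-tenant-backup-1
export APPVAULT_NAME=cpd-appvault
export RESTORE_NAME=cpd-tenant-restore-1
```

Setting the variables once means that the commands in the remaining sections can be copied and run without edits.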

1.2 Configuring NetApp Trident protect for IBM Software Hub

Configure NetApp Trident protect for IBM Software Hub by deploying and running the cpd-trident-protect.py script from a workstation that is connected to the NetApp Trident protect deployment.

  1. Log in to Red Hat OpenShift Container Platform as an instance administrator.
    ${OC_LOGIN}
    Remember: OC_LOGIN is an alias for the oc login command.
    1. Set up the Python virtual environment:
      chmod +x ./venv.sh
      ./venv.sh
    2. Activate the Python virtual environment:
      source venv/bin/activate
  2. Download and copy the cpd-trident-protect.py script to the current directory.
    Tip: This script is located in the GitHub repository.
  3. If you need to update an existing NetApp Trident protect application, you must uninstall the NetApp Trident protect application custom resource by running the following command:
    python cpd-trident-protect.py uninstall \
        --application_name=cpd-operators-tenant \
        --namespace=${PROJECT_CPD_INST_OPERATORS} \
        --trident_protect_operator_ns=trident-protect
  4. Install the NetApp Trident protect application and execution hook custom resources by running the following command:
    python cpd-trident-protect.py install \
        --appvault_name=${APPVAULT_NAME} \
        --application_name=cpd-operators-tenant \
        --trident_protect_operator_ns=trident-protect \
        --cpd_operator_ns=${PROJECT_CPD_INST_OPERATORS} \
        --cpdbr_tenant_service_image_prefix="icr.io/cpopen/cpd/cpdbr-oadp"
    1. Verify that the NetApp Trident protect application custom resource is in Ready state:
      tridentctl-protect get application -n ${PROJECT_CPD_INST_OPERATORS}
    2. Verify that the NetApp Trident protect execution hook custom resources were created:
      tridentctl-protect get exechook -n ${PROJECT_CPD_INST_OPERATORS}

1.3 Checking that the primary instance of every PostgreSQL cluster is in sync with its replicas

The replicas for Cloud Native PostgreSQL and EDB Postgres clusters occasionally get out of sync with the primary node. To check whether this problem exists and to fix the problem, see the troubleshooting topic PostgreSQL cluster replicas get out of sync.

1.4 Removing MongoDB-related ConfigMaps

If you upgraded from IBM Cloud Pak® for Data version 4.8.4 or older, some backup and restore ConfigMaps related to MongoDB might remain in the IBM Software Hub operand project (namespace), and must be removed. Ensure that these ConfigMaps do not exist in the operand project by running the following commands:
oc delete cm zen-cs-aux-br-cm -n ${PROJECT_CPD_INST_OPERANDS}
oc delete cm zen-cs-aux-ckpt-cm -n ${PROJECT_CPD_INST_OPERANDS}
oc delete cm zen-cs-aux-qu-cm -n ${PROJECT_CPD_INST_OPERANDS}
oc delete cm zen-cs2-aux-ckpt-cm -n ${PROJECT_CPD_INST_OPERANDS}
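The four deletions can also be scripted. The following sketch adds the oc delete --ignore-not-found flag so that the command succeeds even when a ConfigMap was already removed, and prints the commands for review before you run them:

```shell
# Example placeholder project; in practice this variable is already exported.
PROJECT_CPD_INST_OPERANDS=${PROJECT_CPD_INST_OPERANDS:-cpd-instance}

cmds=""
for cm in zen-cs-aux-br-cm zen-cs-aux-ckpt-cm zen-cs-aux-qu-cm zen-cs2-aux-ckpt-cm; do
  # --ignore-not-found makes the delete a no-op when the ConfigMap is already gone.
  cmds="${cmds}oc delete cm ${cm} -n ${PROJECT_CPD_INST_OPERANDS} --ignore-not-found=true
"
done

# Review the generated commands, then run them, for example: printf '%s' "$cmds" | sh
printf '%s' "$cmds"
```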

1.5 Preparing IBM Knowledge Catalog

If large metadata enrichment jobs are running when an online backup operation is triggered, the Db2 pre-backup hooks might fail because the database cannot be put into a write-suspended state. Schedule online backups for times when the enrichment workload is minimal.

1.6 Excluding external volumes from backups

You can exclude external Persistent Volume Claims (PVCs) in the IBM Software Hub instance project (namespace) from online backups.

You might want to exclude PVCs that were manually created in the IBM Software Hub project (namespace) but are not needed by services. These volumes might be too large for a backup, or they might already be backed up by other means.

Note: During restore, you might need to manually create excluded PVCs if pods fail to start because of an excluded PVC.
  1. Log in to Red Hat OpenShift Container Platform as a cluster administrator.
    ${OC_LOGIN}
    Remember: OC_LOGIN is an alias for the oc login command.
  2. Use the following command to label the PVCs that you want to exclude:
    oc label pvc <pvc-name> icpdsupport/ignore-on-nd-backup=true
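When several external PVCs must be excluded, the labeling can be done in a loop. The PVC names in the following sketch are hypothetical; replace them with the external PVCs in your project. The sketch prints the label commands for review rather than running them directly:

```shell
# Example placeholder project; in practice this variable is already exported.
PROJECT_CPD_INST_OPERANDS=${PROJECT_CPD_INST_OPERANDS:-cpd-instance}

# Hypothetical PVC names; replace with the manually created PVCs to exclude.
pvcs="external-archive-pvc external-scratch-pvc"

cmds=""
for pvc in $pvcs; do
  # The icpdsupport/ignore-on-nd-backup=true label excludes the PVC from online backups.
  cmds="${cmds}oc label pvc ${pvc} icpdsupport/ignore-on-nd-backup=true -n ${PROJECT_CPD_INST_OPERANDS}
"
done

# Review the generated commands, then run them, for example: printf '%s' "$cmds" | sh
printf '%s' "$cmds"
```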

1.7 Checking the status of installed services

Ensure that the status of all installed services is Completed. Do the following steps:
  1. Log the cpd-cli in to the Red Hat OpenShift Container Platform cluster:
    ${CPDM_OC_LOGIN}
    Remember: CPDM_OC_LOGIN is an alias for the cpd-cli manage login-to-ocp command.
  2. Run the following command to get the status of all services.
    cpd-cli manage get-cr-status \
    --cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS}

2. Creating an online backup

Create an online backup of an IBM Software Hub deployment by doing the following tasks.

Note: Backing up the scheduling service is not supported.

2.1 Backing up an IBM Software Hub instance

Create an online backup of each IBM Software Hub instance, or tenant, in your environment by completing the following steps.

Check the Known issues and limitations for IBM Software Hub page for any workarounds that you might need to do before you create a backup.

  1. Create a backup of the IBM Software Hub instance:
    python cpd-trident-protect.py backup create \
      --backup_name=${BACKUP_NAME} \
      --appvault_name=${APPVAULT_NAME} \
      --application_name=cpd-operators-tenant \
      --namespace=${PROJECT_CPD_INST_OPERATORS} \
      --trident_protect_operator_ns=trident-protect
  2. Run the following command to validate that the backup is complete. The --wait option indicates that the command waits for the backup process to complete. You can remove the --wait option to check the backup status at a single point in time:
    python cpd-trident-protect.py backup status \
      --backup_name=${BACKUP_NAME} \
      --namespace=${PROJECT_CPD_INST_OPERATORS} \
      --trident_protect_operator_ns=trident-protect \
      --wait

2.2 Optional: Deleting unused NetApp Trident protect backups

You can delete any on-demand backups that you created with NetApp Trident protect if you no longer need them. Deleting unused backups frees up space in object storage.

On-demand backups of IBM Software Hub have two parts:

  • A NetApp Trident protect volume backup
  • A hybrid Velero resource backup

Use the following commands to delete the entire NetApp Trident protect backup:

tridentctl-protect delete backup ${BACKUP_NAME} \
--namespace=${PROJECT_CPD_INST_OPERATORS} \
--tp-namespace=trident-protect
cpd-cli oadp tenant-backup delete ${BACKUP_NAME}

3. Preparing to restore IBM Software Hub to a different cluster

Complete the following prerequisite tasks before you restore an online backup. Some tasks are service-specific, and need to be done only when those services are installed.

3.1 Preparing the target cluster

Prepare the target cluster that you want to use to restore IBM Software Hub.

  1. Make sure that the target cluster meets the following requirements:
    • The target cluster has the same storage classes as the source cluster.
    • For environments that use a private container registry, such as air-gapped environments, the target cluster has the same image content source policy as the source cluster. For details on configuring the image content source policy, see Configuring an image content source policy for IBM Software Hub images.
    • The target cluster must be able to pull software images. For details, see Updating the global image pull secret for IBM Software Hub.
    • The deployment environment of the target cluster is the same as the source cluster.
      • The target cluster uses the same hardware architecture as the source cluster. For example, x86-64.
      • The target cluster is on the same OpenShift version as the source cluster.
      • The target cluster allows for the same node configuration as the source cluster. For example, if the source cluster uses a custom KubeletConfig, the target cluster must allow the same custom KubeletConfig.
      • Moving between IBM Cloud and non-IBM Cloud deployment environments is not supported.
  2. If you are using node labels as the method for identifying nodes in the cluster, re-create the labels on the target cluster.
    Best practice: Use node labels instead of node lists when you are restoring an IBM Software Hub deployment to a different cluster, especially if you plan to enforce node pinning. Node labels enable node pinning with minimal disruption. To learn more, see Passing node information to IBM Software Hub.
  3. Install NetApp Trident protect.
    Note: Install the same version that you installed on the source cluster.
  4. Install OADP.
    Note: Install the same OADP version that you installed on the source cluster. Follow the instructions in 4. Installing IBM Software Hub OADP backup and restore utility components, except for setting up object storage to store backups.
  5. Install or upgrade cpdbr service role-based access controls (RBACs).
    Notes:
    • Ensure that the same version of the cpdbr service is installed on the source and target clusters.
    • It is recommended that you install the latest version of the cpdbr service. If you previously installed the service, upgrade the service by doing the upgrade steps.

    • When you install the cpdbr service on the target cluster, a warning appears stating that the IBM Software Hub instance (tenant) operator project (namespace), ${PROJECT_CPD_INST_OPERATORS}, does not exist. You can ignore this warning.

    • When the cpdbr service is installed on a target cluster, only the required permissions and cluster role bindings are created, because the IBM Software Hub projects (namespaces) aren't yet restored.

    Install the cpdbr service RBACs on the target cluster
    1. Log in to Red Hat OpenShift Container Platform as a cluster administrator.
      ${OC_LOGIN}
      Remember: OC_LOGIN is an alias for the oc login command.
    2. Configure the client to set the OADP project:
      cpd-cli oadp client config set namespace=${OADP_PROJECT}
    3. In the IBM Software Hub instance operators project, install the cpdbr service.
      Note: Run the cpdbr installation command in the IBM Software Hub operators project even though the project does not yet exist in the target cluster. Do not manually create the project on the target cluster. The project is created during the IBM Software Hub restore process.
      Environments with the scheduling service
      cpd-cli oadp install \
      --component=cpdbr-tenant \
      --namespace=${OADP_PROJECT} \
      --tenant-operator-namespace=${PROJECT_CPD_INST_OPERATORS} \
      --cpd-scheduler-namespace=${PROJECT_SCHEDULING_SERVICE} \
      --log-level=debug \
      --rbac-only=true \
      --verbose
      Environments without the scheduling service
      cpd-cli oadp install \
      --component=cpdbr-tenant \
      --namespace=${OADP_PROJECT} \
      --tenant-operator-namespace=${PROJECT_CPD_INST_OPERATORS} \
      --log-level=debug \
      --rbac-only=true \
      --verbose
    4. Verify that the ClusterRole and ClusterRoleBinding were created:
      oc get clusterrole cpdbr-tenant-service-clusterrole
      oc get clusterrolebinding cpdbr-tenant-service-crb
      If the cluster role bindings were created successfully, these commands return output like the following examples:
      NAME                               CREATED AT
      cpdbr-tenant-service-clusterrole   <timestamp>
      NAME                       ROLE                                           AGE
      cpdbr-tenant-service-crb   ClusterRole/cpdbr-tenant-service-clusterrole   45h
    Upgrade the cpdbr service on the target cluster
    1. Log in to Red Hat OpenShift Container Platform as a cluster administrator.
      ${OC_LOGIN}
      Remember: OC_LOGIN is an alias for the oc login command.
    2. Upgrade the cpdbr service.
      Environments with the scheduling service
      cpd-cli oadp install \
      --upgrade=true \
      --component=cpdbr-tenant \
      --namespace=${OADP_PROJECT} \
      --tenant-operator-namespace=${PROJECT_CPD_INST_OPERATORS} \
      --cpd-scheduler-namespace=${PROJECT_SCHEDULING_SERVICE} \
      --log-level=debug \
      --rbac-only=true \
      --verbose
      Environments without the scheduling service
      cpd-cli oadp install \
      --upgrade=true \
      --component=cpdbr-tenant \
      --namespace=${OADP_PROJECT} \
      --tenant-operator-namespace=${PROJECT_CPD_INST_OPERATORS} \
      --log-level=debug \
      --rbac-only=true \
      --verbose
  6. Install Certificate manager and the IBM License Service.

    For details, see Installing shared cluster components for IBM Software Hub.

    Note: You must install the same version of Certificate manager and the IBM License Service that is installed on the source cluster.
  7. If your IBM Software Hub deployment includes the following services, install and set up prerequisite software.

    Instructions for installing prerequisite software are located in Installing prerequisite software.

    Prerequisite software Service
    GPU operators

    An asterisk (*) indicates that the service requires GPU in some situations.

    • IBM Knowledge Catalog Premium *
    • IBM Knowledge Catalog Standard *
    • Watson Machine Learning *
    • Watson Studio Runtimes *
    • watsonx.ai™
    • watsonx Assistant *
    • Watsonx BI
    • watsonx Code Assistant™
    • watsonx Code Assistant for Red Hat Ansible® Lightspeed
    • watsonx Code Assistant for Z
    • watsonx Code Assistant for Z Agentic 5.2.1 and later
    • watsonx Code Assistant for Z Code Explanation
    • watsonx Code Assistant for Z Code Generation 5.2.1 and later
    • watsonx.data™ *
    • watsonx.data Premium
    • watsonx.data intelligence
    • watsonx™ Orchestrate *
    Red Hat OpenShift AI

    An asterisk (*) indicates that the service requires Red Hat OpenShift AI in some situations.

    • IBM Knowledge Catalog Premium *
    • IBM Knowledge Catalog Standard *
    • watsonx.ai
    • watsonx Assistant *
    • Watsonx BI
    • watsonx Code Assistant
    • watsonx Code Assistant for Red Hat Ansible Lightspeed
    • watsonx Code Assistant for Z
    • watsonx Code Assistant for Z Agentic 5.2.1 and later
    • watsonx Code Assistant for Z Code Explanation
    • watsonx Code Assistant for Z Code Generation 5.2.1 and later
    • watsonx.data Premium
    • watsonx.data intelligence
    • watsonx Orchestrate *
    Multicloud Object Gateway
    • Watson Discovery
    • Watson Speech services
    • watsonx Assistant
    • watsonx Orchestrate
    Red Hat OpenShift Serverless Knative Eventing
    • watsonx Assistant
    • watsonx Orchestrate
    Warning: Do not create the secrets that the service needs to communicate with Multicloud Object Gateway. Instead, the secrets must be created as a post-restore task.

3.2 Cleaning up the target cluster after a previous restore

If you previously restored an IBM Software Hub backup or a previous restore attempt was unsuccessful, delete the IBM Software Hub instance projects (namespaces) in the target cluster before you try another restore.

Resources in the IBM Software Hub instance are watched and managed by operators and controllers that run in other projects. To prevent corruption or out-of-sync operators and resources when you delete an IBM Software Hub instance, you must first locate the Kubernetes resources that specify finalizers in their metadata and delete those finalizers. Only then can you delete the IBM Software Hub instance.

  1. Log in to Red Hat OpenShift Container Platform as an instance administrator.
    ${OC_LOGIN}
    Remember: OC_LOGIN is an alias for the oc login command.
  2. Download the cpd-pre-restore-cleanup.sh script from https://github.com/IBM/cpd-cli/tree/master/cpdops/5.2.0.
  3. If the tenant operator project exists and has the common-service NamespaceScope custom resource that identifies all the tenant projects, run the following command:
    ./cpd-pre-restore-cleanup.sh --tenant-operator-namespace="${PROJECT_CPD_INST_OPERATORS}"
  4. If the tenant operator project does not exist or specific IBM Software Hub projects need to be deleted, run the following command.

    If the common-service NamespaceScope custom resource is not available and additional projects, such as tethered projects, need to be deleted, modify the list of comma-separated projects in the --additional-namespaces option as necessary.

    ./cpd-pre-restore-cleanup.sh --additional-namespaces="${PROJECT_CPD_INST_OPERATORS},${PROJECT_CPD_INST_OPERANDS}"
  5. If the IBM Software Hub scheduling service was installed, uninstall it.

    For details, see Uninstalling the scheduling service.

4. Restoring IBM Software Hub to a different cluster

Restore an online backup to a different cluster by doing the following tasks.

4.1 Restoring the scheduling service

Restore for the scheduling service is not supported. Reinstall it on your target cluster.

4.2 Restoring an IBM Software Hub instance

Restore an IBM Software Hub instance by completing the following steps.

Check the Known issues and limitations for IBM Software Hub page for any workarounds that you might need to do before you restore a backup.

  1. Identify the PATH value of the NetApp Trident protect backup that you want to restore:
    tridentctl-protect get appvaultcontent cpd-appvault --show-paths -n trident-protect --app cpd-operators-tenant --show-resources backup
  2. Restore the IBM Software Hub instance. In the following script, add the PATH information from the previous step for the APP_ARCHIVE_PATH environment variable:
    APP_ARCHIVE_PATH="<PATH value from the previous step>"
    
    python cpd-trident-protect.py restore create \
      --restore_name=${RESTORE_NAME} \
      --namespace=${PROJECT_CPD_INST_OPERATORS} \
      --path=${APP_ARCHIVE_PATH} \
      --namespace_mappings="${PROJECT_CPD_INST_OPERATORS}:${PROJECT_CPD_INST_OPERATORS},${PROJECT_CPD_INST_OPERANDS}:${PROJECT_CPD_INST_OPERANDS}" \
      --appvault_name=${APPVAULT_NAME} \
      --trident_protect_operator_ns=trident-protect \
      --oadp_namespace=${OADP_PROJECT}
    Note: Because of a limitation in NetApp Trident protect, the --namespace_mappings field is required. Each namespace must be specified and mapped to the original value. The IBM Software Hub integration does not support restoring to different namespaces.
  3. Run the following command to validate that the restore is complete. The --wait option indicates that the command waits for the restore process to complete. You can remove the --wait option to check the restore status at a single point in time:
    python cpd-trident-protect.py restore status \
      --restore_name=${RESTORE_NAME} \
      --namespace=${PROJECT_CPD_INST_OPERATORS} \
      --trident_protect_operator_ns=trident-protect \
      --wait
Best practice: If your IBM Software Hub deployment has services that connect to an external database, and you followed the recommendation to back up the database at the same time that you back up IBM Software Hub, restore the database backup that was taken at the same time as the IBM Software Hub backup.

5. Completing post-restore tasks

Complete additional tasks for the control plane and for some services after you restore an IBM Software Hub deployment.

5.1 Passing node information and applying cluster HTTP proxy settings or other RSI patches to the control plane

If you use node lists to pin pods to nodes, you must re-run the cpd-cli manage apply-entitlement command after you restore IBM Software Hub on the target cluster. Any pods that need to be rescheduled will be unavailable while they are moved to different nodes. For more information, see Passing node information to IBM Software Hub.

If you applied cluster HTTP proxy settings or other RSI patches to an IBM Software Hub instance in the source cluster, the evictor cronjob runs every 30 minutes to patch pods that did not get patched. Optionally, you can run the following command to apply the patches:
cpd-cli manage apply-rsi-patches --cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS} -vvv

5.2 Patching Cognos Analytics instances

If a Db2 OLTP database within the cluster is used for a Cognos Analytics content store or audit database, the Cognos Analytics service instance must be patched. Because the Db2 database host and port might be different in the target cluster, update these values in the Cognos Analytics service instance to the correct values to ensure that the instance starts successfully. Do the following steps:
  1. Patch the content store and audit database ports in the Cognos Analytics service instance by running the following script:
    #!/usr/bin/env bash
    #-----------------------------------------------------------------------------
    #Licensed Materials - Property of IBM
    #IBM Cognos Products: ca
    #(C) Copyright IBM Corp. 2024
    #US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule
    #-----------------------------------------------------------------------------
    set -e
    #set -x
    
    function usage {
        echo "$0: usage: $0 [-h] -t tethered_namespace -a audit_db_port_number -c cs_db_port_number [-v]"
    }
    
    function help {
        usage
        echo "-h prints help to the console"
        echo "-t tethered namespace (required)"
        echo "-a Audit DB port number"
        echo "-c CS DB port number"
        echo "-v turn on verbose mode"
        echo ""
        exit 0
    }
    
    while getopts ":ht:a:c:v" opt; do
        case ${opt} in
            h)
                help
                ;;
            t)
                tethered_namespace=$OPTARG
                ;;
            a)
                audit_db_port_number=$OPTARG
                ;;
            c)
                cs_db_port_number=$OPTARG
                ;;
            v)
                verbose_flag="true"
                ;;
            ?)
                usage
                exit 0
                ;;
        esac
    done
    
    if [[ -z ${tethered_namespace} ]]; then
        echo "A tethered namespace must be provided"
        help
    fi
    
    echo "Get CAServiceInstance Name"
    cr_name=$(oc -n ${tethered_namespace} get caserviceinstance --no-headers -o custom-columns=NAME:.metadata.name)
    if [[ -z ${cr_name} ]]; then
        echo "Unable to find CAServiceInstance CR for namespace: ${tethered_namespace}"
        help
    fi
    
    if [[ ! -z ${cs_db_port_number} ]]; then
        echo "Updating CS Database Port Number in the Custom Resource ${cr_name}..."
        oc patch caserviceinstance ${cr_name} --type merge -p "{\"spec\":{\"cs\":{\"database_port\":\"${cs_db_port_number}\"}}}" -n ${tethered_namespace}
    fi
    
    if [[ ! -z ${audit_db_port_number} ]]; then
        echo "Updating Audit Database Port Number in the Custom Resource ${cr_name}..."
        oc patch caserviceinstance ${cr_name} --type merge -p "{\"spec\":{\"audit\":{\"database_port\":\"${audit_db_port_number}\" }}}" -n ${tethered_namespace}
    fi
    
    sleep 20
    check_status="Completed"
  2. Check the status of the Cognos Analytics reconcile action:
    for i in {1..240};do
    caStatus=$(oc get caserviceinstance ${cr_name} -o jsonpath="{.status.caStatus}" -n ${tethered_namespace})
    
    if [[ ${caStatus} == ${check_status} ]];then
        echo "ca ${check_status} Successfully"
        break
    elif [[ ${caStatus} == "Failed" ]];then
        echo "ca ${caStatus}!"
        exit 1
    fi
    echo "ca Status: ${caStatus}"
    sleep 30
    
    done
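Assuming the script in step 1 is saved under a filename of your choice and made executable, an invocation for a tethered project might look like the following sketch. The script filename, project name, and port numbers are all example placeholders; use the Db2 ports that apply on your target cluster:

```shell
SCRIPT=./patch-ca-db-ports.sh        # hypothetical filename for the script above
TETHERED_NS=cpd-cognos               # example tethered project (namespace)
CS_DB_PORT=50000                     # example content store Db2 port on the target cluster
AUDIT_DB_PORT=50001                  # example audit Db2 port on the target cluster

# Compose and review the invocation, then run it against the cluster.
CMD="$SCRIPT -t $TETHERED_NS -c $CS_DB_PORT -a $AUDIT_DB_PORT -v"
echo "$CMD"
```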

5.3 Patching Watson OpenScale database

If a Db2 or Db2 Warehouse database in the cluster is used as the data mart database in Watson OpenScale, the port must be updated. Because the Db2 database port might be different after restoring, update the port in the Watson OpenScale instance database to the correct value. You can update the port through the Watson OpenScale user interface or API.

5.4 Resetting status of ongoing Watson OpenScale model evaluations

When a Watson OpenScale instance is restored, some of its features, such as scheduled or on-demand model evaluations, might not function properly. You must reset the status of ongoing model evaluations. For details, see Resetting status of ongoing Watson OpenScale model evaluations.

5.5 Restarting IBM Knowledge Catalog metadata import jobs

After IBM Software Hub is restored, long running metadata import jobs might not resume. The job run status might still be Running, even though the actual import job isn't running. The job must be canceled and manually restarted. You can cancel and restart a job in IBM Knowledge Catalog or by using an API call.
Cancel and restart a job in IBM Knowledge Catalog
  1. Go to a Jobs page, either the general one or the one for the project that contains the metadata import asset.
  2. Look for the job and cancel it.
  3. Restart the job.
Cancel and restart a job by using an API call
Note: You must have the Admin role to use this API call.
POST /v2/metadata_imports/recover_task

The request payload must look like the following example. For recovery_date, specify the date when IBM Knowledge Catalog was restored from the backup image. Any jobs that were started before the specified date are restarted automatically.

{
  "recovery_date": "2022-05-05T01:00:00Z",
  "pending_type": "running"
}
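A request with this payload can be built as in the following sketch. The host and bearer token in the commented curl command are placeholders, and the recovery date is the example from above; the curl line is shown for illustration only:

```shell
# Build the recovery payload; the date is the example value from the documentation.
cat > /tmp/recover_task.json <<'EOF'
{
  "recovery_date": "2022-05-05T01:00:00Z",
  "pending_type": "running"
}
EOF

# Hypothetical host and token placeholders; you must have the Admin role.
# curl -k -X POST "https://<cpd-route>/v2/metadata_imports/recover_task" \
#   -H "Authorization: Bearer <token>" \
#   -H "Content-Type: application/json" \
#   -d @/tmp/recover_task.json

# Review the payload before sending the request.
cat /tmp/recover_task.json
```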

5.6 Restarting IBM Knowledge Catalog metadata enrichment jobs

After IBM Software Hub is restored, running metadata enrichment jobs might not complete successfully. Such jobs must be manually restarted.

To restart a metadata enrichment job, do the following steps:
  1. In IBM Knowledge Catalog, open the project that contains the metadata enrichment asset.
  2. Select the asset.
  3. Click the button to start or delete an enrichment job on the asset, and then click Enrich to start a new enrichment job.

5.7 Rerunning IBM Knowledge Catalog lineage data import jobs

If a lineage data import job is running at the same time that an online backup is taken, the job is in a Complete state when the backup is restored. However, users cannot see lineage data in the catalog. Rerun the lineage import job.

5.8 Restarting IBM Knowledge Catalog lineage pods

After a restore, restart the following lineage pods so that you can access lineage data from the knowledge graph:
  • wkc-data-lineage-service-xxx
  • wdp-kg-ingestion-service-xxx
Do the following steps:
  1. Log in to Red Hat OpenShift Container Platform as a cluster administrator:
    ${OC_LOGIN}
    Remember: OC_LOGIN is an alias for the oc login command.
  2. Restart the wkc-data-lineage-service-xxx pod:
    oc delete -n ${PROJECT_CPD_INST_OPERANDS} "$(oc get pods -o name -n ${PROJECT_CPD_INST_OPERANDS} | grep wkc-data-lineage-service)"
  3. Restart the wdp-kg-ingestion-service-xxx pod:
    oc delete -n ${PROJECT_CPD_INST_OPERANDS} "$(oc get pods -o name -n ${PROJECT_CPD_INST_OPERANDS} | grep wdp-kg-ingestion-service)"
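The two restarts can also be combined in a single loop. The following sketch prints the delete commands for review instead of running them directly; deleting each pod causes its controller to re-create it:

```shell
# Example placeholder project; in practice this variable is already exported.
PROJECT_CPD_INST_OPERANDS=${PROJECT_CPD_INST_OPERANDS:-cpd-instance}

cmds=""
for svc in wkc-data-lineage-service wdp-kg-ingestion-service; do
  # Each generated command deletes the matching pod so that it restarts.
  cmds="${cmds}oc delete -n ${PROJECT_CPD_INST_OPERANDS} \"\$(oc get pods -o name -n ${PROJECT_CPD_INST_OPERANDS} | grep ${svc})\"
"
done

# Review the generated commands, then run them, for example: printf '%s' "$cmds" | sh
printf '%s' "$cmds"
```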

5.9 Retraining existing watsonx Assistant skills and creating secrets to connect to Multicloud Object Gateway

After you restore the watsonx Assistant backup, you must retrain the existing skills. To trigger training, modify a skill. The training process for a skill typically takes less than 10 minutes. For more information, see the Retraining your backend model section in the IBM Cloud documentation.

In addition, create the secrets that watsonx Assistant uses to connect to Multicloud Object Gateway. Do the following steps:
  1. Log the cpd-cli in to the Red Hat OpenShift Container Platform cluster:
    ${CPDM_OC_LOGIN}
    Remember: CPDM_OC_LOGIN is an alias for the cpd-cli manage login-to-ocp command.
  2. Get the names of the secrets that contain the NooBaa account credentials and certificate:
    oc get secrets --namespace=openshift-storage
  3. Set the following environment variables based on the names of the secrets on your cluster.
    1. Set NOOBAA_ACCOUNT_CREDENTIALS_SECRET to the name of the secret that contains the NooBaa account credentials. The default name is noobaa-admin.

      If you created multiple backing stores, ensure that you specify the credentials for the appropriate backing store.

      export NOOBAA_ACCOUNT_CREDENTIALS_SECRET=<secret-name>
    2. Set NOOBAA_ACCOUNT_CERTIFICATE_SECRET to the name of the secret that contains the NooBaa account certificate. The default name is noobaa-s3-serving-cert.
      export NOOBAA_ACCOUNT_CERTIFICATE_SECRET=<secret-name>
  4. Create the secrets that watsonx Assistant uses to connect to Multicloud Object Gateway.
    1. Run the setup-mcg command to create the secrets:
      cpd-cli manage setup-mcg \
      --components=watson_assistant \
      --cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS} \
      --noobaa_account_secret=${NOOBAA_ACCOUNT_CREDENTIALS_SECRET} \
      --noobaa_cert_secret=${NOOBAA_ACCOUNT_CERTIFICATE_SECRET}
    2. Wait for the cpd-cli to return the following message before proceeding to the next step:
      [SUCCESS] ... setup-mcg completed successfully.
    3. Confirm that the secrets were created in the operands project for the instance:
      oc get secrets --namespace=${PROJECT_CPD_INST_OPERANDS} \
      noobaa-account-watson-assistant \
      noobaa-cert-watson-assistant \
      noobaa-uri-watson-assistant
    4. If the command returns Error from server (NotFound), re-run the setup-mcg command.
  5. If present, delete the following resources that connect to Multicloud Object Gateway.
    Tip: After these resources are deleted, they are recreated with the updated object store secrets.
    1. Set the instance name environment variable to the name that you want to use for the service instance.
      export INSTANCE=<Watson_Assistant_Instance_Name>
    2. If they are present, delete the following resources:
      oc delete job $INSTANCE-create-bucket-store-cos-job
      oc delete secret registry-$INSTANCE-clu-training-$INSTANCE-dwf-training
      oc delete job $INSTANCE-clu-training-update
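The conditional cleanup in step 5 can be sketched as a small helper that deletes each resource only when it exists, so the step can be re-run safely. This is a sketch, not part of the documented procedure; it assumes INSTANCE and PROJECT_CPD_INST_OPERANDS are already exported.

```shell
# Sketch: delete a resource only if it exists in the instance project,
# so repeated runs do not fail on already-deleted resources.
delete_if_present() {
  kind="$1"; name="$2"
  if oc get "$kind" "$name" -n "${PROJECT_CPD_INST_OPERANDS}" >/dev/null 2>&1; then
    oc delete "$kind" "$name" -n "${PROJECT_CPD_INST_OPERANDS}"
  else
    echo "skipped: $kind/$name not found"
  fi
}
```

For example, `delete_if_present job "$INSTANCE-create-bucket-store-cos-job"` deletes the job when it exists and prints a skip message otherwise.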

5.10 Creating secrets to connect Watson Discovery to Multicloud Object Gateway

Create the secrets that Watson Discovery uses to connect to Multicloud Object Gateway.
  1. Log the cpd-cli in to the Red Hat OpenShift Container Platform cluster:
    ${CPDM_OC_LOGIN}
    Remember: CPDM_OC_LOGIN is an alias for the cpd-cli manage login-to-ocp command.
  2. Get the names of the secrets that contain the NooBaa account credentials and certificate:
    oc get secrets --namespace=openshift-storage
  3. Set the following environment variables based on the names of the secrets on your cluster.
    1. Set NOOBAA_ACCOUNT_CREDENTIALS_SECRET to the name of the secret that contains the NooBaa account credentials. The default name is noobaa-admin.

      If you created multiple backing stores, ensure that you specify the credentials for the appropriate backing store.

      export NOOBAA_ACCOUNT_CREDENTIALS_SECRET=<secret-name>
    2. Set NOOBAA_ACCOUNT_CERTIFICATE_SECRET to the name of the secret that contains the NooBaa account certificate. The default name is noobaa-s3-serving-cert.
      export NOOBAA_ACCOUNT_CERTIFICATE_SECRET=<secret-name>
  4. Create the secrets that Watson Discovery uses to connect to Multicloud Object Gateway.
    1. Run the setup-mcg command to create the secrets:
      cpd-cli manage setup-mcg \
      --components=watson_discovery \
      --cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS} \
      --noobaa_account_secret=${NOOBAA_ACCOUNT_CREDENTIALS_SECRET} \
      --noobaa_cert_secret=${NOOBAA_ACCOUNT_CERTIFICATE_SECRET}
    2. Wait for the cpd-cli to return the following message before proceeding to the next step:
      [SUCCESS] ... setup-mcg completed successfully.
    3. Confirm that the secrets were created in the operands project for the instance:
      oc get secrets --namespace=${PROJECT_CPD_INST_OPERANDS} \
      noobaa-account-watson-discovery
    4. If the command returns Error from server (NotFound), re-run the setup-mcg command.
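Instead of re-running the verification by hand when it returns NotFound, the check in steps 4.3 and 4.4 can be sketched as a short retry loop. This is a sketch; the retry count and sleep interval are arbitrary choices.

```shell
# Sketch: poll for a secret in the instance project, retrying a few
# times before giving up. Returns 0 when the secret is found, 1 otherwise.
wait_for_secret() {
  secret="$1"; attempts="${2:-5}"; i=1
  while [ "$i" -le "$attempts" ]; do
    if oc get secret "$secret" -n "${PROJECT_CPD_INST_OPERANDS}" >/dev/null 2>&1; then
      echo "found: $secret"
      return 0
    fi
    echo "attempt $i: $secret not found yet"
    sleep 10  # give setup-mcg time to create the secret
    i=$((i + 1))
  done
  return 1
}
```

For example, run `wait_for_secret noobaa-account-watson-discovery` after setup-mcg completes, and re-run the setup-mcg command only if the helper returns a nonzero status.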

5.11 Creating secrets to connect watsonx Orchestrate to Multicloud Object Gateway

Create the secrets that watsonx Orchestrate uses to connect to Multicloud Object Gateway.
  1. Log the cpd-cli in to the Red Hat OpenShift Container Platform cluster:
    ${CPDM_OC_LOGIN}
    Remember: CPDM_OC_LOGIN is an alias for the cpd-cli manage login-to-ocp command.
  2. Get the names of the secrets that contain the NooBaa account credentials and certificate:
    oc get secrets --namespace=openshift-storage
  3. Set the following environment variables based on the names of the secrets on your cluster.
    1. Set NOOBAA_ACCOUNT_CREDENTIALS_SECRET to the name of the secret that contains the NooBaa account credentials. The default name is noobaa-admin.

      If you created multiple backing stores, ensure that you specify the credentials for the appropriate backing store.

      export NOOBAA_ACCOUNT_CREDENTIALS_SECRET=<secret-name>
    2. Set NOOBAA_ACCOUNT_CERTIFICATE_SECRET to the name of the secret that contains the NooBaa account certificate. The default name is noobaa-s3-serving-cert.
      export NOOBAA_ACCOUNT_CERTIFICATE_SECRET=<secret-name>
  4. Create the secrets that watsonx Orchestrate uses to connect to Multicloud Object Gateway.
    1. Run the setup-mcg command to create the secrets:
      cpd-cli manage setup-mcg \
      --components=watsonx_orchestrate \
      --cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS} \
      --noobaa_account_secret=${NOOBAA_ACCOUNT_CREDENTIALS_SECRET} \
      --noobaa_cert_secret=${NOOBAA_ACCOUNT_CERTIFICATE_SECRET}
    2. Wait for the cpd-cli to return the following message before proceeding to the next step:
      [SUCCESS] ... setup-mcg completed successfully.
    3. Confirm that the secrets were created in the operands project for the instance:
      oc get secrets --namespace=${PROJECT_CPD_INST_OPERANDS} \
      noobaa-account-watson-orchestrate
    4. If the command returns Error from server (NotFound), re-run the setup-mcg command.
  5. If you have not already done so, create the secrets that connect watsonx Assistant to Multicloud Object Gateway.
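Because watsonx Orchestrate also depends on the watsonx Assistant secrets, it can help to verify all of the expected secrets in one pass. A minimal sketch, using the secret names from the verification steps in this section and in the watsonx Assistant section:

```shell
# Sketch: report which of the given secrets are missing from the
# instance project, then print a one-line summary.
check_mcg_secrets() {
  missing=0
  for secret in "$@"; do
    if ! oc get secret "$secret" -n "${PROJECT_CPD_INST_OPERANDS}" >/dev/null 2>&1; then
      echo "missing: $secret"
      missing=1
    fi
  done
  if [ "$missing" -eq 0 ]; then
    echo "all secrets present"
  else
    echo "re-run setup-mcg for the missing components"
  fi
}
```

For example: `check_mcg_secrets noobaa-account-watson-orchestrate noobaa-account-watson-assistant noobaa-cert-watson-assistant noobaa-uri-watson-assistant`.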

5.12 Creating secrets to connect Watson Speech services to Multicloud Object Gateway

Some Watson Speech services pods might be in an Error state because they cannot connect to Multicloud Object Gateway.

Create the secrets that Watson Speech services uses to connect to Multicloud Object Gateway.
  1. Log the cpd-cli in to the Red Hat OpenShift Container Platform cluster:
    ${CPDM_OC_LOGIN}
    Remember: CPDM_OC_LOGIN is an alias for the cpd-cli manage login-to-ocp command.
  2. Get the names of the secrets that contain the NooBaa account credentials and certificate:
    oc get secrets --namespace=openshift-storage
  3. Set the following environment variables based on the names of the secrets on your cluster.
    1. Set NOOBAA_ACCOUNT_CREDENTIALS_SECRET to the name of the secret that contains the NooBaa account credentials. The default name is noobaa-admin.

      If you created multiple backing stores, ensure that you specify the credentials for the appropriate backing store.

      export NOOBAA_ACCOUNT_CREDENTIALS_SECRET=<secret-name>
    2. Set NOOBAA_ACCOUNT_CERTIFICATE_SECRET to the name of the secret that contains the NooBaa account certificate. The default name is noobaa-s3-serving-cert.
      export NOOBAA_ACCOUNT_CERTIFICATE_SECRET=<secret-name>
  4. Create the secrets that the Watson Speech services use to connect to Multicloud Object Gateway.
    1. Run the setup-mcg command to create the secrets:
      cpd-cli manage setup-mcg \
      --components=watson_speech \
      --cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS} \
      --noobaa_account_secret=${NOOBAA_ACCOUNT_CREDENTIALS_SECRET} \
      --noobaa_cert_secret=${NOOBAA_ACCOUNT_CERTIFICATE_SECRET}
    2. Wait for the cpd-cli to return the following message before proceeding to the next step:
      [SUCCESS] ... setup-mcg completed successfully.
    3. Confirm that the secrets were created in the operands project for the instance:
      oc get secrets --namespace=${PROJECT_CPD_INST_OPERANDS} \
      noobaa-account-watson-speech
    4. If the command returns Error from server (NotFound), re-run the setup-mcg command.
To enable the upload models and voices job pods to run again with the updated secrets, delete them:
oc delete po -n ${PROJECT_CPD_INST_OPERANDS} $(oc get po -l 'app.kubernetes.io/component in (stt-models, tts-voices)' -n ${PROJECT_CPD_INST_OPERANDS} -o name | grep ${CUSTOM_RESOURCE_SPEECH})
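To confirm that the recreated pods come up cleanly, a helper like the following can count the remaining pods in an Error state. This is a sketch, assuming the same label selector and the CUSTOM_RESOURCE_SPEECH variable used in the command above.

```shell
# Sketch: count stt-models and tts-voices pods for this Speech instance
# that are currently in an Error state; 0 means all pods recovered.
count_error_pods() {
  oc get po -l 'app.kubernetes.io/component in (stt-models, tts-voices)' \
    -n "${PROJECT_CPD_INST_OPERANDS}" --no-headers 2>/dev/null \
    | grep "${CUSTOM_RESOURCE_SPEECH}" | grep -c 'Error'
}
```

Note that `grep -c` prints 0 (and returns a nonzero status) when no pods are in Error state, which is the desired end state.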

5.13 Restoring services that do not support online backup and restore

The following list shows the services that do not support online backup and restore. If any of these services are installed in your IBM Software Hub deployment, you must take action after an online backup is restored to make them functional.
Data Gate
Data Gate synchronizes Db2 for z/OS data in real time. After IBM Software Hub is restored, data might be out of sync with Db2 for z/OS. Re-add tables after IBM Software Hub foundational services are restored.
MANTA Automated Data Lineage
The service is functional and data can be re-imported. For information about importing data, see Managing existing metadata imports (IBM Knowledge Catalog).
MongoDB
The service must be deleted and reinstalled. Recreate the instance as a new instance, and then restore the data with MongoDB tools. For more information, see Installing the MongoDB service and Back Up and Restore with MongoDB Tools.
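As an illustration of the last step, a typical dump-and-restore with MongoDB tools might look like the following sketch. The hosts, credentials, and archive path are placeholders for your environment, not values from this procedure.

```shell
# Sketch: dump the source MongoDB instance to a compressed archive, then
# restore it into the newly created instance. All connection details are
# placeholders.
backup_and_restore_mongodb() {
  mongodump \
    --uri="mongodb://<user>:<password>@<source-host>:<port>" \
    --archive=mongodb-backup.archive --gzip
  mongorestore \
    --uri="mongodb://<user>:<password>@<new-instance-host>:<port>" \
    --archive=mongodb-backup.archive --gzip --drop
}
```

The `--drop` option replaces existing collections in the target instance so that restored data is not merged with stale data.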