Backup & Restore service issues
Use this troubleshooting information to resolve installation and upgrade problems that are related to the Backup & Restore service.
- Installation and upgrade in CrashLoopBackOff error state
- Backup of virtual machines created on Red Hat® OpenShift® Container Platform 4.13 fail after upgrade to 4.14
- Backup & Restore server installation throws an Invalid Input error
- Backup & Restore hub service in a custom namespace
- Backup & Restore stuck at 5% during upgrade
- MongoDB pod crashes with CrashLoopBackOff status
- Storage-operator pod crashes with CrashLoopBackOff status
- Pods in CrashLoopBackOff state after upgrade
- Backup & Restore service goes into unknown state
- Installation and upgrade in CrashLoopBackOff error state
- When you install or upgrade to IBM Storage Fusion 2.7.2, the application-controller and transaction-manager pods in the ibm-backup-restore namespace might get stuck in CrashLoopBackOff status. This can occur when your cluster uses kubeconfig. To confirm whether you are affected by this issue, check the application-controller and transaction-manager pod logs. A sample error is as follows:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x98 in position 1: invalid start byte
As a resolution, apply the hotfix:
- Make a note of the image location. Replace the sample image locations with the actual hotfix image location.
- Update the transaction manager:
oc set image deployment/transaction-manager transaction-manager=cp.icr.io/cp/fbr/guardian-transaction-manager:2.7.2-15294285-33585 -n ibm-backup-restore
- Update the application controller:
oc set image deployment/application-controller application-controller=cp.icr.io/cp/fbr/guardian-transaction-manager:2.7.2-15294285-33585 -n ibm-backup-restore
Note: The two pods use the same image, so the value for the image is identical for both commands.
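Before applying the hotfix, it can help to confirm the symptom by scanning the logs of both deployments for the decode error. The following is a minimal sketch and not part of the product procedure: the function names has_decode_error and check_deployments are hypothetical, and the sketch assumes that oc is logged in to the cluster and that the deployments carry the names used in the oc set image commands.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the log text on stdin
# contains the UnicodeDecodeError reported in this section.
has_decode_error() {
  grep -q "UnicodeDecodeError"
}

# Scan the logs of both affected deployments (assumes a logged-in oc session).
check_deployments() {
  for deploy in application-controller transaction-manager; do
    # Read the logs first so that a failed oc call is not mistaken for a clean log.
    if ! logs=$(oc logs "deployment/${deploy}" -n ibm-backup-restore 2>&1); then
      echo "${deploy}: could not read logs"
      continue
    fi
    if printf '%s\n' "$logs" | has_decode_error; then
      echo "${deploy}: UnicodeDecodeError found"
    else
      echo "${deploy}: no UnicodeDecodeError in the current logs"
    fi
  done
}
```

Call check_deployments after sourcing the file. Because the pods restart repeatedly, the current container might not have logged the error yet, so a clean result is not conclusive.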
- Backup of virtual machines created on Red Hat OpenShift Container Platform 4.13 fail after upgrade to 4.14
- Backups of virtual machines that were created on Red Hat OpenShift Container Platform version 4.13 fail after the cluster is upgraded to OpenShift Container Platform version 4.14 and OpenShift Virtualization is upgraded to 4.14. The backup job details show either of the following errors:
- Failed transferring data
- Virtual machine snapshots failed
Cause:
The backups fail because the virtual machine snapshots time out after 30 minutes.
Resolution:
Stop and start the virtual machine and retry the backup.
- Backup & Restore server installation throws an Invalid Input error
- If the Backup & Restore service deployment fails with the following error in the IBM Storage Fusion user interface, retry the installation in a private browser window.
Install Error: Invalid Input: Both the api Server and bootstrapToken fields need to be populated
- Backup & Restore hub service in a custom namespace
- In general, the Backup & Restore service is installed or upgraded in the default ibm-backup-restore namespace. To avoid issues during the installation or upgrade of the service with a custom namespace, do the resolution steps.
Resolution
- Go to .
- Open the ibm-backup-restore-service CR and navigate to the YAML tab.
- In spec.onboarding.parameters, search for the parameter with the name namespace and change the defaultValue to the custom namespace where the service must be installed.
Example:
parameters:
  - dataType: string
    name: namespace
    defaultValue: <custom-namespace>
    userInterface: false
    required: true
    descriptionCode: BMYSRV00003
    displayNameCode: BMYSRV00004
- Store the following YAML in server-fsi.yaml, and replace <custom-namespace> with the namespace where the service must be installed.
apiVersion: service.isf.ibm.com/v1
kind: FusionServiceInstance
metadata:
  name: ibm-backup-restore-service-instance
  namespace: ibm-spectrum-fusion-ns
spec:
  parameters:
    - name: doInstall
      provided: false
      value: 'true'
    - name: namespace
      provided: false
      value: <custom-namespace>
    - name: storageClass
      provided: true
      value: lvms-lmvg
  serviceDefinition: ibm-backup-restore-service
  triggerUpdate: false
  enabled: true
  doInstall: true
- Run the following command to apply the changes:
oc apply -f server-fsi.yaml
- For the upgrade procedure, upgrade the IBM Storage Fusion operator, repeat step 1 of the resolution, and then upgrade the service.
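To avoid editing the placeholder by hand, the FusionServiceInstance YAML from this section can be generated from a shell function. This is a sketch under stated assumptions: the function name render_server_fsi is hypothetical, the YAML nesting is reconstructed from the flattened example in this section, and the storageClass value is the sample value and might differ on your cluster.

```shell
#!/bin/sh
# Hypothetical helper: print the FusionServiceInstance YAML with
# <custom-namespace> replaced by the first argument.
render_server_fsi() {
  custom_ns="$1"
  if [ -z "$custom_ns" ]; then
    echo "usage: render_server_fsi <custom-namespace>" >&2
    return 1
  fi
  cat <<EOF
apiVersion: service.isf.ibm.com/v1
kind: FusionServiceInstance
metadata:
  name: ibm-backup-restore-service-instance
  namespace: ibm-spectrum-fusion-ns
spec:
  parameters:
    - name: doInstall
      provided: false
      value: 'true'
    - name: namespace
      provided: false
      value: ${custom_ns}
    - name: storageClass
      provided: true
      value: lvms-lmvg
  serviceDefinition: ibm-backup-restore-service
  triggerUpdate: false
  enabled: true
  doInstall: true
EOF
}
```

For example, render_server_fsi my-backup-ns > server-fsi.yaml writes the file, which can then be applied with oc apply -f server-fsi.yaml. The namespace my-backup-ns is a placeholder.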
- Backup & Restore service installation gets stuck after upgrade to IBM Storage Fusion 2.7.0
- If the installation or upgrade of the Backup & Restore service does not reach 100% completion, then check for failed startup probes. Run the following command to look for any pods that are not in the READY state:
oc get pods -n ibm-backup-restore
Example output:
NAME                              READY   STATUS    RESTARTS        AGE
applicationsvc-855746ffbf-2pmz7   0/1     Running   5 (2m8s ago)    17m
...
job-manager-845dc56b8d-r5w6j      0/1     Running   5 (2m50s ago)   17m
...
If found, then describe each of the pods and look at the event section to determine if there is a startup probe failure.
Example output showing probe failure:
Events:
  Type     Reason     Age                    From     Message
  ----     ------     ----                   ----     -------
  ...
  Warning  Unhealthy  3m40s (x5 over 6m40s)  kubelet  Startup probe failed: HTTP probe failed with statuscode: 503
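When the namespace contains many pods, the pods that are not ready can be picked out of the oc get pods output with a small filter. This is a minimal sketch: the function name not_ready_pods is hypothetical, and it assumes the default column layout (NAME, READY, STATUS, and so on) shown in the example output.

```shell
#!/bin/sh
# Hypothetical helper: read `oc get pods` output on stdin and print the
# names of pods whose READY count is below the total (for example 0/1).
# Pods in Completed status are skipped because they are expected to be 0/1.
not_ready_pods() {
  awk 'NR > 1 && $3 != "Completed" {
    split($2, ready, "/")
    if (ready[1] != ready[2]) print $1
  }'
}
```

A possible use is oc get pods -n ibm-backup-restore | not_ready_pods, followed by oc describe pod <pod-name> -n ibm-backup-restore for each name that is printed, to inspect the Events section.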
If one or more pods show startup probe failures, then patch the probes to provide additional start time. To determine the respective deployment names, run the following command:
oc get deployment -n ibm-backup-restore
Example output:
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
...
applicationsvc   0/1     1            1           12h
...
job-manager      0/1     1            1           12h
...
In the patch script, update the DEPLOYMENT_NAMES variable with the list of deployments that have failing startup probes, and then run the script to patch the probes.
Now, check whether the pods are online. If the startup probes continue to fail, then increase the STARTUP_DELAY_SEC parameter and try again.
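The patch script itself is not reproduced in this section. The following sketch shows one way such a patch could look, and it should not be taken as the product script: only the variable names DEPLOYMENT_NAMES and STARTUP_DELAY_SEC come from this section. The function names, the default delay of 120 seconds, the choice to set initialDelaySeconds, and the assumption that the startup probe belongs to the first container (index 0) are all illustrative.

```shell
#!/bin/sh
# Sketch only. DEPLOYMENT_NAMES and STARTUP_DELAY_SEC are named in this
# section; the default values below are assumptions.
DEPLOYMENT_NAMES="${DEPLOYMENT_NAMES:-applicationsvc job-manager}"
STARTUP_DELAY_SEC="${STARTUP_DELAY_SEC:-120}"
NAMESPACE="ibm-backup-restore"

# Build a JSON patch that sets initialDelaySeconds on the startup probe of
# the first container. The "add" operation also replaces an existing value.
build_probe_patch() {
  printf '[{"op":"add","path":"/spec/template/spec/containers/0/startupProbe/initialDelaySeconds","value":%d}]' "$1"
}

# Apply the patch to every listed deployment (assumes a logged-in oc session
# and that each deployment already defines a startupProbe).
patch_probes() {
  for name in $DEPLOYMENT_NAMES; do
    oc patch deployment "$name" -n "$NAMESPACE" --type=json \
      -p "$(build_probe_patch "$STARTUP_DELAY_SEC")"
  done
}
```

Call patch_probes after setting the two variables. If an operator manages the deployments, it might revert manual changes, so verify the probe settings after the pods restart.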
If it is an upgrade and the pods come online, but the progress does not reach 100% in the IBM Storage Fusion user interface even after 30 minutes, then you must update the Backup & Restore Server CR to retrigger the upgrade:
- Go to OpenShift Container Platform.
- Go to the Installed Operators view with the ibm-backup-restore project or namespace selected.
- Click IBM Storage Fusion Backup & Restore Server.
- Select the Data Protection Server tab and click the instance.
- Select the YAML tab and change the following value:
triggerUpgrade: false
to
triggerUpgrade: true
- Save and reload the YAML.
After some time, all pods must be READY and the Backup & Restore service must show healthy in IBM Storage Fusion user interface.
- Backup & Restore stuck at 5% during upgrade
- Diagnosis:
- In the OpenShift user interface, go to Installed Operators and filter on the ibm-backup-restore namespace.
- Click IBM Storage Fusion Backup and Restore Server.
- Go to Subscriptions.
- If you see the following error, then do the resolution steps to resolve the issue:
"error validating existing CRs against new CRD's schema for "guardiancopyrestores.guardian.isf.ibm.com": error validating custom resource against new schema for GuardianCopyRestore"
Resolution:
- From the command line, log in to the cluster and run the following commands to delete the guardiancopyrestore CRs and the 2.6.0 CSV:
oc -n ibm-backup-restore delete guardiancopyrestore.guardian.isf.ibm.com --all
oc -n ibm-backup-restore delete csv guardian-dm-operator.v2.6.0
- From the OpenShift Container Platform console, go to Installed Operators and filter on the ibm-backup-restore namespace.
- Click IBM Storage Fusion Backup and Restore Server.
- Go to Subscriptions.
- Find the failing installplan and delete it.
- Go to Installed Operators and go to the ibm-backup-restore namespace.
- Find IBM Storage Fusion Backup and Restore Server, click Upgrade available, and approve the install plan for IBM Storage Fusion Backup and Restore Server.
- Wait for the service upgrade to resume.
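The failing install plan can also be located from the command line instead of the console. This is a sketch under assumptions: the function names are hypothetical, and it assumes that the failing install plan reports the phase Failed in its status.

```shell
#!/bin/sh
# List install plans with their phase (assumes a logged-in oc session).
list_installplan_phases() {
  oc get installplan -n ibm-backup-restore \
    -o custom-columns=NAME:.metadata.name,PHASE:.status.phase
}

# Hypothetical helper: read "NAME PHASE" rows on stdin and print the names
# of install plans whose phase is Failed.
failed_installplans() {
  awk 'NR > 1 && $2 == "Failed" { print $1 }'
}
```

For example, list_installplan_phases | failed_installplans prints the candidate names. Review them, and then delete the failing plan with oc delete installplan <name> -n ibm-backup-restore.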
- MongoDB pod crashes with CrashLoopBackOff status
- The MongoDB pod crashes due to an OOM error. To resolve the error, increase the memory limit from 256Mi to 512Mi. Do the following steps to change the memory limit:
- Log in to the Red Hat OpenShift web console as an administrator.
- Go to .
- Select the project ibm-backup-restore.
- Select the MongoDB pod, and go to the YAML tab.
- In the YAML, change the memory limit for MongoDB container from 256Mi to 512Mi.
- Click Save.
- storage-operator pod crashes with CrashLoopBackOff status
- The storage-operator pod crashes due to an OOM error. To resolve the error, increase the memory limit from 300Mi to 500Mi.
- Log in to the Red Hat OpenShift web console as an administrator.
- Select the project ibm-spectrum-fusion-ns.
- Go to , click IBM Storage Fusion, and go to the YAML tab.
- Search for control-plane: isf-storage-operator in the YAML file.
- In the YAML, change the memory limit for the isf-storage-operator container from 300Mi to 500Mi.
containers:
  - resources:
      limits:
        cpu: 100m
        memory: 500Mi
- Click Save.
- Wait until a new isf-storage-operator pod comes up.
- Pods in CrashLoopBackOff state after upgrade
- The Backup & Restore service health changes to unknown and two pods go into CrashLoopBackOff state.
Resolution:
In the resource settings of the guardian-dp-operator pod that resides in the ibm-backup-restore namespace, set the operator memory limit to 1000Mi.
Example:
resources:
  limits:
    cpu: 1000m
    memory: 1000Mi
  requests:
    cpu: 500m
    memory: 250Mi
- Backup & Restore service goes into unknown state
- The Backup & Restore service might go into unknown state when you upgrade IBM Storage Fusion from 2.6.x to 2.7.0.
Resolution:
The service automatically shows healthy in the IBM Storage Fusion user interface after you upgrade the Backup & Restore service.