IBM Support

Cleanup After Failed STRFSFLASH


This document describes the options available when a STRFSFLASH process fails to complete successfully.

If the Full System FlashCopy (FSFC) process fails to complete and has not triggered a failure that ends the STRFSFLASH job automatically, the job is waiting for some process to finish. In MOST cases, the job on the Controlling LPAR is waiting to receive an ENDFSFLASH command from the Target LPAR. That command signals that the backup job(s) have completed, so the job can continue with any remaining tasks (for example, transferring QUSRBRM if BRMS is used, or shutting down the Target LPAR if specified in CSE DTA).

If you have identified the reason for the "wait" and want to simply correct the issue and restart the process, perform the following steps:

Automated Option:

A) This automated option should only be used if the Target LPAR and Source LPAR can successfully communicate with one another, AND you want to salvage your backup by sending QUSRBRM data from the Target LPAR back to the Source LPAR.

Log into the Target LPAR. Issue the following command:

QZRDHASM/ENDFSFLASH ACTION(*SBMNORMAL)

B) This automated option should only be used if the Target LPAR and Controlling LPAR can successfully communicate with one another AND you just want to clean up and restore your environment to get ready for the next STRFSFLASH. Otherwise, use one of the manual cleanup/restore options described below.

Log into the Target LPAR. Issue the following command:

QZRDHASM/ENDFSFLASH ACTION(*FAILBKU)

This command and parameter can be issued only from the Target LPAR. It tells the Controlling LPAR that the backups are "completed" but that a failure occurred. Commonly, the last message in the Controlling LPAR's ctl.log is "Waiting for ENDFSFLASH". Issuing this command manually satisfies that wait, allows the Controlling LPAR to log the failure correctly, and lets it continue with any remaining cleanup tasks. The STRFSFLASH job on the Controlling LPAR then completes successfully.

Note: If you use BRMS, this command ALSO triggers the Source LPAR to release all functional locks in BRMS and restore it to readiness. With this option, QUSRBRM is NOT transferred back to the Source LPAR, because it may be undesirable to add partial data to the BRMS database.
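The choice between the two automated options can be summarized as a small decision function. This is a minimal Python sketch, not an IBM tool. The function name `endfsflash_action` and its boolean inputs are hypothetical, and the function only returns the command text to enter on the Target LPAR. It does not run anything on IBM i.

```python
from typing import Optional


def endfsflash_action(target_reaches_source: bool,
                      target_reaches_controller: bool,
                      salvage_qusrbrm: bool) -> Optional[str]:
    """Return the ENDFSFLASH command to issue on the Target LPAR, or None.

    Hypothetical helper mirroring automated options A and B.
    None means no automated option applies; use the manual options.
    """
    if salvage_qusrbrm:
        # Option A: send QUSRBRM from the Target LPAR back to the Source LPAR.
        if target_reaches_source:
            return "QZRDHASM/ENDFSFLASH ACTION(*SBMNORMAL)"
        # Salvage requested, but the Target cannot reach the Source.
        return None
    # Option B: clean up only; QUSRBRM is NOT transferred back.
    if target_reaches_controller:
        return "QZRDHASM/ENDFSFLASH ACTION(*FAILBKU)"
    return None
```

For example, `endfsflash_action(True, True, False)` returns `"QZRDHASM/ENDFSFLASH ACTION(*FAILBKU)"`, because the backup is not being salvaged.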

Manual Options:

In some cases, it is not possible to issue ENDFSFLASH ACTION(*FAILBKU) from the Target LPAR. This happens, for example, if the Target LPAR never IPL'ed (activated), or if it IPL'ed but there is a problem with communications between the Target LPAR and the Controlling LPAR. In these cases, use the manual procedure below that matches your situation.

Target LPAR Never IPL'ed / No Save:

If the Target LPAR never IPL'ed, or the Controlling LPAR is waiting for an IPL that has gone into a loop (for example, the load source cannot be found or there are not enough resources), perform the following:

  • Identify the cause of the error and fix it, so that subsequent attempts are successful.
  • Ensure the Target LPAR is shut down (if necessary, shut it down manually through the HMC using Shutdown Immediate).
  • On the Controlling LPAR, end the STRFSFLASH job (ending it immediately is fine; this logs a "job cancelled" message in ctl.log).
  • If "Use BRMS integration" is set to *YES in the CSE DTA, also perform the following:

If "BRMS transfer method" is set to *ALL in CSE DTA, then log on to the Source LPAR and run the following:

QZRDHASM/ENDFSFLASH ACTION(*RSTFCNUSG)

If "BRMS transfer method" is set to *CHGONLY in CSE DTA, then log on to the Source LPAR and run the following:

QZRDHASM/ENDFSFLASH ACTION(*RSTFCNUSG) CONFIG(*MEDCLS) MEDCLS(media class)
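The only difference between the two Source LPAR commands is the BRMS transfer method, so the selection can be written as a small Python sketch. The function name is hypothetical. It simply builds the command strings shown above. `media_class` stands for your own BRMS media class, which this document does not specify.

```python
from typing import Optional


def rstfcnusg_command(transfer_method: str,
                      media_class: Optional[str] = None) -> str:
    """Build the Source LPAR cleanup command (hypothetical helper).

    transfer_method is the "BRMS transfer method" value from CSE DTA.
    """
    base = "QZRDHASM/ENDFSFLASH ACTION(*RSTFCNUSG)"
    if transfer_method == "*ALL":
        return base
    if transfer_method == "*CHGONLY":
        # *CHGONLY additionally names the media class.
        if not media_class:
            raise ValueError("*CHGONLY requires a media class")
        return f"{base} CONFIG(*MEDCLS) MEDCLS({media_class})"
    raise ValueError(f"unexpected BRMS transfer method: {transfer_method!r}")
```

For example, `rstfcnusg_command("*CHGONLY", "MYCLS")` returns `"QZRDHASM/ENDFSFLASH ACTION(*RSTFCNUSG) CONFIG(*MEDCLS) MEDCLS(MYCLS)"`, where `MYCLS` is a placeholder media class name.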

Target LPAR IPL'ed but Cannot Communicate to Controlling LPAR:

If the Target LPAR IPL'ed, then it's possible the comm resource was set correctly and the backup device was varied on successfully, but there's simply a communication/network error between the Target LPAR and Controlling LPAR.  In this case, it's also possible for the backup to run successfully, but the Target LPAR is simply not able to inform the controller of that completion.  Log into the Target LPAR and determine whether the backup was successful or not.  Then perform the following steps:

Non-BRMS ("Use BRMS integration" is *NO in CSE DTA):

  • Identify the network/communication error and fix it, so that subsequent attempts are successful.
  • On the Controlling LPAR, end the STRFSFLASH job (ending it immediately is fine; this logs a "job cancelled" message in ctl.log).
  • If the save was successful, collect the tape and prepare a new tape for the next backup. If it was unsuccessful, determine whether the tape is reusable and, if desired, reinitialize it and prepare it for the next backup.

BRMS ("Use BRMS integration" is set to *YES in CSE DTA):

  • Identify the network/communication error and fix it, so that subsequent attempts are successful.
  • If the save was successful and you want to get the updated QUSRBRM back to the Source LPAR, contact IBM Support for help choosing the best method. The method depends on the nature of the issue, the settings in CSE DTA, and the types of locks placed on BRMS on the Source LPAR.
  • If you do NOT want to move the QUSRBRM updates back to the Source LPAR (whether the save succeeded or failed), also perform the following steps:

On the Controlling LPAR, end the STRFSFLASH job (ending it immediately is fine; this logs a "job cancelled" message in ctl.log).

If "BRMS transfer method" is set to *ALL in CSE DTA, then log on to the Source LPAR and run the following:

QZRDHASM/ENDFSFLASH ACTION(*RSTFCNUSG)

If "BRMS transfer method" is set to *CHGONLY in CSE DTA, then log on to the Source LPAR and run the following:

QZRDHASM/ENDFSFLASH ACTION(*RSTFCNUSG) CONFIG(*MEDCLS) MEDCLS(media class)
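The branches for a Target LPAR that IPL'ed but cannot communicate can be summarized as a checklist generator. This Python sketch is illustrative only. `comm_failure_steps` and its inputs are hypothetical. The sketch also assumes that QUSRBRM is never moved back after a failed save, because this document routes only successful saves to IBM Support.

```python
from typing import List


def comm_failure_steps(use_brms: bool, save_ok: bool,
                       return_qusrbrm: bool) -> List[str]:
    """Hypothetical checklist for 'Target LPAR IPL'ed but cannot communicate'."""
    steps = ["Identify and fix the network/communication error"]
    if not use_brms:
        steps.append("On the Controlling LPAR, end the STRFSFLASH job")
        if save_ok:
            steps.append("Collect the tape and prepare a new tape for the next backup")
        else:
            steps.append("Check whether the tape is reusable; reinitialize it if desired")
        return steps
    if save_ok and return_qusrbrm:
        # The method depends on the issue, CSE DTA settings and BRMS locks.
        steps.append("Contact IBM Support to plan moving QUSRBRM back to the Source LPAR")
        return steps
    # Assumption: a failed save is always cleaned up without moving QUSRBRM.
    steps.append("On the Controlling LPAR, end the STRFSFLASH job")
    steps.append("On the Source LPAR, run QZRDHASM/ENDFSFLASH ACTION(*RSTFCNUSG) "
                 "(add CONFIG(*MEDCLS) MEDCLS(media class) for *CHGONLY)")
    return steps
```

Printing each entry of `comm_failure_steps(True, False, False)` lists the three cleanup steps for a BRMS environment after a failed save.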

Before re-issuing STRFSFLASH, it is always advisable to run CHKFSFLASH to determine whether any other error conditions exist. Once CHKFSFLASH succeeds and media is ready for the backup, it is safe to attempt another STRFSFLASH.

Product: IBM i
Platform: IBM i
Version: 7.1.0
Line of Business: Power
Business Unit: IBM Infrastructure w/TPS

Document Information

Modified date:
11 September 2023

UID

ibm11138156