IBM Support

Ragged Commit Boundary

News


Abstract

This document describes how to create and use ragged commit boundaries with FlashCopy. With a ragged commit boundary, APYJRNCHG can be used to recover files that are in a partial transaction state.

Content

FlashCopy is often used to make a point in time copy of data residing on a system or an independent auxiliary storage pool (IASP). The copied data is accessible as long as the copy target system is active or the IASP is varied on. In this document, the term copy system will be used to refer to the target of the FlashCopy operation. A common usage of a copy system is to offload save processing from a production system. Saves can be initiated on the copy system without worrying about objects being locked or in use by production jobs and there is no need to use Save While Active (SWA).

One downside to offloading saves to a copy system is that there is no save date/time recorded for objects on the production system. The missing save date/time can surface as a problem later if it is necessary to recover objects by applying journal entries from the production system. An Apply Journaled Changes (APYJRNCHG) request typically starts from the *LASTSAVE entry and applies entries that occurred after an object was last saved. But in this scenario, there is no *LASTSAVE entry on the production journal. The correct starting journal entry can be deduced, but this can be a significant manual process when there are many objects that need to be recovered.

Objects saved on the copy system can be affected when that system is reset either by an IPL or a vary on of the IASP after the FlashCopy operation. The copy system reset will cause any uncommitted transactions at the time of the FlashCopy operation to be rolled back to get the database back to a known, consistent state. This database state is the correct starting point if it’s necessary to recover objects in the future with the APYJRNCHG command. This highlights a second potential disadvantage to offloading saves in that it can make recovery via APYJRNCHG difficult when there are many objects or transactions. That difficulty is because there is no single journal entry from which an APYJRNCHG can be started.

This difficult recovery issue can be avoided by quiescing the production system with the Change ASP Activity (CHGASPACT) command before initiating the FlashCopy request. This quiesce operation guarantees that all transactions are complete and there will be a well-defined journal entry created before the FlashCopy runs. After the FlashCopy, the copy system will have a well-defined journal entry. So, when data is saved off the copy system the save will contain the well-defined journal entry which will facilitate an easier database recovery with APYJRNCHG.

However, in some production environments, a full quiesce via a CHGASPACT OPTION(*SUSPEND) request may not be tolerable, because no new transactions are allowed to start while the quiesce is active. To address this concern, IBM i 7.5 adds a much less disruptive quiesce operation through a new Suspend Option (SSPOPT) parameter. Here is an example of a quiesce operation using this new parameter: CHGASPACT OPTION(*SUSPEND) SSPOPT(*DDL).

With the *DDL option, the CHGASPACT quiesce request fails if it detects open transactions containing SQL Data Definition Language (DDL) operations such as an SQL CREATE, ALTER or DROP. If no DDL operations are in progress or attempted during the quiesce, the *DDL option has no impact on the production system. Open database transactions without DDL operations are not suspended when the *DDL option is specified on the CHGASPACT.

If the *DDL quiesce request fails, the user can attempt the quiesce again a few seconds or minutes later. For this reason, it is recommended to specify a small time-out value for the SSPTIMO parameter on the CHGASPACT command, along with a value of *END for the SSPTIMOACN parameter.

Once the quiesce with the *DDL option succeeds, any new DDL operations are held until a CHGASPACT OPTION(*RESUME) is performed. At this point a FlashCopy can be initiated, and database transactions with non-DDL operations continue without interruption. Once the FlashCopy is complete, the production system is resumed with the CHGASPACT OPTION(*RESUME) command, which allows DDL operations to continue. With the new option, the duration of the DDL-only quiesce is determined by the duration of the FlashCopy operation. This duration is expected to be tolerable for most users.
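
The retry recommendation can be sketched as a small loop. This is only an illustration in Python, not IBM-supplied code: the run_command callable is a hypothetical hook that submits a CL command string to the production system and returns True on success, and the attempt count and wait time are arbitrary assumptions. The command string itself follows the example later in this document.

```python
import time
from typing import Callable


def build_ddl_quiesce(aspdev: str, timeout_s: int = 30) -> str:
    """Build the DDL-only suspend command string used in this document."""
    return (
        f"CHGASPACT ASPDEV({aspdev}) OPTION(*SUSPEND) "
        f"SSPOPT(*DDL) SSPTIMO({timeout_s}) SSPTIMOACN(*END)"
    )


def quiesce_with_retry(
    run_command: Callable[[str], bool],  # hypothetical hook: True if the command succeeded
    aspdev: str,
    attempts: int = 5,                   # assumed retry budget, not an IBM recommendation
    wait_s: float = 10.0,                # assumed pause between attempts
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try the DDL-only quiesce until it succeeds or the attempts run out."""
    command = build_ddl_quiesce(aspdev)
    for attempt in range(1, attempts + 1):
        if run_command(command):
            return True
        # The quiesce failed, for example because a DDL transaction was open.
        # Wait a little before trying again, except after the last attempt.
        if attempt < attempts:
            sleep(wait_s)
    return False
```

A caller would only start the FlashCopy when quiesce_with_retry returns True. If it returns False, nothing has been suspended and the attempt can be rescheduled.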

When the copy system is reset after receiving a copy made with the new DDL quiesce option, no transactions will be rolled back. Because no transactions are rolled back, the copy system or IASP will match the state of the production system or IASP just prior to the FlashCopy. If it is necessary to recover objects on the copy system later with the APYJRNCHG command there will be a single journal entry on the production system journal from which the apply can be started.

A copy system created with a copy made with the new DDL-only quiesce is unique because some objects will have a “ragged” commit boundary. The term “ragged” is used to describe database transactions that were uncommitted at the time the copy was performed. Objects will be marked as having partial transactions, in the same way as if they had been restored after having been saved using SWA. Journal recovery must be performed with the APYJRNCHG command to resolve any open transactions before any other changes are allowed to these objects. Receivers can be restored from the production system to this copy system and the journal apply processing will make the journaled objects current. Alternatively, objects from this copy system may be saved and restored back to the production system or wherever the required receivers are located and apply processing will make the journaled objects current.

If the receivers are not available and objects in the ragged state are needed for recovery, manual intervention would be required to remove the objects from the ragged state and determine the validity of the data in those objects. For this reason, it may be desirable to have an additional environment where a traditional FlashCopy is used to create a copy system where an IPL or IASP vary on automatically rolls back partial transactions to transaction boundaries.

In order to use the new IBM i 7.5 DDL-only quiesce support, the Change ASP Attribute (CHGASPA) command must first be used to specify the system serial number and partition ID of the copy system where the ragged commit boundary function is to be used. The CHGASPACT command is used to perform the DDL quiesce prior to the FlashCopy. During an IPL or IASP vary on of the copy system, the system and ASP information is checked.  If the information matches the copy system attributes specified on the CHGASPA command and the DDL quiesce indicator is set, then no open transactions are rolled back during the reset of the copy system. The system information can be set once on the production system via CHGASPA. However, if the new quiesce behavior is desired, then the DDL quiesce option must be specified on the CHGASPACT command prior to each time a FlashCopy is performed.

When using the new option, a new receiver is attached to each user journal during a reset of the copy system and there will be a break in the current receiver chain prior to this new receiver. This chain break is required to allow journal receivers saved on the source system to be restored and associated correctly with the journal on the copy system. If journal recovery is attempted, it will be necessary to manually create and attach a new receiver for each journal on the copy system so that there will be no collision with a journal receiver that may be restored from the source system. Specifically, the next numbered receiver should not be used; a uniquely named receiver should be created instead.

If journal recovery will be performed on the copy system, it is necessary to manually identify the journal sequence number that will be used as the starting point for the APYJRNCHG. The first entry deposited into the receiver created by the system during a copy system reset (IPL or IASP vary on) is a J IA or J UA entry. Record the sequence number of this first entry. There will be a different journal entry with this sequence number on the source system, but that is the starting journal entry to use for the APYJRNCHG. Delete receivers that will be restored over from the source system including the receiver attached at the time of the FlashCopy. Only the manually created receiver with the unique name will remain. Restore receivers from the source system. This must be done for each journal for which recovery will be attempted.
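
The check on the first entry can be sketched as follows. This is an illustrative Python helper under stated assumptions, not part of any IBM tooling: JournalEntry is a hypothetical record holding the sequence number, journal code, and entry type of each entry in the system-created receiver, and how those values are extracted from the journal is left outside the sketch.

```python
from typing import NamedTuple, Sequence


class JournalEntry(NamedTuple):
    """Hypothetical, simplified view of one journal entry."""
    sequence: int
    code: str        # journal code, for example "J"
    entry_type: str  # entry type, for example "IA" or "UA"


def starting_sequence(entries: Sequence[JournalEntry]) -> int:
    """Return the sequence number to record for the later APYJRNCHG.

    Expects the entries of the receiver that the system created during
    the copy system reset. The first entry must be J IA or J UA.
    """
    if not entries:
        raise ValueError("receiver contains no entries")
    first = min(entries, key=lambda entry: entry.sequence)
    if first.code != "J" or first.entry_type not in ("IA", "UA"):
        raise ValueError(
            f"first entry is {first.code} {first.entry_type}, expected J IA or J UA"
        )
    return first.sequence
```

The function would be called once per journal, and the returned number kept alongside the journal name until the apply is run.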

Example for an IASP:

On source system:

First, update the ASP attributes with details on the copy system that will be receiving the ragged copy.  Specify the copy system serial number and partition ID on the LPARSER & LPARID parameters (on copy system, you can use CALL QSYS/QLZARCAPI to obtain the partition ID):
1) CHGASPA ASPDEV(<IASP name>) NOCMTBDY(*YES) LPARSER(<serial #>) LPARID(<LPAR #>)
Quiesce DDL operations:
2) CHGASPACT ASPDEV(<IASP name>) OPTION(*SUSPEND) SSPOPT(*DDL) SSPTIMO(30) SSPTIMOACN(*END)
When DDL quiesce is successful, initiate the FlashCopy:
3) Perform FlashCopy
Resume DDL operations:
4) CHGASPACT ASPDEV(<IASP name>) OPTION(*RESUME)
Continue with production activity:
5) Save journal receivers on source that may be needed for recovery including receiver attached at time of FlashCopy
If recovery is necessary, do the following on copy system:
6) Vary on IASP
7) Find J UA entry in detached receiver for each user journal, record sequence number for APYJRNCHG
8) Create uniquely named receiver for each journal and attach via CHGJRN
9) Save all libraries that might be affected by an APYJRNCHG (not required but allows additional apply attempts if errors occur)
10) Delete receivers that will be restored over from the source system including the receiver attached at FlashCopy time (save this receiver before deleting in case it is needed later)
11) Restore receivers saved from source system
12) Issue correct APYJRNCHG command specifying any receivers that include open commit transactions and the correct starting sequence number recorded in step 7
Objects are now available for use.
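
Steps 2 through 4 on the source system can be sketched as one guarded sequence. This is a minimal Python illustration, not IBM-supplied code: run_command and perform_flashcopy are hypothetical hooks for submitting a CL command and for triggering the FlashCopy on the storage system. Issuing the resume in a finally clause is a design choice of this sketch, made so that DDL operations are not left held if the copy step raises an error.

```python
from typing import Callable


def flashcopy_with_ddl_quiesce(
    run_command: Callable[[str], bool],     # hypothetical hook: True if the command succeeded
    perform_flashcopy: Callable[[], None],  # hypothetical hook that starts the FlashCopy
    aspdev: str,
) -> bool:
    """Quiesce DDL, take the FlashCopy, then resume DDL operations."""
    suspend = (
        f"CHGASPACT ASPDEV({aspdev}) OPTION(*SUSPEND) "
        "SSPOPT(*DDL) SSPTIMO(30) SSPTIMOACN(*END)"
    )
    resume = f"CHGASPACT ASPDEV({aspdev}) OPTION(*RESUME)"

    # Step 2: if the DDL quiesce fails, do not start the copy.
    if not run_command(suspend):
        return False
    try:
        # Step 3: non-DDL transactions keep running while the copy is taken.
        perform_flashcopy()
    finally:
        # Step 4: always release held DDL operations, even if the copy failed.
        run_command(resume)
    return True
```

A False return means the quiesce did not succeed and no copy was taken, so the whole sequence can be attempted again later.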

Document Information

Modified date:
16 May 2022

UID

ibm16579507