Resynchronization problem resolution

While most resynchronization operations will succeed, there are sequences of operations on certain objects that will fail even if the objects are processed in tracking time order.

Most of these cases involve a save/restore entry and/or a move or rename operation. For these situations, user intervention is necessary.

In this example, DATALIB is a library that is always included in replication. There are no specific rules for any *FILE objects in the library, so all eligible files are always included in replication. Suppose the following operations are performed on the primary node while in TRACKING state, with entries added to the OTL to track the changes:
  1. Restore table DATALIB/SIMPLE1 (SIMPLE1 did not exist prior to the restore).
  2. Create a view DATALIB/SIMPLE1V that depends on SIMPLE1.
  3. Rename table DATALIB/SIMPLE1 to SIMPLE2.
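
The three tracked operations might look like the following when run from an SQL interface on the primary node. This is only a sketch of the scenario: the save file MYLIB/MYSAVF and the view definition are hypothetical, because the example does not say how the table was saved or what the view selects.

```sql
-- 1. Restore SIMPLE1 into DATALIB.
--    MYLIB/MYSAVF is a hypothetical save file used for illustration.
CALL QSYS2.QCMDEXC('RSTOBJ OBJ(SIMPLE1) SAVLIB(DATALIB) DEV(*SAVF) OBJTYPE(*FILE) SAVF(MYLIB/MYSAVF)');

-- 2. Create a view that depends on SIMPLE1 (hypothetical definition).
CREATE VIEW DATALIB.SIMPLE1V AS
  SELECT * FROM DATALIB.SIMPLE1;

-- 3. Rename the table. The view SIMPLE1V now depends on SIMPLE2.
RENAME TABLE DATALIB.SIMPLE1 TO SIMPLE2;
```

After the third statement, no object named DATALIB/SIMPLE1 exists on the primary node, yet the first two OTL entries still refer to that name.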

When resynchronization is attempted, the save/restore entry cannot be processed because SIMPLE1 no longer exists under that name. Since the object cannot be found on the primary or secondary node, the save/restore entry is deferred. When the CREATE VIEW is attempted, it also fails because SIMPLE1 cannot be found. Finally, the rename operation fails for the same reason. The following figure shows the errors as they are recorded in the OTL. The same errors appear in the resynchronization joblog, which provides additional information to help identify the problem.

Figure 1. OTL errors from failed resync

During resynchronization, one or more objects might be restored successfully before a failure is encountered, such as the CPF3204 reported for SIMPLE1V in this example. In those cases, perform the steps outlined in the example against every object that has an error recorded in the OTL, including any dependencies of those objects.

In this specific case, the only concern is getting the renamed table and its view replicated to the secondary node. The most efficient way to resolve this problem requires the following steps:
  1. Change the RCL to exclude the file’s old name (SIMPLE1), new name (SIMPLE2), and the view name (SIMPLE1V). This removes any entries for these objects from the OTL. This can be done through the Db2® Mirror GUI or by calling the following procedures:
    CALL QSYS2.ADD_REPLICATION_CRITERIA(INCLUSION_STATE => 'EXCLUDE',
                                        IASP_NAME       => '*SYSBAS',
                                        LIBRARY_NAME    => 'DATALIB',
                                        OBJECT_TYPE     => '*FILE',
                                        OBJECT_NAME     => 'SIMPLE1',
                                        APPLY           => 'PENDING',
                                        APPLY_LABEL     => 'FIXOTL');
    
    CALL QSYS2.ADD_REPLICATION_CRITERIA(INCLUSION_STATE => 'EXCLUDE',
                                        IASP_NAME       => '*SYSBAS',
                                        LIBRARY_NAME    => 'DATALIB',
                                        OBJECT_TYPE     => '*FILE',
                                        OBJECT_NAME     => 'SIMPLE2',
                                        APPLY           => 'PENDING',
                                        APPLY_LABEL     => 'FIXOTL');
    
    CALL QSYS2.ADD_REPLICATION_CRITERIA(INCLUSION_STATE => 'EXCLUDE',
                                        IASP_NAME       => '*SYSBAS',
                                        LIBRARY_NAME    => 'DATALIB',
                                        OBJECT_TYPE     => '*FILE',
                                        OBJECT_NAME     => 'SIMPLE1V',
                                        APPLY           => 'PENDING',
                                        APPLY_LABEL     => 'FIXOTL');
    
    CALL QSYS2.PROCESS_PENDING_REPLICATION_CRITERIA(IASP_NAME    => '*SYSBAS',
                                                    APPLY_ACTION => 'COMMIT',
                                                    APPLY_LABEL  => 'FIXOTL'); 
    
    At this point you can examine the OTL to confirm that all error entries have been removed or deferred.
  2. If any of the objects that were just excluded from replication exist on the secondary node, delete them from that node. Before deleting an object, ensure that all of its dependent objects are also excluded from replication so that any cascaded deletes do not affect replicated objects.
  3. Now that the error entries for these objects have been removed from the OTL and any leftover objects have been cleaned up from the secondary node, change the RCL to include all files in DATALIB for replication once again. This can be done through the Db2 Mirror GUI or by calling the following procedures:
    CALL QSYS2.REMOVE_REPLICATION_CRITERIA(IASP_NAME       => '*SYSBAS',
                                           LIBRARY_NAME    => 'DATALIB',
                                           OBJECT_TYPE     => '*FILE',
                                           OBJECT_NAME     => 'SIMPLE1V',
                                           APPLY           => 'PENDING',
                                           APPLY_LABEL     => 'FIXOTL');
    
    CALL QSYS2.REMOVE_REPLICATION_CRITERIA(IASP_NAME       => '*SYSBAS',
                                           LIBRARY_NAME    => 'DATALIB',
                                           OBJECT_TYPE     => '*FILE',
                                           OBJECT_NAME     => 'SIMPLE2',
                                           APPLY           => 'PENDING',
                                           APPLY_LABEL     => 'FIXOTL');
    
    CALL QSYS2.REMOVE_REPLICATION_CRITERIA(IASP_NAME       => '*SYSBAS',
                                           LIBRARY_NAME    => 'DATALIB',
                                           OBJECT_TYPE     => '*FILE',
                                           OBJECT_NAME     => 'SIMPLE1',
                                           APPLY           => 'PENDING',
                                           APPLY_LABEL     => 'FIXOTL');
    
    CALL QSYS2.PROCESS_PENDING_REPLICATION_CRITERIA(IASP_NAME    => '*SYSBAS',
                                                    APPLY_ACTION => 'COMMIT',
                                                    APPLY_LABEL  => 'FIXOTL'); 
    
  4. Resume replication either from the Db2 Mirror GUI or by using the QSYS2.CHANGE_MIRROR procedure:
    CALL QSYS2.CHANGE_MIRROR(IASP_NAME         => '*SYSBAS',
                             REPLICATION_STATE => 'RESUME');
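
Between steps, the state of the RCL, the OTL, and replication can be checked with SQL as well as from the GUI. The queries below are a sketch that assumes the Db2 Mirror views QSYS2.REPLICATION_CRITERIA_INFO, QSYS2.RESYNC_STATUS, and QSYS2.MIRROR_INFO are available at your release level. SELECT * is used because this topic does not show the column names; verify the views and narrow the queries against the reference for your system.

```sql
-- After step 1 or step 3: review the rules currently in the RCL
-- and confirm that the exclusions for SIMPLE1, SIMPLE2, and SIMPLE1V
-- are present (step 1) or gone (step 3).
SELECT * FROM QSYS2.REPLICATION_CRITERIA_INFO;

-- After step 1: review the remaining OTL entries and confirm that
-- no error entries are left for the excluded objects.
SELECT * FROM QSYS2.RESYNC_STATUS;

-- After step 4: check the replication state of the mirrored pair.
SELECT * FROM QSYS2.MIRROR_INFO;
```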

Note that because of dependencies between files, it might be necessary (or at least better from a performance perspective) to also exclude and then re-include certain dependent files. For example, assume that there is another table called COMPLEX1 and that a unique keyed logical file (LGLUNQ) exists over both SIMPLE1 and COMPLEX1. Duplicate key failures could occur if COMPLEX1 is not also excluded and then included again in the RCL. Even if duplicate key errors do not occur, the index for LGLUNQ would need to be rebuilt if COMPLEX1 is not also included in this process.
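
Assuming that COMPLEX1 is also a *FILE object in DATALIB on *SYSBAS (the example does not say where it resides), the extra rules could be sketched as follows, mirroring the calls used for SIMPLE1. The first call would be added to step 1 and the second to step 3, in each case before the PROCESS_PENDING_REPLICATION_CRITERIA call that commits the FIXOTL label.

```sql
-- Addition to step 1: exclude COMPLEX1 together with the other files.
-- Assumes COMPLEX1 resides in DATALIB on *SYSBAS.
CALL QSYS2.ADD_REPLICATION_CRITERIA(INCLUSION_STATE => 'EXCLUDE',
                                    IASP_NAME       => '*SYSBAS',
                                    LIBRARY_NAME    => 'DATALIB',
                                    OBJECT_TYPE     => '*FILE',
                                    OBJECT_NAME     => 'COMPLEX1',
                                    APPLY           => 'PENDING',
                                    APPLY_LABEL     => 'FIXOTL');

-- Addition to step 3: remove the exclusion so that COMPLEX1 is
-- covered by the library-level inclusion of DATALIB again.
CALL QSYS2.REMOVE_REPLICATION_CRITERIA(IASP_NAME       => '*SYSBAS',
                                       LIBRARY_NAME    => 'DATALIB',
                                       OBJECT_TYPE     => '*FILE',
                                       OBJECT_NAME     => 'COMPLEX1',
                                       APPLY           => 'PENDING',
                                       APPLY_LABEL     => 'FIXOTL');
```

If COMPLEX1 already exists on the secondary node, step 2 applies to it as well.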