Performing an external refresh
You can perform a refresh of one or more tables in a subscription using a third-party tool, and still integrate this activity into the refresh while active capability of InfoSphere® CDC. Such third-party tools might be a table unload at source and load at target, an application program that regenerates the content of a table being run against both the source and target tables, recovery of a table's content to a prior point in time at both the source and target, or even a refresh of the table using a subscription other than the one that will be mirroring the data. A refresh or reconstruction of a table by such a means is referred to as an external refresh.
- InfoSphere CDC for InfoSphere DataStage® (either Direct Connect or Flat File)
- InfoSphere CDC Event Server
- InfoSphere Classic CDC for z/OS®
In order to integrate the activity of the external refresh into the mirroring activity of the subscription, new command capabilities have been added to the products. To understand what these commands do, how to provide usable data to them, and what to avoid so that they won't be misused, it is necessary to understand the mechanism of refresh while active processing.
Understanding refresh while active
When a table within a subscription is set to Method:Mirror and Status:Refresh, it is readied for a refresh while active. When the subscription is started, as viewed externally, the table is refreshed and then mirroring of the tables in the subscription starts. When the subscription starts, it must necessarily start scraping from the log at a place prior to the place when the refresh was performed. For the table that was refreshed at the start of mirroring, changes scraped from the log are discarded until scraping reaches the place where the refresh had started. From that point forward, changes scraped for that table are forwarded to the target for application to the target table. Between the place in the log where the refresh started and the place where the refresh ended, a change scraped from the log could either have been applied to the table or not yet have been applied to the table by the time that portion of the table was refreshed. Thus the change may or may not have been forwarded to the target and been applied to the target table during the refresh. For this reason, changes scraped from the log between the places where the refresh ran are applied at the target with an error mitigation filter that suppresses operational errors (INSERT of a preexisting row, UPDATE or DELETE of a nonexisting row). These changes are said to be sent with the "indefinitely refreshed" indicator set. Once scraping proceeds past the place in the log where the refresh ended, the indefinitely refreshed indicator is no longer set, and operational errors are treated as harshly as any other.It is possible that changes that have been logged before the place in the log when the refresh started were not yet committed when the refresh started. Such changes would appear in the table after the COMMIT was issued, but would be discarded when they were scraped from the log because they were logged prior to the refresh starting place in the log. The capture of the table's content during the refresh could miss these changes, depending on how much time passes before the COMMIT is issued. This would cause the source and target tables to be unsynchronized. To avoid this, InfoSphere CDC uses methods based on the DBMS in use to establish a point of consistency for the table at the time that the refresh for a refresh while active is starting. For example, on the z/OS platform, before the refresh starts, a shared lock with a table scope is obtained on the table being refreshed. The current log write position is determined, then the lock is released. This forces any units of recovery that contain changes for the table being locked to complete (that is, a COMMIT). When the shared lock is released the refresh starts, and the log write position sampled while the lock was held will be used as the place for the start of the refresh. The shared lock is held for milliseconds at most, and should not be disruptive to normal application processing for the table.
Considerations for an external refresh
When an external refresh occurs, it involves the capturing of the source table's content and the writing of that content to the target table. The period when the source table's content is being captured is equivalent to the period during a refresh while active when the source table is being read. Depending on how the source and target tables are being re-synchronized outside of the subscription, this period could be significantly long (for example, using unload and load) or effectively instantaneous (for example, point-in-time recovery at both source and target).For InfoSphere CDC for z/OS, this period needs to be converted to a starting place and an ending place in the log, expressed in terms of log positions in the DBMS's log. The two (possibly identical) log positions are then provided to InfoSphere CDC using the command interface, and InfoSphere CDC updates the metadata with them as though they had originated from the execution of the refresh of a refresh while active. When the subscription starts to mirror, these two log positions describe the scrape point when discarding of the table's data stops and marking changes as 'indefinitely refreshed starts, and when marking changes as indefinitely refreshed stops.
For all other supported engines, the refresh while active period is determined by the current position of the log when the dmmarkexternalunloadstart and dmmarkexternalunloadend commands are issued.
In order for this update to the metadata to be effective and productive, the following have to be true:
- The subscription must not be active, as is always the case when a subscription's attributes are being modified.
- The first (earlier) of the two log positions must not be earlier than the log position where the subscription will start scraping. If this rule is not followed, the results are not predictable.
- Neither of the two log positions should be for a place that has not yet been written to the log. If this rule is not followed, the results are not predictable.
In addition to these considerations, there is also the issue of establishing a point of consistency at the start of capturing the source table for the external refresh. When performing a refresh while active, as explained above, a point of consistency for the table being refreshed is established by acquiring a shared lock with table scope. This method is unavailable when doing an external refresh, so other means of establishing a point of consistency should be employed. Sometimes, this will not be possible, such as for a point-in-time recovery at both the source and target. If it is possible, such as when performing an unload and reload, then it is strongly recommended that a point of consistency be established. This can be done by quiescing activity against the table, or simply by stopping all updating application programs. Sometimes it may not be necessary, such as when a program is used to regenerate the entire table content at both the source and target (assuming such a program would, by its action, implicitly lock the entire table). Failure to take the steps to establish a point of consistency could produce operational errors later, under the scenario described above.