Table compare utility (asntdiff)

The asntdiff command compares the columns in one table to their corresponding columns in another table and generates a list of differences between the two in the form of a Db2® table.

To use the compare utility, you run the asntdiff command and specify the name of a Q subscription (Q Replication) or subscription set member (SQL Replication) that contains the source and target tables that you want to compare. You can also use SQL statements in an input file to specify the tables to compare.

The following sections explain how to use the asntdiff command:

Overview of the asntdiff command
When to use the compare utility
Where differences are stored
Sample difference report
Required authorizations
Restrictions for key columns at source and target
Data type considerations
Effects of filtering
Comparisons based on queries instead of subscriptions
Comparing a subset of table rows

Overview of the asntdiff command

You can run the asntdiff command on Linux®, UNIX, Windows, and z/OS® operating systems. The command compares tables on Linux, UNIX, Windows, z/OS, or System i® operating systems. The asntdiff command can be used with federated sources and targets if the corresponding columns in the two tables have the same data types.

z/OS: The ASNTDIFF sample job in the SASNSAMP data set provides further information that is specific to the z/OS platform.

For Q Replication, the target must be a user copy table or a consistent-change-data (CCD) table that is condensed and complete. Stored procedure targets are not supported. For SQL Replication, the target must be a user table, point-in-time table, replica table, user-copy table, or consistent-change-data (CCD) table that is condensed and complete.

For SQL Replication, the Apply Control Server and the target server must be the same.

When you run the command, you specify an SQL WHERE clause that uniquely identifies the Q subscription or subscription set member:

Q Replication

The WHERE clause identifies a row in the IBMQREP_SUBS control table at the Q Capture server, based on the value of the SUBNAME column. For example:



where="subname = 'my_qsub'"

SQL Replication

The WHERE clause identifies a row in the IBMSNAP_SUBS_MEMBR table at the Apply control server, based on the value of the SET_NAME column. For example:



where="set_name = 'my_set' and source_table='EMPLOYEE'"

You might need to use more predicates in the WHERE clause to uniquely identify the subscription set member. For example, you might need to add the APPLY_QUAL, the SOURCE_OWNER, the TARGET_OWNER, or the TARGET_TABLE column from the IBMSNAP_SUBS_MEMBR table to the clause.

When to use the compare utility

The best time to use the utility is when the source and target tables are stable. You might want to run the utility when the Q Capture and Q Apply programs or Capture and Apply programs are idle. For example, you could run the utility when the Q Capture program reached the end of the Db2 recovery log and all changes are applied at the target. If applications are still updating the source, the comparison might not be accurate.

If the replication programs are running, you might need to run the asntdiff command more than once to get a complete picture of evolving differences between the source and target tables.

Where differences are stored

The asntdiff command creates a difference table in the source database or subsystem to store differences that it finds.

The difference table is named schema.ASNTDIFF, where schema is the value specified in the DIFF_SCHEMA parameter. If the schema is not specified, it defaults to ASN. You can also use the DIFF parameter to specify a table name.

By default, the difference table is created in the default Db2 user table space. You can specify a different, existing table space by using the DIFF_TABLESPACE parameter.

The difference table has two or more columns. One column is named DIFF, with a blank space at the end on Linux, UNIX, and Windows. The value in the DIFF column is a character that indicates an insert, update, or delete operation followed by a numeric value that indicates which table contains a row with differences. The other columns contain the value of replication key columns. There is one row in the difference table for each unmatched row in the target table.

The difference table uses three identifiers that indicate the operation that is needed to change the target table so that it matches the source table:

D (delete): Indicates that a row with the key value exists only at the target and not at the source.
U (update): Indicates that rows with the same key value exist at both the source and target, but at least one non-key column is different at the target.
I (insert): Indicates that a row with the key value exists only at the source and not at the target.

A value of ? 1 indicates that there is an invalid character in one or more source columns.

A value of ? 2 indicates that there is an invalid character in one or more target columns.

Example:

The following list of values is returned by comparing an EMPLOYEE table at the source with a target copy of the same table. The key column for replication is the employee number, EMPNO:



DIFF  EMPNO
U 2   000010
I 2   000020
I 2   000040
D 2   000045
I 2   000050
D 2   000055

The first row in the example shows that a row with the key value 000010 exists at both the source and target tables, but at least one non-key column at the target has a different value. The next two rows show that rows with the key values 000020 and 000040 exist only at the source. The fourth row shows that a row with the key value 000045 exists only at the target.

The values ? 1 and ? 2 are not shown in the example.

Sample difference report

After a table compare, the asntdiff command outputs a summary report to the console. The following example shows a sample difference report.

---------------------| Difference Report |---------------------
Start time:     2011-10-24-11.06.32.890004
End time:       2011-10-24-11.07.23.563205
Duration:       0.60042 second
SOURCE_SELECT:  SELECT * FROM src_table WHERE val!= 0.331 ORDER BY 2
TARGET_SELECT:  SELECT id, CAST(val AS DEC(9,3)) FROM trg_table
  WHERE CAST(val AS DEC(9,3)) != 0.330 ORDER BY 2
TDIFF table:    "asn"."diff_table"

Summary of compare
------------------
Number of different rows: 1
Number of common rows:    4

Difference details (rows)
-------------------------
Source total: 5
Target total: 5
Source only:  0
Target only:  0
Update:       1

Here is more information about the "Summary of compare" and "Difference details (rows)" sections of the report:

Number of different rows: The number of rows with differences that were identified ("Source only" + "Target only" + "Update").
Number of common rows: The number of rows that have the same key and non-key values between the source result set and target result set.
Source total: The number of rows in the source table (or the result set of the SOURCE_SELECT parameter).
Target total: The number of rows in the target table (or the result set of the TARGET_SELECT parameter).
Source only: The number of rows in the source result set and not in the target result set.
Target only: The number of rows in the target result set and not in the source result set.
Update: The number of rows in the two result sets with the same key values but differences in the non-key values.

Required authorizations

These database authorizations are required for the compare utility:

Access privileges to the tables that are being compared, and to the replication control tables unless the -f (file) option is used
Linux, UNIX, Windows: Read privileges for the password file if the PWDFILE keyword is used
WRITE privileges for the directory that is specified by the DIFF_PATH keyword
To create the difference table, CREATETAB authority on the source database and USE privilege on the table space. In addition, one of the following privileges is needed:
- IMPLICIT_SCHEMA authority on the database, if the implicit or explicit schema name of the table does not exist
- CREATEIN privilege on an existing schema if the table is created in this schema
On z/OS, if the user ID that runs asntdiff does not have authority to create tables, you can use the SQLID keyword to specify an authorization ID that can be used to create the difference table.
DROPIN privilege on the schema to drop the difference table unless DIFF_DROP=N is used
SELECT, DELETE, and INSERT privileges on the difference table (at the source). The default schema name is ASN and the default table name is ASNTDIFF.

Restrictions for key columns at source and target

The asntdiff utility supports multiple-byte character sets when the database is defined with SYSTEM or IDENTITY. However, the columns that are used as keys for replication at the source and target tables must use single-byte characters for the utility to compare the tables.

In a Linux, UNIX, or Windows database that uses Unicode, the characters in key data cannot be greater than the base U.S. English ASCII subset (first 256 ASCII characters) or the asntdiff utility cannot compare the tables.

Data type considerations

You need to consider the data types of the tables that you are comparing when using asntdiff.

Different data types in sources and targets

The compare utility can build two SELECT SQL statements that are based on the description of a subscription. To obtain the differences between the source and target tables, the utility compares the data that result from executing both statements. The data types and lengths of the columns for both SQL statements must be the same.

SQL Replication: The utility builds the SQL statement for the source by using the EXPRESSION column in the IBMSNAP_SUBS_COLS table.
Q Replication: The data types for both the source and the target must be the same.

Unsupported data types

The compare utility does not support comparisons between the following data types:

Nonkey columns: DECFLOAT, BLOB_FILE, CLOB_FILE, DBCLOB_FILE
Key columns: DECFLOAT, BLOB, CLOB, DBCLOB, VARGRAPHIC, GRAPHIC, LONG_VARGRAPHIC, BLOB_FILE, CLOB_FILE, DBCLOB_FILE, XML

Comparing the GRAPHIC data type

Columns with the GRAPHIC data type at the source and target might not match when you use the utility to compare the source and target tables. Db2 columns with the GRAPHIC data type have blank padding after the graphic data. This padding might be single-byte or double-byte spaces, depending on the code page that the database was created in. This padding might cause data to not match between the source and the target tables, especially if the source and target tables are in different code pages. This padding applies only to GRAPHIC data types and not other graphic data types such as VARGRAPHIC or LONG VARGRAPHIC.

To compare columns with GRAPHIC data types, you must remove the blank padding in the data before you compare the source and target tables by using the Db2 scalar function rtrim(<column>. This function eliminates the code page differences for single-byte or double-byte spaces and ensures that the utility compares the GRAPHIC data in a consistent manner.

TIMESTAMP WITH TIMEZONE restriction

The compare utility does not support comparisons that involved the TIMESTAMP WITH TIMEZONE data type that was introduced in Db2 for z/OS Version 10.

Effects of filtering

In some cases, differences between source and target tables are intentional, for example, if you use a search condition in Q Replication to filter which rows are replicated. The utility will not show differences between source and target tables that are a result of predicates or suppressed deletes.

Row filtering

The compare utility uses information from the replication control tables to avoid showing intentional differences:

SQL Replication: The utility uses the PREDICATES column in the IBMSNAP_SUBS_MEMBR table to select rows from the source tables. The value of the UOW_CD_PREDICATES column is ignored (asntdiff looks directly at the source table, where the Apply program looks at the CD table).
Q Replication: The utility uses the value of the SEARCH_CONDITION column in the IBMQREP_SUBS table to build the WHERE clause for the SELECT statement.

Suppressed delete operations

In Q Replication, you can choose to suppress replication of delete operations from the source table. If you do not replicate delete operations, rows that exist in the target table might not exist in the source table. When the SUPPRESS_DELETES value for a Q subscription is Y, the asntdiff utility ignores the rows that are unique to the target and reports no differences. A warning is issued to indicate how many rows were suppressed.

The asntdiff -f (input file) option does not support SUPPRESS_DELETES because it bases the table comparison on a SQL SELECT statement rather than the Q subscription definition.

Comparisons based on queries instead of subscriptions

The asntdiff -f command option enables you to do differencing by using SQL SELECT statements that are read from an input file. This option provides greater flexibility to do differencing between two generic tables. The asntdiff -f option does not use replication definitions to determine which tables and rows to compare as the standard asntdiff command does.

The asntdiff -f option works for all tables on Linux, UNIX, Windows, and z/OS. For details on this option, see asntdiff –f (input file) command option.

In addition to the SELECT statements, the input file contains the source and target database information, the difference table information, and optional parameters that specify methods for processing the differences. You can use a password file that is created by the asnpwd command to specify a user ID and password for connecting to the source and target databases.

Note: To compare Db2 XML columns by using the asntdiff -f option, you need to serialize the XML column as a character large-object (CLOB) data type by using the XMLSERIALIZE scalar function. For example, this SELECT statement in the input file compares the XMLColumn column in the source table Table 1 to the same column in another database table (the TARGET_SELECT would use the same function):


SOURCE_SELECT="select ID, XMLSERIALIZE(XMLColumn AS CLOB) AS XMLColumn
from Table1 order by 1"

Comparing a subset of table rows

You can use the asntdiff RANGECOL parameter to compare only some of the rows in the two tables. This parameter specifies a range of rows from the source table that are bounded by two timestamps. You provide the name of a DATE, TIME, or TIMESTAMP column in the source table, and then use one of three different clauses for specifying the range. When you compare tables that are involved in peer-to-peer replication, you can use the IBM®-generated IBMQREPVERTIME column for the source column in the range clause.

The RANGECOL parameter is not valid for the asntdiff -f (input file) option. You can use a SQL WHERE clause in the input file to achieve similar results.