Table compare utility (asntdiff)
The asntdiff command compares the columns in one table to their corresponding columns in another table and generates a list of differences between the two in the form of a Db2® table.
To use the compare utility, you run the asntdiff command and specify the name of a Q subscription (Q Replication) or subscription set member (SQL Replication) that contains the source and target tables that you want to compare. You can also use SQL statements in an input file to specify the tables to compare.
The following sections explain how to use the asntdiff command:
- Overview of the asntdiff command
- When to use the compare utility
- Where differences are stored
- Sample difference report
- Required authorizations
- Restrictions for key columns at source and target
- Data type considerations
- Effects of filtering
- Comparisons based on queries instead of subscriptions
- Comparing a subset of table rows
Overview of the asntdiff command
You can run the asntdiff command on Linux®, UNIX, Windows, and z/OS® operating systems. The command compares tables on Linux, UNIX, Windows, z/OS, or System i® operating systems. The asntdiff command can be used with federated sources and targets if the corresponding columns in the two tables have the same data types.
z/OS: The ASNTDIFF sample job in the SASNSAMP data set provides further information that is specific to the z/OS platform.
For Q Replication, the target must be a user copy table or a consistent-change-data (CCD) table that is condensed and complete. Stored procedure targets are not supported. For SQL Replication, the target must be a user table, point-in-time table, replica table, user-copy table, or consistent-change-data (CCD) table that is condensed and complete.
For SQL Replication, the Apply Control Server and the target server must be the same.
When you run the command, you specify an SQL WHERE clause that uniquely identifies the Q subscription or subscription set member:
- Q Replication
- The WHERE clause identifies a row in the IBMQREP_SUBS control
table at the Q Capture server, based on the value of the SUBNAME column.
For example:
where="subname = 'my_qsub'"
- SQL Replication
- The WHERE clause identifies a row in the IBMSNAP_SUBS_MEMBR table
at the Apply control server, based on the value of the SET_NAME column.
For example:
You might need to use more predicates in the WHERE clause to uniquely identify the subscription set member. For example, you might need to add the APPLY_QUAL, the SOURCE_OWNER, the TARGET_OWNER, or the TARGET_TABLE column from the IBMSNAP_SUBS_MEMBR table to the clause.where="set_name = 'my_set' and source_table='EMPLOYEE'"
When to use the compare utility
The best time to use the utility is when the source and target tables are stable. You might want to run the utility when the Q Capture and Q Apply programs or Capture and Apply programs are idle. For example, you could run the utility when the Q Capture program reached the end of the Db2 recovery log and all changes are applied at the target. If applications are still updating the source, the comparison might not be accurate.
If the replication programs are running, you might need to run the asntdiff command more than once to get a complete picture of evolving differences between the source and target tables.
Where differences are stored
The asntdiff command creates a difference table in the source database or subsystem to store differences that it finds.
The difference table is named schema.ASNTDIFF, where schema is the value specified in the DIFF_SCHEMA parameter. If the schema is not specified, it defaults to ASN. You can also use the DIFF parameter to specify a table name.
By default, the difference table is created in the default Db2 user table space. You can specify a different, existing table space by using the DIFF_TABLESPACE parameter.
The difference table has two or more columns. One column is named DIFF, with a blank space at the end on Linux, UNIX, and Windows. The value in the DIFF column is a character that indicates an insert, update, or delete operation followed by a numeric value that indicates which table contains a row with differences. The other columns contain the value of replication key columns. There is one row in the difference table for each unmatched row in the target table.
The difference table uses three identifiers that indicate the operation that is needed to change the target table so that it matches the source table:
D
(delete)- Indicates that a row with the key value exists only at the target and not at the source.
U
(update)- Indicates that rows with the same key value exist at both the source and target, but at least one non-key column is different at the target.
I
(insert)- Indicates that a row with the key value exists only at the source and not at the target.
A value of ? 1 indicates that there is an invalid character in one or more source columns.
A value of ? 2 indicates that there is an invalid character in one or more target columns.
Example:
The following list of values is returned by comparing an EMPLOYEE table at the source with a target copy of the same table. The key column for replication is the employee number, EMPNO:
DIFF EMPNO
U 2 000010
I 2 000020
I 2 000040
D 2 000045
I 2 000050
D 2 000055
The first row in the example shows that a row with
the key value 000010
exists at both the source and
target tables, but at least one non-key column at the target has a
different value. The next two rows show that rows with the key values 000020
and 000040
exist
only at the source. The fourth row shows that a row with the key value 000045
exists
only at the target.
The values ? 1
and ?
2
are not shown in the example.
Sample difference report
After a table compare, the asntdiff command outputs a summary report to the console. The following example shows a sample difference report.
---------------------| Difference Report |---------------------
Start time: 2011-10-24-11.06.32.890004
End time: 2011-10-24-11.07.23.563205
Duration: 0.60042 second
SOURCE_SELECT: SELECT * FROM src_table WHERE val!= 0.331 ORDER BY 2
TARGET_SELECT: SELECT id, CAST(val AS DEC(9,3)) FROM trg_table
WHERE CAST(val AS DEC(9,3)) != 0.330 ORDER BY 2
TDIFF table: "asn"."diff_table"
Summary of compare
------------------
Number of different rows: 1
Number of common rows: 4
Difference details (rows)
-------------------------
Source total: 5
Target total: 5
Source only: 0
Target only: 0
Update: 1
Here is more information about the "Summary of compare" and "Difference details (rows)" sections of the report:
- Number of different rows: The number of rows with differences that were identified ("Source only" + "Target only" + "Update").
- Number of common rows: The number of rows that have the same key and non-key values between the source result set and target result set.
- Source total: The number of rows in the source table (or the result set of the SOURCE_SELECT parameter).
- Target total: The number of rows in the target table (or the result set of the TARGET_SELECT parameter).
- Source only: The number of rows in the source result set and not in the target result set.
- Target only: The number of rows in the target result set and not in the source result set.
- Update: The number of rows in the two result sets with the same key values but differences in the non-key values.
Required authorizations
These database authorizations are required for the compare utility:
- Access privileges to the tables that are being compared, and to the replication control tables unless the -f (file) option is used
- Linux, UNIX, Windows: Read privileges for the password file if the PWDFILE keyword is used
- WRITE privileges for the directory that is specified by the DIFF_PATH keyword
- To create the difference table, CREATETAB authority on the source
database and USE privilege on the table space. In addition, one of
the following privileges is needed:
- IMPLICIT_SCHEMA authority on the database, if the implicit or explicit schema name of the table does not exist
- CREATEIN privilege on an existing schema if the table is created in this schema
- DROPIN privilege on the schema to drop the difference table unless DIFF_DROP=N is used
- SELECT, DELETE, and INSERT privileges on the difference table (at the source). The default schema name is ASN and the default table name is ASNTDIFF.
Restrictions for key columns at source and target
The asntdiff utility supports multiple-byte character sets when the database is defined with SYSTEM or IDENTITY. However, the columns that are used as keys for replication at the source and target tables must use single-byte characters for the utility to compare the tables.
In a Linux, UNIX, or Windows database that uses Unicode, the characters in key data cannot be greater than the base U.S. English ASCII subset (first 256 ASCII characters) or the asntdiff utility cannot compare the tables.
Data type considerations
You need to consider the data types of the tables that you are comparing when using asntdiff.
- Different data types in sources and targets
- The compare utility can build two SELECT SQL statements that are
based on the description of a subscription. To obtain the differences
between the source and target tables, the utility compares the data
that result from executing both statements. The data types and lengths of the columns for both SQL statements
must be the same.
- SQL Replication
- The utility builds the SQL statement for the source by using the EXPRESSION column in the IBMSNAP_SUBS_COLS table.
- Q Replication
- The data types for both the source and the target must be the same.
- Unsupported data types
- The compare utility does not support comparisons between the following
data types:
- Nonkey columns
- DECFLOAT, BLOB_FILE, CLOB_FILE, DBCLOB_FILE
- Key columns
- DECFLOAT, BLOB, CLOB, DBCLOB, VARGRAPHIC, GRAPHIC, LONG_VARGRAPHIC, BLOB_FILE, CLOB_FILE, DBCLOB_FILE, XML
- Comparing the GRAPHIC data type
- Columns with the GRAPHIC data type at the source and target might
not match when you use the utility to compare the source and target
tables. Db2 columns with the
GRAPHIC data type have blank padding after the graphic data. This
padding might be single-byte or double-byte spaces, depending on the
code page that the database was created in. This padding might cause
data to not match between the source and the target tables, especially
if the source and target tables are in different code pages. This
padding applies only to GRAPHIC data types and not other graphic data
types such as VARGRAPHIC or LONG VARGRAPHIC.
To compare columns with GRAPHIC data types, you must remove the blank padding in the data before you compare the source and target tables by using the Db2 scalar function
rtrim(<column>
. This function eliminates the code page differences for single-byte or double-byte spaces and ensures that the utility compares the GRAPHIC data in a consistent manner. - TIMESTAMP WITH TIMEZONE restriction
- The compare utility does not support comparisons that involved the TIMESTAMP WITH TIMEZONE data type that was introduced in Db2 for z/OS Version 10.
Effects of filtering
In some cases, differences between source and target tables are intentional, for example, if you use a search condition in Q Replication to filter which rows are replicated. The utility will not show differences between source and target tables that are a result of predicates or suppressed deletes.
- Row filtering
- The compare utility uses information from the replication control
tables to avoid showing intentional differences:
- SQL Replication
- The utility uses the PREDICATES column in the IBMSNAP_SUBS_MEMBR table to select rows from the source tables. The value of the UOW_CD_PREDICATES column is ignored (asntdiff looks directly at the source table, where the Apply program looks at the CD table).
- Q Replication
- The utility uses the value of the SEARCH_CONDITION column in the IBMQREP_SUBS table to build the WHERE clause for the SELECT statement.
- Suppressed delete operations
- In Q Replication, you can choose to suppress replication of delete
operations from the source table. If you do not replicate delete operations,
rows that exist in the target table might not exist in the source
table. When the SUPPRESS_DELETES value for a Q subscription is Y,
the asntdiff utility ignores the rows that are unique to the target
and reports no differences. A warning is issued to indicate how many
rows were suppressed.
The asntdiff -f (input file) option does not support SUPPRESS_DELETES because it bases the table comparison on a SQL SELECT statement rather than the Q subscription definition.
Comparisons based on queries instead of subscriptions
The asntdiff -f command option enables you to do differencing by using SQL SELECT statements that are read from an input file. This option provides greater flexibility to do differencing between two generic tables. The asntdiff -f option does not use replication definitions to determine which tables and rows to compare as the standard asntdiff command does.
The asntdiff -f option works for all tables on Linux, UNIX, Windows, and z/OS. For details on this option, see asntdiff –f (input file) command option.
In addition to the SELECT statements, the input file contains the source and target database information, the difference table information, and optional parameters that specify methods for processing the differences. You can use a password file that is created by the asnpwd command to specify a user ID and password for connecting to the source and target databases.
SOURCE_SELECT="select ID, XMLSERIALIZE(XMLColumn AS CLOB) AS XMLColumn
from Table1 order by 1"
Comparing a subset of table rows
You can use the asntdiff RANGECOL parameter to compare only some of the rows in the two tables. This parameter specifies a range of rows from the source table that are bounded by two timestamps. You provide the name of a DATE, TIME, or TIMESTAMP column in the source table, and then use one of three different clauses for specifying the range. When you compare tables that are involved in peer-to-peer replication, you can use the IBM®-generated IBMQREPVERTIME column for the source column in the range clause.
The RANGECOL parameter is not valid for the asntdiff -f (input file) option. You can use a SQL WHERE clause in the input file to achieve similar results.