Using constraints to improve query optimization

Consider defining unique, check, and referential integrity constraints. These constraints provide semantic information that allows the Db2® optimizer to rewrite queries to eliminate joins, push aggregation down through joins, push FETCH FIRST n ROWS down through joins, remove unnecessary DISTINCT operations, and perform a number of other optimizations.

Informational constraints can also be used for both check constraints and referential integrity constraints when the application itself can guarantee the relationships. The same optimizations are possible. Constraints that are enforced by the database manager when rows are inserted, updated, or deleted can lead to high system overhead, especially when updating a large number of rows that have referential integrity constraints. If an application has already verified information before updating a row, it might be more efficient to use informational constraints, rather than regular constraints. This type of informational constraint is also known as a NOT ENFORCED TRUSTED constraint.

For example, consider two tables, DAILY_SALES and CUSTOMER. Each row in the CUSTOMER table has a unique customer key (CUST_KEY). DAILY_SALES contains a CUST_KEY column and each row references a customer key in the CUSTOMER table. A referential integrity constraint could be created to represent this 1:N relationship between CUSTOMER and DAILY_SALES. If the application were to enforce the relationship, the constraint could be defined as informational. The following query could then avoid performing the join between CUSTOMER and DAILY_SALES, because no columns are retrieved from CUSTOMER, and every row from DAILY_SALES will find a match in CUSTOMER. The query optimizer will automatically remove the join.

SELECT AMT_SOLD, SALE PRICE, PROD_DESC
  FROM DAILY_SALES, PRODUCT, CUSTOMER
  WHERE
    DAILY_SALES.PROD_KEY = PRODUCT.PRODKEY AND
    DAILY_SALES.CUST_KEY = CUSTOMER.CUST_KEY

Informational constraints must not be violated, otherwise queries might return incorrect results. In this example, if any rows in DAILY_SALES do not have a corresponding customer key in the CUSTOMER table, the query would incorrectly return those rows.

Another type of informational constraint is the NOT ENFORCED NOT TRUSTED constraint. It can be useful to specify this type of informational constraint if an application cannot verify that the rows of a table will conform to the constraint. The NOT ENFORCED NOT TRUSTED constraint can be used to improve query optimization in cases where the Db2 optimizer can use the data to infer statistics from a statistical view. In these cases the strict matching between the values in the foreign keys and the primary keys are not needed. If a constraint is NOT TRUSTED and enabled for query optimization, then it will not be used to perform optimizations that depend on the data conforming completely to the constraint, such as join elimination.

When RI (referential integrity) tables are related by informational constraints, the informational constraints might be used in the incremental maintenance of dependant MQT data, staging tables, and query optimization. Violating an informational constraint might result in inaccurate MQT data and query results.

For example, parent and child tables are related by informational constraints, so the order in which they are maintained affects query results and MQT integrity. If there is data in the child table that cannot be related to a row in the parent table, an orphan row has been created. Orphan rows are a violation of the informational constraint relating that parent and child table. The dependent MQT data and staging tables associated with the parent-child tables might be updated with incorrect data, resulting in unpredictable optimization behaviour.

If you have an ENFORCED informational constraint, Db2 will force you to maintain RI tables in the correct order. For example, if you deleted a row in a parent table that would result in an orphan row, Db2 returns an SQL error and rolls back the change.

If you have a NOT ENFORCED informational constraint, you must maintain the integrity of the RI tables by updating tables in the correct order. The order in which parent-child tables are maintained is important to ensure MQT data integrity.

For example, you have set up the following parent and child table with a corresponding MQT:

create table parent (i1 int not null primary key, i2 int);
create table child (i1 int not null, i2 int);

alter table child add constraint fk1 foreign key (i2) references parent (i1) not enforced;
enable query optimization;

create table mqt1 as (select p.i1 as c1, p.i2 as c2, c.i1 as c3, count (*) as cnt from parent p, child c
where p.i1 = c.i2 group by p.i1, p.i2, c.i1) data 
initially deferred refresh immediate;

refresh table mqt1;

commit;

To insert rows into parent-child tables, you must insert rows into the parent table first.

insert into parent values (4, 4);
insert into child values(40, 4);

If rows are inserted into the child table first, orphan rows exist while there is no row in the parent table that matches the child row's foreign key. This violation, although temporary, might produce unpredictable behaviour during query optimization and MQT processing.

To remove rows from the parent table, you must remove the related rows from the child table first.

delete from child;
delete from parent;

If rows are removed from the parent table first, orphan rows are created when a child row's foreign key no longer matches a row key in the parent table. This results in a violation of the informational constraint between the parent and child tables. This violation, although temporary, might produce unpredictable behaviour during query optimization and MQT processing.