How DB2 simplifies join operations

DB2® can sometimes simplify join operations to improve access path efficiency.

Introductory concepts

Ways to join data from more than one table

Begin program-specific programming interface information.

However, because full outer joins are less efficient than left or right joins, and left and right joins are less efficient than inner joins, the recommendations is to always try to use the simplest type of join operation in your queries.

Simplification for predicates that eliminate null values

DB2 can simplify a join operation when the query contains a predicate or an ON clause that eliminates the null values that are generated by the join operation.

ON clauses that eliminate null values

Consider the following query:

SELECT * FROM T1 X FULL JOIN T2 Y
  ON X.C1=Y.C1
  WHERE X.C2 > 12;

The outer join operation yields these result table rows:

The rows with matching values of C1 in tables T1 and T2 (the inner join result)
The rows from T1 where C1 has no corresponding value in T2
The rows from T2 where C1 has no corresponding value in T1

However, when you apply the predicate, you remove all rows in the result table that came from T2 where C1 has no corresponding value in T1. DB2 transforms the full join into a left join, which is more efficient:

SELECT * FROM T1 X LEFT JOIN T2 Y
  ON X.C1=Y.C1
  WHERE X.C2 > 12;

Predicates that eliminate null values

In the following statement, the X.C2>12 predicate filters out all null values that result from the right join:

SELECT * FROM T1 X RIGHT JOIN T2 Y
  ON X.C1=Y.C1
  WHERE X.C2>12;

Therefore, DB2 can transform the right join into a more efficient inner join without changing the result:

SELECT * FROM T1 X INNER JOIN T2 Y
  ON X.C1=Y.C1
  WHERE X.C2>12;

Predicates that follow join operations

The predicate that follows a join operation must have the following characteristics before DB2 transforms an outer join into a simpler outer join or inner join:

The predicate is a Boolean term predicate.
The predicate is false if one table in the join operation supplies a null value for all of its columns.

The following predicates are examples of predicates that can cause DB2 to simplify join operations:

T1.C1 > 10
T1.C1 IS NOT NULL
T1.C1 > 10 OR T1.C2 > 15
T1.C1 > T2.C1
T1.C1 IN (1,2,4)
T1.C1 LIKE 'ABC%'
T1.C1 BETWEEN 10 AND 100
12 BETWEEN T1.C1 AND 100

ON clauses that eliminate unmatched values

This examples shows how DB2 can simplify a join operation because the query contains an ON clause that eliminates rows with unmatched values:

SELECT * FROM T1 X LEFT JOIN T2 Y
  FULL JOIN T3 Z ON Y.C1=Z.C1
  ON X.C1=Y.C1;

Because the last ON clause eliminates any rows from the result table for which column values that come from T1 or T2 are null, DB2 can replace the full join with a more efficient left join to achieve the same result:

SELECT * FROM T1 X LEFT JOIN T2 Y
  LEFT JOIN T3 Z ON Y.C1=Z.C1
  ON X.C1=Y.C1;

Full outer joins processed as left outer joins

In one case, DB2 transforms a full outer join into a left join when you cannot write code to do it. This is the case where a view specifies a full outer join, but a subsequent query on that view requires only a left outer join.

For example, consider the view that is created by the following statement:

CREATE VIEW V1 (C1,T1C2,T2C2) AS
  SELECT COALESCE(T1.C1, T2.C1), T1.C2, T2.C2
  FROM T1 X FULL JOIN T2 Y
  ON T1.C1=T2.C1;

This view contains rows for which values of C2 that come from T1 are null. However, if you execute the following query, you eliminate the rows with null values for C2 that come from T1:

SELECT * FROM V1
  WHERE T1C2 > 10;

Therefore, for this query, a left join between T1 and T2 would have been adequate. DB2 can execute this query as if the view V1 was generated with a left outer join so that the query runs more efficiently.

Removal of unneeded tables in left outer joins

When an SQL statement contains a left outer join, but does not select any columns from the right side of the join, DB2 can remove the join from the statement.

The right table is unneeded if either of the following conditions are true:

A unique index exists on the join key column of the right table.
The statement specifies SELECT DISTINCT.

For example consider the following statement:

SELECT DISTINCT T1.C3 
FROM T1 LEFT OUTER JOIN T2
ON T1.C2 = T2.C2
WHERE T1.C1 = ?

Because the statement specified SELECT DISTINCT, the reference to the right table is unneeded, and DB2 can select an access path for the following statement instead:

SELECT DISTINCT T1.C3
FROM T1
WHERE T1.C1 = ?

Because all references to the right table have been removed from the statement, the PLAN_TABLE output and access path diagrams for the statement contain no references to the table.

End program-specific programming interface information.