Change to implementation of some shared common table expressions for SQL queries

On IBM® i 7.5, the query optimizer has changed how it implements queries in which a Common Table Expression (CTE) is referenced multiple times within that same query. This change increases compatibility with the SQL standard, and it may also affect the behavior of some queries that use CTEs. The following detail applies only to queries that reference a CTE multiple times.

On prior releases, the optimizer had two options for implementing these CTEs. The optimizer would choose the option that it estimated would run the fastest.

The options were:

  1. Capture
    "Capture" results set of the CTE. The full CTE is run once by itself, and the resulting data is stored in a temporary data structure to be shared wherever the CTE is referenced in that query.
  2. Merge
    "Merge" each CTE as defined into the tree and handle each CTE reference separately. This would treat each CTE reference as its own entity, duplicating the underlying tree and allowing the optimizer to decide how each reference to that tree is run based on the context of that reference.

It has been determined that option (2) is not consistent with the SQL standard. Because the CTE's underlying tree is duplicated, changes to underlying tables while the query is running may cause a CTE reference to return a different result set. This could cause unexpected results, as an individual CTE is expected to return the same data wherever it is referenced.

To provide consistent query results and to ensure compliance with the SQL standard, the query optimizer will now always use option (1) by default. As a side-effect, some CTE queries that were previously implemented with option (2) may run slower. In other cases, run time may be improved.

In all cases, the CTE query will return predictable results.

Capturing the results set into a temporary data structure also prevents the use of a sensitive cursor or ALWCPYDTA(*NO) with these queries. Queries that previously were able to run in such an environment because they used option (2) will need to be re-written, or they will now fail with SQLCODE -243 or SQLCODE -527.

If the underlying tables are known not to be changing during query execution and the performance of option (1) is not acceptable, the DETERMINISTIC keyword may be added to the CTE definition. This will allow the optimizer to consider option (2) for these CTEs. If the underlying data does change, the use of the DETERMINISTIC clause may result in behavior that is not consistent with the SQL standard definition for CTEs. Users who implement DETERMINISTIC CTEs assume this risk. The DETERMINISTIC keyword is available in PTFs SI84058 for IBM i 7.5 and SI84057 for IBM i 7.4

For more information on detecting shared CTEs, see New database monitor field indicating shared common table expressions for SQL queries.