Statistical views
The Db2® cost-based optimizer uses an estimate of the number of rows processed by an access plan operator to accurately cost that operator. This cardinality estimate is the single most important input to the optimizer cost model, and its accuracy largely depends upon the statistics that the RUNSTATS command collects from the database.
- Comparisons involving expressions. For example,
price > MSRP + Dealer_markup
. - Relationships spanning multiple tables. For example,
product.name = 'Alloy wheels' and product.key = sales.product_key
. - Any relationships other than predicates involving independent attributes and simple comparison operations.
When a query is compiled, the optimizer matches the query to the available statistical views. When the optimizer computes cardinality estimates for intermediate result sets, it uses the statistics from the view to compute a better estimate.
Queries do not need to reference the statistical view directly in order for the optimizer to use the statistical view. The optimizer uses the same matching mechanism that is used for materialized query tables (MQTs) to match queries to statistical views. In this respect, statistical views are similar to MQTs, except that they are not stored permanently, do not consume disk space, and do not have to be maintained.
CREATE VIEW SV_TIME_FACT AS (
SELECT T.* FROM TIME T, SALES S
WHERE T.TIME_KEY = S.TIME_KEY)
ALTER VIEW SV_TIME_FACT ENABLE QUERY OPTIMIZATION
RUNSTATS ON TABLE DB2DBA.SV_TIME_FACT WITH DISTRIBUTION
Once
the view is enabled for optimization, it is identified as a statistical
view in the SYSCAT.TABLES catalog view with a 'Y' in position 13 of
the PROPERTY column.SELECT SUM(S.PRICE)
FROM SALES S, TIME T, PRODUCT P
WHERE
T.TIME_KEY = S.TIME_KEY AND
T.YEAR_MON = 200712 AND
P.PROD_KEY = S.PROD_KEY AND
P.PROD_DESC = ‘Power drill'
Without a statistical
view, the optimizer assumes that all fact table TIME_KEY values corresponding
to a particular TIME dimension YEAR_MON value occur uniformly within
the fact table. However, sales might have been strong in December,
resulting in many more sales transactions than during other months.Statistics that are gathered on queries that have complex expressions in the predicate can be used by the optimizer to calculate accurate cardinality estimates which results in better access plans.
For many star join queries, several statistical views might need to be created. You can use referential integrity constraints to reduce the number of views needed to obtain the required statistical information.
Another way to obtain better access plans is to apply column group statistics on statistical views. These group statistics help to adjust filter factors which help to gather more accurate statistics which the optimizer can use to obtain accurate cardinality estimates.
Statistics
can also be gathered automatically from statistical views through
the automatic statistics collection feature in Db2.
This new feature can be enabled or disabled by using the auto_stats_views database
configuration parameter. This database configuration parameter is
off by default and can be enabled by using the UPDATE DB
CFG command. The statistics collected by the automatic statistics
collection is equivalent to issuing the command runstats on
view view_name with distribution
.
Utility
throttling can be used for statistical views to restrict the performance
impact on the workload. For example, the command runstats
on view view_name util_impact_priority 10
contains
the impact on the workload within a limit specified by the util_impact_lim database
manager configuration parameter while statistics are collected on
statistical views.
A statistical view cannot directly or indirectly reference a catalog table.