Because the Optimization Advisor always recommends MQTs with summarized data, the term summary table is used in the DB2 Warehouse documentation to describe the recommended MQTs.
You can complete expensive calculations and joins for your queries ahead of time and store that data in a summary table. When you run queries that can use the precomputed data, DB2 will reroute the queries to the summary table. A query does not need to match the precomputed calculations exactly. If you use simple analytics like SUM and COUNT, DB2 can dynamically aggregate the results from the precomputed data. Many different queries can be satisfied by one summary table. Using summary tables can dramatically improve query performance for queries that access commonly used data or that involve aggregated data over one or more dimensions or tables.
Figure 1 shows a cube model based on a star schema with a Sales facts object and Product, Time, and Retailer dimensions. The facts object has measures and attributes, and each dimension has a set of attributes and is joined to the facts object by a facts-to-dimension join.
SELECT PRODUCT_TYPE_CODE, REGION_EN, CURRENT_MONTH, SUM(SALE_TOTAL)
FROM TIME, PRODUCT, RETAILER, REGION, SALES
WHERE SALES.ORDER_DAY_KEY=TIME.DAY_KEY
AND SALES.RETAILER_SITE_KEY=RETAILER.RETAILER_SITE_KEY
AND SALES.PRODUCT_KEY=PRODUCT.PRODUCT_KEY
AND RETAILER.COUNTRY_KEY=REGION.COUNTRY_KEY
AND CURRENT_YEAR=2004
GROUP BY PRODUCT_TYPE_CODE, CURRENT_MONTH
The
thin line connecting the Type-Region-Month slice in Figure 2 represents the slice
that the query accesses. Type-Region-Month is a slice of the cube
model and includes one level from each hierarchy. You can define summary
tables to satisfy queries at or above a particular slice. A summary
table can be built for the Type-Region-Month slice that is accessed
by the query. Any other queries that access data at or above that
slice including All Time, Year, Quarter, All Retailers, All Products,
and Line can be satisfied by the summary table with some additional
aggregating. However, if you query more detailed data below the slice,
such as Day or Retailer, DB2 cannot
use the summary table for this more granular query. The rewriter in the DB2 SQL compiler knows about existing summary tables, and can automatically rewrite queries to read from the summary table instead of the base tables. Rewritten queries are typically much faster because the summary tables are usually much smaller than the base tables and contain pre-aggregated data. Users continue to write queries against the base tables. DB2 will decide when to use a summary table for a particular query and will rewrite the user's query to access the summary tables instead, as shown in Figure 4. The rewritten query accesses a summary table that contains pre-aggregated data. A summary table is often significantly smaller, and therefore significantly faster, than the base tables and returns the same results as the base tables.
SELECT PRODUCT_TYPE_CODE, REGION_EN, CURRENT_MONTH, SUM(SALE_TOTAL)
FROM TIME, PRODUCT, RETAILER, REGION, SALES
WHERE SALES.ORDER_DAY_KEY=TIME.DAY_KEY
AND SALES.RETAILER_SITE_KEY=RETAILER.RETAILER_SITE_KEY
AND SALES.PRODUCT_KEY=PRODUCT.PRODUCT_KEY
AND RETAILER.COUNTRY_KEY=REGION.COUNTRY_KEY
AND CURRENT_YEAR=2004
GROUP BY PRODUCT_TYPE_CODE, CURRENT_MONTH
The rewritten
query is here:SELECT PRODUCT_TYPE_CODE, REGION_EN, CURRENT_MONTH, SUM(SALE_TOTAL)
FROM SUMMARYTABLE1
WHERE CURRENT_YEAR = '2004'
GROUP BY PRODUCT_TYPE_CODE, CURRENT_MONTH;
The rewritten query is much simpler and quicker for DB2 to complete because the data is pre-aggregated and many of the table joins are precomputed so DB2 accesses one small table instead of six tables, including a large fact table. The savings with summary tables can be tremendous, especially for schemas that have large fact tables. For example, a fact table with 1 billion rows might be pre-aggregated to a summary table with only 1 million rows, and the calculations involved in this aggregation occur only once instead of each time a query is issued. A summary table that is 1000 times smaller is much faster than accessing the large base tables.
Figure 5 shows the summary table for the Type-Country-Month slice. DB2 needs to calculate data for Region from the higher level Country instead of from the lower level Retailer, so the summary table has fewer rows than the base tables because there are fewer countries than retailers. DB2 does not need to perform any additional calculations to return sales data by Month and Type because the data is already aggregated at these levels. This query is satisfied entirely by the data in the summary table that joins the tables used in the query ahead of time and the joins do not need to be performed at the time the query is issued. For more complex queries, the performance gains can be dramatic.
In some cases, a query might access an attribute that is related to an attribute that is included in the summary table. The DB2 optimizer can use functional dependencies and constraints to dynamically join the summary table with the appropriate dimension table.
When the Optimization Advisor recommends a summary table, all of the measures in the cube model are represented. In the cube model in Figure 1, the SalesFacts object has only five measures including Sales, Total cost, Quantity, Gross profit, and Gross margin, which are all included in the summary table. Sometimes the Advisor can represent a set of measures using a smaller number of columns. For example, multiple measures may reference common expressions. The Optimization Advisor does not need to include all of the related attributes that are defined for a level in the summary table because DB2 can exploit semantic information such as constraints to determine how to join from the summary table back to the dimension tables to retrieve necessary information.