IBM Support

How do organizing keys work in Db2 Analytics Accelerator and how they differ in V5 and V7

White Papers


Abstract

The concept of an organizing key has existed in Db2 Analytics Accelerator for many years. It exists in the previous accelerator generation (which is Db2 Analytics Accelerator V5 and earlier versions) and in the current accelerator generation (which is Db2 Analytics Accelerator V7).
However, the feature has been implemented differently in either version. The implementation differences make it necessary to also handle organizing keys in a different way, depending on your product version. This document is useful for readers who are already familiar with organizing keys in product version 5, and want to continue to use them in version 7. It is also useful for readers who want to start using organizing keys in either product.

Content

An organizing key can have an impact on the query response time. When choosing an organizing key, you specify columns of an accelerator-shadow table or accelerator-archive table for the clustering of the table records. This creates grouped segments or blocks of rows with equal or nearby values in the columns selected as organizing keys. If an incoming SQL query references one of the organizing key columns in a range or an equality predicate, the query can run much faster because entire blocks can be skipped during the table scan if the underlying database system can determine that none of the records in a block satisfy the predicate.

For more information on organizing keys see here: Organizing keys

Organizing key implementation in Db2 Analytics Accelerator V5 (or lower)

Db2 Accelerator Accelerator V5 is based on IBM® Netezza® technology. Netezza uses a concept of so-called clustered base table. A clustered base table (CBT) is a table that contains data that is organized by using an organizing key, which comprises between one and four columns.

When a Db2 Analytics Accelerator user specifies an organizing key, the table is altered to a clustered base table and appropriate reorganization actions (GROOM) take place to group the records within the table and save them in the same or nearby blocks (extents).

Netezza also creates zone maps for the organizing columns to accelerate the performance of queries on that table. Zone maps summarize the range of values inside the columns of the records that are saved in an extent; the minimum and maximum values of all rows within the extent are stored. During query processing, the minimum and maximum values are used to decide whether extents can be skipped because the row values in the extent do not match the query predicate. An organizing key helps to narrow the range of values within the extent by grouping the data by the values of the organizing key columns. That means that after the grouping, the minimum and maximum values are close to each other and the value ranges (intervals) of an extent have fewer and smaller overlaps with the value ranges of other extents. If the data is well organized within the extent, and the zone maps have smaller “ranges” of values, queries run faster because Netezza can skip the extents that contain unrelated data, and direct its resources to processing the data that matches the query.

Click the links for more information on a CBT and zone maps.

Organizing key implementation in Db2 Analytics Accelerator V7

Db2 Analytics Accelerator V7 is based on Db2® Warehouse as the underlying database and acceleration technology. In Db2 Warehouse, the concept of a clustered based table does not exist.

When a Db2 Analytics Accelerator user specifies an organizing key using the ACCEL_ALTER_TABLES stored procedure or a graphical user interface (for example Db2 Analytics Accelerator Studio),  the data on the accelerator is sorted (one-dimensional sort) according to the specified columns in the organizing key and inserted into a new table. For example, if an organizing key is set on the columns C1 and C2, an ‘INSERT INTO T2 SELLECT * FROM T1 ORDER BY C1, C2’ command is executed on the accelerator.

Db2 Warehouse maintains a so-called synopsis table for each user table. A synopsis table contains metadata that describes the ranges of values that exist and where they are located in the user table. A synopsis table stores minimum and maximum values for a sequence of 1000 rows. Such a sequence is called a block. An organizing key helps to narrow the range of values within a block by sorting the data by the values of the organizing key columns. That means that after the sorting, the minimum and maximum values are close to each other and the value ranges (intervals) of a block have fewer and smaller overlaps with the value ranges of other blocks. This enables Db2 to skip blocks of a table when scanning data to answer a query if the query references the organizing key columns. If the data is sorted, more blocks of the table can be skipped.

Follow this link for more information on synopsis tables: Synopsis Tables

As described in the previous sections, the implementation of organizing keys is different for both product versions. In version 5, an organizing key is defined at the table level, which means that the Netezza database management system takes care of maintaining the table clustering by executing groom or reorg operations. In version 7, an organizing key cannot be defined at the table level, and therefore requires an explicit sorting of the table when you define or change the organizing key. A sorting is also required to maintain the organizing key, for example after many changes have been made to the table (for example by replication or after a reload). 

The different implementations lead to differences in the methods that users apply to define, change, and maintain an organizing key for a table. In the following sections, these differences are described.

The following user interfaces are available to define, change, or maintain organizing keys:

  • Stored procedure SYSPROC.ACCEL_ALTER_TABLES
  • Graphical user interfaces (which call the SYSPROC.ACCEL_ALTER_TABLES stored procedure):
    • ‘Alter Keys’ dialog in Db2 Analytics Accelerator Studio (Data Studio plugin)
    • ‘Alter Keys’ dialog in Data Server Manager

Defining, changing or maintaining an organizing key for an accelerator-shadow table

Db2 Analytics Accelerator V5:

You can define or change an organizing key on an accelerator-shadow table:

  • After the table has been added to the accelerator
  • After the table has been added to and loaded on the accelerator

In both cases, the data in the table is clustered by an asynchronous background reorganization process after the table has been loaded on the accelerator. The reorganization is not part of executing the SYSPROC.ACCEL_ALTER_TABLES[1] stored procedure. This means that the stored procedure gives control back to the calling program right after altering the table to define the organizing key. If a query is run after the stored procedure has given control back to the calling program, the query might not benefit from the organizing key because the asynchronous process hasn’t finished yet. It is therefore recommended that you postpone query execution for a few hours until the  process has finished. The duration of the asynchronous process depends on the amount of data in the table.

Using a graphical user interface, such as Db2 Analytics Accelerator Studio, the progress of the reorganization can be monitored by observing the ‘Organized %’ value that is displayed for the table. If the value 100% is shown, the reorganization has finished. The stored procedure SYSPROC.ACCEL_GET_TABLES_INFO returns this information as well.

An organizing key can be defined or changed for a replication-enabled table without impacting the replication process for this or the  other tables that are part of the subscription.

To maintain the organizing key for the table, no dedicated user actions are required. If you reload the table fully or partially, or if the replication process makes more than a certain number of changes to the table, an automatic reorganization will be started in the background to maintain the table clustering according to the defined organizing key.

Db2 Analytics Accelerator V7:

You can define or change an organizing key on an accelerator-shadow table:

  • After the table has been added to and loaded on the accelerator

The data in the table is sorted according to the organizing key columns and inserted into a new table on the accelerator. This is a synchronous action, which is carried out as part of the SYSPROC.ACCEL_ALTER_TABLES stored procedure call. This means that the stored procedure gives control back to the calling program after the table has been sorted. For large tables with a size of several TBs, this can take a few hours. If a query is executed after the stored procedure has given control back to the calling program, the query immediately benefits from the defined organizing key. If you run a query while the sorting is still ongoing, the query uses the previous version of the table, and therefore does not benefit from the organizing key.

Important: If you use Db2 Analytics Accelerator Studio, make sure that you do not close Data Studio during the sorting. Otherwise the sorting is stopped and rolled back.

If you use a graphical user interface, such as Db2 Analytics Accelerator Studio, you can monitor the progress of the sorting from the ‘List Tasks’ window. Look for an ‘Alter Keys’ task. However,  the Progress column of the 'List Tasks' window does not report the sorting progress of an ‘Alter Keys’ task correctly if you use Db2 Analytics Accelerator Studio V7.5.2 or lower.  The fix for APAR PH27646 (IBM Db2 Analytics Accelerator Studio does not show progress indicator correctly for altering or setting an organizing key) is available with Db2 Analytics Accelerator Studio V7.5.3.

If you want to set an organizing key for a replication-enabled table, you must first stop replication. This impacts all replication-enabled tables.

To maintain the organizing key for the table, you have to intervene manually. If you reload the table fully or partially, or the replication process updated the table many times, you must re-sort the table data according to the already defined organizing key. To this end, run the SYSPROC.ACCEL_ALTER_TABLES stored procedure or the ‘Alter Keys’ function from a graphical user interface. If you run the stored procedure with the organizing key that is already set, the table data is re-sorted. However, if you run the ‘Alter Keys’ function of a graphical user interface (for example Db2 Analytics Accelerator Studio), this does not happen unless you have installed a fix for APAR PH27644 (IBM Db2 Analytics Accelerator Studio does not execute Alter Keys if the keys are not changed). Db2 Analytics Accelerator Studio V7.5.3 fixes this defect.

If your queries require an organizing key for improved performance, consider the following recommendations for maintaining an organizing key:

  • After a full table reload, run the ‘Alter Keys’ function or the SYSPROC.ACCEL_ALTER_TABLES stored procedure to set the same organizing key. This will re-sort the table data according to the organizing key.
  • If you reload a table partially, only the data of the newly loaded partitions is unsorted. The data of all other partitions is still sorted according to the organizing key. Re-sort the complete data only if the majority of the partitions were reloaded, or if you notice a performance degradation of queries that reference the table.
  • If the table is replication-enabled and incremental updates occur, re-sort the complete data only if you notice a performance degradation of queries that reference the table.

At the time of this writing, the ‘Organized %’ column (as shown on a graphical user interface or as part of the result set returned by the SYSPROC.ACCEL_GET_TABLES_INFO stored procedure) does not show how well organized or sorted a table is. This makes it hard to tell whether a re-sorting is necessary. APAR PH27645 (IBM Db2 Analytics Accelerator Studio does not show the status of an organizing key on a table) has been opened to address this issue.

Defining, changing or maintaining an organizing key for an accelerator-archive table

An accelerator-archive table consists of two tables on the accelerator. One table (let’s call it the active table) contains the data of the active partitions of your Db2 for z/OS table. The other table (let’s call it the archived table) contains the data of historical partitions of your Db2 for z/OS table, which have been archived to the accelerator. The accelerator handles both tables internally, so that the user sees just a single accelerator-archive table. However, the fact that there are actually two tables is important for the handling of organizing keys.

Db2 Analytics Accelerator V5:

You can define an organizing key on an accelerator-archive table:

  • A) After the table has been added to the accelerator
  • B) After the table has been added and active partitions have been loaded or history partitions have been archived to the accelerator (with limitations) or both

As described in a previous section in this document the data within a table is clustered in an asynchronous background reorganization action after a load or archive action took place on the accelerator-archive table.

For defining an organizing key on an accelerator-archive table it is recommended to use option A). This ensures that the key is defined on the active table and on the archive table. If you go with option B) and define an organizing key after partitions have been archived,  the organizing key is only defined for the active table and not for the archive table. This is a limitation. The impact might be that the defined organizing key may not lead to the expected query performance improvement since the archive table is not clustered according to the organizing key columns.

If you want to change an organizing key on an accelerator-archive table then specifying a new organizing key using the ‘Alter Keys’ function of a graphical user interface or the ACCEL_ALTER_TABLES stored procedure changes the organizing key of the active table only.[2] To apply the organizing key change to the active and the archived table, use one of the following procedures:

If you use one accelerator
  1. Restore the archived partitions to Db2 for z/OS using the ‘Storage Saver’ function of a graphical user interface or run the SYSPROC.ACCEL_RESTORE_ARCHIVE_TABLES stored procedure.
  2. Define the (changed) organizing key for the accelerator-archive table
  3. Archive the history partitions and/or load the active partitions.
If you use more than one accelerator, and the accelerator-archive table exists on more than one accelerator, and has the same status on the accelerators in terms of which partitions are archived and which partitions are loaded.
  1. Remove the accelerator-archive table from one accelerator.
  2. Add the accelerator-archive table to the accelerator again.
  3. Define the (changed) organizing key for the accelerator-archive table.
  4. Archive the history partitions and/or load the active partitions.
  5. Repeat the previous steps for the other accelerators.

To maintain the organizing key for the accelerator-archive table, an intervention on your part is not required. If you reload the table (either full table reload or partial reload), archive additional partitions, or if the table was updated replicated many times, an automatic background reorganization process maintains the clustering according to the defined organizing key.

Db2 Analytics Accelerator V7:

You can define an organizing key for an accelerator-archive table:

  • After the table has been added, and active partitions have been loaded or history partitions have been archived

As described in the previous section, the data is sorted by a synchronous process, which is part of the SYSPROC.ACCEL_ALTER_TABLES stored procedure call.

If you use Db2 Analytics Accelerator V7.5.1, the organizing key is valid for the active table, but not for the archived table. This is a limitation. The impact might be that the organizing key does not lead to the expected performance improvement because the archived table is not sorted according to the organizing key columns.

Starting with V7.5.2, the organizing key is valid for both, the active table and the archived table. Hence the data of both tables is resorted as part of running the SYSPROC.ACCEL_ALTER_TABLES stored procedure. Mind that it is not possible to re-sort just one of the tables. Queries receive a maximum benefit from an organizing key, no matter if they reference just the active data, or the archived and the active data.

Important: If you use Db2 Analytics Accelerator Studio, make sure that you do not close Data Studio during the sorting. Otherwise the sorting is stopped and rolled back.  

If you use a graphical user interface, such as Db2 Analytics Accelerator Studio, you can monitor the progress of the sorting from the ‘List Tasks’ window. Look for an ‘Alter Keys’ task. However,  the Progress column of the 'List Tasks' window does not report the sorting progress of an ‘Alter Keys’ task correctly if you use Db2 Analytics Accelerator Studio V7.5.2 or lower.  APAR PH27646 (IBM Db2 Analytics Accelerator Studio does not show progress indicator correctly for altering or setting an organizing key) remedies this issue and is fixed with Db2 Analytics Accelerator Studio V7.5.3.

To maintain the organizing key for the table, you have to intervene manually. If you reload the table fully or partially, or if you archive additional partitions, you must re-sort the table data according to the already defined organizing key. To this end, run the SYSPROC.ACCEL_ALTER_TABLES stored procedure or the ‘Alter Keys’ function from a graphical user interface. If you run the stored procedure with the organizing key that is already set, the table data is re-sorted. However, if you run the ‘Alter Keys’ function of a graphical user interface (for example Db2 Analytics Accelerator Studio), this does not happen unless you have installed a fix for APAR PH27644 (IBM Db2 Analytics Accelerator Studio does not execute Alter Keys if the keys are not changed). Db2 Analytics Accelerator Studio V7.5.3 fixes this APAR.

If your queries require an organizing key for improved performance, consider the following recommendations for maintaining an organizing key (note that V7.5.2 or later is required for the sorting of the archived table):

  • If you reload active partitions (partial reload), only the newly loaded partitions are unsorted. The other active partitions and also the archived partitions are still sorted according to the previously applied organizing key. Re-sort the data in both, the active and the archived table only if you have many partitions with newly loaded data and notice a performance degradation of queries that reference the accelerator-archive table.
  • If you archive additional partitions, re-sort both, the active and the archive table only if due to too many unsorted partitions, you notice a performance degradation of queries that reference the accelerator-archive table.

At the time of this writing, the ‘Organized %’ column (as shown on a graphical user interface or as part of the result set returned by the SYSPROC.ACCEL_GET_TABLES_INFO stored procedure) does not show how well organized or sorted a table is. This makes it hard to tell whether a re-sorting is necessary. APAR PH27645 (IBM Db2 Analytics Accelerator Studio does not show the status of an organizing key on a table) has been opened to address this issue.

 

[1]     Changing the distribution key does result in a synchronous reorganization of the data.

[2]     The same restriction applies for the distribution key!

[{"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SS4LQ8","label":"Db2 Analytics Accelerator for z\/OS"},"ARM Category":[{"code":"a8m0z0000000741AAA","label":"Administration"}],"ARM Case Number":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"All Version(s)","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

Document Information

Modified date:
23 October 2020

UID

ibm16255138