Specifying or changing a distribution key or organizing keys

Distribution and organizing keys have a considerable impact on the query response time.

About this task

Distribution keys and organizing keys can influence the query response time positively as well as negatively. Therefore, make sure that you only select such keys when necessary, and that you select a proper key.

Important: This product allows the setting of distribution keys for accelerator-only tables. The description of this type of key in this topic applies to accelerator-only tables in the same way. However, due to the different nature of accelerator-only tables, a distribution key must be set as part of the CREATE TABLE ... IN ACCELERATOR statement.

Organizing keys, on the other hand, are still not supported for accelerator-only tables.

Distribution key: The default behavior is: If the tables involved have an enforced unique constraint in Db2® for z/OS® (unique index), the index that consists of the fewest table columns is taken as the distribution key during the initial load of the table data. If none of the tables has such an index, the data (table rows) is distributed evenly among the processing nodes (random distribution).

In general, unique key columns, such as primary keys or columns with a unique index, are a good choice for a distribution key because all values are distinct, which leads to an even distribution of the data across the database partitions. Unique key columns are also often used as join columns for table joins in queries. Joining tables on distribution key columns results in a collocated join of the tables in the database partition that the data resides in. This optimizes the query performance.

The default distribution key can be set or changed to a single column or a set of columns of your choice after you have added or loaded a table on the accelerator. Note that setting the distribution key after you have added the table (but not loaded it yet) is much faster than changing it after you have loaded the table because no data must be redistributed. Consider the following points as you select columns for a distribution key:

  • The table data should be distributed evenly across all database partitions.

    Columns that allow NULL as a value are not a good choice if many NULL values can be expected. The reason is that all rows with a NULL in the specified column will end up in the same database partition.

  • The key should effect a collocation of rows that are commonly fetched together from different tables.

    Use as few columns as possible in the distribution key to achieve this. For example, you might be able to use only the column that most frequently acts as a join column in your queries.

You can discard the default distribution key and change to a random distribution only after you have loaded the relevant table on the accelerator. This is what the Random distribution without a distribution key check box signifies.

If Random distribution without a distribution key is selected, all table rows are distributed sequentially to the database partitions (random distribution), so that each partition receives an equal share of the data. Expect an advantage of a change from a distribution key to random distribution only under the following conditions:

  • The data is not distributed evenly.
  • The use of the key did not result in collocated joins for most queries.

Unless both of these conditions apply, it is better to use a single column or a set of columns as the distribution key. If random distribution is used because a table does not contain a unique constraint in Db2 for z/OS, keep this setting for small tables only. For large tables, select a distribution key according to the guidelines above.

Organizing keys: An organizing key further enhances the performance in that it sorts table rows into blocks with equal values in the selected columns. The query engine can thus process incoming rows more quickly, as less time is required to scan the rows. Since the rows are distributed for processing in blocks, entire blocks can be skipped if the value in a particular column does not match a selection criterion specified in the query. It is not necessary to scan the entire accelerator-shadow table on disk.

Important:
  • An organizing key has no effect if the table is too small. The Organized column in the Accelerator view reflects this by not showing a value for the degree of organization (percentage).
  • Do not change the keys of more than one table at a time if tables with sizes of more than 1 GB are involved.
  • An organizing key can only be changed after a table has been loaded because the key change effects a resorting of the table. And after the next reload, the organizing key and the sorting are lost, so that the key must be set again.

This task of altering or changing distribution keys or organizing keys is carried out by the SYSPROC.ACCEL_ALTER_TABLES stored procedure on your data server. For information about the privileges that are required to run this procedure and further details, follow the appropriate Related reference link at the end of this topic.

Procedure

  1. Connect to the Db2 subsystem or data sharing group that the accelerator is attached to. For more information, see Connecting to a database server.
  2. Select the accelerator. For more information, see Selecting an accelerator.
  3. Open the window to change the keys:
    IBM® Db2 Analytics Accelerator Studio

    Click the Alter Keys button on the toolbar.

    You see the Alter Distribution or Organizing Keys window.

    The list on the left has a Recommendation column, which suggests certain columns to be used in a distribution key or as organizing keys. Suggestions are made on the basis of a previous run of DB2® RUNSTATS. So if no recommendations can be seen, use RUNSTATS and then return to the Alter Distribution or Organizing Keys window.

    RUNSTATS also calculates the column cardinality (Cardinality in the selection list). This is the number of distinct values in a column. The higher this number, the better a column is suited as a distribution key.

    IBM Data Server Manager Click the Alter Keys button on the toolbar.

    You see the Alter Distribution and Organizing Keys window.

    The list of table columns contains a RECOMMENDED column, which suggests certain columns to be used in a distribution key or as organizing keys. Suggestions are made on the basis of a previous run of Db2 RUNSTATS. So if no recommendations can be seen, use RUNSTATS and then return to the Alter Distribution and Organizing Keys window.

    Draft comment: kuester
    What about the Cardinality?

    IBM Unified Experience for z/OS
    1. In the list of tables, select a table that you want to set a key on.
    2. Click the Table menu icon (3-dot menu) icon for that table, which is located in the very right column of the tables list.
    3. From the menu that opens, select Alter keys.
    IBM Db2 Analytics Accelerator Administration Services
    1. Start a /table/alterKeys request.
    2. In the request body:
      1. Specify the connection name ("connectionName":).
      2. Specify the accelerator name ("accelName":).
    Important: The RUNSTATS utility cannot collect statistics for accelerator-only tables because the data of this type of table does not reside in Db2. Hence key recommendations that rely on Db2 statistics cannot be given. You only find key recommendations for accelerator-only tables if these can be derived from the column data type.
  4. To specify a distribution key:
    IBM Db2 Analytics Accelerator Studio
    1. If random distribution is currently selected for the table, clear the Random distribution without a distribution key check box. This enables the controls for the definition of a distribution key.
    2. In the Alter Distribution or Organizing Keys window, you see a list of the columns in the selected table. To specify a column as the distribution key or as part of it, select the column in the list and click the right-arrow button.

      The Name like filter field makes it easier to find particular columns if the list is long. Type a column name in this field, either fully or partially, to display just the columns bearing or starting with that name. The names of columns that are currently selected as keys are grayed out.

      Selected columns appear in the upper list box on the right. You can remove a selected column by first selecting it in this box, and then clicking the left-arrow button.

      Using the buttons with the upward-pointing and downward-pointing arrows, you can change the order of the columns in the key. The order of the columns in the list has an influence on the hash value that is calculated to determine the target processing node. To place the rows of joined tables on the same processing node, the distribution keys of all tables must yield the same hash value. It is therefore important to specify the distribution key columns for all tables in the same order.

    3. Repeat this step to add further columns to the key. A maximum of four columns is allowed.
    IBM Data Server Manager
    1. To select a column for a distribution key, select the corresponding check box in the DISTRIBUTION KEY column.

      You can use a filter to find particular columns: Click Magnifying glass icon to display the search field. Type the names of columns you are looking for in the search field, either fully or partially. This will display just the column names that contain your search string.

      Selected columns appear in a small list or table on the lower left. You can remove a selected column from this list by clearing the check box in the DISTRIBUTION KEY column.

      Selecting an entry in the small list on the lower left and using the Move Up and Move Down buttons, you can change the order of the columns in the key. The order of the columns in the list has an influence on the hash value that is calculated to determine the target processing node. To place the rows of joined tables on the same processing node, the distribution keys of all tables must yield the same hash value. It is therefore important to specify the distribution key columns for all tables in the same order.

    2. Repeat the previous step to add further columns to the key. A maximum of four columns is allowed.
    IBM Unified Experience for z/OS In the Alter keys window, you see a list of the table columns.
    1. Select a column to include in the distribution key.
    2. Click the Table menu icon (3-dot menu) icon for that column.
    3. Select Add to distribution keys.

      The column name is added to the Distribution keys box on the right.

    4. Repeat steps 4.a through 4.c to add additional columns.
      Note: It is not possible to select more than one column, and then add these in a single step.

    You can change the column order of the key by dragging and dropping individual column names in the Distribution keys box.

    To remove a column from the key, click the Remove column icon icon next to it.

    IBM Db2 Analytics Accelerator Administration Services In the request body, specify distribution key columns as values of the "distKeysArray": array. Separate the values by a comma.
    Remember: It is a best practice to use as few columns as possible in a distribution key.
  5. To specify organizing keys:
    IBM Db2 Analytics Accelerator Studio
    1. Select a column to be used as an organizing key in the list on the left and click the right-arrow button next to the list box at the bottom.

      The Name like filter field makes it easier to find particular columns if the list is long. Type a column name in this field, either fully or partially, to display just the columns bearing or starting with that name. The names of columns that are currently selected as keys are grayed out.

      Selected columns appear in the lower list box on the right. You can remove a selected column by first selecting it in this box, and then clicking the left-arrow button.

    2. Repeat the previous step to add further keys. A maximum of four keys is allowed.
    IBM Data Server Manager
    1. To select a column as an organizing key, select the corresponding check box in the ORGANIZATION KEY column.

      You can use a filter to find particular columns: Click Magnifying glass icon to display the search field. Type the names of columns you are looking for in the search field, either fully or partially. This will display just the column names that contain your search string.

      You can remove an organizing key by clearing the check box in the ORGANIZATION KEY column.

    2. Repeat the previous step to add further keys. A maximum of four keys is allowed.
    IBM Unified Experience for z/OS In the Alter keys window, you see a list of the table columns.
    1. Select a column to add it as an organizing key.
    2. Click the Table menu icon (3-dot menu) icon for that column.
    3. Select Add to organizing keys.

      The column name is added to the Organizing keys box on the lower right.

    4. Repeat steps 5.a through 5.c to add additional columns.
      Note: It is not possible to select more than one column, and then add these in a single step.

    You can change the column order of the key by dragging and dropping individual column names in the Organization keys box.

    To remove an organizing key, click the Remove column icon icon next to the column name.

    IBM Db2 Analytics Accelerator Administration Services In the request body, as the values of the "schemaName": and "tableName": parameters, specify the name of the schema and the name of the table that you want to apply the keys to.
  6. Finish the key definition:
    IBM Db2 Analytics Accelerator Studio Click OK.
    IBM Data Server Manager Click OK.
    IBM Unified Experience for z/OS Click Submit.
    IBM Db2 Analytics Accelerator Administration Services Click Execute.