Specifying or changing a distribution key or organizing keys
Distribution and organizing keys have a considerable impact on the query response time.
About this task
Distribution keys and organizing keys can influence the query response time positively as well as negatively. Therefore, make sure that you only select such keys when necessary, and that you select a proper key.
Organizing keys, on the other hand, are still not supported for accelerator-only tables.
Distribution key: The default behavior is: If the tables involved have an enforced unique constraint in Db2® for z/OS® (unique index), the index that consists of the fewest table columns is taken as the distribution key during the initial load of the table data. If none of the tables has such an index, the data (table rows) is distributed evenly among the processing nodes (random distribution).
In general, unique key columns, such as primary keys or columns with a unique index, are a good choice for a distribution key because all values are distinct, which leads to an even distribution of the data across the database partitions. Unique key columns are also often used as join columns for table joins in queries. Joining tables on distribution key columns results in a collocated join of the tables in the database partition that the data resides in. This optimizes the query performance.
The default distribution key can be set or changed to a single column or a set of columns of your choice after you have added or loaded a table on the accelerator. Note that setting the distribution key after you have added the table (but not loaded it yet) is much faster than changing it after you have loaded the table because no data must be redistributed. Consider the following points as you select columns for a distribution key:
- The table data should be distributed evenly across all database partitions.
Columns that allow NULL as a value are not a good choice if many NULL values can be expected. The reason is that all rows with a NULL in the specified column will end up in the same database partition.
- The key should effect a collocation of rows that are commonly fetched together from different
tables.
Use as few columns as possible in the distribution key to achieve this. For example, you might be able to use only the column that most frequently acts as a join column in your queries.
You can discard the default distribution key and change to a random distribution only after you have loaded the relevant table on the accelerator. This is what the Random distribution without a distribution key check box signifies.
If Random distribution without a distribution key is selected, all table rows are distributed sequentially to the database partitions (random distribution), so that each partition receives an equal share of the data. Expect an advantage of a change from a distribution key to random distribution only under the following conditions:
- The data is not distributed evenly.
- The use of the key did not result in collocated joins for most queries.
Unless both of these conditions apply, it is better to use a single column or a set of columns as the distribution key. If random distribution is used because a table does not contain a unique constraint in Db2 for z/OS, keep this setting for small tables only. For large tables, select a distribution key according to the guidelines above.
Organizing keys: An organizing key further enhances the performance in that it sorts table rows into blocks with equal values in the selected columns. The query engine can thus process incoming rows more quickly, as less time is required to scan the rows. Since the rows are distributed for processing in blocks, entire blocks can be skipped if the value in a particular column does not match a selection criterion specified in the query. It is not necessary to scan the entire accelerator-shadow table on disk.
- An organizing key has no effect if the table is too small. The Organized column in the Accelerator view reflects this by not showing a value for the degree of organization (percentage).
- Do not change the keys of more than one table at a time if tables with sizes of more than 1 GB are involved.
- An organizing key can only be changed after a table has been loaded because the key change effects a resorting of the table. And after the next reload, the organizing key and the sorting are lost, so that the key must be set again.
This task of altering or changing distribution keys or organizing keys is carried out by the SYSPROC.ACCEL_ALTER_TABLES stored procedure on your data server. For information about the privileges that are required to run this procedure and further details, follow the appropriate Related reference link at the end of this topic.
to display the search field. Type the names of columns you
are looking for in the search field, either fully or partially. This will display just the column
names that contain your search string.
icon next to it.
What about the Cardinality?