Updates
statistics in the system catalog about the characteristics of a table
and/or associated indexes, or statistical views. These characteristics
include number of records, number of pages, and average record length.
The optimizer uses these statistics when determining access paths
to the data.
For a table, call the RUNSTATS command when the table
has had many updates, or after reorganizing the table. For a statistical
view, call the RUNSTATS command when changes to underlying tables
have substantially affected the rows returned by the view. The view
must have been previously enabled for use in query optimization by
using the ALTER VIEW statement.
Scope
The
RUNSTATS command can be issued from any database partition in the db2nodes.cfg file.
It can be used to update the catalogs on the catalog database partition.
For
tables, this command collects statistics for a table on the database
partition from which it is invoked. If the table does not exist on
that database partition, the first database partition in the database
partition group is selected.
For views, this command collects
statistics using data from tables on all participating database partitions.
Authorization
For tables, one of the following
authorities:
- SYSADM
- SYSCTRL
- SYSMAINT
- DBADM
- SQLADM
- CONTROL privilege on the table
- LOAD authority
You do not need any explicit privilege to use this command
on any declared temporary table that exists within its connection.
For
statistical views, one of the following authorities:
- SYSADM
- SYSCTRL
- SYSMAINT
- DBADM
- SQLADM
- CONTROL privilege on the statistical view
Required connection
Database
Command syntax
>>-RUNSTATS--ON TABLE--object-name------------------------------>
>--+-+------------------------+--+------------------------------------+-+-><
| '-| Statistics Options |-' '-UTIL_IMPACT_PRIORITY--+----------+-' |
| '-priority-' |
+-USE PROFILE--------------------------------------------------------+
'-UNSET PROFILE------------------------------------------------------'
Statistics Options
.-ALLOW WRITE ACCESS-.
|--+--------------------------+--+--------------------+--------->
'-| Table Object Options |-' '-ALLOW READ ACCESS--'
>--+----------------------------+------------------------------->
'-| Table Sampling Options |-'
>--+----------------------------+--+---------------------+------|
'-| Index Sampling Options |-' '-| Profile Options |-'
Table Object Options
|--+-FOR--| Index Clause |--+-----------------------+----------------------------------+--|
| '-EXCLUDING XML COLUMNS-' |
'-+-------------------------+--+-----------------------+--+-----------------------+-'
'-| Column Stats Clause |-' '-EXCLUDING XML COLUMNS-' '-AND--| Index Clause |-'
Table Sampling Options
|--TABLESAMPLE--+-BERNOULLI-+--(--numeric-literal--)------------>
'-SYSTEM----'
>--+-----------------------------------+------------------------|
'-REPEATABLE--(--integer-literal--)-'
Index Sampling Options
|--INDEXSAMPLE--+-BERNOULLI-+--(--numeric-literal--)------------|
'-SYSTEM----'
Profile Options
.-SET PROFILE NONE--------------.
|--+-------------------------------+----------------------------|
'-+-SET----+--PROFILE--+------+-'
'-UPDATE-' '-ONLY-'
Index Clause
|--+-------------------------+--+-INDEXES-+--------------------->
| .-SAMPLED---. | '-INDEX---'
'-+-----------+--DETAILED-'
'-UNSAMPLED-'
.-,----------.
V |
>--+---index-name-+-+-------------------------------------------|
'-ALL------------'
Column Stats Clause
|--+-| On Cols Clause |------------------------------+----------|
'-+--------------------+--| Distribution Clause |-'
'-| On Cols Clause |-'
On Cols Clause
.-ON ALL COLUMNS-----------------------------------------------------.
| .-,-----------------. |
| V | |
|--+-ON--+----------------------+--COLUMNS--(----| Column Option |-+--)-+--|
| '-+-ALL-+--COLUMNS AND-' |
| '-KEY-' |
'-ON KEY COLUMNS-----------------------------------------------------'
Distribution Clause
|--WITH DISTRIBUTION--+-------------------------+--------------->
'-| On Dist Cols Clause |-'
>--+--------------------------+---------------------------------|
'-| Default Dist Options |-'
On Dist Cols Clause
.-ON ALL COLUMNS-----------------------------------------------------------------------------------.
| .-,-----------------------------------------------. |
| | .--------------------------. | |
| V V | | |
|--+-ON--+----------------------+--COLUMNS--(----| Column Option |----+----------------------+-+-+--)-+--|
| '-+-ALL-+--COLUMNS AND-' +-| Frequency Option |-+ |
| '-KEY-' '-| Quantile Option |--' |
'-ON KEY COLUMNS-----------------------------------------------------------------------------------'
Default Dist Option
.--------------------------.
V |
|--DEFAULT----+-| Frequency Option |-+-+------------------------|
'-| Quantile Option |--'
Frequency Option
|--NUM_FREQVALUES--integer--------------------------------------|
Quantile Option
|--NUM_QUANTILES--integer---------------------------------------|
Column Option
|--+-column-name--+-----------------+-+-------------------------|
| '-LIKE STATISTICS-' |
| .-,-----------. |
| V | |
'-(----column-name-+--)------------'
Command parameters
- object-name
- Identifies the table or statistical view on which statistics are
to be collected. This parameter must not be a hierarchy table. For
typed tables, object-name must be the name of the
root table of the table hierarchy. The fully qualified name or alias
in the form: schema.object-name must
be used. The schema is the user name under which the table was created.
- USE PROFILE
- This option allows RUNSTATS to employ a previously
stored statistics profile to gather statistics for a table or statistical
view. The statistics profile is created using the SET PROFILE options
and is updated using the UPDATE PROFILE options.
- UNSET PROFILE
- Specify this option to remove an existing statistics profile.
For example,
RUNSTATS ON tablemyschema.mytable UNSET PROFILE
- FOR INDEXES
- Collects and
updates statistics for the indexes only. If no table statistics had been previously collected on the
table, basic table statistics are also collected. Updates statistics for table cardinality (CARD),
FPAGES, NPAGES, even when table statistics exists. These basic statistics do not include any
distribution statistics. This option cannot be used for views. COLCARD of the leading column of the
index might also be updated.
- SAMPLED
- Used together only with the DETAILED parameter.
Specifying this option does not change the default functionality
from DETAILED. This option is left in for compatibility
with previous versions of DB2®.
This option cannot be used for views.
- UNSAMPLED
- This option, when used with the DETAILED option,
forces RUNSTATS to examine every entry in the index
to compute the extended index statistics. This option cannot be used
for views and it cannot be used together with scan index sampling
(INDEXSAMPLE keyword). This option significantly
increases RUNSTATS resource consumption, while
rarely providing significant improvement over the DETAILED or SAMPLED
DETAILED options, which are equivalent.
- DETAILED
- Calculates extended index statistics. The extended index statistics
are the CLUSTERFACTOR and PAGE_FETCH_PAIRS statistics
that are gathered for relatively large indexes. Not all index entries
are examined, a CPU sampling technique is employed instead to improve
performance. This option cannot be used for views.
- index-name
- Identifies an existing index defined on the table. If you do not
specify the fully qualified name in the form: schema.index-name,
the default schema is assumed. This option cannot be used for views.
- EXCLUDING XML COLUMNS
- Use this clause to omit all XML type columns
from statistics collection. Using this clause facilitates the collection
of statistics on non-XML columns because the inclusion of XML data
can require greater system resources. The EXCLUDING XML
COLUMNS clause takes precedence over other clauses that
specify XML columns for statistics collection. For example, if you
use the EXCLUDING XML COLUMNS clause, and you
also specify XML type columns with the ON COLUMNS clause
or you use the ON ALL COLUMNS clause, all XML
type columns will be ignored during statistics collection. For DB2 V9.7 Fix Pack 1 and later releases,
distribution statistics over XML type columns are not collected when
this clause is specified.
- AND INDEXES
- Collects and updates statistics for both the table and the indexes.
This option cannot be used for views.
- ON ALL COLUMNS
- To collect statistics on all eligible columns,
use the ON ALL COLUMNS clause. Columns can be
specified either for basic statistics collection (On Cols clause)
or in conjunction with the WITH DISTRIBUTION clause
(On Dist Cols clause). The ON ALL COLUMNS specification
is the default option if neither of the column specific clauses are
specified.
If it is specified in the On Cols clause,
all columns will have only basic column statistics collected unless
specific columns are chosen as part of the WITH DISTRIBUTION clause.
Those columns specified as part of the WITH DISTRIBUTION clause
will also have basic and distribution statistics collected.
If
the WITH DISTRIBUTION ON ALL COLUMNS is specified
both basic statistics and distribution statistics are collected for
all eligible columns. Anything specified in the On Cols clause
is redundant and therefore not necessary.
- ON COLUMNS
- To
collect statistics on specific columns, column groups, or both, use
the ON COLUMNS. A column group is a parenthesized
comma-separated list of columns for which you want to collect combined
statistics.
The column and column groups are specified as a parenthesized
comma-separated list.
When you run the
RUNSTATS command
on a table without gathering index statistics and specify a subset
of columns for which statistics are to be gathered:
- Statistics for columns not specified in the RUNSTATS command
but which are the first column in an index are not reset.
- Statistics for all other columns not specified in the RUNSTATS command
are reset.
This clause can be used in the On Cols clause
and the On Dist Cols clause. Collecting distribution
statistics for a group of columns is not currently supported.
If
XML type columns are specified in a column group, the XML type columns
are ignored for collecting distinct values for the group. However,
basic XML column statistics are collected for the XML type columns
in the column group.
- ON KEY COLUMNS
- Instead of listing specific columns, you can choose to collect
statistics on columns that make up all the indexes defined on the
table. It is assumed here that critical columns in queries are also
those used to create indexes on the table. If there are no indexes
on the table, it is as good as an empty list and no column statistics
will be collected. It can be used in the On Cols clause
or the On Dist Cols clause. It is redundant in the On
Cols clause if specified in both clauses since the WITH
DISTRIBUTION clause is used to specify collection of both
basic and distribution statistics. XML type columns are by definition
not a key column and will not be included for statistics collection
by the ON KEY COLUMNS clause. This option cannot
be used for views.
- column-name
- Name of a column in the table or statistical view. If you specify
the name of an ineligible column for statistics collection, such as
a nonexistent column or a mistyped column name, error (-205) is returned.
Two lists of columns can be specified, one without distribution and
one with distribution. If the column is specified in the list that
is not associated with the WITH DISTRIBUTION clause
only basic column statistics will be collected. If the column appears
in both lists, distribution statistics will be collected (unless NUM_FREQVALUES and NUM_QUANTILES are
set to zero).
- LIKE STATISTICS
- When
this option is specified additional column statistics might be collected
for columns of type CHAR and VARCHAR with a code page attribute of
single-byte character set (SBCS), FOR BIT DATA, or UTF-8. The statistics
are collected if the runstats utility determines
that such statistics are appropriate after analyzing column values.
These statistics are the SUB_COUNT and the SUB_DELIM_LENGTH statistics
in SYSSTAT.COLUMNS. They are used by the query optimizer
to improve the selectivity estimates for predicates of the type "column
LIKE '%xyz'"and "column LIKE '%xyz%'".
- WITH DISTRIBUTION
- This clause specifies that both basic statistics and distribution
statistics are to be collected on the columns. If the ON
COLUMNS clause is not specified, distribution statistics
are collected on all the columns of the table or statistical view
(excluding columns that are ineligible such as CLOB and LONG VARCHAR).
If the ON COLUMNS clause is specified, distribution
statistics are collected only on the column list provided (excluding
those ineligible for statistics collection). If the clause is not
specified, only basic statistics are collected.
Collection of
distribution statistics on column groups is currently not supported;
distribution statistics will not be collected when column groups
are specified in the WITH DISTRIBUTION ON COLUMNS clause.
- DEFAULT
- If NUM_FREQVALUES or NUM_QUANTILES are
specified, these values will be used to determine the maximum number
of frequency and quantile statistics to be collected for the columns,
if these are not specified for individual columns in the ON
COLUMNS clause. If the DEFAULT clause
is not specified, the values used will be those in the corresponding
database configuration parameters.
- NUM_FREQVALUES
- Defines
the maximum number of frequency values to collect. It can be specified
for an individual column in the ON COLUMNS clause.
If the value is either not specified or is specified as '-1' for
an individual column, the frequency limit value is picked up from
that specified in the DEFAULT clause. If it is
not specified there either, the maximum number of frequency values
to be collected will be what is set in the num_freqvalues database
configuration parameter.
- NUM_QUANTILES
- Defines
the maximum number of distribution quantile values to collect. It
can be specified for an individual column in the ON COLUMNS clause.
If the value is either not specified or is specified as '-1' for
an individual column, the quantile limit value is picked up from that
specified in the DEFAULT clause. If it is not
specified there either, the maximum number of quantile values to be
collected will be what is set in the num_quantiles database
configuration parameter.
For DB2 V9.7
Fix Pack 1 and later releases, distribution statistics for each index
over XML data uses a maximum of 250 quantiles as the default. The
default can be changed by specifying the NUM_QUANTILES parameter
in the ON COLUMNS or the DEFAULT clause.
The num_quantiles database configuration parameter
is ignored while collecting XML distribution statistics.
- ALLOW WRITE ACCESS
- Specifies that other users can read from and write to the tables
while statistics are calculated. For statistical views, these are
the base tables referenced in the view definition.
The ALLOW
WRITE ACCESS option is not recommended for tables that
will have a lot of inserts, updates or deletes occurring concurrently.
The RUNSTATS command first performs table statistics
and then performs index statistics. Changes in the table's state between
the time that the table and index statistics are collected might result
in inconsistencies. Although having up-to-date statistics is important
for the optimization of queries, it is also important to have consistent
statistics. Therefore, statistics should be collected at a time when
inserts, updates or deletes are at a minimum.
- ALLOW READ ACCESS
- Specifies that other users can have read-only access to the tables
while statistics are calculated. For statistical views, these are
the base tables referenced in the view definition.
- TABLESAMPLE BERNOULLI
- This option allows RUNSTATS to collect statistics
on a sample of the rows from the table or statistical view. Bernoulli
sampling considers each row individually, including that row
with probability P/100 (where P is
the value of numeric-literal) and excluding it with probability 1-P/100.
Thus, if the numeric-literal were evaluated to be the value 10,
representing a 10 percent sample, each row would be included with
probability 0.1 and be excluded with probability 0.9. Unless the
optional REPEATABLE clause is specified, each
execution of RUNSTATS will usually yield a different
such sample of the table. All data pages will be retrieved through
a table scan but only the percentage of rows as specified through
the numeric-literal parameter will be used for the statistics collection.
- TABLESAMPLE SYSTEM
- This option allows RUNSTATS to collect statistics
on a sample of the data pages from the tables. System sampling considers
each page individually, including that page with probability P/100
(where P is the value of numeric-literal) and excluding
it with probability 1-P/100. Unless the optional REPEATABLE clause
is specified, each execution of RUNSTATS will usually
yield a different such sample of the table. The size of the sample
is controlled by the numeric-literal parameter in parentheses, representing
an approximate percentage P of the table to be returned. Only a
percentage of the data pages as specified through the numeric-literal
parameter will be retrieved and used for the statistics collection.
For statistical views, SYSTEM sampling
can only be applied to a single base table referenced in the view
definition. If the view contains multiple tables, SYSTEM sampling
is possible if a single table among all the tables in the statistical
view can be identified as being joined with all primary keys or unique
index columns of the other tables used in the view. If the statistical
view does not meet those conditions, Bernoulli sampling will be used
instead and a warning will be returned.
- numeric-literal
- The numeric-literal parameter specifies the size of the sample
to be obtained, as a percentage P. This value
must be a positive number that is less than or equal to 100, and can
be between 1 and 0. For example,
a value of 0.01 represents one one-hundredth of a percent, such that
1 row in 10,000 would be sampled, on average. A value of 0 or 100 will
be treated by the DB2 database
system as if sampling was not specified, regardless of whether TABLESAMPLE
BERNOULLI or TABLESAMPLE SYSTEM is
specified. A value greater than 100 or less than 0 will
be treated as an error (SQL1197N) by the DB2 database
system.
- REPEATABLE (integer-literal)
- Adding the REPEATABLE clause to the TABLESAMPLE clause
ensures that repeated executions of RUNSTATS return
the same sample. The integer-literal parameter
is a non-negative integer representing the seed to be used in sampling.
Passing a negative seed will result in an error (SQL1197N). The sample
set might still vary between repeatable RUNSTATS invocations
if activity against the table or statistical view resulted in changes
to the table or statistical view data since the last time TABLESAMPLE
REPEATABLE was run. Also, the method by which the sample
was obtained as specified by the BERNOULLI or SYSTEM keyword,
must also be the same to ensure consistent results.
- INDEXSAMPLE BERNOULLI
- Use this option to collect index statistics on a sample of the
rows in the index. Bernoulli sampling considers each
row individually, including the row with probability P/100
(where P is the value of the numeric-literal) and
excluding it with probability 1-P/100. Thus, if
the numeric-literal were evaluated to be the value 10,
representing a 10 percent sample, each row would be included with
probability 0.1 and be excluded with probability 0.9. Each execution
of RUNSTATS is likely to yield a different sample
of the index. All index pages are retrieved through an index scan
but only the percentage of rows as specified through the numeric-literal
parameter is used for the statistics collection. This option is not
supported on statistical views.
- INDEXSAMPLE SYSTEM
- Use this option to collect statistics on a sample of the index
pages. System sampling considers each page individually,
including the page with probability P/100 (where P is
the value of the numeric-literal) and excluding it with probability
1-P/100. Each execution of the RUNSTATS command
usually yields a different sample of the index. The size of the sample
is controlled by the numeric-literal parameters in parentheses, representing
an approximate percentage P of the index to be
returned. Only a percentage of the index pages as specified through
the numeric-literal parameter is retrieved and used for the statistics
collection. This option is not supported on statistical views.
- SET PROFILE NONE
- Specifies that no statistics profile will be set for this RUNSTATS invocation.
- SET PROFILE
- Allows RUNSTATS to generate and store a specific
statistics profile in the system catalog tables and executes the RUNSTATS command
options to gather statistics.
- SET PROFILE ONLY
- Allows RUNSTATS to generate and store a specific
statistics profile in the system catalog tables without running the RUNSTATS command
options.
- UPDATE PROFILE
- Allows RUNSTATS to modify an existing statistics
profile in the system catalog tables, and runs the RUNSTATS command
options of the updated statistics profile to gather statistics. You
cannot use the UPDATE PROFILE option to remove clauses
that are in a statistics profile.
- UPDATE PROFILE ONLY
- Allows RUNSTATS to modify an existing statistics
profile in the system catalog tables without running the RUNSTATS command
options of the updated statistics profile. You cannot use the UPDATE
PROFILE ONLY option to remove clauses that are in a statistics
profile.
- UTIL_IMPACT_PRIORITY priority
- Specifies that RUNSTATS will be throttled at
the level specified by priority. priority is
a number in the range of 1 to 100,
with 100 representing the highest priority and 1 representing
the lowest. The priority specifies the amount of throttling to which
the utility is subjected. All utilities at the same priority undergo
the same amount of throttling, and utilities at lower priorities are
throttled more than those at higher priorities. If priority is
not specified, the RUNSTATS will have the default
priority of 50. Omitting the UTIL_IMPACT_PRIORITY keyword
will invoke the RUNSTATS utility without throttling
support. If the UTIL_IMPACT_PRIORITY keyword
is specified, but the util_impact_lim configuration
parameter is set to 100, then the utility will run
unthrottled.
In a partitioned database, when used on tables,
the RUNSTATS command collects the statistics on
only a single database partition. If the database partition from which
the RUNSTATS command is executed has a partition
of the table, then the command executes on that database partition.
Otherwise, the command executes on the first database partition in
the database partition group across which the table is partitioned.
Examples
- Collect statistics on the table only, on all columns without distribution
statistics:
RUNSTATS ON TABLE employee
- Collect statistics on the table only, on columns empid and empname
with distribution statistics:
RUNSTATS ON TABLE employee
WITH DISTRIBUTION ON COLUMNS (empid, empname)
- Collect statistics on the table only, on all columns with distribution
statistics using a specified number of frequency limit for the table
while picking the num_quantiles from the configuration
setting:
RUNSTATS ON TABLE employee WITH DISTRIBUTION DEFAULT
NUM_FREQVALUES 50
- Collect
statistics on the table only, on all columns with distribution statistics,
and on column group JOB, WORKDEPT, and SEX:
RUNSTATS ON TABLE employee ON ALL COLUMNS AND COLUMNS ((JOB, WORKDEPT, SEX))
WITH DISTRIBUTION
- Collect statistics on a set of indexes:
RUNSTATS ON TABLE employee for indexes empl1, empl2
- Collect statistics on all indexes only:
RUNSTATS ON TABLE employee FOR INDEXES ALL
- Collect basic statistics on the table and all indexes using sampling
for the detailed index statistics collection:
RUNSTATS ON TABLE employee AND SAMPLED DETAILED INDEXES ALL
This
is equivalent to:RUNSTATS ON TABLE employee AND DETAILED INDEXES ALL
- Collect statistics on table, with distribution statistics on columns
empid, empname and empdept and the two indexes Xempid and Xempname.
Distribution statistics limits are set individually for empdept, while
the other two columns use a common default:
RUNSTATS ON TABLE employee
WITH DISTRIBUTION ON COLUMNS (empid, empname, empdept NUM_FREQVALUES
50 NUM_QUANTILES 100)
DEFAULT NUM_FREQVALUES 5 NUM_QUANTILES 10
AND INDEXES Xempid, Xempname
- Collect statistics on all columns used in indexes and on all indexes:
RUNSTATS ON TABLE employee ON KEY COLUMNS AND INDEXES ALL
- Collect statistics on all indexes and all columns without distribution
except for one column. Consider T1 containing columns c1, c2, ....,
c8
RUNSTATS ON TABLE T1
WITH DISTRIBUTION ON COLUMNS (c1, c2, c3 NUM_FREQVALUES 20
NUM_QUANTILES 40, c4, c5, c6, c7, c8)
DEFAULT NUM_FREQVALUES 0, NUM_QUANTILES 0 AND INDEXES ALL
RUNSTATS ON TABLE T1
WITH DISTRIBUTION ON COLUMNS (c3 NUM_FREQVALUES 20 NUM_QUANTILES 40)
AND INDEXES ALL
- Collect
statistics on table T1 for the individual columns c1 and c5 as well
as on the column groups (c2, c3) and (c2, c4). Multicolumn cardinality
is very useful to the query optimizer when it estimates filter factors
for predicates on columns in which the data is correlated.
RUNSTATS ON TABLE T1 ON COLUMNS (c1, (c2, c3),
(c2, c4), c5)
- Collect statistics on table T1 for the individual columns c1 and
c2. For column c1 also collect the LIKE predicate statistics.
RUNSTATS ON TABLE T1 ON COLUMNS (c1 LIKE STATISTICS, c2)
- Register a statistics profile to collect statistics on the table
only, on all columns with distribution statistics using a specified
number of frequency limit for the table while picking the num_quantiles from
the configuration setting. The command also updates the statistics
as specified.
RUNSTATS ON TABLE employee WITH DISTRIBUTION DEFAULT
NUM_FREQVALUES 50 SET PROFILE
- Register a statistics profile to collect statistics on the table
only, on all columns with distribution statistics using a specified
number of frequency limit for the table while picking the num_quantiles from
the configuration setting. Statistics are not collected.
RUNSTATS ON TABLE employee WITH DISTRIBUTION
DEFAULT NUM_FREQVALUES 50 SET PROFILE ONLY
- Modify the previously registered statistics profile by changing
the NUM_FREQVALUES value from 50 to 30.
The command also updates the statistics as specified.
RUNSTATS ON TABLE employee WITH DISTRIBUTION
DEFAULT NUM_FREQVALUES 30 UPDATE PROFILE
- Modify the previously registered statistics profile by changing
the NUM_FREQVALUES value from 50 to 30.
Statistics are not collected.
RUNSTATS ON TABLE employee WITH DISTRIBUTION
DEFAULT NUM_FREQVALUES 30 UPDATE PROFILE ONLY
- Modify the previously registered statistics profile by adding
column empl_address and column group (empl_title, empl_salary)
options. The command also updates the statistics as specified.
RUNSTATS ON TABLE employee
ON COLUMNS (empl_address, (empl_title, empl_salary))
UPDATE PROFILE
- Modify the previously registered statistics profile by adding
column empl_address and column group (empl_title, empl_salary)
options. Statistics are not collected.
RUNSTATS ON TABLE employee
ON COLUMNS (empl_address, (empl_title, empl_salary))
UPDATE PROFILE ONLY
- Collect statistics on a table using the options recorded in the
statistics profile for that table:
RUNSTATS ON TABLE employee USE PROFILE
- Query the RUNSTATS command options corresponding
to the previously registered statistics profile stored in the catalogs
of the table:
SELECT STATISTICS_PROFILE FROM SYSCAT.TABLES WHERE TABNAME =
'EMPLOYEE'
- Collect statistics, including distribution statistics, on 30 percent
of the rows:
RUNSTATS ON TABLE employee WITH DISTRIBUTION
TABLESAMPLE BERNOULLI(30)
- To control the sample set on which statistics will be collected
and to be able to repeatedly use the same sample set, you can do so
as follows:
RUNSTATS ON TABLE employee WITH DISTRIBUTION
TABLESAMPLE BERNOULLI(30) REPEATABLE(4196)
Issuing
the preceding statement will result in the same set of statistics
as long as the data has not changed in the interim.
- Collect index statistics as well as table statistics on 1.5 percent
of the data pages. Only table data pages and not index pages are sampled.
In this example 1.5 percent of table data pages are used for the
collection of table statistics, while for index statistics all the
index pages will be used:
RUNSTATS ON TABLE employee AND INDEXES ALL TABLESAMPLE SYSTEM(1.5)
- Collect table statistics on 1.5 percent of the data pages and
index statistics on 2.5 percent of the index pages. Both table data
pages and index pages are sampled:
RUNSTATS ON TABLE employee AND INDEXES ALL TAMBLESAMPLE SYSTEM(1.5)
INDEXSAMPLE SYSTEM(2.5)
- Collect statistics for a statistical view, on all columns, without
distribution statistics:
RUNSTATS ON VIEW product_sales_view
- Collect statistics for a statistical view, with distribution statistics
on the columns category, type and product_key.
Distribution statistics limits are set for the category column,
while the other columns use a common default:
RUNSTATS ON VIEW product_sales_view
WITH DISTRIBUTION ON COLUMNS (category NUM_FREQVALUES 100 NUM_QUANTILES 100,
type, product_key) DEFAULT NUM_FREQVALUES 50 NUM_QUANTILES 50
- Collect statistics for a statistics view, including distribution
statistics, on 10 percent of the rows using row level sampling:
RUNSTATS ON VIEW product_sales_view
WITH DISTRIBUTION TABLESAMPLE BERNOULLI (10)
- Collect statistics for a statistics view, including distribution
statistics, on 2.5 percent of the rows using data page level sampling.
Additionally, specify the repeated use of the same sample set. For
this command to succeed, the query must be such that the DB2 database system can successfully push data
page sampling down to one or more tables. Otherwise, an error (SQL
20288N) is issued.
RUNSTATS ON VIEW product_sales_view
WITH DISTRIBUTION TABLESAMPLE SYSTEM (2.5)
- Register a statistics profile to collect statistics on the view
and on all columns with distribution statistics as specified:
RUNSTATS ON VIEW product_sales_view
WITH DISTRIBUTION DEFAULT NUM_FREQVALUES 50 NUM_QUANTILES 50
SET PROFILE
- Modify the previously registered statistics profile. This command
also updates the statistics as specified:
RUNSTATS ON VIEW salesdb.product_sales_view
WITH DISTRIBUTION DEFAULT NUM_FREQVALUES 25 NUM_QUANTILES 25
UPDATE PROFILE
Usage notes
- When there are detached partitions on a partitioned table, index
keys that still belong to detached data partitions which require cleanup
will not be counted as part of the keys in the statistics. These keys
are not counted because they are invisible and no longer part of the
table. They will eventually get removed from the index by asynchronous
index cleanup. As a result, statistics collected before asynchronous
index cleanup is run will be misleading. If the RUNSTATS command
is issued before asynchronous index cleanup completes, it will likely
generate a false alarm for index reorganization or index cleanup based
on the inaccurate statistics. Once asynchronous index cleanup is run,
all the index keys that still belong to detached data partitions which
require cleanup will be removed and this may eliminate the need for
index reorganization.
For partitioned tables, you are encouraged to issue
the RUNSTATS command after an asynchronous index
cleanup has completed in order to generate accurate index statistics
in the presence of detached data partitions. To determine whether
or not there are detached data partitions in the table, you can check
the status field in the SYSCAT.DATAPARTITIONS catalog view and look
for the value L (logically detached), I (index
cleanup), or D (detached with dependent MQT).
The RUNSTATS command collects
statistics for all index partitions of a partitioned index. Statistics
in the SYSSTAT.INDEXES view for the partitioned index represent an
index partition, except for FIRSTKEYCARD, FIRST2KEYCARD, FIRST3KEYCARD,
FIRST4KEYCARD, and FULLKEYCARD statistics. Because these statistics
are used in cardinality estimates, they are for the entire index and
not for an index partition. Distribution statistics (frequent values
and quantiles) are not collected for partitioned indexes, but are
gathered if RUNSTATS is run on the table. Statistics
on the leading columns of a partitioned index might not be as accurate
as statistics on the leading columns of a nonpartitioned index.
- It is recommended to run the RUNSTATS command:
- On tables that have been modified considerably (for example, if
a large number of updates have been made, or if a significant amount
of data has been inserted or deleted or if LOAD has
been done without the statistics option during LOAD).
- On tables that have been reorganized (using REORG, REDISTRIBUTE
DATABASE PARTITION GROUP).
- On tables which have been row compressed.
- When a new index has been created.
- Before binding applications whose performance is critical.
- When the prefetch quantity is changed.
- On statistical views whose underlying tables have been modified
substantially so as to change the rows that are returned by the view.
- After LOAD has been executed with the STATISTICS option,
use the RUNSTATS utility to collect statistics
on XML columns. Statistics for XML columns are never collected during LOAD,
even when LOAD is executed with the STATISTICS option.
When RUNSTATS is used to collect statistics for
XML columns only, existing statistics for non-XML columns that have
been collected by LOAD or a previous execution
of the RUNSTATS utility are retained. In the case
where statistics on some XML columns have been collected previously,
the previously collected statistics for an XML column will either
be dropped if no statistics on that XML column are collected by the
current command, or be replaced if statistics on that XML column are
collected by the current command.
- The options chosen must depend on the specific table and the application.
In general:
- If the table is a very critical table in critical queries, is
relatively small, or does not change too much and there is not too
much activity on the system itself, it might be worth spending the
effort on collecting statistics in as much detail as possible.
- If the time to collect statistics is limited, if the table is
relatively large, or if the table is updated frequently, it might
be beneficial to execute RUNSTATS limited to the
set of columns that are used in predicates. This way, you will be
able to execute the RUNSTATS command more often.
- If time to collect statistics is very limited and the effort to
tailor the RUNSTATS command on a table by table
basis is a major issue, consider collecting statistics for the "KEY"
columns only. It is assumed that the index contains the set of columns
that are critical to the table and are most likely to appear in predicates.
- If time to collect statistics is very limited and table statistics
are to be gathered, consider using the TABLESAMPLE option
to collect statistics on a subset of the table data.
- If
time to collect statistics is very limited and index statistics are
to be gathered, consider using the INDEXSAMPLE option
to collect statistics on a subset of the index data.
- If there is skew in certain columns and predicates of the type "column
= constant", it might be beneficial to specify a larger NUM_FREQVALUES value
for that column
- Collect distribution statistics for all columns that are used
in equality predicates and for which the distribution of values might
be skewed.
- For columns that have range predicates (for example "column
>= constant", "column BETWEEN constant1 AND constant2") or
of the type "column LIKE '%xyz'", it might be beneficial
to specify a larger NUM_QUANTILES value.
- If storage space is a concern and one cannot afford too much time
on collecting statistics, do not specify high NUM_FREQVALUES or NUM_QUANTILES
values for columns that are not used in predicates.
- If index statistics are requested, and statistics have never
been run on the table containing the index, statistics on both the
table and indexes are calculated.
- If statistics for XML columns in the table are not required, the EXCLUDING
XML COLUMNS option can be used to exclude all XML columns.
This option takes precedence over all other clauses that specify XML
columns for statistics collection.
- After the command is run, note the following:
- A COMMIT should be issued to release the locks.
- To allow new access plans to be generated, the packages that reference
the target table must be rebound.
- Executing the command on portions of the table could result in
inconsistencies as a result of activity on the table since the command
was last issued. In this case a warning message is returned. Issuing RUNSTATS on
the table only might make table and index level statistics inconsistent.
For example, you might collect index level statistics on a table and
later delete a significant number of rows from the table. If you then
issue RUNSTATS on the table only, the table cardinality
might be less than FIRSTKEYCARD, which is an inconsistency.
In the same way, if you collect statistics on a new index when you
create it, the table level statistics might be inconsistent.
- The RUNSTATS command drops previously collected
distribution statistics if table statistics are requested. For example, RUNSTATS
ON TABLE, or RUNSTATS ON TABLE … AND INDEXES
ALL will cause previously collected distribution statistics
to be dropped. If the command is run on indexes only then previously
collected distribution statistics are retained. For example, RUNSTATS
ON TABLE … FOR INDEXES ALL causes the previously collected
distribution statistics to be retained. If the RUNSTATS command
is run on XML columns only, then previously collected basic column
statistics and distribution statistics are retained. In the case where
statistics on some XML columns have been collected previously, the
previously collected statistics for an XML column are either be dropped
if no statistics on that XML column are collected by the current command,
or be replaced if statistics on that XML column are collected by the
current command.
- For DB2 V9.7
Fix Pack 1 and later releases, distribution statistics are collected
on indexes over XML data defined on an XML column. When the RUNSTATS
command is run on a table with the WITH DISTRIBUTION clause,
the following apply to the collection of distribution statistics on
a column of type XML:
- Distribution statistics are collected for each index over XML
data specified on an XML column.
- The RUNSTATS command must collect both distribution
statistics and table statistics to collect distribution statistics
for indexes over XML data defined on an XML column. Table statistics
must be gathered in order for distribution statistics to be collected
since XML distribution statistics are stored with table statistics.
An
index clause is not required to collect XML distribution statistics.
Specifying only an index clause does not collect XML distribution
statistics
By default, XML distribution statistics use a maximum
of 250 quantiles for each index over XML data. When collecting distribution
statistics on an XML column, you can change the maximum number of
quantiles by specifying a value with NUM_QUANTILES parameter
in the ON COLUMNS or the DEFAULT clause.
- Distribution statistics are collected for indexes over XML data
of type VARCHAR, DOUBLE, TIMESTAMP, and DATE. Distribution statistics
are not collected over indexes of type VARCHAR HASHED.
- Distribution statistics are not collected for partitioned indexes
over XML data defined on a partitioned table.
- For range-clustered tables, there is a special system-generated
index in the catalog tables which represents the range ordering property
of range-clustered tables. When statistics are collected on this
type of table, if the table is to be included as part of the statistics
collection, statistics will also be collected for the system-generated
index. The statistics reflect the fast access of the range lookups
by representing the index as a two-level index with as many pages
as the base data table, and having the base data clustered perfectly
along the index order.
- In the On Dist Cols clause of the command syntax,
the Frequency Option and Quantile Option parameters are currently
not supported for column GROUPS. These options are
supported for single columns.
- There are three prefetch statistics that cannot be computed when
working in DMS mode. When looking at the index statistics in the index
catalogs, you will see a -1 value for the following
statistics:
- AVERAGE_SEQUENCE_FETCH_PAGES
- AVERAGE_SEQUENCE_FETCH_GAP
- AVERAGE_RANDOM_FETCH_PAGES
- A statistics profile can be set or updated for the table or statistical
view specified in the RUNSTATS command, by using
the set profile or update profile options. The statistics profile
is stored in a visible string format, which represents the RUNSTATS command,
in the STATISTICS_PROFILE column of the SYSCAT.TABLES system
catalog table.
- Statistics collection on XML type columns is governed by two DB2 database system registry values: DB2_XML_RUNSTATS_PATHID_K and DB2_XML_RUNSTATS_PATHVALUE_K.
These two parameters are similar to the NUM_FREQVALUES parameter
in that they specify the number of frequency values to collect. If
not set, a default of 200 will be used for both parameters.
- RUNSTATS acquires an IX table lock on SYSTABLES
and a U lock on the row for the table on which statistics are being
gathered at the beginning of RUNSTATS. Operations
can still read from SYSTABLES including the row with the U lock. Write
operations are also possible, providing they do not occur against
the row with the U lock. However, another reader or writer will not
be able acquire an S lock on SYSTABLES because of RUNSTATS' IX
lock.
- Statistics are not collected for columns with
structured types. If they are specified, columns with these data types
are ignored.
- Only AVGCOLLEN and NUMNULLS are collected for
columns with LOB or LONG data types.
- AVGCOLLEN represents the average space in bytes
when the column is stored in database memory or a temporary table.
This value represents the length of the data descriptor for LOB or
LONG data types, except when LOB data is inlined on the data page.
Note: The
average space required to store the column on disk may be different
than the value represented by this statistic.
- The UNSAMPLED
DETAILED option is available to change the way index statistics
are collected, but it should be used only in cases where its clear
that the default or DETAILED doesnt work.
- When
using the INDEXSAMPLE keyword you cannot specify
different index sampling rates for different indexes within a single
command. For example:
runstats on table orders and index o_ck indexsample system(5),
index o_ok indexsample system(10)
is invalid. The following
two RUNSTATS commands can be used to achieve the
required result:runstats on table orders and index o_ck indexsample system(5)
runstats on table orders for index o_ok indexsample system(10)
- If
you have modified the table since statistics were last collected on
the table or its indexes, you should run RUNSTATS ON TABLE
… AND INDEXES ALL. If you use RUNSTATS ON
TABLE … FOR INDEXES ALL, the resulting statistics might
be inconsistent.