SYSPROC.ACCEL_COLLECT_TABLE_STATISTICS

Invokes the RUNSTATS utility for an accelerator-shadow table, an accelerator-only table, or a set of such tables on an accelerator.

Details

To run the RUNSTATS utility successfully on an accelerator table, the table must be in one of the following states:

Accelerator-shadow tables Accelerator-only tables
  • Loaded
  • ReplicationInProgress

    for tables that are incrementally updated by the IBM Integrated Synchronization function of the product. The stored procedure does not work for tables updated by IBM InfoSphere® Change Data Capture for z/OS (CDC). See the Restriction note further down.

  • UpdateInProgress
  • ArchivingInProgress
  • SchemaModificationInProgress
  • Operational
  • UpdateInProgress
Restriction: As soon as a Db2 subsystem is found for which CDC replication is enabled, SYSPROC.ACCEL_COLLECT_TABLE_STATISTICS refuses to run and exits with message AQT20139E. This means that in case your accelerator is paired with more than one Db2 subsystem, the stored procedure will also not work for tables from the other Db2 subsystems, no matter if these subsystems are CDC-enabled or not.

If you run the stored procedure on an accelerator-shadow table that is in the process of being fully reloaded, the statistics are collected on the previous version of the table. That is, after the completion of the reload, the statistics of the newest version of the table might not be up-to-date. Therefore, complete a running reload before you run this stored procedure.

If you specify more than a single table as input for this stored procedure, the RUNSTATS utility is run in parallel. That is, a single execution thread is started for each table in the set. This might result in a high demand for CPU resources and system memory. To collect statistics on a large number of tables, it is therefore better to run the stored procedure several times in a sequence.

The SYSPROC.ACCEL_COLLECT_TABLE_STATISTICS stored procedure works synchronously. This means the procedure does not return results to the caller before all specified tables have been processed. This allows you to include this stored procedure in extract, transform, and load (ETL) processes. For example:

  1. Insert rows into an accelerator-only table A.
  2. Collect statistics on this table.
  3. Run a second INSERT statement that selects rows in accelerator-only table A in order to insert these rows into accelerator-only table B.

Important usage notes

In most cases, it is not necessary to run this stored procedure because the accelerator automatically collects statistics when certain events occur:

early stats
The collection starts immediately after an initial table load.
copy stats
Statistics are copied from the initially loaded table during a full table reload.
auto stats
The collection is initiated by a daemon of the database engine that runs every two hours to monitor table changes. New statistics are collected on tables with many changes.
fabricated statistics
These statistics are generated during query execution if statistics are missing.

Generally, statistics should only be collected manually if tables were modified considerably, that is, if many updates occurred or if a significant amount of data was deleted or inserted. See the following list of occasions on which a manual collection of statistics might be useful:

  • After you changed an accelerator-only table by INSERT, UPDATE, or DELETE operations and before you reference the table in a query.
  • If tables were updated by the IBM Integrated Synchronization function because many INSERT, UPDATE or DELETE operations had to be synchronized, and you want to collect new statistics before any of the affected tables is referenced in a query.
  • If you have opened a service request and IBM support asks you to collect statistics.
Important: A statistics collection requires considerable processing and memory resources. In addition, the catalog of the database engine must be updated, which might cause concurrency issues.
  • Do not start too many statistics collections at the same time, as these might take resources away from other processes, such as queries or table load operations.
  • Start a statistics collection only if the current statistics are outdated and if the performance is unsatisfactory, that is, if it takes comparably long to obtain results from the affected tables when these are referenced in queries.

For more information, see Table Statistics and IBM Db2 Analytics Accelerator for z/OS.

Syntax

CALL SYSPROC.ACCEL_COLLECT_TABLE_STATISTICS
(accelerator_name,
table_statistics_specification,
message);

Options description

accelerator_name
The unique name of the accelerator. This accelerator must have been defined by the SYSPROC.ACCEL_ADD_ACCELERATOR2 stored procedure.
table_statistics_specification
An XML input string that lists the tables you want to collect statistics on. A table name must not occur more than once in the input string.

The XML string must conform to the structure of the tableSetForStatisticsCollection element in the SAQTSAMP(AQTSXSD1) data set. For information on how to specify the input XML string, follow the link to the element content in the appendix at the end of this topic.

Example:

 <?xml version="1.0" encoding="UTF-8"?>
<dwa:tableSetForStatisticsCollection 
 xmlns:dwa="http://www.ibm.com/xmlns/prod/dwa/2011" version="1.0">
    <table name="NATION" schema="TPCH"/>
    <table name="REGION" schema="TPCH"/>
 </dwa:tableSetForStatisticsCollection>

The <passThrough> element is an optional element you can specify inside a <table> </table> block. You should only use it when IBM support advises it. The <passThrough> element allows you to add command options to the RUNTSTATS statement that is finally executed.

With these command options, you can focus on specific table columns or column groups. You can select the columns or column groups that you want to collect statistics or distribution metrics on. You can also determine the sampling type as well as set, update, or reference a statistics profile.

For details, see the IBM Documentation topic on the RUNSTATS command.

Example:

<table schema="TPCH" name="NATION">
    <passThrough>
       WITH DISTRIBUTION AND SAMPLED DETAILED INDEXES ALL SET PROFILE ONLY
    </passThrough>
</table>

This results in the execution of the following RUNSTATS command on the accelerator:

RUNSTATS ON TABLE "DWA<location>"."NATION..." 
WITH DISTRIBUTION AND SAMPLED DETAILED INDEXES
ALL SET PROFILE ONLY
Note: The syntax of the command options used within the <passThrough> element might change over time or vary with different product versions. A particular syntax you have used in the past might not be supported anymore in the future.
message
For the description, follow the appropriate link under Related reference at the end of this topic.

Result sets

The stored procedure always returns two result sets:

First result set (SP_TRACE_CURSOR)
Depending on the trace configuration in the message input parameter, this result set is empty or contains trace information about the stored procedure execution. It is identified by the cursor SP_TRACE_CURSOR and contains the following columns:
  • SEQID of type INTEGER
  • TRACEDATA of type VARBINARY, with a maximum length of 32698 characters

The information in the TRACEDATA column is encoded in UTF-8. It is intended for analysis by IBM support. If the length of a value exceeds 32698 characters, it is split into multiple result set rows. The rows are concatenated in ascending order of SEQID.

Second result set (MESSAGES_CURSOR)
This result set contains an XML string that contains the same messages as the MESSAGE output parameter. In contrast to the MESSAGE output parameter, the result set does not have a 64 KB size limitation. Therefore, it always contains the whole set of <message> elements (no truncation). The structure of the XML string conforms to that of the messageOutput element in the SAQTSAMP(AQTSXSD1) data set. The result set is identified by a cursor named MESSAGES_CURSOR and contains the following columns:
  • SEQID of type INTEGER
  • MESSAGES of type VARBINARY, with a maximum length of 32698 characters

The rows in the result set are concatenated in ascending order of the values in the SEQID column. The information in the MESSAGES column is encoded in UTF-8.

Prerequisites

  • For the user ID that calls the stored procedures on z/OS, you must define an OMVS segment in the Resource Access Control Facility (RACF®).
  • The accelerator_name parameter must specify an accelerator name that is listed in the catalog tables of the product.
  • The tables must exist on the accelerator that the accelerator_name parameter specifies.

Authorizations for z/OS

On z/OS, the user ID under which this stored procedure is run must have the following privileges:

  • EXECUTE on the stored procedure
  • MONITOR1 (needed so that DSNWLIR can be used to run Db2 commands via the Db2 Instrumentation Facility Interface (IFI)).
  • Authorization to run ADMIN_INFO_SYSPARM so that ZPARMs can be retrieved.
  • RACF ACCESS(READ) on the data set that contains the AQTENV file in the started task procedure of the Workload Manager (WLM) environment.
  • RACF ACCESS(READ) on the data set that contains the AQTDEF6 file in the started task procedure of the Workload Manager (WLM) environment.
  • EXECUTE on the DSNADM.DSNADMIZ package to allow access to system parameters when the SYSPROC.ADMIN_INFO_SYSPARM stored procedure is called.

Accessed data and systems

The SYSPROC.ACCEL_COLLECT_TABLE_STATISTICS stored procedure does not modify data in Db2 for z/OS. On an accelerator, it updates the runtime statistics of accelerator-shadow tables and accelerator-only tables.