Support for IBM Netezza Analytics remote stored procedures

IBM Netezza Analytics provides a set of analytics stored procedures for the IBM PureData® System for Analytics.With this version of the product, it is possible to call a subset of these procedures from a Db2 for z/OS subsystem. IBM Db2 Analytics Accelerator supports those IBM Netezza Analytics stored procedures that are also available in the IBM SPSS® Modeler and that are used by the R functions of the ibmdbR package.

Supported IBM Netezza Analytics stored procedures

  • arule, predict_arule
  • dectree, grow_dectree, prune_dectree, predict_dectree
  • divcluster, predict_divcluster
  • export_model_to_table
    Important: This stored procedure is needed for the support of the R programming language.
  • glm, predict_glm, print_glm
    Restriction: You can use the glm stored procedure on Unicode data only.
  • impute_data
  • iqr
  • kmeans, predict_kmeans
  • linear_regression, predict_linear_regression
    Restriction: You can use the linear_regression stored procedure on Unicode data only.
  • median, median_disc
  • model_exists, drop_model, drop_all_models, list_models
  • moments
  • naivebayes, predict_naivebayes
  • outliers,
  • pmml_model
  • quantile, quantile_disc
  • quartile, quartile_disc
  • random_sample
  • regtree, grow_regtree, prune_regtree, predict_regtree
  • split_data
  • std_norm
  • twostep, predict_twostep

For more information about these stored procedures, see the IBM Netezza In-Database Analytics Reference Guide and the IBM Netezza In-Database Analytics Developer's Guide. These PDF files are delivered in a compressed archive called nz-analyticsdoc-v3.2.4.0, which is available for download from IBM Fix Central. You find links to IBM Fix Central, the IBM Netezza Analytics package, and the ibmdbR package at the end of this topic.

Concept of Db2 for z/OS wrapper stored procedures

All supported IBM Netezza Analytics stored procedures have a corresponding wrapper stored procedure on the Db2 for z/OS side. The wrapper stored procedures are standardized in such a way that they copy, as far as possible, the parameters and the structure of the administrative IBM Db2 Analytics Accelerator stored procedures. That means, for example, that error messages are reported in the message output parameter, that tracing is enabled in the message input parameter, and that accelerator names are specified in the accelerator_name parameter.

Common DDL structure for all wrapper stored procedures

The data definition (DDL) for the wrapper stored procedure looks as follows:

CREATE PROCEDURE <schema-name>.<procedure-name>
(IN ACCELERATOR_NAME VARCHAR(128),
 IN PARAMETERS CLOB(65536),
 OUT RETURN_VALUE <type>-- optional and can be ommitted
 INOUT MESSAGE CLOB(65536))
   DYNAMIC RESULT SETS 2
   EXTERNAL NAME "AQT02ISP"LANGUAGE C
   MODIFIESSQLDATA
   PARAMETER STYLE Db2SQL
   FENCED
   WLM ENVIRONMENT <wlm-env>
   ASUTIME NO LIMIT
   COLLID SYSACCEL
   RUN OPTIONS 'TRAP(ON),HEAP(32K,32K,ANY,KEEP,8K,4K),
      POSIX(ON),STACK(128K,128K,ANY,KEEP,512K,128K),
      BELOW(4K,4K,FREE),XPLINK(ON),ALL31(ON),
      ENVAR("_CEE_ENVFILE_S=DD:AQTENV")'
   STAY RESIDENT NO
   COMMIT ONRETURN NO
   PROGRAM TYPE MAIN
   SECURITY USER
   PARAMETER CCSID UNICODE;

Common syntax for calling the wrapper stored procedures

CALL PROCEDURE <schema-name>.<procedure-name>
(accelerator_name,
 parameters,
 return-value,
 message)

Common parameters of wrapper stored procedures

accelerator_name
The unique name of the accelerator. This name must exist in the Db2 for z/OS catalog tables. It must not be NULL.
parameters

A string specific to the IBM Netezza Analytics stored procedures. It is named paramString and contains references to input tables and names of output tables. It might contain additional information that is specific to the wrapped (remote IBM Netezza Analytics) stored procedure. The string is passed to the remote stored procedure after converting the Db2 for z/OS names of input and output tables to corresponding accelerator table names.

Input tables are accelerator tables that serve as input to the remote stored procedure. A remote stored procedures might create tables on the accelerator, which must be defined in Db2 for z/OS as accelerator-only tables. The names of these accelerator-only tables are the names of the remote stored procedure's output tables. The schema name of these tables is determined by the CURRENT SCHEMA special register in Db2 if nothing else is specified in the parameters string. The tables are created in the database that is specified by the AQT_ANALYTICS_DATABASE environment variable in the AQTENV data set. If this variable is not set, the tables are created in the default database that is defined for CREATE TABLE IN ACCELERATOR statements.

The following rules apply with regard to the encoding of string columns in the tables:
  • You cannot have Netezza Unicode character columns (NCHAR or NVARCHAR) and other Netezza character columns (CHAR or VARCHAR) in the same accelerator-only table. If both types of character columns are encountered, an error is returned.
    Exception: If the Netezza database was created as a single-byte Unicode database, the columns contain Unicode data, no matter what their declaration is, and hence no problems arise. The coded-character set identifier (CCSID) of the accelerator-only table is set to UNICODE.
  • Likewise, if all string columns in the accelerator-only table are of the type NCHAR or NVARCHAR, the CCSID is set to UNICODE.
  • If all string columns in the accelerator-only table are of the type CHAR or VARCHAR, the CCSID is set to ASCII or EBCDIC, depending on the single-byte CCSID that was used to create the Netezza database.
return_value

An optional output parameter that is specific to the remote stored procedure. The following data types are currently allowed:

  • SMALLINT
  • INTEGER
  • BIGINT
  • DOUBLE
  • REAL
  • VARCHAR
  • CLOB

The return value can be NULL. Follow the link at the end of this topic to see how these Netezza output types are mapped to Db2 for z/OS output types.

message
The message parameter works in much the same way as for the IBM Db2 Analytics Accelerator stored procedures. In addition to the general description, which you find if you click the appropriate link under Related reference, also see the specific information that this parameter contains in error cases.

Result sets of wrapper stored procedures

Wrapper stored procedures that are called by IBM Netezza Analytics functions always return two result sets. The first contains information about the execution of the stored procedure. The second is reserved for trace information. Both result sets can be empty.

If trace records have been created by a wrapper stored procedure, they are included in the last result set.

Some IBM Netezza Analytics stored procedures produce text output in addition to the output in their result parameter. This additional output is returned by the wrapper stored procedure in the first result set. If the length of the text exceeds 32698 characters, it is split into multiple result rows. For Db2 for z/OS, this means that the information becomes part of a global temporary table that is created by the following SQL statement:

CREATE GLOBAL TEMPORARY TABLE 
DSNAQT.ACCEL_TABLES_DETAILS (
 SEQID INTEGER, 
 TABLES_DETAILS VARCHAR(32698)
)
CCSID UNICODE

The original text output of the IBM Netezza Analytics stored procedure is restored by concatenating the values in the TABLES_DETAILS VARCHAR column in ascending order of SEQID.

The situation is similar for the second (trace) result set. If trace information was requested, the information is stored in a global temporary table that is created by the following SQL statement:
CREATE GLOBAL TEMPORARY TABLE 
DSNAQT.ACCEL_TRACE_ACCELERATOR (
 SEQID INTEGER, 
 TRACEDATA VARBINARY(32698)
)
CCSID UNICODE

You find the trace information in the TRACEDATA column. It is sorted in ascending order of SEQID.

Authorizations for z/OS

The user ID under which the wrapper stored procedure is invoked must have the following privileges:

  • EXECUTE on the wrapper stored procedure
  • EXECUTE on the SYSACCEL.* packages
  • Authorization to read the accelerator tables specified in the parameters string as input tables.
  • DISPLAY authorization for calling the SYSPROC.ADMIN_COMMAND_Db2(-DIS GROUP) command
  • Authorization to create accelerator-only tables in the Db2 for z/OS database that is specified by the AQT_ANALYTICS_DATABASE environment variable or in the default database (in general, these are the specified output tables).
  • Write access to the /tmp directory for the user who calls the stored procedure (for tracing)
  • RACF® ACCESS(READ) on the data set that contains the AQTENV file in the started task procedure of the Workload Manager (WLM) environment.

Prerequisites

  • The specified accelerator has been started and is online.
  • The input tables that are specified in the parameters string have been defined on the accelerator and enabled for query acceleration.
  • The specified output tables (accelerator-only tables) must not yet be defined in Db2 for z/OS
  • The necessary authorizations exist.
  • The IBM Netezza Analytics stored procedures have been enabled on the accelerator.

Accessed data and systems

A wrapper stored procedure is implemented by a common load module on the Db2 side. This common load module completes the following tasks:

  • Checks the Db2 authorizations for data (table) access
  • Calls the remote IBM Netezza Analytics on the accelerator
  • Creates accelerator-only tables
  • Returns result information from the IBM Netezza Analytics stored procedure as text in a result set. Optionally, a result value can be assigned, which is returned by the return_value parameter.

Error handling

If an error occurs during the execution of a remote stored procedure, the message output parameter contains specific information about the remote stored procedure call.

Example

The KMEANS remote stored procedure from the IBM Netezza Analytics suite is called to create a model, but the model already exists. In this case, the following information is returned by the message output parameter:
<?xml version="1.0" encoding="UTF-8" ?>
<dwa:messageOutput 
xmlns:dwa="http://www.ibm.com/xmlns/prod/dwa/2011" version="1.0">
   <message  severity="error" reason-code="AQT10050E">
      <text>
      An internal error occurred on the 'AQP74' accelerator: ODBC
      operation SQLExecDirect  failed with code -1 when executing statement: SELECT
      NZA..KMEANS('model=customer_gzd,  intable="TLMQ61"."CUSTOMER-UID_02410003", 
      outtable="BCKE"."CUSTOMER_OUT-ID_56", id=C_CUSTKEY, target=C_NATIONKEY, 
      transform=S, 
      distance=euclidean, k=3, maxiter=5, randseed=12345, idbased=false'): 
       <ODBCDiagnostics>
         <SQLSTATE>HY000</SQLSTATE> <SQLCODE>46</SQLCODE> 
         <Message>ERROR:   NZAUT-0232E: A model with the name 
          '"CUSTOMER_GZD"' already exists so a model being newly added 
          cannot use this name.
         </Message> 
       </ODBCDiagnostics> 
     </text>
    <description>
      This  error occurs when an unhandled error was detected on the
      accelerator. Additional information is returned in the diagnostic
      message.
    </description>
    <action>
      If this message was  returned by the SYSPROC.ACCEL_LOAD_TABLES
      stored procedure, you might  receive the message 
      04E-00E4007 abend LOC=DSNUT101.DSNUGSDA for
      those Db2 UNLOAD utilities processes or threads that have 
      already unloaded data to the accelerator. You can ignore this message 
      and delete the corresponding dump files.
     </action>
   </message>
</dwa:messageOutput>

Example of a KMEANS wrapper stored procedure call

Suppose that a table with the name TLMQ61 has already been created by the following statement:

CREATE TABLE
        "TLMQ61"."CUSTOMER" (                               
        "C_CUSTKEY" INTEGER NOT NULL,                                
        "C_NAME" VARCHAR(25) FOR SBCS DATA NOT NULL,                                
        "C_ADDRESS" VARCHAR(40) FOR SBCS DATA NOT NULL,                                
        "C_NATIONKEY" INTEGER NOT NULL,                                
        "C_PHONE" CHAR(15) FOR SBCS DATA NOT NULL,                                
        "C_ACCTBAL" DECIMAL(15 , 2) NOT NULL,                                
        "C_MKTSEGMENT" CHAR(10) FOR SBCS DATA NOT NULL,                                
        "C_COMMENT" VARCHAR(117) FOR SBCS DATA NOT NULL,                                
        CONSTRAINT "CUSTOMER_PK" PRIMARY KEY("C_CUSTKEY")
        ) CCSID UNICODE;

Suppose further that this table has been defined as an accelerator-shadow table on an accelerator called ACCEL01, loaded, and enabled for query acceleration. In addition, suppose that the accelerator has been started and enabled for the use of IBM Netezza Analytics (for information on how to enable remote stored procedures, follow the Related tasks link at the end of this topicsee Enabling the IBM Netezza Analytics stored procedures on an accelerator in the IBM Db2 Analytics Accelerator for z/OS: Stored Procedures Reference). Finally, suppose that the ID used to call the INZA.KMEANS remote stored procedure has all required authorizations.

If all these preconditions are met, you can call the remote stored procedure by the following statement:

CALL INZA.KMEANS (
      'ACCEL01', 
      'model=customer_gzd, intable=TLMQ61.CUSTOMER, outtable=customer_out,
      id=C_CUSTKEY, target=C_NATIONKEY, transform=S, 
      distance=euclidean, k=3, maxiter=5,
      randseed=12345, idbased=false',
      ?,
      ' <?xml version= "1.0" encoding= "UTF-8"?>
       <aqttables:messageControl
        xmlns:aqttables = "http://www.ibm.com/xmlns/prod/dwa/2011" version = "1.2">
           <traceConfig keepTrace="false" forceFlush="true" 
            location="/tmp/INZAKMeansTableSim16.trc">
               <component name = "PROCEDURE" level = "DEBUG" />
           </traceConfig>
       </aqttables:messageControl> '
)

Result of the example

The INZA.KMEANS stored procedure returns the number of generated clusters in the return_value parameter and the XML text in the message parameter:

Table 1. Results of INZA.KMEANS
Parameter Value
return_value 1
message
<?xml version="1.0" encoding="UTF-8" ?>
<dwa:messageOutput
 xmlns:dwa="http://www.ibm.com/xmlns/prod/dwa/2011" version="1.0">
    <message severity="informational" reason-code="AQT10000I">
       <text>The operation was completed successfully.</text>
       <description>Success message for the XML MESSAGE output 
        parameter of each stored procedure.
       </description>
       <action></action>
    </message>
</dwa:messageOutput>

The requested debug trace information is included in the result set.

Output table cleanup in error cases

If you cancel a remote stored procedure run, or if the remote stored procedure fails before the creation of the output tables (accelerator-only tables) has been committed in Db2 for z/OS, the corresponding tables are automatically removed from the accelerator by the system maintenance daemon.

If the creation of the output tables in Db2 for z/OS has been committed, and the remote stored procedure is still running at the point of the error, DROP TABLE statements are submitted automatically to remove the definitions of the accelerator-only tables from Db2 for z/OS.

However, if the table creation has been committed, but the remote stored procedure is not active anymore (was canceled or ended abnormally), you have to remove the definitions of the accelerator-only tables manually. To find the orphaned tables, run the SYSPROC.ACCEL_GET_TABLES_INFO stored procedure. Specify the list of output tables from the remote stored procedure as input for SYSPROC.ACCEL_GET_TABLES_INFO. For each of the tables that were actually created (as will be shown in the output of SYSPROC.ACCEL_GET_TABLES_INFO), submit a DROP TABLE statement.

Error handling in case of incorrect DDLs

If the provider of an external package has supplied a DDL for the creation of a wrapper stored procedure that does not meet the requirements, a special error message is issued. The message starts with the error code STLSTATE 01H16 and contains additional diagnostic information. For example, if the DDL does not contain a proper definition for the output message parameter, so that error messages cannot be returned, the following information will be returned:

  • INVALID NUMBER OF PARAMETERS
  • INVALID MESSAGE OUTPUT PARAMETER TYPE
  • INVALID MESSAGE OUTPUT PARAMETER LENGTH