Support for IBM Netezza Analytics remote stored procedures
IBM Netezza Analytics provides a set of analytics stored procedures for the IBM PureData® System for Analytics.With this version of the product, it is possible to call a subset of these procedures from a Db2 for z/OS subsystem. IBM Db2 Analytics Accelerator supports those IBM Netezza Analytics stored procedures that are also available in the IBM SPSS® Modeler and that are used by the R functions of the ibmdbR package.
Supported IBM Netezza Analytics stored procedures
- arule, predict_arule
- dectree, grow_dectree, prune_dectree, predict_dectree
- divcluster, predict_divcluster
- export_model_to_tableImportant: This stored procedure is needed for the support of the R programming language.
- glm, predict_glm, print_glmRestriction: You can use the glm stored procedure on Unicode data only.
- impute_data
- iqr
- kmeans, predict_kmeans
- linear_regression, predict_linear_regressionRestriction: You can use the linear_regression stored procedure on Unicode data only.
- median, median_disc
- model_exists, drop_model, drop_all_models, list_models
- moments
- naivebayes, predict_naivebayes
- outliers,
- pmml_model
- quantile, quantile_disc
- quartile, quartile_disc
- random_sample
- regtree, grow_regtree, prune_regtree, predict_regtree
- split_data
- std_norm
- twostep, predict_twostep
For more information about these stored procedures, see the IBM Netezza In-Database Analytics Reference Guide and the IBM Netezza In-Database Analytics Developer's Guide. These PDF files are delivered in a compressed archive called nz-analyticsdoc-v3.2.4.0, which is available for download from IBM Fix Central. You find links to IBM Fix Central, the IBM Netezza Analytics package, and the ibmdbR package at the end of this topic.
Concept of Db2 for z/OS wrapper stored procedures
All supported IBM Netezza Analytics
stored procedures have a corresponding wrapper stored procedure on the Db2 for z/OS side. The wrapper stored
procedures are standardized in such a way that they copy, as far as possible, the parameters and the
structure of the administrative IBM Db2 Analytics Accelerator stored procedures. That
means, for example, that error messages are reported in the
message
output parameter, that tracing is enabled in the
message
input parameter, and that accelerator names are
specified in the accelerator_name
parameter.
Common DDL structure for all wrapper stored procedures
The data definition (DDL) for the wrapper stored procedure looks as follows:
CREATE PROCEDURE <schema-name>.<procedure-name>
(IN ACCELERATOR_NAME VARCHAR(128),
IN PARAMETERS CLOB(65536),
OUT RETURN_VALUE <type>-- optional and can be ommitted
INOUT MESSAGE CLOB(65536))
DYNAMIC RESULT SETS 2
EXTERNAL NAME "AQT02ISP"LANGUAGE C
MODIFIESSQLDATA
PARAMETER STYLE Db2SQL
FENCED
WLM ENVIRONMENT <wlm-env>
ASUTIME NO LIMIT
COLLID SYSACCEL
RUN OPTIONS 'TRAP(ON),HEAP(32K,32K,ANY,KEEP,8K,4K),
POSIX(ON),STACK(128K,128K,ANY,KEEP,512K,128K),
BELOW(4K,4K,FREE),XPLINK(ON),ALL31(ON),
ENVAR("_CEE_ENVFILE_S=DD:AQTENV")'
STAY RESIDENT NO
COMMIT ONRETURN NO
PROGRAM TYPE MAIN
SECURITY USER
PARAMETER CCSID UNICODE;
Common syntax for calling the wrapper stored procedures
CALL PROCEDURE <schema-name>.<procedure-name>
(accelerator_name,
parameters,
return-value,
message)
Common parameters of wrapper stored procedures
accelerator_name
- The unique name of the accelerator. This name must exist in the Db2 for z/OS catalog tables. It must
not be
NULL
. parameters
-
A string specific to the IBM Netezza Analytics stored procedures. It is named
paramString
and contains references to input tables and names of output tables. It might contain additional information that is specific to the wrapped (remote IBM Netezza Analytics) stored procedure. The string is passed to the remote stored procedure after converting the Db2 for z/OS names of input and output tables to corresponding accelerator table names.Input tables are accelerator tables that serve as input to the remote stored procedure. A remote stored procedures might create tables on the accelerator, which must be defined in Db2 for z/OS as accelerator-only tables. The names of these accelerator-only tables are the names of the remote stored procedure's output tables. The schema name of these tables is determined by the CURRENT SCHEMA special register in Db2 if nothing else is specified in the
parameters
string. The tables are created in the database that is specified by the AQT_ANALYTICS_DATABASE environment variable in the AQTENV data set. If this variable is not set, the tables are created in the default database that is defined for CREATE TABLE IN ACCELERATOR statements.The following rules apply with regard to the encoding of string columns in the tables:- You cannot have Netezza Unicode character columns (NCHAR or NVARCHAR) and other Netezza character columns (CHAR or VARCHAR) in the same accelerator-only table. If both types of
character columns are encountered, an error is returned.Exception: If the Netezza database was created as a single-byte Unicode database, the columns contain Unicode data, no matter what their declaration is, and hence no problems arise. The coded-character set identifier (CCSID) of the accelerator-only table is set to UNICODE.
- Likewise, if all string columns in the accelerator-only table are of the type NCHAR or NVARCHAR, the CCSID is set to UNICODE.
- If all string columns in the accelerator-only table are of the type CHAR or VARCHAR, the CCSID is set to ASCII or EBCDIC, depending on the single-byte CCSID that was used to create the Netezza database.
- You cannot have Netezza Unicode character columns (NCHAR or NVARCHAR) and other Netezza character columns (CHAR or VARCHAR) in the same accelerator-only table. If both types of
character columns are encountered, an error is returned.
return_value
-
An optional output parameter that is specific to the remote stored procedure. The following data types are currently allowed:
- SMALLINT
- INTEGER
- BIGINT
- DOUBLE
- REAL
- VARCHAR
- CLOB
The return value can be
NULL
. Follow the link at the end of this topic to see how these Netezza output types are mapped to Db2 for z/OS output types. message
- The
message
parameter works in much the same way as for the IBM Db2 Analytics Accelerator stored procedures. In addition to the general description, which you find if you click the appropriate link under Related reference, also see the specific information that this parameter contains in error cases.
Result sets of wrapper stored procedures
Wrapper stored procedures that are called by IBM Netezza Analytics functions always return two result sets. The first contains information about the execution of the stored procedure. The second is reserved for trace information. Both result sets can be empty.
If trace records have been created by a wrapper stored procedure, they are included in the last result set.
Some IBM Netezza Analytics stored procedures produce text output in addition to the output in their result parameter. This additional output is returned by the wrapper stored procedure in the first result set. If the length of the text exceeds 32698 characters, it is split into multiple result rows. For Db2 for z/OS, this means that the information becomes part of a global temporary table that is created by the following SQL statement:
CREATE GLOBAL TEMPORARY TABLE
DSNAQT.ACCEL_TABLES_DETAILS (
SEQID INTEGER,
TABLES_DETAILS VARCHAR(32698)
)
CCSID UNICODE
The original text output of the IBM Netezza Analytics stored procedure is restored by concatenating the values in the TABLES_DETAILS VARCHAR column in ascending order of SEQID.
CREATE GLOBAL TEMPORARY TABLE
DSNAQT.ACCEL_TRACE_ACCELERATOR (
SEQID INTEGER,
TRACEDATA VARBINARY(32698)
)
CCSID UNICODE
You find the trace information in the TRACEDATA column. It is sorted in ascending order of SEQID.
Authorizations for z/OS
The user ID under which the wrapper stored procedure is invoked must have the following privileges:
- EXECUTE on the wrapper stored procedure
- EXECUTE on the SYSACCEL.* packages
- Authorization to read the accelerator tables specified in the
parameters
string as input tables. - DISPLAY authorization for calling the SYSPROC.ADMIN_COMMAND_Db2(-DIS GROUP) command
- Authorization to create accelerator-only tables in the Db2 for z/OS database that is specified by the AQT_ANALYTICS_DATABASE environment variable or in the default database (in general, these are the specified output tables).
- Write access to the /tmp directory for the user who calls the stored procedure (for tracing)
- RACF® ACCESS(READ) on the data set that contains the AQTENV file in the started task procedure of the Workload Manager (WLM) environment.
Prerequisites
- The specified accelerator has been started and is online.
- The input tables that are specified in the
parameters
string have been defined on the accelerator and enabled for query acceleration. - The specified output tables (accelerator-only tables) must not yet be defined in Db2 for z/OS
- The necessary authorizations exist.
- The IBM Netezza Analytics stored procedures have been enabled on the accelerator.
Accessed data and systems
A wrapper stored procedure is implemented by a common load module on the Db2 side. This common load module completes the following tasks:
- Checks the Db2 authorizations for data (table) access
- Calls the remote IBM Netezza Analytics on the accelerator
- Creates accelerator-only tables
- Returns result information from the IBM Netezza Analytics stored procedure as text
in a result set. Optionally, a result value can be assigned, which is returned by the
return_value
parameter.
Error handling
If an error occurs during the execution of a remote stored procedure, the
message
output parameter contains specific information about the remote stored
procedure call.
Example
The KMEANS remote stored procedure from the IBM Netezza Analytics suite is called to create a model, but the model already exists. In this case, the following information is returned by themessage
output
parameter:<?xml version="1.0" encoding="UTF-8" ?> <dwa:messageOutput xmlns:dwa="http://www.ibm.com/xmlns/prod/dwa/2011" version="1.0"> <message severity="error" reason-code="AQT10050E"> <text> An internal error occurred on the 'AQP74' accelerator: ODBC operation SQLExecDirect failed with code -1 when executing statement: SELECT NZA..KMEANS('model=customer_gzd, intable="TLMQ61"."CUSTOMER-UID_02410003", outtable="BCKE"."CUSTOMER_OUT-ID_56", id=C_CUSTKEY, target=C_NATIONKEY, transform=S, distance=euclidean, k=3, maxiter=5, randseed=12345, idbased=false'): <ODBCDiagnostics> <SQLSTATE>HY000</SQLSTATE> <SQLCODE>46</SQLCODE> <Message>ERROR: NZAUT-0232E: A model with the name '"CUSTOMER_GZD"' already exists so a model being newly added cannot use this name. </Message> </ODBCDiagnostics> </text> <description> This error occurs when an unhandled error was detected on the accelerator. Additional information is returned in the diagnostic message. </description> <action> If this message was returned by the SYSPROC.ACCEL_LOAD_TABLES stored procedure, you might receive the message 04E-00E4007 abend LOC=DSNUT101.DSNUGSDA for those Db2 UNLOAD utilities processes or threads that have already unloaded data to the accelerator. You can ignore this message and delete the corresponding dump files. </action> </message> </dwa:messageOutput>
Example of a KMEANS wrapper stored procedure call
Suppose that a table with the name TLMQ61 has already been created by the following statement:
CREATE TABLE
"TLMQ61"."CUSTOMER" (
"C_CUSTKEY" INTEGER NOT NULL,
"C_NAME" VARCHAR(25) FOR SBCS DATA NOT NULL,
"C_ADDRESS" VARCHAR(40) FOR SBCS DATA NOT NULL,
"C_NATIONKEY" INTEGER NOT NULL,
"C_PHONE" CHAR(15) FOR SBCS DATA NOT NULL,
"C_ACCTBAL" DECIMAL(15 , 2) NOT NULL,
"C_MKTSEGMENT" CHAR(10) FOR SBCS DATA NOT NULL,
"C_COMMENT" VARCHAR(117) FOR SBCS DATA NOT NULL,
CONSTRAINT "CUSTOMER_PK" PRIMARY KEY("C_CUSTKEY")
) CCSID UNICODE;
Suppose further that this table has been defined as an accelerator-shadow table on an accelerator called ACCEL01, loaded, and enabled for query acceleration. In addition, suppose that the accelerator has been started and enabled for the use of IBM Netezza Analytics (for information on how to enable remote stored procedures, follow the Related tasks link at the end of this topicsee Enabling the IBM Netezza Analytics stored procedures on an accelerator in the IBM Db2 Analytics Accelerator for z/OS: Stored Procedures Reference). Finally, suppose that the ID used to call the INZA.KMEANS remote stored procedure has all required authorizations.
If all these preconditions are met, you can call the remote stored procedure by the following statement:
CALL INZA.KMEANS (
'ACCEL01',
'model=customer_gzd, intable=TLMQ61.CUSTOMER, outtable=customer_out,
id=C_CUSTKEY, target=C_NATIONKEY, transform=S,
distance=euclidean, k=3, maxiter=5,
randseed=12345, idbased=false',
?,
' <?xml version= "1.0" encoding= "UTF-8"?>
<aqttables:messageControl
xmlns:aqttables = "http://www.ibm.com/xmlns/prod/dwa/2011" version = "1.2">
<traceConfig keepTrace="false" forceFlush="true"
location="/tmp/INZAKMeansTableSim16.trc">
<component name = "PROCEDURE" level = "DEBUG" />
</traceConfig>
</aqttables:messageControl> '
)
Result of the example
The INZA.KMEANS stored procedure returns the number of generated clusters in the
return_value
parameter and the XML text in the
message
parameter:
Parameter | Value |
---|---|
return_value |
1 |
message |
|
The requested debug trace information is included in the result set.
Output table cleanup in error cases
If you cancel a remote stored procedure run, or if the remote stored procedure fails before the creation of the output tables (accelerator-only tables) has been committed in Db2 for z/OS, the corresponding tables are automatically removed from the accelerator by the system maintenance daemon.
If the creation of the output tables in Db2 for z/OS has been committed, and the remote stored procedure is still running at the point of the error, DROP TABLE statements are submitted automatically to remove the definitions of the accelerator-only tables from Db2 for z/OS.
However, if the table creation has been committed, but the remote stored procedure is not active anymore (was canceled or ended abnormally), you have to remove the definitions of the accelerator-only tables manually. To find the orphaned tables, run the SYSPROC.ACCEL_GET_TABLES_INFO stored procedure. Specify the list of output tables from the remote stored procedure as input for SYSPROC.ACCEL_GET_TABLES_INFO. For each of the tables that were actually created (as will be shown in the output of SYSPROC.ACCEL_GET_TABLES_INFO), submit a DROP TABLE statement.
Error handling in case of incorrect DDLs
If the provider of an external package has supplied a DDL for the creation of a wrapper stored procedure that does not meet the requirements, a special error message is issued. The message starts with the error code STLSTATE 01H16 and contains additional diagnostic information. For example, if the DDL does not contain a proper definition for the output message parameter, so that error messages cannot be returned, the following information will be returned:
- INVALID NUMBER OF PARAMETERS
- INVALID MESSAGE OUTPUT PARAMETER TYPE
- INVALID MESSAGE OUTPUT PARAMETER LENGTH