IDAX.IMPUTE_DATA - Impute missing data

Use this stored procedure to replace missing values in the input data with substituted values.

Note: This feature is available starting from Db2® version 11.5.4.

Authorization

The privileges held by the authorization ID of the statement must include the IDAX_USER role.

Syntax

IDAX.IMPUTE_DATA(in parameter_string varchar(32672))

Parameter descriptions

parameter_string
Mandatory one-string parameter that contains pairs of <parameter>=<value> entries that are separated by a comma.
Data type: VARCHAR(32672)
The following list shows the parameter values:
intable
Mandatory.
The name of the input table.
Data type: VARCHAR(256)
method
Optional.
The method of data imputation.
Allowed values are "mean", "median", "freq", and "replace".
If this parameter is not specified, the method is "median" for the numeric columns, and "freq" for the nominal columns.
If two values have the same frequency, the value that comes first alphabetically is used.
You must not use the methods "mean" and "median" for nominal columns.
Default: none
Data type: VARCHAR(ANY)
outtable
Optional.
The output table that contains the substituted values.
If this parameter is not specified, the missing values are replaced directly in the input table.
Default: none
Data type: VARCHAR(256)
inColumn
Optional.
The columns of the input table where the missing values must be replaced.
These columns are separated by a semi-colon (;).
If this parameter is not specified, all columns of the input data are considered.
Default: none
Data type: VARCHAR(128)
numericValue
Optional.
The numeric value of the replacement when the method is "replace".
Default: -1
Data type: DOUBLE
nominalValue
Optional.
The nominal value of the replacement when the method is "replace".
Default: missing
Data type: VARCHAR(ANY)
view
Optional.
A flag that indicates whether the output object should be stored as a view (true) or as a table (false).
Default: false
Data type: BOOLEAN

Returned information

INTEGER the number of attributes where data was updated.

Example

CREATE TABLE MISS(id int, col float, attr float,data date, class VARCHAR(10));
INSERT INTO MISS VALUES (1,1.4,null,'2017-01-01','white');
INSERT INTO MISS VALUES (2,2.5,random(),null,null);
INSERT INTO MISS VALUES (3,null,random(),'2017-01-01','white');
INSERT INTO MISS VALUES (4,1.4,random(),null,'black');
CALL IDAX.IMPUTE_DATA('intable=MISS, outtable=NOMISS');
CALL IDAX.IMPUTE_DATA('intable=MISS, method=replace,numericValue=0,incolumn=col;attr,outtable=NOMISS2');