IDAX.SPLIT_DATA - Split data into training data and test data

Use this stored procedure to randomly split the input data into two separated subsets, a training data set and a test data set. You can then use the created output tables to build a classification model or a regression model and test its prediction quality.

Note: This feature is available starting from Db2® version 11.5.4.

Authorization

The privileges held by the authorization ID of the statement must include the IDAX_USER role.

Syntax

IDAX.SPLIT_DATA(in parameter_string varchar(32672))

Parameter descriptions

parameter_string
Mandatory one-string parameter that contains pairs of <parameter>=<value> entries that are separated by a comma.
Data type: VARCHAR(32672)
The following list shows the parameter values:
intable
Mandatory.
The name of the input table.
Data type: VARCHAR(256)
traintable
Mandatory.
The output table that contains the fraction of the input records.
Data type: VARCHAR(256)
testtable
Mandatory.
The output table that contains the rest (1-<fraction>) of the input records.
Data type: VARCHAR(256)
id
Mandatory.
The column of the input table that identifies a unique instance ID.
Data type: VARCHAR(128)
fraction
Optional.
The fraction of the data that is to be split.
Min: 0
Max: 1.0
Default: 0.5
Data type: FLOAT
seed
Optional.
The seed of the random function.
Default: random()
Data type: FLOAT

Returned information

FLOAT the number of records in the traintable output table.

Example

CALL IDAX.SPLIT_DATA('intable=CENSUSINCOME, id=ID, traintable=CENSUSINCOME_TRAIN, testtable=CENSUSINCOME_TEST, fraction=0.6, seed=1');
Note: The IDAX.SPLIT_DATA stored procedure is non-deterministic in a MPP environment, regardless of whether or not the seed parameter is fixed.