IDAX.SPLIT_DATA - Split data into training data and test data

Use this stored procedure to randomly split the input data into two separated subsets, a training data set and a test data set. You can then use the created output tables to build a classification model or a regression model and test its prediction quality.

Note: This feature is available starting from Db2® version 11.5.4.

Authorization

The privileges held by the authorization ID of the statement must include the IDAX_USER role.

Syntax

IDAX.SPLIT_DATA(in parameter_string varchar(32672))

Parameter descriptions

parameter_string

Mandatory one-string parameter that contains pairs of <parameter>=<value> entries that are separated by a comma.

Data type: VARCHAR(32672)

The following list shows the parameter values:

intable: Mandatory.; The name of the input table.; Data type: VARCHAR(256)
traintable: Mandatory.; The output table that contains the fraction of the input records.; Data type: VARCHAR(256)
testtable: Mandatory.; The output table that contains the rest (1-<fraction>) of the input records.; Data type: VARCHAR(256)
id: Mandatory.; The column of the input table that identifies a unique instance ID.; Data type: VARCHAR(128)
fraction: Optional.; The fraction of the data that is to be split.; Min: 0; Max: 1.0; Default: 0.5; Data type: FLOAT
seed: Optional.; The seed of the random function.; Default: random(); Data type: FLOAT

Returned information

FLOAT the number of records in the traintable output table.

Example

CALL IDAX.SPLIT_DATA('intable=CENSUSINCOME, id=ID, traintable=CENSUSINCOME_TRAIN, testtable=CENSUSINCOME_TEST, fraction=0.6, seed=1');

Note: The IDAX.SPLIT_DATA stored procedure is non-deterministic in a MPP environment, regardless of whether or not the seed parameter is fixed.