IDAX.RANDOM_SAMPLE - Random sampling

Use this stored procedure to create a random sample of a table with a fixed size or a fixed probability. Stratified sampling is also supported.

Authorization

The privileges held by the authorization ID of the statement must include the IDAX_USER role.

Syntax

IDAX.RANDOM_SAMPLE(in parameter_string varchar(32672))

Parameter descriptions

parameter_string
Mandatory one-string parameter that contains pairs of <parameter>=<value> entries that are separated by a comma.
Data type: VARCHAR(32672)
The following list shows the parameter values:
intable
Mandatory.
The name of the input table.
Data type: VARCHAR(256)
outtable
Mandatory.
The output table into which the samples are written.
Data type: VARCHAR(256)
num
Optional.
The number of rows in the sample.
Alias of the size parameter.
If this parameter is specified, the size parameter must not be specified.
If this parameter is specified, the fraction parameter must not be specified.
Min: 0
Max: The number of rows in the input table if the replace parameter is specified as false.
Data type: BIGINT
size
Optional.
The number of rows in the sample.
Alias of the num parameter.
If this parameter is specified, the num parameter must not be specified.
If this parameter is specified, the fraction parameter must not be specified.
Min: 0
Max: The number of rows in the input table if the replace parameter is specified as false.
Data type: BIGINT
fraction
Optional.
The probability whether each of the rows is contained in the sample.
If this parameter is specified, the num parameter and the size parameter must not be specified.
If this parameter is not specified, either the num parameter or the size parameter must be specified.
Min: 0.0
Max: 1.0 if the replace parameter is specified as false.
Default: none
Data type: DOUBLE
by
Optional.
The column that is used to stratify the input table.
If this parameter is specified, stratified sampling is done.
Stratified sampling ensures that each value of the column is represented in the sample with approximately the same percentage as in the original input table.
Default: none
Data type: VARCHAR(ANY)
outsignature
Optional.
The columns of the input table that are to be kept in the sample. The columns are separated by a semi-colon (;).
If this parameter is not specified, all columns are kept in the output table.
Default: none
Data type: VARCHAR(ANY)
outclear
Optional.
A flag that indicates whether the output table is to be overwritten.
Default: false
Data type: BOOLEAN
replace
Optional.
A flag that indicates if sampling is to be done with replacement.
Default: false
Data type: BOOLEAN
randseed
Optional.
The seed of the random function.
Default: random()*1000000.0
Data type: INT4

Returned information

BIGINT the number of records in the sample output table.

Example

call IDAX.RANDOM_SAMPLE('intable=SAMPLES.CUSTOMER_CHURN, num=250, outtable=CC_SAMPLE1');
call IDAX.RANDOM_SAMPLE('intable=SAMPLES.CUSTOMER_CHURN, size=300, outtable=CC_SAMPLE2, replace=T');
call IDAX.RANDOM_SAMPLE('intable=SAMPLES.CUSTOMER_CHURN, fraction=0.4, outtable=CC_SAMPLE3, by=CENSOR');