Use this stored procedure to create a random sample of a table with a fixed size or a
fixed probability. Stratified sampling is also supported.
Authorization
The privileges held by the authorization ID of the statement must include the IDAX_USER role.
Syntax
IDAX.RANDOM_SAMPLE(in parameter_string varchar(32672))
Parameter descriptions
- parameter_string
- Mandatory one-string parameter that contains pairs of
<parameter>=<value> entries that are separated by a comma.
- Data type: VARCHAR(32672)
- The following list shows the parameter values:
-
- intable
- Mandatory.
- The name of the input table.
- Data type: VARCHAR(256)
- outtable
- Mandatory.
- The output table into which the samples are written.
- Data type: VARCHAR(256)
- num
- Optional.
- The number of rows in the sample.
- Alias of the size parameter.
- If this parameter is specified, the size parameter must not be specified.
- If this parameter is specified, the fraction parameter must not be
specified.
- Min: 0
- Max: The number of rows in the input table if the replace parameter is
specified as false.
- Data type: BIGINT
- size
- Optional.
- The number of rows in the sample.
- Alias of the num parameter.
- If this parameter is specified, the num parameter must not be specified.
- If this parameter is specified, the fraction parameter must not be
specified.
- Min: 0
- Max: The number of rows in the input table if the replace parameter is
specified as false.
- Data type: BIGINT
- fraction
- Optional.
- The probability whether each of the rows is contained in the sample.
- If this parameter is specified, the num parameter and the
size parameter must not be specified.
- If this parameter is not specified, either the num parameter or the
size parameter must be specified.
- Min: 0.0
- Max: 1.0 if the replace parameter is specified as false.
- Default: none
- Data type: DOUBLE
- by
- Optional.
- The column that is used to stratify the input table.
- If this parameter is specified, stratified sampling is done.
- Stratified sampling ensures that each value of the column is represented in the sample with
approximately the same percentage as in the original input table.
- Default: none
- Data type: VARCHAR(ANY)
- outsignature
- Optional.
- The columns of the input table that are to be kept in the sample. The columns are separated by a
semi-colon (;).
- If this parameter is not specified, all columns are kept in the output table.
- Default: none
- Data type: VARCHAR(ANY)
- outclear
- Optional.
- A flag that indicates whether the output table is to be overwritten.
- Default: false
- Data type: BOOLEAN
- replace
- Optional.
- A flag that indicates if sampling is to be done with replacement.
- Default: false
- Data type: BOOLEAN
- randseed
- Optional.
- The seed of the random function.
- Default: random()*1000000.0
- Data type: INT4
Returned information
BIGINT the number of records in the sample output table.
Example
call IDAX.RANDOM_SAMPLE('intable=SAMPLES.CUSTOMER_CHURN, num=250, outtable=CC_SAMPLE1');
call IDAX.RANDOM_SAMPLE('intable=SAMPLES.CUSTOMER_CHURN, size=300, outtable=CC_SAMPLE2, replace=T');
call IDAX.RANDOM_SAMPLE('intable=SAMPLES.CUSTOMER_CHURN, fraction=0.4, outtable=CC_SAMPLE3, by=CENSOR');