If the column is being used in a Row Generator or Column
Generator stage, you can specify extra details about the mock data
that is generated. The exact fields that appear depend on the data
type of the column being generated.
For example, integers allow you to specify whether values are random
or whether they cycle. If they cycle, you can specify an initial value,
an increment, and a limit. If they are random, you can specify a seed
value for the random number generator, whether to include negative
numbers, and a limit.
The following diagram shows the generate options available
for the different data types.
All data types
All data types other than string have two types of operation, cycle
and random:
- Cycle. The cycle option generates a repeating pattern
of values for a column. It has the following optional dependent properties:
  - Increment. The value added to produce the field value
  in the next output record. The default value is 1 (integer)
  or 1.0 (float).
  - Initial value. The initial field value (the value
  of the first output record). The default value is 0.
  - Limit. The maximum field value. When the generated
  field value is greater than Limit, it wraps back to Initial value.
  The default value of Limit is the maximum allowable value for the
  field's data type.
  You can set these properties to `part'
  to use the partition number (for example, 0, 1, 2, 3 on a four node
  system), or `partcount' to use the total number of executing partitions
  (for example, 4 on a four node system).
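The cycle behavior described above can be illustrated with a short Python sketch. This is not the stage's implementation, only a model of the documented rules: start at the initial value, add the increment for each record, and wrap back to the initial value once the result exceeds the limit. The function name `cycle_values` and the sample partition settings are hypothetical.

```python
from itertools import islice

def cycle_values(initial=0, increment=1, limit=None):
    """Model of the cycle option: yields one value per output record."""
    value = initial
    while True:
        yield value
        value += increment
        # Wrap back to the initial value once the limit is exceeded.
        if limit is not None and value > limit:
            value = initial

# Initial value 0, increment 2, limit 6.
print(list(islice(cycle_values(initial=0, increment=2, limit=6), 8)))

# Assumed settings for illustration: initial value = part, increment = partcount,
# as seen from partition 2 of a four node system.
part, partcount = 2, 4
print(list(islice(cycle_values(initial=part, increment=partcount), 4)))
```

With these assumed partition settings, each partition produces a sequence that does not overlap with the sequences of the other partitions.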
- Random. The random option generates random values
for a field. It has the following optional dependent properties:
  - Limit. The maximum generated field value. The default
  value of Limit is the maximum allowable value for the field's data
  type.
  - Seed. The seed value for the random number generator
  used by the stage for the field. You do not have to specify a seed.
  By default, the stage uses the same seed value for all fields containing
  the random option.
  - Signed. Specifies that signed values are generated
  for the field (values between -limit and +limit). Otherwise, the stage
  creates values between 0 and +limit.
  You can set Limit and Seed to `part' to use the partition number
  (for example, 0, 1, 2, 3 on a four node system), or `partcount' to use
  the total number of executing partitions (for example, 4 on a four
  node system).
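The effect of the limit, seed, and signed properties can be modeled with Python's standard `random` module. This is a sketch under the assumption that values are drawn uniformly from the documented range; the stage's own random number generator is not specified here and will produce different numbers. The function name `random_values` is hypothetical.

```python
import random

def random_values(count, limit, seed=None, signed=False):
    """Model of the random option: `count` integers within the documented range."""
    rng = random.Random(seed)          # a fixed seed makes the sequence repeatable
    low = -limit if signed else 0      # signed: -limit..+limit, otherwise 0..+limit
    return [rng.randint(low, limit) for _ in range(count)]

unsigned = random_values(5, limit=100, seed=42)
repeat = random_values(5, limit=100, seed=42)
signed = random_values(5, limit=100, seed=42, signed=True)

print(unsigned == repeat)   # same seed, same sequence
print(signed)
```

Reusing a seed is what makes generated test data reproducible between runs.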
Strings
By default, the generator stages initialize all bytes of a string
field to the same alphanumeric character. The stages use the following
characters, in the following order:
abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ
For example, a string column with a length of 5 produces
successive string fields with the values:
aaaaa
bbbbb
ccccc
ddddd
...
After the last character, capital Z,
values wrap back to lowercase a and the cycle repeats.
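The default string behavior can be modeled in a few lines of Python. This is an illustrative sketch of the documented pattern, not the stage's code; the names `CHARS` and `default_strings` are hypothetical.

```python
# Character order as listed above: lowercase letters, digits, uppercase letters.
CHARS = "abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def default_strings(length, count):
    """Fill each string with one repeated character, cycling through CHARS."""
    return [CHARS[i % len(CHARS)] * length for i in range(count)]

rows = default_strings(length=5, count=63)
print(rows[:4])    # first four records
print(rows[61])    # last character in the sequence, capital Z
print(rows[62])    # wraps back to lowercase a
```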
You can also use the algorithm property to determine
how string values are generated. It has two possible values: cycle
and alphabet.
Decimal
As well as the Type property, decimal columns have the following properties:
- Percent invalid. The percentage of generated columns
that contain invalid values. Set to 10% by default.
- Percent zero. The percentage of generated decimal
columns where all bytes of the decimal are set to binary zero (0x00).
Set to 10% by default.
Date
As well
as the Type property, date columns have the following properties:
- Epoch. Use this to specify the earliest generated
date value, in the format yyyy-mm-dd (leading
zeros must be supplied for all parts). The default is 1960-01-01.
- Percent invalid. The percentage of generated columns
that will contain invalid values. Set to 10% by default.
- Use current date. Set this to generate today's date
in this column for every row generated. If you set this, all other
properties are ignored.
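Because the Epoch value must be written as yyyy-mm-dd with leading zeros in every part, it can be useful to check a value before entering it. The following Python sketch is one way to do that outside the product; the helper name `is_valid_epoch` is hypothetical and is not part of the stage.

```python
import re
from datetime import date

def is_valid_epoch(text):
    """Return True if text is a real calendar date written as yyyy-mm-dd."""
    # Require exactly 4-2-2 digits so that missing leading zeros are rejected.
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", text):
        return False
    try:
        date.fromisoformat(text)   # rejects impossible dates such as month 13
    except ValueError:
        return False
    return True

print(is_valid_epoch("1960-01-01"))  # the documented default
print(is_valid_epoch("1960-1-1"))    # leading zeros missing
```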
Time
As well
as the Type property, time columns have the following properties:
- Percent invalid. The percentage of generated columns
that will contain invalid values. Set to 10% by default.
- Scale factor. Specifies a multiplier to the increment
value for time. For example, a scale factor of 60 and an increment
of 1 means the field increments by 60 seconds.
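The interaction between the increment and the scale factor can be sketched as follows. The step between records is modeled as increment multiplied by scale factor, in seconds. The starting time of 00:00:00 and the function name `time_cycle` are assumptions made for illustration only.

```python
from datetime import datetime, timedelta

def time_cycle(count, increment=1, scale_factor=1, start="00:00:00"):
    """Model of a cycling time column: step = increment * scale_factor seconds."""
    first = datetime.strptime(start, "%H:%M:%S")   # assumed starting time
    step = timedelta(seconds=increment * scale_factor)
    return [(first + i * step).strftime("%H:%M:%S") for i in range(count)]

# Scale factor 60 with increment 1: each record is 60 seconds after the last.
print(time_cycle(3, increment=1, scale_factor=60))
```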
Timestamp
As well as the Type property, timestamp columns have the following properties:
- Epoch. Use this to specify the earliest generated
date value, in the format yyyy-mm-dd (leading
zeros must be supplied for all parts). The default is 1960-01-01.
- Use current date. Set this to generate today's date
in this column for every row generated. If you set this, all other
properties are ignored.
- Percent invalid. The percentage of generated columns
that will contain invalid values. Set to 10% by default.
- Scale factor. Specifies a multiplier to the increment
value for time. For example, a scale factor of 60 and an increment
of 1 means the field increments by 60 seconds.