target |
field |
In the Random Trees node, models require a single target and one or more input fields. A
frequency field can also be specified. See the topic Common Modeling Node Properties for more information. |
number_of_models |
integer |
Determines the number of models to build as part of the ensemble. |
use_number_of_predictors |
flag |
Determines whether number_of_predictors is used. |
number_of_predictors |
integer |
Specifies the number of predictors to be used for splitting when building the models. |
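The Python sketch below shows how these two settings might interact in a generic random-forest scheme. It draws one bootstrap sample per model and then picks a random subset of predictors for a split. It assumes the standard random-forest convention that predictors are sampled afresh at each split. The function names (`bootstrap_sample`, `predictors_for_split`, `plan_ensemble`) are hypothetical and do not describe the node's internal implementation.

```python
import random

def bootstrap_sample(records, rng):
    # Draw len(records) records with replacement (a standard bootstrap sample).
    return [rng.choice(records) for _ in records]

def predictors_for_split(predictors, number_of_predictors, rng):
    # Randomly choose distinct candidate predictors to consider at one split.
    return rng.sample(predictors, number_of_predictors)

def plan_ensemble(records, predictors, number_of_models, number_of_predictors, seed=0):
    # For each model: one bootstrap sample plus the predictors drawn for its first split.
    rng = random.Random(seed)
    plans = []
    for _ in range(number_of_models):
        sample = bootstrap_sample(records, rng)
        first_split = predictors_for_split(predictors, number_of_predictors, rng)
        plans.append((sample, first_split))
    return plans

plans = plan_ensemble(list(range(10)), ["age", "bp", "chol", "na", "k"], 3, 2)
print(len(plans))  # 3
```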
use_stop_rule_for_accuracy |
flag |
Determines whether model building stops when accuracy cannot be improved. |
sample_size |
number |
Reduce this value to improve performance when processing very large datasets. |
handle_imbalanced_data |
flag |
If the model's target is a flag field and the ratio of the desired outcome
to the non-desired outcome is very small, the data is imbalanced, and the bootstrap sampling
performed by the model may reduce its accuracy. Enable imbalanced data handling so that
the model captures a larger proportion of the desired outcome and generates a stronger
model. |
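The Python sketch below shows why imbalance matters for bootstrap sampling. With 5 desired outcomes among 500 records, an individual bootstrap sample may contain only a handful of them, or none. The helper names and the flag values `"T"`/`"F"` are hypothetical. The sketch does not reproduce how the node itself rebalances the data.

```python
import random

def outcome_ratio(target_values, desired):
    # Ratio of desired to non-desired outcomes for a flag target.
    desired_count = sum(1 for v in target_values if v == desired)
    other_count = len(target_values) - desired_count
    return float(desired_count) / other_count if other_count else float("inf")

def desired_counts_in_bootstraps(target_values, desired, n_samples, seed=0):
    # Count how many desired outcomes land in each bootstrap sample.
    rng = random.Random(seed)
    counts = []
    for _ in range(n_samples):
        sample = [rng.choice(target_values) for _ in target_values]
        counts.append(sum(1 for v in sample if v == desired))
    return counts

target = ["T"] * 5 + ["F"] * 495
print(outcome_ratio(target, "T"))
print(desired_counts_in_bootstraps(target, "T", 5))
```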
use_weighted_sampling |
flag |
When False, variables for each node are randomly selected with the same
probability. When True, variables are weighted and selected accordingly. |
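The Python sketch below implements both selection modes, assuming the weights are already known. How the node derives its weights is not described here, so the `weights` argument is a hypothetical stand-in, as is `select_variables`. With no weights, every variable has the same chance of selection. With weights, each draw picks a remaining variable with probability proportional to its weight, without replacement.

```python
import random

def select_variables(variables, k, weights=None, seed=0):
    # weights=None mimics use_weighted_sampling=False (equal probability);
    # a list of weights mimics use_weighted_sampling=True.
    rng = random.Random(seed)
    if weights is None:
        return rng.sample(variables, k)
    pool = list(zip(variables, weights))
    chosen = []
    for _ in range(k):
        total = sum(w for _, w in pool)
        r = rng.random() * total
        cumulative = 0.0
        index = len(pool) - 1  # fallback guards against floating-point round-off
        for i, (_, w) in enumerate(pool):
            cumulative += w
            if r < cumulative:
                index = i
                break
        chosen.append(pool.pop(index)[0])  # remove so it cannot be drawn again
    return chosen

print(select_variables(["a", "b", "c"], 1, weights=[0, 0, 1]))  # ['c']
```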
max_node_number |
integer |
Maximum number of nodes allowed in individual trees. If the number would be exceeded on the
next split, tree growth halts. |
max_depth |
integer |
Maximum tree depth before growth halts. |
min_child_node_size |
integer |
Determines the minimum number of records allowed in a child node after the parent node is
split. If a child node would contain fewer records than specified here, the parent node will not be
split. |
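The three growth limits above (max_node_number, max_depth, and min_child_node_size) can be combined into a single check, as in this Python sketch. It assumes the root node has depth 0 and that a split adds one node per child. `can_split` is a hypothetical helper, not part of the scripting API.

```python
def can_split(depth, node_count, child_sizes,
              max_depth, max_node_number, min_child_node_size):
    # depth: depth of the node being split (root = 0, an assumption)
    # node_count: number of nodes already in the tree
    # child_sizes: record counts the proposed child nodes would contain
    if depth >= max_depth:
        return False  # the children would lie deeper than max_depth
    if node_count + len(child_sizes) > max_node_number:
        return False  # the split would exceed max_node_number
    if min(child_sizes) < min_child_node_size:
        return False  # at least one child would be too small
    return True

print(can_split(0, 1, [60, 40], 10, 100, 5))  # True
```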
use_costs |
flag |
Determines whether the misclassification costs specified in costs are used. |
costs |
structured |
Structured property. The format is a list of entries, each of which is a list of three values: the actual value,
the predicted value, and the cost if that prediction is wrong. For example:
tree.setPropertyValue("costs",
[["drugA", "drugB", 3.0], ["drugX", "drugY", 4.0]]) |
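If the costs are easier to maintain as a nested mapping, a small helper can flatten them into the list-of-triples format used by the costs property. `costs_from_matrix` is a hypothetical helper written in plain Python. Its result could then be passed to `tree.setPropertyValue("costs", costs)` as in the example for this property.

```python
def costs_from_matrix(matrix):
    # Flatten {actual: {predicted: cost}} into [[actual, predicted, cost], ...].
    # Keys are sorted so the output order is deterministic.
    rows = []
    for actual in sorted(matrix):
        for predicted in sorted(matrix[actual]):
            rows.append([actual, predicted, float(matrix[actual][predicted])])
    return rows

costs = costs_from_matrix({"drugA": {"drugB": 3}, "drugX": {"drugY": 4}})
print(costs)  # [['drugA', 'drugB', 3.0], ['drugX', 'drugY', 4.0]]
```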
default_cost_increase |
none
linear
square
custom |
Sets default values in the costs matrix.
Note: only enabled for ordinal targets. |
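For ordinal targets, one plausible reading of `linear` and `square` is that the cost of a misclassification grows with the distance between the actual and predicted categories, either in proportion to that distance or with its square. The Python sketch below builds a costs list under that assumption. The exact formula the node uses is not stated here, and `default_costs` is a hypothetical helper.

```python
def default_costs(categories, increase):
    # categories must be listed in their ordinal order.
    # Assumed rule: cost = rank distance ("linear") or its square ("square").
    if increase not in ("linear", "square"):
        raise ValueError("only 'linear' and 'square' are sketched here")
    rows = []
    for i, actual in enumerate(categories):
        for j, predicted in enumerate(categories):
            if i == j:
                continue  # correct predictions are left out
            distance = abs(i - j)
            cost = float(distance) if increase == "linear" else float(distance ** 2)
            rows.append([actual, predicted, cost])
    return rows

print(default_costs(["low", "medium", "high"], "square"))
```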
max_pct_missing |
integer |
If the percentage of missing values in any input is greater than the value specified here,
the input is excluded. Minimum 0, maximum 100. |
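Here is a minimal Python sketch of the max_pct_missing rule, assuming missing values are represented as `None`. `exceeds_missing_limit` is a hypothetical helper. The comparison is strict, so an input is excluded only when its missing percentage is greater than the limit.

```python
def exceeds_missing_limit(values, max_pct_missing):
    # None marks a missing value in this sketch.
    if not values:
        return False
    pct_missing = 100.0 * sum(1 for v in values if v is None) / len(values)
    return pct_missing > max_pct_missing

print(exceeds_missing_limit([1, None, 3, None], 40))  # True (50% missing)
```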
exclude_single_cat_pct |
integer |
If one category value represents a higher percentage of the records than specified here, the
entire field is excluded from model building. Minimum 1, maximum 99. |
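The exclude_single_cat_pct rule can be sketched the same way, by comparing the share of the most frequent category with the threshold. `dominated_by_one_category` is a hypothetical helper written in plain Python.

```python
from collections import Counter

def dominated_by_one_category(values, exclude_single_cat_pct):
    # True if the most frequent category exceeds the given percentage of records.
    if not values:
        return False
    top_count = Counter(values).most_common(1)[0][1]
    return 100.0 * top_count / len(values) > exclude_single_cat_pct

print(dominated_by_one_category(["a"] * 96 + ["b"] * 4, 95))  # True
```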
max_category_number |
integer |
If the number of categories in a field exceeds this value, the field is excluded from model
building. Minimum 2. |
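Here is a sketch of the max_category_number check in Python. It assumes that missing values (`None`) are not counted as a category, which is an illustrative choice rather than documented behavior. `too_many_categories` is a hypothetical helper.

```python
def too_many_categories(values, max_category_number):
    # Count distinct non-missing categories and compare with the limit.
    categories = {v for v in values if v is not None}
    return len(categories) > max_category_number

print(too_many_categories(["a", "b", "c", None], 2))  # True
```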
min_field_variation |
number |
If the coefficient of variation of a continuous field is smaller than this value, the field
is excluded from model building. |
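The coefficient of variation is the standard deviation divided by the mean. This Python sketch uses the population standard deviation and the absolute value of the mean. Both of those choices, and the handling of a zero mean, are assumptions. `coefficient_of_variation` and `low_variation` are hypothetical helpers.

```python
import statistics

def coefficient_of_variation(values):
    sd = statistics.pstdev(values)  # population standard deviation (assumption)
    if sd == 0:
        return 0.0  # a constant field has no variation
    mean = statistics.mean(values)
    if mean == 0:
        return float("inf")  # ratio undefined; treated as highly variable here
    return sd / abs(mean)

def low_variation(values, min_field_variation):
    # True if the field would be excluded under min_field_variation.
    return coefficient_of_variation(values) < min_field_variation

print(low_variation([10, 10, 10], 0.01))  # True
```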
num_bins |
integer |
Only used if the data is made up of continuous inputs. Sets the number of equal-frequency bins
to be used for the inputs. Options are 2, 4, 5, 10, 20, 25, 50, or 100. |
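Here is a minimal Python sketch of equal-frequency binning. Values are ranked, and each rank is mapped to one of `num_bins` bins so that every bin receives roughly the same number of records. In this simplified version, tied values can fall into different bins, which real implementations typically avoid. `equal_frequency_bins` is a hypothetical helper.

```python
ALLOWED_BINS = (2, 4, 5, 10, 20, 25, 50, 100)

def equal_frequency_bins(values, num_bins):
    # Assign each value a bin index 0..num_bins-1 based on its rank.
    if num_bins not in ALLOWED_BINS:
        raise ValueError("num_bins must be one of %s" % (ALLOWED_BINS,))
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * num_bins // len(values)
    return bins

print(equal_frequency_bins(list(range(10, 0, -1)), 2))  # [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
```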