The Discretize operator is used if there are too much different values in a column of a virtual table. It maps a range of values (interval) to a new single value. For instance, a table with bank customer data may contain a column with the average balance of the customer's account in a certain time period. Almost all customers will have different values in this column. You might want to consider only a few different ranges of average balances but not the exact values. Then you can use the Discretize operator to generate a new column that contains the information in which range or interval the customers balance falls.
You can discretize only numerical columns. However, the new column with the mapped values ("discretized values") can be of numerical or categorical type.
There are three methods to define the intervals:
Even if you have selected one of the methods that calculate intervals, you can still modify the resulting intervals manually later.
Some mining algorithms discretize input data internally, if necessary. The distribution-based clustering algorithm treats numerical columns almost like categorical columns by categorizing their values into buckets. The Associations mining function and the Sequential Patterns mining function also provide a discretization mechanism for numeric columns: If a numeric column contains more than 20 different values, the value range is automatically divided into buckets.