Background of discretization and moments

Discretization can be unsupervised or supervised. Moments are quantities that describe certain aspects of continuous attribute distributions.

Discretization

The discretization process assigns a discrete value to each interval of a continuous attribute a to create a new discrete attribute a'. The discretization algorithm determines the interval boundaries, which should preserve as much useful information as possible from the original attribute. If the data set is used to create a classification model, the discretization should preserve the relationship between the class and the discretized attributes.
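As an illustration of how values are assigned to intervals, the following minimal sketch maps each continuous value to the index of its interval. The function name `discretize`, the sample data, and the convention that a value equal to a boundary belongs to the upper interval are assumptions made for this example.

```python
from bisect import bisect_right


def discretize(values, bounds):
    """Map each continuous value to the index of the interval it falls into.

    `bounds` holds the inner interval boundaries in ascending order, so
    len(bounds) + 1 intervals exist. A value equal to a boundary is placed
    in the upper interval (an assumed convention).
    """
    return [bisect_right(bounds, v) for v in values]


# Hypothetical attribute a (ages) and two assumed boundaries.
ages = [3.0, 17.5, 18.0, 42.0, 70.2]
age_groups = discretize(ages, bounds=[18.0, 65.0])
print(age_groups)  # [0, 0, 1, 1, 2]
```

The list `age_groups` plays the role of the new discrete attribute a'. Choosing the boundaries is the task of the discretization algorithm.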

You can use the following kinds of discretization algorithms:

Equal-width discretization
An unsupervised discretization algorithm that uses the equal-width criterion to set interval bounds.
Less complex, and therefore computationally more efficient.
Can handle large numbers of attributes.
The quality of the produced discretization intervals is sufficient for several applications.
Identifies the range of the discretized attributes and divides it evenly into a specified number of intervals.
Note: Before you use this algorithm, check the discretized attributes for outliers and remove them if necessary, because outliers extend the range of the attributes. With an extended range, most values might fall into only a few intervals while the remaining intervals stay nearly empty.
Equal-frequency discretization
An unsupervised discretization algorithm that uses the equal-frequency (data count) criterion to set interval bounds.
Less complex, and therefore computationally more efficient.
Can handle large numbers of attributes.
The quality of the produced discretization intervals is sufficient for several applications.
Identifies interval bounds in a robust way. Adapts to the actual data distribution by seeking intervals that contain the same number of instances.
Note: When the size of the data set is not evenly divisible by the required number of intervals, or when several instances have the same value for a discretized attribute, the interval frequencies might not be exactly equal. Therefore, you can specify a parameter that modifies the required number of intervals to achieve more uniform interval frequencies.
Minimum-entropy discretization
A supervised discretization algorithm that identifies the most appropriate interval bounds by minimizing class distribution impurity.
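
The three criteria above can be sketched in a few lines of Python. This is an illustrative sketch, not the implementation of any particular product: all function names and the sample data are assumptions, the equal-frequency variant places bounds at midpoints between neighboring sorted values, and the entropy variant finds only a single boundary, whereas practical minimum-entropy algorithms typically split recursively until a stopping criterion is met.

```python
import math
from collections import Counter


def equal_width_bounds(values, n_intervals):
    """Inner bounds that divide [min, max] into n_intervals equal-width parts."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    return [lo + i * width for i in range(1, n_intervals)]


def equal_frequency_bounds(values, n_intervals):
    """Inner bounds chosen so that each interval holds about the same count."""
    ordered = sorted(values)
    n = len(ordered)
    if n < n_intervals:
        raise ValueError("need at least as many values as intervals")
    bounds = []
    for i in range(1, n_intervals):
        cut = i * n // n_intervals  # index of the first value of interval i
        bound = (ordered[cut - 1] + ordered[cut]) / 2
        # Tied values can produce a repeated bound; skip it, which leaves
        # fewer intervals with less uniform frequencies.
        if not bounds or bound > bounds[-1]:
            bounds.append(bound)
    return bounds


def class_entropy(labels):
    """Shannon entropy (in bits) of the class distribution in labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def best_entropy_split(values, labels):
    """Single bound that minimizes the weighted class entropy of both sides."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    best_bound, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no bound can separate equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * class_entropy(left)
                 + len(right) * class_entropy(right)) / len(pairs)
        if score < best_score:
            best_bound = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_bound


# Hypothetical attribute with one outlier (100.0) and assumed class labels.
values = [1.0, 2.0, 3.0, 4.0, 100.0]
labels = ["a", "a", "b", "b", "b"]

print(equal_width_bounds(values, 2))       # [50.5]
print(equal_frequency_bounds(values, 2))   # [2.5]
print(best_entropy_split(values, labels))  # 2.5
```

In this sample, the outlier pushes the equal-width bound to 50.5, so four of the five values share one interval. The equal-frequency bound adapts to the data, and the entropy-based bound separates the two classes.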

Moments

Of particular interest are central moments, that is, moments around the mean.

The kth central moment is the mean of the differences between the attribute values and the attribute mean, where each difference is raised to the power of k.
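
The definition can be sketched as follows. The function name `central_moment` and the sample data are assumptions for illustration, and the sketch uses the population form, which divides by the number of values n rather than n - 1.

```python
def central_moment(values, k):
    """k-th central moment: the mean of (x - mean) ** k over all values.

    Population form (divides by n), which is an assumption of this sketch.
    """
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n


# Hypothetical attribute values with mean 5.0.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(central_moment(data, 1))  # 0.0, deviations from the mean cancel out
print(central_moment(data, 2))  # 4.0
```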

The 2nd central moment is the variance. The variance measures the dispersion of the distribution.

The 3rd central moment and the 4th central moment are usually used in a standardized form. That is, they are divided by the standard deviation raised to the corresponding power, 3 or 4.

The 3rd standardized central moment is also called the skewness. It serves as a common distribution asymmetry measure and takes a value of 0 for symmetrical distributions, negative values for a longer left tail, and positive values for a longer right tail.

The 4th standardized central moment is the kurtosis. It is traditionally described as a measure of distribution peakedness, although it is also strongly influenced by the weight of the distribution tails. Typically, a constant of 3 is subtracted from the 4th standardized central moment in a kurtosis calculation. The result is sometimes referred to as excess kurtosis. This correction makes the kurtosis of a normal distribution equal to 0. It is negative for distributions that are flatter than normal, and it is positive for distributions that are more peaked than normal.
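
The standardized moments can be sketched as follows. The function names and sample data are assumptions for illustration. The sketch uses population moments (dividing by n) and expects a non-constant attribute, because a standard deviation of 0 would cause a division by zero.

```python
import math


def standardized_moment(values, k):
    """k-th central moment divided by the standard deviation to the power k.

    Uses population moments (divides by n), an assumption of this sketch.
    """
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return sum((x - mean) ** k for x in values) / n / std ** k


def skewness(values):
    """3rd standardized central moment."""
    return standardized_moment(values, 3)


def excess_kurtosis(values):
    """4th standardized central moment minus 3 (0 for a normal distribution)."""
    return standardized_moment(values, 4) - 3.0


symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical, evenly spread
right_tailed = [1.0, 1.0, 1.0, 1.0, 10.0]  # hypothetical, longer right tail

print(skewness(symmetric))         # 0.0
print(skewness(right_tailed) > 0)  # True
print(excess_kurtosis(symmetric))  # negative: flatter than normal
```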