IBM Support

Transforming Variable to Normality for Parametric Statistics

Troubleshooting


Problem

I have a numeric variable which I would like to analyze by parametric statistical procedures (t-test, ANOVA ...). However, I find that the variable does not have a normal distribution. What are some of my options for transforming this variable to normality so that I can run parametric tests upon it?

Resolving The Problem

Some transformation options are offered below. Before using any of them, find out which transformations, if any, are commonly used in your field of research, and try those first.

Check the data for extreme outliers. Double-check that these outliers have been coded correctly. Extreme outliers may be the result of incorrect data entry (or computation). If you find outliers that were created by incorrect data entry, correct them. You will then want to re-test the normality assumption before considering transformations.

The primary attribute for deciding upon a transformation is whether the data are positively skewed (skewed to the right, skew > 0) or negatively skewed (skewed to the left, skew < 0).
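
As a rough illustration of the sign convention, the following Python sketch (run outside SPSS) computes a moment-based skewness coefficient on made-up data. The function name sample_skewness and the data values are assumptions for illustration only; statistical packages often apply a small-sample adjustment, so their reported values can differ slightly.

```python
def sample_skewness(values):
    """Moment-based skewness: third central moment / (second central moment ** 1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical data with a long right tail, and its mirror image.
right_tailed = [1, 1, 2, 2, 3, 4, 6, 9, 15, 40]
left_tailed = [-v for v in right_tailed]

print(sample_skewness(right_tailed))  # positive: skewed to the right
print(sample_skewness(left_tailed))   # negative: skewed to the left
```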

Positively skewed data may be subject to a "floor," where values cannot drop lower (for example, nearly everybody scores near 0% correct on a test). Negatively skewed data may be subject to a "ceiling," where values cannot rise higher (for example, nearly everybody scores near 100% correct on a test).

Skewness may also be discerned from the variable's characteristics across groups. If group means are positively correlated with group variances (or standard deviations), the data may be positively skewed. If group means are negatively correlated with group variances, the data may be negatively skewed.
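
The group-level check described above might be sketched as follows in Python (outside SPSS). The three groups, their values, and the helper pearson_r are hypothetical; with real data you would use your own grouping variable and more than three groups.

```python
from statistics import mean, variance

def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical groups in which the spread grows with the level.
groups = {
    "A": [1, 2, 2, 3],
    "B": [4, 6, 9, 13],
    "C": [10, 18, 30, 50],
}

group_means = [mean(g) for g in groups.values()]
group_vars = [variance(g) for g in groups.values()]

r = pearson_r(group_means, group_vars)
print(r)  # a clearly positive value hints at positive skew
```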

The secondary attribute to consider is whether the variable contains negative values or zero. Many transformations cannot be applied to negative or zero values. In these cases, a constant is added to the variable before the transformation is applied, chosen so that every value falls in the range the transformation accepts (for example, adding 1 when the smallest value is 0).
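
One way to pick such a constant is sketched below in Python (outside SPSS). The helper shift_to_positive and its margin parameter are hypothetical names, and the rule "shift so that the smallest value equals 1" is an assumption, not a recommendation from this document.

```python
import math

def shift_to_positive(values, margin=1.0):
    """Add the smallest non-negative constant that makes min(values) >= margin."""
    constant = max(0.0, margin - min(values))
    return [v + constant for v in values], constant

# Hypothetical variable containing negative values and a zero.
values = [-5.0, 0.0, 2.0, 10.0]
shifted, constant = shift_to_positive(values)

# After the shift every value is positive, so the logarithm is defined.
logged = [math.log10(v) for v in shifted]
print(constant, shifted)
```

Note that the choice of constant changes the shape of the transformed variable, so it is worth reporting alongside the transformation.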

Logarithmic transformation - Use if:

1) Data have positive skew.
2) You suspect an exponential component in the data.
3) Data might be best classified by orders-of-magnitude.
4) Cumulative main effects are multiplicative, rather than additive.

This transformation cannot be performed on non-positive data. The base of the logarithm is essentially arbitrary (results differ only by a constant multiplicative factor), though the most common bases are e, 10, and 2.

COMPUTE NEWVAR = LG10(OLDVAR) .
COMPUTE NEWVAR = LG10(OLDVAR+1) .
COMPUTE NEWVAR = LN(OLDVAR) .
COMPUTE NEWVAR = LN(OLDVAR+1) .
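
The claim that the base only rescales the result can be checked with a short Python sketch (outside SPSS). The data are hypothetical values spread over several orders of magnitude.

```python
import math

# Hypothetical data spanning several orders of magnitude.
data = [1, 10, 100, 1000, 10000]

log10_vals = [math.log10(v) for v in data]
ln_vals = [math.log(v) for v in data]

# Skip the first element (log of 1 is 0 in every base) to avoid dividing by zero.
ratios = [ln / lg for ln, lg in zip(ln_vals[1:], log10_vals[1:])]

print(log10_vals)  # evenly spaced: one step per order of magnitude
print(ratios)      # each ratio is approximately ln(10)
```

Because the two versions differ only by the factor ln(10), statistics that are unaffected by rescaling, such as skewness or a t statistic, come out the same for either base.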

Square Root transformation - Use if:

1) Data have positive skew.
2) Data may be counts or frequencies.
3) Data have many zeros or extremely small values.
4) Data may have a physical (power) component, such as area vs. length.

This transformation cannot be performed on negative data.

COMPUTE NEWVAR = SQRT(OLDVAR) .
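
To illustrate the effect on count data that include zeros, here is a minimal Python sketch (outside SPSS). The counts and the helper sample_skewness are hypothetical; the skewness formula is the simple moment-based version, which may differ slightly from what a statistics package reports.

```python
import math

def sample_skewness(values):
    """Moment-based skewness: third central moment / (second central moment ** 1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical counts: many zeros and small values, a few large ones.
counts = [0, 0, 0, 1, 1, 2, 3, 4, 9, 16, 25]
rooted = [math.sqrt(c) for c in counts]  # zeros are allowed, negatives are not

skew_before = sample_skewness(counts)
skew_after = sample_skewness(rooted)
print(skew_before, skew_after)  # the second value is closer to zero
```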

Reciprocal transformation - Use if:

1) Data have positive skew.
2) Data may have been originally derived by division, or represent a ratio.

The variable should not have values close to zero, because their reciprocals become extremely large. This transformation cannot be performed on non-positive values.

COMPUTE NEWVAR = 1 / OLDVAR .
COMPUTE NEWVAR = 1 / (OLDVAR+1) .
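
Two practical properties of the reciprocal are easy to see in a small Python sketch (outside SPSS): it reverses the order of positive values, and it inflates values near zero. The variable names and data are hypothetical.

```python
# Hypothetical ratio-type data, e.g. seconds per item (all strictly positive).
seconds_per_item = [0.5, 1.0, 2.0, 4.0, 8.0]
items_per_second = [1 / v for v in seconds_per_item]
print(items_per_second)  # [2.0, 1.0, 0.5, 0.25, 0.125]

# Values close to zero turn into very large reciprocals.
near_zero = [0.001, 0.01, 0.1]
blown_up = [1 / v for v in near_zero]
print(blown_up)
```

Since the largest original value becomes the smallest transformed value, the direction of any group difference or correlation flips after this transformation.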

Exponential transformation - Use if:

1) Data have negative skew.
2) You suspect an underlying logarithmic trend (decay, attrition, survival ...) in the data.

This transformation can be performed on negative numbers. Depending on the range of values, this transformation is the most powerful in reducing negative skew. The exponential base is not trivial: it can affect the characteristics of the transformed variable.

COMPUTE NEWVAR = EXP(OLDVAR) .
COMPUTE NEWVAR = 2 ** OLDVAR .
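
The following Python sketch (outside SPSS) shows, on a small made-up sample, how the choice of base changes the outcome. The data and the helper sample_skewness are hypothetical, and the pattern seen here will not hold for every data set.

```python
import math

def sample_skewness(values):
    """Moment-based skewness: third central moment / (second central moment ** 1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical negatively skewed data (long tail toward low values).
data = [0, 2, 3, 4, 4]

skew_raw = sample_skewness(data)
skew_base2 = sample_skewness([2 ** v for v in data])
skew_base_e = sample_skewness([math.exp(v) for v in data])

print(skew_raw, skew_base2, skew_base_e)
```

In this example base 2 brings the skewness close to zero, while base e pushes it further into positive territory, which is why it helps to compare bases before settling on one.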

Power transformation - Use if:

1) Data have negative skew.
2) Data may have a physical (power) component, such as area vs. length.

Usually, data are raised to the second power (squared). Other, higher powers are also possible. The choice of power exponent is not trivial. Try to choose a power that reflects an underlying physical reality. This transformation should not be applied to variables with negative values, because even powers do not preserve the order of negative numbers.

COMPUTE NEWVAR = OLDVAR ** 2 .
COMPUTE NEWVAR = OLDVAR ** 3 .
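
A brief Python sketch (outside SPSS) of how the exponent changes the result on a small made-up sample. The data and the helper sample_skewness are hypothetical; which exponent works best depends entirely on the variable at hand.

```python
def sample_skewness(values):
    """Moment-based skewness: third central moment / (second central moment ** 1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical non-negative, negatively skewed data.
data = [0, 2, 3, 4, 4]

skew_raw = sample_skewness(data)
skew_squared = sample_skewness([v ** 2 for v in data])
skew_cubed = sample_skewness([v ** 3 for v in data])

print(skew_raw, skew_squared, skew_cubed)  # skewness rises with the exponent
```

Here squaring removes most of the negative skew and cubing overshoots slightly into positive skew, so a higher power is not automatically better.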

Arcsine transformation - Use if:

1) Data are a proportion ranging between 0.0 - 1.0 or percentage from 0 - 100.
2) Most data points are between 0.2 - 0.8 or between 20 and 80 for percentages.

This transformation yields values whose distribution will be closer to normality. ARSIN returns radians; multiply the result by 180/pi if you prefer degrees.

COMPUTE NEWVAR = ARSIN(OLDVAR) .
*For percentages.
COMPUTE NEWVAR = ARSIN(OLDVAR/100) .
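
The same computation can be sketched in Python (outside SPSS) to show that proportions and percentages lead to identical results once percentages are divided by 100, and how radians relate to degrees. The data are hypothetical. As a side note not taken from this document, many textbooks describe an arcsine square root variant, asin(sqrt(p)); it appears below only as a comment.

```python
import math

proportions = [0.2, 0.5, 0.8]  # hypothetical proportions
percentages = [20, 50, 80]     # the same values expressed as percentages

radians_from_prop = [math.asin(p) for p in proportions]
radians_from_pct = [math.asin(p / 100) for p in percentages]

# Convert radians to degrees if that unit is preferred.
degrees_from_prop = [math.degrees(r) for r in radians_from_prop]

# Alternative often seen in textbooks (an assumption, not this document's method):
# [math.asin(math.sqrt(p)) for p in proportions]

print(radians_from_prop)
print(degrees_from_prop)  # the middle value is approximately 30 degrees
```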


Historical Number

33050

Document Information

Modified date:
16 April 2020

UID

swg21479677