Discriminant analysis

Discriminant analysis builds a predictive model for group membership. The model is composed of a discriminant function (or, for more than two groups, a set of discriminant functions) based on linear combinations of the predictor variables that provide the best discrimination between the groups. The functions are generated from a sample of cases for which group membership is known; the functions can then be applied to new cases that have measurements for the predictor variables but have unknown group membership.

Note: The grouping variable can have more than two values. The codes for the grouping variable must be integers, however, and you need to specify their minimum and maximum values. Cases with values outside of these bounds are excluded from the analysis.
Example
On average, people in temperate zone countries consume more calories per day than people in the tropics, and a greater proportion of the people in the temperate zones are city dwellers. A researcher wants to combine this information into a function and determine how well that function discriminates between the two groups of countries. The researcher thinks that population size and economic information may also be important. Discriminant analysis allows you to estimate coefficients of the linear discriminant function, which looks like the right side of a multiple linear regression equation. That is, using coefficients a, b, c, and d, the function is:

D = a * calories + b * urban + c * population + d * gross domestic product per capita
If these variables are useful for discriminating between the two climate zones, the values of D will differ for the temperate and tropical countries. If you use a stepwise variable selection method, you may find that you do not need to include all four variables in the function.
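The following is a minimal sketch of how coefficients for a two-group linear discriminant function can be estimated, using only two of the predictors from the example (calories and percent urban). The data values, the function names, and the midpoint cutoff (which assumes equal prior probabilities) are illustrative assumptions. The coefficients come from the classical Fisher rule (inverse pooled covariance times the difference in group means) and are not scaled the same way as the canonical coefficients that the procedure reports.

```python
# Hypothetical data: (calories per day, percent urban) for each country.
temperate = [(3300.0, 75.0), (3500.0, 80.0), (3100.0, 70.0), (3400.0, 78.0)]
tropical = [(2300.0, 40.0), (2500.0, 55.0), (2200.0, 35.0), (2600.0, 50.0)]


def mean(rows):
    """Return the mean of each of the two predictors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter(rows, m):
    """Return the 2x2 sums of squares and cross-products about the mean m."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


m1, m2 = mean(temperate), mean(tropical)
s1, s2 = scatter(temperate, m1), scatter(tropical, m2)

# Pooled within-groups covariance matrix (assumes equal group covariances).
dof = len(temperate) + len(tropical) - 2
pooled = [[(s1[i][j] + s2[i][j]) / dof for j in range(2)] for i in range(2)]

# Invert the 2x2 pooled matrix in closed form.
det = pooled[0][0] * pooled[1][1] - pooled[0][1] * pooled[1][0]
inv = [[pooled[1][1] / det, -pooled[0][1] / det],
       [-pooled[1][0] / det, pooled[0][0] / det]]

# Coefficients: inverse pooled covariance times the mean difference.
diff = [m1[0] - m2[0], m1[1] - m2[1]]
a = inv[0][0] * diff[0] + inv[0][1] * diff[1]
b = inv[1][0] * diff[0] + inv[1][1] * diff[1]


def score(row):
    """Discriminant score D = a * calories + b * urban."""
    return a * row[0] + b * row[1]


# Midpoint between the group mean scores (equal priors assumed).
cutoff = (score(m1) + score(m2)) / 2


def classify(row):
    return "temperate" if score(row) > cutoff else "tropical"
```

A new case with known predictor values but unknown group membership can then be passed to `classify`, which mirrors how the estimated function is applied to new cases.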
Statistics
For each variable: means, standard deviations, univariate ANOVA. For each analysis: Box's M, within-groups correlation matrix, within-groups covariance matrix, separate-groups covariance matrix, total covariance matrix. For each canonical discriminant function: eigenvalue, percentage of variance, canonical correlation, Wilks' lambda, chi-square. For each step: prior probabilities, Fisher's function coefficients, unstandardized function coefficients, Wilks' lambda for each canonical function.

Data considerations

Data
The grouping variable must have a limited number of distinct categories, coded as integers. Independent variables that are nominal must be recoded to dummy or contrast variables.
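As a rough illustration of these data requirements, the sketch below recodes a string grouping variable into integer codes and turns a nominal predictor into dummy variables. The helper names, the `region` predictor, and the choice to assign codes in sorted order starting at 1 are assumptions made for this example, not a description of how the product performs the recoding.

```python
def integer_codes(labels):
    """Map each distinct label to an integer code (sorted order, from 1)."""
    mapping = {lab: i for i, lab in enumerate(sorted(set(labels)), start=1)}
    return [mapping[lab] for lab in labels], mapping


def dummy_code(values, reference):
    """Build one 0/1 indicator per level, omitting the reference level."""
    levels = [v for v in sorted(set(values)) if v != reference]
    return {lvl: [1 if v == lvl else 0 for v in values] for lvl in levels}


# Hypothetical cases: a string grouping variable and a nominal predictor.
zone = ["temperate", "tropical", "tropical", "temperate"]
region = ["Europe", "Africa", "Asia", "Asia"]

codes, mapping = integer_codes(zone)
dummies = dummy_code(region, reference="Africa")

# The minimum and maximum codes are the range the procedure asks for.
code_range = (min(codes), max(codes))
```

A nominal predictor with k levels yields k - 1 dummy variables here, because the reference level is represented by all indicators being 0.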
Assumptions
Cases should be independent. Predictor variables should have a multivariate normal distribution, and within-group variance-covariance matrices should be equal across groups. Group membership is assumed to be mutually exclusive (that is, no case belongs to more than one group) and collectively exhaustive (that is, all cases are members of a group). The procedure is most effective when group membership is a truly categorical variable; if group membership is based on values of a continuous variable (for example, high IQ versus low IQ), consider using linear regression to take advantage of the richer information that is offered by the continuous variable itself.

Obtaining a discriminant analysis

This feature requires Statistics Base Edition.

  1. From the menus choose:

    Analyze > Classification > Discriminant analysis

  2. Click Select variable under the Dependent variable section and select an integer-valued grouping variable that specifies the categories of interest. Click OK after selecting the variable.
    Note: If your grouping variable does not have integer values, Transform > Automatic Recode will create a variable that does.
  3. Click the Define range link next to the dependent variable and specify the minimum and maximum value of the grouping variable for the analysis. Cases with values outside of this range are not used in the discriminant analysis but are classified into one of the existing groups based on the results of the analysis. The minimum and maximum values must be integers. Click OK when done.
  4. Click Select variables under the Independent variables section and select numeric independent (or predictor) variables that best predict the value of the dependent variable. Click OK after selecting the variables.
  5. Optionally, you can select the method for entering the independent variables.
    Enter independents together
    Simultaneously enters all independent variables that satisfy tolerance criteria. This is the default setting.
    Use stepwise method
    Uses stepwise analysis to control variable entry and removal. Select the statistic to be used for entering or removing new variables. Available methods include:
    Wilks' lambda
    A variable selection method for stepwise discriminant analysis that chooses variables for entry into the equation on the basis of how much they lower Wilks' lambda. At each step, the variable that minimizes the overall Wilks' lambda is entered.
    Unexplained variance
    At each step, the variable that minimizes the sum of the unexplained variation between groups is entered.
    Mahalanobis distance
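    At each step, the variable that maximizes the Mahalanobis distance between the two closest groups is entered. Mahalanobis distance is a measure of how much a case's values on the independent variables differ from the average of all cases; a large Mahalanobis distance identifies a case as having extreme values on one or more of the independent variables.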
    Smallest F ratio
    A method of variable selection in stepwise analysis based on maximizing an F ratio computed from the Mahalanobis distance between groups.
    Rao's V
    A measure of the differences between group means. Also called the Lawley-Hotelling trace. At each step, the variable that maximizes the increase in Rao's V is entered. After selecting this option, enter the minimum increase in V that a variable must produce to enter the analysis.
  6. Optionally, click Select variable under the Case selection variable section and select a variable that limits the analysis to a subset of cases that include particular values for the selected variable. Click OK after selecting the variable.
  7. Optionally, you can select the following options from the Additional settings menu:
    • Click Classification to specify constants, stepping methods, and missing value settings.
    • Click Statistics to select which statistics to include in the procedure.
    • Click Plots to specify charting settings.
    • Click Options to select the statistics or values to be used for entering or removing new variables.
    • Click Save to dataset to add predicted group memberships, predicted probabilities of group membership, and/or discriminant function scores.
    • Click Model export to export function coefficients, matrix of functions at group centroids and priors to the specified XML file.
    • Click Bootstrap for deriving robust estimates of standard errors and confidence intervals for estimates such as the mean, median, proportion, odds ratio, correlation coefficient or regression coefficient.
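The Wilks' lambda stepwise method in step 5 can be sketched for its first step only, where each candidate variable is evaluated on its own. In that case Wilks' lambda reduces to the within-groups sum of squares divided by the total sum of squares, and the variable with the smallest value is entered. The data and names below are hypothetical; later steps use a multivariate lambda (a ratio of determinants) together with entry and removal criteria and tolerance checks, none of which are shown here.

```python
def wilks_lambda(groups):
    """Univariate Wilks' lambda: within-groups SS divided by total SS."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    total_ss = sum((x - grand_mean) ** 2 for x in all_values)
    within_ss = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        within_ss += sum((x - group_mean) ** 2 for x in g)
    return within_ss / total_ss


# Hypothetical predictors, each split into [temperate, tropical] values.
candidates = {
    "calories": [[3300, 3500, 3100, 3400], [2300, 2500, 2200, 2600]],
    "urban": [[75, 80, 70, 78], [40, 55, 35, 50]],
    "population": [[10, 60, 35, 80], [20, 75, 30, 55]],
}

# Smaller lambda means the groups are better separated on that variable.
lambdas = {name: wilks_lambda(g) for name, g in candidates.items()}
first_entered = min(lambdas, key=lambdas.get)
```

Values near 0 indicate that most of the variation lies between the groups, while values near 1 indicate that the group means barely differ on that variable.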

This procedure pastes DISCRIMINANT command syntax.