Binary logistic regression

Logistic regression is useful when you want to predict the presence or absence of a characteristic or outcome based on values of a set of predictor variables. It is similar to a linear regression model but is suited to models where the dependent variable is dichotomous. Logistic regression coefficients can be used to estimate odds ratios for each of the independent variables in the model. Logistic regression is applicable to a broader range of research situations than discriminant analysis.
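The relationship between a coefficient and its odds ratio can be sketched in a few lines of Python. This is an illustrative sketch only: the coefficient value and the constant below are made up, not output from any real analysis.

```python
import math

def predicted_probability(coefficients, constant, values):
    """Logistic model: p = 1 / (1 + exp(-(constant + sum(B_i * x_i))))."""
    z = constant + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Odds of the event given its probability."""
    return p / (1 - p)

# Hypothetical fitted coefficient for a single dichotomous predictor.
b_smoking = 0.8

# exp(B) is the factor by which the odds multiply when the predictor
# increases by one unit (here: nonsmoker -> smoker).
odds_ratio = math.exp(b_smoking)

p_smoker = predicted_probability([b_smoking], -2.0, [1])
p_nonsmoker = predicted_probability([b_smoking], -2.0, [0])

# The ratio of the two groups' odds recovers exp(B).
print(f"exp(B) = {odds_ratio:.3f}")
print(f"odds ratio from probabilities = {odds(p_smoker) / odds(p_nonsmoker):.3f}")
```

The sketch also shows why the odds ratio, not the probability ratio, is the natural summary: the ratio of odds between the two groups equals exp(B) exactly, regardless of the constant.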

Example
What lifestyle characteristics are risk factors for coronary heart disease (CHD)? Given a sample of patients measured on smoking status, diet, exercise, alcohol use, and CHD status, you could build a model using the four lifestyle variables to predict the presence or absence of CHD in a sample of patients. The model can then be used to derive estimates of the odds ratios for each factor to tell you, for example, how much more likely smokers are to develop CHD than nonsmokers.
Statistics
• For each analysis: total cases, selected cases, valid cases.
• For each categorical variable: parameter coding.
• For each step: variables entered or removed, iteration history, –2 log-likelihood, goodness of fit, Hosmer-Lemeshow goodness-of-fit statistic, model chi-square, improvement chi-square, classification table, correlations between variables, observed groups and predicted probabilities chart, residual chi-square.
• For each variable in the equation: coefficient (B), standard error of B, Wald statistic, estimated odds ratio (exp(B)), confidence interval for exp(B), log-likelihood if the term is removed from the model.
• For each variable not in the equation: score statistic.
• For each case: observed group, predicted probability, predicted group, residual, standardized residual.
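Two of the per-step statistics, the –2 log-likelihood and the classification table, are simple enough to sketch directly. The observed outcomes and predicted probabilities below are made up for illustration; in practice they come from the fitted model.

```python
import math

def neg2_log_likelihood(y, p):
    """-2 * sum of [y*ln(p) + (1-y)*ln(1-p)] over cases.
    Smaller values indicate a better-fitting model."""
    return -2.0 * sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                      for yi, pi in zip(y, p))

def classification_table(y, p, cutoff=0.5):
    """Cross-tabulate observed group against predicted group at a cutoff."""
    table = {(obs, pred): 0 for obs in (0, 1) for pred in (0, 1)}
    for yi, pi in zip(y, p):
        table[(yi, int(pi >= cutoff))] += 1
    return table

# Hypothetical observed outcomes and model-predicted probabilities.
observed = [1, 1, 0, 0, 1, 0]
predicted = [0.9, 0.7, 0.2, 0.4, 0.3, 0.6]

m2ll = neg2_log_likelihood(observed, predicted)
table = classification_table(observed, predicted)
correct = table[(0, 0)] + table[(1, 1)]
print(f"-2LL = {m2ll:.3f}, correctly classified: {correct}/{len(observed)}")
```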
Methods
You can estimate models using block entry of variables or any of the following stepwise methods: forward conditional, forward LR, forward Wald, backward conditional, backward LR, or backward Wald.

Data considerations

Data
The dependent variable should be dichotomous. Independent variables can be interval level or categorical; if categorical, they should be dummy or indicator coded (there is an option in the procedure to recode categorical variables automatically).
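Indicator (dummy) coding expands each categorical predictor into a set of 0/1 variables, with one category held out as the reference. A minimal sketch, assuming a hypothetical three-level predictor and treating the omitted category as the reference:

```python
def indicator_code(values, reference=None):
    """Expand a categorical variable into 0/1 indicator (dummy) variables,
    omitting the reference category (the last category in sort order
    when none is given)."""
    categories = sorted(set(values))
    if reference is None:
        reference = categories[-1]
    kept = [c for c in categories if c != reference]
    return kept, [[int(v == c) for c in kept] for v in values]

# Hypothetical three-level predictor.
diet = ["low_fat", "typical", "high_fat", "typical"]
columns, coded = indicator_code(diet, reference="typical")
print(columns)  # indicator columns for the non-reference categories
print(coded)    # one 0/1 row per case; the reference category is all zeros
```

A k-level variable yields k−1 indicator columns; the reference category is represented by zeros in all of them, so each coefficient compares one category against the reference.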
Assumptions
Logistic regression does not rely on distributional assumptions in the same sense that discriminant analysis does. However, your solution may be more stable if your predictors have a multivariate normal distribution. Additionally, as with other forms of regression, multicollinearity among the predictors can lead to unstable estimates and inflated standard errors. The procedure is most effective when group membership is a truly categorical variable; if group membership is based on values of a continuous variable (for example, "high IQ" versus "low IQ"), you should consider using linear regression to take advantage of the richer information offered by the continuous variable itself.
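One quick screen for multicollinearity is the pairwise Pearson correlation between predictors: a value near ±1 flags near-duplicate predictors that can destabilize the coefficient estimates. The predictor names and values below are hypothetical, chosen so the two columns are nearly proportional.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two predictor columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical predictors that carry nearly the same information.
exercise_hours = [0, 1, 2, 3, 4, 5]
calories_burned = [110, 300, 520, 690, 910, 1100]

r = pearson_r(exercise_hours, calories_burned)
print(f"r = {r:.3f}")  # a |r| this close to 1 suggests dropping one predictor
```

Pairwise correlations catch only two-variable redundancy; collinearity among three or more predictors needs a fuller diagnostic, but this check is a cheap first pass.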
Related procedures
Use the Scatterplot procedure to screen your data for multicollinearity. If assumptions of multivariate normality and equal variance-covariance matrices are met, you may be able to get a quicker solution using the Discriminant analysis procedure. If all of your predictor variables are categorical, you can also use the Loglinear procedure. If your dependent variable is continuous, use the Linear regression procedure. You can use the ROC curve procedure to plot probabilities saved with the Logistic regression procedure.

Obtaining a binary logistic regression analysis

This feature requires Custom Tables and Advanced Statistics.

  1. From the menus choose:

    Analyze > Association and prediction > Binary logistic regression

  2. Click Select variable under the Dependent variable section and select a single, dichotomous dependent variable. The variable can be numeric or string. Click OK after selecting the variable.
  3. Click Select variables under the Independent variables section and select one or more continuous and/or categorical variables that may have an influence on the dependent variable. Click OK after selecting the variables.
  4. Optionally, click Select variables under the Factors section and select one or more variables that represent potential causes for variation in the dependent variable. Click OK after selecting the variables.
  5. Optionally, click Case selection variable and select a single variable that limits the analysis to a subset of cases having a particular value for the selected variable. Click the Define selection rule link next to the variable to define the selection rule. For more information, see Binary logistic regression: Define selection rule. Click OK after selecting the variable.
  6. Optionally, you can select the following options from the Additional settings menu:
    • Click Model to specify the effects to be analyzed.
    • Click Contrasts to test for differences among the factor variables.
    • Click Statistics to select which statistics to include in the procedure.
    • Click Plots to display a classification plot of the actual/predicted values of the dichotomous dependent variable.
    • Click Options to specify constant, stepwise probability, classification, iteration, memory, and missing value settings.
    • Click Save to dataset to add values predicted by the model, residuals, and related measures to the dataset as new variables.
    • Click Model export to export parameter estimates and their covariance to a specified XML file. The model file information can be applied to other data files for scoring purposes.
    • Click Bootstrap to derive robust estimates of standard errors and confidence intervals for statistics such as the mean, median, proportion, odds ratio, correlation coefficient, or regression coefficient.
  7. Click Run analysis.

This procedure pastes LOGISTIC REGRESSION command syntax.