IBM Support

Classification table in logistic regression

Troubleshooting


Problem

I have used the SPSS Logistic Regression procedure to test a model and found that the model chi-square had a very low significance level. However, the classification table shows that all of the cases were predicted to have values of 0. This seems to suggest that the model was not effective at all. The ratio of cases with 0 on the dependent variable (DV) to cases with 1 was about 10 to 1. Does this relatively low frequency of 1s on the DV affect the predictive capability of the model?

Resolving The Problem

In Logistic Regression, the classification of a case is based on the predicted probability that the case will be an event (the higher value on the dependent variable (DV)), as calculated with the current model equation. By default, a case is predicted to be in the event class (say, the 1 in a DV coded as 0 and 1) if its predicted probability is at least .5. If the event is rare in the sample, then the predicted probability may be less than .5 for all cases. If the predicted probability of the event ranged from .01 to .49 for the cases that truly did have the event, then all of these cases would still be predicted to be nonevents. A model can therefore fit significantly better than the intercept-only model and still classify every case as a nonevent at the default cutoff.

You can adjust the cutoff value of the predicted probability to be used in classification. If you set the cutoff to .2, for example, then cases would be classified as an event if the predicted probability equaled or exceeded .2. In the example where predicted probability ranged from .01 to .49 for true event cases, this shift would result in some of those true events (those with predicted probability between .2 and .49) being correctly classified as events. The tradeoff is that some of the true nonevents would then be incorrectly classified as events. The overall correct prediction rate may not improve, but the probability of detecting a true event would improve. Where you set the cutoff will depend on the relative importance of the probability of detecting true event cases (sensitivity) and the probability of misclassifying nonevents as events (false positive rate).

See Technote #1479847 ("C Statistic and SPSS Logistic Regression") for instructions on saving the predicted probabilities in Logistic Regression and using them in the ROC (Receiver Operating Characteristic) procedure. That technote focuses on the 'Area under the Curve' statistic provided by the ROC procedure, but the ROC graph and the 'Coordinates' table can also be helpful in choosing an optimal cutoff.
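The effect of the cutoff on the classification table can be illustrated outside SPSS with a minimal Python sketch. This is not SPSS code. The data values and the function names classification_table and rates are hypothetical and chosen only for illustration. The sketch assumes, as described above, that a case is classified as an event when its predicted probability equals or exceeds the cutoff.

```python
# Hypothetical data for illustration only: observed outcomes (1 = event)
# and predicted probabilities of the kind saved by /SAVE = PRED.
observed = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [0.45, 0.30, 0.10, 0.25, 0.22, 0.08, 0.05, 0.03, 0.02, 0.01]


def classification_table(observed, predicted, cutoff):
    """Count the four cells of a 2x2 classification table.

    A case is classified as an event when its predicted probability
    is greater than or equal to the cutoff.
    """
    table = {"tp": 0, "fn": 0, "fp": 0, "tn": 0}
    for y, p in zip(observed, predicted):
        classified_event = p >= cutoff
        if y == 1:
            table["tp" if classified_event else "fn"] += 1
        else:
            table["fp" if classified_event else "tn"] += 1
    return table


def rates(table):
    """Return (sensitivity, false positive rate, overall proportion correct)."""
    events = table["tp"] + table["fn"]
    nonevents = table["fp"] + table["tn"]
    sensitivity = table["tp"] / events
    false_positive_rate = table["fp"] / nonevents
    overall = (table["tp"] + table["tn"]) / (events + nonevents)
    return sensitivity, false_positive_rate, overall


# Compare the default cutoff of .5 with a lowered cutoff of .2.
for cutoff in (0.5, 0.2):
    table = classification_table(observed, predicted, cutoff)
    sens, fpr, overall = rates(table)
    print(f"cutoff={cutoff}: {table} "
          f"sensitivity={sens:.2f} FPR={fpr:.2f} overall={overall:.2f}")
```

With these made-up values, a cutoff of .5 classifies no case as an event. A cutoff of .2 detects two of the three true events at the cost of two false positives, and the overall proportion correct stays at .70 in both cases.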

If you are running Logistic Regression from the menu system, then the classification cutoff is adjusted in the Options dialog for that procedure. Click the Options button in the main Logistic Regression dialog. You will find the "Classification cutoff" box in the lower right quadrant of the Options dialog box. Change the value there from .5 to the cutoff that you prefer.

If you are running Logistic Regression from a syntax command, then you can adjust the cutoff by adding the "CUT()" keyword to the /CRITERIA subcommand, with the desired cutoff value in the parentheses. For example, the following command requests a classification cutoff of .2.

LOGISTIC REGRESSION y
  /METHOD = ENTER x1 x2 x3
  /SAVE = PRED
  /CRITERIA = PIN(.05) POUT(.10) ITERATE(20) CUT(.2) .


Historical Number

41021

Document Information

Modified date:
16 April 2020

UID

swg21479726