ROC Curve

Figure 1. ROC curve

The ROC curve displays the sensitivity and specificity for every possible cutoff in a single plot, which is more compact and easier to read than a series of tables, one per cutoff. The chart shown here displays two curves, one for the category No and one for the category Yes. Because there are only two categories, the curves are symmetric about a 45-degree line (not displayed) running from the upper-left corner of the chart to the lower-right corner.
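The computation behind such a plot can be sketched in Python. This is a minimal illustration, not the SPSS implementation. It assumes that a case is assigned to a category when its pseudo-probability is at or above the cutoff and that the two pseudo-probabilities sum to 1. The function names (`roc_points`, `reflect`) and the toy data are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (1 - specificity, sensitivity)


def roc_points(scores: List[float], labels: List[int]) -> List[Point]:
    """ROC points for every distinct cutoff on the pseudo-probability.

    Assumption: a case is predicted to be in the category when score >= cutoff.
    labels: 1 if the case is observed in the category, else 0.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Start above the largest score so the curve begins at (0, 0).
    cutoffs = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for c in cutoffs:
        tp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def reflect(points: List[Point]) -> List[Point]:
    """Mirror across the line from (0, 1) to (1, 0): (x, y) -> (1 - y, 1 - x)."""
    return [(1 - y, 1 - x) for x, y in points]


# Hypothetical toy data: pseudo-probability of "Yes" and observed Yes (1) / No (0).
p_yes = [0.9, 0.8, 0.4, 0.3, 0.2]
is_yes = [1, 0, 1, 0, 0]
yes_curve = roc_points(p_yes, is_yes)
# Assumption: with two categories, the pseudo-probability of "No" is 1 - p_yes.
no_curve = roc_points([1 - p for p in p_yes], [1 - y for y in is_yes])
```

Sorting `reflect(yes_curve)` and `no_curve` gives the same set of points (up to floating-point rounding), which is the symmetry about the upper-left to lower-right diagonal described above.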

Note that this chart is based on the combined training and testing samples. To produce an ROC chart for the holdout sample, split the file on the partition variable and run the ROC Curve procedure on the saved predicted pseudo-probabilities.
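Outside SPSS, the same idea (evaluate each partition separately instead of the pooled cases) can be sketched in Python. The row layout, the partition labels, and the function names are hypothetical; `auc` uses the pairwise-probability reading of the area under the curve, counting ties as one half, which is an assumed convention.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical row layout: (partition, pseudo-probability of "Yes", observed Yes=1 / No=0).
Row = Tuple[str, float, int]


def auc(scores: List[float], labels: List[int]) -> float:
    """P(random Yes case outscores random No case); ties count 1/2.

    Each subset must contain at least one Yes and one No case.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_partition(rows: List[Row]) -> Dict[str, float]:
    """Split the cases on the partition value, then evaluate each subset separately."""
    groups: Dict[str, Tuple[List[float], List[int]]] = defaultdict(lambda: ([], []))
    for part, score, label in rows:
        groups[part][0].append(score)
        groups[part][1].append(label)
    return {part: auc(scores, labels) for part, (scores, labels) in groups.items()}


rows = [("Training", 0.9, 1), ("Training", 0.2, 0),
        ("Holdout", 0.7, 1), ("Holdout", 0.8, 0), ("Holdout", 0.3, 0)]
print(auc_by_partition(rows))  # {'Training': 1.0, 'Holdout': 0.5}
```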

Figure 2. Area under the curve

The area under the curve is a numerical summary of the ROC curve. For each category, the value in the table is the probability that, given a randomly chosen case in that category and a randomly chosen case not in that category, the predicted pseudo-probability of being in that category is higher for the first case than for the second. For example, for a randomly selected defaulter and a randomly selected non-defaulter, there is a 0.853 probability that the model-predicted pseudo-probability of default is higher for the defaulter than for the non-defaulter.
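The probability interpretation can be checked directly: counting, over all (defaulter, non-defaulter) pairs, how often the defaulter has the higher pseudo-probability gives the same number as the trapezoidal area under the ROC points. The following is a minimal Python sketch on hypothetical toy data; counting ties as one half is an assumed convention, and the function names are illustrative only.

```python
from typing import List


def auc_pairwise(scores: List[float], labels: List[int]) -> float:
    """Share of (in-category, not-in-category) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_trapezoid(scores: List[float], labels: List[int]) -> float:
    """Area under the ROC points swept over all distinct cutoffs (trapezoidal rule)."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # (1 - specificity, sensitivity)
    for c in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 0)
        points.append((fp / neg, tp / pos))
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical toy data: pseudo-probability of default, observed default (1) or not (0).
p_default = [0.9, 0.8, 0.4, 0.3, 0.2]
defaulted = [1, 0, 1, 0, 0]
print(auc_pairwise(p_default, defaulted), auc_trapezoid(p_default, defaulted))
```

For the toy data both functions evaluate to about 0.833 (5 of the 6 defaulter/non-defaulter pairs are ranked correctly).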

The area under the curve is a useful single-statistic summary of the network's accuracy, but to classify customers you still need to choose a specific criterion (cutoff). The predicted-by-observed chart provides a visual start on this process.
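As one illustration of turning the curve into a single criterion, the sketch below classifies cases at a cutoff and picks the cutoff that maximizes Youden's J (sensitivity + specificity - 1). Youden's J is a common heuristic swapped in here for illustration; it is not necessarily how this tutorial chooses the criterion, and in practice the choice usually depends on the relative costs of the two kinds of misclassification. The data and function names are hypothetical.

```python
from typing import List, Tuple


def sens_spec(scores: List[float], labels: List[int], cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when cases with score >= cutoff are classified positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / pos, tn / neg


def youden_cutoff(scores: List[float], labels: List[int]) -> float:
    """Among observed scores, pick the cutoff maximizing sensitivity + specificity - 1."""
    return max(sorted(set(scores)), key=lambda c: sum(sens_spec(scores, labels, c)) - 1)


# Hypothetical toy data: pseudo-probability of default, observed default (1) or not (0).
p_default = [0.9, 0.8, 0.4, 0.3, 0.2]
defaulted = [1, 0, 1, 0, 0]
best = youden_cutoff(p_default, defaulted)
print(best, sens_spec(p_default, defaulted, best))
```

For the toy data this picks a cutoff of 0.4, which catches both defaulters (sensitivity 1.0) at the cost of misclassifying one of the three non-defaulters.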
