Naive Bayes

The Naive Bayes model is a long-established method for classification and predictor selection that is enjoying a renaissance because of its simplicity and stability.

The problems to which the Naive Bayes model is generally applied fall into two broad categories: feature selection and classification.

Feature selection. These are applications in which you choose a subset of predictors from a larger set of variables. Most classification methods do not perform well when there are too many predictors. Because, in practice, many predictors contribute little to the classification, a useful preclassification step is to identify the subset of predictors that are actually relevant.
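The documentation does not specify the relevance criterion the component uses; as one illustrative sketch, predictors can be ranked by their mutual information with the target, keeping only the highest-scoring ones. The function name and the toy data below are hypothetical.

```python
from collections import Counter
from math import log2

def feature_relevance(X, y):
    """Score each categorical predictor by its mutual information with the
    class label -- one simple relevance measure; the component's actual
    selection criterion may differ."""
    n = len(y)
    class_counts = Counter(y)
    scores = []
    for j in range(len(X[0])):
        joint = Counter((row[j], label) for row, label in zip(X, y))
        feat_counts = Counter(row[j] for row in X)
        mi = 0.0
        for (xv, yv), c in joint.items():
            p_xy = c / n
            # Accumulate p(x,y) * log2( p(x,y) / (p(x) p(y)) )
            mi += p_xy * log2(p_xy / ((feat_counts[xv] / n) * (class_counts[yv] / n)))
        scores.append(mi)
    return scores

# Toy data: predictor 0 tracks the label exactly; predictor 1 is noise.
X = [[1, 0], [1, 1], [0, 0], [0, 1]]
y = ["a", "a", "b", "b"]
scores = feature_relevance(X, y)
```

On this toy data the informative predictor receives the maximum possible score (1 bit) and the noise predictor scores zero, so a threshold or top-k cutoff would retain only predictor 0.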

Classification. These are applications in which you use known values of one or more independent variables (or predictors) to predict the value of a categorical target variable.
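A Naive Bayes classifier picks the class that maximizes the prior probability of the class times the product of the conditional probabilities of each predictor value given that class, treating the predictors as independent. The sketch below is a minimal illustration with add-one smoothing, not the component's actual implementation; all names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from math import log

def train(X, y):
    """Collect class priors and per-(predictor, class) value counts."""
    priors = Counter(y)
    cond = defaultdict(Counter)  # (predictor index, class) -> value counts
    for row, label in zip(X, y):
        for j, v in enumerate(row):
            cond[(j, label)][v] += 1
    return priors, cond, len(y)

def classify(row, priors, cond, n):
    """Return the class maximizing log P(class) + sum_j log P(x_j | class),
    with add-one smoothing so unseen values never zero out a class."""
    best, best_lp = None, float("-inf")
    for label, count in priors.items():
        lp = log(count / n)
        for j, v in enumerate(row):
            c = cond[(j, label)]
            lp += log((c[v] + 1) / (sum(c.values()) + len(c) + 1))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Toy data: predictor 0 separates the classes; predictor 1 is noise.
X = [[1, 1], [1, 0], [0, 0], [0, 1]]
y = ["spam", "spam", "ham", "ham"]
priors, cond, n = train(X, y)
```

Working in log probabilities avoids numeric underflow when many predictors are multiplied together, which is the standard practical trick for this model.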

The Naive Bayes component is an excellent tool for either of these types of problems, but it is most useful in applications that require feature selection followed by classification—for example:

  • Creators of anti-spam software need to classify incoming e-mail as spam or legitimate based on the content of the message. Determining which words are most useful in classifying spam is an important first step in building the final model.
  • Direct marketers want to maximize the probability of response to mailings. In order to create a model to classify households as likely or unlikely respondents, they need to discover the fields in the mailing database that are most useful to the classification.
  • Investigators want to detect fraudulent transactions based upon features in an insurance claims database. Before classifying transactions as fraudulent, they need to determine which features are most important to the classification.