IBM Support

Poisson regression models in SPSS

Troubleshooting


Problem

Can SPSS estimate Poisson regression models?

Resolving The Problem

The GENLIN procedure, available beginning with Release 15 of SPSS, provides a more straightforward way to handle Poisson regression models than the GENLOG approach described below, and should generally be used instead of GENLOG when it is available to you. An example is provided in the Case Studies in the SPSS Help.

The easiest way to handle Poisson regression models in earlier releases of SPSS is to use the GENLOG procedure, which does general loglinear and logit modeling. The simplest type of Poisson model for our purposes is one in which the counts are modeled without denominators (i.e., we are modeling counts rather than rates), and all predictors are categorical. Rate models and quantitative predictors introduce further complexities into the process, but they can still be handled.

If the simplest form of the model is being used, the variables in the data set consist simply of the predictors and a count variable. If rates are to be modeled, an additional variable representing the denominators is added. If rates are modeled and/or quantitative predictors are to be used, a subject ID variable is also added, with a unique value for each case. The file preparation is completed by weighting the data by the count variable (in the dialog boxes, click on Data>Weight Cases, then click on "Weight cases by" and move the appropriate variable into the Frequency Variable box). The status bar at the bottom of the screen should have "Weight On" in one of the boxes.
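
As a rough illustration of what weighting by the count variable means, the following Python sketch (not SPSS syntax) sums the weight within each combination of factor levels, so that each row stands for as many observations as its count. The variable names region, shift and count, and the data values, are hypothetical.

```python
from collections import defaultdict

# Hypothetical layout: one row per record, with two categorical predictors
# and a count variable that acts as the case weight.
rows = [
    {"region": "north", "shift": "day", "count": 4},
    {"region": "north", "shift": "night", "count": 7},
    {"region": "south", "shift": "day", "count": 2},
    {"region": "south", "shift": "day", "count": 3},
]

def weighted_cell_counts(rows, factors, weight):
    """Sum the weight variable within each combination of factor levels."""
    totals = defaultdict(float)
    for row in rows:
        cell = tuple(row[f] for f in factors)
        totals[cell] += row[weight]
    return dict(totals)

# Each cell's observed count is the total weight of the rows that fall in it.
cell_counts = weighted_cell_counts(rows, ["region", "shift"], "count")
```

Here the two south/day rows contribute a combined count of 5 to a single cell.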

The Poisson option in the GENLOG procedure is the default for Statistics>Loglinear>General. If there are only categorical predictors and no denominator variable, the categorical predictors are selected as factors. The count variable is not used (the data are already weighted by this variable). The model is specified by clicking on the Model button, checking Custom and specifying the desired predictive model (or leaving it as full factorial, if that's the desired model). Continue out of the Model dialog and make any desired specifications in the Options or Save dialogs (printing of parameter estimates is specified in the Options dialog). When finished with these, click OK, or Paste if you want to see the GENLOG command that's been created.

As noted above, if quantitative predictors and/or a denominator variable are used, the situation will be more complicated. The term quantitative predictor here is used to denote a predictor to be treated as is for modeling purposes. That is, you want to use that variable's values directly in the design matrix and produce a single parameter estimate, rather than creating a set of indicator variables to represent the unique levels or categories. Though technically not entirely accurate, the term continuous is commonly used for such a variable.
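
The difference between the two treatments can be sketched in Python (this is an illustration only, not something SPSS runs). The helper names as_is_column and indicator_columns, the variable dose, and the choice of the highest level as the reference category are assumptions made for the example.

```python
def as_is_column(values):
    """Quantitative treatment: a single column holding the values unchanged."""
    return [[float(v)] for v in values]

def indicator_columns(values, reference=None):
    """Categorical treatment: one 0/1 column per level except the reference."""
    levels = sorted(set(values))
    if reference is None:
        reference = levels[-1]  # assumption: highest level is the reference
    kept = [lv for lv in levels if lv != reference]
    columns = [[1.0 if v == lv else 0.0 for lv in kept] for v in values]
    return kept, columns

dose = [1, 2, 3, 2]  # hypothetical predictor with three distinct values

# One column, hence one parameter estimate for dose.
quantitative = as_is_column(dose)

# Two indicator columns, hence two parameter estimates for dose.
kept_levels, categorical = indicator_columns(dose)
```

The first form yields one design-matrix column for dose; the second yields one column for each non-reference level.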

The complications arise because GENLOG analyzes data on a cell by cell basis, with cells defined by the combinations of the factor variables. Cell covariates are handled by averaging all values in the same cell defined by the factors specified in the main dialog (or, in syntax, on the GENLOG command). Thus, information about individual case values will be lost unless each case is treated as a cell. This is why we have to use the subject ID variable when there are cell covariates: to trick GENLOG into seeing each case as a cell. Note that if you have any quantitative/continuous variables, you must treat all predictors that way (as cell covariates). If you use any factors other than the subject ID variable, the table that GENLOG constructs internally will have as many cells as the product of the number of subjects and the number of levels of those factors, and you will not get the desired results. Instead, you need to create dummy or effect coded variables to use as cell covariates to represent contrasts among the levels of the factor(s), just as you would when running the SPSS REGRESSION procedure with categorical predictors.
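
The effect of cell averaging can be seen in a small Python sketch. It only mimics the averaging described above and is not GENLOG's implementation; the variables id, group and age, and the data values, are hypothetical.

```python
from collections import defaultdict

# Hypothetical cases: a subject ID, a categorical variable, and a covariate.
cases = [
    {"id": 1, "group": "a", "age": 20.0},
    {"id": 2, "group": "a", "age": 40.0},
    {"id": 3, "group": "b", "age": 35.0},
]

def cell_means(cases, factor, covariate):
    """Average a covariate within the cells defined by a single factor."""
    sums = defaultdict(float)
    sizes = defaultdict(int)
    for case in cases:
        sums[case[factor]] += case[covariate]
        sizes[case[factor]] += 1
    return {cell: sums[cell] / sizes[cell] for cell in sums}

# With group as the factor, the two cases in group "a" collapse to one mean.
by_group = cell_means(cases, "group", "age")

# With the subject ID as the factor, every case keeps its own value.
by_id = cell_means(cases, "id", "age")

# Adding group as a second factor next to the ID multiplies the cell count:
# number of subjects times number of group levels.
n_cells = len({c["id"] for c in cases}) * len({c["group"] for c in cases})
```

With group as the factor, the ages 20 and 40 are replaced by their mean of 30. With the ID as the only factor, all three values survive. Using both as factors would define 3 x 2 = 6 cells for only 3 cases.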

The inclusion of a denominator variable (which may be referred to elsewhere variously as an exposure variable or as an offset), with or without quantitative predictors, necessitates the use of the ID variable trick in order to force GENLOG to fit the desired rate model to the separate cases in the data file. When there is no denominator variable and there are no quantitative predictors, the loglinear model fitted to the aggregated data produces the same estimates as the model fitted with each case defined as a separate cell (though the goodness of fit statistics will not be the same). When a denominator variable is used, this is no longer true.
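
In standard Poisson regression notation, the rate model can be written as follows. The symbols are assumptions introduced here rather than taken from the SPSS documentation: y_i is the count for case i, t_i its denominator, x_i its vector of predictors, and beta the parameter vector.

```latex
\log\!\left(\frac{\mu_i}{t_i}\right) = \mathbf{x}_i^{\top}\boldsymbol{\beta}
\quad\Longleftrightarrow\quad
\log \mu_i = \log t_i + \mathbf{x}_i^{\top}\boldsymbol{\beta},
\qquad y_i \sim \mathrm{Poisson}(\mu_i)
```

The term log t_i enters with a fixed coefficient of 1, which is why the denominator is often described as an offset on the log scale.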

To fit a model with quantitative predictors and/or a denominator variable, proceed to the main General Loglinear dialog box. Specify your predictors as cell covariates and the subject ID variable as the only factor. If there is a denominator variable, specify it as the cell structure variable. Then go into the Model dialog box and specify a Custom model that includes the desired predictors and any interactions, but not the subject ID. After requesting any other specifications in other dialogs, click OK or Paste. Since you have a subject ID variable specified as a factor that is not being used in the model, a popup warning will tell you about this. Simply click OK.
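
To show what the resulting rate model estimates, here is a minimal Python sketch that fits a Poisson rate model with one predictor by Newton-Raphson, treating every case as its own cell. It is not GENLOG's algorithm or SPSS syntax; the function name fit_poisson_rate and the data are hypothetical.

```python
import math

def fit_poisson_rate(y, t, x, iterations=25):
    """Newton-Raphson for log(mu_i) = log(t_i) + b0 + b1 * x_i."""
    b0 = math.log(sum(y) / sum(t))  # start at the overall log rate
    b1 = 0.0
    for _ in range(iterations):
        mu = [ti * math.exp(b0 + b1 * xi) for ti, xi in zip(t, x)]
        # Score vector (gradient of the Poisson log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Entries of the 2x2 information matrix.
        i00 = sum(mu)
        i01 = sum(mi * xi for mi, xi in zip(mu, x))
        i11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = i00 * i11 - i01 * i01
        # Solve the 2x2 linear system for the Newton step.
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-12:
            break
    return b0, b1

# Hypothetical data: one row per case, so each case is its own cell.
counts = [2, 3, 6, 4]
denominators = [10.0, 10.0, 10.0, 10.0]
exposed = [0, 0, 1, 1]  # a dummy-coded predictor used as a covariate

b0, b1 = fit_poisson_rate(counts, denominators, exposed)
```

For a single dummy-coded predictor, the fitted values reproduce the observed rates: exp(b0) is the rate in the unexposed group (5 events over 20 units) and exp(b1) is the ratio of the exposed rate to the unexposed rate.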

If you are working with a release prior to SPSS 12 and there are any 0 counts that produce 0 marginals (as any 0 count will when using the subject ID trick), the 0 marginals will cause the procedure to refuse to run. Recode these counts to a small positive number (e.g., 1E-12); the ML estimates will still be accurate to as many decimals as are printed in the output. In releases beginning with SPSS 12, this recoding is not necessary.
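
The recoding itself is simple. As a sketch in Python (in SPSS it would be done with a data transformation instead), with the hypothetical helper name recode_zero_counts:

```python
def recode_zero_counts(counts, replacement=1e-12):
    """Replace exact zero counts with a small positive number."""
    return [replacement if c == 0 else c for c in counts]

original = [0, 3, 0, 1]  # hypothetical counts, one per case
recoded = recode_zero_counts(original)

# The total count changes only by the number of zeros times the replacement.
shift = sum(recoded) - sum(original)
```

Nonzero counts are left unchanged, and the total shifts by only about 2E-12 in this example.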


Historical Number

16655

Document Information

Modified date:
16 April 2020

UID

swg21476126