IBM Support

How do I enter count data for one-sample chi-square tests or chi-square tests of independence for two-way tables?

Question & Answer


Question

I have data values in the form of frequencies or counts and would like to test some hypotheses about the distributions of counts across the levels of a single variable, and/or perform chi-square tests of independence in two-way cross-classifications of variables. What's the easiest way to enter this type of data to perform such tests in SPSS?

Answer

For a single categorical variable with k categories, you need to enter only k cases and two variables: one to denote the category, and another to hold the number of observations in that category. The WEIGHT command then tells SPSS to treat each case as representing that many observations. Using command syntax, you can enter a four-level categorical variable with its counts and run a one-sample chi-square test of the null hypothesis that the counts are distributed equally across the categories in the population as follows:

DATA LIST LIST / a count.
BEGIN DATA.
1 12
2 24
3 36
4 48
END DATA.
WEIGHT BY count.
NPAR TESTS
  /CHISQUARE=a.

You can also test against specific unequal proportions in each level. This can be done using command syntax or via the SPSS menus (Analyze>Nonparametric Tests>Chi-Square).
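As a minimal sketch of the syntax route, the /EXPECTED subcommand of NPAR TESTS takes one value per category, in ascending order of the category codes, and treats the values as relative proportions. The 1:2:3:4 ratio below is an assumption chosen only for illustration; the data are the same four counts used above.

```spss
* Same four-level variable and counts as in the equal-proportions example.
DATA LIST LIST / a count.
BEGIN DATA.
1 12
2 24
3 36
4 48
END DATA.
WEIGHT BY count.
* The hypothesized 1:2:3:4 ratio is illustrative only.
NPAR TESTS
  /CHISQUARE=a
  /EXPECTED=1 2 3 4.
```

Because these observed counts happen to follow a 1:2:3:4 ratio exactly, the expected counts should equal the observed counts (12, 24, 36, 48) and the chi-square statistic should be 0. Substitute your own hypothesized proportions to obtain a meaningful test.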

For a two-way table, you simply add a second categorical variable and enter one case per cell of the table, again using the WEIGHT command to instruct SPSS to treat each case as representing as many observations as the value of the weighting variable:

DATA LIST LIST / a b count.
BEGIN DATA.
1 1 10
1 2 20
1 3 30
2 1 20
2 2 20
2 3 20
3 1 30
3 2 20
3 3 10
END DATA.
WEIGHT BY count.
CROSSTABS a BY b
  /STATISTIC=CHISQ.

In the SPSS menus, Analyze>Descriptive Statistics>Crosstabs lets you specify this analysis. You can also enter data directly into the SPSS Data Editor using counts and use Data>Weight Cases to apply the weighting variable. When data are weighted, the status bar in the Data Editor displays "Weight On" in the lower right area.
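If you are unsure whether weighting is currently in effect, you can also check and change it from command syntax. The following sketch assumes a weighting variable named count, as in the examples above:

```spss
* Report the current weighting variable, if any.
SHOW WEIGHT.
* Turn weighting off, so that each row counts as a single case again.
WEIGHT OFF.
* Turn weighting back on before running the chi-square tests.
WEIGHT BY count.
```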

The same principle of entering aggregated or count data, with one case (row) per combination of category values plus a weighting variable, extends beyond two-way tables to higher-way tables that you would analyze with loglinear or logit models.
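As a rough sketch of that extension, the following enters a 2 x 2 x 2 table with one row per cell and fits a loglinear model of mutual independence with GENLOG. The variable names and all eight counts are made up for illustration, and the /DESIGN shown is just one possible model:

```spss
* Hypothetical 2 x 2 x 2 table with counts invented for illustration.
DATA LIST LIST / a b c count.
BEGIN DATA.
1 1 1 15
1 1 2 25
1 2 1 20
1 2 2 30
2 1 1 25
2 1 2 15
2 2 1 30
2 2 2 20
END DATA.
WEIGHT BY count.
* Main effects only, that is, mutual independence among a, b, and c.
GENLOG a b c
  /MODEL=POISSON
  /PRINT=FREQ RESID ESTIM
  /DESIGN=a b c.
```

Adding interaction terms such as a*b to the /DESIGN subcommand would relax the independence assumption for the corresponding pair of variables.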


Historical Number

70684

Document Information

Modified date:
16 April 2020

UID

swg21478320