Case summaries: Statistics
The Statistics dialog provides options for selecting which statistics to include in the current procedure. You can choose one or more subgroup statistics for the variables within each category of each grouping variable. Summary statistics are displayed for each variable across all categories.
- Summary statistics
- The following case summary statistics are available.
- Number of cases (N)
- The number of cases (observations or records).
- First
- Displays the first data value encountered in the data file.
- Last
- Displays the last data value encountered in the data file.
- Sum
- The sum or total of the values, across all cases with nonmissing values.
- Percent of total N
- Percentage of the total number of cases in each category.
- Percent of total sum
- Percentage of the total sum in each category.
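The summary statistics above can be illustrated with a minimal Python sketch. The category names and values are hypothetical, and the code assumes that missing values have already been removed; it is not how the product computes these figures internally.

```python
# Hypothetical data: values of one variable, keyed by grouping category,
# listed in the order they appear in the data file.
groups = {"A": [10.0, 20.0, 30.0], "B": [40.0]}

# Totals across all categories, used for the two percentage statistics.
total_n = sum(len(values) for values in groups.values())
total_sum = sum(sum(values) for values in groups.values())


def summarize(values):
    """Return the case summary statistics for one category."""
    return {
        "n": len(values),
        "first": values[0],
        "last": values[-1],
        "sum": sum(values),
        "pct_total_n": 100.0 * len(values) / total_n,
        "pct_total_sum": 100.0 * sum(values) / total_sum,
    }


summary = {name: summarize(values) for name, values in groups.items()}
```

In this example, category A holds three of the four cases and 60 of the total sum of 100.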
- Central tendency
- The following statistics that describe the central location of the distribution are available.
- Mean (arithmetic)
- A measure of central tendency. The arithmetic average: the sum of the values divided by the number of cases.
- Geometric mean
- The nth root of the product of the data values, where n represents the number of cases.
- Grouped median
- Median that is calculated for data that is coded into groups. For example, with age data, if each value in the 30s is coded 35, each value in the 40s is coded 45, and so on, the grouped median is the median calculated from the coded data.
- Harmonic mean
- Used to estimate an average group size when the sample sizes in the groups are not equal. The harmonic mean is the total number of samples divided by the sum of the reciprocals of the sample sizes.
- Median
- The value above and below which half of the cases fall, the 50th percentile. If there is an even number of cases, the median is the average of the two middle cases when they are sorted in ascending or descending order. The median is a measure of central tendency not sensitive to outlying values (unlike the mean, which can be affected by a few extremely high or low values).
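As a rough illustration of the central tendency statistics above, the following sketch uses Python's standard `statistics` module on hypothetical values. The grouped median here uses the module's own interpolation formula with an assumed class width of 10, which may not match the product's calculation exactly.

```python
import statistics

# Hypothetical values for one variable within one category.
values = [1.0, 2.0, 4.0, 8.0]

mean = statistics.mean(values)                      # sum / number of cases
geometric_mean = statistics.geometric_mean(values)  # nth root of the product
harmonic_mean = statistics.harmonic_mean(values)    # n / sum of reciprocals
median = statistics.median(values)                  # average of the two middle cases

# Hypothetical ages coded to decade midpoints (20s -> 25, 30s -> 35, ...).
# interval=10 is the assumed width of each coded group.
coded_ages = [25, 35, 35, 45, 45, 45, 55]
grouped_median = statistics.median_grouped(coded_ages, interval=10)
```

With these values, the mean (3.75) is pulled upward by the largest value, while the median (3.0) is not.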
- Dispersion
- The following statistics that measure the amount of variation or spread in the data are available.
- Minimum
- The smallest value of a numeric variable.
- Maximum
- The largest value of a numeric variable.
- Range
- The difference between the largest and smallest values of a numeric variable, the maximum minus the minimum.
- Standard deviation
- A measure of dispersion around the mean. In a normal distribution, about 68% of cases fall within one standard deviation of the mean and about 95% of cases fall within two standard deviations. For example, if the mean age is 45, with a standard deviation of 10, about 95% of the cases would be between 25 and 65 in a normal distribution.
- Variance
- A measure of dispersion around the mean, equal to the sum of squared deviations from the mean divided by one less than the number of cases. The variance is measured in units that are the square of those of the variable itself.
- Standard error of mean
- A measure of how much the value of the mean may vary from sample to sample taken from the same distribution. It can be used to roughly compare the observed mean to a hypothesized value (that is, you can conclude the two values are different if the ratio of the difference to the standard error is less than -2 or greater than +2).
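The dispersion statistics above can be sketched in Python as follows. The data and the hypothesized mean are hypothetical, and the standard error is computed as the standard deviation divided by the square root of the number of cases, which is the usual definition and is assumed here.

```python
import math
import statistics

# Hypothetical values for one numeric variable.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(values)

minimum = min(values)
maximum = max(values)
value_range = maximum - minimum

# Sample variance: sum of squared deviations divided by (n - 1).
variance = statistics.variance(values)
std_dev = statistics.stdev(values)       # square root of the variance
std_error = std_dev / math.sqrt(n)       # standard error of the mean

# Rough comparison of the observed mean with a hypothesized value:
# treat the two as different if the ratio falls outside -2..+2.
hypothesized_mean = 3.0
ratio = (statistics.mean(values) - hypothesized_mean) / std_error
differs = abs(ratio) > 2
```

Here the observed mean is 5 and the ratio is roughly 2.6, so by the rule of thumb the mean would be judged different from the hypothesized value of 3.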
- Distribution
- The following statistics describe the shape and symmetry of the distribution.
- Skewness
- A measure of the asymmetry of a distribution. The normal distribution is symmetric and has a skewness value of 0. A distribution with a significant positive skewness has a long right tail. A distribution with a significant negative skewness has a long left tail. As a guideline, a skewness value more than twice its standard error is taken to indicate a departure from symmetry.
- Kurtosis
- A measure of the extent to which there are outliers. For a normal distribution, the value of the kurtosis statistic is zero. Positive kurtosis indicates that the data exhibit more extreme outliers than a normal distribution. Negative kurtosis indicates that the data exhibit less extreme outliers than a normal distribution.
- Standard error of kurtosis
- The ratio of kurtosis to its standard error can be used as a test of normality (that is, you can reject normality if the ratio is less than -2 or greater than +2). A large positive value for kurtosis indicates that the tails of the distribution are longer than those of a normal distribution; a negative value for kurtosis indicates shorter tails (becoming like those of a box-shaped uniform distribution).
- Standard error of skewness
- The ratio of skewness to its standard error can be used as a test of normality (that is, you can reject normality if the ratio is less than -2 or greater than +2). A large positive value for skewness indicates a long right tail; an extreme negative value indicates a long left tail.
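The following sketch shows one way to compute skewness, kurtosis, and their standard errors. It assumes the commonly used bias-adjusted sample formulas, under which a normal distribution has skewness 0 and kurtosis 0; the source does not state the exact formulas, so treat these as an assumption. The function name `shape_statistics` is hypothetical.

```python
import math
import statistics


def shape_statistics(values):
    """Skewness, kurtosis, and their standard errors (assumed sample formulas)."""
    n = len(values)
    if n < 4:
        raise ValueError("at least 4 cases are needed for kurtosis")
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1)
    z = [(x - mean) / s for x in values]

    skewness = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurtosis = (
        n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    )
    se_skewness = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurtosis = math.sqrt(
        4 * (n ** 2 - 1) * se_skewness ** 2 / ((n - 3) * (n + 5))
    )
    return {
        "skewness": skewness,
        "se_skewness": se_skewness,
        "kurtosis": kurtosis,
        "se_kurtosis": se_kurtosis,
    }


# Hypothetical, perfectly symmetric data.
shape = shape_statistics([1.0, 2.0, 3.0, 4.0, 5.0])

# Rule of thumb from the text: a ratio outside -2..+2 suggests non-normality.
skew_ratio = shape["skewness"] / shape["se_skewness"]
kurt_ratio = shape["kurtosis"] / shape["se_kurtosis"]
reject_normality = abs(skew_ratio) > 2 or abs(kurt_ratio) > 2
```

For this small symmetric sample both ratios stay well inside the -2 to +2 band, so the rule of thumb gives no reason to reject normality.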
Specifying statistics for Case summaries
This feature requires Statistics Base Edition.
- From the menus choose:
- In the Case summaries dialog, expand the Additional settings menu and click Statistics.