IBM Support

Displaying the Coefficient of Variation (across cases) in SPSS

Troubleshooting


Problem

I would like SPSS to display the coefficient of variation (CV) for a variable in my active data file. The CV for variable X is the ratio of the standard deviation (SD) of X to the mean of X. I do not see this option under any of the descriptive statistics procedures. I see that there is a COMPUTE function called CFVAR that will compute the CV across a set of variables for each case. However, I want the CV for one variable across cases. How can I request this CV in SPSS?

Resolving The Problem

There are three methods for computing the coefficient of variation (CV) across cases, all of which are available in the IBM SPSS Statistics Base module. Note that the Complex Samples module also provides a coefficient of variation in the CSDESCRIPTIVES procedure, but that statistic is the ratio of the variable's standard error (of the mean) to its mean, not the ratio of the SD to the mean.

Method 1: Ratio Statistics


The Ratio Statistics procedure calculates descriptive statistics on the ratio of two variables in your active data. You must specify one variable as the numerator and another variable as the denominator. The CV values that it calculates are CVs for the ratio, i.e., the standard deviation (SD) of the ratio divided by the mean of the ratio. You can use this procedure to print the CV for a single variable by defining a new variable that has a constant value of 1, then using this constant as the denominator and the variable of interest as the numerator.

The following syntax commands define the variable CONST as a constant of value 1, then run Ratio Statistics with HEIGHT as the numerator and CONST as the denominator. Because dividing by 1 leaves each value unchanged, the mean-centered CV that the /PRINT subcommand requests for the ratio is also the CV for HEIGHT.

compute const=1.
execute.
ratio statistics height WITH const
/print=mncov.

The Ratio Statistics solution can also be run from the menu and dialog system, i.e., the graphical user interface (GUI).
The COMPUTE command is available from Transform->Compute Variable. Enter CONST (capitalization not necessary) in the Target Variable box and type 1 into the Numeric Expression box. Then click OK.
Ratio Statistics is available in the menu system at Analyze->Descriptive Statistics->Ratio.
Use the arrows in the main dialog to place your variable of interest (e.g., height) in the Numerator box and CONST in the Denominator box.
Click the Statistics button. In the Dispersion area of the Statistics dialog, check the box for "Mean-Centered COV" to get the CV as defined above. Note that a "Median-Centered COV" is also available. The median-centered CV is the result of expressing the root mean square of the deviations from the median as a percentage of the median. It may be more robust than the mean-centered CV against the presence of extreme values in your sample. The median-centered CV is requested in syntax by adding the keyword MDCOV to the /PRINT subcommand (capitalization not required).
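
As a sketch of the MDCOV option just described, the Method 1 syntax can request both coefficients in one run. HEIGHT and CONST are the same illustrative variable names used above. Ratio Statistics describes these coefficients as percentages, so the values may appear scaled by 100 relative to a plain SD/mean ratio.

```spss
* Sketch: mean-centered and median-centered COV for HEIGHT in one table.
COMPUTE const = 1.
EXECUTE.
RATIO STATISTICS height WITH const
  /PRINT = MNCOV MDCOV.
```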

Method 2: Aggregate and Compute


You could calculate the mean and standard deviation of a numeric column by using the AGGREGATE command with a constant 'break' variable, copying the new variables to the current active data set as constants. COMPUTE commands then create the CV variables from the respective SD and MEAN variables. The Descriptives procedure prints the mean of the CV variables as the CVs for the sample.

compute const=1.
AGGREGATE
/OUTFILE=* MODE=ADDVARIABLES
/BREAK=const
/height_sd weight_sd pulse_sd = SD(height weight pulse)
/height_mean weight_mean pulse_mean = MEAN(height weight pulse).
compute cv_height = height_sd/height_mean.
compute cv_weight = weight_sd/weight_mean .
compute cv_pulse = pulse_sd/pulse_mean .
DESCRIPTIVES VARIABLES=cv_height cv_weight cv_pulse
/STATISTICS=MEAN MIN MAX.

Note that the /BREAK subcommand is not really necessary in the above AGGREGATE command. Current versions of SPSS Statistics treat a constant break, i.e. aggregating over all cases in the sample, as the default. If you wish to calculate the CVs within subgroups, then the group variable(s) would be listed in the /BREAK subcommand. The new variables would be constants within the subgroups. Rather than adding the aggregate variables to the active file, you could add them to a new dataset, where there would be a single record for each break variable value, then compute and print the CV variables in that dataset.
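
As a minimal sketch of the point about omitting /BREAK, the first example reduces to the following for a single variable. It assumes a release of SPSS Statistics that accepts AGGREGATE without /BREAK, and that the names height_sd, height_mean, and cv_height do not already exist in the active dataset.

```spss
* Sketch: aggregate over all cases with no break variable.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /height_sd = SD(height)
  /height_mean = MEAN(height).
COMPUTE cv_height = height_sd / height_mean.
DESCRIPTIVES VARIABLES=cv_height
  /STATISTICS=MEAN.
```

Because cv_height is constant across cases here, its mean in the Descriptives table is the sample CV.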
Suppose that your active data file has the dataset name Case_Level. The following syntax illustrates the use of a group variable (JOBCAT) and a separate dataset (Agg_Level) for the computations.

DATASET ACTIVATE case_level.
SORT CASES BY jobcat.
DATASET DECLARE Agg_Level.
AGGREGATE
/OUTFILE='Agg_Level'
/PRESORTED
/BREAK=jobcat
/height_sd weight_sd pulse_sd = SD(height weight pulse)
/height_mean weight_mean pulse_mean = MEAN(height weight pulse).
DATASET ACTIVATE Agg_Level.
compute cv_height = height_sd/height_mean.
compute cv_weight = weight_sd/weight_mean .
compute cv_pulse = pulse_sd/pulse_mean .
MEANS TABLES=cv_height cv_weight cv_pulse BY jobcat
/CELLS=MEAN COUNT.
* return to case level data - You can save the aggregate data set as a file first if you wish.
DATASET ACTIVATE case_level.

Means was used rather than Descriptives to print the CVs because an independent variable can be named. Descriptives would have required a SPLIT FILE structure to report the CVs separately for each jobcat value. Note that the group statistics will be equally weighted in computing the total sample statistics, so that variation in group sizes will lead to different total sample CVs than would be produced by excluding the break variable (or using a constant break variable as in the first example for this method).
Note that the file was sorted by the group variable and that the Aggregate command included the subcommand /PRESORTED. If there are many break variable values, /PRESORTED can reduce the memory requirements of AGGREGATE substantially, but the file must actually be sorted by the break variable for this to work.
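
The SPLIT FILE alternative mentioned above could be sketched as follows. It assumes that the Agg_Level dataset already contains cv_height, cv_weight, and cv_pulse from the preceding syntax, with one record per JOBCAT value.

```spss
* Sketch: report the group CVs with DESCRIPTIVES under a split file.
DATASET ACTIVATE Agg_Level.
SORT CASES BY jobcat.
SPLIT FILE LAYERED BY jobcat.
DESCRIPTIVES VARIABLES=cv_height cv_weight cv_pulse
  /STATISTICS=MEAN.
SPLIT FILE OFF.
DATASET ACTIVATE case_level.
```

Unlike MEANS, this output does not include a total-sample row, which is one reason the example above uses MEANS instead.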

SORT CASES can be accessed in the menu system from Data->Sort Cases.
AGGREGATE is available from Data->Aggregate. Variables to be aggregated are entered into the Summaries of Variables box once for each summary statistic that you wish to compute. The default function is the mean, but you can change the function for one or more highlighted variables in the Summaries of Variables box by clicking the Function button. You can change the name of a single highlighted variable in that box by clicking the Name and Label button.
The default location for saving the aggregated variables is the active dataset, as in the first example in this Method 2 section. You can define the name of a new dataset or data file in the choices below that default option.
The COMPUTE command is available from the Transform->Compute Variable menu. Enter the new variable name (such as cv_height) in the Target Variable box and the formula to compute it (such as "height_sd/height_mean", without the quotes) in the Numeric Expression box.
The DESCRIPTIVES procedure is available from Analyze->Descriptive Statistics->Descriptives. Move the variables to be summarized into the Variables box. Click the Options button to check those statistics that you want printed.
The MEANS procedure is available from Analyze->Compare Means->Means. Enter the dependent variables into the Dependent List and the group variable into the Independent List. Click the Options button to choose the summary statistics that you want. They will be printed for each group and for the total sample.

Method 3: Report Procedure with Composite Functions

You can use the composite functions in the REPORT procedure to get the CV. The following example requests the mean, SD, and CV for each of the 3 variables HEIGHT, WEIGHT, and PULSE.

Report
/FORMAT= CHWRAP(ON) PREVIEW(OFF) CHALIGN(BOTTOM) UNDERSCORE(ON)
ONEBREAKCOLUMN(OFF) CHDSPACE(1) SUMSPACE(0) AUTOMATIC NOLIST
BRKSPACE(0)
PAGE(1) MISSING'.' LENGTH(1, 59) ALIGN(LEFT) TSPACE(1) FTSPACE(1)
/TITLE=
RIGHT 'Page )PAGE'
/VARIABLES
weight (VALUES) (RIGHT) (OFFSET(0)) (8)
height (VALUES) (RIGHT) (OFFSET(0)) (8)
pulse (VALUES) (RIGHT) (OFFSET(0)) (8)
/BREAK (TOTAL) 'Grand Total' (SKIP(1))
/SUMMARY MEAN( weight) SKIP(1) MEAN( height ) MEAN( pulse ) 'Mean'
/SUMMARY STDDEV( weight) STDDEV( height ) STDDEV( pulse ) 'StdDev'
/SUMMARY = DIVIDE ( STDDEV( weight) MEAN( weight) )
(weight (2) ) SKIP(1)
DIVIDE ( STDDEV( height) MEAN( height) )
(height (2) ) SKIP(1)
DIVIDE ( STDDEV( pulse) MEAN( pulse) )
(pulse (2) ) SKIP(1) 'Coefficient of Variation' .


The CVs for WEIGHT, HEIGHT, and PULSE are requested in the third SUMMARY subcommand. For the variable WEIGHT, the code:

DIVIDE ( STDDEV( weight) MEAN( weight) )

divides the first aggregate function in its argument list, STDDEV(weight), by the second aggregate function, MEAN(weight). The designation that follows, (weight (2)), places the result in the WEIGHT column and sets the number of decimal places for the CV display to 2. A similar structure is used for the CVs of HEIGHT and PULSE. The title for the summary row, "Coefficient of Variation", is added to the end of this SUMMARY subcommand. Check the REPORT command in the current SPSS Syntax Reference Guide for the definition of other keywords that are used in this example. Note that you may want to explore the use of dummy columns for other applications of composite functions, particularly when you want the CV or other composite statistic(s) to be printed beside other summary statistics for the same variable(s).
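
To isolate the composite function from the formatting keywords, the command above might be pared down to something like the following for HEIGHT alone. This is an untested sketch rather than pasted GUI output. It assumes the defaults for the omitted FORMAT keywords are acceptable, and the summary title 'CV' is an arbitrary label.

```spss
* Sketch: CV of HEIGHT only, relying on default report formatting.
REPORT
  /FORMAT=AUTOMATIC NOLIST
  /VARIABLES=height
  /BREAK=(TOTAL)
  /SUMMARY=DIVIDE(STDDEV(height) MEAN(height)) (height (2)) 'CV'.
```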

Composite functions are available in REPORT only via syntax commands. You can begin to set up the command in the graphical user interface (GUI) if you wish, but you will need to paste the command to a syntax window and edit it there. To start the command through the menus, go to:

Analyze->Reports->Report Summaries in Rows

and move the variable(s) into the Data Columns box. Even if you don't wish to print the mean(s) and SD(s) in the report, you can save some typing in the syntax window if you request both of these statistics in the GUI. Click the Summary button under "Report" and choose "Mean of values" and "Standard deviation". Then click Paste in the main "Report Summaries in Rows" dialog box. The REPORT command can then be edited in the syntax window to which it is pasted. The example above was created in this way. A new SUMMARY subcommand for the CV was added in the syntax editor, copying and pasting some code from the MEAN and STDDEV SUMMARY subcommands. The MARGINS keyword was deleted from the FORMAT subcommand to widen the report, but otherwise the formatting specifications that were pasted into the command from the GUI were retained.


Historical Number

19184

Document Information

Modified date:
16 April 2020

UID

swg21476679