IBM SPSS Statistics Data Files

IBM® SPSS® Statistics data files are files specifically formatted for use by IBM SPSS Statistics, containing both data and the metadata (dictionary) that define the data.

  • To save the active dataset in IBM SPSS Statistics format, use SAVE or XSAVE. On most operating systems, the default extension of a saved IBM SPSS Statistics data file is .sav. IBM SPSS Statistics data files can also be matrix files created with the MATRIX=OUT subcommand on procedures that write matrices.
  • To open IBM SPSS Statistics data files, use GET.

IBM SPSS Statistics Data File Structure

The basic structure of IBM SPSS Statistics data files is similar to a database table:

  • Rows (records) are cases. Each row represents a case or an observation. For example, each individual respondent to a questionnaire is a case.
  • Columns (fields) are variables. Each column represents a variable or characteristic that is being measured. For example, each item on a questionnaire is a variable.
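As a rough analogy for this row-and-column layout, the following sketch models a tiny dataset in plain Python. Each inner list is one case, and each position in a case corresponds to one variable. The variable names and values are hypothetical, and this is only an in-memory illustration, not the on-disk .sav format.

```python
# Hypothetical variable names: one per column.
variables = ["id", "age", "q1"]

# Each inner list is one case (row); values line up with `variables`.
cases = [
    [1, 34, 5],
    [2, 51, 3],
    [3, 27, 4],
]

def column(name):
    """Return every case's value for one variable (a column)."""
    idx = variables.index(name)
    return [case[idx] for case in cases]

def case_record(n):
    """Return one case (a row) as a name -> value mapping."""
    return dict(zip(variables, cases[n]))

print(column("q1"))     # [5, 3, 4]
print(case_record(1))   # {'id': 2, 'age': 51, 'q1': 3}
```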

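IBM SPSS Statistics data files also contain metadata that describes and defines the data contained in the file. This descriptive information is called the dictionary. The information contained in the dictionary includes:

  • Variable names and descriptive variable labels (VARIABLE LABELS command).
  • Descriptive value labels (VALUE LABELS command).
  • Missing-value definitions (MISSING VALUES command).
  • Print and write formats (FORMATS command).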

Use DISPLAY DICTIONARY to display the dictionary for the active dataset. See the topic DISPLAY for more information. You can also use SYSFILE INFO to display dictionary information for any IBM SPSS Statistics data file.
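To show how dictionary metadata changes the way a raw value is read, the sketch below models one variable's dictionary entry with a variable label, value labels, user-missing values, and a print format. The `VariableInfo` class and `describe` function are hypothetical stand-ins for illustration only; they are not part of any IBM SPSS Statistics API.

```python
from dataclasses import dataclass, field

@dataclass
class VariableInfo:
    """Hypothetical in-memory stand-in for one variable's dictionary entry."""
    name: str
    label: str = ""
    value_labels: dict = field(default_factory=dict)
    missing_values: tuple = ()
    print_format: str = "F8.2"

def describe(var, value):
    """Interpret a raw value using the variable's dictionary entry."""
    if value in var.missing_values:
        return f"{value} (user-missing)"
    # Fall back to the raw value when no value label is defined.
    return var.value_labels.get(value, str(value))

q1 = VariableInfo(
    name="q1",
    label="Overall satisfaction",
    value_labels={1: "Very dissatisfied", 5: "Very satisfied"},
    missing_values=(9,),
    print_format="F1.0",
)

print(q1.label)          # Overall satisfaction
print(describe(q1, 5))   # Very satisfied
print(describe(q1, 9))   # 9 (user-missing)
print(describe(q1, 3))   # 3
```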

Long Variable Names

In some instances, data files with variable names longer than eight bytes require special consideration:

  • If you save a data file in portable format (see EXPORT), variable names that exceed eight bytes are converted to unique eight-byte names. For example, mylongrootname1, mylongrootname2, and mylongrootname3 would be converted to mylongro, mylong_2, and mylong_3, respectively.
  • When you use data files with variable names longer than eight bytes in release 10.x or 11.x, unique eight-byte versions of the variable names are used; however, the original variable names are preserved for use in release 12.0 or later. In releases prior to 10.0, the original long variable names are lost if you save the data file.
  • Matrix data files (commonly created with the MATRIX=OUT subcommand, available in some procedures) in which the VARNAME_ variable is a string longer than eight bytes cannot be read by releases prior to 12.0.
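The portable-format renaming in the first item of the list above can be illustrated with a small sketch. The exact algorithm IBM SPSS Statistics uses is not described here, so this is a hypothetical scheme that merely reproduces the mylongrootname example: keep the first eight characters and, on a clash, replace the tail with "_2", "_3", and so on. It assumes ASCII names (one character equals one byte) and compares names case-insensitively, since variable names are not case-sensitive.

```python
def shorten_names(names, limit=8):
    """Map names to unique names of at most `limit` characters.

    Hypothetical scheme, not the documented product algorithm:
    truncate to `limit`; on a clash, use "_<n>" for n = 2, 3, ...
    Assumes ASCII names, so one character is one byte.
    """
    used = set()      # names already assigned, compared case-insensitively
    result = []
    for name in names:
        candidate = name[:limit]
        n = 1
        while candidate.lower() in used:
            n += 1
            suffix = f"_{n}"
            candidate = name[: limit - len(suffix)] + suffix
        used.add(candidate.lower())
        result.append(candidate)
    return result

print(shorten_names(["mylongrootname1", "mylongrootname2", "mylongrootname3"]))
# ['mylongro', 'mylong_2', 'mylong_3']
```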
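The matrix-file restriction in the last item of the list above comes down to string width: VARNAME_ holds variable names, so any name longer than eight bytes needs a VARNAME_ string wider than eight bytes. The hypothetical helpers below sketch a pre-check on a list of variable names before writing a matrix file for use in older releases. They assume UTF-8 encoding for byte counts and are not part of any IBM SPSS Statistics API.

```python
def varname_width(names):
    """Bytes needed for a VARNAME_ string holding every name (UTF-8 assumed)."""
    return max(len(name.encode("utf-8")) for name in names)

def fits_in_eight_bytes(names):
    """True if every name fits in an eight-byte VARNAME_ string."""
    return varname_width(names) <= 8

print(fits_in_eight_bytes(["age", "income"]))          # True
print(fits_in_eight_bytes(["age", "household_size"]))  # False
```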