IBM SPSS Statistics Data Files
IBM® SPSS® Statistics data files are files specifically formatted for use by IBM SPSS Statistics, containing both data and the metadata (dictionary) that define the data.
- To save the active dataset in IBM SPSS Statistics format, use SAVE or XSAVE. On most operating systems, the default extension of a saved IBM SPSS Statistics data file is .sav. IBM SPSS Statistics data files can also be matrix files created with the MATRIX=OUT subcommand on procedures that write matrices.
- To open IBM SPSS Statistics data files, use GET.
IBM SPSS Statistics Data File Structure
The basic structure of IBM SPSS Statistics data files is similar to a database table:
- Rows (records) are cases. Each row represents a case or an observation. For example, each individual respondent to a questionnaire is a case.
- Columns (fields) are variables. Each column represents a variable or characteristic that is being measured. For example, each item on a questionnaire is a variable.
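As a rough illustration only (this is not how a .sav file is laid out on disk), the case-by-variable structure can be modeled in Python as an ordered list of variable names plus one tuple per case. The variable names, values, and the helpers `get_column` and `get_case` are hypothetical.

```python
from __future__ import annotations

from typing import Any

# Hypothetical questionnaire: each variable (column) is one item.
variables = ["id", "age", "q1_satisfaction"]

# Each tuple is one case (row), e.g. one respondent; values follow `variables`.
cases = [
    (1, 34, 4),
    (2, 51, 2),
    (3, 27, 5),
]


def get_column(name: str) -> list[Any]:
    """Return one variable's values across every case (a column)."""
    idx = variables.index(name)
    return [row[idx] for row in cases]


def get_case(row: int) -> dict[str, Any]:
    """Return one case (a row) as a mapping of variable name to value."""
    return dict(zip(variables, cases[row]))


print(get_column("age"))  # [34, 51, 27]
print(get_case(1))        # {'id': 2, 'age': 51, 'q1_satisfaction': 2}
```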
IBM SPSS Statistics data files also contain metadata that describes and defines the data contained in the file. This descriptive information is called the dictionary. The information contained in the dictionary includes:
- Variable names and descriptive variable labels (VARIABLE LABELS command).
- Descriptive value labels (VALUE LABELS command).
- Missing values definitions (MISSING VALUES command).
- Print and write formats (FORMATS command).
Use DISPLAY DICTIONARY to display the dictionary for the active dataset. See the topic DISPLAY for more information. You can also use SYSFILE INFO to display dictionary information for any IBM SPSS Statistics data file.
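The dictionary can be pictured as one metadata record per variable, holding the items listed above. The Python sketch below is a hypothetical model, not the SPSS file format or any SPSS API: the `VariableInfo` class, its field names, and the text produced by `display_dictionary` are assumptions made for illustration, and real DISPLAY DICTIONARY output is laid out differently. For brevity, a single `fmt` field stands in for both the print and write formats.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VariableInfo:
    """Hypothetical per-variable dictionary entry (not the SPSS API)."""
    name: str                                  # variable name
    fmt: str                                   # print/write format (FORMATS); one field for both here
    label: Optional[str] = None                # VARIABLE LABELS
    value_labels: dict[object, str] = field(default_factory=dict)  # VALUE LABELS
    missing_values: tuple = ()                 # MISSING VALUES (user-missing codes)


def format_value(var: VariableInfo, value: object) -> str:
    """Show a value's label if one is defined, and flag user-missing codes."""
    text = var.value_labels.get(value, str(value))
    if value in var.missing_values:
        text += " (missing)"
    return text


def display_dictionary(dictionary: list[VariableInfo]) -> str:
    """Plain-text summary of a dictionary; the layout is invented for this sketch."""
    lines = []
    for v in dictionary:
        lines.append(f"{v.name}: label={v.label!r}, format={v.fmt}")
        for code, text in v.value_labels.items():
            lines.append(f"    {code} = {text!r}")
        if v.missing_values:
            lines.append(f"    missing: {list(v.missing_values)}")
    return "\n".join(lines)


# Hypothetical two-variable dictionary.
dictionary = [
    VariableInfo("sex", "F1.0", label="Respondent sex",
                 value_labels={1: "Male", 2: "Female"}),
    VariableInfo("income", "F8.2", label="Annual income",
                 missing_values=(-99,)),
]

print(format_value(dictionary[0], 2))     # Female
print(format_value(dictionary[1], -99))   # -99 (missing)
print(display_dictionary(dictionary))
```

Keeping the metadata in a separate record per variable, rather than inside the case data, mirrors the point above: the values live in the rows, while names, labels, missing-value codes, and formats describe the columns.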
Long Variable Names
In some instances, data files with variable names longer than eight bytes require special consideration:
- If you save a data file in portable format (see EXPORT), variable names that exceed eight bytes are converted to unique eight-character names. For example, mylongrootname1, mylongrootname2, and mylongrootname3 would be converted to mylongro, mylong_2, and mylong_3, respectively.
- When you use data files with variable names longer than eight bytes in release 10.x or 11.x, unique eight-byte versions of the variable names are used; however, the original variable names are preserved for use in release 12.0 or later. In releases prior to 10.0, the original long variable names are lost if you save the data file.
- Matrix data files (commonly created with the MATRIX=OUT subcommand, available in some procedures) in which the VARNAME_ variable is longer than an eight-byte string cannot be read by releases prior to 12.0.
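The limits above are stated in bytes, and the portable-format example shows how long names collapse to unique eight-character names. The Python sketch below illustrates both ideas under stated assumptions: `exceeds_eight_bytes` assumes the name is stored as UTF-8, and `shorten_names` is a hypothetical scheme inferred only from the mylongrootname example (truncate, then on a collision keep a shorter prefix and append `_2`, `_3`, and so on). The exact rule EXPORT applies is not documented here and may differ, for example in how it handles non-ASCII names or names that differ only in case.

```python
from __future__ import annotations


def exceeds_eight_bytes(name: str, encoding: str = "utf-8") -> bool:
    """True if the name is longer than eight bytes in the given encoding."""
    return len(name.encode(encoding)) > 8


def shorten_names(names: list[str], limit: int = 8) -> list[str]:
    """Hypothetical truncation scheme inferred from the example above.

    Truncate each name to `limit` characters; if that short name is already
    taken, keep a shorter prefix and append _2, _3, ... until it is unique.
    Works on characters (assumes ASCII names) and ignores case-insensitivity.
    """
    used: set[str] = set()
    result = []
    for name in names:
        short = name[:limit]
        n = 1
        while short in used:
            n += 1
            suffix = f"_{n}"
            short = name[: limit - len(suffix)] + suffix
        used.add(short)
        result.append(short)
    return result


print(shorten_names(["mylongrootname1", "mylongrootname2", "mylongrootname3"]))
# ['mylongro', 'mylong_2', 'mylong_3']
print(exceeds_eight_bytes("größe123"))  # True: 8 characters but 10 bytes in UTF-8
```

Because the check counts encoded bytes rather than characters, a name containing multibyte characters can exceed the eight-byte limit even when it is only eight characters long.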