spssdata.GetDataFromSPSS Function (R)

spssdata.GetDataFromSPSS(variables, cases, row.label, keepUserMissing, missingValueToNA, factorMode, rDate, dateVar, asList, orderedContrast)

Retrieves case data from the active dataset and returns it as an R data frame or, optionally, as a list. All of the arguments are optional. If no arguments are provided and the active dataset does not have split groups, the function retrieves all cases for all variables in the active dataset. If the active dataset has split groups and no arguments are provided, the function retrieves all cases in the first split group. By default the result is an R data frame in which each retrieved case is stored as a row. Returning the data as a list is most useful when retrieving large datasets, since the list structure requires less memory than the data frame structure.

  • The argument variables specifies the set of variables whose case values will be retrieved. The argument can be a character vector or list specifying the variable names, a character string consisting of variable names separated by blanks, or a numeric vector or list of integers specifying the index values of the variables (index values represent position in the dataset, starting with 0 for the first variable in file order). Variable names are case-sensitive and must match the names exactly as they exist in the active dataset's dictionary. If the argument is omitted, all variables will be retrieved.

    When specifying variable names, you can use TO to indicate a range of variables. For example, variables=c("age TO income") specifies age, income, and all variables between them in the active dataset's dictionary. You can also specify a range of variables using index values, as in variables=c(2:4), which specifies the variables with index values 2 through 4.

  • The argument cases is an integer specifying the number of cases to retrieve, beginning with the first case in the active dataset. If the argument is omitted and the active dataset has no split groups, all cases will be retrieved. If the active dataset has split groups and the argument is either omitted or greater than or equal to the number of cases in the first split group, then all cases in the first split group are retrieved.
  • The argument row.label specifies a variable from the active dataset whose case values will be the row labels of the resulting data frame. The argument is either the variable name or the index value of the variable (index values represent position in the dataset, starting with 0 for the first variable in file order). If the argument is omitted, the row labels will be the default used by R. The argument has no effect if asList=TRUE.
  • The argument keepUserMissing specifies whether user-missing values should be treated as valid data. The argument is boolean and the default is FALSE, meaning that user-missing values are treated as missing. With keepUserMissing set to FALSE (or omitted), user-missing values of string variables are converted to the R NA value. The handling of missing values of numeric variables depends on the argument missingValueToNA described below.
  • The argument missingValueToNA specifies whether missing values of numeric variables are converted to the R NA value. The argument is boolean and the default is FALSE, which specifies that missing values (system and user) of numeric variables are converted to the R NaN value. Set the argument to TRUE to convert them to NA instead.
  • The argument factorMode specifies whether categorical variables from IBM® SPSS® Statistics (variables with a measurement level of nominal or ordinal) are converted to R factors. The value "none" is the default and specifies that categorical variables are not converted to factors. The value "levels" specifies that categorical variables are converted to factors whose levels are the values that occur in the data. The value "labels" specifies that categorical variables are converted to factors whose levels are the value labels of the variables. Values in the data for which value labels do not exist have a level equal to the value itself. Value labels whose associated value does not occur in the data are included as empty factor levels. Ordinal variables are converted to ordered factors and nominal variables are converted to unordered factors. The levels of the resulting R factor are always sorted in ascending order of the data values, even when factorMode="labels".
  • The argument rDate specifies how the variables in dateVar, which must have date or datetime formats, are converted to R date/time objects. The value "none" is the default and specifies that no conversion will be done. The value "POSIXct" converts the variables to R POSIXct objects, and "POSIXlt" converts them to R POSIXlt objects.
  • The argument dateVar specifies a set of IBM SPSS Statistics variables with date or datetime formats to convert to R date/time objects. The argument supports the same options for specifying variables as described for the variables argument. If the argument is omitted and rDate specifies a POSIXt object, then all variables with date or datetime formats are converted.
  • The argument asList specifies whether the result from GetDataFromSPSS is a list. The argument is boolean with a default of FALSE, which specifies that the result is returned as a data frame. If asList is TRUE the result is a list with an element for each retrieved variable. Setting asList to TRUE is most useful when retrieving large datasets since the list structure requires less memory than the default data frame structure.
  • The argument orderedContrast specifies a contrast function to associate with the ordered factors created from any ordinal variables retrieved from IBM SPSS Statistics. It applies only to ordinal variables, and only when factorMode is set to "levels" or "labels". You can specify any valid contrast function as a quoted string, such as "contr.helmert". The default is "contr.treatment".
  • If the active dataset has split groups, GetDataFromSPSS will only return data from the first split group. To get data from IBM SPSS Statistics datasets with split groups, use the GetSplitDataFromSPSS function.
  • If a weight variable has been defined for the active dataset, then cases with zero, negative, or missing values for the weighting variable are skipped when retrieving data with GetDataFromSPSS.
  • String values are right-padded to the defined width of the string variable.
  • The default value of FALSE for asList results in strings being returned as R factors. Set asList to TRUE if you don't want strings to be returned as factors. Note, however, that with asList set to TRUE, the result from GetDataFromSPSS is a list, not a data frame.
  • Values retrieved from IBM SPSS Statistics variables with time formats are returned as integers representing the number of seconds from midnight.
  • The GetDataFromSPSS function honors case filters specified with the FILTER or USE commands.
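
The notes above on string padding, numeric missing values, and time formats can be illustrated with a short post-processing sketch in plain R. The data frame below is a hand-built stand-in for a result returned with the default arguments. Its variable names (id, city, income, start) and its values are hypothetical, and the code does not call the plug-in itself.

```r
# Stand-in for a data frame returned by spssdata.GetDataFromSPSS() with
# default arguments. All names and values here are hypothetical.
casedata <- data.frame(
  id     = c(1, 2, 3),
  # String variable of width 8: right-padded and returned as a factor.
  city   = factor(c("Oslo    ", "Lima    ", "Cairo   ")),
  # Numeric variable: a missing value arrives as NaN by default.
  income = c(52000, NaN, 61000),
  # Time-format variable: number of seconds from midnight.
  start  = c(32400, 45000, 61200)
)

# Convert the factor to character and strip the trailing padding.
casedata$city <- sub(" +$", "", as.character(casedata$city))

# Replace NaN with NA after retrieval.
casedata$income[is.nan(casedata$income)] <- NA

# Render seconds from midnight as HH:MM:SS strings.
secs <- casedata$start
casedata$start_hms <- sprintf("%02d:%02d:%02d",
                              as.integer(secs %/% 3600),
                              as.integer((secs %% 3600) %/% 60),
                              as.integer(secs %% 60))

print(casedata)
```

If NA values are wanted from the outset, setting missingValueToNA=TRUE in the call avoids the NaN replacement step.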

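As an illustration of the factorMode values described above, the following plain R sketch builds by hand the kind of ordered factor that the description implies for an ordinal variable. It does not call the plug-in and is not its implementation. The data values, the value labels, and the object names are all hypothetical.

```r
# Hypothetical ordinal variable: data values 1, 3, 3, 4, with value labels
# defined for 1, 2, and 3 only.
values       <- c(1, 3, 3, 4)
value_labels <- c("1" = "Low", "2" = "Medium", "3" = "High")

# Emulates factorMode="levels": levels are the values that occur in the data.
f_levels <- factor(values, levels = sort(unique(values)), ordered = TRUE)

# Emulates factorMode="labels": labelled values take their label, unlabelled
# values keep the value itself, and labels whose value does not occur in the
# data become empty levels. Levels stay in ascending order of the data values.
all_values  <- sort(unique(c(values, as.numeric(names(value_labels)))))
keys        <- as.character(all_values)
level_names <- ifelse(keys %in% names(value_labels), value_labels[keys], keys)
f_labels    <- factor(values, levels = all_values, labels = level_names,
                      ordered = TRUE)

# Associate a contrast function with the ordered factor, as orderedContrast
# is described as doing.
contrasts(f_labels) <- "contr.helmert"

print(levels(f_levels))
print(table(f_labels))
```
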
Example

DATA LIST FREE /var1 (F2) var2 (A2) var3 (F2) var4 (F2).
BEGIN DATA
11 ab 13 14
21 cd 23 24
31 ef 33 34
END DATA.
BEGIN PROGRAM R.
casedata <- spssdata.GetDataFromSPSS()
print(casedata)
END PROGRAM.

Result


  var1 var2 var3 var4
1   11   ab   13   14
2   21   cd   23   24
3   31   ef   33   34
  • Since the argument row.label was not specified, the row labels are the default provided by R.
  • The column labels of the resulting data frame are the names of the variables retrieved from the active dataset.
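
Further calls against the same three-case dataset might look like the following sketch. It uses only arguments described in this topic and is meant to be placed between BEGIN PROGRAM R. and END PROGRAM. in a session where that dataset is active, so it will not run in a standalone R session. The object names are arbitrary.

```r
# Two variables selected by name (names are case-sensitive).
byName <- spssdata.GetDataFromSPSS(variables=c("var1","var3"))

# var1, var3, and every variable between them in the dictionary.
byRange <- spssdata.GetDataFromSPSS(variables=c("var1 TO var3"))

# Index values start at 0, so 1:3 selects var2, var3, and var4.
# Only the first two cases are retrieved.
byIndex <- spssdata.GetDataFromSPSS(variables=c(1:3), cases=2)

# Use the case values of var1 as the row labels of the data frame.
labelled <- spssdata.GetDataFromSPSS(row.label="var1")

# Return a list with one element per retrieved variable instead of a data frame.
caselist <- spssdata.GetDataFromSPSS(asList=TRUE)

print(byIndex)
```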