spssdata.GetDataFromSPSS Function (R)
spssdata.GetDataFromSPSS(variables,cases,row.label,keepUserMissing,missingValueToNA,factorMode,rDate,dateVar,asList,orderedContrast)
Retrieves case data from the active dataset and returns it as an R data frame or, optionally, as a list. All of the arguments are optional. If no arguments are provided and the active dataset does not have split groups, the function retrieves all cases for all variables in the active dataset. If the active dataset has split groups and no arguments are provided, the function retrieves all cases in the first split group. By default, the function returns an R data frame in which each retrieved case is stored as a row. Returning the data as a list is most useful when retrieving large datasets, since the list structure requires less memory than the data frame structure.
- The argument variables specifies the set of variables whose
case values will be retrieved. The argument can be a character vector
or list specifying the variable names, a character string consisting
of variable names separated by blanks, or a numeric vector or list
of integers specifying the index values of the variables (index values
represent position in the dataset, starting with 0 for the first variable
in file order). Variable names must match case with the names as they
exist in the active dataset's dictionary. If the argument is omitted,
all variables will be retrieved.
When specifying variable names, you can use TO to indicate a range of variables. For example, variables=c("age TO income") specifies age, income, and all variables between them in the active dataset's dictionary. You can also specify a range of variables using index values, as in variables=c(2:4), which specifies the variables with index values 2 through 4.
- The argument cases is an integer specifying the number of cases to retrieve, beginning with the first case in the active dataset. If the argument is omitted and the active dataset has no split groups, all cases will be retrieved. If the active dataset has split groups and the argument is either omitted or greater than or equal to the number of cases in the first split group, then all cases in the first split group are retrieved.
- The argument row.label specifies a variable from the active
dataset whose case values will be the row labels of the resulting
data frame. The argument is either the variable name or the index
value of the variable (index values represent position in the dataset,
starting with 0 for the first variable in file order). If the argument
is omitted, the row labels will be the default used by R. The argument
has no effect if asList=TRUE.
- The argument keepUserMissing specifies whether user-missing values should be treated as valid data. The argument is boolean and the default is FALSE, meaning that user-missing values are treated as missing. With keepUserMissing set to FALSE (or omitted), user-missing values of string variables are converted to the R NA value. The handling of missing values of numeric variables depends on the argument missingValueToNA described below.
- The argument missingValueToNA specifies whether missing values of numeric variables are converted to the R NA value. The argument is boolean and the default is FALSE, which specifies that missing values (system and user) of numeric variables are converted to the R NaN.
- The argument factorMode specifies whether categorical variables from IBM® SPSS® Statistics (variables with a measurement level of nominal or ordinal) are converted to R factors. The value "none" is the default and specifies that categorical variables are not converted to factors. The value "levels" specifies that categorical variables are converted to factors whose levels are the values that occur in the data. The value "labels" specifies that categorical variables are converted to factors whose levels are the value labels of the variables. Values in the data for which value labels do not exist have a level equal to the value itself. Value labels whose associated value does not occur in the data are included as empty factor levels. Ordinal variables are converted to ordered factors and nominal variables are converted to unordered factors. The levels of the resulting R factor are always sorted in ascending order of the data values, even when factorMode="labels".
- The argument rDate specifies how variables in dateVar,
with date or datetime formats, are converted to R date/time objects.
The value "none" is the default and specifies that no conversion will be done. The value "POSIXct" specifies conversion to R POSIXct objects and "POSIXlt" specifies conversion to R POSIXlt objects.
- The argument dateVar specifies a set of IBM SPSS Statistics variables with date or datetime formats to convert to R date/time objects. The argument supports the same options for specifying variables as described for the variables argument. If the argument is omitted and rDate is set to "POSIXct" or "POSIXlt", then all variables with date or datetime formats are converted.
- The argument asList specifies whether the result from GetDataFromSPSS is a list. The argument is boolean with a default of FALSE, which specifies that the result is returned as a data frame. If asList is TRUE, the result is a list with an element for each retrieved variable. Setting asList to TRUE is most useful when retrieving large datasets, since the list structure requires less memory than the default data frame structure.
- The argument orderedContrast specifies a contrast function
to associate with the ordered factors created from any ordinal variables
retrieved from IBM SPSS Statistics.
It applies only to ordinal variables, and only when factorMode is set to "levels" or "labels". You can specify any valid contrast function as a quoted string, such as "contr.helmert". The default is "contr.treatment".
- If the active dataset has split groups,
GetDataFromSPSS returns data only from the first split group. To get data from IBM SPSS Statistics datasets with split groups, use the GetSplitDataFromSPSS function.
- If a weight variable has been defined for the active dataset,
then cases with zero, negative, or missing values for the weighting
variable are skipped when retrieving data with GetDataFromSPSS.
- String values are right-padded to the defined width of the string variable.
- The default value of FALSE for asList results in
strings being returned as R factors. Set asList to TRUE if
you don't want strings to be returned as factors. Note, however, that
with asList set to TRUE, the result from GetDataFromSPSS is a list, not a data frame.
- Values retrieved from IBM SPSS Statistics variables with time formats are returned as integers representing the number of seconds from midnight.
- The GetDataFromSPSS function honors case filters specified with the FILTER or USE commands.
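The variables, cases, and row.label arguments described above can be combined as in the following minimal sketch. It assumes the code runs inside a BEGIN PROGRAM R. ... END PROGRAM. block and that the active dataset is the four-variable dataset (var1 to var4) created in the Example section; the object names on the left-hand side are illustrative only.

```r
# Run inside BEGIN PROGRAM R. ... END PROGRAM. in IBM SPSS Statistics.
# Assumption: the active dataset holds var1 (F2), var2 (A2), var3 (F2), var4 (F2).

# Select by name; names must match the case used in the dataset's dictionary.
byName <- spssdata.GetDataFromSPSS(variables = c("var1", "var3"))

# Select a range with TO: var1, var4, and every variable between them.
byRange <- spssdata.GetDataFromSPSS(variables = c("var1 TO var4"))

# Select by zero-based index: 2:3 refers to the third and fourth
# variables in file order (var3 and var4 in this dataset).
byIndex <- spssdata.GetDataFromSPSS(variables = c(2:3))

# Retrieve only the first two cases and take the row labels from var1.
firstTwo <- spssdata.GetDataFromSPSS(cases = 2, row.label = "var1")
print(firstTwo)
```

Because index values start at 0, an index range such as 2:3 skips the first two variables; using names or TO avoids that off-by-one risk when the file order is not known in advance.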
Example
DATA LIST FREE /var1 (F2) var2 (A2) var3 (F2) var4 (F2).
BEGIN DATA
11 ab 13 14
21 cd 23 24
31 ef 33 34
END DATA.
BEGIN PROGRAM R.
casedata <- spssdata.GetDataFromSPSS()
print(casedata)
END PROGRAM.
Result
  var1 var2 var3 var4
1   11   ab   13   14
2   21   cd   23   24
3   31   ef   33   34
- Since the argument row.label was not specified, the row labels are the default provided by R.
- The column labels of the resulting data frame are the names of the variables retrieved from the active dataset.
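The conversion arguments (missingValueToNA, factorMode, orderedContrast, rDate, dateVar, and asList) can be sketched in the same way. This is an illustration under stated assumptions, not a definitive recipe: it must run inside a BEGIN PROGRAM R. ... END PROGRAM. block, and the variable name bdate is hypothetical, standing in for any variable with a date or datetime format in the active dataset.

```r
# Run inside BEGIN PROGRAM R. ... END PROGRAM. in IBM SPSS Statistics.

# By default, missing values of numeric variables arrive as NaN.
# Request NA instead.
withNA <- spssdata.GetDataFromSPSS(missingValueToNA = TRUE)

# Convert nominal and ordinal variables to factors whose levels are the
# value labels; ordered factors built from ordinal variables are given
# Helmert contrasts instead of the default "contr.treatment".
asFactors <- spssdata.GetDataFromSPSS(factorMode = "labels",
                                      orderedContrast = "contr.helmert")

# Convert a date variable to POSIXct.
# "bdate" is a hypothetical variable name; replace it with a real one.
withDates <- spssdata.GetDataFromSPSS(rDate = "POSIXct", dateVar = "bdate")

# Return a list with one element per variable instead of a data frame;
# in this mode strings are not returned as factors.
dataList <- spssdata.GetDataFromSPSS(asList = TRUE)
str(dataList)
```

In R, is.na() returns TRUE for both NA and NaN, whereas is.nan() returns TRUE only for NaN, so code that tests for missing values with is.na() behaves the same under either setting of missingValueToNA.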