spss.Dataset Class (Python)

spss.Dataset(name,hidden,cvtDates). Provides the ability to create new datasets, read from existing datasets, and modify existing datasets. A Dataset object provides access to the case data and variable information contained in a dataset, and allows you to read from the dataset, add new cases, modify existing cases, add new variables, and modify properties of existing variables.

An instance of the Dataset class can only be created within a data step or StartProcedure-EndProcedure block, and cannot be used outside of the data step or procedure block in which it was created. Data steps are initiated with the spss.StartDataStep function. You can also use the spss.DataStep class to implicitly start and end a data step without the need to check for pending transformations. See the topic spss.DataStep Class (Python) for more information.

The number of variables in the dataset associated with a Dataset instance is available using the len function, as in:

len(datasetObj)

Note: Close any dataset that you do not need outside the data step or procedure in which it was accessed or created, and do so before that data step or procedure ends. Closing a dataset frees the resources allocated to it. To close a dataset, call the close method of its Dataset object.
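The standard library's contextlib.closing calls an object's close() method when a with block exits, including when an error interrupts it. That makes it one possible way to guarantee the close call described in the note. The sketch below runs outside SPSS Statistics, so it uses a hypothetical stand-in class, StandInDataset, in place of spss.Dataset. The helper use_dataset is also hypothetical. Wrapping a real object, for example `with closing(spss.Dataset(name="empdata1")):` inside a data step, is an assumption based on the close method described above, not a documented spss idiom.

```python
from contextlib import closing


class StandInDataset:
    """Hypothetical stand-in for spss.Dataset; it only mimics close()."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        # Stands in for spss.Dataset.close(), which frees the dataset's resources.
        self.closed = True


def use_dataset(ds, rows):
    """Hypothetical work performed while the dataset is open."""
    with closing(ds):
        # closing() calls ds.close() when this block exits, even on an exception.
        return sum(row[0] for row in rows)


ds = StandInDataset("empdata1")
total = use_dataset(ds, [(1, 57000), (2, 21450)])
print(total, ds.closed)  # 3 True
```

In a real program the with closing(...) block would sit inside the data step, so that close() runs before the data step ends.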

Example: Creating a New Dataset

BEGIN PROGRAM.
import spss
spss.StartDataStep()
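# name=None creates a new, empty dataset
datasetObj = spss.Dataset(name=None)
# Type 0 creates a numeric variable; a positive integer creates a string variable of that width
datasetObj.varlist.append('numvar',0)
datasetObj.varlist.append('strvar',1)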
datasetObj.varlist['numvar'].label = 'Sample numeric variable'
datasetObj.varlist['strvar'].label = 'Sample string variable'
datasetObj.cases.append([1,'a'])
datasetObj.cases.append([2,'b'])
spss.EndDataStep()
END PROGRAM.

Example: Saving New Datasets

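When you create new datasets that you intend to save, keep track of their names, because the save operation is done outside the data step in which the datasets were created. In this example, each new dataset is created with name=None. Its name, read from the name property, is stored in the dsNames dictionary together with the corresponding value of dept. The example also requires the cases to be sorted by dept, since a new dataset is started each time the value of dept changes.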

DATA LIST FREE /dept (F2) empid (F4) salary (F6).
BEGIN DATA
7  57  57000 
5  23  40200
3  62  21450
3  18  21900
5  21  45000
5  29  32100
7  38  36000
3  42  21900
7  11  27900
END DATA.
DATASET NAME saldata.
SORT CASES BY dept.
BEGIN PROGRAM.
import spss
with spss.DataStep():
   ds = spss.Dataset()
   # Create a new dataset for each value of the variable 'dept'
   newds = spss.Dataset(name=None)
   newds.varlist.append('dept')
   newds.varlist.append('empid')
   newds.varlist.append('salary')
   dept = ds.cases[0,0][0]
   dsNames = {newds.name:dept} 
   for row in ds.cases:
      if (row[0] != dept):
         newds = spss.Dataset(name=None)
         newds.varlist.append('dept')
         newds.varlist.append('empid')
         newds.varlist.append('salary')
         dept = row[0]
         dsNames[newds.name] = dept
      newds.cases.append(row) 
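# Save the new datasets (items() works in both Python 2 and Python 3)
for name,dept in dsNames.items():
   # Numeric values are floats (e.g. 3.0); int() keeps '.0' out of the file name
   strdept = str(int(dept))
   spss.Submit(r"""
   DATASET ACTIVATE %(name)s.
   SAVE OUTFILE='/mydata/saldata_%(strdept)s.sav'.
   """ %locals())
# Reactivate the original dataset, then close the other open datasets
spss.Submit(r"""
DATASET ACTIVATE saldata.
DATASET CLOSE ALL.
""")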
END PROGRAM.
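The splitting loop in this example starts a new dataset whenever the value of dept changes. The same grouping logic can be sketched in plain Python with itertools.groupby, which likewise groups only consecutive cases with equal keys. In this sketch the case data are hard-coded, and the values are written as floats on the assumption that numeric case values come back as Python floats. The dataset names group1, group2, and so on are hypothetical placeholders for the names SPSS Statistics generates when name=None. No spss calls are made.

```python
from itertools import groupby

# Case data from the example, already sorted by dept: (dept, empid, salary).
cases = [
    (3.0, 62.0, 21450.0), (3.0, 18.0, 21900.0), (3.0, 42.0, 21900.0),
    (5.0, 23.0, 40200.0), (5.0, 21.0, 45000.0), (5.0, 29.0, 32100.0),
    (7.0, 57.0, 57000.0), (7.0, 38.0, 36000.0), (7.0, 11.0, 27900.0),
]

datasets = {}  # hypothetical dataset name -> cases for one department
ds_names = {}  # dataset name -> dept, like dsNames in the example
groups = groupby(cases, key=lambda row: row[0])
for n, (dept, rows) in enumerate(groups, start=1):
    name = "group%d" % n  # placeholder; SPSS Statistics generates real names
    datasets[name] = list(rows)
    ds_names[name] = dept

outfiles = []
for name, dept in ds_names.items():
    # int() drops the ".0", giving saldata_3.sav rather than saldata_3.0.sav
    outfiles.append("/mydata/saldata_%d.sav" % int(dept))

print(outfiles)
# ['/mydata/saldata_3.sav', '/mydata/saldata_5.sav', '/mydata/saldata_7.sav']
```

If the cases were not sorted, both groupby and the loop in the example would start a new group each time dept changed. That would produce more than one dataset for the same department, and the later dsNames entries would overwrite each other's file names.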

Example: Modifying Case Values

DATA LIST FREE /cust (F3) amt (F5).
BEGIN DATA
210 4500
242 6900
370 32500
END DATA.
BEGIN PROGRAM.
import spss
spss.StartDataStep()
datasetObj = spss.Dataset()
for i in range(len(datasetObj.cases)):
   # Multiply the value of amt by 1.05 for each case
   datasetObj.cases[i,1] = 1.05*datasetObj.cases[i,1][0]
spss.EndDataStep()
END PROGRAM.

See the topic CaseList Class (Python) for more information.
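In the Modifying Case Values example, datasetObj.cases[i,1] yields the value of amt for case i wrapped in a one-element sequence, which is why the code indexes it with [0] before multiplying. The plain-Python sketch below imitates that pattern on hard-coded rows. The helpers get_value and set_value are hypothetical and only mimic the indexing; they are not part of the spss module.

```python
# Hypothetical stand-in for the case data: one list per case, [cust, amt].
cases = [[210.0, 4500.0], [242.0, 6900.0], [370.0, 32500.0]]


def get_value(cases, i, j):
    """Mimic reading cases[i,j]: the value wrapped in a one-element tuple."""
    return (cases[i][j],)


def set_value(cases, i, j, value):
    """Mimic assigning a value to cases[i,j]."""
    cases[i][j] = value


for i in range(len(cases)):
    # [0] unwraps the one-element tuple before the arithmetic, as in the example.
    set_value(cases, i, 1, 1.05 * get_value(cases, i, 1)[0])

print([round(row[1], 2) for row in cases])  # [4725.0, 7245.0, 34125.0]
```

Without the [0], the multiplication would be applied to a tuple, and 1.05 times a tuple raises a TypeError.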

Example: Comparing Datasets

Dataset objects allow you to concurrently work with the case data from multiple datasets. As a simple example, we'll compare the cases in two datasets and indicate identical cases with a new variable added to one of the datasets.

DATA LIST FREE /id (F2) salary (DOLLAR8) jobcat (F1).
BEGIN DATA
1 57000 3
3 40200 1
2 21450 1
END DATA.
SORT CASES BY id.
DATASET NAME empdata1.
DATA LIST FREE /id (F2) salary (DOLLAR8) jobcat (F1).
BEGIN DATA
3 41000 1
1 59280 3
2 21450 1
END DATA.
SORT CASES BY id.
DATASET NAME empdata2.
BEGIN PROGRAM.
import spss
spss.StartDataStep()
datasetObj1 = spss.Dataset(name="empdata1")
datasetObj2 = spss.Dataset(name="empdata2")
nvars = len(datasetObj1)
datasetObj2.varlist.append('match')
for i in range(len(datasetObj1.cases)):
   if datasetObj1.cases[i] == datasetObj2.cases[i,0:nvars]:
      datasetObj2.cases[i,nvars] = 1
   else:
      datasetObj2.cases[i,nvars] = 0
spss.EndDataStep()
END PROGRAM.
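The comparison above pairs cases by position. It is only meaningful when both datasets are sorted on the same key (here id) and have the same number of cases, since len(datasetObj1.cases) is used as the loop bound for both. The plain-Python sketch below applies the same positional comparison to the example's data, hard-coded as tuples of floats, and adds an explicit check on the case counts. The function match_flags is hypothetical and makes no spss calls.

```python
# Case data from the example after SORT CASES BY id: (id, salary, jobcat).
empdata1 = [(1.0, 57000.0, 3.0), (2.0, 21450.0, 1.0), (3.0, 40200.0, 1.0)]
empdata2 = [(1.0, 59280.0, 3.0), (2.0, 21450.0, 1.0), (3.0, 41000.0, 1.0)]


def match_flags(cases1, cases2):
    """Return 1 for each position where the two cases are identical, else 0."""
    if len(cases1) != len(cases2):
        raise ValueError("datasets must have the same number of cases")
    nvars = len(cases1[0]) if cases1 else 0
    # Compare only the first nvars values, as cases[i,0:nvars] does in the example.
    return [1 if c1 == c2[0:nvars] else 0 for c1, c2 in zip(cases1, cases2)]


print(match_flags(empdata1, empdata2))  # [0, 1, 0]
```

Only the cases with id 2 match, because the salaries for ids 1 and 3 differ between the two datasets.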