Data Preparation - IBM SPSS Statistics

What SPSS Data Preparation can do for your business

IBM® SPSS® Data Preparation applies advanced techniques to streamline the data preparation stage, so you get to accurate analysis results faster.

Choose from an automated data preparation procedure for fast results or select other methods to prepare more challenging datasets.
Identify suspicious or invalid cases, variables and data values.
View patterns of missing data, summarize variable distributions and more accurately work with algorithms designed for nominal attributes.

This module is included in the SPSS Professional edition for on-premises and in the Base edition for subscription plans.

Interactive demo

Try the interactive product tour of SPSS Statistics to see how easily you can extract actionable insights to optimize your decisions.

Explore all product features

Feature spotlights

Variables tab

The "validate data" dialog lets you check the validity of your data. The variables tab lists the variables in your file. Start by selecting the variables you want to validate and moving them to the "analysis variables" list.

Basic checks

You can specify basic checks to apply to variables and cases in your file. For example, you can obtain reports that identify variables with a high percentage of missing values, as well as cases that are empty.
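
The basic checks described above can be pictured with a small sketch. This is plain Python, not SPSS syntax and not the validate data procedure itself; the function names, the sample data and the 50% threshold are all assumptions made for illustration.

```python
# Illustrative only: a plain-Python stand-in for the kind of basic checks
# described above. All names and the threshold are hypothetical.

def missing_percentages(rows, variables):
    """Return {variable: percent of cases where the value is None}."""
    total = len(rows)
    if total == 0:
        return {name: 0.0 for name in variables}
    return {
        name: 100.0 * sum(1 for row in rows if row.get(name) is None) / total
        for name in variables
    }

def empty_case_indexes(rows, variables):
    """Return the positions of cases where every listed variable is missing."""
    return [
        index
        for index, row in enumerate(rows)
        if all(row.get(name) is None for name in variables)
    ]

def flag_variables(rows, variables, max_missing_pct=50.0):
    """Return the variables whose missing percentage exceeds the threshold."""
    percentages = missing_percentages(rows, variables)
    return [name for name in variables if percentages[name] > max_missing_pct]

# Hypothetical sample data; None marks a missing value.
cases = [
    {"age": 34, "income": None, "region": "north"},
    {"age": None, "income": None, "region": None},
    {"age": 51, "income": 42000, "region": "south"},
    {"age": 29, "income": None, "region": "east"},
]
variables = ["age", "income", "region"]

print(missing_percentages(cases, variables))
print(empty_case_indexes(cases, variables))
print(flag_variables(cases, variables))
```

In this sample, income is missing in three of the four cases and the second case is empty, so both would show up in a report of this kind.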

Standard and custom rules

Apply rules to individual variables to identify invalid values, that is, missing values or values outside a valid range. You can also apply predefined rules or create your own, including cross-variable rules.
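
To make the difference between single-variable and cross-variable rules concrete, here is a minimal Python sketch. It is not SPSS syntax; the rule names, the age range and the retirement condition are hypothetical examples, not predefined SPSS rules.

```python
# Illustrative only: single-variable and cross-variable validation rules
# expressed as plain Python predicates. All rule names are hypothetical.

def range_rule(name, low, high):
    """Single-variable rule: a value passes if it is present and in [low, high]."""
    def check(row):
        value = row.get(name)
        return value is not None and low <= value <= high
    return check

def retired_implies_age_55_plus(row):
    """Cross-variable rule: a retired respondent must be at least 55."""
    if not row.get("retired"):
        return True
    age = row.get("age")
    return age is not None and age >= 55

def violations(rows, rules):
    """Return {rule_name: [indexes of cases that fail the rule]}."""
    return {
        rule_name: [i for i, row in enumerate(rows) if not check(row)]
        for rule_name, check in rules.items()
    }

rules = {
    "age_in_range": range_rule("age", 0, 120),
    "retired_implies_age_55_plus": retired_implies_age_55_plus,
}

# Hypothetical sample data; None marks a missing value.
cases = [
    {"age": 34, "retired": False},
    {"age": 150, "retired": False},   # outside the valid range
    {"age": None, "retired": False},  # missing value
    {"age": 40, "retired": True},     # inconsistent across variables
]

print(violations(cases, rules))
```

The single-variable rule flags the second and third cases, while only the cross-variable rule catches the fourth, whose values are individually plausible but inconsistent with each other.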

Recommendations

Automated data preparation delivers recommendations and lets users drill down to examine each one.

Automatically prepare data in a single step

Manual data preparation is a complex and time-consuming process. When you need results quickly, the automated data preparation (ADP) procedure helps you detect and correct data quality errors and impute missing values in one efficient step. ADP provides an easy-to-understand report with comprehensive recommendations and visualizations to help you determine the right data to use in your analysis.

Additional options for data preparation

Use the validate data procedure to perform automatic data checks and help eliminate tedious, time-consuming manual checks. This procedure lets you apply rules that check the data based on each variable's measurement level, whether categorical or continuous. You can then assess data validity and remove or correct suspicious cases at your discretion before analysis.

Access to a range of features

IBM SPSS Data Preparation includes data validation, automated data preparation, optimal binning and identification of unusual cases.

Read the documentation

Bin or set cut points for scale variables

The optimal binning procedure lets you bin scale variables, that is, set cut points that divide their values into categories. With binned variables, you can more accurately use algorithms designed for nominal attributes, such as Naive Bayes and logit models.

Select from 3 types of optimal binning

Choose one of these types of optimal binning for preprocessing data before model building:

1) Unsupervised: Create bins with equal counts.
2) Supervised: Take the target variable into account to determine cut points. This method is more accurate than unsupervised binning, but it is also more computationally intensive.
3) Hybrid approach: Combine the unsupervised and supervised approaches. This method is useful if you have a large number of distinct values.
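
The contrast between the first two approaches can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm SPSS uses: the unsupervised function places cut points at equal-count boundaries, and the supervised function is an assumed stand-in that finds only the single cut point minimizing the weighted entropy of the target. The function names and sample data are hypothetical.

```python
# Illustrative only: simplified unsupervised and supervised binning.
import math
from bisect import bisect_left
from collections import Counter

def equal_count_cut_points(values, n_bins):
    """Unsupervised: cut points that put roughly equal counts in each bin."""
    ordered = sorted(values)
    n = len(ordered)
    if n_bins < 1 or n < n_bins:
        raise ValueError("need at least one value per bin")
    cuts = []
    for k in range(1, n_bins):
        cut = ordered[(k * n) // n_bins - 1]  # inclusive upper bound of bin k
        if not cuts or cut > cuts[-1]:        # skip duplicate cut points
            cuts.append(cut)
    return cuts

def bin_index(value, cuts):
    """Return the 0-based bin for a value; each cut is an inclusive upper bound."""
    return bisect_left(cuts, value)

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def best_supervised_cut(values, targets):
    """Supervised: the one cut point that minimizes weighted target entropy."""
    pairs = sorted(zip(values, targets), key=lambda pair: pair[0])
    best_cut, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot place a cut between equal values
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if score < best_score:
            best_cut, best_score = pairs[i - 1][0], score
    return best_cut

# Hypothetical sample data: a scale variable and a nominal target.
ages = [22, 25, 31, 38, 45, 52, 60, 67]
bought = ["no", "no", "no", "yes", "yes", "yes", "yes", "yes"]

print(equal_count_cut_points(ages, 4))
print(best_supervised_cut(ages, bought))
```

On this sample the equal-count cut points ignore the target entirely, while the supervised cut lands exactly where the target changes from "no" to "yes". The supervised version also evaluates every candidate split, which hints at why it costs more to compute.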

Technical details

Software requirements

For on-premises: purchase the Professional edition.
For subscription plans: purchase the Base edition.

See a complete list of software requirements

Hardware requirements

Processor: 2 GHz or faster
Display: 1024 x 768 or higher
Memory: 4 GB of RAM required, 8 GB of RAM or more recommended
Disk space: 2 GB or more

See a complete list of hardware requirements

Next steps

Experience the intuitive interface and robust features at no cost.

Try SPSS Statistics at no cost
