IBM® SPSS® Data Preparation provides advanced techniques that streamline the data preparation stage, so you get faster, more accurate analysis results. Choose an automated data preparation procedure for fast results, or select other methods to prepare more challenging data sets. Easily identify suspicious or invalid cases, variables and data values. View patterns of missing data, summarize variable distributions and work more accurately with algorithms designed for nominal attributes.
This module is included in the SPSS Professional edition for on-premises deployments, and in the base edition for subscription plans.
Use the Validate Data dialog to validate your data. The Variables tab lists the variables in your file. Start by selecting the variables you want to validate and moving them to the Analysis Variables list.
You can specify basic checks to apply to variables and cases in your file. For example, you can obtain reports that identify variables with a high percentage of missing values, as well as cases that are empty.
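The idea behind these basic checks can be illustrated outside the product. The following is a minimal Python sketch, not SPSS syntax and not the module's implementation. It assumes cases are dictionaries, missing values are stored as `None`, and the 70% threshold is an arbitrary illustrative default. All function names are hypothetical.

```python
from typing import Optional

# Assumption: one case = one dict; a missing value is represented by None.
Row = dict[str, Optional[float]]


def missing_percentages(rows: list[Row], variables: list[str]) -> dict[str, float]:
    """Return the percentage of cases in which each variable is missing."""
    total = len(rows)
    if total == 0:
        return {v: 0.0 for v in variables}
    return {
        v: 100.0 * sum(1 for r in rows if r.get(v) is None) / total
        for v in variables
    }


def flag_high_missing(
    rows: list[Row], variables: list[str], max_missing_pct: float = 70.0
) -> list[str]:
    """List variables whose missing percentage exceeds the (illustrative) threshold."""
    pct = missing_percentages(rows, variables)
    return [v for v in variables if pct[v] > max_missing_pct]


def empty_case_indices(rows: list[Row], variables: list[str]) -> list[int]:
    """List the positions of cases in which every analysis variable is missing."""
    return [
        i for i, r in enumerate(rows) if all(r.get(v) is None for v in variables)
    ]
```

Given a small list of cases, `flag_high_missing` reports the variables to review and `empty_case_indices` reports the cases that carry no information for the selected variables.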
Apply rules to individual variables to identify invalid values, such as values outside a valid range or missing values. You can apply predefined rules, create your own single-variable rules or define cross-variable rules.
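To make the difference between single-variable and cross-variable rules concrete, here is a hedged Python sketch. It is an illustration only and does not reflect how the product stores or evaluates rules. The `RangeRule` class, the helper functions and the example variable names (`age`, `years_employed`) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class RangeRule:
    """Hypothetical single-variable rule: value must lie in [minimum, maximum]."""

    name: str
    minimum: float
    maximum: float
    allow_missing: bool = False

    def violates(self, value: Optional[float]) -> bool:
        # A missing value (None) is a violation unless the rule allows it.
        if value is None:
            return not self.allow_missing
        return not (self.minimum <= value <= self.maximum)


def apply_single_variable_rule(rows: list[dict], variable: str, rule: RangeRule) -> list[int]:
    """Return positions of cases whose value for `variable` violates the rule."""
    return [i for i, r in enumerate(rows) if rule.violates(r.get(variable))]


def apply_cross_variable_rule(rows: list[dict], is_invalid: Callable[[dict], bool]) -> list[int]:
    """Return positions of cases for which the cross-variable predicate is true."""
    return [i for i, r in enumerate(rows) if is_invalid(r)]


def employed_longer_than_alive(row: dict) -> bool:
    """Example cross-variable check: years employed cannot exceed age."""
    age, years = row.get("age"), row.get("years_employed")
    return age is not None and years is not None and years > age
```

A single-variable rule looks at one value at a time, while a cross-variable rule receives the whole case, which is why the second helper takes a predicate over the full dictionary.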
Automated data preparation delivers recommendations and allows users to drill in and examine the recommendations.
Manual data preparation is a complex and time-consuming process. When you need results quickly, the ADP procedure helps you detect and correct quality errors and impute missing values in one efficient step. The ADP feature provides an easy-to-understand report with comprehensive recommendations and visualizations to help you determine the right data to use in your analysis.
Use the Validate Data procedure to run automatic data checks and help eliminate time-consuming, tedious manual checks. The procedure lets you apply rules based on each variable's measurement level, whether categorical or continuous. You can then determine data validity and remove or correct suspicious cases at your discretion before analysis.
SPSS Data Preparation includes features like data validation, automated data preparation, optimal binning and identification of unusual cases.
With the optimal binning procedure, you can make more accurate use of algorithms designed for nominal attributes, such as Naive Bayes and logit models. Optimal binning lets you bin scale variables, that is, set cut points that divide their values into intervals.
Choose one of these types of optimal binning for preprocessing data prior to model building:
1) Unsupervised: Create bins with equal counts.
2) Supervised: Take the target variable into account to determine cut points. This method is more accurate than unsupervised; however, it is also more computationally intensive.
3) Hybrid approach: Combine the unsupervised and supervised approaches. This method is particularly useful if you have a large number of distinct values.
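The contrast between unsupervised and supervised binning can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the product's algorithm: the unsupervised function places cut points so that bins hold roughly equal counts, and the supervised function picks a single cut point that minimizes the weighted entropy of the target, a stand-in criterion chosen for this example. All function names are hypothetical, and each cut point is treated as the inclusive upper bound of its bin.

```python
import math
from bisect import bisect_left
from collections import Counter


def equal_count_cut_points(values: list[float], n_bins: int) -> list[float]:
    """Unsupervised: cut points that split the sorted values into equal-count bins."""
    if n_bins < 2 or len(values) < n_bins:
        raise ValueError("need at least 2 bins and at least one value per bin")
    ordered = sorted(values)
    cuts: list[float] = []
    for k in range(1, n_bins):
        cut = ordered[(k * len(ordered)) // n_bins - 1]
        if not cuts or cut > cuts[-1]:  # skip duplicate cut points caused by ties
            cuts.append(cut)
    return cuts


def assign_bin(value: float, cuts: list[float]) -> int:
    """Return the 0-based bin index; a value equal to a cut point falls in the lower bin."""
    return bisect_left(cuts, value)


def entropy(labels: list) -> float:
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_supervised_cut(values: list[float], targets: list):
    """Supervised (simplified): the one cut point that minimizes weighted target entropy.

    Returns None when no split improves on the unsplit entropy.
    """
    pairs = sorted(zip(values, targets), key=lambda p: p[0])
    n = len(pairs)
    best_cut, best_score = None, entropy([t for _, t in pairs])
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between two identical values
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            best_cut, best_score = pairs[i - 1][0], score
    return best_cut
```

The supervised search evaluates every candidate boundary against the target, which shows why it costs more than the equal-count approach: the unsupervised version only needs the sorted values.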