Data preparation steps

Data preparation starts at the end of the data understanding phase when the relevant data is understood and its content is known.
This data is usually not ready for immediate analysis and must first be reorganized. This reorganization is often called data preparation. Data preparation consists of the following major steps:
Defining a data preparation input model
The first step is to define a data preparation input model. This means locating the relevant data in the database and relating it. This task is usually performed by a database administrator (DBA) or a data warehouse administrator because it requires knowledge of the database model.

In this step, the DBA defines semantic concepts such as hierarchies. The relevant tables are joined so that the data transformation tasks can be defined by using these semantic concepts.
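The joining step described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for the warehouse database, and the tables (customers, products, transactions), their columns, and the view name input_model are all hypothetical. The product_group column stands in for a simple hierarchy level above product_name.

```python
import sqlite3


def build_input_model(conn):
    """Create hypothetical source tables and a joined view over them.

    All table, column, and view names are illustrative assumptions;
    they do not come from the product.
    """
    conn.executescript("""
        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT
        );
        CREATE TABLE products (
            product_id    INTEGER PRIMARY KEY,
            product_name  TEXT,
            product_group TEXT  -- hierarchy level above product_name
        );
        CREATE TABLE transactions (
            transaction_id INTEGER PRIMARY KEY,
            customer_id    INTEGER REFERENCES customers(customer_id),
            product_id     INTEGER REFERENCES products(product_id),
            amount         REAL
        );

        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Ben');
        INSERT INTO products VALUES
            (10, 'Espresso', 'Beverages'),
            (11, 'Bagel', 'Bakery');
        INSERT INTO transactions VALUES
            (100, 1, 10, 3.0),
            (101, 1, 11, 2.5),
            (102, 2, 10, 3.0);

        -- The joined view relates the relevant tables once, so later
        -- transformations can refer to product_group directly.
        CREATE VIEW input_model AS
        SELECT t.transaction_id,
               c.customer_id,
               c.name AS customer_name,
               p.product_group,
               p.product_name,
               t.amount
        FROM transactions t
        JOIN customers c ON c.customer_id = t.customer_id
        JOIN products  p ON p.product_id  = t.product_id;
    """)


conn = sqlite3.connect(":memory:")
build_input_model(conn)
input_rows = conn.execute(
    "SELECT customer_name, product_group, product_name, amount "
    "FROM input_model ORDER BY transaction_id"
).fetchall()
```

Each row of input_rows carries the customer, the product, and the product's group together, which is the kind of pre-joined structure that later transformation tasks can build on.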

If an OLAP model in the form of a Cubing Services model is available, this step can be skipped because the cube model can be imported as the input model.

Defining a data preparation profile
The second step is to define a data preparation profile. This means determining the focus of the analysis and specifying the relevant properties that the data transformation is to compute. Because the profile definition can be based on the semantic concepts that were defined in the previous step, the mining analyst can perform it easily.

At the end of this step, a single logical table is defined. This logical table is the starting point for subsequent data mining analysis. You can create this table by generating a data flow or an SQL script. The table that results from the data flow or the SQL script is then used as the table source in a mining flow.
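As a rough sketch of what such an SQL script might do, the example below aggregates joined transaction data into one row per customer. It is not the script that the product generates: sqlite3 stands in for the warehouse database, the focus of analysis is assumed to be the customer, and the names joined_input, mining_input, n_transactions, total_amount, and beverages_amount are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical pre-joined input: one row per transaction.
    CREATE TABLE joined_input (
        customer_id   INTEGER,
        product_group TEXT,
        amount        REAL
    );
    INSERT INTO joined_input VALUES
        (1, 'Beverages', 3.0),
        (1, 'Bakery',    2.5),
        (2, 'Beverages', 3.0);

    -- Single logical table: one row per customer (the assumed focus
    -- of analysis), with computed properties as columns.
    CREATE TABLE mining_input AS
    SELECT customer_id,
           COUNT(*)    AS n_transactions,
           SUM(amount) AS total_amount,
           SUM(CASE WHEN product_group = 'Beverages'
                    THEN amount ELSE 0 END) AS beverages_amount
    FROM joined_input
    GROUP BY customer_id;
""")

mining_rows = conn.execute(
    "SELECT customer_id, n_transactions, total_amount, beverages_amount "
    "FROM mining_input ORDER BY customer_id"
).fetchall()
```

The point of the design is that the mining step sees exactly one row per analyzed entity, so each computed property becomes an ordinary column of the table source.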

Figure 1. Data preparation overview
The figure illustrates the different phases of the data mining process.

