Building a clustering model

Clustering models allow you to categorize records into a certain number of clusters. This can help you identify natural groups in your data.

Clustering models focus on identifying groups of similar records and labeling the records according to the group to which they belong. This is done without the benefit of prior knowledge about the groups and their characteristics. In fact, you may not even know exactly how many groups to look for. This is what distinguishes clustering models from other machine-learning techniques: there is no predefined output or target field for the model to predict. These models are often referred to as unsupervised learning models, since there is no external standard by which to judge the model's classification performance. There are no right or wrong answers for these models. Their value is determined by their ability to capture interesting groupings in the data and provide useful descriptions of those groupings.

Clustering methods are based on measuring distances between records and between clusters. Records are assigned to clusters in a way that tends to minimize the distance between records belonging to the same cluster.
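The distance-based assignment described above can be illustrated with a minimal sketch. It assumes Euclidean distance and a k-means style procedure, which is only one of several possible clustering methods and is not necessarily the algorithm the model builder uses. The function names and sample records are hypothetical.

```python
import math
import random

def distance(a, b):
    """Euclidean distance between two records (tuples of numbers)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(records):
    """Field-by-field mean of a non-empty list of records."""
    n = len(records)
    return tuple(sum(field) / n for field in zip(*records))

def find_clusters(records, k, iterations=20, seed=0):
    """Assign each record to the nearest of k cluster centers (k-means style)."""
    rng = random.Random(seed)
    centers = rng.sample(records, k)  # start from k randomly chosen records
    labels = [0] * len(records)
    for _ in range(iterations):
        # Assignment step: each record joins the cluster whose center is closest.
        labels = [min(range(k), key=lambda c: distance(r, centers[c]))
                  for r in records]
        # Update step: move each center to the mean of its current members.
        for c in range(k):
            members = [r for r, lab in zip(records, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = centroid(members)
    return labels, centers

# Made-up sample data: two visibly separate groups of records.
records = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
           (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centers = find_clusters(records, k=2)
```

In this sketch, records that lie close together receive the same label, and each pass reduces the distance between records and the center of the cluster to which they belong.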

Clustering models are often used to create clusters or segments that are then used as inputs in subsequent analyses. A common example of this is the market segments used by marketers to partition their overall market into homogeneous subgroups. Each segment has special characteristics that affect the success of marketing efforts targeted toward it. If you are using data mining to optimize your marketing strategy, you can usually improve your model significantly by identifying the appropriate segments and using that segment information in your predictive models.
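As a rough illustration of using segment information in a later analysis, the following sketch compares an overall response rate with per-segment rates. The records, the field names (segment, responded), and the segment labels are made up for the example; in practice the segment value would come from a clustering model.

```python
from collections import defaultdict

# Hypothetical customer records. "segment" is assumed to be the cluster
# label assigned by a clustering model; "responded" is a made-up outcome.
customers = [
    {"id": 1, "segment": "A", "responded": 1},
    {"id": 2, "segment": "A", "responded": 1},
    {"id": 3, "segment": "A", "responded": 0},
    {"id": 4, "segment": "B", "responded": 0},
    {"id": 5, "segment": "B", "responded": 0},
    {"id": 6, "segment": "B", "responded": 1},
    {"id": 7, "segment": "B", "responded": 0},
]

def response_rate_by_segment(records):
    """Return the share of responders within each segment."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in records:
        totals[r["segment"]] += 1
        hits[r["segment"]] += r["responded"]
    return {seg: hits[seg] / totals[seg] for seg in totals}

# A single rate for the whole market hides the difference between segments.
overall_rate = sum(r["responded"] for r in customers) / len(customers)
segment_rates = response_rate_by_segment(customers)
```

Here the per-segment rates differ from the overall rate, which is the kind of difference a predictive model can exploit when the segment is supplied as an input field.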

To obtain a clustering model

  1. Specify a data source. This can be any analytical source that contains the records you want to group into clusters.
  2. Specify optional settings as desired. See the topic Optional model settings for more information.
  3. If desired, click the Data Overview icon to see an overview of the data that will be used to build the current model. See the topic Data overview for more information.
  4. Click Find Clusters.
  5. Optionally, you can add manual clusters. See the topic Using manual clusters for more information.
  6. Optionally, use the Evaluate and Test features to see how the model performs on your sample data.
  7. Save the model before closing the model builder or returning to the application.
  8. Click Use Model, and select the model field you want to use. For example, if you want to use the cluster assigned by the model as input to a rule, select the field that contains the cluster assignments.