The Clusters View

In the Clusters view, you can build and explore cluster results found in your text data. Clusters are groupings of concepts generated by clustering algorithms based on how often concepts occur and how often they appear together. The goal of clusters is to group concepts that co-occur, whereas the goal of categories is to group documents or records based on how the text they contain matches the descriptors (concepts, rules, patterns) for each category.

The more often the concepts within a cluster occur together and the less frequently they occur with other concepts, the better the cluster is at identifying interesting concept relationships. Two concepts co-occur when they both appear (or one of their synonyms or terms appears) in the same document or record. See the topic Analyzing clusters for more information.
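The definition of co-occurrence above can be illustrated with a minimal sketch. This is not the product's clustering algorithm; it only counts, for each pair of concepts, how many records contain both. The function name `cooccurrence_counts` and the sample records are hypothetical, and synonyms are assumed to have already been resolved to their concepts.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(records):
    """Count, for each unordered concept pair, the records that contain both."""
    counts = Counter()
    for concepts in records:
        # A sorted set counts each pair once per record, in a stable order.
        for pair in combinations(sorted(set(concepts)), 2):
            counts[pair] += 1
    return counts


# Hypothetical sample: each record is the set of concepts extracted from it.
records = [
    {"price", "service", "staff"},
    {"price", "service"},
    {"staff", "room"},
]
counts = cooccurrence_counts(records)
```

In this sample, "price" and "service" appear together in two records, which makes them a stronger candidate pair for a cluster than pairs that share only one record.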

You can build clusters and explore them in a set of charts and graphs that could help you uncover relationships among concepts that would otherwise be too time-consuming to find. While you cannot add entire clusters to your categories, you can add the concepts in a cluster to a category through the Cluster Definitions dialog box. See the topic Cluster Definitions for more information.

You can make changes to the settings for clustering to influence the results. See the topic Building Clusters for more information.

Figure 1. Clusters view

The Clusters view is organized into three panes, each of which can be hidden or shown by selecting its name from the View menu. Typically, only the Clusters pane and the Visualization pane are visible.

Clusters Pane

Located on the left side, this pane presents the clusters that were discovered in the text data. You can create clustering results by clicking the Build button. Clusters are formed by a clustering algorithm, which attempts to identify concepts that occur together frequently.

Whenever a new extraction takes place, the cluster results are cleared, and you have to rebuild the clusters to get the latest results. When building the clusters, you can change some settings, such as the maximum number of clusters to create, the maximum number of concepts a cluster can contain, or the maximum number of links with external concepts a cluster can have. See the topic Exploring Clusters for more information.

Visualization Pane

Located in the upper right corner, this pane offers two perspectives on clustering: a Concept Web graph and a Cluster Web graph. If this pane is not visible, you can open it from the View menu (View > Visualization). Depending on what is selected in the Clusters pane, you can view the corresponding interactions between or within clusters. The results are presented in two formats:

  • Concept Web. Web graph showing all of the concepts within the selected cluster(s), as well as linked concepts outside the cluster.
  • Cluster Web. Web graph showing the links from the selected cluster(s) to other clusters, as well as any links between those other clusters.
Note: To display a Cluster Web graph, you must have already built clusters with external links. External links are links between concept pairs in separate clusters (one concept inside a cluster and the other concept in a different cluster). See the topic Cluster Graphs for more information.
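The distinction between links inside a cluster and external links can be sketched as follows. This is an illustrative model only, not the product's internal representation; the names `split_links`, `cluster_links`, and `cluster_of`, as well as the sample data, are assumptions made for the example.

```python
def split_links(links, cluster_of):
    """Separate concept links into internal (same cluster) and external ones."""
    internal, external = [], []
    for a, b in links:
        if cluster_of[a] == cluster_of[b]:
            internal.append((a, b))
        else:
            external.append((a, b))
    return internal, external


def cluster_links(external, cluster_of):
    """Derive the cluster-to-cluster links implied by external concept links."""
    return {frozenset((cluster_of[a], cluster_of[b])) for a, b in external}


# Hypothetical cluster assignment and concept links.
cluster_of = {"price": "A", "service": "A", "staff": "B", "room": "B"}
links = [("price", "service"), ("service", "staff"), ("staff", "room")]

internal, external = split_links(links, cluster_of)
between_clusters = cluster_links(external, cluster_of)
```

With no external links, `between_clusters` would be empty, which corresponds to the case in which there is nothing to draw in a Cluster Web graph.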

Data Pane

The Data pane is located in the lower right corner and is hidden by default. You cannot display any Data pane results from the Clusters pane since these clusters span multiple documents/records, making the data results uninteresting. However, you can see the data corresponding to a selection within the Cluster Definitions dialog box. Depending on what is selected in that dialog box, only the corresponding text appears in the Data pane. Once you make a selection, click the Display & button to populate the Data pane with the documents or records that contain all of the concepts together.

The corresponding documents or records show the concepts highlighted in color to help you easily identify them in the text. You can also hover your mouse over a color-coded item to display the concept under which it was extracted and the type to which it was assigned. The Data pane can contain multiple columns, but the text field column is always shown. It carries the name of the text field that was used during extraction or a document name if the text data is in many different files. Other columns are available. See the topic The Data Pane for more information.
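The way the Data pane is populated with the documents or records that contain all of the selected concepts together can be pictured as a simple filter. The sketch below is a hypothetical illustration, not the product's implementation; `records_with_all` and the sample records are made-up names and data.

```python
def records_with_all(records, selected):
    """Return the indexes of records that contain every selected concept."""
    selected = set(selected)
    # A record qualifies only if the selected concepts are a subset of its concepts.
    return [i for i, concepts in enumerate(records) if selected <= set(concepts)]


# Hypothetical sample: each record is the set of concepts extracted from it.
sample_records = [
    {"price", "service", "staff"},
    {"price", "service"},
    {"staff", "room"},
]
matches = records_with_all(sample_records, {"price", "service"})
```

Here only the first two records contain both "price" and "service", so only those two would be listed.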