Associating a system text analysis engine with a collection

If custom text analysis engines are associated with the system, you can select one to use with a collection. Users can then specify semantic queries when they search the collection, which can improve the quality and precision of the search results.

About this task

If a system text analysis engine is already associated with this collection, the following actions occur when you associate a different engine:
  • If you select No custom analysis, then all text analysis mappings that you previously defined for the collection are reset. The collection begins using the system default values.
  • If you select the name of a different custom text analysis engine, then all text analysis mappings that you previously defined for the collection are retained. For example, if you change from engine_1 to engine_2, then engine_2 inherits the XML mapping files that you configured for engine_1.
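
The difference between the two cases can be sketched in a few lines of Python. This is an illustrative model only, not the product's implementation: the class CollectionTextAnalysis, its associate method, and the sample mapping entry are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Stands in for the "No custom analysis" choice (hypothetical representation).
NO_CUSTOM_ANALYSIS = None


@dataclass
class CollectionTextAnalysis:
    """Hypothetical model of a collection's text analysis settings."""
    engine: Optional[str] = None
    mappings: Dict[str, str] = field(default_factory=dict)

    def associate(self, new_engine: Optional[str]) -> None:
        if new_engine is NO_CUSTOM_ANALYSIS:
            # Previously defined mappings are reset; system defaults apply.
            self.mappings = {}
        # For any other engine, the existing mappings are kept as they are.
        self.engine = new_engine


settings = CollectionTextAnalysis(
    engine="engine_1",
    mappings={"xml_mapping": "xml_map.xml"},  # sample entry, assumed for illustration
)

settings.associate("engine_2")
retained = dict(settings.mappings)   # mappings carried over to engine_2

settings.associate(NO_CUSTOM_ANALYSIS)
reset = dict(settings.mappings)      # mappings cleared
```

In this sketch, switching from engine_1 to engine_2 leaves the mappings untouched, while selecting No custom analysis empties them.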

Procedure

To associate a system text analysis engine with a collection:

  1. On the Collections view, expand the collection that you want to configure. In the Parse and Index pane, click Configure > Text processing options.
  2. Click Select a system text analysis engine.
  3. On the Select a System Text Analysis Engine for this Collection page, select the name of the engine that you want to use with this collection.
    If no text analysis engines are available, or if you select No custom analysis, then the parser applies default text analysis rules as it annotates documents and prepares them for the index.
  4. If your custom processing engine archive was exported from IBM® Watson™ Explorer Content Analytics Studio, specify the common analysis structure (CAS) view that you want to use for custom text analysis.
    By using a separate CAS view, you can avoid potential conflicts between the IBM Watson Explorer Content Analytics Studio linguistic components in your custom annotator and the linguistic components that are built into the provided annotators.
  5. For custom processing engines, such as multimedia annotators, that need to analyze the original crawled document, specify the name of the CAS view to use for the original document content.
    Ensure that this view name matches the view name that is specified in the multimedia annotator or UIMA descriptor XML file.
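
One way to check the view name before you enter it is to compare it against the names that the descriptor declares. The following Python sketch assumes that the UIMA descriptor lists its views as sofaName elements (for example, under inputSofas in the capabilities section); a multimedia annotator might record the view name elsewhere. The function names, the sample descriptor, and the view name rawDocument are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Set


def declared_view_names(descriptor_xml: str) -> Set[str]:
    """Collect every <sofaName> value in the descriptor, ignoring XML namespaces."""
    root = ET.fromstring(descriptor_xml)
    names = set()
    for element in root.iter():
        # Tags look like "{namespace}sofaName"; keep only the local part.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "sofaName" and element.text:
            names.add(element.text.strip())
    return names


def view_name_matches(configured_view: str, descriptor_xml: str) -> bool:
    """Return True if the configured view name is declared in the descriptor.

    The comparison is exact, so a difference in letter case counts as a mismatch.
    """
    return configured_view in declared_view_names(descriptor_xml)


# Sample descriptor fragment, assumed for illustration only.
SAMPLE_DESCRIPTOR = """<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <analysisEngineMetaData>
    <capabilities>
      <capability>
        <inputSofas>
          <sofaName>rawDocument</sofaName>
        </inputSofas>
      </capability>
    </capabilities>
  </analysisEngineMetaData>
</analysisEngineDescription>"""

matches = view_name_matches("rawDocument", SAMPLE_DESCRIPTOR)
```

Because the sketch compares names exactly, a view name that differs from the descriptor only in capitalization is reported as a mismatch.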