Creating Custom PEAR Files for Use with Lexical Analysis Streams

You can use IBM® Watson™ Explorer Content Analytics Studio, which is included in the Analytical Components of Watson Explorer, to create additional custom PEAR files that Lexical Analysis language streams can use when indexing content.

About this task

These instructions walk you through creating a PEAR file for use in a Lexical Analysis language stream. The PEAR file uses a built-in dictionary. To use a custom dictionary instead, see Creating a Custom PEAR File with a Custom Dictionary.

Note: PEAR files for 17 languages are already included with the Watson Explorer Foundational Components. See Lexical Analysis Streams for the list and additional information.

See UIMA Software Development Kit for additional information about creating PEAR files.

Procedure

  1. Enable PEAR export in Content Analytics Studio
    1. From the main menu, select Window > Preferences.
    2. In the Preferences tree view, select General > Capabilities.
    3. Click the Advanced button on the Capabilities pane.
    4. In the Advanced Capabilities Settings dialog, under Miscellaneous, select the Export ICA Studio UIMA Pipeline as UIMA Pear check box.
    5. Click OK to close the Advanced Capabilities Settings dialog, then click OK again to close the Preferences window.
  2. Create a new Content Analytics Studio project
    1. From the main menu, select File > New > ICA Studio Project. Name your project.
    2. Enter the Default UIMA Type prefix. This value is used as the Java package name for some of the Java artifacts in the PEAR file. Any valid Java package name works, but avoid names that begin with com.ibm or org.apache.
    3. Click Finish to create the project.
    4. On the Studio Explorer tab, navigate to your project name and open the Configuration folder. Right-click the Annotators folder and select New > UIMA Pipeline Configuration.
    5. Enter a file name (typically projectname.annoconfig) and click Finish.
    6. On the Studio Explorer tab, double-click the Configuration > Annotators > projectname.annoconfig file. The pipeline configuration opens and shows four UIMA pipeline stages. Two of these stages, Lexical Analysis and Parsing Rules, display an error because they are not yet configured.
    7. Delete the Parsing Rules stage. It is not used by Watson Explorer Engine.
    8. Select the Document Language stage. Select the Manually specify the document language radio button, then choose the language from the drop-down list.
    9. Select the Lexical Analysis stage. Choose the same language from the list and click the Built In button.
    10. Save the changes you made to the UIMA pipeline configuration.
  3. Export the PEAR file
    1. On the Studio Explorer tab, right-click Configuration > Annotators > projectname.annoconfig and select Export.
    2. In the Export dialog, under ICA Studio, select UIMA Pipeline as UIMA PEAR from the list. Click Next.
    3. Choose a folder and a name for your file and click Save.
    4. Click Finish. You do not need to specify index fields or facets; Watson Explorer Engine does not use this information.
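Before deploying the exported file, you may want a quick sanity check that it is a well-formed PEAR package. In the UIMA packaging format, a PEAR file is a ZIP archive whose installation descriptor is stored at metadata/install.xml. The following Java sketch is hypothetical (PearCheck is not part of Watson Explorer or UIMA); it only confirms that the file opens as a ZIP archive and contains that descriptor. It does not validate the descriptor contents or the pipeline itself.

```java
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: a minimal structural check of an exported PEAR file.
public class PearCheck {
    // Location of the installation descriptor inside a PEAR archive.
    static final String INSTALL_DESCRIPTOR = "metadata/install.xml";

    /** Returns true if the file opens as a ZIP archive and contains metadata/install.xml. */
    public static boolean looksLikePear(String path) {
        try (ZipFile zip = new ZipFile(path)) {
            ZipEntry entry = zip.getEntry(INSTALL_DESCRIPTOR);
            return entry != null && !entry.isDirectory();
        } catch (IOException e) {
            // Missing file, unreadable file, or a file that is not a ZIP archive.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java PearCheck <file.pear>");
            System.exit(2);
        }
        boolean ok = looksLikePear(args[0]);
        System.out.println(ok ? "install descriptor found"
                              : "not a PEAR package (no metadata/install.xml)");
        System.exit(ok ? 0 : 1);
    }
}
```

For example, java PearCheck projectname.pear exits with status 0 only when the descriptor is present, so the check can be scripted before copying the file to the Watson Explorer Engine server.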
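The Default UIMA Type prefix entered when creating the project must be a Java package name that does not begin with com.ibm or org.apache. The following Java sketch shows one way to check a candidate value before typing it into the wizard. TypePrefixCheck is hypothetical and not part of Content Analytics Studio. It relies on the standard javax.lang.model.SourceVersion.isName method, which accepts a syntactically valid qualified name whose parts are identifiers and not keywords. Matching the reserved prefixes on whole package segments (so com.ibmx is allowed) is an assumption of this sketch.

```java
import javax.lang.model.SourceVersion;

// Hypothetical helper: checks a candidate Default UIMA Type prefix.
public class TypePrefixCheck {
    // Prefixes the procedure says to avoid.
    private static final String[] RESERVED = {"com.ibm", "org.apache"};

    /** Returns true if prefix is a valid Java package name outside the reserved prefixes. */
    public static boolean isAcceptablePrefix(String prefix) {
        // isName rejects empty strings, bad identifiers, and keywords such as "class".
        if (prefix == null || !SourceVersion.isName(prefix)) {
            return false;
        }
        for (String reserved : RESERVED) {
            // Assumption: compare whole package segments, not raw string prefixes.
            if (prefix.equals(reserved) || prefix.startsWith(reserved + ".")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] candidates = {"com.example.lexical", "com.ibm.custom", "my.class.types"};
        for (String candidate : candidates) {
            System.out.println(candidate + " -> " + isAcceptablePrefix(candidate));
        }
    }
}
```

Here com.example.lexical is accepted, com.ibm.custom is rejected because of the reserved prefix, and my.class.types is rejected because class is a Java keyword.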