Defining field types and enrichments

The data in your documents is surfaced as the values for the fields. You can specify details about the fields that facilitate and improve the extraction and storage of your data.

About this task

Fields in documents can contain several data types:
  • String
  • Decimal
  • Numeric
  • Boolean
  • Date

In addition, some fields are composite fields, where the value of the field is created from multiple pieces of data. An address is an example of a composite field because it can include values for street name, city, postal code, and so on.

The default model for document types provides a number of field types that you can use if they apply to your specific document types. These field types are available as part of the System library. In addition, the samples that you use to train the model add field types to your project. Your field types are available in your <Project name> library.

The Enrich tab lets you add new field types to your project library. You can also set up field enrichments to extract the correct data and reformat extracted values that might differ between documents. In the Value settings, you can add the following enrichments to your field types:
Extractors
You can use a trained extractor, a Natural Language Extractor, or regular expressions. The trained extractor is the default extractor that comes from the machine learning model. You can add extractors to the process, either by specifying an included Natural Language Extractor, or by building your own extractor by using regular expressions.
Formatters
Add a formatter to reformat or clean up your data. For example, a formatter can remove spaces, add decimals, remove characters, and so on. The system provides a set of default formatters, but you can also define your own formatters by specifying a formatter string and an optional delimiter.
Converters
Add a converter so that variations of a field value produce a standardized output. For example, a converter can change all date formats to a single date format, or all currencies to a single currency.
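
To make the three enrichment kinds concrete, the following Python sketch mimics what each one does to a value. It is an illustration only, not the product's API: the function names (`extract`, `remove_spaces`, `map_convert`), the sample text, and the regular expression are all assumptions made for this example.

```python
import re
from typing import Optional

# Hypothetical stand-ins for the three enrichment kinds; not the product's API.

def extract(pattern: str, text: str) -> Optional[str]:
    """Extractor: return the first substring that matches the regular expression."""
    match = re.search(pattern, text)
    return match.group(0) if match else None

def remove_spaces(value: str) -> str:
    """Formatter: clean up an extracted value by removing spaces."""
    return value.replace(" ", "")

def map_convert(value: str, mapping: dict) -> str:
    """Converter: map known variations to one standardized output.

    Values that are not in the mapping are returned unchanged.
    """
    return mapping.get(value, value)

# Assumed sample input: an amount written with a space as thousands separator.
raw_text = "Total due: 1 250.00 EUR"
amount = extract(r"\d[\d ]*\.\d{2}", raw_text)
clean_amount = remove_spaces(amount) if amount is not None else None

# Assumed currency variations that should all become the same code.
currency_map = {"Euro": "EUR", "euro": "EUR", "€": "EUR"}
currency = map_convert("Euro", currency_map)

print(amount, clean_amount, currency)
```

In this sketch the extractor locates the value, the formatter cleans it, and the converter standardizes a related value, which mirrors the order in which the enrichments are described above.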

In addition to the Value settings, you can add validators to set specific thresholds and requirements for your extracted data to help improve the model's accuracy. Use validators to define criteria for your values that can indicate when a document contains an invalid value that must be checked and resolved by your document processing user.
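
As a rough illustration of the validator idea, the following Python sketch checks a value against a criterion and returns an error message when the value is invalid. The function name `validate`, the five-digit postal code rule, and the message text are hypothetical and chosen only for this example; the product's own validators are configured in the user interface, not through code like this.

```python
import re
from typing import Optional

def validate(value: str, pattern: str, error_message: str) -> Optional[str]:
    """Return None when the whole value matches the pattern, else the error message.

    Hypothetical helper for illustration; not part of the product.
    """
    if re.fullmatch(pattern, value):
        return None
    return error_message

# Assumed criterion: a postal code of exactly five digits.
POSTAL_CODE_PATTERN = r"\d{5}"
POSTAL_CODE_ERROR = "Postal code must be exactly five digits."

print(validate("90210", POSTAL_CODE_PATTERN, POSTAL_CODE_ERROR))
print(validate("9021A", POSTAL_CODE_PATTERN, POSTAL_CODE_ERROR))
```

A value that fails the check is the kind of value that a document processing user would then be asked to review and resolve.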

Procedure

To define field types and enrichments:

  1. From the project overview page, click the Enrich tab.
  2. Select the Field types and enrichments tab.
  3. Select the <Project name> library, then click Create.
  4. Enter a display name for the new field type. You can set this field in any language, using any Unicode character.
  5. Select a value type from the drop-down list.
  6. Optional: Provide a description for the field type.
    This can be useful if there is a specific and recurring situation that this field applies to, for example, a country-specific postal code.
  7. Click Create to add the field type to your library.
  8. Specify the General settings for the field type:
    1. Provide a display name and a symbolic name. The display name can be set in any language, but the symbolic name can contain only alphanumeric characters, with no spaces.
    2. Specify whether the field is required.
      If a field is required, any document that does not contain a value for that field is invalid.
    3. Specify whether the field value is sensitive.
      A sensitive value contains personal or private information.
    4. Optionally, specify a description to give more information about this field type.
    5. Provide any other names that this field type might have in the Alias section.
    6. Click Next.
  9. Specify Value settings for the field type:
    1. Confirm your Value type setting.
    2. Optional: Under Value format, click Edit to add extractors, formatters, and converters for the field value.
      Setup by value format:
      Extractors
      Create an extractor:
      1. On the Extractors tab, click Add extractor, then select Create new.
      2. Enter values for the Name and optional Description for your extractor.
      3. In the Regular expression field, add the regular expression for the format that you want.

        For more information on regular expressions, see RegExr.

      Use an existing extractor:
      1. On the Extractors tab, click Add, then select Select existing.
      2. From the available list, choose one of the extractors and specify the relevant parameters.
      Formatters
      Create a formatter:
      1. On the Formatters tab, click Add formatter, then select Create new.
      2. From the list of types, select the type of formatter that you want to create.
      3. Enter values for the Name, optional Description, and any other parameters that are required for the type of formatter you selected.
      Use an existing formatter:
      1. On the Formatters tab, click Add, then select Select existing.
      2. From the list, select one of the available formatters and specify the additional parameters. For more information, see the full list of Formatters.
      Converters
      Create a converter:
      1. On the Converters tab, click Add converter, then select Create new.
      2. Select the type of the converter from the drop-down list.
      3. Enter values for the Name and optional Description for your converter.
      4. Specify the information for your converter.

        For example, if your converter type is Map, you use the Add control to add pairs of Original values and Converted values.

      Use an existing converter:
      1. On the Converters tab, click Add, then select Select existing.
      2. From the list, select one of the available converters and specify the additional parameters. For more information, see the full list of Converters.
    3. Optional: Test your enrichments.
      1. From Value Settings > Value format, click Edit.
      2. Select the Extractors, Formatters, or Converters tab, and enter a value in the testing panel.
      3. Click Test. For formatters and converters, the returned results show how the input field value is modified by the enrichment. For extractors, the returned result always shows the same data that was entered, but with the extracted value highlighted. If nothing is highlighted, then the extractor did not extract anything for this field.
        Note: You can test all formatters at once, and all converters at once, but you must test one extractor at a time.
    4. Add validators for your field type.

      Set specific thresholds and requirements for your extracted data to help improve the model's accuracy. You can use validators to surface error messages in your document processing applications based on your extraction needs.

      Some validators are automatically provided by the model (see the full list in Validators). You can also create additional validators.

    5. Optional: Test your validator.
      1. In Value validators, select your validator and enter a value in the testing panel.
      2. Click Test. If your test string matches the validator, the test is successful. If your test string does not match, the results show the error message that you defined.
        Note: You cannot test required value validators and low confidence validators. You must test one validator at a time.
  10. When you have added all the value settings that you need, click Create.
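
Before you enter a regular expression for a new extractor, it can help to try it locally. The following Python sketch imitates the extractor test behavior described in the procedure: it returns the input unchanged except that the extracted value is marked, and returns the input as-is when nothing is extracted. The function name `highlight_extraction` and the `[[ ]]` markers are assumptions for this illustration, and Python's `re` syntax might differ in details from the regular expression dialect that the product accepts.

```python
import re

def highlight_extraction(pattern: str, text: str) -> str:
    """Return the text with the first match wrapped in [[ ]] markers.

    If the pattern matches nothing, the text is returned unchanged,
    which corresponds to a test result with nothing highlighted.
    Hypothetical helper for illustration; not part of the product.
    """
    match = re.search(pattern, text)
    if match is None:
        return text
    start, end = match.span()
    return f"{text[:start]}[[{text[start:end]}]]{text[end:]}"

# Assumed pattern: a date in YYYY-MM-DD form.
DATE_PATTERN = r"\d{4}-\d{2}-\d{2}"

print(highlight_extraction(DATE_PATTERN, "Invoice date: 2024-03-15 (net 30)"))
print(highlight_extraction(DATE_PATTERN, "Invoice date: March 15"))
```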