Adding fields

Define fields that you want to extract from your document types.

About this task

When you create a new document type in your project, there are initially no fields. The Add fields tab is where you define the fields that you want to extract. Document Processing also provides three example, out-of-the-box document types: Bill of Lading, Invoice, and Utility Bill. Relevant fields are already defined for these document types, which you can modify as you need.

Procedure

  1. To identify the fields that you need, click View a sample to see sample documents for your document type. It is a good practice to create a list of all the fields that you need.
  2. After you identify a field that you want to add, close the sample and click Add fields to define the new field.
  3. Enter a display name for the field by using any Unicode characters from any language. The name can contain spaces.
  4. Enter a symbolic name for the field. The symbolic name identifies the field in code and can contain only alphanumeric characters, with no spaces. You cannot change this symbolic name.
  5. Select a value format. This reflects how the field is displayed in your sample document. You can select among the following formats:
    • Text: This is the default format, and most fields use it. Names, dates, addresses, numbers, and currency amounts are all text, as are composite fields.
    • Barcode: The data is represented as a barcode or QR code that the system can extract data from.
    • Table: The data is rendered in a table with columns and rows. A table can also have summary fields and additional fields.
    • Checkbox: The data is rendered as a graphic box that is checked or unchecked. A checkbox is always a Boolean field type.
    • Signature presence: Indicates a Boolean field that can contain a signature. If a signature is present, the value is true. If the field is empty, the value is false.
    Note: A field can have only one value format, with one exception: a field can support both the text and barcode formats. In that case, the data might be rendered as text in some samples and as a barcode in others.
  6. Select the field type.
    The field type defines the data type that is used to hold the field while the document is processed. Multiple field types are provided by Document Processing, but you can also define your own custom field types. Depending on the value format that you selected, the available field types in the drop-down list might be restricted.
    • Text and barcode value formats can be mapped to any field type except for table.
    • A table value format must be mapped to a table field type.
    • Checkbox and Signature presence can be mapped only to the sys:Boolean field type, or a custom field type that inherits from sys:Boolean.
  7. Enter alternative names, or aliases, for your field. In a document, the same field can be identified by different names, case, or phrasing, for example Purchase order number, PO number, and PO#. In the Alias section, you can add any alternative name that might come up for your field.

    You can also use aliases to identify ambiguous fields and values that might relate to different contexts, for example if there are different phone numbers in your document. In that case, you can state the context and the field by following this format:

    Context(exact words in the documents)||ambiguous field name

    Figure: Document example with ambiguous fields (screenshot of a document with ambiguous fields in different contexts)

    In this screenshot, you can define some of the ambiguous fields as follows.

    Shipper phone number (409-299-3466):
    • Field name: Shipper Phone
    • Alias: SHIPPER (complete name & address)||Phone
    Notify party phone number (084-629-1414):
    • Field name: Notify Party Phone
    • Alias: NOTIFY PARTY (complete name & address)||Phone
  8. Select whether that field is required.

    If you select the Yes, this field is required option, an error occurs at processing time when the value for that field is missing.

  9. Select whether the value contains sensitive information.

    If you select the Yes, this field is sensitive option, the field is marked so that your applications can handle it differently. Within Document Processing, the setting does not change how the data is handled.

  10. Select Link to table data if you want to link this field to table summary data, then select the existing summary field from the drop-down list. The field data can sometimes be displayed inside or outside of the table, or in both places. If you select this option, the extraction model considers values found inside or outside the table as potential matches to this field.
  11. Click Add field to create the field.
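
The rules in steps 4 through 7 can be sketched in code. The following Python example is only an illustration and is not part of Document Processing: the FieldDefinition class, the validate and parse_alias functions, the lowercase format names, the "table" placeholder for a table field type, and the PhoneType custom type are all hypothetical. It also assumes that alphanumeric means ASCII letters and digits, and it does not model custom field types that inherit from sys:Boolean.

```python
import re
from dataclasses import dataclass, field

# Hypothetical identifiers for the five value formats listed in step 5.
VALUE_FORMATS = {"text", "barcode", "table", "checkbox", "signature_presence"}
# Assumption: "alphanumeric" means ASCII letters and digits only.
SYMBOLIC_NAME = re.compile(r"[A-Za-z0-9]+")


@dataclass
class FieldDefinition:
    display_name: str      # any Unicode characters, spaces allowed
    symbolic_name: str     # alphanumeric, no spaces
    value_formats: set     # one format, or both "text" and "barcode"
    field_type: str        # "table" stands in for a table field type
    aliases: list = field(default_factory=list)
    required: bool = False
    sensitive: bool = False


def validate(defn):
    """Return a list of rule violations; an empty list means the definition is consistent."""
    errors = []
    if not SYMBOLIC_NAME.fullmatch(defn.symbolic_name):
        errors.append("symbolic name must be alphanumeric with no spaces")

    formats = set(defn.value_formats)
    if not formats or not formats <= VALUE_FORMATS:
        errors.append("missing or unknown value format")
    elif len(formats) > 1 and formats != {"text", "barcode"}:
        errors.append("only text and barcode can be combined")
    elif formats <= {"text", "barcode"} and defn.field_type == "table":
        errors.append("text and barcode cannot map to a table field type")
    elif formats == {"table"} and defn.field_type != "table":
        errors.append("a table format must map to a table field type")
    elif formats <= {"checkbox", "signature_presence"} and defn.field_type != "sys:Boolean":
        # Simplification: custom types that inherit from sys:Boolean are not modeled.
        errors.append("checkbox and signature presence must map to sys:Boolean")
    return errors


def parse_alias(alias):
    """Split 'Context||field name' into (context, name); context is None for a plain alias."""
    context, separator, name = alias.partition("||")
    if separator:
        return context.strip(), name.strip()
    return None, alias.strip()


shipper_phone = FieldDefinition(
    display_name="Shipper Phone",
    symbolic_name="ShipperPhone",
    value_formats={"text"},
    field_type="PhoneType",  # hypothetical custom field type
    aliases=["SHIPPER (complete name & address)||Phone"],
)
```

In this sketch, validate(shipper_phone) returns an empty list, and parse_alias applied to the alias separates the context, SHIPPER (complete name & address), from the ambiguous field name, Phone.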

What to do next

Repeat this procedure to add all the fields that you need for your document type. You can edit those fields later, add more fields, or delete fields.

Table fields and composite fields require special attention. For more information, see the following topics about these specific field types.

Enrichments are optional and can be added now or later. It is often better to add them later, after you have seen your extraction results and decided which fields need formatting and what criteria you want to use to validate the extracted data. For more information, see the following topic about adding enrichments to a field.

The next step is annotating sample documents. Select Teach the model or click Next to annotate sample documents.