Define fields that you want to extract from your document types.
About this task
When you create a new document type in your project, there are initially no fields. The
Add fields tab is where you define the fields that you want to extract.
Document Processing also provides
three example, out-of-the-box document types: Bill of Lading, Invoice, and Utility Bill. Relevant
fields are already defined for these document types, which you can modify as you need.
Procedure
- To identify the fields that you need, click View a sample to see sample
documents for your document type. It is a good practice to create a list of all the fields that you
need.
- After you identify a field that you want to add, close the sample and click
Add fields to define the new field.
- Enter a display name for the field. You can use any Unicode characters from any language, and
the name can contain spaces.
- Enter a symbolic name for the field. The symbolic name identifies the field in the code and
can contain only alphanumeric characters, with no spaces. You cannot change this symbolic name.
- Select a value format. This reflects how the field is displayed in your sample document.
You can select among the following formats:
- Text: This is the default format, and most fields are of type text.
Names, dates, addresses, numbers, and currency amounts are all types of text, as are composite
fields.
- Barcode: The data is represented as a barcode or QR code that the system
can extract data from.
- Comb of characters: The data is represented as a comb field, where each
character is divided by a vertical line. This is typically used to fill in information in forms,
where the expected value has a restricted number of characters.
- Signature: Indicates a Boolean field that can contain a signature. If a
signature is present, the value is true. If the field is empty, the value is false.
- Checkbox: The data is rendered as a graphic box that is checked or
unchecked. A checkbox is always a Boolean field type.
Note: A field can have only one value format, with one exception: the text, barcode, and comb
formats can be combined on the same field, in any combination. In that case, your data might be
rendered as text in some samples and as a barcode or a comb field in other samples.
- Select the field type.
The field type defines the data type that is used to hold the field while the document is
processed. Multiple field types are provided by
Document Processing, but you can also define
your own custom field types. Depending on the value format
that you selected, the available field types in the drop-down list might be restricted.
- Text, barcode, and comb value formats can be mapped to any field type except for table and
Boolean.
- A table value format must be mapped to a table field type.
- Checkbox and Signature can be mapped only to the sys:Boolean field type, or a
custom field type that inherits from sys:Boolean.
- Enter alternative names, or aliases, for your field. In a document, the same field can be
identified by different names, capitalization, or phrasing, for example Purchase order
number, PO number, and PO#. In the
Alias section, you can add any alternative name that might come up for your
field.
You can also use aliases to identify ambiguous fields and values that might relate to different
contexts, for example if there are different phone numbers in your document. In that case you can
state the context and the field by following this format:
Context(exact words in the documents)||ambiguous field name
Document example: ambiguous fields
In the document example, you can define some of the ambiguous fields as follows.
Shipper phone number (409-299-3466):
- Field name: Shipper Phone
- Alias: SHIPPER (complete name & address)||Phone
Notify party phone number (084-629-1414):
- Field name: Notify Party Phone
- Alias: NOTIFY PARTY (complete name & address)||Phone
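To make the alias format concrete, the following Python sketch splits an alias of the form Context||field name into its two parts. The helper name parse_alias and its return shape are hypothetical and only illustrate the string format; they are not part of Document Processing.

```python
from typing import Optional, Tuple

# Separator between the context and the ambiguous field name, per the alias format.
SEPARATOR = "||"


def parse_alias(alias: str) -> Tuple[Optional[str], str]:
    """Split an alias into (context, field_name).

    Hypothetical helper: an alias without the separator has no context.
    """
    context, sep, name = alias.partition(SEPARATOR)
    if not sep:
        # No "||" present, so the whole string is a simple alias.
        return None, alias.strip()
    return context.strip(), name.strip()


# A context-qualified alias and a plain alias.
print(parse_alias("SHIPPER (complete name & address)||Phone"))
print(parse_alias("PO number"))
```

The context part must match the exact words in the document, so the sketch preserves it verbatim apart from trimming surrounding whitespace.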
- Select whether that field is required.
If you select the Yes, this field is required option, an error occurs at processing time when
the value for that field is missing.
- Select whether the value contains sensitive information.
If you select the Yes, this field is sensitive option, the field is marked so that
your applications can handle it differently. Within Document Processing, the setting does not
change how the data is handled.
- Select Link to table data if you want to link this field to table
summary data, then select the existing summary field from the drop-down list. The field data can
sometimes be displayed inside or outside of the table, or in both places. If you select this option,
the extraction model considers values found inside or outside the table as potential matches to this
field.
- Click Add field to create the field.
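The rules in this procedure can be summarized in a small Python sketch that checks a field definition before you enter it in the Add fields tab. Everything here is illustrative: the FieldDefinition class, the validate function, the lowercase format labels, the "table" type label, and the MyPhoneType custom type are assumptions made for this example and are not a Document Processing API. Only sys:Boolean is a name taken from the procedure. The sketch also assumes that "alphanumeric" means ASCII letters and digits.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set

# Formats that can be combined on one field, per the note in the procedure.
COMBINABLE_FORMATS = {"text", "barcode", "comb"}
# Formats that always produce a Boolean value.
BOOLEAN_FORMATS = {"checkbox", "signature"}
# Assumption: "alphanumeric" means ASCII letters and digits only.
SYMBOLIC_NAME = re.compile(r"[A-Za-z0-9]+")
# "table" is a placeholder label for the table field type, not a confirmed identifier.
EXCLUDED_FOR_TEXT_LIKE = {"sys:Boolean", "table"}


@dataclass
class FieldDefinition:
    display_name: str            # any Unicode characters, spaces allowed
    symbolic_name: str           # alphanumeric only, no spaces
    value_formats: Set[str]
    field_type: str
    aliases: List[str] = field(default_factory=list)
    required: bool = False
    sensitive: bool = False


def validate(defn: FieldDefinition) -> List[str]:
    """Return the rule violations found; an empty list means none were found."""
    problems: List[str] = []
    if not SYMBOLIC_NAME.fullmatch(defn.symbolic_name):
        problems.append("symbolic name must be alphanumeric with no spaces")
    if not defn.value_formats:
        problems.append("select a value format")
    elif len(defn.value_formats) > 1 and not defn.value_formats <= COMBINABLE_FORMATS:
        problems.append("only text, barcode, and comb can be combined")
    # Simplification: custom types that inherit from sys:Boolean are not modeled.
    if defn.value_formats & BOOLEAN_FORMATS and defn.field_type != "sys:Boolean":
        problems.append("checkbox and signature require sys:Boolean")
    if defn.value_formats & COMBINABLE_FORMATS and defn.field_type in EXCLUDED_FOR_TEXT_LIKE:
        problems.append("text, barcode, and comb cannot map to table or Boolean")
    return problems


shipper_phone = FieldDefinition(
    display_name="Shipper Phone",
    symbolic_name="ShipperPhone",
    value_formats={"text"},
    field_type="MyPhoneType",  # hypothetical custom field type
    aliases=["SHIPPER (complete name & address)||Phone"],
)
print(validate(shipper_phone))
```

A check like this is most useful when you prepare the list of fields in advance, because the symbolic name cannot be changed after you enter it.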
What to do next
Repeat this procedure to add all the fields that you need for your document type. You can edit
those fields later, add more fields, or delete fields.
Table fields and composite fields require special attention. For more information, see the
following topics about these specific field types.
Enrichments are optional and can be added now or later. It is often better to add them later,
after you have seen your extraction results and decided which fields need formatting and what
criteria you want to use to validate the extracted data. For more information, see the following
topic about adding enrichments to a field.
The next step is annotating sample documents. Select Teach the model or
click Next to annotate sample documents.