Annotating sample documents

In the Teach the model tab, you annotate all of your sample documents to teach the system where to find fields on each document.

Initially, this tab lists all the samples for the selected document type, along with the status of each one. Each sample document starts out as Not ready. You must annotate each sample document to change its status to Ready for training. Sample documents in a Not ready state are not used to train the extraction model.
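
The status rule above can be pictured with a small sketch. It is purely illustrative and is not part of the product: the Sample class, the constants, and the training_set function are hypothetical names chosen for this example.

```python
from dataclasses import dataclass

NOT_READY = "Not ready"
READY = "Ready for training"


@dataclass
class Sample:
    """Hypothetical stand-in for one sample document in the list."""
    name: str
    status: str = NOT_READY  # every sample starts out as Not ready


def training_set(samples):
    """Return only the samples that would be used to train the model."""
    return [s for s in samples if s.status == READY]


samples = [Sample("invoice_01.pdf"), Sample("invoice_02.pdf")]
samples[0].status = READY  # annotated and marked as ready
print([s.name for s in training_set(samples)])
```

In this sketch, the second sample is left out of training until its status changes.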

To start annotating, click Annotate incomplete samples, or open individually the samples that you want to annotate. After you annotate some initial samples, click Reanalyze to have the system reanalyze all remaining samples and automatically capture some fields on those that are similar documents.

On each document, the key-value pairs that the system found are underlined with dashed blue lines. The right-side panel lists the fields that are defined for the current document type. The fields that the system has already found have a green Ready checkmark. If several values are found for a field, you can select the best option in the recommended matches section.

To make it clearer when a sample is ready for training, you can indicate that a field is not present in a sample document by clicking the overflow menu and selecting Mark "Not in document". The omitted field then appears in the list with a "not in document" icon. To revert this state, open the menu again and select Undo "Not in Document".

Depending on your scenario, go through one of the following procedures to annotate the missing fields.

Note: To annotate tables and composite fields, see the next pages in this section: Annotating table fields and Annotating composite fields. These field types have their own specific steps.

To associate a system-discovered key-value pair with a field

Follow these steps when one of the key-value pairs that the system found matches one of your fields.
  1. Select a key-value pair, underlined with dashed blue lines, from the sample image. A purple bounding box is displayed around the field label, while a green bounding box is displayed around the field value. In the menu that opens over the captured field value, you can choose to save the match or cancel.
  2. Verify that the bounding boxes that the system drew are correct, and correct them if necessary.
  3. Verify that the captured values in Field label and Field value are correct, and correct them if necessary.
  4. You can see and edit the aliases for this field. Aliases are alternative names that can help the model identify this field. As you extract data, names that are found in documents are added to this list. Aliases must be unique for each field within a document type.
  5. Select Save match from the menu to save this field.
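
The alias rule in the steps above (aliases must be unique for each field within a document type) can be sketched as a simple check. This is an illustration only, not how the product validates aliases: the function name, the dictionary layout, and the case-insensitive comparison are all assumptions made for this example.

```python
def find_shared_aliases(fields):
    """Map each alias used by more than one field to those field names.

    `fields` maps a field name to its list of aliases. Comparison is
    case-insensitive here, which is an assumption of this sketch.
    """
    owners = {}
    for field_name, aliases in fields.items():
        for alias in aliases:
            owners.setdefault(alias.strip().lower(), set()).add(field_name)
    return {a: sorted(names) for a, names in owners.items() if len(names) > 1}


# Hypothetical fields of one document type; "Inv #" is used twice.
invoice_fields = {
    "Invoice number": ["Invoice no.", "Inv #"],
    "Order number": ["PO number", "Inv #"],
}
print(find_shared_aliases(invoice_fields))
```

An empty result would mean that no alias is shared between two fields of the document type.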

To annotate a field from the field list

Follow these steps when a field does not correspond directly to any of the key-value pairs that the system found.
  1. Select a field from the field list.
  2. Under Field label, click Draw. If your field does not have a label, go to step 4.
  3. Use your mouse to draw a bounding box around the label for this field. Make any necessary corrections to the text captured in the label text box.
  4. Under Field value, click Draw.
  5. Use the mouse to draw a bounding box around the value for this field. Make any necessary corrections to the text captured in the value text box.
  6. You can see and edit the aliases for this field. Aliases are alternative names that can help the model identify this field. As you extract data, names that are found in documents are added to this list. Aliases must be unique for each field within a document type.
  7. Click Save to save this field.

To annotate a field by using keyboard shortcuts

Follow these steps to annotate fields with keyboard shortcuts.
  1. Select the field that you want in the list of fields, then press the letter Q on your keyboard to enter the Draw label mode.
  2. Draw a bounding box around the field label on the sample document image.
  3. Press the letter W on your keyboard to switch to Draw value mode.
  4. Draw a bounding box around the field value on the sample document image.
  5. Press the letter E to save this field and move to the next field in your list.
Note: If your document contains field labels whose corresponding values are missing, you must still annotate the blank values by drawing a bounding box around the area of the document where the value should be. For example:

screenshot of a table with a "sales tax" field label and an annotated empty value
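
The rule in the note can also be expressed as a small check: a field whose label is annotated still needs a value area, even when the value text is blank. The sketch below is hypothetical. The FieldAnnotation record, the pixel-box layout, and the needs_value_box function are assumptions made for illustration and do not describe how the product stores annotations.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# (left, top, right, bottom) in image pixels; a layout assumed for this sketch
Box = Tuple[int, int, int, int]


@dataclass
class FieldAnnotation:
    """Hypothetical record of one annotated field."""
    field: str
    label_box: Optional[Box]
    value_box: Optional[Box]
    value_text: str = ""


def needs_value_box(annotation):
    """True when a label was annotated but no value area was drawn."""
    return annotation.label_box is not None and annotation.value_box is None


# Label present, value blank on the page, value area still drawn.
sales_tax = FieldAnnotation("Sales tax", (40, 300, 120, 318), (400, 300, 480, 318))
print(needs_value_box(sales_tax))
```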

What to do next

After you have annotated all of the fields that can be found in a sample document, select Mark this document as ready for training at the top of the field list. You must select this checkbox; otherwise, the document is not used for training, even if all fields are annotated. You do not need to annotate every field before you select this checkbox.

When you finish annotating all sample documents for all of your document types, click Train model.