Defining field types and enrichments
The data in your documents is surfaced as the values for the fields. You can specify details about the fields that facilitate and improve the extraction and storage of your data.
About this task
Each field has a type. The following basic field types are available:
- String
- Decimal
- Numeric
- Boolean
- Date
In addition, some fields are composite fields, where the value of the field is created from multiple pieces of data. An address is an example of a composite field because it can include values for street name, city, postal code, and so on.
The default model for document types provides a number of field types that you can use if they apply to your specific document types. These field types are available as part of the System library. In addition, the samples that you use to train the model add field types to your project. Your field types are available in your <Project> library.
For each field type, you can configure the following Value settings:
- Extractors
- You can use a trained extractor, a Natural Language Extractor, or regular expressions. The trained extractor is the default extractor and comes from the machine learning model. You can add extractors to the process, either by specifying an included Natural Language Extractor or by building your own extractor with regular expressions.
- Formatters
- Add a formatter to reformat or clean up your data. For example, a formatter can remove spaces, add decimals, remove characters, and so on. The system provides a set of default formatters, but you can also define your own formatters by specifying a formatter string and an optional delimiter.
- Converters
- Add a converter to ensure that variations of a field value produce a standardized output. For example, a converter can change all date formats to a single date format, or all currencies to a single currency.
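To illustrate how these three settings relate to each other, the following Python sketch applies a regular-expression extractor, then a formatter, then a converter to one piece of text. It is a conceptual example only and does not show how the product implements these settings. The pattern, the function names, and the list of accepted date formats are all assumptions made for the illustration.

```python
import re
from datetime import datetime

# Hypothetical pattern for an invoice-date field; not taken from the product.
DATE_PATTERN = re.compile(r"Invoice date:\s*([0-9./\- ]+)")

# Assumed input notations that the converter accepts.
ACCEPTED_FORMATS = ("%d/%m/%Y", "%d.%m.%Y", "%Y-%m-%d")


def extract(text):
    """Extractor: pull the raw value out of the text with a regular expression."""
    match = DATE_PATTERN.search(text)
    return match.group(1) if match else None


def format_value(raw):
    """Formatter: clean up the raw value, here by removing spaces."""
    return raw.replace(" ", "")


def convert(value):
    """Converter: map several date notations to one standard output (ISO 8601)."""
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # no accepted notation matched


def process(text):
    """Run the extractor, the formatter, and the converter in sequence."""
    raw = extract(text)
    if raw is None:
        return None
    return convert(format_value(raw))


print(process("Invoice date: 03 / 04 / 2021"))
print(process("Invoice date: 2021-04-03"))
```

In this sketch, both calls produce the same standardized value even though the input text uses different spacing and date notations.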
In addition to the Value settings, you can add validators to set specific thresholds and requirements for your extracted data to help improve the model's accuracy. Use validators to define criteria for your values that can indicate when a document contains an invalid value that must be checked and resolved by your document processing user.
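As a conceptual illustration of a validator, the following Python sketch checks an extracted amount against a minimum and a maximum threshold and reports whether the value must be reviewed. The function names and the threshold values are hypothetical and are not part of the product.

```python
def validate_amount(value, minimum=0.0, maximum=10000.0):
    """Return a list of problems; an empty list means the value passed."""
    try:
        amount = float(value)
    except (TypeError, ValueError):
        return ["value is not a number"]

    problems = []
    if amount < minimum:
        problems.append(f"value is below the minimum of {minimum}")
    if amount > maximum:
        problems.append(f"value is above the maximum of {maximum}")
    return problems


def needs_review(value):
    """True when the value fails validation and a user must check it."""
    return bool(validate_amount(value))


print(needs_review("125.50"))
print(needs_review("twelve"))
```

A value that fails any of the checks is flagged so that a document processing user can check and resolve it.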
Procedure
To define field types and enrichments: