Date fields and custom date formats
The document date is critical for text analytics, especially for exploring how data changes over time and observing deviations and trends.
To preserve how dates are calculated, Watson Explorer Content Analytics includes a parametric index field named Date that you cannot edit or remove. When you configure the parser for a collection, however, you can specify custom date formats to ensure that date data added to the collection is mapped to this index field and indexed correctly.
- If you configure a crawler, you can map data source fields and metadata fields to the Date index field.
- If you import CSV files to a collection, you can specify the format of date values.
- If you add documents to a collection by using the REST administration API, the API can identify date fields.
- If you map HTML and XML elements to index fields, you can map elements to the Date index field.
- If you configure facets for a collection, you can map the Date index field to facets.
- If you associate a UIMA annotator with a collection, the annotator can produce date values for the date facet.
Date facet
The Date index field is used as the document date in the query results. The value of this field is converted into a date facet in content analytics collections so that it can be used to compare time lines, deviations, and trends. In search collections, users can use the date facet to narrow results.
The date facet consists of the following path components: date, year, month, day, hour.
The levels of the path components cannot be changed.
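To make the fixed hierarchy concrete, the following sketch splits a timestamp, expressed as epoch milliseconds, into the year, month, day, and hour levels that sit under the date root. The class `DateFacetPath` and its `levels` method are hypothetical. This section does not state which time zone the product uses when it computes these levels, so the sketch takes the zone as a parameter and shows that one instant can fall on a different day in different zones.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.Arrays;
import java.util.List;

public class DateFacetPath {
    /**
     * Hypothetical helper: splits a timestamp (epoch milliseconds) into the
     * year, month, day, and hour levels under the "date" root of the facet.
     */
    static List<Integer> levels(long epochMillis, ZoneId zone) {
        ZonedDateTime t = Instant.ofEpochMilli(epochMillis).atZone(zone);
        return Arrays.asList(t.getYear(), t.getMonthValue(), t.getDayOfMonth(), t.getHour());
    }

    public static void main(String[] args) {
        long epochMillis = 1235487600000L;
        // The same instant lands on a different day and hour in another time zone.
        System.out.println(levels(epochMillis, ZoneId.of("UTC")));        // [2009, 2, 24, 15]
        System.out.println(levels(epochMillis, ZoneId.of("Asia/Tokyo"))); // [2009, 2, 25, 0]
    }
}
```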
When the parser detects a date value, it converts the value into epoch time (the number of milliseconds since January 1, 1970, 00:00:00 GMT), such as 1235487600000. The digits in this value represent the number of milliseconds since the epoch date.
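The conversion to and from epoch time can be checked with the standard `java.time` API. The following sketch is an illustration only, not the parser's own code: it decodes the sample value 1235487600000 and converts the same UTC date and time back into epoch milliseconds. The class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class EpochTime {
    // Interprets a value as milliseconds since 1970-01-01T00:00:00 GMT.
    static Instant fromEpochMillis(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis);
    }

    // Converts a date and time, taken to be in UTC, to epoch milliseconds.
    static long toEpochMillis(LocalDateTime utcDateTime) {
        return utcDateTime.toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println(fromEpochMillis(1235487600000L));                      // 2009-02-24T15:00:00Z
        System.out.println(toEpochMillis(LocalDateTime.of(2009, 2, 24, 15, 0))); // 1235487600000
    }
}
```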
In addition to the predefined Date index field, you can configure other fields to be used as date fields. In this case, you must specify that the data source field or metadata field is a parametric index field and you must specify that the field contains date data. When the parser detects parametric date fields, the field value is converted to epoch time.
Date formats detected by default
The parser can automatically detect the following date and time formats, in the order specified in this table. In addition to these formats, you can configure the parser to recognize custom date formats for the content that you include in a collection.
Date format | Sample value |
---|---|
RFC 1123 | Sun, 06 Nov 1994 08:49:37 GMT |
RFC 850 | Sunday, 06-Nov-94 08:49:37 GMT |
asctime | Sun Nov 6 08:49:37 1994 |
ISO 8601 (only the calendar date representation is supported; other ISO 8601 representations are not) | 2004-02-05 |
RFC 1123 without time zone | Sun, 06 Nov 1994 08:49:37 |
RFC 850 without time zone | Sunday, 06-Nov-94 08:49:37 |
Date and time format for the collection's default locale, obtained through the Java DateFormat.getDateInstance() method | Varies by locale |
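The ordered detection described by the table can be sketched with `java.text.SimpleDateFormat`: each format is tried in turn, and the first one that parses the entire value wins. The patterns below are assumptions that approximate the listed formats; they are not the parser's internal patterns. The sketch also assumes that values without a time zone are interpreted in the collection time zone, and it uses strict (non-lenient) parsing and rejects partial matches so that, for example, the RFC 1123 pattern without a time zone cannot silently accept a value that ends in "GMT".

```java
import java.text.DateFormat;
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DefaultDateFormats {
    // Assumed SimpleDateFormat patterns for the default formats, in table order.
    private static final String[] PATTERNS = {
        "EEE, dd MMM yyyy HH:mm:ss zzz", // RFC 1123
        "EEEE, dd-MMM-yy HH:mm:ss zzz",  // RFC 850
        "EEE MMM d HH:mm:ss yyyy",       // asctime
        "yyyy-MM-dd",                    // ISO 8601 calendar date
        "EEE, dd MMM yyyy HH:mm:ss",     // RFC 1123 without time zone
        "EEEE, dd-MMM-yy HH:mm:ss",      // RFC 850 without time zone
    };

    /**
     * Returns epoch milliseconds from the first format that parses the entire value,
     * or null if no format matches. Values without a time zone use collectionZone.
     */
    static Long parse(String value, Locale collectionLocale, TimeZone collectionZone) {
        for (String pattern : PATTERNS) {
            // English day and month names, as in the HTTP-style samples.
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
            format.setTimeZone(collectionZone);
            Long millis = tryParse(format, value);
            if (millis != null) {
                return millis;
            }
        }
        // Final fallback: the default date format of the collection locale.
        DateFormat localeFormat = DateFormat.getDateInstance(DateFormat.DEFAULT, collectionLocale);
        localeFormat.setTimeZone(collectionZone);
        return tryParse(localeFormat, value);
    }

    private static Long tryParse(DateFormat format, String value) {
        format.setLenient(false);
        ParsePosition pos = new ParsePosition(0);
        Date date = format.parse(value, pos);
        // Reject partial matches, such as a pattern that stops before trailing text.
        if (date == null || pos.getIndex() != value.length()) {
            return null;
        }
        return date.getTime();
    }

    public static void main(String[] args) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        String[] samples = {
            "Sun, 06 Nov 1994 08:49:37 GMT",
            "Sun Nov 6 08:49:37 1994",
            "2004-02-05",
            "Sun, 06 Nov 1994 08:49:37",
        };
        for (String s : samples) {
            System.out.println(s + " -> " + parse(s, Locale.US, utc));
        }
    }
}
```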
Custom date formats
When you configure parse and index options for a collection, you can specify custom date formats to ensure that date data that you include in the collection is indexed correctly. The parser tests your custom date formats in the order that you specify and then tests the default date formats. The first format that successfully parses a value determines the date. When you define custom date formats, you specify the following information:
- The format string, such as EEE, d MMM yyyy HH:mm:ss Z (for example, Wed, 4 Jul 2001 12:08:56 -0700). The string can use any pattern supported by the Java SimpleDateFormat class.
- The locale and time zone for the date. The collection locale and time zone are selected by default.
- The order in which your custom date formats are to be applied. After you add a new custom format, you can move it to first, last, or any position in the list.
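The three settings above, a pattern, a locale and time zone, and a position in the list, can be modeled as an ordered list of entries that is tried from first to last. The following sketch is hypothetical: `CustomDateFormats`, `CustomFormat`, and `add` are illustrative names, not product APIs, and returning null stands in for the point at which the parser would continue with the default formats. It uses the format string from the list above and shows that the order matters when two formats can both parse a value such as 03/04/2020.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.TimeZone;

public class CustomDateFormats {
    /** Hypothetical model of one custom format entry: pattern, locale, and time zone. */
    static final class CustomFormat {
        final String pattern;
        final Locale locale;
        final TimeZone zone;

        CustomFormat(String pattern, Locale locale, TimeZone zone) {
            this.pattern = pattern;
            this.locale = locale;
            this.zone = zone;
        }
    }

    // The list order is the order in which the formats are tried.
    private final List<CustomFormat> formats = new ArrayList<>();

    /** Inserts a format at any position: 0 puts it first, the current size puts it last. */
    void add(int position, CustomFormat format) {
        formats.add(position, format);
    }

    /**
     * Tries the custom formats in order and returns epoch milliseconds from the first
     * format that parses the entire value, or null if none matches (the point at
     * which the default formats would be tried next).
     */
    Long parse(String value) {
        for (CustomFormat f : formats) {
            SimpleDateFormat sdf = new SimpleDateFormat(f.pattern, f.locale);
            sdf.setTimeZone(f.zone);
            sdf.setLenient(false);
            ParsePosition pos = new ParsePosition(0);
            Date date = sdf.parse(value, pos);
            if (date != null && pos.getIndex() == value.length()) {
                return date.getTime();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        CustomDateFormats custom = new CustomDateFormats();
        custom.add(0, new CustomFormat("EEE, d MMM yyyy HH:mm:ss Z", Locale.US, utc));
        custom.add(1, new CustomFormat("dd/MM/yyyy", Locale.US, utc));
        System.out.println(custom.parse("Wed, 4 Jul 2001 12:08:56 -0700")); // 994273736000
        System.out.println(custom.parse("03/04/2020")); // 1585872000000 (3 April 2020)

        // Adding a month-first format at the top changes which format wins.
        custom.add(0, new CustomFormat("MM/dd/yyyy", Locale.US, utc));
        System.out.println(custom.parse("03/04/2020")); // 1583280000000 (4 March 2020)
    }
}
```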
Your custom date formats apply to all date content that is added or configured for a collection. For your changes to become effective, you must restart the parser. To apply the changes to documents in the index, either rebuild the index or, if the collection uses a document cache, rebuild the index from the cache.
Displaying dates in the query results
To format field values as dates in the query results, do one of the following:
- Edit the properties file for the application, such as the config.properties file for the enterprise search application. In the date.fields property, specify a space-separated list of the fields that are to be formatted as dates in the query results. The format of the displayed date matches the locale settings in the Web browser.
- Run the application customizer, expand the Results tab, and include the names of fields that are to be formatted as dates in the Date fields field. The format of the displayed date matches the locale settings in the Web browser.
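The following sketch illustrates two points from the procedure above: the date.fields value is a space-separated list, and the same stored value is displayed differently depending on locale. The field names date and lastmodified are hypothetical, and the class and methods are illustrative only. The application itself formats dates according to the Web browser's locale; this Java sketch uses `java.text.DateFormat` merely to show locale-dependent output, and the exact strings depend on the JDK version, so none are shown.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.DateFormat;
import java.util.Arrays;
import java.util.Collections;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.Properties;
import java.util.TimeZone;

public class DateFieldsSetting {
    /** Reads the space-separated date.fields list from properties-file text. */
    static List<String> dateFields(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty("date.fields", "").trim();
        return value.isEmpty() ? Collections.emptyList() : Arrays.asList(value.split("\\s+"));
    }

    /** Formats epoch milliseconds in the medium date and time style of a locale. */
    static String display(long epochMillis, Locale locale, TimeZone zone) {
        DateFormat format = DateFormat.getDateTimeInstance(DateFormat.MEDIUM, DateFormat.MEDIUM, locale);
        format.setTimeZone(zone);
        return format.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        // Hypothetical field names; substitute the names of your own date fields.
        String config = "date.fields=date lastmodified\n";
        for (String field : dateFields(config)) {
            System.out.println("Formatted as a date: " + field);
        }
        // The same stored value renders differently for different locales.
        TimeZone utc = TimeZone.getTimeZone("UTC");
        System.out.println(display(1235487600000L, Locale.US, utc));
        System.out.println(display(1235487600000L, Locale.GERMANY, utc));
    }
}
```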