Generating a CSV Term Hit Details Export report

The CSV Term Hit Details Export system report provides an exhaustive list of matches from the documents in the selected infoset to the search terms given as report parameters.

Before you begin

This report returns the most reliable results when it is run on full-text indexed infosets, preferably infosets on which at least one Step-up Analytics action with one or more cartridges was run.

Important: Before you run this report on an infoset that you created in a product release before version 7.6.0.15, rerun any Step-up Analytics actions to ensure that the information required for proper report results is added to the index.

Starting with version 7.6.0.17, a Step-up Analytics action skips unmodified files unless at least one of the action's cartridges was updated. Therefore, to rerun an action, first update at least one of its cartridges.

About this task

You can use this report, for example, in a development system to debug a cartridge during development and to review its detection quality before you promote it to production. The report supports only cartridges that generate indexed annotations; CDA cartridges are not covered. The report also provides important information for risk assessment and remediation efforts: after compiling a set of documents that match a specific pattern, you can determine the exact string that matched the pattern, its position in the document, and its surrounding context.

When you generate the report, you specify the search terms in addition to the basic report information such as the report name and optional recipients.

For each match in a data object, the report contains basic information about the object, such as its location, size, type, and ownership, and when the object was created, last accessed, or modified. In addition, these details are available:

The search term as you entered it in your request.
The overall match count for a document. A value of 3, for example, indicates that three search expression matches were found in the document. The matches can all be for one search expression or for different ones. Each match has its own entry in the report, which results in three report rows for that document.
The text that matched a search expression. A maximum of 128 characters is included in the report.
Short snippets of the text to the left and right of the match to provide some context. This column is filled only for text matches on indexed annotations in documents that were processed with the respective cartridge.
The UIMA type of the match. This column is filled only for text matches on indexed annotations in documents that were processed with the respective cartridge.
The language as detected for that document.
The start and end offsets of the match within the document, counted in plain-text characters.
The document size in KB.
An offset percentage to give you a rough understanding of where in the document to look for the match; the lower the value, the closer to the start of the document.
The node ID, which is the unique StoredIQ identifier of the document.
The date and time that the index was built.
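To illustrate how several of these columns relate to one another, the following Python sketch derives the matched text, the start and end offsets, an offset percentage, and the left and right context for a plain-text string. It is not the product's implementation. The names find_hits, TermHit, and CONTEXT_CHARS are hypothetical, the snippet width is an assumed value, and the offset percentage is assumed to be the start offset relative to the plain-text length, which might differ from how the report calculates it.

```python
import re
from dataclasses import dataclass
from typing import List

MAX_MATCH_CHARS = 128  # the report includes at most 128 characters of matched text
CONTEXT_CHARS = 20     # assumption: the real snippet width is not stated in this topic


@dataclass
class TermHit:
    """One illustrative report row for a single match (hypothetical structure)."""
    matched_text: str
    start: int
    end: int
    offset_percent: float
    left_context: str
    right_context: str


def find_hits(text: str, pattern: str) -> List[TermHit]:
    """Return one TermHit per regex match in the plain text."""
    hits = []
    for m in re.finditer(pattern, text):
        start, end = m.start(), m.end()
        # Assumption: percentage = start offset relative to the text length.
        percent = round(100.0 * start / len(text), 1) if text else 0.0
        hits.append(TermHit(
            matched_text=m.group(0)[:MAX_MATCH_CHARS],
            start=start,
            end=end,
            offset_percent=percent,
            left_context=text[max(0, start - CONTEXT_CHARS):start],
            right_context=text[end:end + CONTEXT_CHARS],
        ))
    return hits


if __name__ == "__main__":
    sample = "Contact: jane@example.com or call 555-0100 today."
    for hit in find_hits(sample, r"\d{3}-\d{4}"):
        print(hit)
```

A document with three matches yields three TermHit entries, which mirrors the one-row-per-match layout of the report. A low offset_percent value points to a match near the start of the text.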

Important: Depending on your search terms, the generated report might contain sensitive data.

Procedure

To generate a CSV Term Hit Details Export report:

Select an infoset and open the report pane.
From the system report list, select CSV Term Hit Details Export and enter the basic information.
On the Terms tab, add custom and cartridge terms as required.
Single search terms can have the format of full-text filters:
- Simple terms such as single words that can include wildcards
- Full-text macros in the format {macro}
- Regular expressions in the format re:"regex"
- Indexed annotations in the format ia:filter_term
Complex terms that contain spaces or the logical operators AND, OR, or NOT are not supported. For details about the handling of special characters or punctuation in search terms, see the topic about Extended ASCII characters.
When you add a cartridge, the list of the cartridge's supported results is expanded and all of the cartridge terms are added to the list of search terms.

You cannot include CDA cartridges in the report.
You can review your selection in the List of Terms to include in the Report section. The prefix CustomTerm: identifies search terms that you added manually. Search terms that stem from a cartridge are prefixed with the cartridge name. You can add terms to or delete terms from the list as required.
Click Generate to run the report.
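
If you prepare a long list of custom terms outside the product, it can help to check each term against the formats listed for the Terms tab before you enter it. The following Python sketch shows one way to do that. The function classify_term is hypothetical and not part of StoredIQ, and the sample macro and annotation names are invented. The sketch covers only the four formats named in this topic and conservatively rejects any term that contains whitespace, including inside a regular expression, which is an assumption. It does not validate that a macro, annotation, or regular expression actually exists or compiles in the product.

```python
import re


def classify_term(term: str) -> str:
    """Return the format name of a single search term or raise ValueError."""
    if not term:
        raise ValueError("empty search term")
    # Terms with spaces are rejected; this also covers expressions that
    # combine terms with AND, OR, or NOT, such as 'invoice AND paid'.
    if any(ch.isspace() for ch in term):
        raise ValueError(f"complex terms or terms with spaces are not supported: {term!r}")
    # Check the regex format first, because a regex body can contain braces.
    if re.fullmatch(r're:".+"', term):
        return "regex"
    if term.startswith("ia:") and len(term) > len("ia:"):
        return "indexed annotation"
    if re.fullmatch(r"\{[^{}]+\}", term):
        return "macro"
    return "simple"


if __name__ == "__main__":
    # Hypothetical sample terms; the macro and annotation names are made up.
    for candidate in ["invoice*", "{some_macro}", 're:"\\d{3}-\\d{4}"', "ia:some_filter"]:
        print(candidate, "->", classify_term(candidate))
```

Rejecting whitespace up front keeps the check simple, because every expression that combines terms with a logical operator necessarily contains a space.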