Audit Tool for Analytical Components

Use the AuditTool command to produce a report that shows the amount of data that is indexed or analyzed in collections that you create in Watson™ Explorer Content Analytics or Watson Explorer Annotation Administration Console.

Watson Explorer Content Analytics is provided with IBM Watson Explorer Advanced Edition. Watson Explorer Annotation Administration Console is a module that you can install after you install the Watson Explorer Foundational Components.

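The audit tool is in the installation_directory\bin directory (installation_directory/bin on AIX or Linux). It writes its output in CSV format to standard output, so you can redirect the output to a file for later processing.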
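Because the report goes to standard output as CSV, it can be captured and parsed by a script. The following Python sketch is one possible way to do that, not a documented integration. The function name run_audit is hypothetical, and the command that you pass must match the location of the tool in your own installation.

```python
import csv
import io
import subprocess


def run_audit(command):
    """Run an audit command and return its CSV output as a list of rows.

    `command` is a list such as ["AuditTool.sh", "idx"]; the path to the
    script is an assumption and depends on your installation directory.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    # Drop blank lines so that only the header row and data rows remain.
    return [row for row in csv.reader(io.StringIO(result.stdout)) if row]


# Example call (not executed here because it needs an installed product):
#   rows = run_audit(["AuditTool.sh", "idx"])
#   header, data = rows[0], rows[1:]
```

The first row that is returned is the header row, and the remaining rows are the report data.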

You can use the audit tool to measure your use of the product against the entitlement terms in your license agreement with IBM®. The audit tool can help you distinguish between collections that count against the entitlement terms and collections that do not:

- Collections that contain indexed content, such as collections that you create in the Watson Explorer Content Analytics administration console, count against your entitlement.
- Collections that you analyze by using the natural language processing (NLP) API might count against your entitlement:
  - In Watson Explorer Content Analytics, you can use the NLP API to do ad hoc text analytics on documents instead of adding the documents to the index. In this case, the collections that you analyze count against your entitlement.
  - Collections that you create for analyzing data through remote NLP API calls from Watson Explorer Engine, such as collections that you create in Annotation Administration Console, do not count against your entitlement.

To assess product usage, you or an auditor must measure all usage and calculate the volume that counts against the terms of entitlement in your license agreement with IBM.

Indexed collections in Watson Explorer Content Analytics

To produce a report that shows the amount of data that is indexed in each collection, enter the following command:

AIX® or Linux®: AuditTool.sh idx
Windows: AuditTool.bat idx

The output of the audit tool contains one header row and one row for each index partition of each collection. Each row contains the following fields, in the specified order:

collection-id: the ID of the collection.
collection-type: the type of collection. The value '1' indicates that the collection is a content analytics collection. The value '0' indicates that the collection is a search collection.
partition-number: the index partition number.
input-bytes: the estimated number of bytes of source data that is stored in the collection index. For example, for a file system crawler, this value reflects the sizes of the files.
input-bytes-fields: the estimated number of bytes of metadata or text data (with UTF-16 encoding) that is stored in the collection index as search index fields. For example, for a file system crawler, this value reflects the sizes of the file names, directory names, and so on.

To determine the volume that counts against your terms of entitlement, sum the values in all of the input-bytes and input-bytes-fields fields.
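The summation can be scripted once the report is saved. The following Python sketch assumes that the values are plain integers and relies only on the field order that is listed above (input-bytes in the fourth column, input-bytes-fields in the fifth). The function name, the header text, and the sample rows are made up for illustration.

```python
import csv
import io


def total_indexed_bytes(report_text):
    """Sum input-bytes and input-bytes-fields over every partition row."""
    rows = [row for row in csv.reader(io.StringIO(report_text)) if row]
    total = 0
    for row in rows[1:]:  # rows[0] is the header row
        total += int(row[3]) + int(row[4])
    return total


# Illustrative data only; real reports come from "AuditTool.sh idx".
sample = """collection-id,collection-type,partition-number,input-bytes,input-bytes-fields
col_a,1,0,1000,200
col_a,1,1,500,100
col_b,0,0,300,50
"""

print(total_indexed_bytes(sample))  # prints 2150
```

Reading the columns by position avoids depending on the exact wording of the header row, which is not specified here.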

NLP API analysis of Watson Explorer Content Analytics collections

To produce a report that shows the amount of data that was analyzed through the NLP API for each collection, enter the following command:

AIX or Linux: AuditTool.sh nlp
Windows: AuditTool.bat nlp

The output of the audit tool contains one header row and one row for each collection. Each row contains the following fields, in the specified order:

collection-id: the ID of the collection.
collection-type: the type of collection. The value '1' indicates that the collection is a content analytics collection. The value '0' indicates that the collection is a search collection.
input-bytes: the estimated number of bytes of text data (with UTF-16 encoding) that were processed for the collection since the start time.
start-time: the start time of the measurement.

To determine the volume that counts against your terms of entitlement, sum the values in all of the input-bytes fields.
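As a rough sketch, the NLP report can be totaled in the same way. This Python example assumes integer values and the field order that is listed above (input-bytes in the third column). The function name and the sample rows are hypothetical, and the start-time values are placeholders because the time format is not described here.

```python
import csv
import io


def nlp_bytes_by_collection(report_text):
    """Return a dict that maps each collection ID to its input-bytes value."""
    rows = [row for row in csv.reader(io.StringIO(report_text)) if row]
    totals = {}
    for row in rows[1:]:  # rows[0] is the header row
        collection_id = row[0]
        totals[collection_id] = totals.get(collection_id, 0) + int(row[2])
    return totals


# Illustrative data only; real reports come from "AuditTool.sh nlp".
nlp_sample = """collection-id,collection-type,input-bytes,start-time
col_a,1,4096,START
col_b,1,1024,START
"""

per_collection = nlp_bytes_by_collection(nlp_sample)
print(per_collection)                # prints {'col_a': 4096, 'col_b': 1024}
print(sum(per_collection.values()))  # prints 5120
```

Keeping the per-collection values makes it easier to review which collections contribute most to the total.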

NLP API calls to Annotation Administration Console collections

To produce a report that shows, for each collection, the amount of data that was processed from crawled content and the amount of data that was analyzed through the NLP API, enter the following command:

AIX or Linux: AuditTool.sh nlp --detail
Windows: AuditTool.bat nlp --detail

The output of the audit tool contains one header row and one row for each collection. Each row contains the following fields, in the specified order:

collection-id: the ID of the collection.
collection-type: the type of collection. The value '1' indicates that the collection is a content analytics collection. The value '0' indicates that the collection is a search collection.
input-bytes-idx: based on crawled data, the estimated number of bytes of text data (with UTF-16 encoding) that were processed for the collection since the start time.
start-time-idx: the start time of the measurement for crawled data.
input-bytes-nlp: based on data analyzed by the NLP API, the estimated number of bytes of text data (with UTF-16 encoding) that were processed for the collection since the start time.
start-time-nlp: the start time of the measurement for data analyzed by the NLP API.

The measurements for this type of data usage do not count against your terms of entitlement.
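If you want to review these measurements side by side, the detail report can be read into named records. The following Python sketch assumes integer byte values and the field order that is listed above. The names DetailRow and parse_detail_report are hypothetical, and the sample row is made up, with placeholder start times.

```python
import csv
import io
from typing import NamedTuple


class DetailRow(NamedTuple):
    collection_id: str
    collection_type: str
    input_bytes_idx: int
    start_time_idx: str
    input_bytes_nlp: int
    start_time_nlp: str


def parse_detail_report(report_text):
    """Convert each data row of the detail report into a DetailRow."""
    rows = [row for row in csv.reader(io.StringIO(report_text)) if row]
    return [
        DetailRow(r[0], r[1], int(r[2]), r[3], int(r[4]), r[5])
        for r in rows[1:]  # rows[0] is the header row
    ]


# Illustrative data only; real reports come from "AuditTool.sh nlp --detail".
detail_sample = """collection-id,collection-type,input-bytes-idx,start-time-idx,input-bytes-nlp,start-time-nlp
ann_col,0,2048,START,512,START
"""

for record in parse_detail_report(detail_sample):
    print(record.collection_id, record.input_bytes_idx, record.input_bytes_nlp)
    # prints: ann_col 2048 512
```

The sketch only reports the values; it does not add them to any entitlement total.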