390 indexer
You can use the 390 indexer to extract index data from, and generate index data about, line data and AFP reports. In addition, other data types, such as TIFF images, can be captured by using the Anystore Exit.
- Indexing parameters that specify how the data should be indexed. You can create the indexing parameters when you define a Content Manager OnDemand application. The parameters have the same form as those used by ACIF, with some extensions that are unique to the 390 indexer.
- The print data stream.
- AFP reports. For AFP reports, the index values are already specified within the AFP data stream.
- Document organization. For reports made up of logical items, such as statements, policies, and invoices, the 390 indexer can generate index data for each logical item in the report.
- Report organization. For reports that contain line data with sorted values on each page, such as a transaction log or general ledger, the 390 indexer can divide the report into groups of pages and generate index data for each group of pages.
- Anystore Exit. This exit determines the content and index values of each document.
- Large Object. Large object support is designed to provide enhanced usability and better retrieval performance for reports that contain very large documents by segmenting the documents into groups of pages and downloading only the page groups that the users request to view.
- Examine the input data to determine how users use the report, including what information they need to retrieve a report from the system (indexing requirements).
- Create parameters for indexing.
You run the 390 indexer as part of the Content Manager OnDemand load process with the ARSLOAD program. The Content Manager OnDemand application retrieves the indexing parameters from the Content Manager OnDemand database and uses the parameters to process the input data.
The 390 indexer can logically divide reports into individual items, such as statements, policies, and bills. You can define up to 128 index fields for each item in a report.
- If the document (or large object segment) size exceeds 20 MB, then the document data is
temporarily stored in the Content Manager OnDemand temporary HFS directory
(described in the following information). Therefore, if the largest document is 2 GB, then
the temporary HFS directory must have at least 2 GB of available space. If the available HFS
disk space is not sufficient to store the largest document in the report, the load
fails.
The temporary HFS directory is defined by one of these options:
- The -c option in the ARSLOAD parameters. If this is not specified, then:
- The environment variable ARS_TMP. If this is not specified, then:
- The environment variable TEMP. If this is not specified, then:
- The current working directory.
- In the final load stage, the complete document (or large object segment) needs to be loaded into memory. Therefore, if the document (or large object segment) is 2 GB in size, then the load program needs to be able to acquire 2 GB of memory to load the data. If the available memory is not sufficient to store the largest document in the report, the load fails.
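The lookup order for the temporary directory and the disk space requirement described above can be sketched as follows. This is only an illustration of the documented fallback chain, not the ARSLOAD implementation. The function names `resolve_temp_dir` and `has_room_for`, passing the `-c` value as an argument, treating an empty variable as unspecified, and reading 20 MB as 20 * 1024 * 1024 bytes are all assumptions made for the example.

```python
import os
import shutil

# Assumed reading of the 20 MB threshold above which document data is
# staged in the temporary directory.
STAGING_THRESHOLD = 20 * 1024 * 1024


def resolve_temp_dir(c_option=None, environ=None, cwd=None):
    """Return the temporary directory using the documented fallback order.

    c_option stands in for the value of the ARSLOAD -c option (a
    hypothetical argument; the real program parses its own parameters).
    """
    environ = os.environ if environ is None else environ
    for candidate in (c_option, environ.get("ARS_TMP"), environ.get("TEMP")):
        if candidate:  # assumption: unset or empty means "not specified"
            return candidate
    return cwd if cwd is not None else os.getcwd()


def has_room_for(largest_document_bytes, temp_dir):
    """Check whether temp_dir can hold the largest document in the report."""
    if largest_document_bytes <= STAGING_THRESHOLD:
        return True  # documents at or below the threshold are not staged
    return shutil.disk_usage(temp_dir).free >= largest_document_bytes
```

A check like this could be run before a load to confirm that the resolved directory has enough free space for the largest expected document. It does not cover the separate memory requirement of the final load stage.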
Any data type can be captured using the 390 indexer. Native support exists for line data and AFP data. Other data types, such as PDF and TIFF images, can be captured by using the Anystore Exit. This provides a method to capture documents of any type and size (including those greater than 2 GB) into Content Manager OnDemand.
Indexing
Indexing parameters include information that allows the 390 indexer to identify key items in the input data stream so they can be extracted from the report and stored in the Content Manager OnDemand database. Content Manager OnDemand uses these index values for efficient, structured search and retrieval.
- AFP Reports. The 390 indexer can capture fully resolved AFP data streams (AFPDS). The AFPDS must contain the index values in the form of either TLE or NOP records. For details on these record types, see INDEXSTYLE. You can capture AFP resources in either of the following ways:
- The resources are in-stream at the beginning of the AFPDS. In this case, the Begin Resource Group (BRG) record and End Resource Group (ERG) record must occur prior to the Begin Document (BDT) record.
- In a z/OS® environment, the resources are in a separate input file and specified in the ARSLOAD JCL via a RESOURCE ddname. On AIX, the resources must be included inline at the beginning of the load file.
- Line Print Reports. Line Print Reports consist of text-formatted print streams. Column one of each record contains a carriage control character.
You specify the index information that allows the 390 indexer to segment the print stream into individual items known as groups. A group is a collection of one or more pages. You define the bounds of the collection, for example, a bank statement, insurance policy, phone bill, or other logical segment of a report file. A group can also represent a specific number of pages in a report. For example, you might decide to segment a 10,000 page report into groups of 100 pages. The 390 indexer creates indexes for each group. Groups are determined when the value of an index changes (for example, account number) or when the maximum number of pages for a group is reached.
An indexing parameter is made up of an attribute name (for example, Customer Name) and an attribute value (for example, Earl Hawkins). The parameters include pointers that tell the 390 indexer where to locate the attribute information in the data stream. For example, the tag Account Number with the pointer 1,21,16 means that the 390 indexer can expect to find Account Number values starting in column 21 of specific input records. The 390 indexer collects 16 bytes of information starting at column 21 and adds it to a list of attribute values found in the input. For each group that is identified by the 390 indexer, a set of index values that are associated with the group are stored by the Content Manager OnDemand load process into the Content Manager OnDemand database.
- Anystore Exits. The use of an Anystore Exit allows for the capture of any type of data. The exit is responsible for reading the data to be captured, breaking it into documents, and determining the index values. A sample Anystore Exit is provided which captures TIFF images using a pre-generated set of indexing instructions read from a separate file.
- Large Object. Provides enhanced usability and better retrieval
performance for reports that contain very large logical items (for
example, statements that exceed 500 pages) and files that contain
many images, graphics, fonts, and bar codes. Content Manager OnDemand segments data into groups
of pages, compressed inside a large object. You determine the number
of pages in a group. When the user retrieves an item, Content Manager OnDemand retrieves and uncompresses
the first group of pages. As the user navigates pages of the item, Content Manager OnDemand automatically retrieves
and uncompresses the appropriate groups of pages. To enable large
object support, select the Large Object check
box on the Load Information tab of the Application definition.
The Large Object option is supported for AFP reports as well as Line Print reports.
- The 390 indexer also provides support for line print reports with global and/or local Xerox DJDE records. These documents can be loaded in the same manner as the standard line print reports described earlier with the addition of DJDE record handling logic. The global DJDE records are stored separately from the individual documents and retrieved at print time as required.
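The field pointer and group break behavior described above for line print reports can be illustrated with a small sketch. It is not the 390 indexer's logic. Reading the first value of the pointer 1,21,16 as a 1-based record number within each page is an assumption, as are the names `Group`, `extract_field`, and `split_into_groups`, and the use of Python strings in place of raw bytes.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """A collection of one or more pages that share an index value."""
    index_value: str
    pages: list = field(default_factory=list)


def extract_field(record, column, length):
    """Return `length` characters of `record` starting at 1-based `column`."""
    start = column - 1
    return record[start:start + length]


def split_into_groups(pages, record_no, column, length, max_pages=None):
    """Start a new group when the index value changes or max_pages is reached.

    Each page is a list of record strings. record_no is the assumed 1-based
    position of the record on the page that holds the index value.
    """
    groups = []
    for page in pages:
        value = extract_field(page[record_no - 1], column, length)
        current = groups[-1] if groups else None
        full = (
            max_pages is not None
            and current is not None
            and len(current.pages) >= max_pages
        )
        if current is None or value != current.index_value or full:
            current = Group(index_value=value)
            groups.append(current)
        current.pages.append(page)
    return groups
```

With the pointer from the example, `split_into_groups(pages, 1, 21, 16)` would begin a new group each time the 16 characters at column 21 of the first record on a page differ from those of the current group. Passing `max_pages=100` would also cap every group at 100 pages.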