Data harvesting
Harvesting or indexing is the process or task by which IBM® StoredIQ® examines and classifies data in your network.
- A full harvest can be run on every volume or on individual volumes.
- An incremental harvest harvests only the changes on the requested volumes.
These options are selected when you create a job for the harvest. A harvest must be run before you can start searching for data objects or textual content. An Administrator initiates a harvest by including a harvest step in a job.
Most harvesting parameters are selected from the Configuration subtab. There you can specify the number of processes to use during a harvest, whether an interrupted harvest continues where it left off, and many other parameters. Several standard harvesting-related jobs are provided in the system.
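Harvesting with and without post-processing
A harvest can run with or without post-processing. Harvest post-processing consists of the following tasks: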
- Loading all metadata for a volume.
- Computing all tags that are registered to a particular volume.
- Generating all reports for that volume.
- If configured, updating tags and creating explorers in the harvest job.
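As a rough illustration of running a harvest with or without post-processing, the following Python sketch performs a harvest step and then, only if requested, the first three post-processing tasks from the list. Everything in it (the `Volume` class, `harvest`, `post_process`, `run_job`, and the tag rule) is hypothetical and only models the idea; it is not the StoredIQ API, and the optional tag updates and explorers are left out.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Volume:
    # Hypothetical in-memory stand-in for a volume; not a StoredIQ type.
    name: str
    files: dict[str, int]  # path -> size in bytes
    metadata: list[dict] = field(default_factory=list)
    tags: dict[str, list[str]] = field(default_factory=dict)
    reports: list[str] = field(default_factory=list)


def harvest(volume: Volume) -> list[dict]:
    # Examine every data object and collect a raw metadata record for it.
    return [{"path": p, "size": s} for p, s in sorted(volume.files.items())]


def post_process(volume: Volume, records: list[dict],
                 tag_rules: dict[str, Callable[[dict], bool]]) -> None:
    # 1. Load all metadata for the volume.
    volume.metadata = records
    # 2. Compute all tags registered to the volume.
    for tag, matches in tag_rules.items():
        volume.tags[tag] = [r["path"] for r in records if matches(r)]
    # 3. Generate a (trivial) report for the volume.
    volume.reports.append(f"{volume.name}: {len(records)} objects")
    # 4. Optional tag updates and explorer creation are omitted in this sketch.


def run_job(volume: Volume, tag_rules, with_post_processing: bool = True):
    # The harvest step always runs; post-processing is a separate, optional step.
    records = harvest(volume)
    if with_post_processing:
        post_process(volume, records, tag_rules)
    return records


def is_large(record: dict) -> bool:
    # Hypothetical tag rule: objects larger than 1,000 bytes.
    return record["size"] > 1000


vol = Volume("share01", {"/a.txt": 10, "/b.pdf": 5000})
run_job(vol, {"large": is_large})
print(vol.tags)     # {'large': ['/b.pdf']}
print(vol.reports)  # ['share01: 2 objects']
```

Calling `run_job(..., with_post_processing=False)` returns the harvested records but leaves the volume's loaded metadata, tags, and reports empty, which is the sense in which post-processing can be deferred.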
Incremental harvests
Harvesting volumes takes time and taxes your organization's resources. With incremental harvests, you can keep the metadata repository accurate quickly and easily and ensure that the vocabulary for all volumes is consistent and up to date. After you harvest a volume, you can speed up subsequent harvests by harvesting only data objects that are new or changed. An incremental harvest indexes new, modified, and removed data objects on your volumes or file servers. Because only the changes are processed, an incremental harvest updates the metadata repository in less time than the original harvest and puts a lighter load on your systems.
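A minimal sketch of how an incremental harvest could classify changes, assuming the previous harvest stored one fingerprint per object path. The SHA-256 content hash and the `incremental_changes` function are illustrative assumptions, not how StoredIQ detects changes. Hashing still reads every object, so a real connector might compare modification times or use a change journal instead.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    # Assumed change detector: a content hash of the object.
    return hashlib.sha256(content).hexdigest()


def incremental_changes(previous: dict[str, str], current: dict[str, bytes]):
    """Classify objects as new, modified, or removed since the last harvest.

    previous: path -> fingerprint recorded by the previous harvest
    current:  path -> content found on the volume now
    """
    current_fp = {path: fingerprint(data) for path, data in current.items()}
    new = sorted(p for p in current_fp if p not in previous)
    modified = sorted(p for p in current_fp
                      if p in previous and previous[p] != current_fp[p])
    removed = sorted(p for p in previous if p not in current_fp)
    # current_fp becomes the baseline for the next incremental harvest.
    return new, modified, removed, current_fp


index = {
    "/a.txt": fingerprint(b"v1"),
    "/b.txt": fingerprint(b"keep"),
    "/old.txt": fingerprint(b"gone"),
}
scan = {"/a.txt": b"v2", "/b.txt": b"keep", "/c.txt": b"new"}
new, modified, removed, index = incremental_changes(index, scan)
print(new, modified, removed)  # ['/c.txt'] ['/a.txt'] ['/old.txt']
```

Only `/c.txt` and `/a.txt` need to be indexed and `/old.txt` removed from the repository; the unchanged `/b.txt` is skipped, which is where the savings of an incremental harvest come from.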
Reharvesting
The behavior is the same for both types of data server:
On a reharvest, the metadata for a document is updated, because only the latest version of the document is considered. As a result, the document might no longer match previously applied filter criteria, although it is still part of the infoset.
On a reharvest, the full-text index is also updated. Any previously applied cartridges are automatically reapplied to the latest document version so that the results of any Step-up Analytics action remain available in the full-text index. Step-up Analytics or Step-up Full-Text actions that run after a reharvest analyze and annotate the latest document version on the data source.
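The reharvest behavior can be modeled with a small Python sketch. It assumes a hypothetical `Document` that keeps only its latest version, an infoset that records document IDs rather than re-evaluating filters, and a toy `word_count` cartridge; none of these names come from StoredIQ, and the real infoset and cartridge mechanics may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    # Hypothetical model: only the latest harvested version is kept.
    doc_id: str
    metadata: dict
    text: str
    annotations: dict = field(default_factory=dict)  # cartridge name -> result


def reharvest(doc: Document, latest_metadata: dict, latest_text: str,
              cartridges: dict) -> None:
    # Replace metadata and text with the latest version on the data source.
    doc.metadata = latest_metadata
    doc.text = latest_text
    # Reapply every previously applied cartridge to the new version so that
    # its results stay available for full-text search.
    for name in list(doc.annotations):
        doc.annotations[name] = cartridges[name](latest_text)


def word_count(text: str) -> int:
    # Toy stand-in for a cartridge.
    return len(text.split())


def owned_by_alice(doc: Document) -> bool:
    # Example filter criterion that was applied before the reharvest.
    return doc.metadata["owner"] == "alice"


doc = Document("doc-1", {"owner": "alice"}, "draft text",
               annotations={"word_count": 2})
infoset = {doc.doc_id}  # membership is recorded by ID, not by filter

reharvest(doc, {"owner": "bob"}, "final approved text",
          {"word_count": word_count})
print(doc.doc_id in infoset)  # True: still part of the infoset
print(owned_by_alice(doc))    # False: no longer matches the filter
print(doc.annotations)        # {'word_count': 3}
```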