IBM Support

How to Consolidate Date-Partitioned Content Search Services collections in FileNet Content Engine

How To


Summary

How can I consolidate Content Search Services collections that are not full and that will no longer be used to store more data due to date partitioning?

Objective

Content Engine and Content Search Services in releases 5.2.1 and higher allow collections to be assigned individual date ranges based on a metadata property that is set when document objects are created. Each date-partitioned collection contains only data with dates that fall within its assigned range. This increases CBR search performance for queries that include the property used for the date partitioning, since only collections identified as containing documents in that date range are searched.

Collections are in their optimal state when they are full, that is, populated as far as possible without exceeding either the "Index Maximum Object Count" or the "Index Maximum Size" setting assigned to each index area. In P8 5.x, these settings are on the General tab of each Index Area.

With date partitioning, collections can be left with "excess capacity" (not full, as described earlier) when they reach the end of a date range. This situation is possible when multiple IndexAreas are configured, because multiple collections are created for each date range and populated simultaneously. In most cases, no more data can be ingested for that date range, so there is unused capacity in all those collections.

Since it is better to have fewer collections to search overall, it is best to reduce the total number of collections for each date partition range in the P8 5.x Domain when possible. However, note that it is impossible to eliminate all extra capacity, since document totals within a date range are seldom an exact multiple of the "Index Maximum Object Count" or the "Index Maximum Size" setting.

One other point of consideration:

A collection can never reopen itself to new content if the active document count later becomes less than the "Index Maximum Object Count" or the "Index Maximum Size" setting due to documents being deleted. However, the P8 administrator can manually set a collection back to the open state so that the remainder of its capacity is used until it again reaches the appropriate threshold.

Once consolidation takes place, one smaller collection remains. It is recommended to temporarily increase the "Index Maximum Object Count" or the "Index Maximum Size" setting so that one of the existing collections can hold the remaining data. This threshold change would be made in the procedure where indicated.  After the procedure is completed, restore the "Index Maximum Object Count" or the "Index Maximum Size" setting value to its original setting.
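
The sizing arithmetic behind this temporary increase can be sketched as follows. This is a minimal illustration and not part of the product: the function name `plan_consolidation`, the sample counts, and the 500,000 limit are hypothetical, and the sketch considers only the object-count limit, not "Index Maximum Size".

```python
import math


def plan_consolidation(object_counts, max_object_count):
    """Estimate collection counts for one date range (illustrative only)."""
    total = sum(object_counts)
    # Collections needed if the limit is left unchanged; the last one
    # is usually only partly full.
    at_current_limit = math.ceil(total / max_object_count)
    # Target: avoid the partly full remainder collection, keeping at least one.
    target = max(1, total // max_object_count)
    # Smallest limit that lets `target` collections hold all objects.
    temporary_limit = max(max_object_count, math.ceil(total / target))
    return {
        "total_objects": total,
        "collections_at_current_limit": at_current_limit,
        "target_collections": target,
        "temporary_limit": temporary_limit,
    }


# Hypothetical example: four partly full collections, limit of 500,000 objects.
plan = plan_consolidation([400_000, 350_000, 300_000, 150_000], 500_000)
print(plan)
```

In this hypothetical case, 1,200,000 objects would need three collections at the current limit, whereas temporarily raising the limit to 600,000 lets two collections hold everything. Any value chosen this way still has to respect the limits described in the procedure.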

Steps

Overview
If you have collections smaller than the maximum settings that use the same date partition range, select the smallest one, close the rest of the collections for that date range, and reindex all of the collections that were closed. As the content is reindexed, it is written to the existing open collection until that collection is filled to capacity, and then more collections are created as required. After the first reindex, the closed collections no longer have content assigned to them and may be deleted automatically. If they are not, reindex the closed collections a second time so that they are deleted from the P8 configuration. This action reduces the total number of collections for the date partition used within the domain.
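
The selection rule in this overview (keep the smallest collection open, close and reindex the rest) can be sketched as below. The function name `split_destination` and the collection names are hypothetical, and the sketch only produces a worklist. The status changes and reindexing are still performed in ACCE as described in the steps.

```python
def split_destination(collections):
    """Split one date range's collections into (destination, to_close).

    `collections` is a list of (name, indexed_object_count) tuples.
    """
    if not collections:
        raise ValueError("at least one collection is required")
    # Order by indexed object count, smallest first.
    ordered = sorted(collections, key=lambda item: item[1])
    destination = ordered[0][0]                 # stays "Open"
    to_close = [name for name, _ in ordered[1:]]  # set to "Closed", then reindexed
    return destination, to_close


# Hypothetical collections that share one date range.
destination, to_close = split_destination(
    [("coll_a", 400_000), ("coll_b", 150_000), ("coll_c", 300_000)]
)
print("Keep open:", destination)
print("Close and reindex:", to_close)
```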

Planning and Preparation

Identify the collections that share the same partitioned date range by reviewing the following collection details:
- The date partition ranges of the collections (by using ACCE or SQL Query)
- The "Indexed Object Count" or the "Index Size" setting (by using ACCE or SQL Query)

Using ACCE to identify collection date partition ranges and counts and sizes:

Under each IndexArea in the object store, navigate to the Collections tab and highlight each collection in turn to see the date range and size assigned to it. Alternatively, and more conveniently, use the Index View page and scroll to the right until the partition date ranges and sizes are displayed.
Collections that share a date range can be candidates for consolidation. When multiple IndexAreas are in use, there is one open collection per IndexArea for any date range, but more may exist if some collections filled to the maximum setting while that date range was being processed.

Using SQL queries to identify collection date partition ranges:

If you are unable to use the Index View tab to identify your candidates for consolidation, a system DBA can run this query directly on the P8 object store tables using database product tools:

For instance:

SELECT
    TEXTINDEX.INDEX_NAME,
    INDEXAREA.DISPLAY_NAME AS INDEX_AREA_NAME,
    TEXTINDEX.STATUS AS INDEX_RESOURCE_STATUS,
    TEXTINDEX.INDEX_OBJECT_COUNT,
    TEXTINDEX.INDEX_SIZE_KB,
    TEXTINDEXPARTITION.PARTITION_PROP_NAME,
    TEXTINDEXPARTITION.PARTITION_START_DATE,
    TEXTINDEXPARTITION.PARTITION_END_DATE
FROM TEXTINDEX
LEFT OUTER JOIN TEXTINDEXPARTITION
    ON TEXTINDEX.OBJECT_ID = TEXTINDEXPARTITION.PARENT_ID
LEFT OUTER JOIN INDEXAREA
    ON TEXTINDEX.PARENT_ID = INDEXAREA.OBJECT_ID

The results of this query show which collections share the same date range and identify which IndexArea "owns" each of them, making it easier to navigate straight to them when closing or reindexing them.
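
Once the query results are exported, grouping them by date range makes the consolidation candidates easier to spot. The following sketch assumes the rows are available as dictionaries keyed by the column names and aliases used in the query. The function name `consolidation_candidates`, the sample values, and the way the rows are exported from the database are all hypothetical.

```python
from collections import defaultdict


def consolidation_candidates(rows):
    """Return date ranges that are shared by more than one collection."""
    groups = defaultdict(list)
    for row in rows:
        # The outer join yields empty partition columns for collections
        # that have no date partition; those are skipped here.
        if row["PARTITION_START_DATE"] is None:
            continue
        key = (
            row["PARTITION_PROP_NAME"],
            row["PARTITION_START_DATE"],
            row["PARTITION_END_DATE"],
        )
        groups[key].append(row)
    return {key: members for key, members in groups.items() if len(members) > 1}


# Hypothetical rows shaped like the query output.
rows = [
    {"INDEX_NAME": "coll_a", "INDEX_AREA_NAME": "Area1", "INDEX_OBJECT_COUNT": 400_000,
     "PARTITION_PROP_NAME": "DateCreated",
     "PARTITION_START_DATE": "2021-01-01", "PARTITION_END_DATE": "2021-12-31"},
    {"INDEX_NAME": "coll_b", "INDEX_AREA_NAME": "Area2", "INDEX_OBJECT_COUNT": 150_000,
     "PARTITION_PROP_NAME": "DateCreated",
     "PARTITION_START_DATE": "2021-01-01", "PARTITION_END_DATE": "2021-12-31"},
    {"INDEX_NAME": "coll_c", "INDEX_AREA_NAME": "Area1", "INDEX_OBJECT_COUNT": 90_000,
     "PARTITION_PROP_NAME": "DateCreated",
     "PARTITION_START_DATE": "2022-01-01", "PARTITION_END_DATE": "2022-12-31"},
    {"INDEX_NAME": "coll_d", "INDEX_AREA_NAME": "Area2", "INDEX_OBJECT_COUNT": 10_000,
     "PARTITION_PROP_NAME": None,
     "PARTITION_START_DATE": None, "PARTITION_END_DATE": None},
]

candidates = consolidation_candidates(rows)
for (prop, start, end), members in candidates.items():
    names = [(m["INDEX_NAME"], m["INDEX_AREA_NAME"]) for m in members]
    print(prop, start, end, names)
```

Only date ranges with two or more collections are returned, since a range with a single collection has nothing to consolidate.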
 
Steps for consolidating the collections:
  1. Taking the information from the Planning section, determine which collections share a date range and for which sets of collections it is worthwhile to consolidate into fewer collections.

    Calculate how many nearly full collections can be achieved by reindexing them together. To avoid having one remaining unpopulated collection, increase the "Index Maximum Object Count" or the "Index Maximum Size" setting by the appropriate amount to allow that data to fit within the existing collections. The Index Maximum Size value should be no more than 250000 MB. Take note of the existing value so that you can restore it at the end of this procedure.

    For each set of collections that share a date range, select the smallest collection: this collection is the destination collection for the reindexing and will stay in the "Open" state in ACCE. The rest of the collections are to be consolidated.
     
  2. Set the Collection Status of each of the other collections to be consolidated to "Closed" in ACCE (meaning they are closed for new content): locate the collection in the appropriate IndexArea's Collections tab, highlight it to verify that its date partition is the right one, then change the Collection Status radio button from "Open" to "Closed" and click Apply or OK.
     
  3. Submit an Index Job for each of the closed collections for this date range in ACCE:

    Locate the collection in the appropriate IndexArea's Collections tab, highlight it to verify that its date partition is the right one, then click the "Index for Content Search" button. A warning states that the indexing task could take some time to complete: click "Yes" and then acknowledge the next prompt, which confirms that an IndexJob was created.
     
  4. The Index Jobs can now be monitored in the ACCE Index Jobs Management window.

    Access the list of jobs under Index Jobs Manager in ACCE. P8 processes only one Index Job at a time, even though all jobs are visible and queued up for processing. As each job completes, its status changes to "Terminated Normally" and the next one begins.
     
  5. Trigger removal of the empty collections:

    Once all of the jobs complete, the original collections may still exist but with zero Indexed Object Count as shown in the ACCE Index View list. Check the list (pressing the Refresh button if needed) to verify they have zero Active Documents. Reindex each of the closed collections that have zero documents once more in ACCE (as was done in step 3) and again monitor the jobs to completion (as was done in step 4).

    Once the jobs complete this second time, all the "excess" collections are deleted by P8. To confirm this in ACCE, either "Refresh" the object store metadata display completely at the P8 Domain Level or exit and restart ACCE completely.
     
  6. If the value of the "Index Maximum Object Count" or the "Index Maximum Size" setting was increased in step 1, revert to the original value that was used.

Document Location

Worldwide


Document Information

Modified date:
13 October 2021

UID

ibm16497741