Rebuilding the index for InfoSphere Information Server clients

You can rebuild the index to ensure that assets from Information Governance Catalog, InfoSphere® Information Analyzer, and InfoSphere Metadata Asset Manager are updated and displayed in each client.

Before you begin

You must have the Suite Administrator role to complete this task.
If you plan to rebuild the index because assets are not synchronized, follow the instructions in this technote to troubleshoot indexing.

About this task

The index of assets is used during search operations and in the display of certain attributes, such as the number of business terms that are associated with an asset. You need to rebuild the index in the following situations.

If there are existing assets in the metadata repository when you install the clients, you must rebuild the index before you can view them in each client. If you are doing a new installation and no metadata was imported, it is not necessary to rebuild the index.
If the index becomes out-of-sync or corrupted, you must rebuild the index. This can happen when there is a loss of communication between one or more suite components or when one or more components are temporarily uninstalled or not working. If you notice that tables in the metadata repository are not visible in the clients, your index might be out-of-sync or corrupted.
If you run analysis on InfoSphere Information Analyzer data sets in the InfoSphere Information Analyzer workbench, you must rebuild the data set index before you can use that analysis information to search for data sets in the thin client.
If you import InfoSphere DataStage® assets by using InfoSphere DataStage, you must rebuild the index.
If you delete a high-level or parent asset in a hierarchy, you must rebuild the index. For example, rebuild the index after you delete a host asset that contains a database and a database schema.
If some asset types are not available in catalog search in Information Governance Catalog New after installation or import of assets, you must rebuild the index.

If your browser times out while rebuilding the index, you can set a higher time-out setting for your browser. Alternatively, you can send the request with the cURL command-line tool, which lets you set the maximum wait time yourself instead of relying on the browser.
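
As an alternative to raising the browser time-out, the request can be sent from a script that sets its own time-out. The following Python sketch assumes that the reindex URL from the Procedure section accepts HTTP basic authentication, which this topic does not confirm. The function names and the 60-second margin are illustrative choices, not product settings.

```python
import base64
import urllib.request


def build_reindex_request(url, user, password):
    """Build a GET request that carries basic-auth credentials (assumed scheme)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def client_timeout(max_wait_time=240, margin=60):
    """Client-side socket time-out in seconds, kept above the server-side maxWaitTime."""
    return max_wait_time + margin


def run_reindex(url, user, password, max_wait_time=240):
    """Send the reindex request and return the response text.

    The time-out applies to each blocking socket operation,
    not to the total duration of the reindex.
    """
    request = build_reindex_request(url, user, password)
    with urllib.request.urlopen(request, timeout=client_timeout(max_wait_time)) as response:
        return response.read().decode("utf-8", errors="replace")
```

If the services tier uses a certificate that the client does not trust, `urlopen` rejects the connection until the certificate is added to the client's trust store.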

Performance recommendations

The reindex operation runs either on the Information Server Enterprise Search host or on the services tier. It runs queries against the XMETA database and sends the results to the Solr microservice. Therefore, the performance of the reindex operation depends on factors such as the computing capacity (number of CPUs, speed, availability) of the services tier or Enterprise Search host and of the XMETA database, available memory, read and write speed, network speed, and Solr JVM settings.

For the optimal performance of the reindex operation, complete the following steps:

Do not run the reindex operation when the services tier, the Enterprise Search host, or the XMETA database is busy.
If you have many assets, such as hundreds of thousands or more, update the XMETA statistics:
- On Oracle 12c, run the following command as a system database administrator ('XMETA' is the name of the XMETA schema):
```
EXEC dbms_stats.gather_schema_stats('XMETA', cascade=>TRUE);
```
- On Db2, run the runstats command on all tables in the XMETA schema.
Make sure that the Solr JVM has at least 1 GB of memory assigned. By default, when Information Server Enterprise Search is installed, this requirement is met. If you use Shared Open Source, complete these steps:
- On Linux operating systems, open the <InfoSphere Shared Open Source install dir>/solr/install/bin/solr.in.sh file, where <InfoSphere Shared Open Source install dir> is by default /opt/IBM/InformationServer/shared-open-source/. In this file, change the line SOLR_JAVA_MEM="-Xms512m -Xmx512m" to:
```
SOLR_JAVA_MEM="-Xms512m -Xmx1024m"
```
  This line might be commented out in the file.
- On Windows operating systems, open the <InfoSphere Shared Open Source install dir>\solr\install\bin\solr.in.cmd file, where <InfoSphere Shared Open Source install dir> is by default C:\IBM\InformationServer\shared-open-source\. In this file, change the line set SOLR_JAVA_MEM=-Xms512m -Xmx512m to:
```
set SOLR_JAVA_MEM=-Xms512m -Xmx1024m
```
  This line might be commented out in the file.
If the services tier runs on IBM® WebSphere® Application Server Network Deployment, make sure that the Allow thread allocation beyond maximum thread size setting is enabled. For more information, see the Thread pool settings topic.
When the reindex of all assets is not required, use the assetType parameter to reindex only selected asset types.
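
The Solr memory check in the list above can be scripted. The following Python sketch reads the text of a solr.in.sh or solr.in.cmd file and reports the maximum heap from the last active SOLR_JAVA_MEM line. The function names are hypothetical, and the sketch assumes that commented-out lines start with a comment marker such as # or REM, so they are skipped.

```python
import re

# Matches an active (uncommented) SOLR_JAVA_MEM assignment in either file format.
LINE = re.compile(r'^\s*(?i:set\s+)?SOLR_JAVA_MEM=(.*)$')
XMX = re.compile(r'-Xmx(\d+)([kmgKMG])')
UNIT_TO_MB = {"k": 1 / 1024, "m": 1, "g": 1024}


def solr_max_heap_mb(config_text):
    """Return the -Xmx value in MB from the last active line, or None if there is none."""
    heap = None
    for line in config_text.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # commented-out or unrelated line
        xmx = XMX.search(match.group(1))
        if xmx:
            heap = int(xmx.group(1)) * UNIT_TO_MB[xmx.group(2).lower()]
    return heap


def meets_minimum(config_text, minimum_mb=1024):
    """True only if an active setting assigns at least minimum_mb of heap."""
    heap = solr_max_heap_mb(config_text)
    return heap is not None and heap >= minimum_mb
```

When no active line is found, `meets_minimum` returns False because the effective heap size cannot be determined from the file alone.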

Procedure

Close all browser instances.
Open a new supported browser.

Enter the following URL, which corresponds to the reindex REST API method for InfoSphere Information Server:

https://server_name:port_number/ibm/iis/common-utils/rest/v1/app/reindex?batchSize=100&solrBatchSize=100

Notes:

This command contains default values for the batchSize and solrBatchSize parameters. You can use other values that are described in the following table.
If you are currently running tasks that affect product performance, for example, running analysis or importing data, consider running the reindex task later. Reindexing greatly affects performance, and running it when the system is already busy might cause the reindex to fail.

Table 1. Parameters for reindex REST API method.
| Parameter | Description | Sample value |
| --- | --- | --- |
| server_name | The name or IP address of the services tier computer. | localhost |
| port_number | The port number. The default port number for HTTPS is 9443. | 9443 |
| batchSize | The batch size to retrieve information from the database. Increasing this size may improve performance but there is a possibility of reindex failure. The default is 100. The maximum value is 10000. | 100 |
| solrBatchSize | The batch size to use for Solr indexing. Increasing this size might improve performance. The default is 100. The maximum value is 10000. When you run reindex on a system with Information Server Enterprise Search installed, values over 2000 might cause the reindex to fail. If you use the default product configuration, the optimal value for this parameter is 500. | 100 |
| maxWaitTime | The maximum wait time to process a batch of assets or data sets. The default is 240 seconds (4 minutes). This parameter can be increased if the browser timeout setting is higher. Some browsers allow you to extend or disable the timeout setting. | 240 |
| assetType | Specifies one or more comma separated asset types. Use the updateIndex parameter and specify false to list all supported asset types. | Category,Term,Information Governance Rule,Information Governance Policy |
| excludeAssetType | Specifies one or more comma separated asset types to exclude. | Category,Term,Information Governance Rule,Information Governance Policy |
| start | Specifies if you want indexing to resume from a starting point after a failure. This parameter is applicable to any single asset type. | Any value between 1 and the corresponding number of assets. |
| updateIndex | When set to `false`, this parameter lists supported asset types and their counts. This parameter is only supported when using the ibm/iis/common-utils/rest/v1/app/reindex URL. The default is true. When set to false all other parameters are ignored. | false |
| threadCount | Allows you to run a reindexing job in parallel. The default is 4. | 4 |
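
The parameters in Table 1 are passed as a query string, and asset type names can contain spaces that need to be encoded. The following Python sketch builds the reindex URL and rejects batch sizes above the documented maximum. The function name and the lower bound of 1 are assumptions made for illustration.

```python
from urllib.parse import urlencode, quote

REINDEX_PATH = "/ibm/iis/common-utils/rest/v1/app/reindex"
MAX_BATCH = 10000  # documented maximum for batchSize and solrBatchSize


def build_reindex_url(server_name, port_number=9443, batch_size=100,
                      solr_batch_size=100, asset_types=None, **extra):
    """Return the reindex URL for the given parameters."""
    for name, value in (("batchSize", batch_size),
                        ("solrBatchSize", solr_batch_size)):
        # The lower bound of 1 is an assumption; only the maximum is documented.
        if not 1 <= value <= MAX_BATCH:
            raise ValueError(f"{name} must be between 1 and {MAX_BATCH}")
    params = {"batchSize": batch_size, "solrBatchSize": solr_batch_size}
    if asset_types:
        # Asset types are sent as one comma-separated value.
        params["assetType"] = ",".join(asset_types)
    params.update(extra)  # for example maxWaitTime=600 or threadCount=6
    # Encode spaces as %20 and keep commas readable.
    query = urlencode(params, quote_via=quote, safe=",")
    return f"https://{server_name}:{port_number}{REINDEX_PATH}?{query}"
```

For example, `build_reindex_url("localhost", asset_types=["Category", "Term"])` limits the reindex to two asset types while keeping the default batch sizes.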

Enter your credentials to start rebuilding the index.

A message is displayed when the reindexing is complete. This is the sample output if you specify the default values:

Fri Mar 30 11:23:35 PDT 2018: Reindexing assets.
Fri Mar 30 11:23:35 PDT 2018: Deleted all entries from "da-datasets" (Solr) index.
Fri Mar 30 11:23:35 PDT 2018: Number of assets of 'Application and File' type to index is 5.
Fri Mar 30 11:23:35 PDT 2018: Number of assets of 'Attribute' type to index is 60.
Fri Mar 30 11:23:35 PDT 2018: Number of assets of 'Attribute Type' type to index is 16.
...
Fri Mar 30 11:24:13 PDT 2018: Indexed 5 of 5 'XSD Element' asset(s).
Fri Mar 30 11:24:13 PDT 2018: Indexed 2 of 2 'XSD Element Group' asset(s).
Fri Mar 30 11:24:13 PDT 2018: Indexed 1 of 1 'XSD Attribute Group' asset(s).

Fri Mar 30 11:24:13 PDT 2018: Reindex summary: 

Fri Mar 30 11:23:40 PDT 2018: Indexed 5 of 5 'Application and File' asset(s).
Fri Mar 30 11:23:40 PDT 2018: Indexed 60 of 60 'Attribute' asset(s).
...
Fri Mar 30 11:24:13 PDT 2018: Indexed 5 of 5 'XSD Element' asset(s).
Fri Mar 30 11:24:13 PDT 2018: Indexed 2 of 2 'XSD Element Group' asset(s).
Fri Mar 30 11:24:13 PDT 2018: Reindex completed successfully.

Close the browser.
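
When the output is long, the result can be checked with a script instead of by reading it. The following Python sketch is based only on the line formats in the sample output above. The output of a failed reindex is not shown in this topic, so the handling of incomplete counts is an assumption, and the function name is hypothetical.

```python
import re

# Line format taken from the sample output: Indexed N of M 'Type' asset(s).
INDEXED = re.compile(r"Indexed (\d+) of (\d+) '(.+)' asset\(s\)\.")
SUCCESS_MARKER = "Reindex completed successfully."


def summarize_reindex_output(text):
    """Return (succeeded, incomplete), where incomplete maps asset type to (indexed, total)."""
    incomplete = {}
    for line in text.splitlines():
        match = INDEXED.search(line)
        if not match:
            continue
        indexed, total = int(match.group(1)), int(match.group(2))
        asset_type = match.group(3)
        if indexed < total:
            incomplete[asset_type] = (indexed, total)
        else:
            # A later complete line for the same type supersedes an earlier partial one.
            incomplete.pop(asset_type, None)
    return SUCCESS_MARKER in text, incomplete
```

Because the summary section repeats the per-type lines, each asset type is recorded once, by its most recent line.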

Example

For your reference, see the example values for the parameters and the performance results. Note that the results are only an example, and even with the same system configuration the results might differ depending on asset types, the number of assets, relationships between assets, and types of servers.

Test on InfoSphere Information Server with Enterprise Search and Db2

System configuration:

InfoSphere Information Server version 11.7.1
48 processors (@2.50GHz, 12 cores)
125 GB RAM
Red Hat Enterprise Linux Server, release 7.4

Properties values:

batchSize: 1000
solrBatchSize: 1000
threadCount: 4 (default)
maxWaitTime: 240 (default)
Solr heap size: 1 G (default)

Results: Total asset count is around 12 million and the reindex operation takes about 99 minutes.

Test on InfoSphere Information Server without Enterprise Search and with Oracle 12c

System configuration:

InfoSphere Information Server version 11.7.1
48 processors (@2.50GHz, 12 cores)
125 GB RAM
Red Hat Enterprise Linux Server, release 7.4

Properties values:

batchSize: 2000
solrBatchSize: 500
threadCount: 6
maxWaitTime: 600
Solr heap size: 10 G

Results: Total asset count is around 8 million and the reindex operation takes about 45 minutes.

General recommendations

In general, when the services tier or Enterprise Search host and the XMETA database are configured correctly and are not busy, you can use the following values for the reindex operation:

Installation with Enterprise Search

batchSize=500
solrBatchSize=500
maxWaitTime=600

Installation without Enterprise Search

batchSize=6000
solrBatchSize=3000
threadCount=12 or 16
maxWaitTime=600
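
The recommended values above can be turned into a query string for the reindex URL. The following Python sketch is one way to do that. The function name is hypothetical, and using 12 as the starting thread count for installations without Enterprise Search is an arbitrary pick between the two recommended values.

```python
from urllib.parse import urlencode

# Values copied from the general recommendations; keyed by "Enterprise Search installed".
RECOMMENDED = {
    True: {"batchSize": 500, "solrBatchSize": 500, "maxWaitTime": 600},
    False: {"batchSize": 6000, "solrBatchSize": 3000,
            "threadCount": 12, "maxWaitTime": 600},
}


def recommended_query(enterprise_search, thread_count=None):
    """Return the query string with the recommended values for the installation type."""
    params = dict(RECOMMENDED[bool(enterprise_search)])
    if thread_count is not None:
        params["threadCount"] = thread_count  # for example 16 instead of 12
    return urlencode(params)
```

Append the result to the reindex URL after the question mark, and adjust the values if the reindex fails or the system is under load.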