Data Cataloging

The Data Cataloging service is modern metadata management software that provides data insight for exabyte-scale heterogeneous file, object, backup, and archive storage, both on premises and in the cloud. It helps you manage unstructured data by reducing data storage costs, uncovering hidden data value, and reducing the risk associated with massive data stores.

Before you begin

Meet the system requirements to install the Data Cataloging service.
Important: Support for Data Cataloging service deployments on Linux on IBM zSystems and IBM Power Systems is available only on OpenShift® Container Platform 4.12.x, not on 4.13.
The following details are a baseline for determining the resources that are needed for an IBM Storage Fusion Data Cataloging service deployment. Use the following tables to estimate the resources based on the approximate number of files in your workload. The resource values are calculated per compute node. You must have at least two worker nodes, each with the same amount of resources available.
Compute nodes: IBM Storage Fusion Data Cataloging service recommends at least two compute nodes. The resources available on the compute nodes directly affect installation and performance. The following tables show three resource profiles for nodes that are dedicated to the Data Cataloging service:
Table 1. Starter profile requirements
	CPU	RAM	Disk space	Network	Storage	Workload
Per worker node	16	32 GB	120 GB	10 GB	500 GB	50 M

Table 2. Middle profile requirements
	CPU	RAM	Disk space	Network	Storage	Workload
Per worker node	34	64 GB	120 GB	10 GB	2.4 TB	1 B

Table 3. Large profile requirements
	CPU	RAM	Disk space	Network	Storage	Workload
Per worker node	380	814 GB	120 GB	10 GB	21.4 TB	20 B
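
As a rough illustration of how the three profiles might be used for sizing, the following Python sketch picks the smallest profile that covers an estimated file count. The values are copied from Tables 1 to 3. The names `Profile`, `PROFILES`, and `pick_profile` are hypothetical, and the sketch assumes that the Workload column is an upper bound on the number of files; neither is part of the product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Profile:
    name: str
    cpu: int            # CPU value per worker node, as listed in the tables
    ram_gb: int         # RAM per worker node, in GB
    storage_tb: float   # Storage per worker node, in TB (500 GB = 0.5 TB)
    max_workload: int   # Workload column: 50 M, 1 B, 20 B


# Values copied from Tables 1-3, ordered from smallest to largest.
PROFILES = [
    Profile("Starter", 16, 32, 0.5, 50_000_000),
    Profile("Middle", 34, 64, 2.4, 1_000_000_000),
    Profile("Large", 380, 814, 21.4, 20_000_000_000),
]


def pick_profile(expected_files: int) -> Optional[Profile]:
    """Return the smallest profile whose workload covers expected_files.

    Assumption: the Workload column is an upper bound on the file count.
    Returns None when the estimate exceeds the Large profile.
    """
    for profile in PROFILES:
        if expected_files <= profile.max_workload:
            return profile
    return None


if __name__ == "__main__":
    chosen = pick_profile(200_000_000)
    if chosen is not None:
        print(f"{chosen.name}: {chosen.cpu} CPU, {chosen.ram_gb} GB RAM per worker node")
```

The result describes one worker node; every worker node in the deployment needs the same amount of resources.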

The standard deployment of the Data Cataloging service project has the following requests and limits:

Table 4. OpenShift Container Platform requests and limits
Custom resources	Limits
CPU requests	18190 Minimum
CPU limits	96600 Minimum
Memory requests	44140 Minimum
Memory limits	172700 Minimum
Storage	500 GB Minimum

Important: For the Data Cataloging service to run successfully on all platforms, ensure that the storage classes have the following attributes:
- ReadWriteMany (RWX) access mode
- volumeBindingMode set to Immediate
- allowVolumeExpansion set to true
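
The attribute checks above can be sketched in Python as follows. This is a minimal illustration, not a supported tool: `storage_class_problems` is a hypothetical helper that takes a StorageClass manifest already loaded as a dictionary. Because ReadWriteMany support depends on the storage provisioner and is not a field of the StorageClass object, the sketch assumes the caller supplies that answer in the `supports_rwx` argument.

```python
def storage_class_problems(storage_class: dict, supports_rwx: bool) -> list:
    """List the ways a storage class misses the three required attributes.

    storage_class: a StorageClass manifest loaded as a dict.
    supports_rwx:  supplied by the caller (assumption), because RWX support
                   is a property of the provisioner, not a manifest field.
    """
    problems = []

    if not supports_rwx:
        problems.append("ReadWriteMany (RWX) is not supported")

    # Kubernetes treats an unset volumeBindingMode as Immediate.
    mode = storage_class.get("volumeBindingMode", "Immediate")
    if mode != "Immediate":
        problems.append(f"volumeBindingMode is {mode}, expected Immediate")

    # Kubernetes treats an unset allowVolumeExpansion as false.
    if storage_class.get("allowVolumeExpansion", False) is not True:
        problems.append("allowVolumeExpansion is not set to true")

    return problems


if __name__ == "__main__":
    example = {
        "metadata": {"name": "example-storage-class"},  # hypothetical name
        "volumeBindingMode": "WaitForFirstConsumer",
        "allowVolumeExpansion": True,
    }
    for problem in storage_class_problems(example, supports_rwx=True):
        print(problem)
```

An empty list means that the storage class meets all three requirements.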
For troubleshooting information related to the installation of Data Cataloging, see Data Cataloging service issues.

About this task

Procedure

Go to the Services page in the IBM Storage Fusion user interface.
In the Available section, click the Data cataloging tile.
In the Data cataloging window, review the details of the service and click Install.
In the Install service message box, select a Storage class.

Important: To use Global Data Platform as the storage provider, it is recommended that you select the default storage class ibm-spectrum-fusion. To use Fusion Data Foundation, select the ocs-storagecluster-cephfs storage class. You can also use a custom storage class that meets the requirements.
Click Install. If the installation fails, review the downloaded logs to identify the cause of the failure and fix the issue. For more information about service issues in IBM Storage Fusion, see Troubleshooting installation and upgrade issues in IBM Storage Fusion services.

Validate the installation.

IBM Storage Fusion user interface:

After you enable the Data Cataloging service, you can view the service version and health status. From the ellipsis menu, you can download logs and view documentation. After the logs are collected successfully, a success notification is displayed. The notification disappears automatically after some time.

Table 5. Health states of the Data Cataloging service
State	Description
`Installing`	Service installation is in progress
`Upgrading`	Service upgrade is in progress
`Healthy`	Service is healthy
`Degraded`	Service is not healthy
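
If you track the service state in a script, the four health states can be modeled as a small enumeration. The following Python sketch is illustrative only: `HealthState`, `DESCRIPTIONS`, and `is_in_progress` are hypothetical names, and the sketch assumes that you read the state label from the user interface yourself; it does not query IBM Storage Fusion.

```python
from enum import Enum


class HealthState(Enum):
    INSTALLING = "Installing"
    UPGRADING = "Upgrading"
    HEALTHY = "Healthy"
    DEGRADED = "Degraded"


# Descriptions copied from Table 5.
DESCRIPTIONS = {
    HealthState.INSTALLING: "Service installation is in progress",
    HealthState.UPGRADING: "Service upgrade is in progress",
    HealthState.HEALTHY: "Service is healthy",
    HealthState.DEGRADED: "Service is not healthy",
}


def is_in_progress(state: HealthState) -> bool:
    """Return True while an installation or upgrade is still running."""
    return state in (HealthState.INSTALLING, HealthState.UPGRADING)


if __name__ == "__main__":
    # The label is assumed to be copied by hand from the user interface.
    state = HealthState("Installing")
    print(DESCRIPTIONS[state], "| in progress:", is_in_progress(state))
```

Constructing `HealthState` from a label that is not in Table 5 raises `ValueError`, which makes an unexpected state visible instead of silently ignoring it.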