Data Cataloging

The Data Cataloging service is metadata management software that provides data insight for exabyte-scale heterogeneous file, object, backup, and archive storage on premises and in the cloud. It helps you manage unstructured data by reducing data storage costs, uncovering hidden data value, and reducing the risk associated with massive data stores.

Before you begin

  • Meet the system requirements to install the Data Cataloging service.
  • Important: Support for Data Cataloging service deployments on Linux on IBM zSystems and IBM Power Systems is available only on OpenShift® Container Platform 4.12.x, not on 4.13.
  • The following details are a baseline for estimating the resources that an IBM Storage Fusion Data Cataloging service deployment needs. Use the following tables to estimate the resources based on the approximate number of files. The resource values are calculated per compute node. You must have at least two worker nodes, each with the same amount of resources available.
  • Compute nodes: At least two compute nodes are recommended for the IBM Storage Fusion Data Cataloging service. The resources available on the compute nodes directly affect installation and performance. The following tables show three resource profiles for nodes that are dedicated to the Data Cataloging service:
    Table 1. Starter profile requirements
                      CPU   RAM      Disk space   Network   Storage   Workload
    Per worker node   16    32 GB    120 GB       10 GB     500 GB    50 M
    Table 2. Middle profile requirements
                      CPU   RAM      Disk space   Network   Storage   Workload
    Per worker node   34    64 GB    120 GB       10 GB     2.4 TB    1 B
    Table 3. Large profile requirements
                      CPU   RAM      Disk space   Network   Storage   Workload
    Per worker node   380   814 GB   120 GB       10 GB     21.4 TB   20 B
  • The standard deployment of the Data Cataloging service project has the following requests and limits:
    Table 4. OpenShift Container Platform requests and limits
    Custom resources   Limits
    CPU requests       18190 Minimum
    CPU limits         96600 Minimum
    Memory requests    44140 Minimum
    Memory limits      172700 Minimum
    Storage            500 GB Minimum
  • Important: For the Data Cataloging service to run successfully on all platforms, ensure that the storage classes have the following attributes:
    • Support for the ReadWriteMany (RWX) access mode
    • volumeBindingMode set to Immediate
    • allowVolumeExpansion set to true
  • Review the troubleshooting information for the installation of Data Cataloging. See Data Cataloging service issues.
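
The profile tables above can be turned into a small lookup that suggests a profile for an expected number of files. The following is a minimal sketch, not part of the product: the names Profile, PROFILES, and pick_profile are hypothetical, and it assumes that the Workload column is an approximate file count where M means million and B means billion.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Profile:
    """One row of the profile tables; values are per worker node."""
    name: str
    cpu: int
    ram: str
    storage: str
    max_files: int  # Assumption: Workload column read as a file count


# Values copied from Tables 1 to 3, ordered from smallest to largest.
PROFILES = [
    Profile("Starter", 16, "32 GB", "500 GB", 50_000_000),
    Profile("Middle", 34, "64 GB", "2.4 TB", 1_000_000_000),
    Profile("Large", 380, "814 GB", "21.4 TB", 20_000_000_000),
]


def pick_profile(expected_files: int) -> Optional[Profile]:
    """Return the smallest profile whose workload covers expected_files.

    Returns None when the count exceeds the largest documented profile.
    """
    for profile in PROFILES:
        if expected_files <= profile.max_files:
            return profile
    return None


if __name__ == "__main__":
    chosen = pick_profile(200_000_000)
    if chosen is not None:
        print(chosen.name, chosen.cpu, chosen.ram, chosen.storage)
```

Because the tables give per-node values and at least two worker nodes are required, treat the result as a starting point for each node rather than a cluster total.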

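The storage class attributes listed in the prerequisites can be checked before you install. The following is a minimal sketch that inspects a StorageClass manifest already loaded as a Python dictionary (for example, from exported YAML or JSON). The function name storage_class_problems and the supports_rwx parameter are hypothetical. RWX support depends on the storage provisioner and is not recorded on the StorageClass object itself, so the caller supplies it.

```python
from typing import List


def storage_class_problems(storage_class: dict, supports_rwx: bool) -> List[str]:
    """Return a list of unmet requirements; an empty list means all checks pass."""
    problems = []

    # RWX capability comes from the provisioner, so it is passed in
    # by the caller rather than read from the manifest (assumption).
    if not supports_rwx:
        problems.append("ReadWriteMany (RWX) access mode is not supported")

    # Kubernetes treats an unset volumeBindingMode as Immediate.
    binding_mode = storage_class.get("volumeBindingMode", "Immediate")
    if binding_mode != "Immediate":
        problems.append(f"volumeBindingMode is {binding_mode}, expected Immediate")

    # An unset allowVolumeExpansion means expansion is not allowed.
    if storage_class.get("allowVolumeExpansion") is not True:
        problems.append("allowVolumeExpansion is not set to true")

    return problems


if __name__ == "__main__":
    example = {
        "kind": "StorageClass",
        "metadata": {"name": "example-storage-class"},  # hypothetical name
        "volumeBindingMode": "WaitForFirstConsumer",
        "allowVolumeExpansion": True,
    }
    for problem in storage_class_problems(example, supports_rwx=True):
        print(problem)
```
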
About this task

Procedure

  1. Go to the Services page in the IBM Storage Fusion user interface.
  2. In the Available section, click the Data Cataloging tile.
  3. In the Data Cataloging window, review the details of the service and click Install.
  4. In the Install service message box, select a Storage class.
    Important: To use Global Data Platform as the storage provider, select the default storage class ibm-spectrum-fusion (recommended). To use Fusion Data Foundation, select the ocs-storagecluster-cephfs storage class. You can also use a custom storage class that meets the storage class requirements.
  5. Click Install. If the installation fails, review the downloaded logs to identify the cause of the failure and fix the issue. For more information about service issues in IBM Storage Fusion, see Troubleshooting installation and upgrade issues in IBM Storage Fusion services.
  6. Validate the installation.
    • IBM Storage Fusion user interface:

      After you enable the Data Cataloging service, you can view the service version and health status. From the ellipsis menu, you can download logs and view documentation. After the logs are collected successfully, a success notification is displayed. The notification disappears automatically after some time.

      Table 5. Health states of the Data Cataloging service
      State        Description
      Installing   Service installation is in progress
      Upgrading    Service upgrade is in progress
      Healthy      Service is healthy
      Degraded     Service is not healthy
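
If you script around the health status, the states in Table 5 can be modeled as an enumeration. The following is a minimal sketch under stated assumptions: the names HealthState, DESCRIPTIONS, parse_state, and is_in_progress are hypothetical, and the sketch only interprets a state label that you have already read from the user interface. It does not query IBM Storage Fusion.

```python
from enum import Enum


class HealthState(Enum):
    """Health states as listed in Table 5."""
    INSTALLING = "Installing"
    UPGRADING = "Upgrading"
    HEALTHY = "Healthy"
    DEGRADED = "Degraded"


# Descriptions copied from Table 5.
DESCRIPTIONS = {
    HealthState.INSTALLING: "Service installation is in progress",
    HealthState.UPGRADING: "Service upgrade is in progress",
    HealthState.HEALTHY: "Service is healthy",
    HealthState.DEGRADED: "Service is not healthy",
}


def parse_state(label: str) -> HealthState:
    """Convert a state label to a HealthState; raises ValueError if unknown."""
    return HealthState(label.strip())


def is_in_progress(state: HealthState) -> bool:
    """True while an installation or upgrade is still running."""
    return state in (HealthState.INSTALLING, HealthState.UPGRADING)


if __name__ == "__main__":
    state = parse_state("Upgrading")
    print(state.value, "-", DESCRIPTIONS[state])
    print("in progress:", is_in_progress(state))
```

A validation script would typically wait while is_in_progress returns True and treat Degraded as a reason to download the logs.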