Determining which IBM Cloud Pak for Data components to install

IBM Cloud Pak for Data is composed of numerous components so that you can install the specific services that support your needs. Before you install Cloud Pak for Data, determine which components you need to install to support your business requirements.

Installation phase
  • Setting up a client workstation
  • Setting up a cluster
  • Collecting required information (you are here)
  • Preparing to run installs in a restricted network
  • Preparing to run installs from a private container registry
  • Preparing the cluster for Cloud Pak for Data
  • Preparing to install an instance of Cloud Pak for Data
  • Installing an instance of Cloud Pak for Data
  • Setting up the Cloud Pak for Data control plane
  • Installing solutions and services
Who needs to complete this task?

Cloud Pak for Data operations team
The IBM Cloud Pak for Data operations team should determine which components will be installed on the cluster.

When do you need to complete this task?

Complete this task before you complete any of the following tasks:

  • Mirror images to a private container registry.
  • Prepare your cluster for Cloud Pak for Data.
  • Install Cloud Pak for Data software on your cluster.

Repeat as needed
You might need to repeat this task if you plan to deploy multiple instances of Cloud Pak for Data, especially if the instances will be installed on different clusters or will include different services.

Before you begin

Before you determine which components you plan to install, review the following guidance:

Guidance for clusters with multiple instances of Cloud Pak for Data
Some components, such as the scheduling service, are cluster-wide resources that can be installed exactly once per cluster. If you are installing multiple instances of Cloud Pak for Data on the same cluster, it is strongly recommended that you install these cluster-wide components at the same release as the latest version of Cloud Pak for Data on the cluster.

For example, if you have one instance of Cloud Pak for Data at Version 5.0.0 and one instance of Cloud Pak for Data at Version 5.0.3, install the cluster-wide components at Version 5.0.3.
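The version guidance above can be sketched in a few lines of Python. This is only an illustration: the helper names `parse_version` and `cluster_wide_target` are hypothetical, and the sketch assumes that versions are plain dotted numbers such as 5.0.3.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '5.0.3' into (5, 0, 3)."""
    return tuple(int(part) for part in version.split("."))


def cluster_wide_target(instance_versions: list[str]) -> str:
    """Return the highest instance version on the cluster.

    Hypothetical helper: the result is the release at which the
    cluster-wide components are recommended to be installed.
    """
    if not instance_versions:
        raise ValueError("at least one instance version is required")
    return max(instance_versions, key=parse_version)


# Two instances on one cluster, as in the example above.
print(cluster_wide_target(["5.0.0", "5.0.3"]))  # prints 5.0.3
```

Comparing parsed tuples rather than raw strings avoids ranking a version such as 5.0.10 below 5.0.9.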

Version support
Important: All of the components that are associated with an instance of Cloud Pak for Data must be installed at the same version. You cannot install services from different releases of Cloud Pak for Data in the same instance.
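If you keep your installation plan in a script or spreadsheet export, a small check can catch a mixed-release plan before you start. The following is a minimal sketch, assuming the plan is a mapping from component ID to the version you intend to install. The function name `check_single_release` and the plan format are assumptions for illustration and are not part of the cpd-cli.

```python
def check_single_release(component_versions: dict[str, str]) -> None:
    """Raise ValueError if the planned components span more than one release."""
    releases = set(component_versions.values())
    if len(releases) > 1:
        raise ValueError(
            "components in one instance must share a version, found: "
            + ", ".join(sorted(releases))
        )


# Hypothetical plan for a single instance; the keys are component IDs.
plan = {"wkc": "5.0.3", "ws": "5.0.3", "dv": "5.0.3"}
check_single_release(plan)  # passes silently because every entry is 5.0.3
```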
Options for installing components

You have two options for installing the components:

Install each component individually
  Benefits: If you feel more comfortable running installs one at a time, this option gives you more granular control over the installation process.
  Drawbacks: You must complete more steps to successfully install the software on your environment. This method also takes longer to complete.

Install multiple components at the same time
  Benefits: You can:
  • Complete the installation in fewer steps.
  • Run parallel installations of some components, which reduces installation time.
  Drawbacks: There are no known drawbacks associated with this option.

If you encounter an issue when installing a specific component, the cpd-cli gives you the option to resume your install from the point of failure.

About this task

The components are grouped into the following categories:
Cluster-wide components
These components can be installed exactly once on the cluster and are shared by all instances of Cloud Pak for Data on the cluster. Some of these components are optional but strongly recommended.
Instance prerequisites
These components must be installed each time you install an instance of Cloud Pak for Data:
  • IBM Cloud Pak foundational services
  • IBM Cloud Pak for Data
Services
Services are the components that enable you to complete specific tasks. Install the services that support your use case.

Procedure

To determine which components to install:

  1. Identify which cluster-wide components to install on the cluster:
    Certificate manager
    A certificate manager is required.

    The IBM Cloud Pak foundational services Certificate manager is recommended over the Red Hat® OpenShift® cert-manager or a community certificate manager. However, if the cpd-cli detects another certificate manager on the cluster, it will not install the IBM Cloud Pak foundational services Certificate manager.

    License Service
    The IBM Cloud Pak foundational services License Service is required.

    You are required to keep a record of the size of your deployments to report to IBM as requested. If you are using Container Licensing, you can use the License Service to measure Cloud Pak for Data usage.

    Scheduling service
    The scheduling service is required if you plan to use:
    • The quota enforcement feature in Cloud Pak for Data
    • The node scoring feature for pod placement
    • The Watson Machine Learning Accelerator service
    • Priority scheduling and co-scheduling in the Analytics Engine powered by Apache Spark service
    • Remote physical locations

      If you plan to use remote physical locations, you must install the scheduling service on the primary cluster and on the remote cluster where you plan to create the remote physical location.

    If none of these scenarios applies to you, the scheduling service is optional but strongly recommended.
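The scheduling service rule in step 1 can be expressed as a simple set check. This is a sketch only: the feature labels in `SCHEDULER_TRIGGERS` are hypothetical names chosen for this example and do not correspond to cpd-cli options or component IDs.

```python
# Hypothetical labels for the scenarios in step 1 that require the scheduling service.
SCHEDULER_TRIGGERS = {
    "quota_enforcement",
    "node_scoring",
    "wml_accelerator",
    "spark_priority_or_co_scheduling",
    "remote_physical_locations",
}


def scheduling_service_status(planned_features: set[str]) -> str:
    """Classify the scheduling service for a set of planned features."""
    if planned_features & SCHEDULER_TRIGGERS:
        return "required"
    return "optional but strongly recommended"


print(scheduling_service_status({"node_scoring"}))  # prints required
print(scheduling_service_status(set()))  # prints optional but strongly recommended
```

If `remote_physical_locations` is among the planned features, remember that the requirement applies to both the primary cluster and the remote cluster.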

  2. Identify which services you plan to install in the Cloud Pak for Data instance.
    If you are installing the services required to support a specific solution, see the appropriate section for the solution:
    • Business Analytics solution
    • Customer Care solution
    • Data Fabric solution
    • Data Management

    If you are designing your own solution, see the All services section.

    Business Analytics solution

    The Business Analytics solution supports several use cases. The services that you install depend on the use cases that you want to implement:


    Business Intelligence
    Service Component ID
    Cognos Analytics cognos_analytics

    Planning, Budgeting, and Forecasting
    Service Component ID
    Planning Analytics planning_analytics

    Customer Care solution

    The Customer Care solution supports several use cases. The services that you install depend on the use cases that you want to implement:


    Content Intelligence
    Service Component ID
    Watson Discovery watson_discovery

    Conversational AI
    Service Component ID
    watsonx Assistant watson_assistant

    Speech
    Service Component ID
    Watson Speech services watson_speech

    Data Fabric solution

    The Data Fabric solution supports several use cases. The services that you install depend on the use cases that you want to implement:


    Customer 360
    Service Component ID Notes
    Data Virtualization dv When you install Data Virtualization, the following service is automatically installed:
    • Db2 Data Management Console (dmc)
    IBM Knowledge Catalog wkc When you install IBM Knowledge Catalog, the following services are automatically installed:
    • Analytics Engine powered by Apache Spark (analyticsengine)
    • Data Refinery (datarefinery)

    If you plan to install the data quality feature, the service automatically installs the following service:

    • DataStage Enterprise (datastage_ent)

      If your cluster pulls images from a private container registry, ensure that you mirror the images for DataStage Enterprise (datastage_ent).

    IBM Match 360 with Watson match360  
    Optional components
    If you want to use dashboards to share analytics results, you can optionally install the Cognos Dashboards component:
    Service Component ID
    Cognos Dashboards dashboard

    Data Governance and Privacy
    Service Component ID Notes
    Data Privacy dp To install this service, you must install both of the following services:
    • Analytics Engine powered by Apache Spark (analyticsengine), which is automatically installed by IBM Knowledge Catalog.
    • IBM Knowledge Catalog (wkc)
    Data Virtualization dv When you install Data Virtualization, the following service is automatically installed:
    • Db2 Data Management Console (dmc)
    IBM Knowledge Catalog wkc When you install IBM Knowledge Catalog, the following services are automatically installed:
    • Analytics Engine powered by Apache Spark (analyticsengine)
    • Data Refinery (datarefinery)

    If you plan to install the data quality feature, the service automatically installs the following service:

    • DataStage Enterprise (datastage_ent)

      If your cluster pulls images from a private container registry, ensure that you mirror the images for DataStage Enterprise (datastage_ent).


    MLOps and Trustworthy AI
    Service Component ID Notes
    AI Factsheets factsheet To install this service, you must also install one or both of the following services:
    • IBM Knowledge Catalog (wkc)
    • Watson Studio (ws)
    IBM Knowledge Catalog wkc When you install IBM Knowledge Catalog, the following services are automatically installed:
    • Analytics Engine powered by Apache Spark (analyticsengine)
    • Data Refinery (datarefinery)

    If you plan to install the data quality feature, the service automatically installs the following service:

    • DataStage Enterprise (datastage_ent)

      If your cluster pulls images from a private container registry, ensure that you mirror the images for DataStage Enterprise (datastage_ent).

    Watson Machine Learning wml To use the experiment builder for AutoAI and Federated Learning, you must install the following service:
    • Watson Studio (ws)
    Watson OpenScale openscale To install this service, you must have an external database or you must install one of the following services:
    • Db2 (db2oltp)
    • Db2 Warehouse (db2wh)
    • EDB Postgres (edb_cp4d)
    Orchestration Pipelines ws_pipelines  
    Watson Studio ws When you install Watson Studio, the following services are automatically installed:
    • Data Refinery (datarefinery)
    • Watson Studio Runtimes (ws_runtimes)
    If you complete the following actions for Watson Studio, the Data Refinery objects and Watson Studio Runtimes objects are automatically included:
    • Mirror images
    • Create OLM artifacts
    • Create the operands

    Multicloud Data Integration
    Service Component ID Notes
    Data Virtualization dv When you install Data Virtualization, the following service is automatically installed:
    • Db2 Data Management Console (dmc)
    DataStage Enterprise datastage_ent  
    IBM Knowledge Catalog wkc When you install IBM Knowledge Catalog, the following services are automatically installed:
    • Analytics Engine powered by Apache Spark (analyticsengine)
    • Data Refinery (datarefinery)

    If you plan to install the data quality feature, the service automatically installs the following service:

    • DataStage Enterprise (datastage_ent)

      If your cluster pulls images from a private container registry, ensure that you mirror the images for DataStage Enterprise (datastage_ent).

    Optional components
    If you want to use Quality Stages, such as the Address Verification stage, you can optionally upgrade from DataStage Enterprise to DataStage Enterprise Plus:
    Service Component ID
    DataStage Enterprise Plus datastage_ent_plus
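The use case tables for the Business Analytics, Customer Care, and Data Fabric solutions can be captured as a lookup so that you can derive one de-duplicated component list for several use cases. The following is a minimal sketch. The component IDs are copied from the tables above, the optional components (dashboard and datastage_ent_plus) are deliberately left out, and the names `USE_CASE_COMPONENTS` and `components_for` are hypothetical.

```python
# Component IDs per use case, copied from the solution tables above.
USE_CASE_COMPONENTS = {
    "Business Intelligence": ["cognos_analytics"],
    "Planning, Budgeting, and Forecasting": ["planning_analytics"],
    "Content Intelligence": ["watson_discovery"],
    "Conversational AI": ["watson_assistant"],
    "Speech": ["watson_speech"],
    "Customer 360": ["dv", "wkc", "match360"],
    "Data Governance and Privacy": ["dp", "dv", "wkc"],
    "MLOps and Trustworthy AI": [
        "factsheet", "wkc", "wml", "openscale", "ws_pipelines", "ws",
    ],
    "Multicloud Data Integration": ["dv", "datastage_ent", "wkc"],
}


def components_for(use_cases: list[str]) -> list[str]:
    """Return the component IDs for the use cases, without duplicates."""
    ordered: list[str] = []
    for use_case in use_cases:
        for component in USE_CASE_COMPONENTS[use_case]:
            if component not in ordered:
                ordered.append(component)
    return ordered


print(components_for(["Customer 360", "Multicloud Data Integration"]))
# prints ['dv', 'wkc', 'match360', 'datastage_ent']
```

Services that appear in more than one use case, such as Data Virtualization (dv) and IBM Knowledge Catalog (wkc), are listed only once in the result.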

    Data Management

    The Data Management solution includes various data storage options. Choose the components that support your business needs:


    Analytics data sources
    Service Component ID
    Data Virtualization dv
    Db2 Warehouse db2wh

    Transactional data sources
    Service Component ID
    Db2 db2oltp
    Informix informix_cp4d

    OEM data sources
    Service Component ID
    EDB Postgres edb_cp4d
    MongoDB mongodb_cp4d

    All services

    You can install a custom set of services based on your business needs.

    Service Component IDs Notes
    AI Factsheets factsheet To install this service, you must also install one or both of the following services:
    • IBM Knowledge Catalog (wkc)
    • Watson Studio (ws)
    Analytics Engine powered by Apache Spark analyticsengine This service is automatically installed if you install IBM Knowledge Catalog (wkc).

    However, you can install Analytics Engine powered by Apache Spark without installing IBM Knowledge Catalog.

    Cognos Analytics cognos_analytics  
    Cognos Dashboards dashboard If you are upgrading from Version 4.5 or Version 4.6, the dashboard component replaces the cde component.
    Data Gate datagate To provision an instance of this service, you must install one of the following services:
    • Db2 (db2oltp)
    • Db2 Warehouse (db2wh)
    Data Privacy dp To install this service, you must install both of the following services:
    • Analytics Engine powered by Apache Spark (analyticsengine), which is automatically installed by IBM Knowledge Catalog.
    • IBM Knowledge Catalog (wkc)
    Data Product Hub dataproduct When you install Data Product Hub, the following services are automatically installed:
    • Analytics Engine powered by Apache Spark (analyticsengine)
    • Data Refinery (datarefinery)
    • IBM Knowledge Catalog (wkc)
    Data Refinery datarefinery This service is automatically installed if you install one of the following services:
    • Data Product Hub
    • IBM Knowledge Catalog
    • Watson Studio
    Important: Do not specify the datarefinery component to install or upgrade Data Refinery.

    The component is automatically installed and upgraded by the dataproduct, wkc or ws components.

    You cannot install or upgrade Data Refinery independently of these services.

    If you complete the following actions for Data Product Hub, IBM Knowledge Catalog, or Watson Studio, the Data Refinery objects are automatically included:
    • Mirror images
    • Create OLM artifacts
    • Create the operands
    Data Replication replication  
    DataStage Enterprise datastage_ent  
    DataStage Enterprise Plus datastage_ent_plus  
    Data Virtualization dv When you install Data Virtualization, the following service is automatically installed:
    • Db2 Data Management Console (dmc)
    Db2 db2oltp  
    Db2 Big SQL bigsql  
    Db2 Data Management Console dmc This service is automatically installed if you install Data Virtualization.
    Db2 Warehouse db2wh  
    Decision Optimization dods To install this service, you must install both of the following services:
    • Watson Machine Learning (wml)
    • Watson Studio (ws)
    EDB Postgres edb_cp4d, postgresql The postgresql component is automatically installed when you install the edb_cp4d component.
    Execution Engine for Apache Hadoop hee To install this service, you must install both of the following services:
    • Watson Machine Learning (wml)
    • Watson Studio (ws)
    IBM Knowledge Catalog wkc When you install IBM Knowledge Catalog, the following services are automatically installed:
    • Analytics Engine powered by Apache Spark (analyticsengine)
    • Data Refinery (datarefinery)

    If you plan to install the data quality feature, the service automatically installs the following service:

    • DataStage Enterprise (datastage_ent)

      If your cluster pulls images from a private container registry, ensure that you mirror the images for DataStage Enterprise (datastage_ent).

    IBM Knowledge Catalog Premium ikc_premium When you install IBM Knowledge Catalog Premium, the following services are automatically installed:
    • Analytics Engine powered by Apache Spark (analyticsengine)
    • Data Refinery (datarefinery)

    If you plan to install the data quality feature, the service automatically installs the following service:

    • DataStage Enterprise (datastage_ent)

      If your cluster pulls images from a private container registry, ensure that you mirror the images for DataStage Enterprise (datastage_ent).

    IBM Knowledge Catalog Standard ikc_standard When you install IBM Knowledge Catalog Standard, the following service is automatically installed:
    • Analytics Engine powered by Apache Spark (analyticsengine)
    IBM Match 360 with Watson match360  
    Informix informix_cp4d, informix The informix component is automatically installed when you install the informix_cp4d component.
    MANTA Automated Data Lineage mantaflow To install this service, you must install one of the following services:
    • IBM Knowledge Catalog
    • IBM Knowledge Catalog Premium
    • IBM Knowledge Catalog Standard
    MongoDB mongodb, mongodb_cp4d The mongodb component is automatically installed when you install the mongodb_cp4d component.
    OpenPages openpages  
    Orchestration Pipelines ws_pipelines  
    Planning Analytics planning_analytics  
    Product Master productmaster  
    RStudio® Server Runtimes rstudio To install this service, you must install the following service:
    • Watson Studio (ws)
    SPSS Modeler spss To install this service, you must install the following service:
    • Watson Studio (ws)
    Synthetic Data Generator syntheticdata  
    Voice Gateway voice_gateway To install this service, you must install the following services:
    • watsonx Assistant (watson_assistant)
    • Watson Speech to Text (watson_speech)
    • Watson Text to Speech (watson_speech)
    Watson Discovery watson_discovery  
    Watson Machine Learning wml To use the experiment builder for AutoAI and Federated Learning, you must install the following service:
    • Watson Studio (ws)
    Watson Machine Learning Accelerator wml_accelerator To install this service, you must install the following cluster-wide component:
    • Scheduling service (scheduler)
    Watson OpenScale openscale To install this service, you must have an external database or you must install one of the following services:
    • Db2 (db2oltp)
    • Db2 Warehouse (db2wh)
    • EDB Postgres (edb_cp4d)
    Watson Speech services watson_speech  
    Watson Studio ws When you install Watson Studio, the following services are automatically installed:
    • Data Refinery (datarefinery)
    • Watson Studio Runtimes (ws_runtimes)
    If you complete the following actions for Watson Studio, the Data Refinery objects and Watson Studio Runtimes objects are automatically included:
    • Mirror images
    • Create OLM artifacts
    • Create the operands
    Watson Studio Runtimes ws_runtimes

    The default runtime is automatically installed or upgraded when you install or upgrade Watson Studio.

    Upgrades
    If you want to upgrade all existing runtimes automatically when you upgrade Watson Studio, specify the ws_runtimes component when you upgrade Watson Studio.

    If you do not specify the ws_runtimes component when you upgrade Watson Studio, only the default runtime is upgraded. You must upgrade the non-default runtimes manually.

    Fresh installations
    Do not specify the ws_runtimes component when you install Watson Studio.

    The default runtime is automatically installed when you install Watson Studio.

    If you want to use non-default runtimes on your environment, you must install them individually.

    For details on how to install or upgrade non-default runtimes, see Watson Studio Runtimes.

    If you complete the following actions for Watson Studio, the Watson Studio Runtimes objects are automatically included:
    • Mirror images
    • Create OLM artifacts
    watsonx.ai watsonx_ai When you install watsonx.ai, the following services are automatically installed:
    • Watson Studio (ws)
    • Watson Machine Learning (wml)
    watsonx Assistant watson_assistant  
    watsonx Code Assistant for Red Hat Ansible® Lightspeed wca_ansible When you install watsonx Code Assistant for Red Hat Ansible Lightspeed, the following service is automatically installed:
    • watsonx.ai (watsonx_ai)
    watsonx Code Assistant for Z wca_z  
    watsonx Code Assistant for Z Code Explanation wca_z_ce  
    watsonx.data watsonx_data When you install watsonx.data, the following service is automatically installed:
    • Analytics Engine powered by Apache Spark (analyticsengine)
    watsonx.governance watsonx_governance When you install watsonx.governance, the following service is automatically installed:
    • Watson Machine Learning (wml)
    watsonx Orchestrate watsonx_orchestrate When you install watsonx Orchestrate, the following service is automatically installed:
    • watsonx Assistant (watson_assistant)
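The notes in the All services table describe two kinds of relationships: components that are installed automatically with a service, and services that must be installed first. The following sketch models a subset of those notes so that you can check a planned component list. The table and function names are hypothetical, the data is transcribed from the notes above and is not read from the cpd-cli, and some rows are left out on purpose (for example Watson OpenScale, which can use an external database instead, and Watson Machine Learning Accelerator, which depends on a cluster-wide component).

```python
# Components that are installed automatically with a service (from the Notes column).
AUTO_INSTALLED = {
    "dv": ["dmc"],
    "wkc": ["analyticsengine", "datarefinery"],
    "ikc_premium": ["analyticsengine", "datarefinery"],
    "ikc_standard": ["analyticsengine"],
    "dataproduct": ["analyticsengine", "datarefinery", "wkc"],
    "ws": ["datarefinery", "ws_runtimes"],
    "edb_cp4d": ["postgresql"],
    "informix_cp4d": ["informix"],
    "mongodb_cp4d": ["mongodb"],
    "watsonx_ai": ["ws", "wml"],
    "wca_ansible": ["watsonx_ai"],
    "watsonx_data": ["analyticsengine"],
    "watsonx_governance": ["wml"],
    "watsonx_orchestrate": ["watson_assistant"],
}

# Services that need every listed component.
REQUIRES_ALL = {
    "dp": ["analyticsengine", "wkc"],
    "dods": ["wml", "ws"],
    "hee": ["wml", "ws"],
    "rstudio": ["ws"],
    "spss": ["ws"],
    "voice_gateway": ["watson_assistant", "watson_speech"],
}

# Services that need at least one of the listed components.
REQUIRES_ANY = {
    "factsheet": ["wkc", "ws"],
    "datagate": ["db2oltp", "db2wh"],
    "mantaflow": ["wkc", "ikc_premium", "ikc_standard"],
}


def expand(selected: list[str]) -> set[str]:
    """Return the selected components plus everything installed automatically."""
    resolved: set[str] = set()
    pending = list(selected)
    while pending:
        component = pending.pop()
        if component in resolved:
            continue
        resolved.add(component)
        pending.extend(AUTO_INSTALLED.get(component, []))
    return resolved


def missing_prerequisites(selected: list[str]) -> dict[str, str]:
    """Map each selected component to a description of its unmet prerequisites."""
    resolved = expand(selected)
    problems: dict[str, str] = {}
    for component in selected:
        missing = [p for p in REQUIRES_ALL.get(component, []) if p not in resolved]
        if missing:
            problems[component] = "requires " + ", ".join(missing)
        alternatives = REQUIRES_ANY.get(component, [])
        if alternatives and not resolved.intersection(alternatives):
            problems[component] = "requires one of " + ", ".join(alternatives)
    return problems


print(sorted(expand(["wca_ansible"])))
print(missing_prerequisites(["dp"]))
print(missing_prerequisites(["dp", "wkc"]))
```

The expansion is only a planning aid. As the table notes, automatically installed components such as datarefinery should not be specified directly when you install or upgrade.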

What to do next

Now that you've determined which components to install, you're ready to complete Setting up installation environment variables.