Determining which IBM Cloud Pak for Data components to install

IBM Cloud Pak for Data is composed of numerous components so that you can install the specific services that support your needs. Before you install Cloud Pak for Data, determine which components you need to install to support your business requirements.

Installation phase

Setting up a client workstation
Setting up a cluster
Collecting required information
Preparing to run installs in a restricted network
Preparing to run installs from a private container registry
Preparing the cluster for Cloud Pak for Data
Preparing to install an instance of Cloud Pak for Data
Installing an instance of Cloud Pak for Data
Setting up the Cloud Pak for Data control plane
Installing solutions and services

Who needs to complete this task?

Cloud Pak for Data operations team: The IBM Cloud Pak for Data operations team should determine which components will be installed on the cluster.

When do you need to complete this task?

Complete this task before you complete any of the following tasks:

Mirror images to a private container registry.
Prepare your cluster for Cloud Pak for Data.
Install Cloud Pak for Data software on your cluster.

Repeat as needed: You might need to repeat this task if you plan to deploy multiple instances of Cloud Pak for Data, especially if the instances will be installed on different clusters or will include different services.

Before you begin

Before you determine which components you plan to install, review the following guidance:

Guidance for clusters with multiple instances of Cloud Pak for Data
Version support
Options for installing components

Guidance for clusters with multiple instances of Cloud Pak for Data

Some components, such as the scheduling service, are cluster-wide resources that can be installed exactly once per cluster. If you are installing multiple instances of Cloud Pak for Data on the same cluster, it is strongly recommended that you install these cluster-wide components at the same release as the latest version of Cloud Pak for Data on the cluster.

For example, if you have one instance of Cloud Pak for Data at Version 5.0.0 and one instance of Cloud Pak for Data at Version 5.0.3, install the cluster-wide components at Version 5.0.3.
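
The rule in this example can be sketched in a few lines of Python. This is an illustration only, not part of the `cpd-cli`; the function names are hypothetical, and the sketch assumes that versions are plain dotted numbers such as `5.0.3`.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '5.0.3' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def cluster_wide_target(instance_versions: list[str]) -> str:
    """Return the newest instance version on the cluster.

    The guidance above recommends installing cluster-wide components
    at this version.
    """
    if not instance_versions:
        raise ValueError("at least one instance version is required")
    return max(instance_versions, key=parse_version)


print(cluster_wide_target(["5.0.0", "5.0.3"]))  # 5.0.3
```

Comparing parsed tuples rather than raw strings avoids ordering mistakes when a version part has more than one digit.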

Version support

Important: All of the components that are associated with an instance of Cloud Pak for Data must be installed at the same version. You cannot install services from different releases of Cloud Pak for Data in the same instance.
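
If you keep your installation plan in a script or spreadsheet export, a quick consistency check can catch version mismatches before you start. The following Python sketch is hypothetical: the function name and the shape of the plan (a mapping of component ID to version) are assumptions made for illustration.

```python
def mismatched_components(instance_version: str,
                          component_versions: dict[str, str]) -> dict[str, str]:
    """Return the components whose planned version differs from the instance version."""
    return {
        component_id: version
        for component_id, version in component_versions.items()
        if version != instance_version
    }


# Hypothetical plan for one instance; the component IDs come from the tables in this topic.
planned = {"wkc": "5.0.3", "dv": "5.0.3", "ws": "5.0.0"}
print(mismatched_components("5.0.3", planned))  # {'ws': '5.0.0'}
```

An empty result means that every component in the plan matches the instance version.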

Options for installing components

You have two options for installing the components:

Option	Benefits	Drawbacks
Install each component individually.	If you feel more comfortable running installs one at a time, this option gives you more granular control over the installation process.	You must complete more steps to successfully install the software on your environment. This method also takes longer to complete.
Install multiple components at the same time.	You can complete the installation in fewer steps and run parallel installations of some components, which reduces installation time.	There are no known drawbacks associated with this option. If you encounter an issue when installing a specific component, the `cpd-cli` gives you the option to resume your install from the point of failure.

About this task

The components are grouped into the following categories:

Cluster-wide components

These components can be installed exactly once on the cluster and are shared by all instances of Cloud Pak for Data on the cluster. Some of these components are optional but strongly recommended.

Instance prerequisites

These components must be installed each time you install an instance of Cloud Pak for Data:

IBM Cloud Pak foundational services
IBM Cloud Pak for Data

Services

Services are the components that enable you to complete specific tasks. Install the services that support your use case.

Procedure

To determine which components to install:

Identify which cluster-wide components to install on the cluster:

Cluster-wide component	Notes
Certificate manager	A certificate manager is required. The IBM Cloud Pak foundational services Certificate manager is recommended over the Red Hat® OpenShift® `cert-manager` or a community certificate manager. However, if the `cpd-cli` detects another certificate manager on the cluster, it will not install the IBM Cloud Pak foundational services Certificate manager.
License Service	The IBM Cloud Pak foundational services License Service is required. You are required to keep a record of the size of your deployments to report to IBM as requested. If you are using Container Licensing, you can use the License Service to measure Cloud Pak for Data usage.
Scheduling service	The scheduling service is required if you plan to use any of the following: the quota enforcement feature in Cloud Pak for Data; the node scoring feature for pod placement; the Watson Machine Learning Accelerator service; priority scheduling and co-scheduling in the Analytics Engine powered by Apache Spark service; remote physical locations. If you plan to use remote physical locations, you must install the scheduling service on the primary cluster and on the remote cluster where you plan to create the remote physical location. If none of these scenarios applies to you, the scheduling service is optional but strongly recommended.
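
The scheduling service decision can be expressed as a simple check. The following Python sketch is illustrative only; the feature labels are hypothetical names invented for this example and are not identifiers that the `cpd-cli` recognizes.

```python
# Hypothetical labels for the scenarios that make the scheduling service required.
SCHEDULER_TRIGGERS = {
    "quota_enforcement",
    "node_scoring",
    "wml_accelerator",
    "spark_priority_or_co_scheduling",
    "remote_physical_locations",
}


def scheduler_requirement(planned_features: set[str]) -> str:
    """Classify the scheduling service as required or optional for a plan."""
    if planned_features & SCHEDULER_TRIGGERS:
        return "required"
    return "optional but strongly recommended"


print(scheduler_requirement({"wml_accelerator"}))  # required
print(scheduler_requirement(set()))  # optional but strongly recommended
```

If the plan includes remote physical locations, remember that the requirement applies to both the primary cluster and the remote cluster.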

Identify which services you plan to install in the Cloud Pak for Data instance.

If you are installing the services required to support a specific solution, see the appropriate section for the solution:

Business Analytics solution
Customer Care solution
Data Fabric solution
Data Management

If you are designing your own solution, see the All services section.

Business Analytics solution

The Business Analytics solution supports several use cases. The services that you install depend on the use cases that you want to implement:

Business Intelligence

Service	Component ID
Cognos Analytics	`cognos_analytics`

Planning, Budgeting, and Forecasting

Service	Component ID
Planning Analytics	`planning_analytics`

Customer Care solution

The Customer Care solution supports several use cases. The services that you install depend on the use cases that you want to implement:

Content Intelligence

Service	Component ID
Watson Discovery	`watson_discovery`

Conversational AI

Service	Component ID
watsonx Assistant	`watson_assistant`

Speech

Service	Component ID
Watson Speech services	`watson_speech`

Data Fabric solution

The Data Fabric solution supports several use cases. The services that you install depend on the use cases that you want to implement:

Customer 360

Service	Component ID	Notes
Data Virtualization	`dv`	When you install Data Virtualization, the following service is automatically installed: Db2 Data Management Console (`dmc`)
IBM Knowledge Catalog	`wkc`	When you install IBM Knowledge Catalog, the following services are automatically installed: Analytics Engine powered by Apache Spark (`analyticsengine`) and Data Refinery (`datarefinery`). If you plan to install the data quality feature, the service also automatically installs DataStage Enterprise (`datastage_ent`). If your cluster pulls images from a private container registry, ensure that you mirror the images for DataStage Enterprise (`datastage_ent`).
IBM Match 360 with Watson	`match360`
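
The private container registry note for IBM Knowledge Catalog is easy to miss, so it can help to encode it as a check in your planning scripts. The following Python sketch is hypothetical: the function name and its parameters are assumptions for illustration, and it covers only the DataStage Enterprise case described in the table.

```python
def extra_components_to_mirror(selected: set[str],
                               data_quality: bool,
                               private_registry: bool) -> set[str]:
    """Return component IDs whose images need mirroring in addition to the selected ones.

    Covers only the case in the table above: IBM Knowledge Catalog (wkc) with the
    data quality feature on a cluster that pulls from a private container registry.
    """
    if private_registry and data_quality and "wkc" in selected:
        # Nothing extra is needed if datastage_ent is already part of the plan.
        return {"datastage_ent"} - selected
    return set()


customer_360 = {"dv", "wkc", "match360"}
print(extra_components_to_mirror(customer_360, data_quality=True, private_registry=True))
# {'datastage_ent'}
```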

Optional components

If you want to use dashboards to share analytics results, you can optionally install the Cognos Dashboards component:

Service	Component ID
Cognos Dashboards	`dashboard`

Data Governance and Privacy

Service	Component ID	Notes
Data Privacy	`dp`	To install this service, you must install both of the following services: Analytics Engine powered by Apache Spark (`analyticsengine`), which is automatically installed by IBM Knowledge Catalog, and IBM Knowledge Catalog (`wkc`).
Data Virtualization	`dv`	When you install Data Virtualization, the following service is automatically installed: Db2 Data Management Console (`dmc`)
IBM Knowledge Catalog	`wkc`	When you install IBM Knowledge Catalog, the following services are automatically installed: Analytics Engine powered by Apache Spark (`analyticsengine`) and Data Refinery (`datarefinery`). If you plan to install the data quality feature, the service also automatically installs DataStage Enterprise (`datastage_ent`). If your cluster pulls images from a private container registry, ensure that you mirror the images for DataStage Enterprise (`datastage_ent`).

MLOps and Trustworthy AI

Service	Component ID	Notes
AI Factsheets	`factsheet`	To install this service, you must also install one or both of the following services: IBM Knowledge Catalog (`wkc`), Watson Studio (`ws`).
IBM Knowledge Catalog	`wkc`	When you install IBM Knowledge Catalog, the following services are automatically installed: Analytics Engine powered by Apache Spark (`analyticsengine`) and Data Refinery (`datarefinery`). If you plan to install the data quality feature, the service also automatically installs DataStage Enterprise (`datastage_ent`). If your cluster pulls images from a private container registry, ensure that you mirror the images for DataStage Enterprise (`datastage_ent`).
Watson Machine Learning	`wml`	To use the experiment builder for AutoAI and Federated Learning, you must install the following service: Watson Studio (`ws`)
Watson OpenScale	`openscale`	To install this service, you must have an external database or you must install one of the following services: Db2 (`db2oltp`), Db2 Warehouse (`db2wh`), or EDB Postgres (`edb_cp4d`).
Orchestration Pipelines	`ws_pipelines`
Watson Studio	`ws`	When you install Watson Studio, the following services are automatically installed: Data Refinery (`datarefinery`) and Watson Studio Runtimes (`ws_runtimes`). If you complete the following actions for Watson Studio, the Data Refinery objects and Watson Studio Runtimes objects are automatically included: mirror images, create OLM artifacts, and create the operands.
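
Some services in this table need at least one service from a set of alternatives. The following Python sketch shows one way to check a plan against those "one of" prerequisites. It is illustrative only: the function name and the `external_database` flag are hypothetical, and the mapping covers only AI Factsheets and Watson OpenScale as described in the table.

```python
# "At least one of" prerequisites taken from the MLOps and Trustworthy AI table.
ANY_OF = {
    "factsheet": {"wkc", "ws"},
    "openscale": {"db2oltp", "db2wh", "edb_cp4d"},
}


def unmet_prerequisites(selected: set[str],
                        external_database: bool = False) -> dict[str, set[str]]:
    """Return each selected service whose 'one of' prerequisite is not satisfied."""
    unmet = {}
    for component_id, options in ANY_OF.items():
        if component_id not in selected:
            continue
        # Watson OpenScale can use an external database instead of a database service.
        if component_id == "openscale" and external_database:
            continue
        if not selected & options:
            unmet[component_id] = options
    return unmet


plan = {"factsheet", "ws", "openscale"}
for component_id, options in unmet_prerequisites(plan).items():
    print(component_id, "needs one of", sorted(options))
```

With this plan, AI Factsheets is satisfied by Watson Studio, while Watson OpenScale is reported because the plan includes no database service and no external database.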

Multicloud Data Integration

Service	Component ID	Notes
Data Virtualization	`dv`	When you install Data Virtualization, the following service is automatically installed: Db2 Data Management Console (`dmc`)
DataStage Enterprise	`datastage_ent`
IBM Knowledge Catalog	`wkc`	When you install IBM Knowledge Catalog, the following services are automatically installed: Analytics Engine powered by Apache Spark (`analyticsengine`) and Data Refinery (`datarefinery`). If you plan to install the data quality feature, the service also automatically installs DataStage Enterprise (`datastage_ent`). If your cluster pulls images from a private container registry, ensure that you mirror the images for DataStage Enterprise (`datastage_ent`).

Optional components

If you want to use Quality Stages, such as the Address Verification stage, you can optionally upgrade from DataStage Enterprise to DataStage Enterprise Plus:

Service	Component ID
DataStage Enterprise Plus	`datastage_ent_plus`

Data Management

The Data Management solution includes various data storage options. Choose the components that support your business needs:

Analytics data sources

Service	Component ID
Data Virtualization	`dv`
Db2 Warehouse	`db2wh`

Transactional data sources

Service	Component ID
Db2	`db2oltp`
Informix	`informix_cp4d`

OEM data sources

Service	Component ID
EDB Postgres	`edb_cp4d`
MongoDB	`mongodb_cp4d`

All services

You can install a custom set of services based on your business needs.

Service	Component IDs	Notes
AI Factsheets	`factsheet`	To install this service, you must also install one or both of the following services: IBM Knowledge Catalog (`wkc`), Watson Studio (`ws`).
Analytics Engine powered by Apache Spark	`analyticsengine`	This service is automatically installed if you install IBM Knowledge Catalog (`wkc`). However, you can install Analytics Engine powered by Apache Spark without installing IBM Knowledge Catalog.
Cognos Analytics	`cognos_analytics`
Cognos Dashboards	`dashboard`	If you are upgrading from Version 4.5 or Version 4.6, the `dashboard` component replaces the `cde` component.
Data Gate	`datagate`	To provision an instance of this service, you must install one of the following services: Db2 (`db2oltp`) or Db2 Warehouse (`db2wh`).
Data Privacy	`dp`	To install this service, you must install both of the following services: Analytics Engine powered by Apache Spark (`analyticsengine`), which is automatically installed by IBM Knowledge Catalog, and IBM Knowledge Catalog (`wkc`).
Data Product Hub	`dataproduct`	When you install Data Product Hub, the following services are automatically installed: Analytics Engine powered by Apache Spark (`analyticsengine`), Data Refinery (`datarefinery`), and IBM Knowledge Catalog (`wkc`).
Data Refinery	`datarefinery`	This service is automatically installed if you install one of the following services: Data Product Hub, IBM Knowledge Catalog, or Watson Studio. Important: Do not specify the `datarefinery` component to install or upgrade Data Refinery. The component is automatically installed and upgraded by the `dataproduct`, `wkc`, or `ws` components. You cannot install or upgrade Data Refinery independently of these services. If you complete the following actions for Data Product Hub, IBM Knowledge Catalog, or Watson Studio, the Data Refinery objects are automatically included: mirror images, create OLM artifacts, and create the operands.
Data Replication	`replication`
DataStage Enterprise	`datastage_ent`
DataStage Enterprise Plus	`datastage_ent_plus`
Data Virtualization	`dv`	When you install Data Virtualization, the following service is automatically installed: Db2 Data Management Console (`dmc`)
Db2	`db2oltp`
Db2 Big SQL	`bigsql`
Db2 Data Management Console	`dmc`	This service is automatically installed if you install Data Virtualization.
Db2 Warehouse	`db2wh`
Decision Optimization	`dods`	To install this service, you must install both of the following services: Watson Machine Learning (`wml`) and Watson Studio (`ws`).
EDB Postgres	`edb_cp4d`, `postgresql`	The `postgresql` component is automatically installed when you install the `edb_cp4d` component.
Execution Engine for Apache Hadoop	`hee`	To install this service, you must install both of the following services: Watson Machine Learning (`wml`) and Watson Studio (`ws`).
IBM Knowledge Catalog	`wkc`	When you install IBM Knowledge Catalog, the following services are automatically installed: Analytics Engine powered by Apache Spark (`analyticsengine`) and Data Refinery (`datarefinery`). If you plan to install the data quality feature, the service also automatically installs DataStage Enterprise (`datastage_ent`). If your cluster pulls images from a private container registry, ensure that you mirror the images for DataStage Enterprise (`datastage_ent`).
IBM Knowledge Catalog Premium	`ikc_premium`	When you install IBM Knowledge Catalog Premium, the following services are automatically installed: Analytics Engine powered by Apache Spark (`analyticsengine`) and Data Refinery (`datarefinery`). If you plan to install the data quality feature, the service also automatically installs DataStage Enterprise (`datastage_ent`). If your cluster pulls images from a private container registry, ensure that you mirror the images for DataStage Enterprise (`datastage_ent`).
IBM Knowledge Catalog Standard	`ikc_standard`	When you install IBM Knowledge Catalog Standard, the following service is automatically installed: Analytics Engine powered by Apache Spark (`analyticsengine`)
IBM Match 360 with Watson	`match360`
Informix	`informix_cp4d`, `informix`	The `informix` component is automatically installed when you install the `informix_cp4d` component.
MANTA Automated Data Lineage	`mantaflow`	To install this service, you must install one of the following services: IBM Knowledge Catalog, IBM Knowledge Catalog Premium, or IBM Knowledge Catalog Standard.
MongoDB	`mongodb`, `mongodb_cp4d`	The `mongodb` component is automatically installed when you install the `mongodb_cp4d` component.
OpenPages	`openpages`
Orchestration Pipelines	`ws_pipelines`
Planning Analytics	`planning_analytics`
Product Master	`productmaster`
RStudio® Server Runtimes	`rstudio`	To install this service, you must install the following service: Watson Studio (`ws`)
SPSS Modeler	`spss`	To install this service, you must install the following service: Watson Studio (`ws`)
Synthetic Data Generator	`syntheticdata`
Voice Gateway	`voice_gateway`	To install this service, you must install the following services: watsonx Assistant (`watson_assistant`), Watson Speech to Text (`watson_speech`), and Watson Text to Speech (`watson_speech`).
Watson Discovery	`watson_discovery`
Watson Machine Learning	`wml`	To use the experiment builder for AutoAI and Federated Learning, you must install the following service: Watson Studio (`ws`)
Watson Machine Learning Accelerator	`wml_accelerator`	To install this service, you must install the following cluster-wide component: Scheduling service (`scheduler`)
Watson OpenScale	`openscale`	To install this service, you must have an external database or you must install one of the following services: Db2 (`db2oltp`), Db2 Warehouse (`db2wh`), or EDB Postgres (`edb_cp4d`).
Watson Speech services	`watson_speech`
Watson Studio	`ws`	When you install Watson Studio, the following services are automatically installed: Data Refinery (`datarefinery`) and Watson Studio Runtimes (`ws_runtimes`). If you complete the following actions for Watson Studio, the Data Refinery objects and Watson Studio Runtimes objects are automatically included: mirror images, create OLM artifacts, and create the operands.
Watson Studio Runtimes	`ws_runtimes`	The default runtime is automatically installed or upgraded when you install or upgrade Watson Studio. Upgrades: If you want to upgrade all existing runtimes automatically when you upgrade Watson Studio, specify the `ws_runtimes` component when you upgrade Watson Studio. If you do not specify the `ws_runtimes` component when you upgrade Watson Studio, only the default runtime is upgraded. You must upgrade the non-default runtimes manually. Fresh installations: Do not specify the `ws_runtimes` component when you install Watson Studio. The default runtime is automatically installed when you install Watson Studio. If you want to use non-default runtimes on your environment, you must install them individually. For details on how to install or upgrade non-default runtimes, see Watson Studio Runtimes. If you complete the following actions for Watson Studio, the Watson Studio Runtimes objects are automatically included: mirror images and create OLM artifacts.
watsonx.ai	`watsonx_ai`	When you install watsonx.ai, the following services are automatically installed: Watson Studio (`ws`) and Watson Machine Learning (`wml`).
watsonx Assistant	`watson_assistant`
watsonx Code Assistant for Red Hat Ansible® Lightspeed	`wca_ansible`	When you install watsonx Code Assistant for Red Hat Ansible Lightspeed, the following services are automatically installed: watsonx.ai (`watsonx_ai`)
watsonx Code Assistant for Z	`wca_z`
watsonx Code Assistant for Z Code Explanation	`wca_z_ce`
watsonx.data	`watsonx_data`	When you install watsonx.data, the following services are automatically installed: Analytics Engine powered by Apache Spark (`analyticsengine`)
watsonx.governance	`watsonx_governance`	When you install watsonx.governance, the following services are automatically installed: Watson Machine Learning (`wml`)
watsonx Orchestrate	`watsonx_orchestrate`	When you install watsonx Orchestrate, the following services are automatically installed: watsonx Assistant (`watson_assistant`)
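
To see everything that a selection of services brings onto the cluster, you can walk the "automatically installed" relationships in the table. The following Python sketch is illustrative only. The mapping is a subset of the table above, the function names are hypothetical, and the sketch assumes that automatic installs chain (for example, that watsonx.ai brings in whatever Watson Studio brings in).

```python
from collections import deque

# Subset of the "automatically installed" relationships from the All services table.
AUTO_INSTALLS = {
    "dv": ["dmc"],
    "wkc": ["analyticsengine", "datarefinery"],
    "ws": ["datarefinery", "ws_runtimes"],
    "dataproduct": ["analyticsengine", "datarefinery", "wkc"],
    "watsonx_ai": ["ws", "wml"],
    "wca_ansible": ["watsonx_ai"],
    "watsonx_data": ["analyticsengine"],
    "watsonx_governance": ["wml"],
    "watsonx_orchestrate": ["watson_assistant"],
}

# The table says not to specify this component directly.
NEVER_SPECIFY = {"datarefinery"}


def installed_components(selected: list[str]) -> list[str]:
    """Return the selected components plus everything they install automatically."""
    seen: list[str] = []
    queue = deque(selected)
    while queue:
        component_id = queue.popleft()
        if component_id in seen:
            continue
        seen.append(component_id)
        queue.extend(AUTO_INSTALLS.get(component_id, []))
    return seen


def components_to_specify(selected: list[str]) -> list[str]:
    """Drop components that must not be specified on their own."""
    return [cid for cid in selected if cid not in NEVER_SPECIFY]


print(installed_components(["wca_ansible"]))
# ['wca_ansible', 'watsonx_ai', 'ws', 'wml', 'datarefinery', 'ws_runtimes']
print(components_to_specify(["wkc", "datarefinery", "dv"]))
# ['wkc', 'dv']
```

The expanded list is useful for sizing and for reviewing what will run on the cluster. The list that you actually specify at installation time can stay short, because the dependent components are installed for you.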

What to do next

Now that you've determined which components to install, you're ready to complete Setting up installation environment variables.