IBM Cloud Pak for Data Version 5.0 includes new features for the platform and for many services. Version 5.0 includes focused end-user experiences for customers who install multiple
solutions on the Cloud Pak for Data control plane,
context-aware documentation recommendations, HTTP proxy support, and a standard method for storing
and accessing custom certificates. Version 5.0 also includes expanded
deployment environment options and the ability to run workloads on remote physical locations.
This release includes enhancements to existing services, new services, and changes to how
services are delivered.
New services
Data Product Hub enables teams to
share governed data assets so that teams can easily access the data that they need.
IBM Knowledge Catalog Standard
offers basic governance tooling for cataloging and AI-augmented data enrichment.
IBM Knowledge Catalog Premium
includes the full governance framework with data privacy, data quality, cataloging, and enrichment
across the data lifecycle with a generative AI layer for enhanced data enrichment.
watsonx Code Assistant for Red Hat® Ansible® Lightspeed helps automation teams create, adopt, and maintain Ansible content more efficiently.
watsonx Code Assistant for Z helps
developers modernize their mainframe applications using a combination of automation and generative
AI.
Renamed services
Db2 Data Gate is now Data Gate
Watson Pipelines is now Orchestration Pipelines
Watson Query is now Data Virtualization
For more information, review the following sections:
The following table lists the new features that were introduced in Cloud Pak for Data Version 5.0.
What's new
What does it mean for me?
Focused experiences for environments with multiple
solutions
IBM Cloud Pak for Data is the
foundation for multiple solutions. Starting in Version 5.0, Cloud Pak for Data includes multiple experiences. The experiences
that are available in your environment depend on the services that you install.
An
experience provides focused access to the tools that you need to complete specific
tasks. For example:
In the Data Product Hub experience, teams can focus
on publishing and sharing data products.
In the watsonx experience,
teams can focus on training, validating, tuning, and deploying generative AI solutions.
Each experience has a dedicated home page. The cards that are available on the home page
help you get started with the solution and give you quick access to the tools that you
need.
Run Cloud Pak for Data workloads on remote clusters
By default, an instance of IBM Cloud Pak for Data runs in a set of projects (namespaces) on
a single Red Hat
OpenShift® Container Platform cluster.
Starting in Cloud Pak for Data Version 5.0, you can expand your Cloud Pak for Data deployment by installing IBM Cloud Pak for Data agents on a remote cluster to create
remote physical locations.
After you set up a remote physical location, you can
register the physical location with the instance of Cloud Pak for Data that you want to expand. Then, you can add the
physical location to a data plane. A data plane is a logical grouping of one or more
physical locations. Users can deploy workloads to a data plane. The workload will be scheduled on
one of the physical locations associated with the data plane.
New privileged monitors provide more insight into your
cluster health
The privileged monitoring service includes new monitors that give
Cloud Pak for Data administrators more insight into the
health of the cluster where Cloud Pak for Data is deployed:
Cluster operator status check (check-cluster-operator-status)
Checks the status of the cluster operators that comprise the Red Hat
OpenShift Container Platform infrastructure to determine whether:
All of the operators are AVAILABLE
Any of the operators are DEGRADED
Network status check (check-network-status)
Checks the status of the PodNetworkConnectivityCheck objects for cluster
resources to determine whether the objects are Reachable.
Node imbalance status check (check-node-imbalance-status)
Checks whether vCPU requests are balanced across nodes or whether one node is supporting a
disproportionately high load.
You can send telemetry data to IBM Software Central, which enables you
to view license data and entitlements across your hybrid cloud deployments in one place. IBM Software Central helps you:
Remain compliant with your license agreements.
Get insight into your use so that you can more accurately predict future software needs and
spending.
Previously, each service used a different method for using custom
certificates. Cloud Pak for Data now has a standard
method for using custom certificates.
If you install the Cloud Pak for Data configuration admission controller, you can
create a secret that contains a set of custom certificates that you can use across multiple
services. After you create the secret, you can inject the secret into Cloud Pak for Data pods so that they have access to the custom
certificates.
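For illustration only, the following Python sketch uses the Kubernetes client to bundle PEM certificate files into a single Opaque secret. The secret name, the namespace, and any labels or annotations that the configuration admission controller requires for injection are assumptions; follow the documented procedure for the exact values that your deployment needs.

# Hypothetical sketch: create a secret that holds custom CA certificates with the
# Kubernetes Python client. The secret name, namespace, and any metadata that the
# Cloud Pak for Data configuration admission controller expects are assumptions.
from kubernetes import client, config

NAMESPACE = "cpd-instance"          # assumption: your Cloud Pak for Data operands project
SECRET_NAME = "custom-ca-certs"     # assumption: use the name that your procedure specifies

def create_custom_cert_secret(cert_paths):
    """Bundle one or more PEM files into a single Opaque secret."""
    config.load_kube_config()       # or config.load_incluster_config() inside a pod
    cert_data = {}
    for path in cert_paths:
        with open(path, "r", encoding="utf-8") as f:
            cert_data[path.rsplit("/", 1)[-1]] = f.read()

    secret = client.V1Secret(
        metadata=client.V1ObjectMeta(name=SECRET_NAME, namespace=NAMESPACE),
        type="Opaque",
        string_data=cert_data,
    )
    client.CoreV1Api().create_namespaced_secret(namespace=NAMESPACE, body=secret)

if __name__ == "__main__":
    create_custom_cert_secret(["./certs/internal-ca.pem"])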
You can open the Assist me panel in the Cloud Pak for Data web client to get context-aware documentation
recommendations. Assist me uses embedded
keyword searches to find relevant documentation on ibm.com. You can also use
Assist me to run your own searches.
To
get started with Assist me, click
Assist me in
the web client toolbar.
Validate network connectivity
You can use the cpd-cli health network-connectivity command to run network health checks
on resources in your Cloud Pak for Data deployment after
installation, before upgrading, and after upgrading.
You can also run the network-connectivity command as part of the run command.
The IBM Cloud Pak for Data control plane
and select services are now available in Polish. For more information, see Language support.
Service enhancements
The following table lists the new features that are introduced for existing services in Cloud Pak for Data Version 5.0:
Software
Version
What does it mean for me?
Cloud Pak for Data common core services
9.0.0
This release of Common core services includes the following features:
Use data source definitions to manage and protect data that is accessed from connections
Data source definitions are a new type of asset that you define based on a connection or
connected data asset's endpoints. When you create a data source definition, you can monitor where
your data is stored across multiple projects, catalogs, or multi-node data sources. You can also
apply the correct protection solution (enforcement engine) based on the data source
definition.
The "IBM Watson Query" connection is renamed to "IBM Data Virtualization." Your previous
settings for the connection remain the same. Only the connection name is changed.
New menu terms to open the Platform connections page
Previously the path to the Platform connections page in the navigation
menu was Data > Platform
connections. The new path is Data
> Connectivity. The
Connectivity page has a tab for Platform
connections.
Access more data with new connectors
You can now work with data from these data sources:
Flight service supported by watsonx.data and Data Product Hub
You can now use Flight service with watsonx.data and Data Product Hub to load data securely. To see a complete list of
services that support the Flight service APIs, see
Accessing data sources with
the Flight service.
Version 9.0.0 of the common core services includes various fixes.
If you install or upgrade a service that requires the common core services, the common core services will also be installed or upgraded.
Cloud Pak for Data scheduling service
1.25.0
This release of scheduling service includes the following features and updates:
Schedule workloads on remote physical locations
If you plan to extend your Cloud Pak for Data deployment
with remote physical locations, you must install the scheduling service on the primary Cloud Pak for Data cluster and on the remote physical location. For
more information, see:
This release of Analytics Engine powered by Apache Spark includes the following features:
Use the Spark labs IDE to develop or debug your own applications
Now you can develop or debug and run your own applications in the new Spark labs IDE, which is
installed as a Visual Studio Code extension. For more information, see Running Spark applications
interactively.
Instance credentials are now masked for better security
The credentials present in Spark Configurations and Environment Variables are automatically
masked to improve security. By default, all V4 APIs mask all the secrets and credentials passed in
Spark configurations and Environment variables in the Instance and
Application APIs. The change is visible from the Instance
Details page.
Updated GET API now returns all applications
The Get Applications List API for Analytics Engine now returns all applications
by default to allow pagination. You can use the query parameter state and
pagination queries to filter the applications in the API.
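As a hedged illustration of calling this API programmatically, the following Python sketch filters the applications list by state. The endpoint path, the bearer-token authentication, and the response field names are assumptions; only the state query parameter comes from the feature description, so confirm the details in the Analytics Engine API reference.

# Hypothetical sketch: filter the Analytics Engine applications list by state.
# The endpoint path, authentication, and response fields are assumptions for
# illustration; only the `state` query parameter is documented above.
import requests

CPD_URL = "https://<cpd-route>"                      # assumption
INSTANCE_ID = "<analytics-engine-instance-id>"       # assumption
TOKEN = "<bearer-token>"                             # assumption

resp = requests.get(
    f"{CPD_URL}/v4/analytics_engines/{INSTANCE_ID}/spark_applications",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"state": "running"},                     # filter applications by state
    verify=False,                                    # example only; use CA verification in practice
)
resp.raise_for_status()
for app in resp.json().get("applications", []):
    print(app.get("id"), app.get("state"))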
Data retention feature for administrators
OCP administrators can now retain or delete a designated number of Spark applications and
kernels that are associated with a specific Spark instance from the IBM Analytics Engine metastore.
Auto-scaling Spark workloads
You can now enable the auto-scaling feature for a Spark application by adding the configuration
setting ae.spark.autoscale.enable=true to the existing application
configuration.
A Spark application that has auto-scaling enabled can automatically determine the
number of executors required by the application based on the application's usage.
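The following Python sketch shows, under stated assumptions, how the documented configuration setting might be included in an application submission payload. The payload shape, endpoint path, and authentication are assumptions for illustration; only ae.spark.autoscale.enable=true comes from the feature description above.

# Hypothetical sketch: include the documented auto-scaling property when
# submitting a Spark application. Everything except ae.spark.autoscale.enable
# is an assumption to adapt to your environment.
import requests

CPD_URL = "https://<cpd-route>"                      # assumption
INSTANCE_ID = "<analytics-engine-instance-id>"       # assumption
TOKEN = "<bearer-token>"                             # assumption

payload = {
    "application_details": {
        "application": "/opt/ibm/spark/examples/src/main/python/wordcount.py",  # assumption
        "conf": {
            "ae.spark.autoscale.enable": "true",     # documented auto-scaling switch
        },
    }
}

resp = requests.post(
    f"{CPD_URL}/v4/analytics_engines/{INSTANCE_ID}/spark_applications",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
    verify=False,                                    # example only
)
print(resp.status_code, resp.json())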
Separate storage for shuffle data
You can now store shuffle data separately from the compute nodes. Separate storage allows for
more efficient resource utilization. Data is stored in a separate shared volume or object store. For
more information, see Running Spark applications interactively.
Version 5.0.0 of the Analytics Engine powered by Apache Spark service includes various fixes.
This release of Cognos Analytics
includes the following features and updates:
Integration with Planning Analytics
You can now create data server connections to Planning Analytics service instances that are running on Cloud Pak for Data. For details, see Support for Planning Analytics as a Service in the Cognos Analytics documentation.
Cognos Analytics uses CA certificates to connect
You can now use your company's CA certificates on Cloud Pak for Data to validate certificates from your internal
servers and connect to Cognos Analytics. Previously, you
had to manually copy the certificates to the artifacts shared volume before you could use them to
connect to Cognos Analytics. For details, see Creating a secret to
store shared custom certificates.
Audit logging
Cognos Analytics now has auditable events that are
generated and forwarded by the Audit Logging Service to help you detect and prioritize security
threats and data breaches. For details, see Audit events for Cognos Analytics.
The 26.0.0 release of the service provides Version 12.0.3 of the
Cognos Analytics software. For details, see Release 12.0.3 - New and changed features in the Cognos Analytics documentation.
Version 26.0.0 of the Cognos Analytics service includes various fixes.
This release of Cognos Dashboards
includes the following features and updates:
Updated software version
This release of the service provides Version 12.0.3 of the Cognos Analytics dashboards software. For details, see Release 12.0.3 - Dashboards in the Cognos Analytics documentation.
Version 5.0.0 of the Cognos Dashboards service includes various fixes.
This release of Data Gate
includes the following features and updates:
Show the RUNSTAT timestamp for target tables in the web UI
In the Data Gate web UI, you can now view timestamps that show when the most recent RUNSTAT
operation occurred for each target table. With this information, you can check to ensure that
RUNSTAT is running as expected on the target tables.
Version 6.0.0 of the Data Gate service includes various fixes.
This release of Data Privacy
includes the following features and updates:
Data protection rules no longer enforced in projects
Data protection rules are now only enforced either in governed catalogs or by a deep enforcement
solution. A deep enforcement solution is a protection solution to enforce rules on data
that is outside of Cloud Pak for Data when the data source
is integrated with one of these services:
IBM Data Virtualization
IBM Security Guardium Data Protection
IBM
watsonx.data
Assets that are added into projects from a governed catalog no longer have preview, download or
profiling restricted by data protection rules unless you have configured a deep enforcement
solution.
You will be reminded of the revised data protection rule enforcement protocols when you:
Create a data protection rule
Copy an asset from a governed catalog into a project
Defining a data source definition with a protection solution
A protection solution is a method of enforcing the data protection rules either in
governed catalogs or by a deep enforcement solution.
To configure the platform with a deep enforcement solution, you can create a data source
definition to set the data source type. The data source type determines which
types of connections the data source definition can be associated with and your available protection
solution options. For details, see Protection solutions for data source definition.
Tracking data protection rule enforcement decisions
You can now track enforcement decisions as audit events when the Send policy
evaluations to audit logs check box is selected from the Manage rule
settings page. For details, see Audit events
for Data Privacy.
Version 5.0.0 of the Data Privacy service includes various fixes.
This release of DataStage includes the following features and
updates:
Run DataStage jobs in multiple locations
with a remote data plane
You can now deploy on a remote data plane to run DataStage jobs in multiple locations, including in
different geographies or cloud providers, without creating multiple DataStage instances. For more information, see Deploying on a remote data
plane.
Import and export selected asset types
You can now select specific asset types to import or export from a .zip file that contains
DataStage assets. By default, all asset types
are selected.
Set up metrics storage at the project level for your DataStage flows
Name changes for DataStage connections
and connectors
"Apache Cassandra (optimized)" is now "Apache Cassandra for DataStage."
"IBM Db2 (optimized") is now "IBM Db2 for DataStage."
"IBM Netezza Performance Server (optimized)" is now "IBM Netezza Performance Server for
DataStage."
"IBM Watson Query" is now "IBM Data Virtualization."
"Oracle (optimized)" is now "Oracle Database for DataStage."
"Salesforce.com (optimized)" is now "Salesforce API for DataStage."
"Teradata (optimized)" is now "Teradata database for DataStage."
Your previous settings for the connections, connectors, and their associated jobs remain the
same. Only the connection and connector names are changed.
Connect to more data sources in DataStage
You can now include data from these data sources in your DataStage flows:
This release of Data Virtualization includes the following features and updates:
Watson Query is now Data Virtualization
The Watson Query service was renamed to Data Virtualization, and you will notice some changes in the user
interface. The IBM Watson Query connector is also
renamed to IBM Data Virtualization. Your previous
settings for the connector remain the same. Only the connector name is changed.
Enforce data protection rules across Cloud Pak for Data
You can now use the new Cloud Pak for Data Data Source
Definitions (DSD) to enforce IBM Knowledge Catalog data
protection rules consistently across Cloud Pak for Data,
regardless of whether you query the object through Data Virtualization or preview it in a catalog or project. A DSD
is automatically created when you provision or upgrade your Data Virtualization instance to Cloud Pak for Data 5.0. For details, see Data protection with data
source definitions. See also Governing virtual data with data protection rules in Data Virtualization.
New supported data source
REST API is now a supported data source in Data Virtualization.
REST API is a generic third-party data source that you access by using an API. This type of data
source requires that you first create a Model file to map the API outputs to table structures in
Data Virtualization.
Generic JDBC driver functionality now supports Databricks using the native driver.
Spark SQL is a third-party data source that
has two authentication options to set a connection: username and password credentials or Kerberos authentication.
Pushdown enhancements to improve query performance
This release of Data Virtualization improves the
performance of queries that use pushdown. Query pushdown is an optimization feature that reduces
query times and memory use. Data Virtualization now
includes the ability to:
Support OLAP functions when you connect to Oracle data sources. This support includes functions
MIN, MAX, SUM, COUNT, COUNT_BIG, ROW_NUMBER/ROWNUMBER, RANK, DENSERANK, DENSE_RANK, STDDEV_SAMP,
PERCENTILE_CONT, PERCENTILE_DISC, and PERCENT_RANK when used in the query with the OLAP function
specification. For details, see OLAP specification in the IBM
Db2 documentation.
Common subexpression pushdown to Oracle
data sources.
Use pushdown for various other string functions, including CASTs, TRIM, BITAND, and others.
Query tables from previous Presto and
Databricks catalogs with multiple catalog
support
Virtual tables that you create from Presto
and Databricks catalogs are now fully
accessible. You can run queries on these tables regardless of any changes that you make to the
catalog filters. This means that you do not need to switch back to previous Presto or Databricks catalogs to ensure the functionality of
existing queries. For details on supported data sources, see Supported data sources in
Data Virtualization.
Automatically scale Data Virtualization
instances
You can now automatically scale Data Virtualization
instances to support high availability or increase processing capacity, rather than manually setting
the size, CPU, and memory resource values after you provision instances. For details, see Scaling Data Virtualization.
Mask multibyte characters for enhanced privacy of sensitive data
You can now perform partial redaction and basic obfuscation of multibyte characters such as
symbols, characters from non-Latin alphabets like Chinese or Arabic, and special characters that are
used in mathematical notation. With other masking methods, multibyte characters are masked with the character “X”. For details, see Masking virtual data in Data Virtualization.
View the data protection rules that are applied to a user
You can now view details about the data protection rules that apply to a Data Virtualization object for a specific user by using the
EXT_AUTHORIZER_EXPLAIN stored procedure. For details, see EXT_AUTHORIZER_EXPLAIN
stored procedure.
Data Virtualization connections in catalogs now
reference the platform connection
When you publish objects to a catalog, the Data Virtualization connections that are created from that
publication now reference the main Data Virtualization
connection in Platform connections. This means that information such as personal credentials only
needs to be defined or updated one time in the Data Virtualization platform connection. All referenced
connections now automatically reflect changes that are made to the main Data Virtualization connection.
Enhanced catalog visibility for Presto and
Databricks
The Presto and Databricks web client now displays the name of the
catalog that you selected in the breadcrumbs of the Explore view, and beside each schema name in the
List view.
Enhanced security for profiling results in Data Virtualization views
To prevent unexpected exposure to value distributions through the profiling results of a view,
all users are denied access to profiling results in Data Virtualization views in all catalogs and projects.
Version 3.0.0 of the Data Virtualization service includes various fixes.
Capabilities that are provided by the Cloud Pak for Data
platform now make it easier for you to enable TLS and connect Db2
Big SQL instances to TLS-enabled Hadoop clusters. For details, see Connecting to a TLS (SSL)-enabled Hadoop cluster.
Version 7.7.0 of the Db2
Big SQL service includes various fixes.
This release of Decision Optimization
includes the following features:
Easier table selection and configuration options when saving Decision Optimization models for
deployment
When you save a model for deployment from the Decision Optimization user interface, you can now review the input and
output schema, and more easily select the tables that you want to include. You can also add, modify, or delete run configuration parameters, and review the environment and the model files that are used.
Download intermediate solution statistics for Decision Optimization
If you choose to display intermediate solutions in your run configuration, you can now download
the statistics when a Decision Optimization solve is completed. You can view these statistics
locally and compare them with other model solutions. You can also view the last 3 intermediate
solution KPIs in the Explore solution view.
In addition to Python 3.10, you can now use Python 3.11 in your Decision Optimization environment to run and deploy Decision Optimization models that are formulated in
DOcplex in Decision Optimization experiments.
Modeling Assistant models also use Python because DOcplex code is generated when
models are run or deployed.
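As a small, self-contained illustration of the kind of DOcplex model that can run in these environments, the following Python snippet formulates and solves a toy production-mix problem. The model data is invented for the example.

# A minimal DOcplex model of the kind you can formulate in a Decision Optimization
# experiment and run on a Python 3.11 environment.
from docplex.mp.model import Model

mdl = Model(name="production_mix")

# Decision variables: units of two products to build
x = mdl.continuous_var(name="product_a", lb=0)
y = mdl.continuous_var(name="product_b", lb=0)

# Resource constraints
mdl.add_constraint(2 * x + 1 * y <= 100, ctname="machine_hours")
mdl.add_constraint(1 * x + 3 * y <= 90, ctname="labor_hours")

# Maximize profit
mdl.maximize(30 * x + 40 * y)

solution = mdl.solve()
if solution:
    print(solution.get_value(x), solution.get_value(y), mdl.objective_value)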
This release of IBM Knowledge Catalog includes the following features and
updates:
Additional IBM Knowledge Catalog editions
You can continue to use the classic IBM Knowledge Catalog service, or you can choose one of the two
new, separately priced editions of IBM Knowledge Catalog:
IBM Knowledge
Catalog Standard Cartridge
This edition offers basic governance tooling for cataloging and AI-augmented data
enrichment.
IBM Knowledge
Catalog Premium Cartridge
This edition offers the full governance framework with data privacy, data quality, cataloging,
and enrichment across the data lifecycle with a generative AI layer for enhanced data
enrichment.
In addition to the governance capabilities of the classic IBM Knowledge Catalog service, the cartridges provide semantic
and AI-augmented data enrichment:
Recommend descriptive names for data assets and columns based on the collected metadata and a
predefined glossary.
Suggest and assign semantic descriptions for data assets and columns that are easy to
understand. The descriptions are generated based on the surrounding columns and the context of the
data assets.
Generate semantic term assignments for data assets and columns.
You can now add data assets or columns with the new relationship type Validates data
quality of to any type of data quality rule to have the quality score and any data
quality issues reported for this item on the Data quality page. With this
enhancement, data quality rules with externally managed bindings and SQL-based data quality rules
can now also contribute to the quality scores of assets and columns.
Data protection rules are no longer enforced in projects
Data protection rules are now only enforced in governed catalogs or by a deep enforcement
solution. Assets that are added into projects from a governed catalog no longer have preview,
download, or profiling restricted by data protection rules. For more information, see Data protection rules no longer enforced in projects.
Enhanced project list view in catalogs
Now, when you are adding assets from a catalog to a project, you can view more than 100 projects
in your project list page and add up to 50 assets at a time to your project. For more information,
see Add assets from within the catalog.
Enhancements in governance artifacts
You can now make changes to multiple governance artifacts at once. Bulk edits are available when
updating tags and stewards. For more information, see Managing governance artifacts.
Now you can move any category either to the top level or to any other category as a subcategory. The collaborators are also moved, provided that they have the required permissions on the new parent category. For more information, see Managing categories.
You can now add custom properties and relationships for reference data sets. For more
information, see Designing reference data sets.
Notifications about changes in governance artifacts, for example, when an artifact is added,
updated, or deleted, can now be forwarded to external applications or users. For more information,
see Forwarding notifications generated by Cloud Pak for Data
services.
Knowledge Accelerators
Additional data classes
There are over 20 new data classes that can be used to identify and classify national
identifiers, tax identifiers and social security identifiers for the additional jurisdictions of
Argentina, Egypt, Finland, Greece, Hong Kong, Ireland, Malaysia, New Zealand, Pakistan, Peru,
Romania, Thailand, Turkey, and United Arab Emirates.
These new data classes supplement previously added data classes to provide an enhanced framework
for identifying and classifying data of particular relevance to data privacy.
The Knowledge Accelerators contain a set of predefined business scopes that group the set of
business terms that are relevant to a specific business topic. Many of these scopes were reorganized
to ensure that they are optimized for viewing in the new Relationship Explorer capability of IBM
Knowledge Catalog. Also, new business scopes were added to Financial Services.
In addition, certain term-to-term relationships across the Knowledge Accelerators were simplified
to improve clarity when viewing them in Relationship Explorer.
Relationship Explorer is now available to help you better understand your data. This new feature helps you visualize, explore, and govern your metadata. Discover how your governance artifacts and data assets relate to each other in a single view. For more information, see Relationship
Explorer.
Expand DataStage jobs in the lineage
graph
When you are viewing a DataStage job in
the lineage graph, you can expand the job to view all its stages. For more information, see Lineage.
Enhanced security for profiling results in Data Virtualization and watsonx.data views
To prevent unexpected exposure to value distributions through the profiling results of a view,
all users are denied access to profiling results in Data Virtualization and watsonx.data views in all catalogs and
projects.
Version 5.0.0 of the IBM Knowledge Catalog service includes various fixes.
This release of IBM Match 360 includes the following features and updates:
Use mapping patterns to avoid manually mapping new assets
Now IBM Match 360 alerts you when a new data
asset is similar to an existing data asset that is already mapped to your data model. You can save
time and avoid manual mapping by using a mapping pattern to map new data assets that share the same
structure as an existing, mapped asset. Mapping patterns are automatically created from the mapped
data assets in your system. You can manage and apply mapping patterns within your configuration
snapshots.
Figure 1. Using mapping patterns to quickly map assets
This release of Orchestration Pipelines includes the following features:
IBM Watson Pipelines is now IBM
Orchestration Pipelines
The new service name reflects the capabilities for orchestrating parts of the AI lifecycle into
repeatable flows.
Migrate DataStage dependencies
With one click in the toolbar, you can now download or upload projects that contain DataStage dependencies. See Running and
saving pipelines.
Expanded toolkit for annotation styling
You can now apply more style and formatting options to your pipeline comments and annotations.
You can specify font, text color, formatting, and more with HTML or CSS styling. You can also use
more Markdown attributes such as marked text. HTML or CSS styled annotations in Orchestration Pipelines flows are preserved when exported, imported, or
migrated from a DataStage flow. See Getting
started with Orchestration Pipelines for more details.
You can now increase or decrease heights or widths of Orchestration Pipelines nodes by dragging the corners with your mouse. See
Getting started with Orchestration Pipelines for more
details.
Better visualization of conditions with color-coded links
Links now support custom color-coding so that you can view the status of all nodes with improved visual
organization. See Creating a pipeline.
Easily merge links between nodes
When you delete a node, its previous or sequential links to other nodes are automatically merged
or deleted. See Creating a pipeline.
Pipeline assets available to add to folders
Orchestration Pipelines flows are now available as assets that
can be added to folders in a Watson Studio
project for better organization and access. Folders are in beta and are not yet supported for use in
production environments. For more information, see Organizing assets with folders (beta).
Version 5.0.0 of the Orchestration Pipelines service includes various fixes.
This release of Planning Analytics
includes the following features and updates:
Integration with Cognos Analytics
You can now create data server connections from Cognos Analytics service instances that are running on Cloud Pak for Data. For details, see Support for Planning Analytics as a Service in the Cognos Analytics documentation.
This release of RStudio Server Runtimes includes the following features:
New Runtime 24.1 for R
You can now use Runtime 24.1, which includes the latest data science frameworks on R 4.3, to run
your code in RStudio. For details on all the available environments, see RStudio
environments.
RStudio with Runtime 23.1 on R 4.2 is supported on IBM Power
The RStudio with Runtime 23.1 on R 4.2 runtime is now supported on the IBM Power (ppc64le) platform. For more information, see RStudio environments.
Version 9.0.0 of the RStudio Server Runtimes service includes various fixes.
Define splitting points for decision trees in CHAID nodes
You can now customize the properties for the CHAID node to specify the fields that the CHAID
algorithm must choose from when it determines where to split the decision tree. Specifying fields
can control how the decision tree grows by reducing the number of possible splitting points present
in the data. For more information about the CHAID node, see CHAID node.
You can also set the properties for the CHAID node by using Python scripts in SPSS Modeler or the
scripting API for SPSS Modeler. For more information about the node parameters, see chaidnode properties.
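The following sketch illustrates the general pattern of configuring a CHAID node from an SPSS Modeler Python script. It must run inside the SPSS Modeler scripting environment, the field names are placeholders, and the exact property that limits candidate splitting fields is new in this release, so take its name from the chaidnode properties reference rather than from this example.

# Hypothetical SPSS Modeler scripting sketch: configure a CHAID node from a script.
# Runs inside the SPSS Modeler scripting environment, where the modeler object is predefined.
# The property names shown (custom_fields, target, inputs) follow the general chaidnode
# properties pattern; field names are example placeholders.
stream = modeler.script.stream()               # current stream in the scripting environment

chaid = stream.createAt("chaid", "CHAID tree", 200, 100)
chaid.setPropertyValue("custom_fields", True)  # use node-level field settings
chaid.setPropertyValue("target", "churn")      # assumption: example target field
chaid.setPropertyValue("inputs", ["age", "income", "tenure"])  # assumption: example inputs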
Pick actions for nodes from the new context toolbar
A new context toolbar appears when you hover over a node. It shows the most commonly used actions
specific to each type of node, such as graph nodes, import nodes, and modeling nodes. More actions
are available from the overflow menu.
Script diagnostic messages
You can now create a script that uses the new report API method to generate notifications
for error, warning, and information messages about SPSS Modeler flows. These notifications appear in
Messages as well as the Run history.
This release of Watson Discovery includes the following features and updates:
Quickly understand the extracted data with the new Intelligent Document Processing (IDP) project
type
You can now use the new IDP project type to quickly understand what data is extracted from your
documents in a rich document preview. If the extracted data does not meet your requirements, you can
also apply enrichments to improve the data. For details, see Creating projects.
You can now use an API endpoint to send a webhook event for documents ingested in Watson Discovery
The Create collection and Update collection APIs can now send a webhook event to an external application when the
status of ingested documents becomes available or failed. The webhook event allows you to take the
next relevant action on your documents, without getting the document status first with the Get
document details API. For details, see Document status webhook.
You can now use an API endpoint to annotate documents with a model of your choice
The Create an enrichment API can now connect to an external enrichment
application by using a webhook. The new API allows you to use a model of your choice to annotate
documents in Watson Discovery. Through a webhook
interface, you can use custom models, advanced foundation models, or other third-party models to
enrich your documents in a collection. For details, see External enrichment.
Watson Discovery no longer requires increasing
the process ID limit
Starting in Cloud Pak for Data 5.0.0, you do not have to increase the process ID limit on your Red Hat OpenShift Container Platform environment to use Watson Discovery.
Version 5.0.0 of the Watson Discovery service includes various fixes.
This release of Watson Machine Learning
includes the following features:
Forecast more steps with an AutoAI time series model
You can now increase the prediction horizon for a time series model created with AutoAI. For
example, if your model forecasts weather, you can now predict more steps, such as hours or days,
with your model. For more information, see Scoring a time series model.
Presto is now available as a data connection for AutoAI models
You can now connect to Presto as a data source for training an AutoAI experiment when deploying
an AutoAI model. For more information, see AutoAI overview.
Deploy assets with Runtime 24.1
You can now create assets that use software specifications that are compatible with IBM Runtime
24.1. For more information, see Frameworks and software specifications.
Deploy traditional and generative AI assets with the watsonx.ai Python client library
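A hedged sketch of this workflow with the ibm-watsonx-ai Python client follows. The credential fields, the software specification name (runtime-24.1-py3.11), the model type string, and the exact metadata property names are assumptions to verify against your environment and the client version that you install.

# Hedged sketch: connect the watsonx.ai Python client to a Cloud Pak for Data
# instance, store a scikit-learn model, and deploy it online. Values in angle
# brackets and the spec/type strings are assumptions.
from ibm_watsonx_ai import APIClient, Credentials
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

credentials = Credentials(
    url="https://<cpd-route>",        # assumption: your Cloud Pak for Data URL
    username="<username>",
    password="<password>",
    instance_id="openshift",          # assumption: on-premises Cloud Pak for Data
    version="5.0",
)
client = APIClient(credentials)
client.set.default_project("<project-id>")   # assumption: target project

# Train a tiny example model so that the sketch is self-contained.
X, y = load_iris(return_X_y=True)
trained_pipeline = LogisticRegression(max_iter=200).fit(X, y)

sw_spec_id = client.software_specifications.get_id_by_name("runtime-24.1-py3.11")  # assumption
model_details = client.repository.store_model(
    model=trained_pipeline,
    meta_props={
        client.repository.ModelMetaNames.NAME: "iris-classifier",
        client.repository.ModelMetaNames.TYPE: "scikit-learn_1.3",   # assumption
        client.repository.ModelMetaNames.SOFTWARE_SPEC_ID: sw_spec_id,
    },
)
deployment = client.deployments.create(
    client.repository.get_model_id(model_details),
    meta_props={
        client.deployments.ConfigurationMetaNames.NAME: "iris-classifier-online",
        client.deployments.ConfigurationMetaNames.ONLINE: {},
    },
)
print(deployment)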
This release of Watson Machine Learning Accelerator
includes the following features:
New deep learning libraries
You can now use the following deep learning libraries with Watson Machine Learning Accelerator:
Python 3.11.5
PyTorch 2.1.2
TensorFlow 2.14.1
NVIDIA CUDA Toolkit 12.2.0
If you have existing models, update and test your models to use the latest supported
frameworks. For more information, see Supported deep learning frameworks in the Watson Machine Learning Accelerator
documentation.
New NVIDIA GPU Operator version
You can now use the following NVIDIA GPU Operator version with Watson Machine Learning Accelerator:
Version 24.3.0
Version 5.0.0 of the Watson Machine Learning Accelerator service includes various fixes.
This release of Watson
OpenScale includes the following features and updates:
New quality metric for binary classification models
You can now configure the Gini coefficient metric when you run quality evaluations for binary classification models. The Gini coefficient metric measures the inequality of model distributions.
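For reference, the Gini coefficient for a binary classifier is commonly computed from the area under the ROC curve as Gini = 2 * AUC - 1. Whether Watson OpenScale uses this exact formulation is not stated here, so treat the following Python calculation as a reference illustration rather than the service's implementation.

# Reference calculation of the standard Gini coefficient for a binary classifier,
# derived from the area under the ROC curve.
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 1, 0, 1, 0]                      # example labels
y_score = [0.1, 0.4, 0.8, 0.65, 0.9, 0.3, 0.55, 0.2]   # example model probabilities

auc = roc_auc_score(y_true, y_score)
gini = 2 * auc - 1
print(f"AUC = {auc:.3f}, Gini = {gini:.3f}")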
This release of Watson Speech services includes the following features and updates:
Acoustic model component scaling
By default, the AM-patcher microservice remains scaled down when no training is in progress, to
optimize cluster resource usage and allocation. You can now use the speech-cr
custom resource file to scale the component into different sizes: small, medium, or large. See how
to update that Custom Resource (CR) file at Sizing for acoustic model training.
Version 5.0.0 of the Watson Speech to Text service includes various fixes.
This release of Watson Studio includes the following features:
Tag projects for easy retrieval
You can now assign tags to projects to make them easier to group or retrieve. Assign tags when
you create a new project or from the list of all projects. Filter the list of projects by tag to
retrieve a related set of projects. For more information, see Creating a
project.
Version 9.0.0 of the Watson Studio service includes various fixes.
This release of Watson Studio Runtimes includes the following features:
Runtime 24.1 is now available for use with Python and R
You can now use Runtime 24.1, which includes the latest data science frameworks on Python 3.11
and on R 4.3, to run your code in Watson Studio Jupyter notebooks and in RStudio. For more
information about the available environments, see Environments.
A new version of Jupyter notebooks editor is now available
If you're running your notebook in environments that are based on Runtime 23.1 and 24.1, you
can now:
Automatically debug your code
Automatically generate a table of contents for your notebook
Toggle line numbers next to your code
Collapse cell contents and use side-by-side view for code and output, for enhanced
productivity
This release of watsonx.ai includes the following features:
Red Hat
OpenShift AI is now a prerequisite for watsonx.ai
Watsonx.ai now requires Red Hat
OpenShift AI to be installed as a prerequisite foundation
layer on the cluster. Red Hat
OpenShift AI provides
enhanced support for serving generative AI models and improving the efficiency of prompt tuning.
IBM text embedding support for enhanced text matching and retrieval
You can now use the IBM text embeddings API and IBM embedding models for transforming input text
into vectors to more accurately compare and retrieve similar text. You can use the following IBM
Slate embedding models:
slate-125m-english-rtrvr
A foundation model provided by IBM that generates embeddings for various inputs such as queries,
passages, or documents. The training objective is to maximize cosine similarity between a query and
a passage.
slate-30m-english-rtrvr
A foundation model provided by IBM that is trained to maximize the cosine similarity between two
text inputs so that embeddings can be evaluated based on similarity later. The
slate-30m-english-rtrvr model is a distilled version of the
slate-125m-english-rtrvr model.
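A hedged sketch of calling the text embeddings API through the ibm-watsonx-ai Python client follows. The credential fields and the exact model identifier string are assumptions to confirm in your environment; the Slate model names come from the list above.

# Hedged sketch: generate embeddings with an IBM Slate model and compare a query
# to a set of passages by cosine similarity. Angle-bracket values and the model
# identifier are assumptions.
import math
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import Embeddings

credentials = Credentials(
    url="https://<cpd-route>",      # assumption
    username="<username>",
    password="<password>",
    instance_id="openshift",        # assumption: on-premises Cloud Pak for Data
    version="5.0",
)

embedder = Embeddings(
    model_id="ibm/slate-125m-english-rtrvr",   # assumption: confirm the exact ID
    credentials=credentials,
    project_id="<project-id>",                 # assumption
)

passages = [
    "Cloud Pak for Data runs on Red Hat OpenShift Container Platform.",
    "Data Product Hub helps teams share governed data products.",
]
passage_vectors = embedder.embed_documents(texts=passages)
query_vector = embedder.embed_query(text="Where does Cloud Pak for Data run?")

def cosine(a, b):
    # Cosine similarity, matching the training objective noted above.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

for text, vec in zip(passages, passage_vectors):
    print(f"{cosine(query_vector, vec):.3f}  {text}")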
Use training data from connected data sources in Tuning Studio
You can now train your foundation models in Tuning Studio by importing training data from a
separate data source by using a data connection asset. You can use the following data connection types:
You can now use the following foundation models for inferencing from the Prompt Lab in watsonx.ai:
allam-1-13b-instruct
A bilingual large language model for Arabic and English provided by the National Center for
Artificial Intelligence and supported by the Saudi Authority for Data and Artificial Intelligence.
You can use the allam-1-13b-instruct foundation model for general purpose tasks in
the Arabic language, such as classification, extraction, question-answering, and for language
translation between Arabic and English.
granite-7b-lab
A foundation model from the IBM Granite family that is tuned with a novel alignment tuning
method from IBM Research.
llama-3-8b-instruct
An accessible, open large language model provided by Meta that contains 8 billion parameters and
is instruction fine-tuned to support various use cases.
llama-3-70b-instruct
An accessible, open large language model provided by Meta that contains 70 billion parameters
and is instruction fine-tuned to support various use cases.
merlinite-7b
A foundation model provided by Mistral AI and tuned by IBM. The merlinite-7b
foundation model is a derivative of the Mistral-7B-v0.1 model that is tuned with a novel alignment
tuning method from IBM Research.
mixtral-8x7b-instruct-v01
A foundation model that is a pre-trained generative sparse mixture-of-experts network provided
by Mistral AI.
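As an adjacent, programmatic illustration (Prompt Lab itself is a web tool), the following Python sketch sends a prompt to one of the newly listed models with the ibm-watsonx-ai client. The model identifier string, credential fields, and parameter values are assumptions to confirm in your environment.

# Hedged sketch: inference one of the listed foundation models programmatically.
# Angle-bracket values, the model identifier, and the generation parameters are assumptions.
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

credentials = Credentials(
    url="https://<cpd-route>",          # assumption
    username="<username>",
    password="<password>",
    instance_id="openshift",            # assumption
    version="5.0",
)

model = ModelInference(
    model_id="meta-llama/llama-3-8b-instruct",   # assumption: confirm the exact ID
    credentials=credentials,
    project_id="<project-id>",                   # assumption
    params={"decoding_method": "greedy", "max_new_tokens": 200},
)

print(model.generate_text(prompt="Summarize the benefits of data virtualization in two sentences."))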
Work with InstructLab foundation models in Prompt Lab
InstructLab is an open-source initiative by Red
Hat and IBM that provides a platform for augmenting
the capabilities of a foundation model. The following foundation models support knowledge and skills
that are contributed from InstructLab:
Create detached deployments for external prompt templates
You can now deploy a prompt template for an LLM hosted by a third-party provider, such as Google
Vertex AI, Azure OpenAI, or AWS Bedrock. Use the deployment to explore evaluations for the output
generated by the detached prompt template. You can also track the detached deployment and detached
prompt template in an AI use case as part of your governance solution. See Creating a detached
deployment for an external prompt.
Use the Node.js SDK to add generative AI function to your applications
This beta release of the Node.js SDK helps you to do many generative AI tasks programmatically,
including inferencing foundation models. For more information, see Node.js SDK.
Version 9.0.0 of the watsonx.ai service includes various fixes.
This release of watsonx Assistant includes the following features:
Conversational search
The new conversational search feature has a built-in retrieval-augmented generation (RAG)
solution that helps your watsonx Assistant to extract
an answer from the highest-ranked query results and returns a text response to the user. For more
information, see Conversational search in the watsonx Assistant documentation.
Integration of Elasticsearch to the search feature
You can now integrate Elasticsearch with the search feature in your watsonx Assistant. With Elasticsearch, your watsonx Assistant can perform different types of searches such
as metric, structured, unstructured, and semantic with higher accuracy and relevance by making use
of enterprise content. The data analytics engine in Elasticsearch expands the scope of search
integration to larger data sets in watsonx Assistant.
For more information about Elasticsearch search integration, see Elasticsearch search integration setup in the watsonx Assistant documentation.
Behavioral tuning for conversational search
You can now optimize your conversational search behavior with the Tendency to say “I
don’t know” option in the conversational search settings. This option can help to reduce
Large Language Model (LLM) hallucinations and provide higher fidelity answers for conversational
search by tuning your assistant's tendency to fall back to the “I don’t know” answer. For more
information, see Behavioral tuning in the watsonx Assistant documentation.
Streaming response for conversational search
You can now use streaming response in your watsonx Assistant for conversational search. With the help of
watsonx.ai capabilities, streaming
response can provide continuous and real-time responses. For more information, see Streaming response in the watsonx Assistant documentation.
Overwrite all or skip all when you copy actions to another watsonx Assistant
You can now choose to overwrite all references or skip all references when you copy actions from
one watsonx Assistant into another. For more
information, see Copying an action to another assistant in the watsonx Assistant documentation.
Add a custom result filter for the Watson Discovery search integration
You can now filter your search result in the Watson Discovery search integration by adding custom text
strings in the Custom result filter field in Search integration.
For more information, see Configure the search for Watson Discovery in the watsonx Assistant documentation.
Configure search routing
You can configure the search routing for your watsonx Assistant when no matches are available for the
customer response. For more information, see Configuring the search routing when no action matches in the watsonx Assistant documentation.
Conversational skills
You can now use conversational skills in your watsonx Assistant to begin tasks or workflows. You must
register a pro code conversational skill provider on your watsonx Assistant instance and begin building skill-backed
actions to fit your use cases. For more information, see Conversational
skills API documentation.
Service monitors
Your watsonx Assistant can now use service
monitors to monitor the health of your watsonx Assistant instances. For more information, see Installing service monitors.
Version 5.0.0 of the watsonx Assistant service includes various security
fixes.
New page for Bring Your Own JAR (BYOJ) process for SAP HANA data source
Users can now use a new, dedicated Driver manager section under the new Configurations page to manage drivers for the SAP HANA data source. Each of these drivers undergoes a series of validations.
In addition to registering external Spark engines, you can now provision a native Spark engine in watsonx.data. With the native Spark engine,
you can manage Spark engine configuration, manage access to Spark engines, and view applications by
using REST API endpoints from watsonx.data.
You can now use Query Optimizer to improve the performance of queries that are processed by the
Presto (C++) engine. If Query
Optimizer determines that optimization is feasible, the query undergoes rewriting; otherwise, the
native engine optimization takes precedence.
The mixed-case feature flag, which allows you to switch between case-sensitive and case-insensitive behavior in Presto (Java), is
available. The flag is set to OFF by default and can be set to ON during the deployment of watsonx.data.
Semantic automation for data enrichment leverages generative AI with IBM Knowledge Catalog to
understand your data on a deeper level and enhance data with automated enrichment to make it
valuable for analysis.
This release of watsonx.governance includes the following features:
Assess use cases for EU AI Act applicability
By using the new EU AI Act applicability assessment, you can complete a simple questionnaire to
assess your AI use cases and determine whether they are within the scope of the EU AI Act. The
assessment can also help you to identify the risk category that your use cases align to:
prohibited, high, limited, or
minimal.
Create detached deployments for governing prompts for externally hosted large language models
(LLMs)
A detached prompt template is a new asset for evaluating a prompt template for an LLM that is
hosted by a third-party provider, such as Google Vertex AI, Azure OpenAI, or AWS Bedrock. The
inferencing that generates the output for the prompt template is done on the remote model, but you
can evaluate the prompt template output by using watsonx.governance metrics. You can also track the
detached deployment and detached prompt template in an AI use case as part of your governance
solution.
When you evaluate prompt templates in your watsonx.governance deployment spaces or projects, you can
now run generative AI quality evaluations to measure how well your model performs
retrieval-augmented generation (RAG) tasks with the following new metrics:
Faithfulness
Answer relevance
Unsuccessful requests
Results from these new evaluations are captured in factsheets in AI use cases.
This release of watsonx Orchestrate includes the following features and updates:
New AI-assisted conversational skills
Now, you can integrate any of your watsonx Orchestrate skills with a conversation-based AI assistant. You can provide input to this conversational skill, and it starts the action on your requested tasks. Your administrator must connect apps on team skill sets from AI assistants. Then,
builders can use the AI assistant builder to create actions to be used in AI assistants. For
details, see Conversational skills.
Conversational search
The new conversational search feature is a watsonx.ai-powered, built-in retrieval-augmented generation (RAG) solution that helps watsonx Orchestrate extract an answer from the highest-ranked
query results and returns a text response to the user. For details, see Conversational search in the watsonx Assistant documentation and watsonx Orchestrate
features documentation.
Automatically completing data in tables for skill inputs
When you start a skill, if the skill input is a table, now you can upload an .xls or .csv file
to automatically complete the data in the table. For details, see Filling tables in skill inputs automatically.
Number of suggested skills increased to 10
When you add a new skill to a skill flow, you get up to 10 suggestions for the next best skill to add.
Mapping inputs of skills that dynamically generate a list of options
Map which values you want to show as options of a list of options based on the value that you
provide for another input field. For example, if you have a skill that lists the projects within a
project management application for specific organizations, you can use the
x-ibm-ui-extension property to dynamically get the projects and list them as a list
of options based on the organization that users input to another field. For details, see Mapping inputs of the skill that generate the list of
options.
Version 2.0.0 of the watsonx Orchestrate service includes various fixes.
Data Product Hub is a self-service solution that organizations can use to share data products among their teams. A data product contains a governed
collection of data assets and is curated to be accessible and reusable. Data Product Hub enables
producers to easily create, share, and govern data products with consumers and ensures that teams
can quickly access the data that they need.
Accessing data quickly
Consumers can find the content they need by using the governed inventory of data products on
Data Product Hub or by requesting a new, unique data
product.
Resolving data silos
Producers can use metadata to create data products from both IBM and third-party tools. This
integration optimizes accessibility and prevents silos because consumers can access all data in one secure location.
Empowering trust and compliance
All data products are associated with a data contract that outlines the terms and conditions of
usage. These contracts provide assurance on data security and compliance for both producers and
consumers as data products are shared across many regions and business domains.
IBM Knowledge Catalog Premium is a
generative AI enabled service that includes a complete governance framework with data privacy, data
quality, cataloging, and automated metadata enrichment. IBM Knowledge Catalog Premium uses trusted Slate and Granite
foundation models to:
Recommend descriptive names for tables and columns based on the contents of the tables and
columns.
Suggest and assign semantic descriptions for the contents of tables and columns based on the
context and content of the columns.
Complete semantic term assignment for tables and columns.
In addition, IBM Knowledge Catalog Premium includes:
Enhanced data protection features to help control risk and support compliance with privacy
regulations.
Extensive data quality features designed to deliver trusted data to the enterprise and support
regulatory compliance requirements.
IBM Knowledge Catalog Standard is a
generative AI enabled service that is designed to support foundational data governance use cases. It
includes core features such as a glossary, catalogs, workflows, data discovery, natural language
search, profiling, and automated metadata enrichment. IBM Knowledge Catalog Standard uses generative AI to:
Recommend descriptive names for tables and columns based on the contents of the tables and
columns.
Suggest and assign semantic descriptions for the contents of tables and columns based on the
context and content of the columns.
Complete semantic term assignment for tables and columns.
watsonx Code Assistant for Red Hat Ansible Lightspeed
Separately priced
IBM
watsonx Code Assistant for Red Hat
Ansible Lightspeed is a new generative AI service
engineered to help automation teams create, adopt, and maintain Ansible content more efficiently.
Use watsonx Code Assistant for Red Hat Ansible Lightspeed to:
Write Ansible Playbooks with AI-generated recommendations
Use IBM foundation models to get code recommendations in your Visual Studio Code development
environment
Create task prompts from natural language requests
Tune the IBM base code model on your data so that it generates code suggestions that are
customized for your enterprise standards
IBM
watsonx Code Assistant for Z is a
new service that helps developers to modernize their mainframe applications using a combination of
automation and generative AI. Use watsonx Code Assistant for Z
to:
Analyze applications with IBM Application Discovery and Delivery Intelligence
Refactor monolithic COBOL applications into services
Generate Java services based on COBOL, including classes and methods
Generate JUnit tests to validate that the Java service is semantically equivalent to the
COBOL
Cloud Pak for Data Version 5.0 can run on an IPv4/IPv6 dual-stack network. For more information on enabling
dual-stack networking, see Converting to IPv4/IPv6 dual-stack
networking
in the Red Hat
OpenShift Container Platform documentation.
If you are
upgrading to Cloud Pak for Data Version 5.0, you can enable dual-stack networking before you upgrade. When the pods come
back up after upgrade, the pods are dual-stack enabled.
More control over how your usage is reported for
licensing
You are required to keep a record of the size of deployments to report to IBM
as requested. If you plan to install multiple solutions on a single instance of Cloud Pak for Data, you can use node pinning to
ensure that you are compliant with your license terms. Node pinning uses node affinity to determine
where the pods for each solution can be placed.
To determine whether node pinning is appropriate
for your environment, see Node
planning.
Previous releases
Looking for information about previous releases? See the following topics in IBM® Documentation: