What's new in watsonx.data
Read about the new features and enhancements in the current and previous releases of IBM® watsonx.data.
IBM watsonx.data is an open architecture lakehouse that combines elements of the data warehouse and the data lake. The best-in-class features and optimizations available in watsonx.data make it an optimal choice for next-generation data analytics and automation. It is released in three versions:
- Software
- Developer
- Cloud
IBM watsonx.data Version 2.0.0
A new version of watsonx.data was released in June 2024.
- Azure Data Lake Storage Gen2 (ADLS), Azure Blob Storage, and Google Cloud Storage
- You can now use the following storage types:
  - You can now add Azure Blob Storage, Azure Data Lake Storage Gen2 (ADLS), and Google Cloud Storage to watsonx.data.
  - You can now use Azure Data Lake Storage (ADLS) Gen1 and Gen2 to store your data while submitting Spark applications.
For more information, see Adding a storage-catalog pair.
- New Arrow Flight service-based data sources
- You can now use the following data sources with the Arrow Flight service:
- Greenplum
- Salesforce
- MariaDB
- Apache Derby
For more information, see Arrow Flight service.
- New data sources
- You can now use the following data sources:
- Cassandra
- BigQuery
- ClickHouse
- Apache Pinot
For more information, see Adding a database-catalog pair.
- New page for the Bring Your Own JAR (BYOJ) process for the SAP HANA data source
- You can now use a dedicated Driver manager section on the new Configurations page to manage drivers for the SAP HANA data source. Each of these drivers undergoes a series of validations.
For more information, see SAP HANA.
- Apache Ranger policies
- IBM watsonx.data now supports Apache Ranger
policies to allow integration with Presto engines.
For more information, see Apache Ranger policy.
- Provision Spark as a native engine
- In addition to registering external Spark engines, you can now provision a native Spark engine in watsonx.data. With a native Spark engine, you can manage the Spark engine configuration, manage access to Spark engines, and view applications by using REST API endpoints from watsonx.data.
For more information, see Native Spark engine.
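As a rough illustration of driving a native Spark engine over REST, the sketch below builds (but does not send) an application-submission request. The base URL, engine ID, endpoint path, and payload field names are assumptions for the example; the authoritative shapes are in the watsonx.data API documentation.

```python
import json
from urllib import request

# Hypothetical values -- substitute your instance URL, bearer token,
# and Spark engine ID. The endpoint path is illustrative only.
BASE_URL = "https://watsonxdata.example.com/lakehouse/api/v2"
ENGINE_ID = "spark123"
TOKEN = "<bearer-token>"

# Assumed payload shape: the application file and its arguments.
payload = {
    "application_details": {
        "application": "s3://my-bucket/apps/etl_job.py",
        "arguments": ["--date", "2024-06-01"],
    }
}

req = request.Request(
    url=f"{BASE_URL}/spark_engines/{ENGINE_ID}/applications",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# request.urlopen(req) would submit it; here we only print the target.
print(req.full_url)
```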
- Query Optimizer to improve query performance
- You can now use Query Optimizer to improve the performance of queries that are processed by the Presto (C++) engine. If Query Optimizer determines that optimization is feasible, the query is rewritten; otherwise, the native engine optimization takes precedence.
For more information, see Query Optimizer overview.
- New name for Presto engine in watsonx.data
- Presto is renamed to Presto (Java).
- New engine (Presto C++) in watsonx.data
- You can provision a Presto (C++) engine (version 0.286) in watsonx.data to run SQL queries on your data sources and fetch the queried data.
For more information, see Presto (C++) overview.
- API Customization feature
- You can now use catalog and engine API Customization for Presto (Java) and Presto (C++) engines
in watsonx.data.
For more information, see IBM API docs.
- Mixed case feature flag for Presto (Java) engine
- The mixed-case feature flag, which allows you to switch between case-sensitive and case-insensitive behavior in Presto (Java), is available. The flag is set to OFF by default and can be set to ON during the deployment of watsonx.data.
For more information, see Presto (Java) mixed-case support overview.
- Using proxy to access S3 and S3 compatible buckets
- External applications and query engines can access the S3 and S3-compatible buckets that are managed by watsonx.data through an S3 proxy.
For more information, see Using CAS proxy to access S3 and S3 compatible buckets.
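A minimal sketch of what "through an S3 proxy" means for a client: object requests are addressed to the proxy endpoint rather than to the storage service directly. The proxy host and path layout below are assumptions; check the CAS proxy documentation for your deployment.

```python
from urllib.parse import urljoin, quote

# Hypothetical CAS proxy endpoint -- replace with your deployment's host.
CAS_PROXY = "https://cas.example.com/cas/v1/proxy/"

def proxied_object_url(bucket: str, key: str) -> str:
    """Build the URL for an S3 object when it is served through the
    CAS proxy instead of the storage endpoint (illustrative layout)."""
    return urljoin(CAS_PROXY, f"{quote(bucket)}/{quote(key)}")

print(proxied_object_url("sales-bucket", "2024/orders.parquet"))
```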
- Semantic automation for data enrichment
- Semantic automation for data enrichment leverages generative AI with IBM Knowledge Catalog to
understand your data on a deeper level and enhance data with automated enrichment to make it
valuable for analysis.
For more information, see Semantic automation for data enrichment in watsonx.data.
- Hive Metastore (HMS) access in watsonx.data
- You can now fetch metadata information for Hive Metastore by using REST APIs instead of getting the information from the engine details. HMS details are used by external entities to integrate with watsonx.data. You must have an Admin, Metastore Admin, or Metastore Viewer role to run the API.
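The call shape for fetching HMS details might look like the sketch below; the endpoint path and host are assumptions for illustration, and the caller must hold one of the roles listed above.

```python
from urllib import request

# Hypothetical host and endpoint -- the exact path is in the
# watsonx.data API reference; this only shows the request shape.
BASE_URL = "https://watsonxdata.example.com/lakehouse/api/v2"
TOKEN = "<bearer-token>"  # Admin, Metastore Admin, or Metastore Viewer role

req = request.Request(
    url=f"{BASE_URL}/hms/details",
    headers={"Authorization": f"Bearer {TOKEN}", "Accept": "application/json"},
    method="GET",
)

# request.urlopen(req) would return the HMS connection details as JSON.
print(req.full_url)
```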
- Manage resource quota limits for your Spark engine
- You can now manage the resource usage quota for the Spark engine in Cloud Pak for Data by using the REST API or from the Spark engine details page.
For more information, see Managing resource quota.
- Version upgrade
-
- Presto (Java) engine is now upgraded to version 0.286.
- Milvus service is now upgraded to version 2.4.0. Important features include:
  - Better performance (lower memory utilization)
  - Support for sparse data
  - Built-in SPLADE engine for sparse vector embedding
  - BGE M3 hybrid (dense + sparse) search
- Command to retrieve ingestion history
- You can now retrieve the status of all submitted ingestion jobs by using the ibm-lh get-status --all-jobs CLI command. You get the history records that you have access to. For more information, see Options and parameters supported in ibm-lh tool.
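Once the job history is retrieved, it can be post-processed like any structured output. The sketch below filters a history listing for failed jobs; the JSON shape (job_id, status fields) is an assumption for illustration, not the documented output format of the ibm-lh tool.

```python
import json

# Assumed output shape for `ibm-lh get-status --all-jobs`; the real
# field names may differ -- see the ibm-lh tool documentation.
raw = """
[
  {"job_id": "ing-001", "status": "Completed"},
  {"job_id": "ing-002", "status": "Failed"},
  {"job_id": "ing-003", "status": "Running"}
]
"""

jobs = json.loads(raw)
failed = [j["job_id"] for j in jobs if j["status"] == "Failed"]
print(failed)  # jobs worth investigating or re-running
```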
- New operations for Db2 data source
- You can perform the following operations on BLOB and CLOB data types for the Db2 data source:
  - INSERT
  - CREATE
  - CTAS
  - ALTER
  - DROP
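The statements below sketch what each of these operations could look like against a Db2 table with BLOB and CLOB columns; the table and column names are made up for the example.

```python
# One illustrative SQL statement per newly supported operation on
# BLOB/CLOB columns for a Db2 data source (hypothetical schema).
statements = {
    "CREATE": "CREATE TABLE docs (id INT, body CLOB, thumb BLOB)",
    "INSERT": "INSERT INTO docs VALUES (1, 'hello', X'CAFE')",
    "CTAS":   "CREATE TABLE docs_copy AS (SELECT * FROM docs) WITH DATA",
    "ALTER":  "ALTER TABLE docs ADD COLUMN notes CLOB",
    "DROP":   "DROP TABLE docs_copy",
}
for op, sql in statements.items():
    print(f"{op}: {sql}")
```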
- New data types for data sources
- The following new data types are now available for some data sources. You can access these data types on the Data manager page under the Add column option.
  - BLOB: Db2, Teradata, Oracle, MySQL, SingleStore
  - CLOB: Db2, Teradata, Oracle
  - BINARY: SQL Server, MySQL
  Because the numeric data type is not supported in watsonx.data, you can use the decimal data type as an equivalent alternative to the numeric data type for the Netezza data source.
  You can now use the BLOB and CLOB data types with the SELECT statement in the Query workspace to build and run queries against your data for Oracle and SingleStore data sources.
  You can now use the BLOB and CLOB data types for MySQL and PostgreSQL data sources as equivalents to LONGTEXT, BYTEA, and TEXT because these data types are not compatible with Presto (Java). These data types are mapped to CLOB and BLOB in Presto (Java) if data sources have existing tables with LONGTEXT, TEXT, and BYTEA data types.
  - MySQL (CLOB as equivalent to LONGTEXT)
  - PostgreSQL (CLOB as equivalent to TEXT)
  - PostgreSQL (BLOB as equivalent to BYTEA)
  - Netezza (decimal as equivalent to numeric)
  - Oracle (BLOB and CLOB with the SELECT statement)
  - SingleStore (BLOB and CLOB with the SELECT statement)