What's new in watsonx.data
Read about the new features and enhancements in the current and previous releases of IBM® watsonx.data.
watsonx.data Developer edition
watsonx.data on Red Hat® OpenShift®
IBM watsonx.data Version 1.1.4
A new version of watsonx.data was released in April 2024.
This release includes the following features and updates:
- Uploading description files for Apache Kafka data source
- The Apache Kafka data source stores data as byte messages that producers
and consumers must interpret. To query this data, consumers must first map it into columns. Now you
can upload topic description files that convert raw data into a table format. Each file must be a
JSON file that contains a definition for a table.
To upload these JSON files from the UI, go to the overview page of the Apache Kafka database that you registered and select the Add topic option.
For more information, see Apache Kafka.
- New data sources
- The following new data sources are now available:
- Oracle
- Amazon Redshift
- Informix
- Prometheus
- New BINARY data type for data sources
- In the Query workspace, you can now use the BINARY data type and its equivalents with the SELECT
statement to build and run queries against your data for the following data sources:
- Db2 (BINARY data type)
- Snowflake (BINARY data type)
- PostgreSQL (BYTEA data type)
- Teradata (VARBYTE data type)
- Test SSL connections
- You can now test SSL connections for the MongoDB and SingleStore data sources.
- Kerberos authentication for HDFS connections
- You can now enable Kerberos authentication for secure Apache Hadoop Distributed File System (HDFS) connections. For more information, see HDFS.
- Presto engine version upgrade
- The Presto engine is now upgraded to version 0.285.1.
IBM watsonx.data Version 1.1.3
A new version of watsonx.data was released in March 2024.
- New data type for some data sources
- You can now use the BINARY data type with the SELECT statement in the Query workspace to build and run queries against your data for the following data sources:
- Elasticsearch
- SAP HANA
- SQL Server
- MySQL
The new BLOB and CLOB data types are available for MySQL, PostgreSQL, Snowflake, SQL Server, and Db2 data sources. You can use these data types only with SELECT statements in the Query workspace to build and run queries against your data.
- Delete data by using the DELETE FROM feature for Iceberg data sources
- You can now delete data from tables in Iceberg data sources by using the DELETE FROM feature.
You can specify the table property delete mode for new tables by using either copy-on-write mode or merge-on-read mode (default). For more information, see SQL statements.
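As a minimal sketch of this workflow, the following statements create a partitioned Iceberg table in copy-on-write mode and then delete the rows of one partition. The catalog, schema, table, and column names are hypothetical. The property spellings delete_mode and partitioning are assumptions taken from the open source Presto Iceberg connector, so check the SQL statements topic for the exact form in your release.

```sql
-- Hypothetical catalog (iceberg_data), schema (sales), and table (orders).
-- Assumption: the delete mode table property is spelled delete_mode,
-- as in the open source Presto Iceberg connector.
CREATE TABLE iceberg_data.sales.orders (
    order_id BIGINT,
    region   VARCHAR,
    amount   DOUBLE
)
WITH (
    partitioning = ARRAY['region'],
    delete_mode  = 'copy-on-write'
);

-- Delete every row that belongs to one partition value.
DELETE FROM iceberg_data.sales.orders
WHERE region = 'EMEA';
```

The example filters on the partition column because a partition-aligned delete is the most conservative case. Whether arbitrary row-level predicates are accepted might depend on the delete mode and table format version.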
- ALTER VIEW statement for Iceberg data source
- You can now use the following SQL statement in the Query workspace to build and run queries
against your data for ALTER VIEW:
ALTER VIEW name RENAME TO new_name
- Upload SSL certificates for Netezza Performance Server data sources
- You can now browse and upload the SSL certificate for SSL connections in Netezza Performance Server data sources. The valid file formats for SSL certificate are .pem, .crt, and .cer. You can upload SSL certificates by using the Adding a database-catalog pair option in the Infrastructure manager.
- Query data from Db2 and Watson Query
- You can now query nicknames that are created in Db2 and virtualized tables from Watson Query instances.
- Use data from Apache Hudi catalog
- You can now connect to and use data from Apache Hudi catalog.
- New storage type Hadoop Distributed File System (HDFS)
- You can now use new storage type HDFS. For more information, see Adding storage-catalog pair.
- Add bucket feature is now available in Add storage option
- You can now add a storage and attach a bucket to the respective storage type in the Infrastructure manager in the UI.
- Add Milvus as a service in watsonx.data
- You can now provision Milvus as a service in watsonx.data with the following features:
- Provision different storage variants such as starter, medium, and large nodes.
- Assign Admin or User roles for Milvus users: User access policy is now available for Milvus users. Using the Access Control UI, you can assign Admin or User roles for Milvus users and also grant, revoke, or update the privilege.
- Configure the Object storage for Milvus to store data. You can add or configure a custom bucket and specify the username, password, region, and bucket URL.
For more information, see Milvus.
- Load data in batch by using the ibm-lh ingestion tool in the client package
- You can now use the ibm-lh ingestion tool to run batch ingestion procedures in noninteractive mode, from outside the ibm-lh-tools container, by using the ibm-lh-client package. For more information, see ibm-lh commands and usage.
- Creating schema by using bulk ingestion in web console
- You can now create a schema by using the bulk ingestion process in the web console, if the schema is not previously created.
- Use time-travel queries in Apache Iceberg tables
- You can now run the following time-travel queries by using branches and tags in Apache Iceberg
table snapshots:
- SELECT * FROM <table name> FOR VERSION AS OF 'historical-tag'
- SELECT * FROM <table name> FOR VERSION AS OF 'test-branch'
- Data ingestion is possible only through the Spark engine on the web console
- The Iceberg copy loader method of ingesting data is no longer available on the web console. Data ingestion through the Spark engine is now the only ingestion method that is available on the web console. For more information, see Ingesting data by using Spark.
- Access Cloud Object Storage without credentials
- You can now access your Cloud Object Storage bucket without credentials, by using the CAS endpoint. For more information about getting CAS endpoint, see Getting CAS endpoint.
IBM watsonx.data Version 1.1.2
A new version of watsonx.data was released in February 2024.
- SSL connection for data sources
- You can now enable SSL connection for the following data sources by using the Add database user
interface to secure and encrypt the database connection:
- Db2
- PostgreSQL
- IBM Data Virtualization Manager for z/OS
For IBM Data Virtualization Manager for z/OS and PostgreSQL, select Validate certificate to validate whether the SSL certificate that is returned by the host is trusted.
For the IBM Data Virtualization Manager for z/OS data source, you can choose to provide the hostname in the SSL certificate.
For more information, see Adding a database-catalog pair.
- Secure ingestion job history
- Now users can view only their own ingestion job history. Administrators can view the ingestion job history for all users.
- HTTP proxy enabled for watsonx.data cluster
- HTTP proxy is now supported on watsonx.data to establish outbound connections securely. Because HTTP proxy is enabled on the cluster, watsonx.data components can be used in an air-gapped environment and can communicate securely with external resources that are hosted on the internet through the standard HTTP proxy mechanism. However, some components, such as MinIO, have a few limitations. For more information, see Known issues and limitations.
- New data types BLOB and CLOB for SAP HANA and Teradata data sources
- New data types BLOB and CLOB are available for SAP HANA and Teradata data sources. You can use these data types only with SELECT statements in the Query workspace to build and run queries against your data.
- SQL enhancements
- You can now use the following SQL statements in the Query workspace to build and run queries
against your data:
- Apache Iceberg data sources
- CREATE VIEW
- DROP VIEW
- MongoDB data sources
- DELETE
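The following sketch shows how these statements might look in the Query workspace. All catalog, schema, table, view, and column names are hypothetical, and the statements assume that an Iceberg catalog and a MongoDB catalog are already registered.

```sql
-- Iceberg: create a view over a hypothetical orders table.
CREATE VIEW iceberg_data.sales.emea_orders AS
SELECT order_id, amount
FROM iceberg_data.sales.orders
WHERE region = 'EMEA';

-- Iceberg: drop the view when it is no longer needed.
DROP VIEW iceberg_data.sales.emea_orders;

-- MongoDB: delete matching documents from a hypothetical collection.
DELETE FROM mongodb_data.shop.customers
WHERE status = 'inactive';
```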
- Create a new table during data ingestion
- Previously, you had to have a target table in watsonx.data for ingesting data. Now, you can create a new table directly from the source data file (available in Parquet or CSV format) by using data ingestion through the watsonx.data user interface. You can create the table by using the following methods of ingestion:
- Ingesting data by using Iceberg copy loader. For more information, see Ingesting data by using Iceberg copy loader.
- Ingesting data by using Spark. For more information, see Ingesting data by using Spark.
- Perform ALTER TABLE operations on a column
- With an Iceberg data source, you can now perform ALTER TABLE operations on a column for the
following data type conversions:
- int to bigint
- float to double
- decimal to decimal, where the source decimal type has fewer digits than the converted decimal type.
- Better query performance by using sorted files
- With an Iceberg data source, you can generate sorted files, which reduce the query result
latency and improve the performance of Presto.
Data in the Apache Iceberg table is sorted during the writing process within each file. You can configure the order to sort the data by using the sorted_by table property. When you create the table, specify the array of columns that are involved in sorting.
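A minimal sketch of a table definition that uses this property follows. The catalog, schema, table, and column names are hypothetical, and the ARRAY syntax is assumed from the open source Presto Iceberg connector.

```sql
-- Hypothetical Iceberg table whose data files are written
-- sorted by order_date.
CREATE TABLE iceberg_data.sales.orders_sorted (
    order_id   BIGINT,
    order_date DATE,
    amount     DOUBLE
)
WITH (
    sorted_by = ARRAY['order_date']
);
```

Queries that filter or range-scan on the sort column are the ones most likely to benefit, because sorted files let the engine skip more data within each file.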
- Exposing Hive metastore port details (Developer edition)
- You can now expose the Hive metastore port details outside the watsonx.data Developer edition host to facilitate connections from external applications (services outside of Docker or Podman), such as the integration of Db2 and Spark with watsonx.data.
IBM watsonx.data Version 1.1.1
A new version of watsonx.data was released in December 2023.
- Audit logging
- IBM watsonx.data now integrates with the Cloud Pak for Data audit logging service. Auditable events for watsonx.data are forwarded to the security information and event management (SIEM) solution that you integrate with.
For more information, see Audit events.
- Use self-signed certificates and CA certificates to connect to object stores
- Previously, watsonx.data could connect to HTTPS endpoints that used certificates signed by well-known certificate authorities, such as IBM Cloud® Object Storage and Amazon S3. Now, you can connect to object stores that use self-signed certificates or certificates that are signed by other certificate authorities. For more information, see Connecting to external object stores over https.
- Integration with Db2® and Netezza®
- You can now register Db2 or Netezza engines with a valid console URL. You can use the metastore URL that is shown on the Engine detail page to sync the respective engines with the appropriate bucket catalog-based table.
- IBM Data Virtualization Manager for z/OS® connector
- You can use the new IBM Data Virtualization Manager for z/OS® connector to read and write IBM Z® data without having to move, replicate, or transform the data. For more information, see Connecting to an IBM Data Virtualization Manager (DVM) data source.
- Better memory management
- Metastore caching and metadata caching (header and footer caching) are now enabled by default to optimize memory usage. You can also create a local staging directory to optimize the use of resources during data operations. For more information, see Enhancing query performance through caching and Configuring a local staging directory.
- Presto case-sensitive behavior
- The Presto behavior is changed from case-insensitive to case-sensitive. You can now provide object names in their original case, as they appear in the database. You can also create schemas, tables, and columns in mixed case (that is, uppercase and lowercase) through Presto if the database supports it.
- Teradata connector is enabled for multiple ALTER TABLE statements
- The Teradata connector now supports the ALTER TABLE RENAME TO, ALTER TABLE DROP COLUMN, and ALTER TABLE RENAME COLUMN column_name TO new_column_name statements.
- Removal of development (*-devel) packages
- For security reasons, the *-devel packages are removed from watsonx.data. If you are already using the development packages, the programs that use the development packages cannot be compiled. For any queries, contact IBM Support.
- SSL is enabled for PostgreSQL
- Now, ingestion can use mounted certificates when connecting to PostgreSQL.
IBM watsonx.data Version 1.1.0
A new version of watsonx.data was released in November 2023.
- Time-travel and roll-back queries
- You can now run the following time-travel queries to access historical data in Apache Iceberg tables:
SELECT <columns> FROM <iceberg-table> FOR TIMESTAMP AS OF TIMESTAMP <timestamp>;
SELECT <columns> FROM <iceberg-table> FOR VERSION AS OF <snapshotId>;
You can use time-travel queries to query and restore data that was updated or deleted in the past.
You can also roll back an Apache Iceberg table to any existing snapshot.
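The sequence below sketches how these queries might fit together: find a snapshot, read the table as of a point in time or a snapshot, and then roll the table back. The catalog, schema, and table names and the snapshot ID are hypothetical. The $snapshots metadata table and the rollback_to_snapshot procedure are assumed from the open source Presto Iceberg connector and might differ in your release.

```sql
-- List the snapshots of a hypothetical table to find a snapshot ID.
SELECT snapshot_id, committed_at
FROM iceberg_data.sales."orders$snapshots"
ORDER BY committed_at;

-- Read the table as it was at a point in time.
SELECT order_id, amount
FROM iceberg_data.sales.orders
FOR TIMESTAMP AS OF TIMESTAMP '2023-11-01 00:00:00 UTC';

-- Read the table as of a specific snapshot (placeholder ID).
SELECT order_id, amount
FROM iceberg_data.sales.orders
FOR VERSION AS OF 1234567890123456789;

-- Roll the table back to that snapshot (assumed procedure name).
CALL iceberg_data.system.rollback_to_snapshot('sales', 'orders', 1234567890123456789);
```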
- Capture historical data about Presto queries
- The Query History Monitoring and Management (QHMM) service captures historical data about Presto
queries and events. The historical data is stored in a MinIO bucket and you can use the data to
understand the queries that were run and to debug the Presto engine.
For more information, see Monitoring and managing diagnostic data.
- Improved query performance with caching
- You can use the following types of caching to improve Presto query performance:
- Metastore caching
- File list caching
- File metadata caching
For more information, see Enhancing query performance through caching.
- Capture Data Definition Language (DDL) changes
- You can now capture and track the DDL changes in watsonx.data by using an event listener.
For more information, see Capturing DDL changes.
- Ingest data by using Spark
- You can now use the IBM Analytics Engine powered by Apache Spark to run ingestion jobs in watsonx.data.
For more information, see Ingesting data by using Spark.
- Integration with Db2 and Netezza Performance Server
- You can now register Db2 or Netezza Performance Server engines in watsonx.data console.
For more information, see Registering an engine.
- New connectors
- You can now use connectors in watsonx.data to
establish connections to the following types of databases:
- Teradata
- Delta Lake
- Elasticsearch
- SAP HANA
- SingleStoreDB
- Snowflake
For more information, see Adding a database.
- New method for integrating with IBM Knowledge Catalog
- Now, you can use the ZenApiKey authorization method to integrate watsonx.data and IBM Knowledge Catalog. For more information, see Integrating with IBM Knowledge Catalog.