Integrating Cloud Pak for Data System with watsonx.data

With Netezza Performance Server for Cloud Pak for Data as a Service and watsonx.data, you can connect to the Hive Metastore (HMS), a data lake metastore server, and query Apache Iceberg tables that are stored in your data lake's S3 object store.

Overview

Cloud Pak for Data System integrates with IBM watsonx.data to extend its capabilities and become a lakehouse. To build a hybrid cloud experience, on-premises organizations can deploy IBM watsonx.data on the IBM Storage Fusion HCI system. Through this integration, Cloud Pak for Data System users can read and write Apache Iceberg tables that are kept on low-cost, S3-compatible object storage on the Storage Fusion HCI system, and can combine those Iceberg tables with Netezza Performance Server native tables. Whether on-premises or in the cloud, shared metadata, shared storage, and the open Iceberg table format make it easier to access and share data across Netezza Performance Server instances. For organizations, watsonx.data also simplifies the deployment of AI workloads, lowers the cost of ETL, and provides uniform governance. In addition, it offers improved security and a choice of engines suited to each workload.

Note: HMS (Hive Metastore) connection setup still requires command-line configuration.

General information

  • Use Apache Iceberg tables with the Parquet file format.
  • SELECT, CREATE, INSERT, CREATE TABLE, CTAS (CREATE TABLE AS SELECT), DROP, DELETE, TRUNCATE, CREATE VIEW, and snapshot/time travel queries are supported.
  • Iceberg tables that were altered and underwent Iceberg schema evolution cannot be read.
  • The LZ4 data compression format is supported.
  • NETEZZA_SCHEMA is the default schema when you connect to a data lake database. It is a regular schema that contains Netezza objects such as tables, external tables, and sequences. The schema names NETEZZA_SCHEMA, DEFINITION_SCHEMA, and INFORMATION_SCHEMA are reserved, and metastore schemas with any of those names are not exposed to NPSaaS users.
  • The following data types are not supported:
    • timestamptz
    • uuid
    • struct
    • list
    • map
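
Because the data types listed above are unsupported and tables that underwent schema evolution cannot be read, it can help to inspect a table's Iceberg metadata before you query it. The following Python sketch is a hypothetical pre-check, not an IBM or Netezza tool. It assumes that you have already loaded the table's current metadata.json file (Iceberg format version 2, with a "schemas" array and a "current-schema-id" field) into a dictionary; the names unsupported_columns, readability_issues, and SAMPLE_METADATA are illustrative. Treating more than one schema version in the metadata as evidence of schema evolution is a heuristic.

```python
from typing import Any

# Iceberg types listed as unsupported (primitives are serialized as strings,
# nested types as JSON objects with a "type" key, per the Iceberg table spec).
UNSUPPORTED_PRIMITIVES = {"timestamptz", "uuid"}
UNSUPPORTED_NESTED = {"struct", "list", "map"}


def unsupported_columns(schema: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (column name, type) for top-level columns with unsupported types."""
    problems = []
    for field in schema.get("fields", []):
        ftype = field["type"]
        if isinstance(ftype, dict):
            # Nested column: struct, list, or map.
            kind = ftype.get("type", "")
            if kind in UNSUPPORTED_NESTED:
                problems.append((field["name"], kind))
        elif ftype in UNSUPPORTED_PRIMITIVES:
            problems.append((field["name"], ftype))
    return problems


def readability_issues(metadata: dict[str, Any]) -> list[str]:
    """Hypothetical pre-check: list reasons the table may not be readable."""
    issues = []
    schemas = metadata.get("schemas", [])
    # Heuristic: more than one schema version implies schema evolution.
    if len(schemas) > 1:
        issues.append(f"schema evolved ({len(schemas)} schema versions)")
    current_id = metadata.get("current-schema-id")
    current = next((s for s in schemas if s.get("schema-id") == current_id), None)
    if current is not None:
        for name, ftype in unsupported_columns(current):
            issues.append(f"column {name!r} has unsupported type {ftype}")
    return issues


# Hypothetical metadata: one column added after creation (schema evolution),
# including a timestamptz column and a list column.
SAMPLE_METADATA = {
    "format-version": 2,
    "current-schema-id": 1,
    "schemas": [
        {"type": "struct", "schema-id": 0,
         "fields": [{"id": 1, "name": "id", "required": True, "type": "long"}]},
        {"type": "struct", "schema-id": 1,
         "fields": [
             {"id": 1, "name": "id", "required": True, "type": "long"},
             {"id": 2, "name": "created", "required": False, "type": "timestamptz"},
             {"id": 3, "name": "tags", "required": False,
              "type": {"type": "list", "element-id": 4, "element": "string",
                       "element-required": False}},
         ]},
    ],
}

if __name__ == "__main__":
    for issue in readability_issues(SAMPLE_METADATA):
        print(issue)
```

Only top-level columns are scanned: any column that contains nested fields is itself a struct, list, or map, so it is already flagged. An empty result does not guarantee that Netezza Performance Server can read the table; the sketch only checks the conditions listed in this section.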

Note: Indirect access through the Netezza Performance Server host is the supported option in the ICPDS 1.0.8.x release.

To externalize the SPUs for direct access, contact IBM Support for the switch configuration, and then perform the steps in Externalizing SPU IP address.