Overview

IBM® watsonx.data is an open-architecture lakehouse that combines elements of the data warehouse and the data lake. Its features and optimizations make it well suited to next-generation data analytics and automation.

watsonx.data Developer edition

watsonx.data on Red Hat® OpenShift®

watsonx.data SaaS on AWS

watsonx.data allows open source technologies and proprietary products to coexist. It offers a single platform where you can store data or attach data sources to manage and analyze your enterprise data.

Use watsonx.data to store any type of data (structured, semi-structured, and unstructured) and make that data directly accessible for Artificial Intelligence (AI) and Business Intelligence (BI). You can also attach your data sources to watsonx.data, which helps to reduce data duplication and the cost of storing data in multiple places. watsonx.data uses open data formats with APIs and machine learning libraries, making it easier for data scientists and data engineers to use the data. The watsonx.data architecture enforces schema and data integrity, making it easier to implement robust data security and governance mechanisms.

Key features

  • An architecture that fully separates compute, metadata, and storage to offer ultimate flexibility.

  • Multiple engines such as Presto and Spark that provide fast, reliable, and efficient processing of big data at scale.

  • Open formats for analytic data sets, allowing different engines to access and share the data at the same time.

  • Data sharing between watsonx.data, Db2® Warehouse, Netezza Performance Server, and any other data management solution through common Iceberg table format support, connectors, and a shareable metadata store.

  • Built-in governance that is compatible with existing solutions, including IBM Knowledge Catalog.

  • Cost-effective, simple object storage that is available across hybrid-cloud and multicloud environments.

  • Integration with a robust ecosystem of IBM’s best-in-class solutions and third-party services to enable easy development and deployment of key use cases.
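
The separation of compute, metadata, and storage described in the first features can be illustrated with a small conceptual sketch. This is a toy model only, not the watsonx.data API: the class names (ObjectStore, MetadataStore, Engine), the table name, and the storage path are all hypothetical. It shows two independent engines resolving the same table through one shared metadata store and reading the same stored data without copying it.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectStore:
    """Hypothetical stand-in for object storage: maps a path to rows."""
    files: dict = field(default_factory=dict)


@dataclass
class MetadataStore:
    """Hypothetical shared catalog: maps a table name to a storage path."""
    tables: dict = field(default_factory=dict)


class Engine:
    """Hypothetical query engine. It holds no data of its own; it only
    looks up a table in the shared catalog and reads from shared storage."""

    def __init__(self, name, catalog, storage):
        self.name = name
        self.catalog = catalog
        self.storage = storage

    def scan(self, table):
        path = self.catalog.tables[table]   # metadata lookup
        return self.storage.files[path]     # storage read


# One storage layer and one catalog, created independently of any engine.
storage = ObjectStore()
catalog = MetadataStore()
storage.files["bucket/sales/data"] = [
    {"region": "EU", "amount": 120},
    {"region": "US", "amount": 80},
]
catalog.tables["sales"] = "bucket/sales/data"

# Two engines (assumed names) attached to the same catalog and storage.
engine_a = Engine("engine_a", catalog, storage)
engine_b = Engine("engine_b", catalog, storage)

# Each engine runs its own computation over the same underlying data.
total_a = sum(row["amount"] for row in engine_a.scan("sales"))
eu_rows_b = [row for row in engine_b.scan("sales") if row["region"] == "EU"]
```

In this sketch, adding or removing an engine does not touch the stored data or the catalog, which is the kind of flexibility the separated architecture is meant to provide.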