Delivering superior price-performance and enhanced data management for AI with IBM watsonx.data

21 May 2024

Businesses have accelerated their use of AI in the past year. Gartner has projected that by 2026, more than 80% of enterprises will have deployed AI APIs or generative AI-powered applications in production environments—up from less than 5% in 2023. However, the ability of an enterprise to get value from AI relies on the availability and quality of its underlying data. To unlock the full value of data for AI, enterprises must be able to navigate their complex IT landscapes to enable data access, optimize price-performance of workloads for scale and prepare and deliver governed data for AI.

IBM® watsonx.data™ enables enterprises to scale AI and analytics with their own data wherever it resides. It is a core component of the IBM watsonx™ AI and data platform, which enables enterprises to create custom AI applications for their specific business needs, access and manage all data sources and accelerate the implementation of responsible AI workflows—all on one platform.

Watsonx.data enables enterprises to unlock value in their existing data by connecting to existing storage and analytical environments. It also allows them to prepare their data for AI use cases and cost-optimize workloads with multiple fit-for-purpose query engines and low-cost object storage.

To that end, we’re excited to announce new and upcoming updates to IBM watsonx.data at Think 2024, our annual event that brings together over 5,000 technology pioneers and leaders. So, what’s new?

Deliver superior price-performance

IBM watsonx.data with Presto C++ v0.286 and query optimizer on IBM Storage Fusion HCI, tested internally by IBM, delivered better price-performance than Databricks' Photon engine: equal query runtime at less than 60% of the cost, derived from public 100 TB TPC-DS Query benchmarks.*

Presto, the open source Linux Foundation project, is a key engine for watsonx.data. Presto C++ is the latest step in the development of Presto 2.0, the next-generation version of Presto being developed by Meta, IBM and others, which runs Presto with Velox, an open source C++ native acceleration library designed to be composable across compute engines. IBM has key maintainers in the Velox project and has contributed to the development of Presto 2.0, including the Parquet and Iceberg readers and support for filesystems. The query optimizer integrates enterprise-proven query compilation technology with advanced query rewrite and cost-based optimization techniques. In other words, watsonx.data has been enhanced for fast query performance at optimized cost.
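To put the two test configurations side by side, the following minimal sketch computes the resources used in IBM's internal test as a fraction of those in the public Databricks benchmark. The figures are copied from the footnote at the end of this article; the `ClusterConfig` class and its field names are illustrative only. The sketch does not reproduce the pricing calculation, which depends on vendor pricing that is not listed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterConfig:
    """Hardware summary for one benchmark run (illustrative structure)."""
    name: str
    worker_nodes: int
    vcpus: int
    memory_tb: float
    storage_tb: float
    network_gb: int


# Figures copied from the footnote to this article.
ibm = ClusterConfig("watsonx.data Presto C++ 0.286", 75, 1009, 18.0, 344.8, 50)
databricks = ClusterConfig("Databricks Photon (2021)", 256, 2112, 16.1, 528.2, 10)


def resource_ratios(a: ClusterConfig, b: ClusterConfig) -> dict[str, float]:
    """Return a's resources as a fraction of b's, rounded to two decimals."""
    fields = ("worker_nodes", "vcpus", "memory_tb", "storage_tb", "network_gb")
    return {f: round(getattr(a, f) / getattr(b, f), 2) for f in fields}


ratios = resource_ratios(ibm, databricks)
for field, ratio in ratios.items():
    print(f"{field}: {ratio:.2f}x")
```

On these figures, the IBM test ran with roughly 0.29 times the worker nodes and 0.48 times the vCPUs of the Databricks configuration, with slightly more memory and a faster network. Hardware ratios alone do not determine cost, so they should be read alongside the pricing caveats in the footnote.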

Unlock transactional mainframe data for AI and analytics

We announced the upcoming launch of IBM Data Gate for watsonx, a technology that revolutionizes the way organizations synchronize, analyze and build AI models from data originating on IBM Z®. A 2022 Celent report commissioned by IBM estimated that, globally, 70% of bank, cards and payments transaction value runs on IBM zSystems™ environments.1 IBM clients will be able to unlock transactional mainframe data for AI and analytics with IBM Data Gate for watsonx™ integrated with IBM watsonx.data.

With this valuable transactional data, organizations can identify fraud; understand constituent behavior, client buying journeys and client attrition; and build predictive AI models to understand, anticipate and advance business outcomes. By bringing transactional data originating on the mainframe into an open, governed data lakehouse like watsonx.data, enterprises can readily build AI models to help grow revenue, enhance productivity and manage cost.

Unify and share data across IBM’s databases for new AI applications

From mobile banking applications to connected cars, clients rely on popular IBM databases to store their most critical data across the hybrid cloud, powering applications and analytics that operate their business every single day. Watsonx.data can help clients tap this valuable data for AI. IBM Db2 Database, Db2 Warehouse, Netezza and Informix are expected to offer new integrations with watsonx.data and support for open formats like Apache Iceberg, so that clients can unify and share a single copy of data and metadata across the hybrid cloud without needing to migrate or re-catalog. Clients can also query data from their IBM databases across multiple engines to prepare data for AI.

On-premises database clients can modernize to hybrid cloud deployments and enable flexibility for AI with like-for-like SaaS compatibility. To support hybrid cloud application modernization, IBM and AWS also introduced a consumption-based license for Amazon RDS for Db2, simplifying workload management and accelerating time to market with on-demand licenses and faster cloud provisioning.

Accelerate data discovery and insights with a semantic layer–no SQL required

We introduced the upcoming launch of the semantic layer as part of IBM Knowledge Catalog and embeddable into IBM watsonx.data. The semantic layer uses large language models (LLMs) to create a unified data context across IBM Data and AI tools. Powered by watsonx, the semantic layer can not only enrich data but also provide automation tools to assist teams in rapidly exploring and processing data.

When embedded into IBM watsonx.data, the semantic layer can generate data enrichments that let clients use natural-language semantic search to find and understand previously cryptic structured data across their data estate. This accelerates data discovery and unlocks data insights faster, with no SQL required.

Scale repeatable data packaging and delivery for your AI use cases

We announced IBM Data Product Hub, a new data sharing solution that is expected to be available in June 2024. It is designed to help enterprises accelerate data-driven outcomes by streamlining data sharing between internal data producers and data consumers. What does this mean for IBM watsonx.data users? They will be able to connect to IBM watsonx.data for unified access to disparate data sources and pull in relevant metadata to create the core of what will become a repeatable, governed data product. That data product can then be used to deliver the right data for various AI use cases across your organization at scale, without repeated, manual workflows slowing you down.

Create a knowledge base to enhance the relevance and precision of your AI

Using your trusted, governed data is essential for the accuracy and relevance of AI applications. That is why we recently launched an integrated vector database based on the open source Milvus in IBM watsonx.data. Now, watsonx clients can unify, curate and prepare vectorized embeddings for their generative AI applications at scale across their trusted, governed data. This helps enhance the relevance and precision of AI outputs, including chatbots, personalized recommendation systems and image similarity search applications. It allows clients to seamlessly connect to their trusted data in watsonx.data from IBM watsonx.ai™ or another AI tool.
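As a rough illustration of what a vector database does, the sketch below stores a few embeddings in memory and returns the entries closest to a query by cosine similarity. It is plain Python, not the Milvus or watsonx.data API. The three-dimensional vectors, the document names and the `top_k` helper are made up for the example; real embeddings would come from an embedding model and typically have hundreds of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: dict[str, list[float]], k: int = 2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical toy embeddings keyed by document name.
index = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "return-form": [0.8, 0.2, 0.1],
}

# Hypothetical query embedding, e.g. for a question about refunds.
query = [1.0, 0.0, 0.0]
results = top_k(query, index, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

A production vector database performs the same kind of nearest-neighbor lookup, but typically relies on approximate indexes so that the search stays fast across very large numbers of embeddings.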

How can you get started with IBM watsonx.data today?

Try watsonx.data yourself with a free trial. Are you interested in learning more about any of our upcoming product updates? Join our upcoming webinar recapping our Think announcements or book a meeting with an IBM watsonx.data product specialist.

Author

Fariya Syed-Ali

Senior Product Marketing Manager, watsonx.data

Footnotes

* Based on IBM internal testing of Presto C++ 0.286 on a hyper-converged infrastructure setup with 1 master + 75 worker nodes, 1009 vCPUs, 18 TB memory, 344.8 TB of filesystem storage, distributed RAID and 50 GB network compared to public Databricks 100 TB TPC-DS Query benchmarks published in 2021 with 1 master + 256 worker nodes, 2112 vCPUs, 16.1 TB memory, 528.2 TB of total storage and 10 GB network. Pricing calculations are based on IBM watsonx.data pricing as of 7 May 2024 and Databricks published pricing for Photon as of 7 May 2024. Results are based on testing conditions and pricing as of the dates shown. Actual costs and performance can vary depending on individual client configurations and conditions. Results are derived from the Databricks SQL 8.3 benchmark and as such are not comparable to published Databricks SQL 8.3 benchmark results, as results do not comply with the Databricks SQL 8.3 benchmark specification.

1 Celent report: “Operationalizing Fraud Prevention on IBM z16”, Neil Katkov, 04/05/2022, commissioned by IBM