Today’s technology landscape is experiencing an explosion of data. Organizations need to be able to trust and access this data to generate meaningful insights. Enter IBM Cloud Pak® for Data 5.0, the newest release of the cloud-native insight platform that integrates the tools needed to collect, organize and analyze data within a data fabric architecture.

IBM Cloud Pak for Data 5.0 enhances users’ data strategies with these new features:

  • Immersive Experience: Customers can now streamline their IT and day 2 operations with the Immersive Experience feature, which brings the IBM watsonx™ and Cloud Pak for Data products together on a single platform.   
  • Remote data planes: Users can now run their workloads where their data resides with the ability to provision a single Cloud Pak for Data instance (control plane) that can support multiple lightweight remote data planes across diverse clouds, geographical regions and on-premises environments, providing organizations with enhanced compliance, performance and cost-effectiveness.   
  • Relationship Explorer: Within the IBM Knowledge Catalog, users can use a knowledge graph database to present a visual map of relationships between data assets and governance artifacts.   
  • Data Product Hub: IBM’s newest product, Data Product Hub, helps customers break down data silos, streamline data sharing and automate the delivery of data products to data consumers across the organization.   
  • IBM Knowledge Catalog Cartridges: Generative AI-enabled, modern data intelligence solutions with new features powered by enterprise-grade foundation models that boost the productivity of data practitioners by automatically assigning business context to enterprise data at scale.  

Data Fabric provided by IBM Cloud Pak for Data 

IBM enables customers to build a data fabric architecture via Cloud Pak for Data, the platform that provides composable services spanning data integration, data governance, data observability, master data management and data lineage use cases.  

The IBM Software team is excited to announce the next version of Cloud Pak for Data: version 5.0, the 15th feature release. This version offers customers new platform features and enhancements in addition to new service features and enhancements that span the entirety of the data fabric portfolio.   

Platform features and enhancements 

Immersive Experience  

With the new release of the Immersive Experience feature within IBM Cloud Pak for Data 5.0, customers can now use IBM AI and data fabric platforms in tandem without the need for complex integrations or separate management systems.

The Immersive Experience facilitates integration between watsonx, the AI and data platform designed to help businesses scale and accelerate their AI and data-driven initiatives, and Cloud Pak for Data within a single Red Hat® OpenShift® cluster and namespace.

This approach brings the two products together seamlessly. Users can toggle back and forth between technologies within a single platform, which enables organizations to streamline their IT and Day 2 operations. Refer to the picture for the user experience provided by Immersive Experience on Cloud Pak for Data.

Furthermore, when installed together, both the watsonx and Cloud Pak for Data brands maintain their own distinct user experience, which can be compared to having two separate tools with the convenience of a single platform.  

This experience is achieved through perspectives, which are customized views that launch based on the features you want. In addition, the node pinning or resource pool technology included in the release supports better allocation of licenses. It also facilitates the coexistence of multiple products, so users can focus on driving innovation and growth without worrying about technical complexities.

Remote data planes   

In Cloud Pak for Data 5.0, the remote data plane unlocks new possibilities for data engineers. Users are no longer limited to performing their data jobs in one location. The innovation of the remote data plane brings processing capabilities to the data, circumventing costly and sometimes impossible data transfer while fully complying with data sovereignty laws.  

By allowing workloads to be moved closer to where data resides, the remote data plane consolidates, expands and refines workloads throughout on-premises and multiple cloud ecosystems, prioritizing compliance, performance and cost-effectiveness.   

Remote data planes on Cloud Pak for Data 5.0 deliver these benefits by consolidating multiple Cloud Pak for Data instances into one instance and running pipelines where the data resides.

The advantages of using remote data planes include: 

  • Reduced data movement and network latency.  
  • Minimized costs with lowered operational expenses and egress charges. 
  • Improved data security and control for data sovereignty as processing occurs at the data source.  
  • Optimized resource usage with consolidated instances.  
  • Boosted pipeline performance as a result.  
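
The idea of running a workload where its data resides can be illustrated with a minimal sketch. This is not the Cloud Pak for Data API: the `DataPlane` and `Job` classes, the `place_job` function and all region and plane names are hypothetical, chosen only to show a single control plane selecting the remote data plane that sits in the same location as a job's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPlane:
    """A hypothetical data plane managed by one control plane."""
    name: str
    region: str


@dataclass(frozen=True)
class Job:
    """A hypothetical pipeline job and the region where its source data lives."""
    name: str
    data_region: str


def place_job(job: Job, planes: list[DataPlane], default: DataPlane) -> DataPlane:
    """Pick the plane in the same region as the job's data, else use the default."""
    for plane in planes:
        if plane.region == job.data_region:
            return plane
    return default


# Illustrative setup: one default plane plus two remote data planes (names assumed).
default_plane = DataPlane("primary", "us-east")
remote_planes = [
    DataPlane("eu-plane", "eu-central"),
    DataPlane("onprem-plane", "onprem-dc1"),
]

job = Job("nightly-etl", "eu-central")
chosen = place_job(job, remote_planes, default_plane)
print(f"{job.name} runs on {chosen.name}")  # prints "nightly-etl runs on eu-plane"
```

In this sketch the data never leaves its region; only the job definition is sent to the matching plane, which is the property behind the reduced data movement and egress charges listed above.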

Connectivity 

Equally significant to the listed updates are the connectivity options available in Cloud Pak for Data 5.0. This version offers connectivity at the platform and individual service levels with immersive connectivity to different data sources.  

With more than 100 connectors and various supported formats, Cloud Pak for Data 5.0 offers customers variety and flexibility, along with the option to use generic JDBC and the Connector SDK to build custom connectors.  

Also now available in Cloud Pak for Data 5.0 is platform-wide certified connector support for the Apache Iceberg and Delta Lake table formats and the Milvus vector database, enabling seamless connectivity and unlocking new customer possibilities. These new connectors, along with the existing ones, are all tested to help ensure seamless connectivity between Cloud Pak for Data and over 100 data sources.   

Data fabric service features and enhancements

As previously mentioned, one of the solution areas that the IBM Cloud Pak for Data platform addresses is helping customers build a data fabric architecture. The platform is composed of a modular set of integrated data fabric service components that automate integration, metadata management and data governance. In addition to the new platform features and enhancements of IBM Cloud Pak for Data, there are several notable updates to the data fabric services available in version 5.0.  

Data integration  

One key data fabric service available on Cloud Pak for Data is IBM® DataStage®, the industry-leading data integration solution that supports various combinations of extract, transform and load (ETL) patterns that move and transform data for AI readiness.  

DataStage plays a critical role in the launch of Cloud Pak for Data 5.0 because it is built to use the new remote data plane capability. Explore the new product features available on Cloud Pak for Data 5.0. 

Data governance  

IBM is dedicated to enhancing the productivity of data users. This commitment is further indicated by the enhancements to the IBM Knowledge Catalog offering in Cloud Pak for Data 5.0.  

With the new IBM Knowledge Catalog Standard and IBM Knowledge Catalog Premium Cartridges, IBM delivers generative AI-enabled, modern data intelligence solutions to help organizations scale data governance and boost the productivity of data practitioners by automatically assigning business context to enterprise data at scale.  

This enriched metadata context can then be used to automate searchability, streamline access control and improve reporting, unlocking the full potential of self-service AI and analytics.  

The IBM Knowledge Catalog Standard Cartridge includes core features such as a glossary, catalog, workflow and automated metadata enrichment, enabling effective use of data assets and fostering greater automation of data management and unification of business metadata.  

The IBM Knowledge Catalog Premium Cartridge builds on the Standard Cartridge’s features with added capabilities, including robust data protection and extensive data quality features to support regulatory compliance and deliver trusted data to the enterprise.  

The Cloud Pak for Data 5.0 launch also introduces Relationship Explorer, a new feature of IBM Knowledge Catalog. As data estates grow in complexity, Relationship Explorer addresses the challenge of data literacy and governance by using a knowledge graph database to present a visual map of relationships between data assets and governance artifacts.  

This feature allows data stewards and compliance officers to identify sensitive data locations, visualize policy and rule flows, and assess the impact of changes in governance assignments. For a detailed exploration of Relationship Explorer, read the blog post.   
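
To make the impact assessment idea concrete, here is a minimal sketch of a graph traversal over governance relationships. It is an illustration only and does not reflect how Relationship Explorer is implemented or any IBM Knowledge Catalog API. The `relationships` dictionary, its node names and the `impacted_by` function are all hypothetical.

```python
from collections import deque

# Hypothetical edges: each key points to the items it governs or feeds.
relationships = {
    "policy:pii-protection": ["rule:mask-email"],
    "rule:mask-email": ["asset:customers", "asset:orders"],
    "asset:customers": ["asset:customer_report"],
    "asset:orders": [],
    "asset:customer_report": [],
}


def impacted_by(start: str, graph: dict[str, list[str]]) -> list[str]:
    """Return every node reachable from start, in breadth-first order."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order


# Which rules and assets would a change to the policy touch?
for item in impacted_by("policy:pii-protection", relationships):
    print(item)
```

A breadth-first walk lists the nearest dependents first, which mirrors how a steward might read a visual map outward from a changed policy.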

Data governance and data sharing  

IBM’s newest offering on Cloud Pak for Data 5.0, Data Product Hub, provides a data-sharing solution that enables organizations to accelerate the enterprise-wide sharing of reusable data products in a governed manner. Data producers can now create actively managed data products, sourced from disparate source systems, and share them with data consumers across the organization.  

Data Product Hub allows data users to own the entire data product lifecycle, from the onboarding to the retirement of a data product. With Data Product Hub, data consumers can quickly discover and use data across domains, without worrying about compliance, security and data quality.  

To learn more about how Data Product Hub simplifies the onboarding, sharing, discovery and delivery of reusable data products, no matter where the data resides, read the blog.  

Data quality enabled by a Data Fabric Architecture  

A robust data strategy is critical to AI implementations. Organizations require reliable data for robust AI models and accurate insights. However, the current technology landscape presents unparalleled data quality challenges. Gartner predicts that through 2025, 30% of generative AI projects will be abandoned after proof of concept due to poor data quality. As organizations embrace generative AI to transform business decision-making, the quality of data used in AI will be a crucial determinant of success.    

Organizations can help ensure the quality of their data and help break down data silos by implementing a data fabric architecture. IBM’s data fabric provides organizations with a trusted data foundation, enabling clients to automate data discovery, enrichment and protection with our data governance and quality capabilities, employing various data integration styles to deliver reliable data for AI workflows. This composable architecture allows IBM to meet clients wherever they are in their data journey.   

IBM’s data fabric architecture has these five entry points:

  1. Data governance: Automate management of data lifecycles with governance, security and lineage for self-service data consumption.   
  2. Data integration: Provide readily consumable and properly governed data to your teams anytime and anywhere.  
  3. Data observability: Deliver reliable data by detecting data incidents earlier and resolving them faster with continuous data observability.  
  4. Master data management: Drive faster and more scalable insights by delivering a comprehensive view of entity data across an enterprise.  
  5. Data lineage: Provide a record of data throughout its lifecycle, including source information and any data transformations that have been applied.  

One of the vehicles through which IBM helps customers build a data fabric architecture is Cloud Pak for Data. With the release of IBM Cloud Pak for Data 5.0 and the new features, specifically Immersive Experience, Remote Data Planes, Relationship Explorer and Data Product Hub, along with the rest of the components of the IBM Data Fabric architecture, customers can optimize their modern data workloads and scale analytics and AI with prepared quality data.  

