Overview of IBM Cloud Pak for Data

IBM® Cloud Pak for Data is a cloud-native solution that enables you to put your data to work quickly and efficiently.

Your enterprise has lots of data. You need to use your data to generate meaningful insights that can help you avoid problems and reach your goals.

But your data is useless if you can't trust it or access it. Cloud Pak for Data lets you do both by enabling you to connect to your data, govern it, find it, and use it for analysis. Cloud Pak for Data also enables all of your data users to collaborate from a single, unified interface that supports many services that are designed to work together.

Cloud Pak for Data fosters productivity by enabling users to find existing data or to request access to data. With modern tools that facilitate analytics and remove barriers to collaboration, users can spend less time finding data and more time using it effectively.

And with Cloud Pak for Data, your IT department doesn't need to deploy multiple applications on disparate systems and then try to figure out how to get them to connect.

Run anywhere

Cloud Pak for Data can run on your Red Hat® OpenShift® cluster, whether it's behind your firewall or on the cloud.
On the cloud
If you have an OpenShift deployment on IBM Cloud, AWS, Microsoft Azure, or Google Cloud, you can deploy Cloud Pak for Data on your cluster.
On premises
Prefer to keep your deployment behind a firewall? You can run Cloud Pak for Data on your private, on-premises cluster.

If most of your enterprise data lives behind your firewall, it makes sense to put the applications that access your data behind your firewall as well, to reduce the risk of accidentally sharing your data.

The Cloud Pak for Data data fabric

A data fabric is an architectural pattern for managing highly distributed and disparate data. Because it is designed for hybrid and multi-cloud data environments, a data fabric supports the decoupling of data storage, data processing, and data use. With the intelligent knowledge catalog capabilities, you can elevate data into enterprise assets that are governed globally regardless of where the data is stored, processed, or used. Catalog assets are automatically assigned metadata that describes logical connections between data sources and enriches them with semantics so that you can provide business-ready data for your applications, services, and users.

The data fabric architecture that is provided by Cloud Pak for Data enables your organization to accelerate data analysis for better, faster insights.

With the capabilities of the Cloud Pak for Data data fabric architecture, you can:
  • Simplify and automate access to data, across multi-cloud and on-premises data sources, without moving data.
  • Universally safeguard the use of all data, regardless of source.
  • Provide business users with a self-service experience for finding and using data.
  • Use AI-powered capabilities to automate and orchestrate the data lifecycle.
The data fabric has five main capabilities: the knowledge core, data self-service, data integration, governance, and unified lifecycle. The capabilities are interconnected, and the data integration capability provides connectivity between the platform and existing cloud and on-premises data sources.
Metadata-based knowledge core
Data stewards enrich data with metadata that describes the data and informs the semantic search for data. They curate data into catalogs by using automated discovery and classification. They can further enrich data assets by creating and assigning custom governance artifacts, such as business vocabulary. They can also import ready-to-use collections of metadata from industry-specific Knowledge Accelerators.

Components: Watson™ Knowledge Catalog service, Knowledge Accelerators
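
As a rough illustration of what automated classification of data can mean in practice, the following Python sketch assigns a data class to a column when most of its values match a known pattern. This is not how the Watson Knowledge Catalog service works internally; the `PATTERNS` table, the `classify_column` function, and the 0.8 threshold are all hypothetical names and values chosen for this example.

```python
import re

# Hypothetical data classes and patterns, chosen only for illustration.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_phone": re.compile(r"^\d{3}-\d{3}-\d{4}$"),
}


def classify_column(values, threshold=0.8):
    """Return the first data class whose pattern matches at least
    `threshold` of the non-empty values, or None if no class qualifies."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return None
    for label, pattern in PATTERNS.items():
        matches = sum(1 for v in non_empty if pattern.match(v))
        if matches / len(non_empty) >= threshold:
            return label
    return None
```

For example, `classify_column(["ann@example.com", "bob@example.org"])` returns `"email"`, and a column of free text returns `None`. A real classifier would typically combine many more signals, such as column names and reference data.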

Data self-service in catalogs
Data scientists and other business users can find the data that they need in data catalogs that contain data from across the enterprise. They can use AI-powered semantic search and recommendations that consider asset metadata, browse for data, or view their peers’ highly rated assets. They can copy data assets from a catalog into a project, where they collaborate to prepare, analyze, and model the data.

Components: Watson Knowledge Catalog service

Automated data integration
Data engineers and other users prepare your data for consumption. They can provide access to data in your existing data architecture and automate data preparation. They can integrate and virtualize data for faster, simpler querying. They can automate the bulk ingestion, cleansing, and complex transformations of data to regularly publish updated data assets. They can push down the processing of the data to the location of the data.

Components: Cloud Pak for Data platform, Data Refinery tool, Data Virtualization service, DataStage® service

Unified data governance, security, and compliance
Data stewards can create data protection rules to automatically enforce uniform data privacy across the platform. Data masking deidentifies sensitive data to provide data security while preserving data utility and avoiding the need for multiple copies of the data. Data stewards can import ready-to-use compliance metadata from Knowledge Accelerators.

Components: Watson Knowledge Catalog service, Knowledge Accelerators
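
To make the idea of data masking concrete, the following Python sketch shows two common deidentification techniques: substituting a salted hash for an identifier, and redacting all but the last few digits of a number. It is a generic illustration under stated assumptions, not the masking implementation in Cloud Pak for Data; the function names, the `demo-salt` value, and the record fields `email` and `card` are hypothetical.

```python
import hashlib


def mask_email(email: str, salt: str = "demo-salt") -> str:
    """Replace the local part of an email address with a short salted hash.

    The domain is kept, so domain-level analysis still works, and the same
    input always produces the same output, so joins remain possible.
    """
    local, _, domain = email.partition("@")
    digest = hashlib.sha256((salt + local).encode("utf-8")).hexdigest()[:10]
    return f"user_{digest}@{domain}"


def redact_digits(value: str, keep_last: int = 4) -> str:
    """Replace all but the last `keep_last` digits with 'X', keeping separators."""
    total_digits = sum(ch.isdigit() for ch in value)
    to_mask = max(total_digits - keep_last, 0)
    out = []
    for ch in value:
        if ch.isdigit() and to_mask > 0:
            out.append("X")
            to_mask -= 1
        else:
            out.append(ch)
    return "".join(out)


def mask_record(record: dict) -> dict:
    """Return a copy of the record with the sensitive fields masked."""
    masked = dict(record)
    masked["email"] = mask_email(record["email"])
    masked["card"] = redact_digits(record["card"])
    return masked
```

Because the masking is applied when the data is read rather than stored as a second data set, a single governed copy of the data can serve users with different levels of access.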

Unified lifecycle
Users can design, build, test, orchestrate, deploy to production, and monitor different types of data pipelines in a unified way. Users can create or find data assets, search for them across the platform, and move them across workspaces. Users can orchestrate data transformations and other actions by scheduling jobs that run automatically.

Components: Cloud Pak for Data platform

Ready for AI

To be competitive and successful, your enterprise must leverage the power of artificial intelligence.

Cloud Pak for Data helps you climb the AI ladder by providing a suite of services that support you in your journey to AI.

Collect
Cloud Pak for Data helps you connect to your data, no matter where it lives. Cloud Pak for Data includes a Connections page that lists connections that can be used by multiple services. Some services support additional data sources that you can connect to from the service. The platform makes it simple to access your data.
Organize
The Watson Knowledge Catalog service helps you organize your data through data classification and governance. With the Watson Knowledge Catalog service, you can develop an information architecture that is on-point and ready to keep up with the scale of your data.
Analyze
Cloud Pak for Data also includes numerous analytics services that can help you generate scalable insight on demand. For example, with Cloud Pak for Data, you can use:
  • Cognos® Dashboards, which enables you to create stunning dashboards to quickly visualize data.
  • SPSS® Modeler (premium service), which enables you to create flows to prepare and blend data, build and manage models, and visualize the results.
Infuse
With Cloud Pak for Data, you can make AI a part of your standard operating procedure, whether you want to build smarter apps with premium Watson services, deploy machine learning models into production at scale with Watson Machine Learning, or infuse your AI with trust and transparency with Watson OpenScale.

There are many more services that you can install on Cloud Pak for Data. For a complete list, see Services.

With Cloud Pak for Data, raw data becomes trusted data that you can analyze to gain insights and maximize business outcomes.

Support for your data lifecycle

Your data isn't static. Your machine learning models shouldn't be static either. As data is added to your on-premises and cloud data sources, you need to continually test and tune your machine learning models to ensure that they give you valuable insight. But you also need to make sure that you're working with high-quality data. That is where the data governance, data integration, and data preparation services that you can install on Cloud Pak for Data come in.

You know the old adage: Garbage in, garbage out. If your data is poor, your results aren't meaningful. By bringing data stewards and data engineers together with your data scientists, you can ensure that your data is ready for analysis.

Additionally, you can ensure that any analytics assets that your data scientists create, such as models, notebooks, and Shiny apps, are included in a data catalog so that they can be governed and maintained like any other data assets in your enterprise.

With Cloud Pak for Data, you can continuously discover new, valuable insights as data is added to your ecosystem.

Modern and modular

Cloud Pak for Data provides a modern data and analytics architecture that is elastic, scalable, and reliable. The end-to-end platform means that you can spend less time managing your data and more time using it to grow your business.

You can choose which services you install on Cloud Pak for Data so that you can use your resources wisely. Whether you want to modernize your data landscape, generate real-time insights to drive business transformations, or deliver exceptional, AI-augmented customer experiences, Cloud Pak for Data has a solution that can propel your business forward.

If you want to become a data-driven enterprise, Cloud Pak for Data should be at the center of your data and analytics ecosystem.

Choose the right edition for your needs

There are two editions of Cloud Pak for Data that you can choose from:
  • Enterprise Edition
  • Standard Edition

    Standard Edition places limits on the number of virtual processor cores (VPCs) that you can have in your cluster. For specific information on the limits, contact IBM Sales.