Supported data sources

You can connect to many data sources in Cloud Pak for Data. Some services support connections to data sources that are defined at the platform level, while other services use connections that are specific to the service.

Ways to connect to your data
Connectors
Other data sources
Data files
Connecting to data sources (by service)

Ways to connect to your data

Use the following list to choose a method to connect to your data for your use case.

Creating connections at the platform level

In general, platform-level connections simplify the process of creating and maintaining connections. You create the connection and then multiple services can refer to the connection. If you update the connection, the changes are automatically picked up by the projects that use the connection.

You can create platform-level connections from the Platform connections page. These connections can be used by various services across the platform. However, the Platform connections page is available only if the Cloud Pak for Data common core services are installed.

For more information, see Connecting to data sources at the platform level.

Consider creating connections at the platform level if the following statements are true:

The services support platform-level connections.
The same connection needs to be used by multiple services or instances or across multiple projects.
You have the appropriate permissions to create platform-level connections.
You must have the Editor or Admin role on the Platform connections page. For more information, see Managing collaborators on platform connections.

Platform connections are visible to all platform users. However, only users with the credentials for the data source can use the connection.

If you don't see the type of data source that you want to connect to, a Cloud Pak for Data administrator can create a custom JDBC connector for the data source. If you are connecting to only one data source and users do not need a repeatable method to connect to it, you can create a Generic JDBC connection.

Not all services support the same types of connections. If you want to use a connection from the Platform connections catalog, the list of connections is filtered based on the types of connections that the service supports. For example, if you are using a connection to add a data source to a project, only connections that are supported for projects are displayed.
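The filtering that is described in the preceding paragraph can be pictured with a minimal Python sketch. This sketch is an illustration only and is not how Cloud Pak for Data implements the filter. The support data is a small subset that is copied from the Connectors table in this topic. The connection names and the connections_for function are hypothetical.

```python
# Column names as they appear in the Connectors table of this topic.
CATALOGS_PROJECTS = "IBM Knowledge Catalog, Watson Studio"
SPSS = "SPSS Modeler"
DATASTAGE = "DataStage"
DV = "Data Virtualization"

# Subset of the Connectors table: connection type -> services with a check mark.
SUPPORT = {
    "Amazon S3": {CATALOGS_PROJECTS, SPSS, DATASTAGE, DV},
    "Amazon RDS for Oracle": {CATALOGS_PROJECTS, DATASTAGE, DV},
    "Apache Kafka": {CATALOGS_PROJECTS, DATASTAGE},
    "IBM MQ": {DATASTAGE},
}


def connections_for(service, connections):
    """Return only the connections whose type the given service supports."""
    return [c for c in connections if service in SUPPORT.get(c["type"], set())]


# Hypothetical platform connections; the names are invented for illustration.
platform_connections = [
    {"name": "sales-bucket", "type": "Amazon S3"},
    {"name": "finance-db", "type": "Amazon RDS for Oracle"},
    {"name": "orders-queue", "type": "IBM MQ"},
]

for service in (SPSS, DATASTAGE):
    names = [c["name"] for c in connections_for(service, platform_connections)]
    print(service, "->", names)
```

In this sketch, SPSS Modeler sees only the Amazon S3 connection, while DataStage sees all three, which mirrors the check marks in the table.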

Creating connections at the service level

Create connections at the service level if any of the following statements are true:

The service that you are using does not support platform-level connections.
You don't have the appropriate permissions to create platform-level connections.
You don't want the connection to be included in the Connections catalog for security reasons.

For more information, see Connecting to data sources at the service level.
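The criteria in the two preceding sections can be summarized as a small decision function. The following Python sketch is one possible reading of those criteria, not an official rule. The function name, the parameter names, and the "either" result are assumptions that are introduced for illustration.

```python
def choose_connection_level(
    service_supports_platform,
    can_create_platform_connections,
    shared_across_services_or_projects,
    keep_out_of_catalog=False,
):
    """Suggest where to create a connection: 'platform', 'service', or 'either'."""
    # Any one of these conditions is enough to create the connection
    # at the service level.
    if (
        not service_supports_platform
        or not can_create_platform_connections
        or keep_out_of_catalog
    ):
        return "service"
    # All platform-level statements hold when the connection is also shared.
    if shared_across_services_or_projects:
        return "platform"
    # The text does not prescribe a level for this case (assumption).
    return "either"


print(choose_connection_level(True, True, True))
print(choose_connection_level(False, True, True))
print(choose_connection_level(True, True, True, keep_out_of_catalog=True))
```

The service-level checks come first because any single one of them applies, whereas the platform-level recommendation requires all of its statements to be true.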

Connectors

The following table lists the data sources that you can connect to from Cloud Pak for Data.

Connector	IBM Knowledge Catalog, Watson Studio	SPSS Modeler	DataStage	Data Virtualization
Amazon RDS for MySQL	✓ See note	✓	✓	✓
Amazon RDS for Oracle	✓ See note		✓	✓
Amazon RDS for PostgreSQL	✓ See note	✓	✓	✓
Amazon Redshift	✓ See note	✓	✓	✓
Amazon S3	✓ See note	✓	✓	✓
Apache Cassandra	✓ See note	✓	✓
Apache Cassandra for DataStage			✓
Apache Derby	✓ See note	✓	✓	✓
Apache HBase			✓
Apache HDFS	✓ See note	✓	✓
Apache Hive	✓ See note	✓	✓	✓
Apache Impala	✓ See note	✓	✓	✓
Apache Kafka	✓ See note		✓
Box	✓ See note	✓	✓
DataStax Enterprise			✓
Dremio	✓ See note		✓
Dropbox	✓ See note	✓	✓
Elasticsearch	✓ See note		✓
Exasol	✓ See note	✓	✓
File system			✓	✓
FTP (remote file system transfer)	✓ See note	✓	✓
Generic JDBC	✓ See note		✓	✓
Generic S3	✓ See note		✓
Google BigQuery	✓ See note	✓	✓	✓
Google Cloud Pub/Sub			✓
Google Cloud Storage	✓ See note	✓	✓
Google Looker	✓ See note	✓	✓
Greenplum	✓ See note	✓	✓	✓
HDFS via Execution Engine for Hadoop	✓ See note	✓
Hive JDBC			✓	✓
Hive via Execution Engine for Hadoop	✓ See note	✓
HTTP	✓ See note	✓	✓
IBM Cloud Data Engine	✓ See note	✓	✓
IBM Cloud Databases for MongoDB	✓ See note	✓	✓	✓
IBM Cloud Databases for MySQL	✓ See note	✓	✓	✓
IBM Cloud Databases for PostgreSQL	✓ See note	✓	✓	✓
IBM Cloud Object Storage	✓ See note	✓	✓	✓
IBM Cloud Object Storage (infrastructure)	✓ See note	✓
IBM Cloudant	✓ See note	✓
IBM Cognos Analytics	✓ See note	✓	✓
IBM Data Virtualization	✓ See note	✓	✓
IBM Data Virtualization Manager for z/OS	✓ See note	✓	✓	✓
IBM Db2	✓ See note	✓	✓	✓
IBM Db2 for DataStage			✓
IBM Db2 Big SQL	✓ See note	✓	✓	✓
IBM Db2 for i	✓ See note	✓	✓	✓
IBM Db2 for z/OS	✓ See note	✓	✓	✓
IBM Db2 on Cloud	✓ See note	✓	✓	✓
IBM Db2 Warehouse	✓ See note	✓	✓	✓
IBM Informix	✓ See note	✓	✓	✓
IBM Match 360	✓ See note		✓
IBM MQ			✓
IBM Netezza Performance Server	✓ See note	✓	✓	✓
IBM Netezza Performance Server for DataStage			✓
IBM Planning Analytics	✓ See note	✓	✓	✓
IBM Product Master	✓ See note
IBM SPSS Analytic Server	✓ See note	✓
IBM watsonx.data Presto	✓ See note		✓
Impala via Execution Engine for Hadoop	✓ See note	✓
MariaDB	✓ See note	✓	✓	✓
Microsoft Azure Blob Storage	✓ See note	✓	✓
Microsoft Azure Cosmos DB	✓ See note	✓	✓
Microsoft Azure Data Lake Storage	✓ See note	✓	✓
Microsoft Azure Databricks	✓ See note		✓
Microsoft Azure File Storage	✓ See note	✓	✓
Microsoft Azure SQL Database	✓ See note	✓	✓	✓
Microsoft Power BI (Azure)	IBM Knowledge Catalog
Microsoft Power BI Desktop	IBM Knowledge Catalog
Microsoft SQL Server	✓ See note	✓	✓	✓
Microsoft SQL Server Integration Services	IBM Knowledge Catalog
Microsoft SQL Server Reporting Services	IBM Knowledge Catalog
MicroStrategy	IBM Knowledge Catalog
Milvus	✓ See note
MinIO	✓ See note	✓	✓	✓
MongoDB	✓ See note	✓	✓	✓
MySQL (MySQL Community Edition) (MySQL Enterprise Edition)	✓ See note	✓	✓	✓
OData	✓ See note
ODBC		✓	✓
Oracle	✓ See note	✓	✓	✓
Oracle Database for DataStage			✓
Oracle Business Intelligence Enterprise Edition	IBM Knowledge Catalog
Oracle Data Integrator	IBM Knowledge Catalog
PostgreSQL	✓ See note	✓	✓	✓
Presto	✓ See note	✓	✓	✓
Qlik Sense	IBM Knowledge Catalog
Salesforce.com	✓ See note	✓	✓	✓
Salesforce API for DataStage			✓
SAP ASE	✓ See note	✓	✓	✓
SAP BAPI			✓
SAP BusinessObjects	✓ See note		✓
SAP Bulk Extract			✓
SAP Delta Extract			✓
SAP HANA	✓ See note	✓	✓	✓
SAP IDoc			✓
SAP IQ	✓ See note	✓	✓
SAP OData	✓ See note	✓	✓	✓
SingleStoreDB	✓ See note	✓	✓
Snowflake	✓ See note	✓	✓	✓
Storage volume	✓ See note	✓	✓
Tableau	✓ See note	✓	✓
Teradata	✓ See note	✓	✓	✓
Teradata database for DataStage			✓
Vertica	✓ See note		✓

Note: In the IBM Knowledge Catalog, Watson Studio column, this table shows the data sources that are supported in catalogs and projects. Some tools for these services support only a subset of those data sources. Follow the link for a specific data source to see the list of tools that support that data source. See also Supported connectors by tool.

Other data sources

An administrator can upload JDBC drivers to enable connections to more data sources. See Importing JDBC drivers for data sources.

The Data Virtualization service supports connections that are established by using third-party JDBC drivers.

Data files

In addition to using data from remote data sources or integrated databases, you can use data from files. You can work with data from the following types of files.

Type of data file	Supported in
Avro	DataStage, IBM Knowledge Catalog, SPSS Modeler, Watson Studio
CSV	DataStage, Decision Optimization, IBM Knowledge Catalog, SPSS Modeler, Data Virtualization, Watson Studio
JSON	DataStage, Decision Optimization (JSON tabular form), IBM Knowledge Catalog, Data Virtualization, Watson Studio
Microsoft Excel spreadsheets	DataStage, IBM Knowledge Catalog, SPSS Modeler, Data Virtualization, Watson Studio
ORC	DataStage, Data Virtualization
Parquet	DataStage, IBM Knowledge Catalog, Data Virtualization, Watson Studio
SAS	SPSS Modeler, Watson Studio (Data Refinery)
SAV	DataStage, SPSS Modeler
TSV	DataStage, IBM Knowledge Catalog, Data Virtualization, Watson Studio (Data Refinery)
XML	DataStage, Decision Optimization (XML tabular form), SPSS Modeler
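If you need to check file support programmatically, for example in a planning script, the table above can be captured as a lookup. The following Python sketch is an illustration only. The mapping is transcribed from the table, with qualifiers such as "(Data Refinery)" and "(JSON tabular form)" dropped for brevity. The function names are hypothetical.

```python
# File type -> services that support it, transcribed from the table above.
# Qualifiers in parentheses are omitted, so check the table for restrictions.
FILE_SUPPORT = {
    "Avro": {"DataStage", "IBM Knowledge Catalog", "SPSS Modeler", "Watson Studio"},
    "CSV": {"DataStage", "Decision Optimization", "IBM Knowledge Catalog",
            "SPSS Modeler", "Data Virtualization", "Watson Studio"},
    "JSON": {"DataStage", "Decision Optimization", "IBM Knowledge Catalog",
             "Data Virtualization", "Watson Studio"},
    "Microsoft Excel spreadsheets": {"DataStage", "IBM Knowledge Catalog",
                                     "SPSS Modeler", "Data Virtualization",
                                     "Watson Studio"},
    "ORC": {"DataStage", "Data Virtualization"},
    "Parquet": {"DataStage", "IBM Knowledge Catalog", "Data Virtualization",
                "Watson Studio"},
    "SAS": {"SPSS Modeler", "Watson Studio"},
    "SAV": {"DataStage", "SPSS Modeler"},
    "TSV": {"DataStage", "IBM Knowledge Catalog", "Data Virtualization",
            "Watson Studio"},
    "XML": {"DataStage", "Decision Optimization", "SPSS Modeler"},
}


def services_for(file_type):
    """Return the services that list support for a file type, sorted by name."""
    return sorted(FILE_SUPPORT.get(file_type, set()))


def file_types_for(service):
    """Return the file types that a service lists support for, sorted by name."""
    return sorted(t for t, services in FILE_SUPPORT.items() if service in services)


print(services_for("ORC"))
print(file_types_for("SPSS Modeler"))
```

A set per file type keeps the membership test simple, and sorting the results makes the output stable when you compare runs.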

Connecting to data sources (by service)

Use the following resources to create connections in your application.

Cognos Dashboards

You can use CSV files, Microsoft Excel spreadsheets, connected data assets, and Data Virtualization assets as data sources for a dashboard. You must add all of these data sources to a project before you can use them as data sources.

Add data sources to a dashboard by clicking the Add a source (+) button in the Selected sources pane.

For more information, see Supported data sources for Cognos Dashboards.

Data Refinery

You can cleanse and refine tabular data with a graphical flow editor tool called Data Refinery. To refine data, you must add connections to your data sources and you must understand source file limitations. For more information, see Refining data (Data Refinery) and Supported data sources for Data Refinery.

Data Virtualization

You can create connections that can be used to virtualize data from the following locations:

The Platform connections page
The Data sources page in the Data Virtualization service

For more information, see Connecting to data sources in Data Virtualization.

DataStage

DataStage uses connectors on the DataStage canvas to work with remote data sources. Before you can use a connector in DataStage, you must create a connection asset for the associated data source.

For instructions on connecting to a remote data source in DataStage, see Connecting to a data source in DataStage.
For the list of available DataStage connectors, see Supported data sources in DataStage.
To add a local file such as a CSV file, see Adding data to a project.

Db2 Big SQL

You can create connections to query data from an object store or a remote Hadoop cluster. You connect to a data source when you provision a Db2 Big SQL instance.

For more information, see Creating a service instance for Db2 Big SQL.

Decision Optimization

You can use CSV, JSON (tabular form), XML (tabular form) or connected assets to build and deploy Decision Optimization models.

For more information, see Supported data sources for Decision Optimization.

IBM Knowledge Catalog

You can create connections that can be used in the catalog or in projects and connections that can be used to curate data. In general, you can create connections from the Platform connections page. In addition, you can create connections as follows:

Connections that can be used in a catalog from the catalog Assets page. For more information, see Adding a connection asset to a catalog.
Connections that can be used in projects from the Assets page of the project. For more information, see Adding data to a project.
Connections that can be used for metadata import in projects when you create the metadata import asset. For more information, see Managing metadata imports.

IBM watsonx.ai

When you tune a foundation model from the Tuning Studio, you add samples of foundation model input-and-output pairs as training data. You can get these samples from a JSON or JSONL file that you store in a connected data store or from tabular data that is stored in a connected database.

You must create the connection to the data source that you want to use before you can access it from the Tuning Studio. You can then add data directly from the connected data source or from a data asset that you create with data from the connected data source.

For more information about supported data sources, see Data formats for tuning foundation models.
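As an illustration of the input-and-output pairs that are described above, the following Python sketch writes a few samples to a JSONL file, with one JSON object per line, and reads them back. The field names "input" and "output", the sample text, and the file name are assumptions for this sketch. Check Data formats for tuning foundation models for the exact schema that the Tuning Studio expects.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical training samples; the "input" and "output" keys are an assumption.
samples = [
    {"input": "Classify the sentiment: The setup was quick.", "output": "positive"},
    {"input": "Classify the sentiment: The job failed twice.", "output": "negative"},
]


def write_jsonl(records, path):
    """Write one JSON object per line, which is the JSONL convention."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSONL file back into a list of dictionaries, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "tuning_samples.jsonl"
    write_jsonl(samples, path)
    loaded = read_jsonl(path)
    print(len(loaded), "samples written and read back")
```

After you create a file like this one, you would store it in the connected data store so that it can be selected as training data.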

You can create connections that can be used in projects from the following locations:

The Platform connections page
The Assets page of the project

For more information, see Adding data to a project.

SPSS Modeler

Data sources in the SPSS Modeler service support read-only access, read/write access, and SQL pushback.

The SPSS Modeler service also supports several other file types.

For more information, see Supported data sources for SPSS Modeler.

Synthetic Data Generator

Data sources in the Synthetic Data Generator service support read-only access and read/write access.

The Synthetic Data Generator service also supports several other file types.

For more information, see Supported data sources for Synthetic Data Generator.

Watson Machine Learning Accelerator

You can create connections that can be used in projects from the following locations:

The Connections page
The Assets page of the project

You can also add data from files. To add data from files, go to the Assets page of the project.

For more information, see Adding data to a project.

Watson Studio

Ideally, use data that is already in a catalog. Search for the data you want in a catalog and add it to a project.

Alternatively, you can create connections that can be used in projects from the following locations:

The Connections page
The Assets page of the project

You can also add data from files. To add data from files, go to the Assets page of the project.

For more information, see Adding data to a project.

Data Product Hub

You can add connections to access a broad selection of data sources to create and deliver data products.

For more information, see Connectors for Data Product Hub.