Connecting to data sources

You can connect to your data sources in IBM Cloud Pak® for Data at the platform level or at the service level.

Connecting to data sources at the platform level
Connecting to data sources at the service level

Connecting to data sources at the platform level

You can create connections that can be used by various services across the platform. Any user who has access to the platform can see these connections. However, only users with the credentials for the data source can use a connection.

Platform-level connections are available from the Platform connections page, which is available only if the Cloud Pak for Data common core services are installed.

Currently, the following services can use connections from the Platform connections page:

- Cognos® Analytics
- DataStage®
- Watson™ Knowledge Catalog
- Watson Query
- Watson Studio
  Many of the tools that work with Watson Studio can use data from these connections after the connection is added to a project.

Restriction: Not all services support the same connections. Most services support a subset of the connections that are supported by the platform. For more information, see Connecting to data sources (by service).

The Platform connections page is a specialized view of the Platform assets catalog. The connections that are defined on the Platform connections page are also included in the Platform assets catalog.

The Platform connections page shows the list of connections that can be used by various services on the platform. At a minimum, all users have the Viewer role on the catalog, which means that they can see the connections that are defined. For more information, see Managing collaborators on platform connections.

Required permissions: To create a platform-level connection, you must be an Editor or Administrator on the Platform assets catalog.

Tip: Work with your data source administrator to ensure that you have the correct information to connect to your data source.


To create a platform-level connection:

Log in to the Cloud Pak for Data web client.
From the navigation menu, select Data > Platform connections.
Click New connection.
Select the data source that you want to connect to.
The following connections have additional requirements that must be met before you can use them:

Generic JDBC

If you want to connect to an unsupported data source by creating a Generic JDBC connection, a Cloud Pak for Data administrator must upload the JDBC drivers for that data source. For more information, see Importing JDBC drivers for data sources.

Storage volume

If you want to connect to a storage volume, such as an external NFS server or a persistent volume claim, a user with the Create service instances permission must add the volume to Cloud Pak for Data. For more information, see Managing storage volumes.
Enter a name and description for the connection.
Enter the details for the connection.
The data source that you selected determines the information that you must specify. Typically, a connection requires either:
- A hostname and port number
- A URL
You might also need to specify the database that you want to connect to.
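If your data source administrator gives you a URL but the connection form asks for a hostname, port, and database (or the reverse), the two forms usually carry the same information. The following Python sketch splits a URL, including a JDBC-style URL such as jdbc:postgresql://..., into those fields. It is an illustrative helper, not part of Cloud Pak for Data; the function name and the example hostname are hypothetical, and it assumes the URL has the form scheme://host:port/database. Drivers that use a different syntax (for example, Microsoft SQL Server URLs with ;databaseName= properties) are not handled and raise ValueError.

```python
from urllib.parse import urlsplit


def split_connection_url(url: str) -> dict:
    """Split a data source URL into scheme, hostname, port, and database.

    Hypothetical helper. Handles JDBC-style URLs such as
    jdbc:postgresql://host:5432/sales by removing the "jdbc:" prefix first.
    """
    if url.lower().startswith("jdbc:"):
        url = url[len("jdbc:"):]
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError(f"no hostname found in {url!r}")
    # The path, minus its leading slash, is usually the database name.
    database = parts.path.lstrip("/") or None
    return {
        "scheme": parts.scheme,
        "hostname": parts.hostname,
        "port": parts.port,  # None if the URL has no explicit port
        "database": database,
    }


if __name__ == "__main__":
    print(split_connection_url("jdbc:postgresql://db.example.com:5432/sales"))
```

Note that parts.port raises ValueError when the text after the colon is not a number, so a URL in an unsupported syntax fails loudly instead of producing a wrong port.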
Enter your credentials for the connection.
- If prompted, specify whether you want to use personal or shared credentials. You cannot change this option after you create the connection.
  
  Personal
  
  With personal credentials, each user must specify their own credentials to access the connection. Each user's credentials are saved but are not shared with any other users. Personal credentials are more secure than shared credentials. For example, if you use personal credentials and another user changes the connection properties (such as the hostname or port number), the credentials are invalidated to prevent malicious redirection.
  
  Shared
  
  With shared credentials, all users access the connection with the credentials that you provide. The default setting is Shared. Shared credentials can potentially be retrieved by a user who has access to the connection asset. Because the credentials are shared, it is difficult to audit access to the connection, to identify the source of data loss, or to identify the source of a security breach. An administrator can disable shared credentials.
- The connector determines the credentials that you must specify. Typically, a connection requires a username and password or an API key and secret key. Some data sources allow you to connect anonymously.
- You might need to specify how you want to provide your credentials. The options that are available depend on how the platform is configured.
  
  Enter credentials manually
  
  With this option, you manually enter your credentials in the web client. The platform stores these credentials and uses them to authenticate you.
  This is the default method for entering credentials. However, an administrator can optionally disable this method. For more information, see Requiring users to use secrets for credentials when creating connections.
  
  Use secrets from a vault
  
  With this option, you select the secrets that contain the appropriate credentials. For example, if you need to specify your username and password, select the secret that contains your username and the secret that contains your password. The platform uses the secrets (which are stored in a vault) to authenticate you.
  If you are using secrets from an external vault, you must have the appropriate permissions to connect to external vaults or an administrator must share the appropriate secrets with you. For more information, see Managing secrets and vaults.
  
  Use my platform login credentials
  
  With this option, the platform uses your platform credentials to authenticate you.
  This option is available only if the data source is a service that is deployed on the instance of Cloud Pak for Data where you are creating the connection.
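To see why personal credentials are invalidated when someone changes the hostname or port, consider the following toy model in Python. It is a hypothetical illustration of the behavior described for personal credentials, not how Cloud Pak for Data stores credentials: each user's credentials are saved together with a snapshot of the target (host and port) that they were entered for, and they are discarded instead of being sent to a different target. The class, method, and property names are all assumptions made for the example.

```python
from dataclasses import dataclass, field


def target_of(properties: dict) -> tuple:
    """Return the properties that decide where credentials would be sent."""
    return (properties.get("host"), properties.get("port"))


@dataclass
class Connection:
    """Toy model of a connection that uses personal credentials (hypothetical)."""

    properties: dict
    # user name -> (credentials, target that the credentials were entered for)
    personal_credentials: dict = field(default_factory=dict)

    def save_credentials(self, user: str, credentials: dict) -> None:
        # Each user's credentials are stored separately and never shared.
        self.personal_credentials[user] = (credentials, target_of(self.properties))

    def update_properties(self, **changes) -> None:
        self.properties.update(changes)

    def credentials_for(self, user: str):
        """Return the user's credentials, or None if they must enter them again."""
        entry = self.personal_credentials.get(user)
        if entry is None:
            return None
        credentials, saved_target = entry
        if saved_target != target_of(self.properties):
            # The host or port changed after the credentials were saved:
            # invalidate them instead of sending them to the new target.
            del self.personal_credentials[user]
            return None
        return credentials


if __name__ == "__main__":
    conn = Connection({"host": "db.example.com", "port": 5432})
    conn.save_credentials("alice", {"username": "alice", "password": "s3cret"})
    conn.update_properties(host="other.example.net")  # another user edits the host
    print(conn.credentials_for("alice"))  # None: alice must re-enter her credentials
```

Shared credentials have no equivalent safeguard in this model, because a single set of credentials is used for everyone.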
Set Mask sensitive credentials retrieved through API calls to On if you want to prevent users, including the owner of the connection, from retrieving unmasked sensitive credentials through API calls. This setting has no effect on the connection form itself. The following tools support this setting:
- Analytics Engine powered by Apache Spark
- AutoAI (Watson Machine Learning)
- Data Refinery (Watson Studio)
- Decision Optimization
- IBM® Match 360 with Watson
- Notebooks (Watson Studio). Update any notebooks that reference the connection through API calls to use the Flight service.
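Conceptually, masking means that an API call returns a placeholder instead of the real value of each sensitive property. The following Python sketch shows that idea on a plain dictionary of connection properties. The property names in SENSITIVE_KEYS and the **** placeholder are hypothetical; the actual property names and the masked representation that the platform returns depend on the connector and the platform.

```python
# Hypothetical names of properties that hold secrets; real connectors differ.
SENSITIVE_KEYS = frozenset({"password", "api_key", "secret_key"})


def mask_properties(properties: dict, mask: str = "****") -> dict:
    """Return a copy of the connection properties with sensitive values masked.

    The input dictionary is not modified, and unset (None) values stay None.
    """
    return {
        key: mask if key in SENSITIVE_KEYS and value is not None else value
        for key, value in properties.items()
    }


if __name__ == "__main__":
    props = {"host": "db.example.com", "port": 5432, "username": "alice", "password": "s3cret"}
    print(mask_properties(props))
    # {'host': 'db.example.com', 'port': 5432, 'username': 'alice', 'password': '****'}
```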
If applicable, specify the SSL information required to connect to your data source.
Some data sources require you to use SSL for secure communication. Other data sources support it but do not require it. Ensure that you understand what information you need to provide to communicate securely with your data source:
- If you specified a port number that is configured to accept SSL connections, ensure that you select The port is configured to accept SSL connections.
- If the data source uses a self-signed certificate, you must specify the contents of the certificate to enable secure communication between Cloud Pak for Data and the data source.
- If your data source uses chained certificates, you can specify the contents of multiple certificates.
Some services can use an SSL certificate that is stored as a secret. If you are using secrets from an external vault, you must have the appropriate permissions to connect to external vaults or an administrator must share the appropriate secrets with you. For more information, see Managing secrets and vaults.
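Before you fill in the SSL fields, it can help to check whether the port accepts SSL (TLS) connections, to retrieve the certificate that the data source presents, and to confirm how many certificates a chain that you were given contains. The following Python sketch uses only the standard library. It is an illustrative helper, not a Cloud Pak for Data tool, and the function names are hypothetical. The TLS probe turns off certificate verification so that it also works with self-signed certificates; use it only for this check, not for real traffic. It tests reachability from the machine where you run it, which might differ from what the cluster can reach.

```python
import re
import socket
import ssl
from typing import List

# Matches one PEM-encoded certificate, from its BEGIN line to its END line.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----\s+.+?\s+-----END CERTIFICATE-----",
    re.DOTALL,
)


def split_pem_chain(pem_text: str) -> List[str]:
    """Split text that contains one or more PEM certificates into a list."""
    return [block.strip() for block in PEM_BLOCK.findall(pem_text)]


def port_accepts_tls(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TLS handshake succeeds on host:port.

    Verification is disabled on purpose: the goal is only to learn whether
    the port speaks TLS, even when the server uses a self-signed certificate.
    """
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:
        # ssl.SSLError, timeouts, and refused connections are all OSError subclasses.
        return False


def fetch_server_certificate(host: str, port: int) -> str:
    """Return the certificate that the server presents, as PEM text."""
    return ssl.get_server_certificate((host, port))
```

ssl.get_server_certificate returns only the server's own certificate. If your data source uses chained certificates, obtain the other certificates in the chain from your data source administrator; split_pem_chain lets you confirm that the text you paste contains each of them.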

Connecting to data sources at the service level

Typically, if you create a connection at the service level, the connection is accessible only from the service where it is created.

Service	Learn more
Cognos Dashboards	You can use CSV files, Microsoft Excel spreadsheets, connected data assets, and Watson Query assets as data sources for a dashboard. You must add these data sources to a project before you can use them in a dashboard. Add data sources to a dashboard by clicking the Add a source (+) button in the Selected sources pane. For more information, see Supported data sources for Cognos Dashboards.
DataStage	DataStage uses connectors on the DataStage canvas to interact with remote data sources. Before you can use a connector in DataStage, you must create a connection asset in the project for that connector. For instructions on connecting to a remote data source in DataStage, see Connecting to a data source in DataStage. For the list of available DataStage connectors, see Supported data sources in DataStage. To add a local file such as a CSV file, see Adding data to a project.
Watson Knowledge Catalog	You can create connections that can be used in the catalog and connections that can be used to curate data. Add connections that can be used in a catalog from the catalog Overview page. You can create new connections or pick from existing platform-level connections. For more information, see Adding a connection asset to a catalog (Watson Knowledge Catalog). When you publish a data asset to a catalog, the connection is published along with it, unless the connection already exists in the catalog. You can create connections that can be used to curate data in either of the following ways: on the Platform connections page, in which case you can pick from those platform-level connections when you set up a metadata import; or when you set up a new metadata import from a project's Assets page.
Watson Query	You can create connections that can be used to virtualize data from the following locations: the Platform connections page and the Data sources page in the Watson Query service. For more information, see Adding data sources (Watson Query).
Watson Studio	Where possible, use data that is already in a catalog: search for the data that you want in a catalog and add it to a project. Alternatively, you can create connections that can be used in projects from the following locations: the Platform connections page and the Assets page of the project. On the Platform connections page, you can connect to storage volumes, among other data sources, to add other content, including temporary files. Storage volumes let you share data files across projects and between collaborators in Git-based projects. For details, see Managing storage volumes and Managing persistent volume instances with the Volumes API. You can also add data from files on the Assets page of the project. The initial storage limit for assets is 100 GB across all projects, spaces, and catalogs. For more information, see Adding data to a project.