Supported data sources for metadata import, metadata enrichment, and data quality rules (Watson Knowledge Catalog)
The following table lists the data sources from which you can import metadata, against which you can run metadata enrichment or data quality rules, and to which you can write the output of data quality rules.
Required permissions
Users must be authorized to access the connections to the data sources. For metadata import, the user running the import must have SELECT or a similar permission on the databases in question.
For metadata import with one of the Discover goals, certain connectors are supported only if the advanced metadata import feature is enabled. These connectors are marked with an asterisk (*). A MANTA Automated Data Lineage for IBM Cloud Pak for Data license key is not required for such imports. For lineage capture, the advanced metadata import and the knowledge graph features must be enabled, and a MANTA Automated Data Lineage for IBM Cloud Pak for Data license key must be installed.
For running metadata enrichment or data quality rules and for writing rule output, a corresponding connection asset must exist in the project. If the asset types imported from a specific connection don't allow for enrichment or running data quality rules, not applicable (abbreviated to N/A) is shown in the Metadata enrichment and rules-related columns. A dash (—) in a column indicates that the data source is not supported for this purpose.
By default, data quality rules and the underlying DataStage flows support standard platform connections. Not all connectors that were supported in traditional DataStage and potentially used in custom DataStage flows are supported in Watson Knowledge Catalog.
In general, the following data formats are supported:
- All: Tables from relational and nonrelational data sources
- Metadata import: Any format from file-based connections to the data sources and tool-specific formats from connections to external tools. For Microsoft Excel workbooks, each sheet is imported as a separate data asset. The data asset name equals the name of the Excel sheet.
- Metadata enrichment: Tabular: CSV, TSV, Avro, Parquet, Microsoft Excel (For workbooks uploaded from the local file system, only the first sheet in a workbook is profiled.)
- Data quality rules: Tabular: Avro, CSV, Parquet, ORC; for data assets uploaded from the local file system, CSV only
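The format rules above can be written as a small lookup. The following is a minimal sketch, not part of the product: the function name `is_format_supported`, the purpose keys, and the lowercase format labels are all hypothetical names chosen for illustration. Only the format lists themselves come from this topic.

```python
# Hypothetical helper: the format sets are transcribed from the list above,
# but the names and the purpose keys are illustrative assumptions.
ENRICHMENT_FORMATS = {"csv", "tsv", "avro", "parquet", "excel"}
RULE_FORMATS = {"avro", "csv", "parquet", "orc"}
LOCAL_UPLOAD_RULE_FORMATS = {"csv"}  # data assets uploaded from the local file system


def is_format_supported(purpose: str, file_format: str, local_upload: bool = False) -> bool:
    """Return True if a tabular file format is listed for the given purpose."""
    fmt = file_format.strip().lower()
    if purpose == "metadata_import":
        # Metadata import accepts any format from file-based connections.
        return True
    if purpose == "metadata_enrichment":
        return fmt in ENRICHMENT_FORMATS
    if purpose == "data_quality_rules":
        allowed = LOCAL_UPLOAD_RULE_FORMATS if local_upload else RULE_FORMATS
        return fmt in allowed
    raise ValueError(f"unknown purpose: {purpose!r}")
```

For example, `is_format_supported("data_quality_rules", "ORC")` is true for a connected data source, while the same call with `local_upload=True` is false because only CSV is listed for local uploads.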
Connector | Metadata import (discovery) | Metadata import (lineage) | Metadata enrichment | Bindings in rules created from data quality definitions | SQL-based rules | Output tables |
---|---|---|---|---|---|---|
Amazon RDS for MySQL | ✓ | — | ✓ | — | — | — |
Amazon RDS for Oracle | ✓ | — | — | — | — | — |
Amazon RDS for PostgreSQL | ✓ | — | ✓ | — | — | — |
Amazon Redshift | ✓ | ✓ | ✓ | ✓ | ✓ | — |
Amazon S3 | ✓ | — | ✓ | ✓ 4 | — | — |
Apache Cassandra | ✓ | — | ✓ | ✓ | ✓ | — |
Apache HDFS | ✓ | — | ✓ | ✓ | — | — |
Apache Hive | ✓ | ✓ 7 | ✓ | ✓ | ✓ | ✓ 5 |
Apache Kafka | ✓ | — | — | — | — | — |
Box | ✓ | — | ✓ | — | — | — |
Cloudera Impala | ✓ | — | ✓ | — | — | — |
FTP | ✓ | — | — | — | — | — |
Generic S3 | ✓ | — | ✓ | — | — | — |
Google BigQuery | ✓ | ✓ 6 | ✓ | ✓ | ✓ | — |
Greenplum | ✓ | ✓ | ✓ | ✓ | ✓ | — |
IBM Cloud Data Engine | ✓ | — | ✓ | — | — | — |
IBM Cloud Databases for MongoDB | ✓ | — | ✓ | — | — | — |
IBM Cloud Databases for MySQL | ✓ | — | ✓ | — | — | — |
IBM Cloud Databases for PostgreSQL | ✓ | ✓ | ✓ | — | — | — |
IBM Cloud Object Storage | ✓ | — | ✓ | — | — | — |
IBM Cognos Analytics 8 | ✓ | ✓ | — | — | — | — |
IBM Data Virtualization Manager for z/OS 1 | ✓ | — | ✓ | ✓ | ✓ | — |
IBM Db2 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
IBM Db2 Big SQL | ✓ | — | ✓ | — | — | — |
IBM Db2 for i | ✓ | — | — | — | — | — |
IBM Db2 for z/OS | ✓ | — | ✓ | — | — | — |
IBM Db2 on Cloud | ✓ | ✓ | ✓ | — | — | ✓ |
IBM Db2 Warehouse | ✓ | — | ✓ | ✓ | ✓ | — |
IBM Match 360 | — | — | — | ✓ | — | — |
IBM Informix | ✓ | — | ✓ | — | — | — |
IBM Netezza Performance Server | ✓ | ✓ | ✓ | ✓ | ✓ | — |
IBM Watson Query | ✓ | — | ✓ | ✓ | ✓ | — |
MariaDB | ✓ | — | ✓ | — | — | — |
Microsoft Azure Data Lake Storage | ✓ | — | ✓ | ✓ | — | — |
Microsoft Azure SQL Database | ✓ | — | ✓ | — | — | — |
Microsoft Power BI (Local) (*) | ✓ | ✓ | N/A | N/A | N/A | N/A |
Microsoft Power BI (Azure) (*) | ✓ | ✓ | N/A | N/A | N/A | N/A |
Microsoft SQL Server | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Microsoft SQL Server Integration Services (*) | ✓ | ✓ | — | — | — | — |
Microsoft SQL Server Reporting Services (*) | ✓ | ✓ | — | — | — | — |
MongoDB | ✓ | — | ✓ | ✓ | ✓ | — |
MySQL | ✓ | — | ✓ | ✓ | ✓ | — |
Oracle 2 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Oracle Business Intelligence Enterprise Edition (*) | ✓ | ✓ | — | — | — | — |
Oracle Data Integrator (*) | ✓ | ✓ | — | — | — | — |
PostgreSQL | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Presto | ✓ | — | ✓ | ✓ | ✓ | — |
Qlik Sense (*) | ✓ | ✓ | — | — | — | — |
Salesforce.com | ✓ | — | ✓ 3 | — | — | — |
SAP ASE | ✓ | — | ✓ | ✓ | ✓ | — |
SAP HANA | ✓ | — | ✓ | ✓ | ✓ | — |
SAP IQ | ✓ | — | ✓ | — | — | — |
SAP OData | ✓ | — | ✓ | — | — | — |
Snowflake | ✓ | ✓ | ✓ | ✓ | ✓ | — |
Storage volume | ✓ | — | — | — | — | — |
Tableau (*) | ✓ | ✓ | N/A | N/A | N/A | N/A |
Teradata | ✓ | ✓ | ✓ | ✓ | ✓ | — |
watsonx.data | ✓ | — | — | — | — | — |
Notes:
1 With Data Virtualization Manager for z/OS, you add data assets and COBOL copybook assets from mainframe systems to catalogs in IBM Cloud Pak for Data. Copybooks are files that describe the data structure of a COBOL program. Data Virtualization Manager for z/OS helps you create virtual tables and views from COBOL copybook maps. You can then use these virtual tables and views to import and catalog mainframe data in IBM Cloud Pak for Data in the form of data assets and COBOL copybook assets.
The following types of COBOL copybook maps are not imported: ACI, Catalog, Natural.
When the import is finished, you can go to the catalog to review the imported assets, including the COBOL copybook maps, virtual tables, and views. You can use these assets in the same ways as other assets in Cloud Pak for Data.
For more information, see Adding COBOL copybook assets.
2 Table and column descriptions are imported only if the connection is configured with one of the following Metadata discovery options:
- No synonyms
- Remarks and synonyms
3 Some objects in the SFORCE schema are not supported. See Salesforce.com.
4 CSV files only.
5 Required minimum version for metadata-enrichment output tables is release 3.0.0.
6 Connections must be configured with the authentication method Account key (full JSON snippet).
7 Hive connections with Kerberos authentication require some prerequisite configurations. See Configuring Hive with Kerberos for lineage imports.
8 Cognos Analytics connections that use secrets from a vault as credentials cannot be used for metadata import.
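If you need to process the support table programmatically, each cell can be read as a status marker (✓, —, or N/A) optionally followed by a note number. The following is a minimal sketch under that assumption. The names `Cell`, `parse_cell`, `parse_row`, and the short column keys are hypothetical and not part of any product API. Note numbers that are attached to a connector name (for example, `Oracle 2`) are left in the name as-is.

```python
import re
from typing import Dict, NamedTuple, Optional, Tuple

# Hypothetical short keys for the six support columns, in table order.
COLUMNS = ["discovery", "lineage", "enrichment", "rule_bindings", "sql_rules", "output_tables"]

STATUS = {"✓": "supported", "—": "unsupported", "N/A": "not_applicable"}

# A marker, optionally followed by whitespace and a note number.
_CELL = re.compile(r"^(✓|—|N/A)(?:\s+(\d+))?$")


class Cell(NamedTuple):
    status: str          # one of the STATUS values
    note: Optional[int]  # note number from the Notes list, if any


def parse_cell(text: str) -> Cell:
    """Parse one table cell such as '✓', '—', 'N/A', or '✓ 4'."""
    match = _CELL.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized cell: {text!r}")
    marker, note = match.groups()
    return Cell(STATUS[marker], int(note) if note else None)


def parse_row(line: str) -> Tuple[str, Dict[str, Cell]]:
    """Split one pipe-delimited table row into (connector name, cells by column)."""
    parts = [part.strip() for part in line.strip().strip("|").split("|")]
    if len(parts) != len(COLUMNS) + 1:
        raise ValueError(f"expected {len(COLUMNS) + 1} fields, got {len(parts)}")
    name, cells = parts[0], parts[1:]
    return name, {column: parse_cell(cell) for column, cell in zip(COLUMNS, cells)}
```

For the Amazon S3 row, `parse_row` returns the name `Amazon S3` and a `rule_bindings` cell with status `supported` and note `4`, which corresponds to the "CSV files only" note.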
Other data sources
An administrator can upload JDBC drivers to enable connections to more data sources. See Generic JDBC.
Metadata import for discovery supports connections that are established by using third-party JDBC drivers.
Metadata enrichment can also be run on data assets from connections that are established by using third-party JDBC drivers for the following data sources:
- Amazon Redshift
- Snowflake
- Teradata
You can run data quality rules on data assets from connections that are established by using third-party JDBC drivers for the following sources:
- Apache Cassandra
- Apache Hive
- Apache Kudu
- Databricks
If the advanced metadata import feature is enabled, metadata import can also import these types of data to catalogs:
- Starting with Cloud Pak for Data 4.7.2, business intelligence assets from the following tools:
  - Microsoft SQL Server Analysis Services
  - Statistical Analysis System (SAS)
- Data integration assets from the following tools:
  - DataStage on Cloud Pak for Data
  - Informatica PowerCenter
  - InfoSphere DataStage
  - Talend
- Data models that were created in the following tools:
  - ER/Studio
  - erwin Data Modeler
  - SAP PowerDesigner
No MANTA Automated Data Lineage for IBM Cloud Pak for Data license key is required for importing data models or data integration assets without lineage.
Learn more
- Importing metadata
- Enriching your data assets
- Creating rules from data quality definitions
- Creating SQL-based rules