Setting up reporting for Watson Knowledge Catalog (Watson Knowledge Catalog)
You can generate reports to get insights about the Watson Knowledge Catalog data. The data from which you generate reports is saved in an external database. This data captures information about catalogs, projects, and governance artifacts.
Example reports can show the following information:
- The list of catalog assets with no business terms assigned
- The number of catalog assets that are grouped by tag and asset type
- The list of catalog assets with a particular business term is assigned
- The list of assets where a particular user is assigned as a steward
The external database into which the Watson Knowledge Catalog data is written is a relational database. You can use SQL to query for the needed data. You can connect to a reporting tool to generate reports, for example to Cognos Analytics. Sample queries are provided in the Sample queries section.
To ensure that sensitive data is protected, you can send data to various schemas on the database, and restrict access to these schemas.
Supported database types are:
-
Db2
-
PostgreSQL version 12, and later
Only top-level categories can be enabled for reporting. Subcategories use the same schema as their top-level categories.
Prerequisites
A nonvaulted connection to a supported type of database is required. If no connection is defined in the platform connections, contact your administrator to configure Platform assets catalog.
Required permissions
To configure the reporting, you must have these user roles:
- Reporting Administrator
- Manager
If you have the Administrator role for the IBM Cloud account, you have permissions to generate reports.
Configure the reporting settings
To set up the reporting:
- Go to Administration > Catalogs > Reports setup.
- Select a connection to a database.
- Select the default schema. All data is written to the selected schema by default, but you can edit the schema for a particular catalog, project, or category. Make sure that the database users have write access to the selected schemas.
- Enable the catalogs, projects, categories, or data protection rules for which you want to run the reports.
- Click Save settings.
- Click Start reporting.
The data is sent to the selected database, and you can start generating reports with SQL queries. Refer to the data model diagram to get started with the queries.
The data is automatically synchronized between Watson Knowledge Catalog and the database. Any change in the catalog, project, category, or data protection rule that is enabled for reporting is reflected on the database.
You can modify the settings. You can change the schema, and enable or disable data for reporting. The reports are updated after you save the new settings and update the reporting.
If the synchronization fails after you modify the settings, the synchronization is still established for the previous settings. In some cases, synchronization for a particular item fails and the rest items that are in queue are not synchronized as well. You can choose to restart the synchronization for only items that failed, for items that failed and items in queue, and for all configured items. Depending on the option you choose, the process might take a while.
If you want to remove the reporting data from the database and start with a new configuration, complete these steps:
- Click Stop reporting. The data is no longer synchronized and it is deleted from the database. The existing reporting settings are retained.
- Click Reset settings. The settings are restored to the default state. You can then define a new connection and configure the reporting in a different way.
Data model
The following image shows the data model that you can use to create SQL queries to generate reports on Watson Knowledge Catalog data.
For more information about the SQL tables, see:
- Description of Db2 SQL tables that are used for Watson Knowledge Catalog reporting
- Description of PostgreSQL SQL tables that are used for Watson Knowledge Catalog reporting
You can use a database storage estimator to calculate the size of your target database. Add your own data and the estimated storage size of the database is calculated automatically. Download the database storage estimator template.
Sample queries
Check the following use cases for reporting and the example SQL queries to create the reports.
Number of governance artifacts grouped by the type of the artifact:
SELECT ARTIFACT_TYPE, COUNT(*) TOTAL FROM GOVERNANCE_ARTIFACTS GROUP BY ARTIFACT_TYPE
Number of catalog assets grouped by type:
SELECT ASSET_TYPE, COUNT(*) AS TOTAL FROM CONTAINER_ASSETS GROUP BY ASSET_TYPE
Number of catalog assets that are grouped by tag and the type of the asset:
SELECT TAG_NAME, ASSET_TYPE, COUNT(*) TOTAL FROM ASSET_TAGS,CONTAINER_ASSETS WHERE ASSET_TAGS.ASSET_ID = CONTAINER_ASSETS.ASSET_ID GROUP BY TAG_NAME,ASSET_TYPE
List of catalog assets that do not have a description:
SELECT * FROM CONTAINER_ASSETS WHERE DESCRIPTION IS NULL OR DESCRIPTION = '' AND CONTAINER_TYPE='catalog'
List of governance artifacts that do not have a description:
SELECT * FROM GOVERNANCE_ARTIFACTS WHERE DESCRIPTION IS NULL
Glossary term distribution across assets in catalog:
SELECT GA.NAME, AGGR_TABLE.TOTAL FROM GOVERNANCE_ARTIFACTS AS GA LEFT OUTER JOIN (SELECT ASSOCIATED_ARTIFACT_ID, COUNT(*) AS TOTAL FROM GOVERNANCE_ARTIFACT_CATALOG_ASSOCIATIONS WHERE ASSOCIATED_ARTIFACT_TYPE = 'glossary_term' GROUP BY ASSOCIATED_ARTIFACT_ID) AS AGGR_TABLE ON GA.ARTIFACT_ID = AGGR_TABLE.ASSOCIATED_ARTIFACT_ID WHERE GA.ARTIFACT_TYPE = 'glossary_term'
List of catalog assets with no business terms assigned:
SELECT HEADER, COUNT(*) AS TOTAL FROM (SELECT CASE WHEN GOVERNANCE_ARTIFACT_CONTAINER_ASSOCIATIONS.ASSOCIATED_ARTIFACT_ID IS NULL THEN 'NO TERMS' ELSE 'HAS ATLEAST ONE TERM' END AS HEADER FROM CONTAINERS,CONTAINER_ASSETS LEFT OUTER JOIN GOVERNANCE_ARTIFACT_CONTAINER_ASSOCIATIONS ON CONTAINER_ASSETS.ASSET_ID = GOVERNANCE_ARTIFACT_CONTAINER_ASSOCIATIONS.ASSET_ID AND GOVERNANCE_ARTIFACT_CONTAINER_ASSOCIATIONS.ASSOCIATED_ARTIFACT_TYPE = 'glossary_term' WHERE CONTAINERS.CONTAINER_ID=CONTAINER_ASSETS.CONTAINER_ID AND CONTAINERS.CONTAINER_TYPE='catalog') GROUP BY HEADER
List of catalog assets with a particular user assigned as a collaborator:
SELECT CONTAINER_ASSETS.CONTAINER_ID, CONTAINER_ASSETS.ASSET_ID, CONTAINER_ASSETS.NAME, CONTAINER_ASSETS.ASSET_TYPE FROM ASSET_COLLABORATORS,CONTAINER_ASSETS WHERE USER_ID = 'IBMid-50DW59BCGD' AND CONTAINER_ASSETS.CONTAINER_TYPE='catalog' AND CONTAINER_ASSETS.ASSET_ID = ASSET_COLLABORATORS.ASSET_ID AND CONTAINER_ASSETS.CONTAINER_ID = ASSET_COLLABORATORS.CONTAINER_ID
Percentage of governance artifacts with stewards assigned:
SELECT COUNT(*) AS TOTAL_COUNT, ARTIFACT_WITH_STEWARD_COUNT FROM GOVERNANCE_ARTIFACTS LEFT OUTER JOIN (SELECT COUNT(*) AS ARTIFACT_WITH_STEWARD_COUNT FROM GOVERNANCE_ARTIFACTS LEFT OUTER JOIN GOVERNANCE_ARTIFACT_STEWARDS ON GOVERNANCE_ARTIFACTS.ARTIFACT_ID = GOVERNANCE_ARTIFACT_STEWARDS.ARTIFACT_ID WHERE GOVERNANCE_ARTIFACT_STEWARDS.USER_ID IS NOT NULL) ON 1=1 GROUP BY ARTIFACT_WITH_STEWARD_COUNT
List of related terms of a particular business term:
SELECT TEMPTABLE.NAME AS END1_NAME, RELATIONSHIP_TYPE, SGA.NAME AS END2_NAME FROM (SELECT GA.NAME, BTA.END2_ARTIFACT_ID, BTA.RELATIONSHIP_TYPE FROM BUSINESS_TERM_ASSOCIATIONS AS BTA LEFT OUTER JOIN GOVERNANCE_ARTIFACTS AS GA ON GA.ARTIFACT_ID = BTA.END1_ARTIFACT_ID ORDER BY BTA.END1_ARTIFACT_ID) AS TEMPTABLE LEFT OUTER JOIN GOVERNANCE_ARTIFACTS AS SGA ON TEMPTABLE.END2_ARTIFACT_ID = SGA.ARTIFACT_ID
Hierarchy for a particular top-level category:
WITH RPL (END1_CATEGORY_ID, END2_CATEGORY_ID,DEPTH) AS (SELECT ROOT.END1_CATEGORY_ID, ROOT.END2_CATEGORY_ID, 0 FROM CATEGORY_ASSOCIATIONS ROOT WHERE ROOT.END1_CATEGORY_ID = 'b11e64a2-5466-4c60-bfc5-121c7a80703d' UNION ALL SELECT CHILD.END1_CATEGORY_ID, CHILD.END2_CATEGORY_ID, PARENT.DEPTH + 1 FROM RPL PARENT, CATEGORY_ASSOCIATIONS CHILD WHERE PARENT.END2_CATEGORY_ID = CHILD.END1_CATEGORY_ID AND DEPTH < 1000) SELECT TEMP.DEPTH, TEMP.NAME AS END1_NAME, CATEGORIES.NAME AS END2_NAME, TEMP.END1_CATEGORY_ID AS END1, TEMP.END2_CATEGORY_ID AS END2 FROM (SELECT END1_CATEGORY_ID, NAME, END2_CATEGORY_ID, DEPTH FROM RPL LEFT OUTER JOIN CATEGORIES ON RPL.END1_CATEGORY_ID = CATEGORIES.CATEGORY_ID) AS TEMP LEFT OUTER JOIN CATEGORIES ON CATEGORIES.CATEGORY_ID = TEMP.END2_CATEGORY_ID ORDER BY DEPTH;
Number of policies that are assigned to a particular governance rule:
SELECT ENFORCEMENT_RULES.NAME, COUNT FROM (SELECT ENFORCEMENT_RULES.RULE_ID AS RULE_ID, COUNT(ARTIFACT_ENFORCEMENT_RULE_ASSOCIATIONS.ARTIFACT_ID) AS COUNT FROM ENFORCEMENT_RULES LEFT OUTER JOIN ARTIFACT_ENFORCEMENT_RULE_ASSOCIATIONS ON ENFORCEMENT_RULES.RULE_ID = ARTIFACT_ENFORCEMENT_RULE_ASSOCIATIONS.RULE_ID GROUP BY ENFORCEMENT_RULES.RULE_ID) AS TEMPTABLE LEFT OUTER JOIN ENFORCEMENT_RULES ON ENFORCEMENT_RULES.RULE_ID = TEMPTABLE.RULE_ID
Trend of data quality score of a catalog asset over a given period of time:
For Db2:
SELECT CA.NAME AS ASSET_NAME, TEMP.TECH_START AS UPDATE_TIME, TEMP.TECH_START,TEMP.ASSET_ID, TEMP.TECH_END, QUALITY_SCORE FROM CONTAINER_ASSETS AS CA,(SELECT CONTAINER_DATA_ASSETS.ASSET_ID,CONTAINER_DATA_ASSETS.QUALITY_SCORE,CONTAINER_DATA_ASSETS.TECH_START, CONTAINER_DATA_ASSETS.TECH_END FROM CONTAINER_DATA_ASSETS FOR SYSTEM_TIME FROM '2021-08-19-00.00.00.000000000000' TO '9999-12-30-00.00.00.000000000000' WHERE CONTAINER_DATA_ASSETS.ASSET_ID = '85a89a9c-8fd3-4306-8931-cd15c41ae1d4' AND QUALITY_SCORE IS NOT NULL) AS TEMP WHERE TEMP.ASSET_ID = CA.ASSET_ID -- NOT NULL
For PostgreSQL:
select
CA.NAME as ASSET_NAME,
TEMP.TECH_START as UPDATE_TIME,
TEMP.TECH_START,
TEMP.ASSET_ID,
TEMP.TECH_END,
TEMP.QUALITY_SCORE
from
CONTAINER_ASSETS as CA,
(
select
CONTAINER_DATA_ASSETS.ASSET_ID,
CONTAINER_DATA_ASSETS.QUALITY_SCORE,
CONTAINER_DATA_ASSETS.TECH_START,
CONTAINER_DATA_ASSETS.TECH_END
from
CONTAINER_DATA_ASSETS
where
CONTAINER_DATA_ASSETS.ASSET_ID = '130ab82a-c3a4-4fbc-880b-51bc49517ebe'
and CONTAINER_DATA_ASSETS.QUALITY_SCORE is not null
union
select
HIST_CONTAINER_DATA_ASSETS.ASSET_ID,
HIST_CONTAINER_DATA_ASSETS.QUALITY_SCORE,
HIST_CONTAINER_DATA_ASSETS.TECH_START,
HIST_CONTAINER_DATA_ASSETS.TECH_END
from
HIST_CONTAINER_DATA_ASSETS
where
HIST_CONTAINER_DATA_ASSETS.ASSET_ID = '130ab82a-c3a4-4fbc-880b-51bc49517ebe'
and HIST_CONTAINER_DATA_ASSETS.QUALITY_SCORE is not null
) as temp
where
TEMP.ASSET_ID = CA.ASSET_ID
Learn more
- Description of Db2 SQL tables that are used for Watson Knowledge Catalog reporting
- Description of Postgres SQL tables that are used for Watson Knowledge Catalog reporting
- Projects
- Catalogs
- Categories
- Data protection rules
- Governance artifacts
- Connecting to data sources
Parent topic: Administering a catalog