Defining database schemas as identical

You can define a database schema as identical to other database schemas or to data file folders. When assets are defined as identical, the tables and columns that they contain are also defined as identical if their names match.
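The name-matching rule can be pictured with a small model. This is a hypothetical sketch, not the catalog's data model or API: it assumes that each schema is a dictionary that maps table names to lists of column names, that name matching is exact and case-sensitive, and that the table and column names are invented for the example.

```python
"""Hypothetical model of how table and column identity follows from schema identity.

Assumption: a schema is a dict that maps table names to lists of column names.
Names are compared exactly (case-sensitive). Illustration only.
"""


def identical_children(schema_a, schema_b):
    """Return the tables and (table, column) pairs whose names match in both schemas."""
    # Tables are paired only when the same table name exists in both schemas.
    tables = sorted(set(schema_a) & set(schema_b))
    columns = []
    for table in tables:
        # Within a matched pair of tables, columns are paired only by identical name.
        shared = sorted(set(schema_a[table]) & set(schema_b[table]))
        columns.extend((table, column) for column in shared)
    return tables, columns


# Hypothetical contents of two identical schemas.
dataserver_schema1 = {"CUSTOMER": ["ID", "NAME", "EMAIL"], "ORDERS": ["ID", "TOTAL"]}
proddos_schema1 = {"CUSTOMER": ["ID", "NAME"], "PRODUCT": ["ID", "PRICE"]}

tables, columns = identical_children(dataserver_schema1, proddos_schema1)
print(tables)   # ['CUSTOMER']
print(columns)  # [('CUSTOMER', 'ID'), ('CUSTOMER', 'NAME')]
```

Tables or columns that exist in only one of the schemas, such as ORDERS or EMAIL here, have no counterpart, so they are not defined as identical to anything.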

Before you begin

You must have the Information Governance Catalog Information Asset Administrator role.

About this task

Database and data file assets can be imported into the catalog by different means, such as by a connector or by a bridge. As a result, the same database or data file can exist in the catalog as several distinct assets.

For example, a metadata administrator might use IBM® InfoSphere® Metadata Asset Manager to import Schema1 of database DW on host DataServer and Schema1 of database DW on host BIServer into the catalog. A data analyst then uses IBM InfoSphere Information Analyzer to import Schema1 of database DW on host ProdDOS, without realizing that the metadata administrator already imported this schema. All three database schemas now exist as distinct entities in the catalog. When you specify that the database schemas are identical, lineage can continue the data flow across them. To merge the database schemas and eliminate the duplicate entities from the catalog, see Merging assets in IBM InfoSphere Metadata Asset Manager.
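The reason that identity lets lineage continue can be illustrated with a toy lineage graph. This is a hypothetical sketch under stated assumptions: lineage is modeled as directed edges between asset names, each asset that is declared identical is mapped directly to one representative before the flow is traced, and the job names LoadJob and ReportJob are invented for the example.

```python
"""Hypothetical illustration of why marking schemas as identical lets lineage continue.

Assumptions: lineage is a list of directed (source, target) edges between asset
names, and same_as maps each identical asset directly to one representative.
The job names are invented; this is not the product's lineage engine.
"""
from collections import deque


def downstream(edges, start, same_as=None):
    """Return every asset reachable from start, treating identical assets as one node."""
    same_as = same_as or {}

    def canon(asset):
        # Replace an asset by its representative, if it has one.
        return same_as.get(asset, asset)

    graph = {}
    for src, dst in edges:
        graph.setdefault(canon(src), set()).add(canon(dst))

    # Breadth-first traversal of the collapsed graph.
    seen, queue = set(), deque([canon(start)])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


edges = [
    ("LoadJob", "DataServer/DW/Schema1"),    # flow recorded against one import
    ("ProdDOS/DW/Schema1", "ReportJob"),     # flow recorded against another import
]
same_as = {"ProdDOS/DW/Schema1": "DataServer/DW/Schema1"}

print(sorted(downstream(edges, "LoadJob")))           # ['DataServer/DW/Schema1']
print(sorted(downstream(edges, "LoadJob", same_as)))  # ['DataServer/DW/Schema1', 'ReportJob']
```

Without the identity, the flow stops at the DataServer schema because no edge leaves it. When the two schemas are treated as one node, the edge that was recorded against the ProdDOS schema becomes reachable.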

Similarly, you can indicate which data file folder on HDFS is the storage location for a Hive database schema. If the Hive database schema is the preferred representation, data lineage shows the Hive database schema and its database tables and database columns.

If a database schema is selected as the preferred schema, the lineage report displays that schema. Otherwise, the lineage report displays an arbitrary one of the identical database schemas. Therefore, it is best to define a preferred database schema.
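The effect of defining a preferred schema can be sketched as a selection rule. This is a hypothetical model, not the product's implementation: it assumes that a group of identical schemas is a list of names and that at most one of them is preferred. The product's choice without a preferred schema is described only as arbitrary, so the sketch substitutes random.choice to show that the result is not predictable.

```python
"""Hypothetical sketch of choosing which identical schema the lineage report shows.

Assumption: a group of identical schemas is a list of names, and at most one of
them is flagged as preferred. random.choice stands in for the product's arbitrary
choice; the real selection mechanism is not documented here.
"""
import random


def displayed_schema(group, preferred=None, rng=random):
    """Return the preferred schema if it belongs to the group, else an arbitrary member."""
    if preferred is not None and preferred in group:
        return preferred
    # No preferred schema: any member can be shown, and the choice can differ between runs.
    return rng.choice(group)


group = ["DataServer/DW/Schema1", "BIServer/DW/Schema1", "ProdDOS/DW/Schema1"]
print(displayed_schema(group, preferred="BIServer/DW/Schema1"))  # BIServer/DW/Schema1
print(displayed_schema(group) in group)                          # True
```

Only the first call gives a stable answer, which is why selecting a preferred schema makes the lineage report predictable.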

Procedure

  1. Click Administration.
  2. Expand Lineage Management.
  3. Click Lineage Administration.
  4. Click Same as Database Schemas in the left pane.
  5. In the Same as Database Schemas pane, select the database schema that you need to mark as identical to another asset, and then click Define Identical Data Sources.
  6. In the Same as Data Sources list, select either database schema or data file folder as the asset type. Then, select the asset that you want to define as identical to the database schema that you selected in step 5.
  7. Click Save.
    On the Details page of the database schema or data file folder, the identical asset is added to the Same as Data Sources list.