Manta Flow Informatica EDC User Documentation

Definition of Exported Entities

Manta Flow analyzes SQL scripts (procedures, view definitions, macros, ad-hoc scripts, etc.) as well as ETL, analytical, and reporting tools. It then exports to Informatica EDC the metadata about every relevant SQL statement (essentially every SQL statement that directly transfers data, such as inserts, updates, and deletes), all ETL transformations, and all analytical models and reports. The export is based on the latest revision in the Manta repository. Manta Flow creates a new asset for each transformation it describes. These assets are organized under hierarchical parents, so they are well arranged and easy to find, and they are connected to the database or file columns whose data they transfer.
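To make the "directly transfers data" criterion concrete, the sketch below is a toy keyword filter. It is not Manta Flow's analysis, which parses SQL fully. The function `transfers_data` and the keyword set are hypothetical, and the set is limited to the statement kinds named above.

```python
import re

# Hypothetical illustration only: a toy keyword check for statements that
# directly transfer data. Manta Flow's real analysis parses SQL fully.
DATA_TRANSFER_KEYWORDS = {"INSERT", "UPDATE", "DELETE"}

_LEADING_WORD = re.compile(r"^\s*([A-Za-z]+)")

def transfers_data(statement: str) -> bool:
    """Return True if the statement starts with a data-transferring keyword."""
    match = _LEADING_WORD.match(statement)
    return bool(match) and match.group(1).upper() in DATA_TRANSFER_KEYWORDS

script = [
    "INSERT INTO dwh.sales SELECT * FROM stage.sales",
    "SELECT count(*) FROM dwh.sales",
    "update dwh.sales set amount = 0 where id = 1",
    "CREATE INDEX ix_sales ON dwh.sales(id)",
]
print([s for s in script if transfers_data(s)])
```

A keyword check like this misses cases such as `WITH ... INSERT`. That gap is one reason a real analyzer works on parsed statements rather than leading keywords.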

The documentation below uses screenshots from IBM Automatic Data Lineage to illustrate which entities are and are not exported to EDC. Entities that are not exported are marked with a red X. Note that lineage passing through a non-exported object is still exported if that object lies between two exported assets. See the first screenshot for an example.
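The rule about non-exported objects can be modeled as collapsing paths in a lineage graph. This is a minimal sketch under our own assumptions and is not Manta's implementation. Lineage is modeled as `(source, target)` edges, and `collapse_lineage` is a hypothetical helper. A path whose inner nodes are all non-exported becomes one edge between the exported assets at its ends. A path that ends in a non-exported node is dropped.

```python
from collections import defaultdict

def collapse_lineage(edges, exported):
    """Return lineage edges between exported nodes only.

    A path that runs through non-exported nodes is replaced by a single
    edge between the exported nodes at its ends (a hypothetical model of
    the rule described above, not Manta's implementation).
    """
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)

    result = set()
    for start in exported:
        # Depth-first search that walks through non-exported nodes only.
        stack = list(successors[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in exported:
                result.add((start, node))  # stop at the next exported asset
            else:
                stack.extend(successors[node])
    return result

edges = [("src_table", "tmp_view"), ("tmp_view", "dst_table"), ("tmp_view", "orphan")]
exported = {"src_table", "dst_table"}
print(collapse_lineage(edges, exported))  # {('src_table', 'dst_table')}
```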

Entities Exported by Connector Type

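[Screenshot 1 of 3: entities exported by connector type]

[Screenshot 2 of 3: entities exported by connector type]

[Screenshot 3 of 3: entities exported by connector type]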

Entities That Are Not Exported

Oracle-specific exception: external system cursors (cursors created in external systems such as a Java application) do not require target assets.

[Screenshot 1 of 2: entities that are not exported]

[Screenshot 2 of 2: entities that are not exported]

Exceptions

[Screenshot: exceptions]

Export Model

The diagrams below describe the general domain model of Informatica EDC, that is, the classes and associations that Automatic Data Lineage uses in EDC. Most of these classes are not used directly. Their subclasses, which carry more specific semantics, are used instead.

For EDC versions prior to 10.4.1:

[Diagram: general EDC domain model, EDC versions prior to 10.4.1]

As of EDC version 10.4.1:

[Diagram: general EDC domain model, EDC version 10.4.1 and later]

The diagrams below show the concrete classes that Manta uses for export to EDC.

For EDC versions prior to 10.4.1:

[Diagram: concrete classes used for export, EDC versions prior to 10.4.1]

As of EDC version 10.4.1:

[Diagram: concrete classes used for export, EDC version 10.4.1 and later]

Export Modes

At the moment, the following modes of export to EDC are available in Automatic Data Lineage.

Standard Export

All the data exported to EDC shares the same context, so all data assets in EDC that are connected by lineage can ultimately be displayed in a single lineage diagram. This approach has pros and cons. It is easy to see all connected parts of the environment in one diagram. However, if the diagram is too large, it may take a long time to render, and it can be congested and hard to navigate.

If indirect lineage export is turned on (available as of Informatica EDC version 10.4.0 EBF-17357), indirect lineage in Manta terminology (i.e., lineage through WHERE conditions and GROUP BY, HAVING, and JOIN clauses) is exported as EDC control flow.
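The sketch below shows how an edge's originating clause could decide its representation in EDC. It assumes a lineage edge is tagged with that clause. `EdgeKind`, `exported_edge_kind`, and the `DIRECT` label are hypothetical names. The sketch also assumes that indirect edges are simply omitted when indirect lineage export is off, since the text above only describes the behavior when it is on.

```python
from enum import Enum
from typing import Optional

class EdgeKind(Enum):
    DIRECT = "direct lineage"          # hypothetical label for ordinary data flow
    CONTROL_FLOW = "EDC control flow"  # how indirect lineage is exported

# Clauses that yield indirect lineage in Manta terminology (from the text above).
INDIRECT_CLAUSES = {"WHERE", "GROUP BY", "HAVING", "JOIN"}

def exported_edge_kind(clause: str, indirect_export_on: bool) -> Optional[EdgeKind]:
    """Map the clause an edge comes from to how it would appear in EDC.

    Returns None when the edge is indirect but indirect lineage export is
    off (assumed here to mean the edge is simply not exported).
    """
    if clause.upper() in INDIRECT_CLAUSES:
        return EdgeKind.CONTROL_FLOW if indirect_export_on else None
    return EdgeKind.DIRECT

print(exported_edge_kind("where", True))    # EdgeKind.CONTROL_FLOW
print(exported_edge_kind("WHERE", False))   # None
print(exported_edge_kind("SELECT", False))  # EdgeKind.DIRECT
```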

Below is an example of how assets are exported to EDC when standard export is used.

[Screenshot: assets exported to EDC using standard export]

Detailed Lineage Export

If the connected parts of the environment are too large to be rendered effectively in EDC, use detailed lineage export mode instead. In this mode, the lineage between data assets is exported in separate contexts. The lineage diagrams in EDC then have two levels: a top-level diagram of the data assets and their associations, and a detailed lineage context for each association.
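The two-level structure can be approximated with a small graph computation. This is a rough sketch of our own reading, not how Manta builds contexts. Each association between two data assets gets its own context, made of the transformation nodes that lie on a path between them. The names `detailed_contexts` and `_reach` and the sample graph are hypothetical.

```python
from collections import defaultdict

def _reach(start, adjacency, data_assets):
    """Walk from start through transformation nodes, stopping at data assets.

    Returns (transformation nodes visited, data assets where walks stopped).
    """
    transforms, assets = set(), set()
    stack = list(adjacency[start])
    while stack:
        node = stack.pop()
        if node in data_assets:
            assets.add(node)
        elif node not in transforms:
            transforms.add(node)
            stack.extend(adjacency[node])
    return transforms, assets

def detailed_contexts(edges, data_assets):
    """Map each (source asset, target asset) association to its own context:
    the transformation nodes that lie on a path between the two assets."""
    forward, backward = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        forward[src].append(dst)
        backward[dst].append(src)
    contexts = {}
    for source in data_assets:
        downstream, targets = _reach(source, forward, data_assets)
        for target in targets:
            upstream, _ = _reach(target, backward, data_assets)
            contexts[(source, target)] = downstream & upstream
    return contexts

edges = [
    ("T1", "stmt_a"), ("stmt_a", "T2"), ("stmt_a", "T3"),
    ("T0", "stmt_b"), ("stmt_b", "T3"),
]
for association, context in sorted(detailed_contexts(edges, {"T0", "T1", "T2", "T3"}).items()):
    print(association, sorted(context))
```

`stmt_b` also writes to T3, but it does not appear in the T1 to T3 context. Keeping each association's context separate like this is what keeps individual diagrams small.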

If indirect lineage export is turned on (available as of Informatica EDC version 10.4.0 EBF-17357), indirect lineage in Manta terminology (i.e., lineage through WHERE conditions and GROUP BY, HAVING, and JOIN clauses) is exported as EDC control flow.

Below is an example of how assets are exported to EDC when detailed lineage export is used.

[Screenshot: assets exported to EDC using detailed lineage export]

Context Lineage Export

Context lineage export exports all database transformation objects (not just the last one before a target asset) without mixing up the lineage of different calls of the same transformation.

This feature requires EDC version 10.4.1 or newer.
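The sketch below illustrates why lineage for different calls gets mixed up when it is not kept per context. It rests on a simplifying assumption: every input of a call feeds every output of that same call. `lineage` and the sample call records are hypothetical and are not Manta's data model.

```python
from collections import defaultdict

# Hypothetical call records: (calling context, procedure, input table, output table).
CALLS = [
    ("load_sales.sql", "p_copy", "stage.sales", "dwh.sales"),
    ("load_hr.sql", "p_copy", "stage.emp", "dwh.emp"),
]

def lineage(calls, key):
    """Connect every input to every output within each group chosen by `key`."""
    groups = defaultdict(lambda: (set(), set()))
    for context, procedure, source, target in calls:
        inputs, outputs = groups[key(context, procedure)]
        inputs.add(source)
        outputs.add(target)
    return {(s, t) for inputs, outputs in groups.values() for s in inputs for t in outputs}

# Grouping by procedure only mixes the two calls together,
mixed = lineage(CALLS, key=lambda context, procedure: procedure)
# while grouping by (context, procedure) keeps them apart.
separate = lineage(CALLS, key=lambda context, procedure: (context, procedure))

print(sorted(mixed - separate))  # spurious edges introduced by mixing
```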

Below is an example of how assets are exported to EDC when context lineage export is used.

[Screenshot: assets exported to EDC using context lineage export]

Browse Assets

To find an EDC resource containing assets uploaded by Automatic Data Lineage, use the search options on the EDC main screen. At the moment, Automatic Data Lineage creates:

Data assets loaded by EDC itself (typically database data assets) are in the native EDC resource (with the same name, without any suffix). Data assets loaded by Automatic Data Lineage (typically report and analytical tool assets) are in the Manta Scripts resource in EDC. If export of input-level node source code (manta.iedc.exportInputLevelSourceCode) is turned on, an Expression attribute containing the input-level SQL code is also exported for each procedure, function, script, or similar object that provides input-level code.
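To check whether input-level source code export is enabled, you can read the property from configuration. This is a minimal sketch under several assumptions. It assumes the property is set in a Java-style properties file with `key=value` lines and a `true`/`false` value. The file that holds it in your installation is not named here. `read_properties` is a hypothetical helper, and treating a missing property as off is an assumption of this sketch.

```python
def read_properties(text: str) -> dict:
    """Parse simple key=value lines; skips blanks and #/! comments.

    Simplified: real Java .properties files also allow ':' or whitespace
    as separators and line continuations, which are not handled here.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props

# Hypothetical configuration content; the value format is an assumption.
config = read_properties("""
# Export of input-level source code to Informatica EDC
manta.iedc.exportInputLevelSourceCode=true
""")

# Treat a missing property as "off" (an assumption made by this sketch).
enabled = config.get("manta.iedc.exportInputLevelSourceCode", "false").lower() == "true"
print(enabled)
```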

  1. Find the EDC resource.

[Screenshot: finding the EDC resource]

  2. Choose the type of objects you want to start browsing (in this case, Procedures).

[Screenshot: browsing objects of type Procedures]

  3. Find a procedure by name.

[Screenshot: finding a procedure by name]

  4. Navigate to the child assets as far as needed.

[Screenshot: navigating to child assets]

  5. Some assets along the way have additional attributes filled in by Manta. The screenshots below show a database statement with an Expression attribute containing the SQL code of the statement, and a statement column asset with an Expression attribute containing the transformation logic extracted for that column.

[Screenshot: database statement with an Expression attribute]

[Screenshot: statement column asset with an Expression attribute]

  6. The following screenshot shows an Expression attribute containing the input-level SQL code of an INSERT INTO statement contained in the insert.sql script.

[Screenshot: input-level SQL code of an INSERT INTO statement from insert.sql]

Lineage Diagrams

Lineage diagrams can be rendered for column-level or table-level data assets (available in both standard export and detailed lineage export) and for leaf-level (transformation column) or operation-level (e.g., statement) transformation assets (available only in standard export).
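The availability rules in the paragraph above can be written as a small lookup table. In this sketch, the level and mode strings and the `can_render_diagram` helper are hypothetical. Modes the paragraph does not mention simply return `False`, which is a choice of this sketch and not a documented behavior.

```python
# Starting-asset levels that support lineage diagrams, per export mode,
# as listed in the paragraph above. Other modes are not covered here.
DIAGRAM_LEVELS = {
    "standard": {"table", "column", "transformation column", "operation"},
    "detailed": {"table", "column"},
}

def can_render_diagram(export_mode, asset_level):
    """True if a lineage diagram can start from an asset at this level."""
    return asset_level in DIAGRAM_LEVELS.get(export_mode, set())

print(can_render_diagram("standard", "operation"))  # True
print(can_render_diagram("detailed", "operation"))  # False
```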

  1. Lineage diagrams are available on the Lineage and Impact tab of the asset overview screen.

[Screenshot: Lineage and Impact tab on the asset overview screen]

  2. Depending on the starting asset, a table-level or column-level lineage diagram is shown. By default, EDC hides some of the assets that are part of the lineage graph.

[Screenshot: default lineage diagram with some assets hidden]

  3. To see all the objects in the diagram, adjust the Lineage (upstream data lineage) and Impact (downstream data lineage) sliders. The diagrams below show the full detail for standard export mode (1) and detailed lineage export mode (2).

[Screenshot (1): full lineage detail, standard export mode]

[Screenshot (2): full lineage detail, detailed lineage export mode]

  4. [Detailed lineage export mode only:] To see the entry points to the detailed lineage contexts, click the Show Transformation Logic in Lineage button.

[Screenshot: Show Transformation Logic in Lineage button]

  5. [Detailed lineage export mode only:] Then click the orange circle to enter the detailed lineage context of a particular association.

[Screenshot 1 of 2: entering a detailed lineage context]

[Screenshot 2 of 2: entering a detailed lineage context]

Transformation Sources and Target View

As of Informatica EDC version 10.2.2hf1, a tabular summary of the source and target assets (tables, views, synonyms, etc.) for a selected transformation is available.

Limitations: This feature only relates to the table level, not to the column level. Only direct (immediate) sources and targets are displayed. Transitive sources and targets are not supported.
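To show the difference between direct and transitive sources, here is a minimal sketch over table-level lineage edges of the form `(upstream, downstream)`. The tabular summary lists only the first set. The helper names and the sample graph are hypothetical, and targets would be computed symmetrically.

```python
from collections import defaultdict

def direct_sources(edges, asset):
    """Immediate upstream neighbors: what the tabular summary lists."""
    return {src for src, dst in edges if dst == asset}

def transitive_sources(edges, asset):
    """Everything upstream of asset: not shown by the tabular summary."""
    predecessors = defaultdict(set)
    for src, dst in edges:
        predecessors[dst].add(src)
    found, stack = set(), [asset]
    while stack:
        for src in predecessors[stack.pop()]:
            if src not in found:
                found.add(src)
                stack.append(src)
    return found

# Hypothetical table-level lineage: tables and the statements that load them.
edges = [
    ("raw.sales", "stage_load"), ("stage_load", "stage.sales"),
    ("stage.sales", "dwh_load"), ("dwh_load", "dwh.sales"),
]
print(direct_sources(edges, "dwh_load"))              # {'stage.sales'}
print(sorted(transitive_sources(edges, "dwh_load")))  # ['raw.sales', 'stage.sales', 'stage_load']
```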

  1. Display the Lineage and Impact of a transformation asset at the table level.

  2. Click the Open the Tabular Asset Summary icon.

[Screenshot: Open the Tabular Asset Summary icon]

  3. Click Asset Lineage Summary to see a list of the source assets of the transformation.

[Screenshot: Asset Lineage Summary]

  4. Click Asset Impact Summary to see a list of the target assets of the transformation.

[Screenshot: Asset Impact Summary]

Indirect Lineage View

In this section, "indirect lineage" means indirect lineage in Manta terminology (i.e., lineage through WHERE conditions and GROUP BY, HAVING, and JOIN clauses), which is exported to EDC as control flow. Do not confuse it with an "indirect link" in EDC terminology, which is a link with at least one hidden node.

Indirect lineage is supported as of Informatica EDC version 10.4.0 EBF-17357.

Indirect lineage is not displayed in diagrams, but it is possible to view this type of lineage in an asset summary view.
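The asset summary view described in this section splits indirect lineage into assets controlling the starting asset and assets controlled by it. The sketch below models that split under one assumption: each control-flow edge is a `(controller, controlled)` pair. `control_summary` and the sample columns are hypothetical.

```python
def control_summary(control_edges, start):
    """Split control-flow edges around `start` into the two summary columns.

    Each edge is assumed to be (controller, controlled): the controller
    affects the controlled asset only indirectly, e.g. through a WHERE
    condition, so the edge is exported as control flow.
    """
    return {
        "Controlling": {c for c, d in control_edges if d == start},
        "Controlled by": {d for c, d in control_edges if c == start},
    }

# Hypothetical columns: region.code filters sales.amount in a WHERE clause,
# and sales.amount drives a HAVING condition that controls report.total.
control_edges = [
    ("region.code", "sales.amount"),
    ("sales.amount", "report.total"),
]
print(control_summary(control_edges, "sales.amount"))
```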

  1. Display the Lineage and Impact of a starting asset at the table or column level.

  2. Click the Open the Tabular Asset Summary icon.

[Screenshot: Open the Tabular Asset Summary icon]

  3. Select the Asset Control Summary tab.

[Screenshot: Asset Control Summary tab]

You will see the indirect lineage represented by a table of assets Controlling or Controlled by the starting one, where:

Context Lineage View

Context lineage is supported as of Informatica EDC version 10.4.1.

  1. Find and display a contextual transformation (i.e., a transformation that is not the last one before a target asset) as previously described.

  2. Choose the context whose lineage you want to display and click Lineage and Impact.

[Screenshot: choosing a context and clicking Lineage and Impact]

You will see detailed lineage within the selected context.

[Screenshot: detailed lineage within the selected context]

Export Statistics

The export log file contains the sizes of the files generated for EDC. These sizes give an indication of how many assets have been exported and will be uploaded to EDC.

2020-05-11 20:15:24.588 0 INFO  eu.profinit.manta.dataflow.repository.exporter.client.FileLogger Decompressed file exportReport.json with size 166 bytes.
2020-05-11 20:15:24.589 0 INFO  eu.profinit.manta.dataflow.repository.exporter.client.FileLogger Decompressed file lineage.csv with size 104980 bytes.
2020-05-11 20:15:24.590 0 INFO  eu.profinit.manta.dataflow.repository.exporter.client.FileLogger Decompressed file lineage.set.csv with size 9976 bytes.
2020-05-11 20:15:24.591 0 INFO  eu.profinit.manta.dataflow.repository.exporter.client.FileLogger There is a compressed file objects.csv with uncompressed size 382533 bytes.
2020-05-11 20:15:24.592 0 INFO  eu.profinit.manta.dataflow.repository.exporter.client.FileLogger There is a compressed file links.csv with uncompressed size 35548 bytes.
2020-05-11 20:15:24.592 0 INFO  eu.profinit.manta.dataflow.repository.exporter.client.FileLogger There is a compressed file lineage.csv with uncompressed size 105092 bytes.
2020-05-11 20:15:24.593 0 INFO  eu.profinit.manta.dataflow.repository.exporter.client.FileLogger Decompressed file objects.zip with size 17471 bytes.
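To pull these sizes out of a longer log, you can use a regular expression. This is a minimal sketch that assumes only the two FileLogger message shapes shown in the excerpt above. Other Manta versions may word them differently. `export_file_sizes` and `SIZE_PATTERN` are hypothetical names.

```python
import re

# Matches the two FileLogger message shapes shown in the log excerpt above.
SIZE_PATTERN = re.compile(
    r"FileLogger (?:Decompressed file (?P<plain>\S+) with size"
    r"|There is a compressed file (?P<packed>\S+) with uncompressed size)"
    r" (?P<size>\d+) bytes"
)

def export_file_sizes(log_text):
    """Return (file name, size in bytes) for every size message in the log."""
    return [
        (m.group("plain") or m.group("packed"), int(m.group("size")))
        for m in SIZE_PATTERN.finditer(log_text)
    ]

SAMPLE_LOG = """\
2020-05-11 20:15:24.588 0 INFO  eu.profinit.manta.dataflow.repository.exporter.client.FileLogger Decompressed file exportReport.json with size 166 bytes.
2020-05-11 20:15:24.591 0 INFO  eu.profinit.manta.dataflow.repository.exporter.client.FileLogger There is a compressed file objects.csv with uncompressed size 382533 bytes.
2020-05-11 20:15:24.592 0 INFO  eu.profinit.manta.dataflow.repository.exporter.client.FileLogger There is a compressed file links.csv with uncompressed size 35548 bytes.
"""

for name, size in export_file_sizes(SAMPLE_LOG):
    print(f"{name:20} {size:>10,} bytes")
```

The same file name can appear more than once; lineage.csv does in the excerpt above. For that reason, the sketch returns a list rather than a dictionary.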

The export to EDC generates a report of cases and numbers of objects for which applicable mappings and/or a default mapping were used. The report is saved as a JSON file located in mantaflow\cli\output\{technology}\{outputFolder}\iedc\exportReport.json.

The sample report below shows, first, the DWH and HR schemas of an ORCL database, which use the default mapping to an IEDC resource, and, second, an INFASUPER schema, which uses a user-defined mapping.

{
  "Using the default mapping for assets with no user-defined mapping" : {
    "ORCL/DWH" : 804,
    "ORCL/HR" : 137
  },
  "Used IEDC mappings" : {
    "ORCL_INFASUPER|ORCL|INFASUPER|" : 12
  }
}
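To review the report programmatically, load it as JSON and total the counts per section. This is a minimal sketch that uses the sample report above as input. The section names come from that sample and may differ in other versions (an assumption). `summarize` and `default_mapped_entries` are hypothetical helpers. In practice, you would read exportReport.json from the output path given earlier, whose {technology} and {outputFolder} parts depend on your setup.

```python
import json

# The sample report from above, parsed from a string for a self-contained example.
REPORT = json.loads("""
{
  "Using the default mapping for assets with no user-defined mapping" : {
    "ORCL/DWH" : 804,
    "ORCL/HR" : 137
  },
  "Used IEDC mappings" : {
    "ORCL_INFASUPER|ORCL|INFASUPER|" : 12
  }
}
""")

DEFAULT_KEY = "Using the default mapping for assets with no user-defined mapping"

def summarize(report):
    """Total object count per report section."""
    return {section: sum(counts.values()) for section, counts in report.items()}

def default_mapped_entries(report):
    """Entries that fell back to the default mapping, largest first."""
    counts = report.get(DEFAULT_KEY, {})
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

print(summarize(REPORT))
print(default_mapped_entries(REPORT))
```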