Cross-platform lineage explained

Overview

In complex data environments, data flows across multiple platforms and systems; for example, ETL tools usually implement data pipelines that move data from one database to another. Similarly, reporting tools present aggregated data that comes from a database or data lake. To connect the lineage across different technologies and provide complete visibility into the data pipeline, IBM Manta Data Lineage needs to understand and correctly map the (database) connections that ETL, reporting, and modeling tools point to.

Manta Data Lineage implements several heuristics to automatically map the connection based on the information available in the connection that comes from the ETL, reporting, or modeling tool itself and the details available in Manta Data Lineage database connection. If the selected database was scanned by Manta Data Lineage but heuristics cannot match it to the appropriate connection, a user-defined alias mapping override is needed. The best way to do this is by using Automapping UI. These cases often manifest in the following ways only after a Manta Data Lineage scan is done.

To establish lineage across these systems, Manta Data Lineage uses three mapping techniques:

These methods work by connecting calling systems (system from which the connection is established, such as ETL, reporting, or modeling tools) to receiving systems (like databases).

The following diagram contains an example of connecting the calling and receiving systems.

No alt text provided

Dictionary mapping is a fully automated process and is the initial method used to establish a connection between calling and receiving systems. For details about technology-specific mapping algorithms, see Specific Mapping Algorithms.

When a connection object cannot be automatically matched to a scanned receiving system by using dictionary mapping, alias mapping is used if it is defined. Automapping UI contains a list of suggested user-defined alias mappings to be defined by users. Review the list first before you manually create alias mappings.

Terminology

Calling system : The system that initiates a connection to another system, typically to read from or to write to it. It can be for example an ETL tool or a Business Intelligence (BI) platform, or a database, in case of Database links or Linked servers.

Receiving system : The system that receives the connection that was initiated by the calling system. It is usually a database or data storage system.

Connection object : A collection of properties (such as technology type, connection string, server name, database name) that the calling system uses to establish a connection to the receiving system. These properties are logged during analysis and are critical for mapping the alias to the correct system. For details about the connection object, see Detailed Information About Connection Mappings.

Connection ID : The specific identifier within the connection object that is used for alias mapping. It is typically the connection string or another unique attribute that Manta Data Lineage uses to match the alias with a scanned system. It links the calling system (PowerBI) to the correct receiving system (SQL Server) within Manta Data Lineage.

Automapping (as of R42)

When automapping is used, schemas and tables in the query are compared with those in the scanned dictionaries. If a match is found, the calling system is temporarily linked to the matching receiving system, and the connection is automatically mapped.

Automapping is used only when a single, clear match is found, which makes the process reliable, and the risk of false positives is minimized.

Limitations of Automapping:

Important: Because of the limitations, always review proposed automappings and configure them as user-defined connection alias mappings.

Automapping is disabled by default in versions R42.0 - R42.6. You can enable it in Configuration > CLI > Common > Common Config > Enable dictionary automapping.

Starting in version R42.7, automapping is enabled by default.

Automapping UI (as of R42.7)

The Automapping UI provides a centralized view of all aliases that Manta Data Lineage detected that do not have a user-defined mapping. It contains information about whether aliases were automatically mapped to receiving systems or not. You can also find there the alias that is used in the connection ID field.

You can access automapping in Manta Admin UI.

The table contains details about aliases found during a scan. The Status column contains information about the status of each alias:

The Count column shows the number of unique occurrences where the scenario + calling system + alias + status combination was detected during the scan. Start creating user-defined aliases for the entries that have the highest count number as those resolve most of the cross-system lineage gaps.

Each time the connection associated with an alias is scanned, the status, count, and receiving system details are updated. If a user defines an alias mapping for that alias and rescans the connection, the corresponding row is removed.

Configuring user-defined alias mappings

After each scan, review the automapping results and configure user-defined alias mappings.

  1. Review automapping results. In case of the Review status, select the correct receiving system for that particular connection ID.
  2. Configure user-defined alias mappings for all aliases that you identified during the review. These are usually all aliases with the Automapped status and the selected aliases with the Review status.
    1. Copy alias details from the Alias column.
    2. Click the Alias Mapping button to open existing alias mapping section.
    3. Create a new alias mapping. Use the connection details that you copied earlier to create a new user-defined alias mapping. Alternatively, copy the connection details from the log messages.
      A connection ID cannot be mapped to multiple systems, because later only one randomly selected mapping is used.
    4. Rescan the calling system to map every instance of the alias to the appriopriate receiving system. Rescanning also removes all rows related to the alias from the Automapping UI table.

Example scenarios for automapping

Scenario 1

Scenario 2

User-defined alias mapping

You manually define the alias mappings by explicitly associating the connection ID (alias) with the appropriate database (identified by its dictionary ID). You can define alias mappings in Admin UI > Configuration > CLI > Target Database technology > Alias Mapping. Before you start defining alias mappings manually, review the suggestions on Automapping UI.

To configure mappings manually, use the connectionString value from the following log message in the Connection ID field for the connection that you want to be recognized under this name. You can use serverName if connectionString is null.

2022-07-11 17:42:42.388 [CLI] 0 WARN  eu.profinit.manta.dataflow.generator.common.query.impl.DataflowQueryServiceImpl [Context: \ZenDesk Metrics.pbix]
DATABASE_CONNECTION_ERRORS CANNOT_FIND_SYSTEM_FOR_CONNECTION "ODBC dsn=ENT_ODS"
User message: Cannot find any matching system for Connection [type=ODBC, connectionString=dsn=ENT_ODS, serverName=null, databaseName=null, schemaName=null, userName=null].
Technical message: Cannot find any matching system for Connection [type=ODBC, connectionString=dsn=ENT_ODS, serverName=null, databaseName=null, schemaName=null, userName=null].
Solution: Identify the particular database technology that "dsn=ENT_ODS" points to and if the technology is supported by Manta, configure it as a new connection or configure the manual mapping for that database technology in Manta Admin UI.
Impact: SINGLE_INPUT
2022-05-11 20:00:32.769 [main] WARN eu.profinit.manta.connector.mssql.MssqlReaderProvider [Context: \IMEDW\wf_ZenDesk_Metrics.xml] 
DICTIONARY_MANTA_MAPPING_ERRORS NO_MAPPING_FOR_CONNECTION
User message: Connection in use could not be automatically mapped to one of the database connections configured in Manta.
Technical message: There is no mapping for the connection Connection [type=MSSQL, connectionString=null, serverName=CTC2EDDB01, databaseName=EDW, schemaName=null, userName=].
Solution: Identify the particular database technology MSSQL leading to "null" and configure it as a new connection or configure the manual mapping for that database technology in Manta Configurator.
Impact: SINGLE_INPUT

No alt text providedNo alt text provided