SSIS Scanner Guide
IBM Automatic Data Lineage is a powerful data lineage platform that simplifies data management by supporting automated lineage harvesting of SQL Server Integration Services (SSIS) projects/packages. This guide provides steps to set up SSIS lineage harvesting.
Step 1: Configure the Connection
Create a new connection in Admin UI http://localhost:8281/manta-admin-gui/app/index.html?#/platform/connections/ to enable automated extraction and lineage analysis of SSIS projects/packages by Automatic Data Lineage. You need one SSIS
connection in Automatic Data Lineage for each SSIS server that you want to connect to. The connection requirements and privileges can be found in
SSIS Integration Requirements.
Mandatory Properties to Configure
-
Connection Name: To ensure proper identification across all technologies, the connection name should be unique. This is a user-defined parameter for identifying the SSIS connection only within Admin UI. This value will be displayed in Admin UI under the list of all SSIS connections.
-
SSIS Instance Name (
ssis.extractor.server) -
To enable automated extraction of the packages and/or projects from the SSIS Catalog, configure the following.
-
Connection String (
ssis.extractor.url), deployment method (ssis.extractor.enableProjects,ssis.extractor.enablePackages) -
Authentication Type: See MS SQL Server Authentication Types for more details on individual options.
-
Optional Properties
-
To control the scope of the extraction, use
ssis.extractor.include,ssis.extractor.exclude,ssis.dataflow.include, andssis.dataflow.excludeto restrict the extraction on the folder or even the dataflow level. -
Select the SSIS environment that should be used to pull variable values for lineage by specifying it in
ssis.extractor.environment.name.
See the SSIS Resource Configuration page for the full list and a detailed explanation of the properties that can be configured for the scanner.
Step 1.1: Disabling SSIS Automated Extraction for Manually Provided Packages/Projects
Note that both projects and packages can be stored in various locations, including the file system or version control system, as well as the SSIS catalog in the SQL Server database. In instances where projects/packages are not stored on the SQL Server database, Automatic Data Lineage requires manual placement of the .ispac/.dtsx files on the Automatic Data Lineage server in the appropriate directory for ingestion and analysis. In such cases, it is necessary to disable the SSIS extraction and place the package/project files in the proper location.
-
Create a new SSIS connection and only configure the SSIS Connection Name and SSIS Instance Name. (Manta Admin UI may require that the Connection String and Username fields also be configured. In this situation, enter placeholder values for the required fields as they will not be leveraged during scan time.)
-
Disable the
ssisExtractorScenarioby creating a custom workflow that skips the execution of the SSISExtractor Scenario. If using Manta Orchestration API scripts to call specific scenarios, do not invoke thessisExtractorSecnario. -
Create the directory
manta/cli/temp/ssis/<SSIS_instance_name>/and place all the.dtsxfiles there (if in package deployment mode) and/or.ispacfiles (if in project deployment mode).
It is expected that the content provided will be the same as the files produced by automated extraction. This means that any additional files are ignored, especially configurations for package deployment models (
.dtsConfig files) that are not supported. (If necessary, use the parameter value override described later to set the correct values.)
Optional Additional Steps
Step 2: Overriding SSIS Variables and Parameter Values (Optional)
During the runtime of SSIS jobs, variables and parameters are read from the database. However, Automatic Data Lineage lineage analysis might encounter limitations as it doesn't have access to these runtime values and uses the default values defined in the jobs instead. This can lead to incomplete or unexpected lineage results. To address this, define manual override values for the variables and parameters to be used specifically during lineage analysis. When a variable affects lineage analysis and Automatic Data Lineage can't resolve a default value, an override is mandatory. Overrides are permitted only for variables that are explicitly declared in the package. For example, System Variables, which are implicitly defined, can't be overridden.
You can find user-defined overrides in Admin UI > Configuration > CLI > SSIS > SSIS Variable Overrides having the following structure:
|
Column |
Description |
Examples / allowed values |
|---|---|---|
|
Scope mask |
Regular expression to determine the scope of the variable to apply this value to The expression is matched against the full hierarchical path to the variable's parent object (starting with the folder and separated by “
For the path
|
folder1/Project1.ispac/Package1 Package2/SqlTask1 |
|
Qualified name |
Qualified name of the variable or parameter |
User::var1 |
|
Mode |
DEFAULT: Initialize the variable to this value on package/project load OVERRIDE: Use the configured value whenever the variable's value is read, regardless of any assignments to this value, property expressions, etc. |
DEFAULT |
|
Data type |
Data type to convert the value to Use STRING for dates If NULL is selected, the value column is ignored |
STRING |
|
Value |
The value to use |
Hello |
Step 3: Overriding Component Data Lineage (Optional)
Within Automatic Data Lineage, you have the capability to specify or override the lineage between inputs and outputs of components. This functionality proves useful in scenarios where the component's lineage analysis is not supported or may yield incomplete results. For instance, unsupported components may exhibit default behavior where inputs are connected to outputs based on matching names or through a generic connection mechanism, potentially leading to inaccurate lineage. Additionally, programmable components with unanalyzed code can benefit from overriding the default lineage to achieve more precise results.
Be aware that this feature is not designed to address unsupported sources and targets. It primarily focuses on overriding lineage within a component, specifically between its input and output ports. Therefore, if you encounter issues connecting your SSIS package to the underlying database, this feature is not applicable as it does not handle the broader connectivity aspect.
If a manual lineage definition is present for the given component, Automatic Data Lineage will not analyze the component at all and will instead use the lineage as defined manually. You cannot define anything but the component's input/output column for the manual definition source/target (i.e., you cannot use it to connect an input column to a database column, etc.). If there are multiple data flows to be defined through a single component, put each source-target mapping on a separate line with the same component path.
You can find user-defined overrides in Admin UI > Configuration > CLI > SSIS > SSIS Variable Overrides having the following structure:
|
Column |
Description |
Examples / allowed values |
|---|---|---|
|
Full path to component |
The full hierarchical path to the component (starting with any folder(s), project file name, or package name and separated by " For package deployment, use the package file name for the project path segment |
folder1/Project1.ispac/Package1/Data Flow Task/Script Component folder1/Package1.dtsx/Package1/Data Flow Task/Script Component |
|
Source column |
Name of the input column or its
If the component has multiple inputs, qualify any ambiguous column names with an input name |
LastName Package1\Data Flow Task\Script Component.Inputs[Input 0].Columns[LastName] Input 0.LastName |
|
Target column |
Name of the output column or its
If the component has multiple outputs, qualify any ambiguous column names with an output name |
LastName Package1\Data Flow Task\Script Component.Outputs[Output 0].Columns[LastName] Output 0.LastName |
|
Edge type |
Type of edge to use (direct or filter) If not provided, direct edge is used |
DIRECT |
|
Description (optional) |
A description of the transformation Note that the description is added to the target column; if multiple manual definitions use the same target column and provide a description, all descriptions will be merged and used for the target column |
Currency validation and conversion |
Step 4: Parameter Values Configured on the Server (Optional)
For a project deployment model, the parameter values can be configured on the SSIS server after the project is deployed using a literal value or variable value sourced from the configured SSIS environment folder|name (https://learn.microsoft.com/en-us/answers/questions/256900/ssis-use-environment-configuration-in-visual-studi ). These values are extracted next to the project file in a file with the same name as the project but with the .properties suffix. For example, extraction of the project MyProject will produce a MyProject.ispac file with a project definition and a MyProject.properties file with parameter values. These values will be used during
data flow generation to override the default values of the project or package parameters.
There are some limitations on the extraction of these values.
-
Parameters marked as sensitive cannot be extracted. Their values are encrypted and inaccessible.
-
Parameter values can only be overridden for a single instance of a package execution. This value is not stored on the server.
It is possible to manually edit the properties file with parameter values after the extraction and to set the value for the parameters that cannot be extracted. Caution: This change will be lost after the next automatic extraction overrides the
file. The format of the file is
parameterName=parameterValue, where the parameterName for the project parameters only consists of the parameter name. For the package parameter, the parameterName is composed of the package name (including
the .dtsx suffix) and the parameter name separated by a dot.
Example:
projectParam=value1
MyPackage.dtsx.param1=value2
This is a configured project parameter called projectParam with a value of value1 and a package parameter of param1 in the package
MyPackage.dtsx with a value of value2.