Azure Data Factory

With Databand, you can track the execution of your Azure Data Factory (ADF) pipelines. Tracking is performed by a monitor that scans your ADF factories every few seconds and reports metadata that is collected from runs of published pipelines (debug runs are excluded). With this metadata, you can enable powerful alerting that notifies your data team about the health of your pipeline runs.

Collected metadata

Databand monitors the following metadata types:

Pipeline metadata

  • Pipeline state and duration
  • Activity state and duration
  • Pipeline and activity source code
  • Activity input and output JSON

Data set metadata

  • Paths, schemas, and record counts for all reads and writes
  • Various metrics that are calculated by ADF, such as copyDuration, throughput, usedDataIntegrationUnits, and more (see the sketch that follows this list)
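
These metrics originate in the activity run output that ADF exposes through its management API. As an illustration only (not part of the Databand monitor), the following Python sketch uses the azure-identity and azure-mgmt-datafactory packages to read the output JSON of Copy activities for one pipeline run. The subscription, resource group, factory, and run ID values are placeholders, and the exact metric keys vary by activity type and ADF version.

  from datetime import datetime, timedelta, timezone

  from azure.identity import DefaultAzureCredential
  from azure.mgmt.datafactory import DataFactoryManagementClient
  from azure.mgmt.datafactory.models import RunFilterParameters

  # Placeholders: substitute your own subscription, resource group, factory, and run ID.
  SUBSCRIPTION_ID = "<subscription-id>"
  RESOURCE_GROUP = "<resource-group>"
  FACTORY_NAME = "<factory-name>"
  PIPELINE_RUN_ID = "<pipeline-run-id>"

  client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

  # Query the activity runs that belong to the pipeline run, limited to the last day.
  now = datetime.now(timezone.utc)
  filters = RunFilterParameters(
      last_updated_after=now - timedelta(days=1),
      last_updated_before=now,
  )
  activity_runs = client.activity_runs.query_by_pipeline_run(
      RESOURCE_GROUP, FACTORY_NAME, PIPELINE_RUN_ID, filters
  )

  for run in activity_runs.value:
      # For Copy activities, the output JSON typically includes metrics such as
      # copyDuration, throughput, and usedDataIntegrationUnits (assumption: key
      # names can differ by activity type and ADF version).
      if run.activity_type == "Copy" and run.output:
          metrics = {
              key: run.output.get(key)
              for key in ("copyDuration", "throughput", "usedDataIntegrationUnits")
          }
          print(run.activity_name, metrics)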

Alerting capabilities

See the following table for the alerting functions that are supported by the ADF integration.

Table 1. Alerting functions that are supported by ADF integration

Alert type                Supported   Notes
Pipelines
  Pipeline state
  Pipeline duration
  dbt test
  Schema change
  Task state
  Task duration
  Custom task metric                  Databand collects various metrics from ADF, such as copyDuration, throughput, usedDataIntegrationUnits, and more. Users can alert on any of these metrics.
Data sets
  Missing operation
  Tables data quality
  Operations data quality
  Data delay

Integrating ADF with Databand

Integrating ADF with Databand consists of the following steps:

  1. Registering an Azure app
  2. Adding the integration in Databand

You can also edit an ADF integration in Databand.

Prerequisites

Before you begin the integration process, you need to have:

  • An active Azure subscription
  • Access to the Azure portal
  • Permissions to manage Azure resources, including the ability to register an app with Microsoft Entra ID
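
Before you add the integration in Databand, you can optionally verify that the Azure app you register can reach your factory's run metadata. The following Python sketch is a local check only (it is not how Databand itself connects), assuming the app has an appropriate role assignment on the factory; the tenant, client, secret, subscription, resource group, and factory values are placeholders.

  from datetime import datetime, timedelta, timezone

  from azure.identity import ClientSecretCredential
  from azure.mgmt.datafactory import DataFactoryManagementClient
  from azure.mgmt.datafactory.models import RunFilterParameters

  # Placeholders for the registered app's credentials and your ADF resources.
  credential = ClientSecretCredential(
      tenant_id="<tenant-id>",
      client_id="<application-client-id>",
      client_secret="<client-secret>",
  )
  client = DataFactoryManagementClient(credential, "<subscription-id>")

  # List pipeline runs from the last 24 hours; an authorization error here usually
  # means the app is missing a role assignment on the factory or subscription.
  now = datetime.now(timezone.utc)
  filters = RunFilterParameters(
      last_updated_after=now - timedelta(days=1),
      last_updated_before=now,
  )
  runs = client.pipeline_runs.query_by_factory(
      "<resource-group>", "<factory-name>", filters
  )
  for run in runs.value:
      print(run.run_id, run.pipeline_name, run.status, run.duration_in_ms)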

Known issues

  • Databand monitors only runs of published pipelines that are triggered manually or on a schedule. Debug runs that are triggered through the authoring interface are not monitored.

  • Currently, calculating operational lineage across your ADF pipelines is supported only for the following connectors:

    • Amazon Redshift
    • Amazon S3
    • Azure Data Lake Storage Gen2
    • Snowflake

    Additional connectors are added to these capabilities over time. If you need support for a specific connector for operational lineage calculations, contact your IBM account representative.

  • For Databand to fully monitor data flow operations, the data flow activity in your pipeline must have the Logging level option set to Verbose.
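
    If you manage many pipelines, a quick inventory of data flow activities and their logging levels can help you spot runs that Databand cannot fully monitor. The following Python sketch is an illustration only; it assumes that the azure-mgmt-datafactory models expose the activity's trace_level property and that the Verbose logging level in the authoring UI corresponds to the Fine trace level in the activity definition, which you should verify in your environment.

      from azure.identity import DefaultAzureCredential
      from azure.mgmt.datafactory import DataFactoryManagementClient

      # Placeholders: substitute your own subscription, resource group, and factory.
      client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
      RESOURCE_GROUP = "<resource-group>"
      FACTORY_NAME = "<factory-name>"

      for pipeline in client.pipelines.list_by_factory(RESOURCE_GROUP, FACTORY_NAME):
          for activity in pipeline.activities or []:
              # Execute Data Flow activities carry a trace_level property; the
              # Verbose option in the UI is assumed to map to "Fine" here.
              if activity.type == "ExecuteDataFlow":
                  trace_level = getattr(activity, "trace_level", None)
                  print(pipeline.name, activity.name, trace_level)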