Apache Airflow

Databand provides monitoring, alerting, and analytics functions that help you track the health and reliability of your Airflow DAGs. It can monitor multiple Airflow instances, giving you a centralized tracking system for DAGs across your company.

You can use DAG tracking functions for extra visibility into:

  • Metadata from operators
  • Task code, logs, and errors
  • Data processing engines (such as Redshift and Spark)

Go to the data collection cheat sheet to check what metadata is tracked and how metadata tracking can be configured.

Architecture of Airflow tracking by Databand

Databand tracks all operators and can capture runtime information from every .execute() call in any Airflow operator. Everything that happens within the boundaries of the .execute() function is tracked, for example:

  • Operator start and end time
  • User metrics emitted from the code
  • User exceptions
  • Source code (optional)
  • Logs (optional)
  • Return value (optional)
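To make the ".execute() boundary" idea concrete, here is a minimal, self-contained sketch of a wrapper that records start and end time, user metrics, exceptions, and (optionally) the return value of one call. It is an illustration of the concept only, not Databand's implementation: `track_execute`, `ExecuteRecord`, the `(context, log_metric)` call shape, and the in-memory `sink` list are all hypothetical, and source-code and log capture are left out.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

# Hypothetical shape of the wrapped user code: it receives an Airflow-style
# context dict and a metric-logging callback.
ExecuteFn = Callable[[Dict[str, Any], Callable[[str, Any], None]], Any]


@dataclass
class ExecuteRecord:
    """Hypothetical record of what happened inside one .execute() call."""
    task_id: str
    start_time: float
    end_time: Optional[float] = None
    metrics: Dict[str, Any] = field(default_factory=dict)
    error: Optional[str] = None
    return_value: Any = None


def track_execute(task_id: str, execute: ExecuteFn, context: Dict[str, Any],
                  sink: List[ExecuteRecord], capture_return: bool = True) -> Any:
    """Run `execute`, recording timing, metrics, errors, and the return value."""
    record = ExecuteRecord(task_id=task_id, start_time=time.time())

    def log_metric(key: str, value: Any) -> None:
        record.metrics[key] = value  # user metrics emitted from the code

    try:
        result = execute(context, log_metric)
        if capture_return:  # return-value capture is optional
            record.return_value = result
        return result
    except Exception as exc:
        record.error = f"{type(exc).__name__}: {exc}"
        raise  # re-raise so the task still fails exactly as it would untracked
    finally:
        record.end_time = time.time()
        sink.append(record)  # stands in for reporting to a tracking backend


records: List[ExecuteRecord] = []


def load_rows(context, log_metric):
    log_metric("rows_loaded", 42)
    return "ok"


def broken(context, log_metric):
    raise ValueError("bad input")


track_execute("load_rows", load_rows, {"ds": "2024-01-01"}, records)
try:
    track_execute("broken", broken, {"ds": "2024-01-01"}, records)
except ValueError:
    pass  # the exception still propagates, but its record was stored first
```

Re-raising inside the wrapper is the key design choice: tracking observes the failure without swallowing it, so Airflow's own retry and failure handling is unchanged.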

Once Databand is integrated with your cluster, you can use all functions from Tracking Python inside your operator implementations.
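As a rough illustration, the sketch below logs metrics from inside an operator's `execute()` method. It assumes `log_metric` is the metric function exported by the `dbnd` package (check Tracking Python for the authoritative API); when `dbnd` is not importable, a local stand-in is defined so the sketch runs on its own. `FilterRowsOperator` is hypothetical: in a real DAG it would subclass `airflow.models.BaseOperator`, but it is a plain class here so the example does not require Airflow.

```python
from typing import Any, Dict, List

try:
    # Assumed Databand tracking function, available once dbnd is installed.
    from dbnd import log_metric
except ImportError:
    # Hypothetical fallback so this sketch runs without dbnd installed.
    def log_metric(key: str, value: Any) -> None:
        print(f"[metric] {key}={value}")


class FilterRowsOperator:
    """Hypothetical operator; a real one would subclass BaseOperator."""

    def __init__(self, task_id: str, rows: List[int], min_value: int) -> None:
        self.task_id = task_id
        self.rows = rows
        self.min_value = min_value

    def execute(self, context: Dict[str, Any]) -> List[int]:
        kept = [r for r in self.rows if r >= self.min_value]
        # These calls happen inside .execute(), so a tracker that wraps
        # .execute() can attribute them to this task run.
        log_metric("input_rows", len(self.rows))
        log_metric("output_rows", len(kept))
        return kept


result = FilterRowsOperator("filter", rows=[3, 10, 25, 7], min_value=10).execute({})
```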

You can also use Airflow Syncer, which syncs execution metadata from the Airflow database.
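One common way to sync execution metadata from a database is incremental polling with a watermark. The sketch below shows that technique against an in-memory SQLite table loosely modeled on Airflow's `task_instance` table; the real schema differs across Airflow versions, and nothing here describes how Airflow Syncer is actually implemented. `fetch_new_task_instances` and the use of `end_date` as the watermark are hypothetical choices for illustration.

```python
import sqlite3
from typing import List, Optional, Tuple

Row = Tuple[str, str, str, str, Optional[str]]


def fetch_new_task_instances(conn: sqlite3.Connection,
                             watermark: str) -> Tuple[List[Row], str]:
    """Return task instances that finished after `watermark`, plus the new watermark."""
    rows = conn.execute(
        "SELECT dag_id, task_id, run_id, state, end_date FROM task_instance "
        "WHERE end_date > ? ORDER BY end_date",  # NULL end_date (still running) is skipped
        (watermark,),
    ).fetchall()
    new_watermark = rows[-1][4] if rows else watermark
    return rows, new_watermark


# Simplified, hypothetical stand-in for the Airflow metadata database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_instance "
             "(dag_id TEXT, task_id TEXT, run_id TEXT, state TEXT, end_date TEXT)")
conn.executemany("INSERT INTO task_instance VALUES (?, ?, ?, ?, ?)", [
    ("etl", "extract", "run_1", "success", "2024-01-01T00:05:00"),
    ("etl", "load", "run_1", "failed", "2024-01-01T00:09:00"),
])

batch1, watermark = fetch_new_task_instances(conn, "")  # first sync: everything

conn.executemany("INSERT INTO task_instance VALUES (?, ?, ?, ?, ?)", [
    ("etl", "extract", "run_2", "success", "2024-01-02T00:05:00"),
    ("etl", "transform", "run_2", "running", None),
])
batch2, watermark = fetch_new_task_instances(conn, watermark)  # only new, finished rows
```

The strict `>` comparison keeps each sync cheap, but rows that share the watermark timestamp and arrive late could be skipped; a production syncer would need a tie-breaker or an overlap window.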

Some operators trigger "remote" execution, so a connection must be established between the Airflow operator and the subprocess it launches. Databand supports multiple Spark-related operators, the Bash operator, and some other operators. For more information, see Tracking Sub-Process and Remote Tasks.
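One general way to link a subprocess back to the task that launched it is to pass the task's identity through environment variables. The sketch below does that with Python's `subprocess` module. The variable names follow the `AIRFLOW_CTX_*` convention Airflow uses when exporting task context to subprocesses (for example, in the Bash operator); whether and how Databand reads them is not covered here, so treat the linking mechanism as an assumption. `launch_with_context` is hypothetical.

```python
import os
import subprocess
import sys


def launch_with_context(dag_id: str, task_id: str, run_id: str, code: str) -> str:
    """Run `code` in a child Python process that inherits the task's identity."""
    env = dict(os.environ)  # keep the parent's environment (PATH and so on)
    env.update({
        "AIRFLOW_CTX_DAG_ID": dag_id,
        "AIRFLOW_CTX_TASK_ID": task_id,
        "AIRFLOW_CTX_DAG_RUN_ID": run_id,
    })
    completed = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return completed.stdout.strip()


# The child reads the identity back, as a tracking library in the
# subprocess could do to attach its data to the parent task run.
CHILD_CODE = (
    "import os\n"
    "keys = ('AIRFLOW_CTX_DAG_ID', 'AIRFLOW_CTX_TASK_ID', 'AIRFLOW_CTX_DAG_RUN_ID')\n"
    "print('/'.join(os.environ[k] for k in keys))\n"
)

output = launch_with_context("etl", "spark_job", "manual__2024-01-01", CHILD_CODE)
```

Environment variables are used here because they cross process boundaries without changing the child's command-line arguments; truly remote engines such as a Spark cluster would need the values forwarded through that engine's own configuration.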

[Figure: Databand architecture, Airflow Sync as DAG]

Setting up an Airflow integration

To integrate Databand with your Airflow environment:

  1. Install dbnd-airflow-auto-tracking, Databand's runtime tracking Python package, on your Airflow cluster.
  2. Install the Airflow monitor DAG.
  3. Add and configure the Airflow integration in the Databand application.
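After step 1, a quick way to confirm the package is present on a given scheduler or worker is to ask Python's packaging metadata for its version. This small helper uses only the standard library (`importlib.metadata`, Python 3.8+); the distribution name is the one given in step 1, and `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("dbnd-airflow-auto-tracking")
    print(f"dbnd-airflow-auto-tracking: {found or 'NOT INSTALLED'}")
```

Run it with the same Python interpreter that Airflow uses; a different virtual environment can report the package as missing even when Airflow can see it.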

Known issues

Because Databand integrates with different Apache Airflow deployment types, the following sections explain the special steps required for each platform: