Installing on an Airflow cluster

To provide observability over your Airflow DAGs, Databand integrates with different Apache Airflow deployment types. For the steps specific to your type of Airflow deployment, see the corresponding section below.

Before you proceed with the integration, make sure you have network connectivity between Apache Airflow and Databand (from Apache Airflow to the Databand Server).
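
One quick way to confirm connectivity is an HTTP request from the Airflow host to your Databand URL. The URL below is a placeholder for your Databand environment:

# Run on the Airflow scheduler or worker host; any HTTP response confirms network reachability
curl -I https://<your-databand-url>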

Standard Apache Airflow cluster

Installing new Python packages on managed Airflow environments triggers an automatic restart of the Airflow scheduler.

Follow the Installing Python SDK manual to install the Databand dbnd-airflow-auto-tracking Python package into your cluster's Python environment. To avoid automatically picking up a different version of the package when it is updated, pin the package version by running the following command:

pip install dbnd-airflow-auto-tracking==<version number>

Check the Python repository for the most recent version of the Python package.

Depending on how your cluster's Python environment is built, you might have to add the package to your Dockerfile or requirements.txt instead.
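
For a Docker-based deployment, the change might look like the following sketch. The base image tag and package version are illustrative; use the ones that match your environment:

# Dockerfile
FROM apache/airflow:2.7.3
RUN pip install --no-cache-dir dbnd-airflow-auto-tracking==0.61.0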

Airflow 2.0+ support

For Databand tracking to work properly with Airflow 2.0+, you need to disable lazily loaded plug-ins. To do so, set the core.lazy_load_plugins configuration option to False, for example by setting the environment variable AIRFLOW__CORE__LAZY_LOAD_PLUGINS=False.
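
Either of the following achieves this; which one fits depends on how you manage your Airflow configuration:

# Option 1: set the environment variable for the scheduler, webserver, and workers
export AIRFLOW__CORE__LAZY_LOAD_PLUGINS=False

# Option 2: edit airflow.cfg
[core]
lazy_load_plugins = False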

For more information, see Airflow's plug-ins documentation.

Airflow 2.1.0, 2.1.1, 2.1.2 - Airflow HTTP communication

If you use Airflow 2.1.0, 2.1.1, or 2.1.2, verify that the apache-airflow-providers-http package is installed, or consider upgrading your Airflow.
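
A quick way to verify this from the Airflow host, installing the provider only if it is missing:

# Installs the HTTP provider only if "pip show" reports it as missing
pip show apache-airflow-providers-http || pip install apache-airflow-providers-http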

Astronomer

With Astronomer, you can build, run, and manage data pipelines as code at enterprise scale.

You can install the dbnd-airflow-auto-tracking library by customizing the Astronomer Docker image, rebuilding it, and deploying it. See the Astronomer documentation for more details on how to do that.

In your Astronomer folder, add the following line to your requirements.txt file:

dbnd-airflow-auto-tracking==REPLACE_WITH_DATABAND_VERSION

Redeploying the Airflow image triggers a restart of your Airflow scheduler.
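
If you manage your deployment with the Astro CLI (an assumption about your tooling), rebuilding and redeploying after the requirements.txt change can be as simple as:

# Rebuilds the image with the updated requirements.txt and deploys it
astro deploy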

Astronomer Airflow URL

To get the Airflow URL:

  1. Go to the Astronomer control page and select the Airflow deployment.
  2. Click Open Airflow and copy the URL without the /home suffix. The URL is provided in the following format:

http://deployments.{your_domain}.com/{deployment-name}/airflow

The Astronomer UI shows your {deployment-name} as Release Name.

When you create a Databand Airflow integration for Airflow deployed on Astronomer:

  1. Select OnPrem Airflow as the Airflow mode.
  2. Enter the Airflow URL (http://deployments.{your_domain}.com/{deployment-name}/airflow) in the Airflow URL field.

Amazon Managed Workflows

Amazon Managed Workflows for Apache Airflow (MWAA) is a managed Apache Airflow service that enables you to set up and operate end-to-end data pipelines in the AWS cloud at scale. To integrate with MWAA:

  1. Go to AWS MWAA > Environments > {mwaa_env_name} > DAG code in Amazon S3 > S3 Bucket.
  2. In MWAA’s S3 bucket, update your requirements.txt file.
  3. Add the following line to the requirements.txt file:
dbnd-airflow-auto-tracking==REPLACE_WITH_DATABAND_VERSION
  4. Update the requirements.txt file version in the MWAA environment configuration, as sketched after the table below. Check the table to see the list of supported platform versions.

Table 1. A list of supported platform versions for an integration with AWS

Platform         Supported version   Notes
Apache Airflow   1.10.15             Supported only with Python 3.7
Apache Airflow   2.2.x to 2.7.x
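
A sketch of that update using the AWS CLI; the bucket name, environment name, and S3 object version below are placeholders, and your exact workflow may differ:

# Upload the updated requirements.txt to the MWAA S3 bucket
aws s3 cp requirements.txt s3://<your-mwaa-bucket>/requirements.txt

# Point the environment at the new S3 object version of the file
aws mwaa update-environment \
  --name <mwaa_env_name> \
  --requirements-s3-object-version <new_object_version>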

Saving this change to your MWAA environment configuration triggers a restart of your Airflow scheduler.

For more information on integration, see Installing Python dependencies - Amazon Managed Workflows for Apache Airflow. For Databand installation details, check Installing DBND.

MWAA URL

The Airflow URL can be located in the AWS Console. To get the URL, go to AWS MWAA > Environments > {mwaa_env_name} > Details > Airflow UI.

The URL is provided in the following format: https://<guid>.<aws_region>.airflow.amazonaws.com

Google Cloud Composer

With Google Cloud Composer, which is a fully managed data workflow orchestration service, you can author, schedule, and monitor pipelines.

Before you integrate Cloud Composer with Databand, make sure you have your Cloud Composer URL.

To integrate with Google Cloud Composer:

  1. Update your Cloud Composer environment's PyPI packages with the following entry:
dbnd-airflow-auto-tracking==REPLACE_WITH_DBND_VERSION
  2. Replace REPLACE_WITH_DBND_VERSION with the Databand version that you're using (for example, 0.61.0).

The resulting entry would look like the following:

dbnd-airflow-auto-tracking==0.61.0

See Installing DBND for more details. Saving this change to your Cloud Composer environment configuration triggers a restart of your Airflow scheduler.

For more information, see Installing a Python dependency from PyPI.
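
Alternatively, a sketch of the same update with the gcloud CLI; the environment name, location, and version are placeholders:

# Install or update the package in the Composer environment's PyPI packages
gcloud composer environments update <composer_env_name> \
  --location <region> \
  --update-pypi-package=dbnd-airflow-auto-tracking==0.61.0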

For Databand tracking to work properly with Airflow 2.0+, you need to disable lazily loaded plug-ins by setting the core.lazy_load_plugins configuration option to False (see Airflow 2.0+ support earlier in this topic). In Google Cloud Composer, you can do this by overriding the Airflow configuration for your environment.
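
A sketch of that override with the gcloud CLI; the environment name and location are placeholders:

# Override core.lazy_load_plugins for the Composer environment
gcloud composer environments update <composer_env_name> \
  --location <region> \
  --update-airflow-configs=core-lazy_load_plugins=False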

Cloud Composer URL

The Cloud Composer URL can be found in the Google Cloud Console: Composer > {composer_env_name} > Environment Configuration > Airflow web UI.

The URL is provided in the following format: https://<guid>.appspot.com