dbt

Overview

dbt operations can be tracked with Databand in two ways:

  1. Use the Databand Python SDK to track either dbt Cloud jobs or dbt Core commands that were triggered with Python or through a Python-based orchestration tool, such as Airflow.
  2. Connect your dbt Cloud account with the Databand dbt monitor to track all jobs that were run.

Tracking dbt with the Databand Python SDK

You can collect metadata from your dbt Cloud jobs or dbt Core commands by using functions from the Databand Python SDK, as shown in the following sections. Tracking dbt in this way rolls up your dbt metadata in the context of a Databand pipeline. When you use Airflow to orchestrate your dbt jobs, the dbt metadata is displayed within its Airflow DAG in the Databand UI. In the absence of an orchestration tool, you can manually create a tracking context in Python by using the dbnd_tracking function from the Databand Python SDK.

Prerequisites

You need the following to track dbt with the Databand Python SDK.

  1. The dbnd Python library must be installed in your Python environment.

  2. A dbt Cloud service token. Follow the instructions in the dbt service account documentation to create one. The minimum permission that is required for integration with Databand is Read-only.

  3. Your dbt Cloud account ID. Your account ID is typically displayed in the URL while you are logged in to your dbt Cloud account: it is the number that immediately follows "accounts" in the URL.

    Alternatively, you can go to Account settings, then Account in the dbt Cloud UI and find your account ID in the Account information section.

  4. Your dbt API URL. You can use the URL of your dbt Cloud environment, or find the API URL by going to Account settings, then Access URLs. If the Access URLs field is absent, your dbt account has not been migrated yet, and you can still use the dbt Cloud URL.

  5. A tracking context. Databand needs a tracking context to display your dbt metadata in a pipeline in the Databand UI. If you orchestrate dbt with Airflow, a tracking context is already created for you; call the tracking functions described in the following sections in your DAG after your dbt commands run. If you use a different orchestration tool, or none at all, you can create a tracking context manually by using dbnd_tracking() from the Python SDK.
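If you copy your account ID and API URL from the browser, a small helper can pull both out of one URL. This is a hypothetical sketch: account_id_from_url and api_url_from_url are not part of the Databand SDK, the example URL is illustrative, and the code assumes the account ID appears in the URL as "accounts/<number>", as described in prerequisite 3.

```python
import re
from urllib.parse import urlsplit

# Hypothetical helpers (not part of the Databand SDK). They assume the account ID
# appears in the URL as "accounts/<digits>", as described in the prerequisites.
_ACCOUNT_ID_PATTERN = re.compile(r"accounts/(\d+)")


def account_id_from_url(url: str) -> int:
    """Return the number that immediately follows "accounts/" in a dbt Cloud URL."""
    match = _ACCOUNT_ID_PATTERN.search(url)
    if match is None:
        raise ValueError(f"no account ID found in {url!r}")
    return int(match.group(1))


def api_url_from_url(url: str) -> str:
    """Reduce a dbt Cloud URL to its scheme and host."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {url!r}")
    return f"{parts.scheme}://{parts.netloc}"


# Illustrative URL copied from a browser while logged in to dbt Cloud
browser_url = "https://ab123.us1.dbt.com/settings/accounts/47874/pages/account"
print(account_id_from_url(browser_url))  # 47874
print(api_url_from_url(browser_url))     # https://ab123.us1.dbt.com
```

Keeping only the scheme and host matches the form of the dbt_api_url value used in the dbt Cloud example in the next section.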

Tracking dbt Cloud jobs

To track a dbt Cloud job, you need the following information:

  • Your dbt Cloud account ID
  • A dbt Cloud service token
  • The run ID of the job you want to monitor
  • Your dbt Cloud API URL

from dbnd import collect_data_from_dbt_cloud

# Placeholder values: replace them with your own
dbt_cloud_account_id = 47874
dbt_cloud_api_token = "a1b2c3d4e5f6g7h8"
dbt_cloud_run_id = 163925522
dbt_api_url = "https://ab123.us1.dbt.com"

# Your code for starting your dbt Cloud job and then awaiting its completion
...

# Invocation of Databand dbt Cloud tracking function
# (run this inside a tracking context, for example an Airflow task)
collect_data_from_dbt_cloud(
    dbt_cloud_account_id=dbt_cloud_account_id,
    dbt_cloud_api_token=dbt_cloud_api_token,
    dbt_job_run_id=dbt_cloud_run_id,
    dbt_api_url=dbt_api_url,
)
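The "..." placeholder in the example above stands for your own trigger-and-wait logic. One possible sketch polls the run until it finishes. It assumes the dbt Cloud Administrative API v2 run endpoint (GET /api/v2/accounts/<account_id>/runs/<run_id>/), the "Token" authorization scheme, and the run status codes 10 (Success), 20 (Error), and 30 (Cancelled). make_status_fetcher and wait_for_run are hypothetical helpers, not Databand functions; check the endpoint and status codes against the dbt Cloud API documentation before you rely on them.

```python
import json
import time
import urllib.request
from typing import Callable

# Terminal run status codes, assumed from the dbt Cloud Administrative API v2.
# Verify them against the dbt Cloud API documentation.
TERMINAL_STATUSES = {10: "Success", 20: "Error", 30: "Cancelled"}


def make_status_fetcher(api_url: str, account_id: int, token: str, run_id: int) -> Callable[[], int]:
    """Hypothetical helper: return a function that reads the run's status code."""
    url = f"{api_url.rstrip('/')}/api/v2/accounts/{account_id}/runs/{run_id}/"

    def fetch() -> int:
        request = urllib.request.Request(url, headers={"Authorization": f"Token {token}"})
        with urllib.request.urlopen(request) as response:
            return int(json.load(response)["data"]["status"])

    return fetch


def wait_for_run(
    fetch_status: Callable[[], int],
    poll_seconds: float = 30,
    timeout_seconds: float = 3600,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the run reaches a terminal status and return that status's name."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return TERMINAL_STATUSES[status]
        if time.monotonic() >= deadline:
            raise TimeoutError("dbt Cloud run did not finish in time")
        sleep(poll_seconds)
```

In the example above, you would call wait_for_run(make_status_fetcher(dbt_api_url, dbt_cloud_account_id, dbt_cloud_api_token, dbt_cloud_run_id)) in place of the "..." placeholder, before collect_data_from_dbt_cloud. Passing fetch_status and sleep in as parameters keeps the polling loop testable without network access.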

Tracking dbt Core commands

Unlike dbt Cloud, dbt Core runs one command at a time and has no concept of a job. Instead, use Databand to track each dbt command individually. The only information required to track dbt Core metadata is your dbt project directory.

To ensure that Databand always collects dbt metadata, trigger your dbt command in a try block and call the Databand tracking function in the matching finally block. That way, Databand collects your dbt metadata even if your dbt command fails.

from dbnd import collect_data_from_dbt_core

dbt_project_dir = "/opt/airflow/dags/dbt/dbnd"

try:
    # Your code for starting your dbt Core command and then awaiting its completion
    ...

finally:
    # Invocation of Databand dbt Core tracking function
    collect_data_from_dbt_core(dbt_project_dir)
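Because each dbt Core command is tracked individually, a sequence of commands can repeat the try/finally pattern once per command. This sketch assumes dbt is invoked as a subprocess; run_dbt_commands is a hypothetical helper, and its collect parameter is where you would pass collect_data_from_dbt_core.

```python
import subprocess
from typing import Callable, Sequence


def run_dbt_commands(
    commands: Sequence[Sequence[str]],
    project_dir: str,
    collect: Callable[[str], None],
) -> None:
    """Run dbt Core commands one at a time and collect metadata after each one.

    Hypothetical helper: pass collect_data_from_dbt_core as `collect`. A command
    that exits non-zero raises CalledProcessError, but only after its metadata
    has been collected; the remaining commands are then skipped.
    """
    for command in commands:
        try:
            # check=True turns a non-zero dbt exit code into CalledProcessError
            subprocess.run(list(command), cwd=project_dir, check=True)
        finally:
            # Runs whether the command succeeded, failed, or could not start
            collect(project_dir)


# Example usage (requires dbt and dbnd to be installed):
# from dbnd import collect_data_from_dbt_core
# run_dbt_commands([["dbt", "run"], ["dbt", "test"]],
#                  "/opt/airflow/dags/dbt/dbnd", collect_data_from_dbt_core)
```

Stopping after the first failure mirrors how a failed dbt step usually halts an orchestrated run; if you would rather continue, catch CalledProcessError inside the loop.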

Tracking dbt Cloud jobs using the Databand dbt monitor

You can use the Databand dbt Cloud monitor to track jobs by directly monitoring your dbt Cloud account. This option allows Databand to track your dbt jobs regardless of how they are triggered.

Prerequisites

You must have the following information to track dbt Cloud jobs in the Databand dbt monitor.

  1. A dbt Cloud service token. Follow the instructions in the dbt service account documentation to create one. The minimum permission that is required for integration with Databand is Read-only.

  2. Your dbt Cloud account ID. Your account ID is typically displayed in the URL while you are logged in to your dbt Cloud account: it is the number that immediately follows "accounts" in the URL.

    Alternatively, you can go to Account settings, then Account in the dbt Cloud UI and find your account ID in the Account information section.

  3. Your dbt Cloud API URL. You can use the URL of your dbt Cloud environment, or find the API URL by going to Account settings, then Access URLs. If the Access URLs field is absent, your dbt account has not been migrated yet, and you can still use the dbt Cloud URL.
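Before you enter these values in the monitor, you might want to confirm that the service token can read your account. This sketch only builds the request. It assumes the dbt Cloud Administrative API v2 account endpoint (/api/v2/accounts/<account_id>/) and the "Token" authorization scheme, and build_account_check is a hypothetical helper; check both assumptions in the dbt Cloud API documentation.

```python
import urllib.request


def build_account_check(api_url: str, account_id: int, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that reads the account with the token.

    Hypothetical helper; the endpoint path and auth scheme are assumptions.
    """
    base = api_url.rstrip("/")
    return urllib.request.Request(
        f"{base}/api/v2/accounts/{account_id}/",
        headers={"Authorization": f"Token {token}", "Accept": "application/json"},
    )


request = build_account_check("https://ab123.us1.dbt.com", 47874, "a1b2c3d4e5f6g7h8")
print(request.full_url)  # https://ab123.us1.dbt.com/api/v2/accounts/47874/
# Sending it, for example with urllib.request.urlopen(request), is expected to
# succeed only if the API URL, account ID, and token are all correct.
```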

Configuring the monitor

You can create a new dbt monitor in the Databand UI by going to the Integrations page, clicking Add Integration, and selecting dbt.

If you are tracking dbt Cloud jobs that are orchestrated with Python by using the collect_data_from_dbt_cloud function, do not configure a dbt monitor in the Databand UI. Tracking the same dbt Cloud job run twice is not supported.