Snowflake
If you are a Snowflake user, you can use Databand to monitor and collect metadata from your Snowflake tables, or you can log your dataset operations using Snowflake SDK.
Monitoring Snowflake tables
With Databand, you can monitor tables, schemas, and databases from Snowflake and collect metadata about the size, structure, freshness, and overall quality of your data. The following steps explain how to configure Snowflake and connect to Databand to begin monitoring your tables.
Creating a service account in Snowflake
Use the following SQL statements to create a Databand service account that enables monitoring in Databand. This account is created with a dedicated role and the minimum required permissions to collect relevant metadata. To help ensure that the quality of service in your existing Snowflake data operations is not impacted by Databand monitoring, create a dedicated warehouse for the Databand service account. For more information, see the Snowflake documentation about warehouses.
-- Set the current role for grants
USE ROLE ACCOUNTADMIN;
-- Create a new role for Databand
CREATE ROLE DBND_ROLE;
-- Create a Databand user
CREATE USER DBND_USER DEFAULT_ROLE=DBND_ROLE PASSWORD='<PASSWORD>';
-- Grant the Databand role to the Databand user
GRANT ROLE DBND_ROLE TO USER DBND_USER;
-- Create a warehouse for Databand monitoring
CREATE WAREHOUSE IF NOT EXISTS DBND_WH WAREHOUSE_SIZE='XSMALL' INITIALLY_SUSPENDED=TRUE
AUTO_SUSPEND = 5 AUTO_RESUME = TRUE;
-- Grant the USAGE privilege on the new warehouse to the Databand role
GRANT USAGE ON WAREHOUSE DBND_WH TO ROLE DBND_ROLE;
-- Grant privileges to allow access to query history
GRANT IMPORTED PRIVILEGES ON DATABASE "SNOWFLAKE" TO ROLE DBND_ROLE;
-- Grant privileges on your database(s) to the Databand role
GRANT USAGE, MONITOR ON DATABASE <DB_NAME> TO ROLE DBND_ROLE;
GRANT USAGE, MONITOR ON ALL SCHEMAS IN DATABASE <DB_NAME> TO ROLE DBND_ROLE;
-- Grant read-only privileges to tables and views that will have data quality alerts
GRANT SELECT ON ALL TABLES IN DATABASE <DB_NAME> TO ROLE DBND_ROLE;
GRANT SELECT ON FUTURE TABLES IN DATABASE <DB_NAME> TO ROLE DBND_ROLE;
GRANT SELECT ON ALL VIEWS IN DATABASE <DB_NAME> TO ROLE DBND_ROLE;
GRANT SELECT ON FUTURE VIEWS IN DATABASE <DB_NAME> TO ROLE DBND_ROLE;
The last 4 SQL statements are necessary to calculate aggregates for data quality alerts. These permissions can be granted to only the tables and views for which you want to create data quality alerts. However, granting SELECT
permission to only specific tables requires ongoing maintenance of Snowflake permissions as your alerting needs change. For ease of use, grant SELECT
permissions to all tables and views.
Creating a Snowflake monitor in Databand
Complete the following steps to begin monitoring your Snowflake tables in Databand
- Click Integrations in the navigation menu.
- Click Add integration.
- Choose Snowflake as the integration type.
- In the Integration details step, provide the following configuration details:
- Integration name
- The name that you want to call your Snowflake monitor.
- Snowflake account
- Your account identifier for Snowflake. See the Snowflake documentation to determine your account identifier.
- Username
- The name of the service account you created earlier.
- Role
- The role that you created for the service account.
- Warehouse
- The warehouse that is used for Databand monitor queries.
- Choose which authentication type to use.
- Basic authentication
- Authenticate with the password that you created for your service account.
- Private key authentication
- Authenticate with a private key. If your private key is encrypted, you also need to provide its associated passphrase. See the Snowflake documentation for detailed instructions on setting up a private key.
- Click Next to authenticate with Snowflake.
If the connection to your data warehouse is not successful, the problem might be that the IP address of the Databand instance needs to be whitelisted in the Snowflake network access policies. Contact your Snowflake administrator to make the necessary changes. If you are a SaaS customer, contact the Databand support team to get the static IP of your Databand instance.
Choosing what to monitor with Databand
After you authenticate with Snowflake, you are presented with a list of the objects that your service account has access to. You can now select the tables that you want Databand to monitor. You can select entire databases, entire schemas, or individual tables. If you choose to monitor an entire database or schema, any future tables that are added to that database or schema are automatically added to your monitored tables in Databand. After you make your selections, click Save to finalize the creation of your Snowflake monitor. You can now see your new Snowflake monitor in the Integrations page in the Databand UI.
Known issues
- Each Snowflake monitor in Databand is limited to 1,000 tables, but you can create as many monitors as you want.
- The Snowflake monitor collects only
COPY INTO
queries and single tableINSERT
queries. Only these queries are displayed in the Databand UI.
Snowflake SDK
You can log your dataset operations in Databand by using Python to call COPY INTO
SQL commands on Snowflake. When you wrap your Snowflake cursor's execution with the Databand SnowflakerTracker
context manager, the context
manager catches the cursor's result and extracts Dataset logging operations from it.
Requirements
To log your Snowflake dataset operations with Databand, you must use the official Python package for Snowflake: snowflake-connector-python
.
Installing dbnd
Use the following command to install the required packages:
pip install dbnd dbnd-snowflake
For more information about installing the dbnd-snowflake
package, see Installing the Python SDK.
Integrating with SnowflakeTracker
The following code shows an example method of copying files into a Snowflake table:
from snowflake import connector
SQL_QUERY = """
COPY INTO DB.PUBLIC.TABLE
FROM s3://path/to/some/file/file.csv
CREDENTIALS = (AWS_KEY_ID = 'key' AWS_SECRET_KEY = 'key');
"""
with connector.connect(
user=SNOWFLAKE_USER,
password=SNOWFLAKE_PASSWORD,
account=SNOWFLAKE_ACCOUNT,
database=SNOWFLAKE_DATABASE,
schema=SNOWFLAKE_SCHEMA) as con:
c = con.cursor()
c.execute(SQL_QUERY)
Make the following changes to log the results of SQL_QUERY
with Databand:
from snowflake import connector
from dbnd_snowflake import SnowflakeTracker
SQL_QUERY = """
COPY INTO DB.PUBLIC.TABLE
FROM s3://path/to/some/file/file.csv
CREDENTIALS = (AWS_KEY_ID = 'key' AWS_SECRET_KEY = 'key');
"""
with connector.connect(
user=SNOWFLAKE_USER,
password=SNOWFLAKE_PASSWORD,
account=SNOWFLAKE_ACCOUNT,
database=SNOWFLAKE_DATABASE,
schema=SNOWFLAKE_SCHEMA) as con:
c = con.cursor()
with SnowflakeTracker():
c.execute(SQL_QUERY)
SnowflakeTracker
catches the execution of c.execute(SQL_QUERY)
. With this command, you can track both the read operation of your file from S3 and the write operation to DB.PUBLIC.TABLE
in Snowflake.
Using nested queries for read operations
SnowflakeTracker
supports nested queries only for read operations.
Example:
`copy into dband_poc from ( SELECT $1:dt::date , $1:acount::varchar from @dband_poc/dt=20220316/);`
Known issues
The following limitations apply for integration with SnowflakeTracker
:
- Schema tracking is supported only for tables, not for files.
- Provide only one query execution for each
SnowflakerTracker
context. - Nested queries, such as
COPY INTO (SELECT * FROM TABLE) table FROM...
are not supported.
The following limitations apply for nested queries:
- Nested queries are supported only for read operations in the
FROM
clause of your query. - Join operations are not supported.