Snowflake Integration Requirements
The following are the prerequisites necessary for IBM Automatic Data Lineage to connect to this third-party system, which you may choose to do at your sole discretion. Note that while these are usually sufficient to connect to this third-party system, we cannot guarantee that the connection or integration will be successful since we have no control, liability, or responsibility for third-party products or services, including for their performance.
-
Permission to access all addresses as described in https://docs.snowflake.com/en/user-guide/hostname-whitelist.html
-
Snowflake account (the root account representing a server provided by Snowflake)
-
User with a role with sufficient privileges, based on the mode you want to use
-
For the
INFORMATION_SCHEMAmode, the user has to have the role with the following privileges for each extracted database:-
USAGEon extracted database -
USAGEon each extracted schema inside the extracted database -
At least one privilege (it does not matter which one; for example,
SELECT) on each extracted Snowflake object (for example, TABLE, VIEW) -
MONITOR EXECUTIONfor the extraction of tasks from TASK_HISTORY
-
-
For the
ACCOUNT_USAGEmode, the user has to have the role with the following privileges:-
IMPORTEDprivilege on theSnowflakedatabase, which contains the ACCOUNT_USAGE schema- It can be set up using the following statement:
grant imported privileges on database snowflake to <role>
- It can be set up using the following statement:
-
As of R42.7 - Optionally, to enable extraction of streams:
-
USAGE on extracted database
-
USAGE on each extracted schema inside the extracted database
-
At least one privilege (any type, for example: SELECT) on each extracted stream
-
-
-
-
Connection parameters
-
Snowflake account containing the following information:
-
Account name
-
Region (optional; check your cloud platform at https://docs.snowflake.com/en/user-guide/jdbc-configure.html#connection-parameters and provide the region if a corresponding “Full Account Name” format contains it)
-
Cloud platform (optional; check your cloud platform at https://docs.snowflake.com/en/user-guide/jdbc-configure.html#connection-parameters and provide the cloud platform if a corresponding “Full Account Name” format contains it)
-
-
User name
-
Password (supported up to and including R42.13)
- Not needed if key pair authentication is used
-
Private key file
-
Path to a file storing a key
-
Only needed if a key pair authentication is used
-
-
Private key file password
-
Password to the file storing the key
-
Only needed if a key pair authentication is used
-
-
AES algorithm and not DES. If the private key was created with DES algorithm and used in Manta see: Authentication for Snowflake fails.
Supported Features
-
Snowflake SQL (the majority of the language constructs with lineage impact)
-
SQL stored procedures (using Snowflake scripting code)
-
Lineage analysis of tasks and streams
-
As of R42.7 - Extraction of streams
Known Unsupported Features
Automatic Data Lineage does not support the following Snowflake features. This list includes all of the features that IBM is aware are unsupported, but it might not be comprehensive.
-
Snowpark (Java, Scala, Python)
-
Javascript Snowflake procedures
-
UDFs (User Defined Functions) in Java, Python, Javascript
-
Secure UDFs
-
Lineage through dynamically executed code through EXECUTE IMMEDIATE
-
Recently introduced or changed features and some features that are rarely used
-
Extraction of unexecuted tasks
-
Lineage for tables created using
create table ... as select ...(as the DDL is not available in Snowflake) -
Lineage to Snowsight charts and dashboards
-
Window function syntax with statements like
sum(...) over (partition by ... [order by] ...)
Known Issues
-
R42.0-R42.2 - The bundled Snowflake JDBC driver does not support Java 21.
-
The recommended solution is to upgrade to R42.3 or newer which bundles Snowflake JDBC driver compatible with Java 21.
-
Alternatively, download Snowflake JDBC 3.14.4 or newer from https://docs.snowflake.com/en/developer-guide/jdbc/jdbc-download and place it to
mantaflow/cli/scenarios/manta-datalow-cli/lib-ext/. Note that connection validation in Snowflake will still not be working.
-
-
Tasks are extracted from TASK_HISTORY in both extraction modes, as Snowflake currently does not provide appropriate views to extract defined tasks.
-
In
INFORMATION_SCHEMAmode, TASK_HISTORY returns task activity within the last seven days or the next scheduled execution within the next eight days. It is limited to 10,000 records, showing executions with the most recent timestamp. -
In
ACCOUNT_USAGEmode, TASK_HISTORY returns the history of task usage within the last 365 days (one year).
-
-
Snowflake scanner does not use the connectors truststore in Automatic Data Lineage. It should not be needed in most cases as the certificates used by Snowflake are generally signed by root and intermediate certificates in Java default truststores. However, if it is necessary to provide a certificate, it needs to be uploaded to the Java default truststore. See Importing certificates into Java default truststore for instructions.
-
As of R42.7 - Streams are extracted using
SHOW STREAMSin both extraction modes, as Snowflake currently does not provide appropriate views to extract defined streams. This limits the number of extracted streams to 10k per schema.