BigQuery

Google BigQuery is a serverless, scalable data warehouse for analyzing petabytes of data. IBM Automatic Data Lineage provides a scanner for Google BigQuery. Once configured, Automatic Data Lineage connects to the BigQuery resource and extracts and analyzes the relevant metadata from the selected projects and datasets. This metadata includes, among other things, the BigQuery data dictionary, scripts, views, functions, stored procedures, external tables, and jobs. Automatic Data Lineage parses the code and logic stored in these objects, which lets it generate lineage down to the column level together with the transformation logic applied to each column.
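To make column-level lineage concrete, here is a minimal Python sketch of a lineage graph in which each target column records its source columns and the expression that produced it. The table and column names, the `lineage` dictionary, and the `add_edge` and `upstream` helpers are hypothetical and do not reflect how Automatic Data Lineage stores lineage internally.

```python
from collections import defaultdict

# Hypothetical column-level lineage graph: each target column maps to a list of
# (source columns, transformation expression) pairs. Illustrative only.
lineage = defaultdict(list)


def add_edge(target, sources, expression):
    """Record that `target` is computed from `sources` using `expression`."""
    lineage[target].append((tuple(sources), expression))


def upstream(column):
    """Return all columns that `column` depends on, directly or transitively."""
    found = set()
    stack = [column]
    while stack:
        current = stack.pop()
        # .get() avoids inserting empty entries into the defaultdict.
        for sources, _expression in lineage.get(current, []):
            for source in sources:
                if source not in found:
                    found.add(source)
                    stack.append(source)
    return found


# A view computes revenue from two order columns; a table then copies it.
add_edge("reporting.daily_sales.revenue",
         ["shop.orders.qty", "shop.orders.unit_price"],
         "SUM(qty * unit_price)")
add_edge("mart.summary.revenue",
         ["reporting.daily_sales.revenue"],
         "revenue")

print(sorted(upstream("mart.summary.revenue")))
# ['reporting.daily_sales.revenue', 'shop.orders.qty', 'shop.orders.unit_price']
```

Keeping the expression on each edge is what lets a column-level view show both where a value came from and how it was transformed; a table-level graph would only record that `mart.summary` depends on `reporting.daily_sales`.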

Automatic Data Lineage currently scans:

Check out the guides below for more details on setting up this scanner.

Extraction and Analysis Phase Scenarios

Extraction Phase

The extraction phase for BigQuery accounts consists of three scenarios.

  1. BigQuery dictionary mapping scenario: connects to each configured BigQuery account and stores the mapping between the dictionary ID, service URL, connection ID, and included and excluded projects/datasets

  2. BigQuery extractor scenario: connects to each configured BigQuery account and extracts the database dictionary and DDL scripts from the configured projects and datasets

  3. BigQuery ingestion scenario: pulls inputs from Git (see Manta Flow Agent Configuration for Extraction:Git Source) or from a remote agent filesystem location (see Manta Flow Agent Configuration for Extraction:Agent Source)
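The dictionary mapping scenario ties each configured account to a set of included and excluded projects/datasets. The sketch below models one such mapping as a Python dataclass. The field names, the `"project"` / `"project.dataset"` entry format, and the rule that exclusions override inclusions are assumptions made for illustration, not the scanner's documented behavior.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DictionaryMapping:
    """Hypothetical mapping record for one configured BigQuery account."""

    dictionary_id: str
    service_url: str
    connection_id: str
    # Assumed entry format: "project" or "project.dataset".
    included: List[str] = field(default_factory=list)
    excluded: List[str] = field(default_factory=list)

    def in_scope(self, project: str, dataset: str) -> bool:
        """Return True if the dataset should be extracted.

        Assumptions: an empty include list means "include everything",
        and an exclusion always wins over an inclusion.
        """
        candidates = {project, f"{project}.{dataset}"}
        if any(entry in candidates for entry in self.excluded):
            return False
        if not self.included:
            return True
        return any(entry in candidates for entry in self.included)


mapping = DictionaryMapping(
    dictionary_id="bq_prod",  # placeholder values throughout
    service_url="https://bigquery.googleapis.com",
    connection_id="conn-1",
    included=["analytics"],
    excluded=["analytics.scratch"],
)
print(mapping.in_scope("analytics", "sales"))    # True
print(mapping.in_scope("analytics", "scratch"))  # False
print(mapping.in_scope("finance", "ledger"))     # False
```

Whether a real configuration treats an empty include list as "everything" should be checked against the scanner's configuration reference before relying on it.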

Analysis Phase

The analysis phase for BigQuery accounts consists of four scenarios.

  1. BigQuery dictionary dataflow scenario: analyzes metadata from the extracted BigQuery database dictionaries and saves it in your Manta metadata repository

  2. BigQuery DDL dataflow scenario: harvests metadata and lineage from the extracted BigQuery DDL scripts and saves it in your Manta metadata repository

  3. BigQuery SQL dataflow scenario: harvests metadata and lineage from the provided BigQuery SQL scripts and saves it in your Manta metadata repository

  4. BigQuery job dataflow scenario: harvests metadata and lineage from the provided BigQuery job scripts and saves it in your Manta metadata repository
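Each analysis scenario consumes one kind of input. As a hypothetical sketch, this routing can be expressed as a lookup from input kind to scenario name. The input-kind labels and the `plan_analysis` helper are illustrative only, and the returned order mirrors the numbered list rather than any documented execution order.

```python
# Input kinds paired with the analysis scenario that consumes them, listed in
# the same order as the documentation. Kind labels are hypothetical.
ANALYSIS_SCENARIOS = [
    ("dictionary", "BigQuery dictionary dataflow scenario"),
    ("ddl", "BigQuery DDL dataflow scenario"),
    ("sql", "BigQuery SQL dataflow scenario"),
    ("job", "BigQuery job dataflow scenario"),
]


def plan_analysis(available_inputs):
    """Return the scenarios that have input to analyze.

    Raises ValueError for input kinds that no scenario handles.
    """
    known = {kind for kind, _ in ANALYSIS_SCENARIOS}
    unknown = set(available_inputs) - known
    if unknown:
        raise ValueError(f"unknown input kinds: {sorted(unknown)}")
    return [name for kind, name in ANALYSIS_SCENARIOS if kind in available_inputs]


print(plan_analysis({"ddl", "dictionary"}))
# ['BigQuery dictionary dataflow scenario', 'BigQuery DDL dataflow scenario']
```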