BigQuery

BigQuery is Google Cloud’s fully managed, petabyte-scale, and cost-effective analytics data warehouse that lets you run analytics over vast amounts of data in near real time. With BigQuery, there’s no infrastructure to set up or manage, letting you focus on finding meaningful insights using GoogleSQL and taking advantage of flexible pricing models across on-demand and flat-rate options.

BigQuery Source

If you are new to BigQuery, you can try loading and querying data with the bq command-line tool.

BigQuery uses GoogleSQL for querying data. GoogleSQL is an ANSI-compliant structured query language (SQL) that is also implemented for other Google Cloud services. When you write SQL queries to run against your BigQuery data, follow standard query best practices, such as avoiding full table scans and overly complex filters.

Requirements

IAM Permissions

BigQuery uses Identity and Access Management (IAM) to control user and group access to BigQuery resources like projects, datasets, and tables.

Authentication via Application Default Credentials (ADC)

By default, Toolbox will use your Application Default Credentials (ADC) to authorize and authenticate when interacting with BigQuery.

When using this method, make sure the IAM identity associated with your ADC (such as a service account) has the correct permissions for the queries you intend to run. Common roles include roles/bigquery.user, which lets the identity run jobs (including queries) in the project, and roles/bigquery.dataViewer, which grants read access to datasets and tables. Follow this guide to set up your ADC.

Authentication via User’s OAuth Access Token

If the useClientOAuth parameter is set to true, Toolbox will instead use the OAuth access token for authentication. This token is parsed from the Authorization header passed in with the tool invocation request. This method allows Toolbox to make queries to BigQuery on behalf of the client or the end-user.

When using this on-behalf-of authentication, you must ensure that the identity used has been granted the correct IAM permissions. Currently, this option is only supported by the following BigQuery tools:

  • bigquery-sql
    Run SQL queries directly against BigQuery datasets.

Example

Initialize a BigQuery source that uses ADC:

sources:
  my-bigquery-source:
    kind: "bigquery"
    project: "my-project-id"
    # location: "US" # Optional: Specifies the location for query jobs.

Initialize a BigQuery source that uses the client’s access token:

sources:
  my-bigquery-client-auth-source:
    kind: "bigquery"
    project: "my-project-id"
    useClientOAuth: true
    # location: "US" # Optional: Specifies the location for query jobs.
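To illustrate how a tool consumes the client-auth source, here is a minimal sketch of a bigquery-sql tool definition. The tool name, the sales.orders table, and the tool fields (source, description, statement, parameters) are assumptions for illustration; check them against the bigquery-sql tool reference before relying on them. Because the source sets useClientOAuth: true, the query runs with the caller's OAuth token, so that identity needs permission to run query jobs and to read the table.

```yaml
tools:
  # Hypothetical tool name; choose any key.
  search_orders_by_customer:
    kind: bigquery-sql
    # Uses the source that forwards the caller's OAuth access token.
    source: my-bigquery-client-auth-source
    description: Look up a customer's ten most recent orders.
    # Hypothetical table; @customer_id is a GoogleSQL named parameter.
    statement: |
      SELECT order_id, order_date, total
      FROM `my-project-id.sales.orders`
      WHERE customer_id = @customer_id
      ORDER BY order_date DESC
      LIMIT 10
    parameters:
      - name: customer_id
        type: string
        description: The ID of the customer whose orders to return.
```

The statement lists explicit columns instead of SELECT *, in line with the query best practices mentioned earlier, and takes the caller's input through a named parameter rather than concatenating it into the SQL (assuming Toolbox binds it as a BigQuery query parameter).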

Reference

| field | type | required | description |
|---|---|---|---|
| kind | string | true | Must be “bigquery”. |
| project | string | true | ID of the Google Cloud project to use for billing and as the default project for BigQuery resources. |
| location | string | false | Specifies the location (e.g., ‘us’, ‘asia-northeast1’) in which to run the query job. This location must match the location of any tables referenced in the query. Defaults to the table’s location, or ‘US’ if the location cannot be determined. |
| useClientOAuth | bool | false | If true, forwards the client’s OAuth access token from the “Authorization” header to downstream queries. |
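For datasets stored outside the US, a minimal sketch that sets every field from the reference table might look like the following. The source name and the EU location are assumptions for illustration; use the location that matches the tables your queries reference.

```yaml
sources:
  # Hypothetical source name.
  my-bigquery-eu-source:
    kind: "bigquery"
    project: "my-project-id"
    # Assumes every table referenced by queries on this source lives in the
    # EU multi-region; the job location must match the tables' location.
    location: "EU"
    # false (or omitted) means Toolbox authenticates with ADC.
    useClientOAuth: false
```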