google.cloud.bigquery.table.RowIterator

Methods

to_dataframe([bqstorage_client, dtypes, …]) Create a pandas DataFrame by loading all pages of a query.

Attributes

pages Iterator of pages in the response.
schema The subset of columns to be read from the table.
total_rows The total number of rows in the table.


class google.cloud.bigquery.table.RowIterator(client, api_request, path, schema, page_token=None, max_results=None, page_size=None, extra_params=None, table=None, selected_fields=None)

Bases: google.api_core.page_iterator.HTTPIterator

A class for iterating through HTTP/JSON API row list responses.

Parameters:
  • client (google.cloud.bigquery.Client) – The API client.
  • api_request (Callable[google.cloud._http.JSONConnection.api_request]) – The function to use to make API requests.
  • path (str) – The method path to query for the list of items.
  • page_token (str) – A token identifying a page in a result set to start fetching results from.
  • max_results (int, optional) – The maximum number of results to fetch.
  • page_size (int, optional) – The maximum number of rows in each page of results from this request. Non-positive values are ignored. Defaults to a sensible value set by the API.
  • extra_params (Dict[str, object]) – Extra query string parameters for the API call.
  • table (Union[Table, TableReference]) – Optional. The table which these rows belong to, or a reference to it. Used to call the BigQuery Storage API to fetch rows.
  • selected_fields (Sequence[google.cloud.bigquery.schema.SchemaField]) – Optional. A subset of columns to select from this table.
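As a rough illustration of how the iterator is typically consumed, the sketch below turns each row into a plain dictionary keyed by the column names in schema. The helper name rows_to_dicts is hypothetical and not part of the library, and the code assumes each row supports lookup by column name (as google.cloud.bigquery.table.Row does).

```python
def rows_to_dicts(row_iterator):
    """Return every row as a dict keyed by column name.

    Hypothetical helper, not part of google-cloud-bigquery. It relies only
    on the iterator being iterable and exposing ``schema``.
    """
    # Each SchemaField carries the column name in its ``name`` attribute.
    column_names = [field.name for field in row_iterator.schema]
    # Iterating the RowIterator walks through all pages transparently.
    return [
        {name: row[name] for name in column_names}
        for row in row_iterator
    ]
```

With a real client, row_iterator would usually be the object returned by Client.list_rows() or QueryJob.result().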
pages

Iterator of pages in the response.

Returns:
A generator of page instances.
Return type: types.GeneratorType[google.api_core.page_iterator.Page]
Raises: ValueError – If the iterator has already been started.
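Page-by-page access is useful when rows should be processed in batches rather than one at a time. The following sketch counts the rows in each page; rows_per_page is a hypothetical helper name, not a library function. Because pages raises ValueError once iteration has started, the helper has to be called on a fresh iterator.

```python
def rows_per_page(row_iterator):
    """Return a list holding the number of rows found in each page.

    Hypothetical helper for illustration; call it on an iterator that has
    not been started yet, otherwise ``pages`` raises ValueError.
    """
    counts = []
    for page in row_iterator.pages:
        # A page is itself iterable and yields the rows it contains.
        counts.append(sum(1 for _row in page))
    return counts
```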
schema

The subset of columns to be read from the table.

Type: List[google.cloud.bigquery.schema.SchemaField]
to_dataframe(bqstorage_client=None, dtypes=None, progress_bar_type=None)

Create a pandas DataFrame by loading all pages of a query.

Parameters:
  • bqstorage_client (google.cloud.bigquery_storage_v1beta1.BigQueryStorageClient) –

    Beta feature. Optional. A BigQuery Storage API client. If supplied, the faster BigQuery Storage API is used to fetch rows from BigQuery. This API is billable.

    This method requires the fastavro and google-cloud-bigquery-storage libraries.

    Reading from a specific partition or snapshot is not currently supported by this method.

    Caution: There is a known issue reading small anonymous query result tables with the BigQuery Storage API. When a problem is encountered reading a table, the tabledata.list method from the BigQuery API is used instead.

  • dtypes (Map[str, Union[str, pandas.Series.dtype]]) – Optional. A dictionary mapping column names to pandas dtypes. The provided dtype is used when constructing the series for the specified column. For columns not listed, the default pandas behavior is used.
  • progress_bar_type (Optional[str]) –

    If set, use the tqdm library to display a progress bar while the data downloads. Install the tqdm package to use this feature.

    Possible values of progress_bar_type include:

    None
    No progress bar.
    'tqdm'
    Use the tqdm.tqdm() function to print a progress bar to sys.stderr.
    'tqdm_notebook'
    Use the tqdm.tqdm_notebook() function to display a progress bar as a Jupyter notebook widget.
    'tqdm_gui'
    Use the tqdm.tqdm_gui() function to display a progress bar as a graphical dialog box.

    New in version 1.11.0.

Returns:

A DataFrame populated with row data and column headers from the query results. The column headers are derived from the destination table’s schema.

Return type:

pandas.DataFrame

Raises:

ValueError – If the pandas library cannot be imported, or the google.cloud.bigquery_storage_v1beta1 module is required but cannot be imported.
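The dtypes and progress_bar_type parameters can be combined in one call. A minimal sketch, assuming a hypothetical wrapper named download_dataframe and column names supplied by the caller; neither is part of the library.

```python
def download_dataframe(row_iterator, float32_columns=(), show_progress=False):
    """Call ``to_dataframe`` with per-column dtypes and an optional progress bar.

    Hypothetical wrapper for illustration, not part of the library.
    """
    # Map each requested column to a narrower dtype; columns left out of
    # the mapping keep the default pandas behavior.
    dtypes = {name: "float32" for name in float32_columns}
    # 'tqdm' prints the bar to sys.stderr and requires the tqdm package.
    progress_bar_type = "tqdm" if show_progress else None
    return row_iterator.to_dataframe(
        dtypes=dtypes,
        progress_bar_type=progress_bar_type,
    )
```

Passing bqstorage_client as well would route the download through the BigQuery Storage API, subject to the caveats listed for that parameter.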

total_rows

The total number of rows in the table.

Type: int