How to load an external BigQuery table from Google Cloud Storage using Python?

In the code below (derived from this tutorial) I use the load_table_from_uri() method of bigquery.Client(), and it creates a native table:

from google.cloud import bigquery

def main():
    ''' Load all tables '''
    client = bigquery.Client()
    bq_load_file_in_gcs(
        client,
        'gs://bucket_name/data100rows.csv',
        'CSV',
        'test_data.data100_csv_native'
    )

def bq_load_file_in_gcs(client, path, fmt, table_name):
    '''
        Load BigQuery table from Google Cloud Storage

        client - bigquery client
        path - 'gs://path/to/upload.file',
        fmt -   The format of the data files. "CSV" / "NEWLINE_DELIMITED_JSON".
                https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.load.sourceFormat
        table_name - destination table name, e.g. 'dataset.table'
    '''

    job_config = bigquery.LoadJobConfig()
    job_config.autodetect = True
    job_config.skip_leading_rows = 1
    job_config.source_format = fmt

    load_job = client.load_table_from_uri(
        path,
        table_name,
        job_config=job_config
    )

    assert load_job.job_type == 'load'

    load_job.result()  # Waits for table load to complete.

    assert load_job.state == 'DONE'

What I also need is to be able to create an external table, the way I can in the BigQuery UI:

But I can't find where to set the table type in the job configuration or in the method parameters. Is this possible, and if so, how?

There are examples in the External Configuration section.

Basically, you need to use the external configuration of the table object, for example:

table = bigquery.Table(.........)

external_config = bigquery.ExternalConfig('CSV')
source_uris = ['<url-to-your-external-source>']  # e.g. a CSV file in a Cloud Storage bucket

external_config.source_uris = source_uris
table.external_data_configuration = external_config
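
To put that into the same shape as the loader in the question, here is a minimal sketch. The function name bq_create_external_table_in_gcs is hypothetical, and the table id is assumed to be fully qualified as 'project.dataset.table':

from google.cloud import bigquery

def bq_create_external_table_in_gcs(client, path, fmt, table_name):
    '''
        Create a BigQuery table whose data stays in Google Cloud Storage
        (an external table), instead of running a load job.

        client - bigquery client
        path - 'gs://path/to/upload.file'
        fmt -   "CSV" / "NEWLINE_DELIMITED_JSON"
        table_name - fully qualified id, e.g. 'my-project.test_data.data100_csv_ext'
    '''
    external_config = bigquery.ExternalConfig(fmt)
    external_config.source_uris = [path]
    external_config.autodetect = True
    if fmt == 'CSV':
        # CSV-specific settings live on external_config.options
        external_config.options.skip_leading_rows = 1

    table = bigquery.Table(table_name)
    table.external_data_configuration = external_config

    # No load job here: the table definition simply points at the files
    # in the bucket, and queries read them directly.
    return client.create_table(table)

Unlike load_table_from_uri(), there is no job to wait on: create_table() returns the new table right away, and the data is read from the bucket at query time.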