How to load an external BigQuery table from Google Cloud Storage using Python?
In the code below (derived from this tutorial) I use the load_table_from_uri() method of bigquery.Client(), which creates a native table:
from google.cloud import bigquery


def main():
    ''' Load all tables '''
    client = bigquery.Client()
    bq_load_file_in_gcs(
        client,
        'gs://bucket_name/data100rows.csv',
        'CSV',
        'test_data.data100_csv_native'
    )


def bq_load_file_in_gcs(client, path, fmt, table_name):
    '''
    Load BigQuery table from Google Cloud Storage
    client - bigquery client
    path - 'gs://path/to/upload.file',
    fmt - The format of the data files. "CSV" / "NEWLINE_DELIMITED_JSON".
    https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.load.sourceFormat
    table_name - table with datasource
    '''
    job_config = bigquery.LoadJobConfig()
    job_config.autodetect = True      # infer the schema from the file
    job_config.skip_leading_rows = 1  # skip the header row
    job_config.source_format = fmt
    load_job = client.load_table_from_uri(
        path,
        table_name,
        job_config=job_config
    )
    assert load_job.job_type == 'load'
    load_job.result()  # Waits for table load to complete.
    assert load_job.state == 'DONE'


if __name__ == '__main__':
    main()
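What I need is to also be able to create an external table, the way I can in the BigQuery UI.
But I cannot find where to set the table type in the job configuration or in the method arguments. Is this possible, and if so, how?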
There is an example in the External Configuration section.
Basically, you need to use the external configuration of the table object, for example:
table = bigquery.Table(.........)
external_config = bigquery.ExternalConfig('CSV')
source_uris = ['<url-to-your-external-source>']  # e.g. a CSV file in a Cloud Storage bucket
external_config.source_uris = source_uris
table.external_data_configuration = external_config