Google BigQuery: how to load data from Google Cloud Storage into BigQuery

Question

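I am switching to BigQuery because of its high performance, but I don't know how to upload data from Google Cloud Storage into a BigQuery database. I also have a few more questions. Can I access my database directly from Google Cloud Storage while using BigQuery? Do I have to convert it to some format first? And how would I keep the BigQuery database in sync with my Google Cloud Storage database?

Thanks in advance.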

Answer 1

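Assuming your data is in a supported format (delimited, such as CSV/TSV, or JSON), you can easily load it from Google Cloud Storage into BigQuery using the web UI, the CLI, or the API. For example, using the CLI: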

bq load mydataset.mytable gs://my_bucket/file.csv name:string,gender:string,count:integer

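This loads file.csv from your Google Cloud Storage bucket 'my_bucket' into the table 'mytable' under the dataset 'mydataset'. The table will have three columns: name and gender of type string, and count of type integer. The BigQuery quickstart guide may be useful to you [1].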
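The inline schema passed to bq load is a comma-separated list of name:type pairs. As a minimal sketch, the Python helper below converts such a string into the list of field dicts used by BigQuery JSON schema files. The name parse_inline_schema is hypothetical and not part of any Google library, and the sketch assumes that a column with no explicit type defaults to string.

```python
import json


def parse_inline_schema(spec):
    """Turn 'name:string,count:integer' into a list of BigQuery schema field dicts."""
    fields = []
    for item in spec.split(","):
        # partition() leaves field_type empty when no ':type' suffix is given.
        name, _, field_type = item.strip().partition(":")
        # Assumption: an omitted type is treated as STRING.
        fields.append({"name": name, "type": (field_type or "string").upper()})
    return fields


if __name__ == "__main__":
    schema = parse_inline_schema("name:string,gender:string,count:integer")
    print(json.dumps(schema, indent=2))
```

Keeping the schema as JSON can be easier to maintain than a long inline string once a table has many columns.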

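If you need to add more data, just run the bq load command again; by default it appends the new rows from the CSV to the BigQuery table. If you need to overwrite the data, add the --replace flag, which erases the existing contents before loading the new data.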
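To make the append-versus-replace choice concrete, here is a small sketch that assembles the bq load command from Python. The helper names build_bq_load_command and run_bq_load are hypothetical; the sketch assumes the bq CLI is installed and authenticated, and it only uses the --replace flag described above.

```python
import shlex
import subprocess


def build_bq_load_command(table, source_uri, schema, replace=False):
    """Return the argument list for a `bq load` invocation."""
    cmd = ["bq", "load"]
    if replace:
        # Overwrite the table contents instead of appending to them.
        cmd.append("--replace")
    cmd += [table, source_uri, schema]
    return cmd


def run_bq_load(table, source_uri, schema, replace=False):
    """Run the load; raises CalledProcessError if bq exits with an error."""
    subprocess.run(build_bq_load_command(table, source_uri, schema, replace), check=True)


if __name__ == "__main__":
    args = ("mydataset.mytable", "gs://my_bucket/file.csv",
            "name:string,gender:string,count:integer")
    print(shlex.join(build_bq_load_command(*args)))                # append (default)
    print(shlex.join(build_bq_load_command(*args, replace=True)))  # overwrite
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with bucket paths and schema strings.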

In addition, you can even run queries over files in Google Cloud Storage without loading them into BigQuery first, by using external tables [2].

[1] https://cloud.google.com/bigquery/bq-command-line-tool-quickstart

[2] https://cloud.google.com/bigquery/federated-data-sources
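For the external-table route mentioned in Answer 1, BigQuery needs a table definition that points at the files in Cloud Storage. The sketch below builds such a definition as a plain dict, using the field names of BigQuery's external data configuration (sourceFormat, sourceUris, schema, csvOptions). The function name make_external_table_definition is hypothetical, and the bucket and columns are reused from the example above for illustration only.

```python
import json


def make_external_table_definition(source_uri, fields, skip_leading_rows=0):
    """Describe a CSV file in Cloud Storage as an external table definition."""
    return {
        "sourceFormat": "CSV",
        "sourceUris": [source_uri],
        "schema": {"fields": fields},
        # Set skip_leading_rows=1 when the CSV file starts with a header row.
        "csvOptions": {"skipLeadingRows": skip_leading_rows},
    }


if __name__ == "__main__":
    definition = make_external_table_definition(
        "gs://my_bucket/file.csv",
        [
            {"name": "name", "type": "STRING"},
            {"name": "gender", "type": "STRING"},
            {"name": "count", "type": "INTEGER"},
        ],
    )
    print(json.dumps(definition, indent=2))
```

Saved to a file, a definition like this can be supplied to bq mk through its --external_table_definition flag; queries against the resulting table then read the file in place rather than a loaded copy.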

Answer 2

With Python, you can do the update like this:

import time
import uuid

# Written against the legacy `gcloud` package; newer google-cloud-bigquery
# releases no longer provide load_table_from_storage.
from gcloud import bigquery


def load_data_from_gcs(dataset_name, table_name, source):
    bigquery_client = bigquery.Client()
    dataset = bigquery_client.dataset(dataset_name)
    table = dataset.table(table_name)

    # Drop any existing table so the load starts from an empty table.
    if table.exists():
        table.delete()

    # Define the schema, then create the table.
    table.schema = (
        bigquery.SchemaField('ID', 'STRING'),
        bigquery.SchemaField('days', 'STRING'),
        bigquery.SchemaField('last_activ_date', 'STRING'),
    )
    table.create()

    # Each load job needs a unique name.
    job_name = str(uuid.uuid4())
    job = bigquery_client.load_table_from_storage(
        job_name, table, source)
    job.begin()

    wait_for_job(job)

    print('Loaded {} rows into {}:{}.'.format(
        job.output_rows, dataset_name, table_name))


def wait_for_job(job):
    # Poll once per second until the job is done; raise if it reported an error.
    while True:
        job.reload()
        if job.state == 'DONE':
            if job.error_result:
                raise RuntimeError(job.errors)
            return
        time.sleep(1)


if __name__ == "__main__":
    load_data_from_gcs('my_model', 'my_output', 'gs://path-uat/data_project/my_output.csv')
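Answer 2's code targets the older gcloud package. As a rough equivalent for the current google-cloud-bigquery client, the sketch below uses LoadJobConfig and load_table_from_uri. It is an illustration under stated assumptions rather than the answer author's method: the function name load_csv_from_gcs and the replace parameter are hypothetical, the table ID is resolved against the client's default project, and valid credentials are assumed.

```python
def load_csv_from_gcs(client, table_id, source_uri, replace=True):
    """Load a CSV file from Cloud Storage into table_id and return the rows loaded."""
    # Imported inside the function so that defining it does not require the
    # google-cloud-bigquery package to be installed.
    from google.cloud import bigquery

    job_config = bigquery.LoadJobConfig(
        schema=[
            bigquery.SchemaField("ID", "STRING"),
            bigquery.SchemaField("days", "STRING"),
            bigquery.SchemaField("last_activ_date", "STRING"),
        ],
        source_format=bigquery.SourceFormat.CSV,
        # WRITE_TRUNCATE overwrites the table (like deleting and recreating it);
        # WRITE_APPEND adds the new rows to the existing ones.
        write_disposition=(
            bigquery.WriteDisposition.WRITE_TRUNCATE
            if replace
            else bigquery.WriteDisposition.WRITE_APPEND
        ),
    )
    load_job = client.load_table_from_uri(source_uri, table_id, job_config=job_config)
    load_job.result()  # Blocks until the job finishes; raises if the job failed.
    return load_job.output_rows


# Example call (needs credentials and the google-cloud-bigquery package):
#   from google.cloud import bigquery
#   rows = load_csv_from_gcs(bigquery.Client(), "my_model.my_output",
#                            "gs://path-uat/data_project/my_output.csv")
```

Here the write disposition takes the place of the explicit delete-and-create step, and result() replaces the hand-written polling loop.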

google-cloud-storage

google-bigquery