How to set Google bigquery environment variable on colab

I plan to write a script that pulls data from BigQuery, but I don't know how to set the environment variable.

Here is an example from the official documentation:

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

query = """
    SELECT name, SUM(number) as total_people
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    WHERE state = 'TX'
    GROUP BY name, state
    ORDER BY total_people DESC
    LIMIT 20
"""
query_job = client.query(query)  # Make an API request.

print("The query data:")
for row in query_job:
    # Row values can be accessed by field name or index.
    print("name={}, count={}".format(row[0], row["total_people"]))

I ran this, but it returned an error:

DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started

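I followed the official doc, but ran into a problem: step 2 is to set the environment variable, and it only gives examples for Windows and Linux/macOS. So how do I set the environment variable on Colab?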

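Also, I noticed that the example requires the path to my key file. That is fine on a local machine, but I don't think it is a good idea to upload my key file and pass a link to it in code that lives online.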

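Instead of setting an environment variable or uploading the key directly to Colab, you can upload the key to Google Drive and apply the necessary access restrictions there. In your code, mount Google Drive in Colab and use the Drive location as the key file path when authenticating.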

from google.cloud import bigquery
from google.oauth2 import service_account
from google.colab import drive

drive.mount('/content/drive/')  # Mount Google Drive (prompts for authorization)

# Full path to the key inside Google Drive.
# In this example, the key is stored in /MyDrive/Auth/
key_path = '/content/drive/MyDrive/Auth/your_key.json'

# Load service-account credentials from the key file.
credentials = service_account.Credentials.from_service_account_file(
    filename=key_path, scopes=["https://www.googleapis.com/auth/cloud-platform"],
)

# Construct a BigQuery client object that uses those credentials.
client = bigquery.Client(credentials=credentials, project=credentials.project_id)

query = """
    SELECT name, SUM(number) as total_people
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    WHERE state = 'TX'
    GROUP BY name, state
    ORDER BY total_people DESC
    LIMIT 20
"""
query_job = client.query(query)  # Make an API request.

print("The query data:")
for row in query_job:
    # Row values can be accessed by field name or index.
    print("name={}, count={}".format(row[0], row["total_people"]))
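To answer the original question directly: the environment variable can also be set from inside the notebook, because GOOGLE_APPLICATION_CREDENTIALS only has to be visible to the Python process that creates the client. A minimal sketch, assuming the key sits at the same hypothetical Drive path used above and Drive is already mounted; `use_key_file` and `run_query` are illustrative helper names, not part of any library.

```python
import os

# Hypothetical path: assumes the key was uploaded to Drive and Drive was
# mounted with drive.mount('/content/drive/').
KEY_PATH = "/content/drive/MyDrive/Auth/your_key.json"


def use_key_file(key_path: str) -> None:
    """Point Application Default Credentials at a service-account key file."""
    if not os.path.isfile(key_path):
        raise FileNotFoundError(f"Key file not found: {key_path}")
    # Setting the variable from Python affects only this notebook's process,
    # which is the same process that bigquery.Client() runs in.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = key_path


def run_query(sql: str):
    # Imported lazily so use_key_file() works even without the library.
    from google.cloud import bigquery

    # With the variable set, Client() finds the credentials (and the project
    # stored in the key file) without any explicit arguments.
    client = bigquery.Client()
    return list(client.query(sql).result())
```

In a Colab cell, call `use_key_file(KEY_PATH)` once and then `run_query(...)` as often as needed. The IPython magic `%env GOOGLE_APPLICATION_CREDENTIALS=/content/drive/MyDrive/Auth/your_key.json` should have the same effect. Either way, the key stays in Drive rather than in the notebook itself.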


I'm the one who asked this question, and I think Ricco D's solution solves my problem perfectly.

However, I checked the official Google documentation and found that it offers several ways to pull data from BigQuery:

  1. BigQuery through magics (%%bigquery --project yourprojectid)
  2. BigQuery through google-cloud-bigquery (with client.query())
  3. BigQuery through pandas-gbq (with pd.io.gbq.read_gbq())
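Methods 2 and 3 from the list can be sketched side by side, reusing the example query from the question. This is a minimal sketch assuming google-cloud-bigquery and pandas-gbq are installed, and that credentials come either from a service-account object like the one built earlier or, when `credentials=None`, from Application Default Credentials. `PROJECT_ID`, `via_client` and `via_pandas_gbq` are hypothetical placeholders.

```python
# Hypothetical placeholder: replace with your own project ID.
PROJECT_ID = "your-project-id"

QUERY = """
    SELECT name, SUM(number) AS total_people
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    WHERE state = 'TX'
    GROUP BY name, state
    ORDER BY total_people DESC
    LIMIT 20
"""


def via_client(sql: str, credentials=None):
    """Method 2: google-cloud-bigquery, returning a pandas DataFrame."""
    # Imported lazily; to_dataframe() needs pandas (and possibly db-dtypes).
    from google.cloud import bigquery

    client = bigquery.Client(project=PROJECT_ID, credentials=credentials)
    return client.query(sql).to_dataframe()


def via_pandas_gbq(sql: str, credentials=None):
    """Method 3: pandas-gbq's read_gbq, which pd.io.gbq.read_gbq wraps."""
    import pandas_gbq

    return pandas_gbq.read_gbq(sql, project_id=PROJECT_ID, credentials=credentials)
```

Method 1 is a cell rather than a function: start a cell with `%%bigquery --project yourprojectid df`, put the SQL on the following lines, and the result is stored in the DataFrame `df`. In Colab, running `from google.colab import auth` followed by `auth.authenticate_user()` lets all three methods use your own Google account instead of a key file.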

See here for examples and parameter settings.