How to pass a parameter into a BigQuery query on Colab
I have a BigQuery query on Colab:
import pandas as pd
from google.colab import auth
auth.authenticate_user()
print('Authenticated')
project_id = '[your project ID]'
sample_count = 2000
df = pd.io.gbq.read_gbq('''
SELECT name, SUM(number) as count
FROM `bigquery-public-data.usa_names.usa_1910_2013`
WHERE state = 'TX'
AND year BETWEEN 1910 AND 1920
GROUP BY name
ORDER BY count DESC
LIMIT 100
''', project_id=project_id, dialect='standard')
df.head()
It works, but now I am trying to pass a parameter into the query to replace the "1920" in the WHERE clause. The parameter depends on another file:
end_year = max(record.year) # set end_year
df = pd.io.gbq.read_gbq('''
SELECT name, SUM(number) as count
FROM `bigquery-public-data.usa_names.usa_1910_2013`
WHERE state = 'TX'
AND year BETWEEN 1910 AND end_year
GROUP BY name
ORDER BY count DESC
LIMIT 100
''', project_id=project_id, dialect='standard')
df.head()
But I get an error:
BadRequest: 400 Syntax error: Unexpected identifier "end_year"
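I guess the parameter is not being passed into the query, but I don't know how to fix it.
As far as Python is concerned, the query you pass to BigQuery is just a string, so BigQuery only ever sees the literal text end_year and never the value of your variable. You have to format the value into the string yourself.
Like this: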
end_year = max(record.year) # set end_year
df = pd.io.gbq.read_gbq('''
SELECT name, SUM(number) as count
FROM `bigquery-public-data.usa_names.usa_1910_2013`
WHERE state = 'TX'
AND year BETWEEN 1910 AND %s
GROUP BY name
ORDER BY count DESC
LIMIT 100
''' % (end_year,), project_id=project_id, dialect='standard')  # one-element tuple for the single %s
df.head()
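A very important note here: I am assuming this is a script you are running yourself for one-off data analysis, not code in a production application, where SQL injection could be a problem.
As @Mike Karp mentioned, the query in your code is a string, which is why you get an error when you put the variable name directly in the query.
You can also use a Python f-string to format the string and pass the variable into the query.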
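If the value could ever come from outside the notebook, one option is to let BigQuery bind it as a named query parameter instead of pasting it into the SQL text. The following is only a minimal sketch, assuming the `pandas-gbq` package is installed and that its `read_gbq` accepts a `configuration` dict in the BigQuery job format. `make_query_config` and `run_names_query` are hypothetical helper names, not part of any library.

```python
SQL = """
SELECT name, SUM(number) AS count
FROM `bigquery-public-data.usa_names.usa_1910_2013`
WHERE state = 'TX'
AND year BETWEEN 1910 AND @end_year
GROUP BY name
ORDER BY count DESC
LIMIT 100
"""


def make_query_config(end_year):
    """Build a job configuration that binds @end_year as an INT64 parameter."""
    return {
        "query": {
            "parameterMode": "NAMED",
            "queryParameters": [
                {
                    "name": "end_year",
                    "parameterType": {"type": "INT64"},
                    # int() also turns a NumPy integer into a plain Python int
                    "parameterValue": {"value": int(end_year)},
                }
            ],
        }
    }


def run_names_query(project_id, end_year):
    """Hypothetical wrapper: run SQL with end_year bound as a query parameter."""
    import pandas_gbq  # imported here so the helpers above work without it

    return pandas_gbq.read_gbq(
        SQL,
        project_id=project_id,
        dialect="standard",
        configuration=make_query_config(end_year),
    )
```

In the notebook this would be called as `df = run_names_query(project_id, max(record.year))`. The SQL text never changes, so the value cannot alter the structure of the query.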
from google.colab import auth
import pandas as pd
auth.authenticate_user()
print('Authenticated')
project_id = 'PROJECT_ID'
end_year = max(record.year) # set end_year
query = f'''
SELECT name, SUM(number) as count
FROM `bigquery-public-data.usa_names.usa_1910_2013`
WHERE state = 'TX'
AND year BETWEEN 1910 AND {end_year}
GROUP BY name
ORDER BY count DESC
LIMIT 100
'''
df = pd.io.gbq.read_gbq(query=query, project_id=project_id, dialect='standard')
df.head()
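Since both `%` formatting and f-strings paste the value straight into the SQL text, it may be worth checking the value before building the query. This is a minimal sketch, assuming `end_year` is meant to be an integer year inside the range suggested by the table name (1910 to 2013). `build_query` is a hypothetical helper, not part of pandas or BigQuery.

```python
def build_query(end_year, first_year=1910, last_year=2013):
    """Return the SQL text with end_year interpolated after validation."""
    # int() raises ValueError for input such as "1920 OR 1=1"
    end_year = int(end_year)
    if not first_year <= end_year <= last_year:
        raise ValueError(
            f"end_year must be between {first_year} and {last_year}"
        )
    return f"""
SELECT name, SUM(number) AS count
FROM `bigquery-public-data.usa_names.usa_1910_2013`
WHERE state = 'TX'
AND year BETWEEN {first_year} AND {end_year}
GROUP BY name
ORDER BY count DESC
LIMIT 100
"""


query = build_query(1920)
```

The resulting `query` string can be passed to `read_gbq` exactly as in the answer above.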