How to use the BigQuery API from a Python script that calls a UDF
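I am trying to run a SQL statement that calls a UDF against a BigQuery table. The statement is executed from a Python script and submitted through the BigQuery API.
A simple SQL statement without a UDF works fine. However, when I try to use a UDF script (stored either locally or in a GCS bucket), I keep getting the same error.
This is what I get in my local terminal (I run the script through Python Launcher):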
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/googleapiclient/http.py", line 840, in execute
    raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: https://www.googleapis.com/bigquery/v2/projects/[projectId]/queries?alt=json returned "Required parameter is missing">
Here is my Python script:
credentials = SignedJwtAssertionCredentials(
    SERVICE_ACCOUNT_EMAIL,
    key,
    scope='https://www.googleapis.com/auth/bigquery')
aservice = build('bigquery', 'v2', credentials=credentials)
query_requestb = aservice.jobs()
query_data = {
    'configuration': {
        'query': {
            'userDefinedFunctionResources': [
                {
                    'resourceUri': 'gs://[bucketName]/[fileName].js'
                }
            ],
            'query': sql
        }
    },
    'timeoutMs': 100000
}
query_response = query_requestb.query(projectId=PROJECT_NUMBER, body=query_data).execute(num_retries=0)
Any idea what 'parameter is missing' refers to, or how I can get this to run?
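Instead of specifying userDefinedFunctionResources, use CREATE TEMP FUNCTION in the body of your 'query', and reference the library in the OPTIONS clause. You will need to use standard SQL for this, and you can also refer to the documentation on user-defined functions. Your query would look something like this: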
#standardSQL
CREATE TEMP FUNCTION MyJsFunction(x FLOAT64) RETURNS FLOAT64 LANGUAGE js AS """
return my_js_function(x);
"""
OPTIONS (library='gs://[bucketName]/[fileName].js');
SELECT MyJsFunction(x)
FROM MyTable;
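As a minimal sketch of how that query could be sent from the Python script: jobs().query() expects a flat request body with 'query' at the top level, whereas nesting it under 'configuration' is the shape used by jobs().insert(), which may be why the API reported a missing parameter. The helper name build_query_request is hypothetical and not part of any library.

```python
def build_query_request(sql, timeout_ms=100000):
    """Build a request body for jobs().query() that runs `sql` as standard SQL.

    build_query_request is a hypothetical helper used for illustration only.
    """
    return {
        'query': sql,           # top-level field, not nested under 'configuration'
        'useLegacySql': False,  # CREATE TEMP FUNCTION requires standard SQL
        'timeoutMs': timeout_ms,
    }


# The example query from above, kept as a Python string.
sql = '''
CREATE TEMP FUNCTION MyJsFunction(x FLOAT64) RETURNS FLOAT64 LANGUAGE js AS """
  return my_js_function(x);
"""
OPTIONS (library='gs://[bucketName]/[fileName].js');
SELECT MyJsFunction(x)
FROM MyTable;
'''

query_data = build_query_request(sql)
```

The resulting query_data can then be passed as body=query_data to the same query_requestb.query(...) call used in the question.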
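The query I wanted to run categorizes traffic and sales by marketing channel, which is what I usually use a UDF for. This is the query I ran using standard SQL. It is stored in a file, which I read into the variable sql: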
CREATE TEMPORARY FUNCTION
  mktchannels(source STRING,
    medium STRING,
    campaign STRING)
  RETURNS STRING
  LANGUAGE js AS """
  return channelGrouping(source,medium,campaign) // where channelGrouping is the function in my channelgrouping.js file which contains the attribution rules
""" OPTIONS ( library=["gs://[bucket]/[path]/regex.js",
    "gs://[bucket]/[path]/channelgrouping.js"] );
WITH
  traffic AS (  -- select fields from the BigQuery table
  SELECT
    device.deviceCategory AS device,
    trafficSource.source AS source,
    trafficSource.medium AS medium,
    trafficSource.campaign AS campaign,
    SUM(totals.visits) AS sessions,
    SUM(totals.transactionRevenue)/1e6 AS revenue,
    SUM(totals.transactions) AS transactions
  FROM
    `[datasetId].[table]`
  GROUP BY
    device,
    source,
    medium,
    campaign)
SELECT
  mktchannels(source,
    medium,
    campaign) AS channel,  -- call the temp function set above
  device,
  SUM(sessions) AS sessions,
  SUM(transactions) AS transactions,
  ROUND(SUM(revenue),2) AS revenue
FROM
  traffic
GROUP BY
  device,
  channel
ORDER BY
  channel,
  device;
Then in the Python script:
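# Read the query text from the file; the with block closes the file automatically.
with open('myquery.sql', 'r') as fd:
    sql = fd.read()

query_data = {
    'query': sql,
    'maximumBillingTier': 10,
    'useLegacySql': False,  # the temp function syntax requires standard SQL
    'timeoutMs': 300000
}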
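Once the request succeeds, the rows can be pulled out of the response. The following is a rough sketch that assumes the usual jobs().query() response layout ('jobComplete', 'schema' with 'fields', and 'rows' made of 'f'/'v' cells); rows_to_dicts is a hypothetical helper, and the sample response is hand-written for illustration rather than real output.

```python
def rows_to_dicts(response):
    """Convert a jobs().query() response into a list of {column: value} dicts.

    rows_to_dicts is a hypothetical helper used for illustration only.
    """
    if not response.get('jobComplete', False):
        # The query did not finish within timeoutMs, so no rows are available yet.
        return []
    names = [field['name'] for field in response['schema']['fields']]
    return [
        dict(zip(names, [cell['v'] for cell in row['f']]))
        for row in response.get('rows', [])
    ]


# Hand-written sample response (not real data); cell values are strings here.
sample_response = {
    'jobComplete': True,
    'schema': {'fields': [{'name': 'channel'},
                          {'name': 'device'},
                          {'name': 'sessions'}]},
    'rows': [{'f': [{'v': 'Example channel'}, {'v': 'desktop'}, {'v': '10'}]}],
}

for record in rows_to_dicts(sample_response):
    print(record)
```

In the actual script, the dictionary returned by query_requestb.query(...).execute() would take the place of sample_response.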
Hope this helps anyone in the future!