Passing parameters for TABLE_QUERY in Google Cloud Datalab iPython notebook
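I'm still new to Google Cloud Datalab and have run into some problems running parameterized queries.
I followed the example from the Datalab tutorial on passing query parameters and tried to apply it to the following query: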
%%sql
SELECT user_id, localTime, event
FROM (SELECT user_id, DATE_ADD(date, timezoneOffset, "SECOND") AS localTime, event
FROM (TABLE_QUERY([my_project:my_dataset:user_events],
'table_id CONTAINS "user_events_0"
AND RIGHT(table_id, 8) BETWEEN "20160401" AND "20160408"'))
WHERE
user_id IS NOT NULL AND
timezoneOffset IS NOT NULL AND
event IS NOT NULL)
WHERE
user_id IN (SELECT id FROM [my_project:my_dataset.topUsers])
ORDER BY user_id, localTime
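I want to iterate over all the user_events tables, which are indexed 0, 1, 2, 3, ... To do that, I want to pass a parameter to TABLE_QUERY and query one table per loop iteration, rather than querying all the tables at once. (I need to sort the user records within each table, and running the query against all user_events tables at once exceeds the available resources.)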
1.) I defined a new query (%%sql --module topUserEvents, etc.) and replaced the following part of the query above:
FROM (TABLE_QUERY([my_project:my_dataset:user_events],
'table_id CONTAINS "user_events_0"
AND RIGHT(table_id, 8) BETWEEN "20160401" AND "20160408"'))
with:
FROM (TABLE_QUERY([my_project:my_dataset:user_events],
'table_id CONTAINS "user_events_'+$tableNr+
'" AND RIGHT(table_id, 8) BETWEEN "20160401" AND "20160408"'))
Executing the query with the table number passed as a string did not work:
invalidQuery: Expected a string literal for TABLE_QUERY clause
2.) I also tried passing the entire string, replacing part of the original query with:
FROM (TABLE_QUERY([my_project:my_dataset:user_events], $tableString))
Executing the query with the whole string passed in returned a BigQuery exception:
invalidQuery: Error preparing subsidiary query:
com.google.cloud.helix.server.bqsql.common.BigQueryException:
Encountered " "CONTAINS" "CONTAINS "" at line 1, column 94.
Was expecting:
")" ...
Does anyone know how to pass a (partial) string as a parameter to TABLE_QUERY, as in the case above?
Any help would be much appreciated :)
Can you try the following?
Define module 'test1':
%%sql --module test1
SELECT count(*)
FROM TABLE_QUERY(publicdata:samples,
'MSEC_TO_TIMESTAMP(creation_time) < DATE_ADD(CURRENT_TIMESTAMP(), -7, $period)')
Run the query:
period = 'DAY'
bq.Query(test1, period = period).sample()
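What seems to matter in test1 is that $period sits inside the quoted filter, so TABLE_QUERY still receives a single string literal instead of a concatenation. As a rough illustration only, the sketch below uses Python's string.Template as a stand-in for that kind of placeholder substitution. It is not Datalab's own substitution code, and render_filter and FILTER_TEMPLATE are hypothetical names; whether Datalab adds the double quotes itself is an assumption here.

```python
from string import Template

# Stand-in for the quoted TABLE_QUERY filter from module test1.
# The $period placeholder lives inside the filter text itself.
FILTER_TEMPLATE = Template(
    "MSEC_TO_TIMESTAMP(creation_time) < "
    "DATE_ADD(CURRENT_TIMESTAMP(), -7, $period)"
)


def render_filter(period):
    """Return the filter text with the placeholder filled in (hypothetical helper)."""
    # Assumption: the string value ends up double-quoted, the same way
    # the question's query writes "SECOND".
    return FILTER_TEMPLATE.substitute(period='"{}"'.format(period))


print(render_filter("DAY"))
```

The point of the sketch is only that the filter stays one piece of text from start to finish; nothing is joined with + inside the SQL.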
Define module 'test2':
%%sql --module test2
SELECT user_id, localTime, event
FROM (SELECT user_id, DATE_ADD(date, timezoneOffset, "SECOND") AS localTime, event
FROM (TABLE_QUERY([my_project:my_dataset:user_events],
'table_id CONTAINS $events_table_num
AND RIGHT(table_id, 8) BETWEEN "20160401" AND "20160408"'))
WHERE
user_id IS NOT NULL AND
timezoneOffset IS NOT NULL AND
event IS NOT NULL)
WHERE
user_id IN (SELECT id FROM [my_project:my_dataset.topUsers])
ORDER BY user_id, localTime
Run the query:
events_table_num = 'user_events_0'
bq.Query(test2,events_table_num = events_table_num).sample()
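To cover the loop described in the question, the same call could be repeated once per table prefix. The following is a minimal sketch, assuming the test2 module works as intended; query_each_table and fake_run_query are hypothetical names, and a stub replaces the BigQuery call so the example runs anywhere.

```python
def query_each_table(run_query, table_count, prefix="user_events_"):
    """Call run_query once per table prefix and collect (prefix, result) pairs."""
    results = []
    for index in range(table_count):
        table_prefix = "{}{}".format(prefix, index)
        results.append((table_prefix, run_query(table_prefix)))
    return results


# In a notebook, run_query might wrap the call shown above, for example:
#   lambda p: bq.Query(test2, events_table_num=p).sample()
# A stub stands in here so the sketch does not need BigQuery.
def fake_run_query(table_prefix):
    return "rows for " + table_prefix


for name, rows in query_each_table(fake_run_query, 3):
    print(name, rows)
```

Passing the query call in as a function keeps one table per iteration, which is what the question asks for in order to stay within resource limits.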