Passing parameters for TABLE_QUERY in Google Cloud Datalab iPython notebook

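I'm still fairly new to Google Cloud Datalab and have run into some problems executing parameterized queries.

I followed the example for passing query parameters from the Datalab tutorial and tried to apply it to the following query: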

%%sql
SELECT user_id, localTime, event
FROM (SELECT user_id, DATE_ADD(date, timezoneOffset, "SECOND") AS localTime, event
  FROM (TABLE_QUERY([my_project:my_dataset:user_events], 
       'table_id CONTAINS "user_events_0" 
       AND RIGHT(table_id, 8) BETWEEN "20160401" AND "20160408"'))
  WHERE 
  user_id IS NOT NULL AND
  timezoneOffset IS NOT NULL AND
  event IS NOT NULL)
WHERE 
  user_id IN (SELECT id FROM [my_project:my_dataset.topUsers])
ORDER BY user_id, localTime

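I want to iterate over all user_events tables, indexed 0, 1, 2, 3, ... To do that, I want to pass a parameter to TABLE_QUERY and query one table per loop iteration, rather than querying all tables at once. (I need to sort the user records within each table, and running the query across all user_events tables at once exceeds the resource limits.)

1.) I defined a new query (%%sql --module topUserEvents, etc.) and replaced the following part of the query above: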

 FROM (TABLE_QUERY([my_project:my_dataset:user_events], 
      'table_id CONTAINS "user_events_0" 
       AND RIGHT(table_id, 8) BETWEEN "20160401" AND "20160408"'))

with:

  FROM (TABLE_QUERY([my_project:my_dataset:user_events], 
       'table_id CONTAINS "user_events_'+$tableNr+ 
       '" AND RIGHT(table_id, 8) BETWEEN "20160401" AND "20160408"'))

Executing the query with the table number passed as a string did not work:

invalidQuery: Expected a string literal for TABLE_QUERY clause

2.) I also tried passing the entire string, replacing that part of the original query with:

  FROM (TABLE_QUERY([my_project:my_dataset:user_events], $tableString))

Executing the query with the whole string passed in returns a BigQuery exception:

invalidQuery: Error preparing subsidiary query:
com.google.cloud.helix.server.bqsql.common.BigQueryException:
Encountered " "CONTAINS" "CONTAINS "" at line 1, column 94.
Was expecting:
")" ...

Does anyone know how to pass a (partial) string as the TABLE_QUERY argument, as in the case above?

Any help would be much appreciated :)

Could you try the following?

Define the module 'test1':

%%sql --module test1
SELECT count(*)
FROM TABLE_QUERY(publicdata:samples, 
  'MSEC_TO_TIMESTAMP(creation_time) < DATE_ADD(CURRENT_TIMESTAMP(), -7, $period)')

Run the query:

# bq is assumed to be Datalab's BigQuery module, e.g. import gcp.bigquery as bq
period = 'DAY'
bq.Query(test1, period=period).sample()
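
This seems to work because the placeholder sits inside the single-quoted literal, so TABLE_QUERY still receives one string literal after the parameter is filled in. The snippet below only simulates that substitution in plain Python, assuming string values are inserted as double-quoted SQL literals; `expand` is a hypothetical helper and not part of Datalab.

```python
import re

def expand(sql, **params):
    """Replace each $name with its value; strings become double-quoted literals.

    Hypothetical stand-in for Datalab's parameter expansion (an assumption).
    """
    def substitute(match):
        value = params[match.group(1)]
        if isinstance(value, str):
            return '"{}"'.format(value)
        return str(value)
    return re.sub(r"\$(\w+)", substitute, sql)

# Placeholder inside the literal: still a single string literal afterwards.
inside = expand(
    "'MSEC_TO_TIMESTAMP(creation_time) < "
    "DATE_ADD(CURRENT_TIMESTAMP(), -7, $period)'",
    period="DAY",
)
print(inside)

# Concatenation around the placeholder (attempt 1 in the question):
# the result is an expression with + operators, not one literal.
concatenated = expand(
    "'table_id CONTAINS \"user_events_'+$tableNr+'\"'",
    tableNr="0",
)
print(concatenated)
```

Under that assumption, the second form expands to several literals joined by `+`, which would explain the "Expected a string literal" error.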

Define the module 'test2':

%%sql --module test2
SELECT user_id, localTime, event
FROM (SELECT user_id, DATE_ADD(date, timezoneOffset, "SECOND") AS localTime, event
  FROM (TABLE_QUERY([my_project:my_dataset:user_events], 
       'table_id CONTAINS $events_table_num 
       AND RIGHT(table_id, 8) BETWEEN "20160401" AND "20160408"'))
  WHERE 
  user_id IS NOT NULL AND
  timezoneOffset IS NOT NULL AND
  event IS NOT NULL)
WHERE 
  user_id IN (SELECT id FROM [my_project:my_dataset.topUsers])
ORDER BY user_id, localTime

Run the query:

events_table_num = 'user_events_0'
bq.Query(test2, events_table_num=events_table_num).sample()
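
To query one table group per iteration, as the question asks, the call above could be wrapped in a loop. This is a rough sketch: `table_prefixes` and `run_per_table` are hypothetical helpers, and the runner is a stand-in so the code executes outside a notebook.

```python
def table_prefixes(count, base="user_events_"):
    """Yield the table-name prefix for each index: user_events_0, user_events_1, ..."""
    for index in range(count):
        yield "{}{}".format(base, index)

def run_per_table(run_query, count):
    """Call run_query once per prefix and collect the results keyed by prefix."""
    return {prefix: run_query(prefix) for prefix in table_prefixes(count)}

# Stand-in runner. In the notebook it might instead be:
#   lambda p: bq.Query(test2, events_table_num=p).sample()
results = run_per_table(lambda prefix: "queried " + prefix, 4)
for prefix in sorted(results):
    print(prefix, "->", results[prefix])
```

One caveat: with ten or more indices, `CONTAINS "user_events_1"` would also match tables such as `user_events_10...`, so the filter may need a stricter match in that case.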