QueryString parsing in BigQuery

I have a column in BigQuery that contains URL query strings, e.g. a=1&c=1. I'd like to reference some of the parameters in my queries, e.g. filter with something like WHERE querystring.c = 1.

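My plan was to convert the query string to JSON and then use JSON_EXTRACT. I figured I could write a UDF to convert the query string to JSON, but I can't import the node.js querystring package into my UDF to simplify this.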

Is it possible to import node.js core libraries into a UDF, and if so, how? Alternatively, is there a better way to achieve what I'm trying to do?

Alternatively is there a better way to achieve what I'm trying to do?

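I think so, yes. JS UDFs are expensive resource-wise and come with some limitations. SQL UDFs are cheaper, and the example below can be converted to a SQL UDF if you wish - but at the very least it should give you an idea of an "alternative" approach.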

For BigQuery Standard SQL

#standardSQL
WITH yourTable AS (
  SELECT 1 AS id, 'a=1&c=1' AS querystring UNION ALL
  SELECT 2, 'c=2&b=3'
)
SELECT 
  id, 
  querystring,
  SPLIT(kv, '=')[SAFE_OFFSET(0)] AS key,
  SPLIT(kv, '=')[SAFE_OFFSET(1)] AS value 
FROM yourTable, UNNEST(SPLIT(querystring, '&')) AS kv   

The above lets you "extract" all key-value pairs, as shown here

id  querystring key value    
2   c=2&b=3     b   3    
1   a=1&c=1     c   1    
1   a=1&c=1     a   1    
2   c=2&b=3     c   2    

So now you can use them in a WHERE clause, like this

#standardSQL
WITH yourTable AS (
  SELECT 1 AS id, 'a=1&c=1' AS querystring UNION ALL
  SELECT 2, 'c=2&b=3'
)
SELECT 
  id, 
  querystring,
FROM yourTable, UNNEST(SPLIT(querystring, '&')) AS kv
WHERE SPLIT(kv, '=')[SAFE_OFFSET(0)] = 'c' 
AND SPLIT(kv, '=')[SAFE_OFFSET(1)] = '1'

which gives the following result

id  querystring  
1   a=1&c=1    

Note: this is just a quick and abstract illustration of the approach - I hope you can adjust/adapt it to your specific case

Below is the above example converted to use a SQL UDF

#standardSQL
CREATE TEMPORARY FUNCTION parse(qs STRING, key STRING) AS (
  (SELECT SPLIT(kv, '=')[SAFE_OFFSET(1)] FROM UNNEST(SPLIT(qs, '&')) AS kv WHERE SPLIT(kv, '=')[SAFE_OFFSET(0)] = key )
);
WITH yourTable AS (
  SELECT 1 AS id, 'a=1&c=1' AS querystring UNION ALL
  SELECT 2, 'c=2&b=3'
)
SELECT 
  id, 
  querystring
FROM yourTable
WHERE parse(querystring, 'c') = '1'

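Note: usually query strings have no duplicate keys, so duplicates are not handled here (as written, the scalar subquery in parse would raise an error if a key occurred more than once) - but that is easy to address if needed :o)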
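If duplicate keys do matter, one possible way to handle them is to return every value for a key as an array instead of a single string. The sketch below is not part of the original answer: parse_all is a hypothetical name, and the sample row with c repeated is made up for illustration. IFNULL is there because a key without "=" would otherwise yield a NULL array element, which BigQuery does not allow in a result.

```sql
#standardSQL
-- Hypothetical helper: collects all values for a key, in order of appearance
CREATE TEMPORARY FUNCTION parse_all(qs STRING, key STRING) AS (
  ARRAY(
    SELECT IFNULL(SPLIT(kv, '=')[SAFE_OFFSET(1)], '')
    FROM UNNEST(SPLIT(qs, '&')) AS kv WITH OFFSET AS pos
    WHERE SPLIT(kv, '=')[SAFE_OFFSET(0)] = key
    ORDER BY pos
  )
);
WITH yourTable AS (
  -- illustrative data: key c appears twice in the first row
  SELECT 1 AS id, 'a=1&c=1&c=5' AS querystring UNION ALL
  SELECT 2, 'c=2&b=3'
)
SELECT
  id,
  querystring,
  parse_all(querystring, 'c') AS c_values
FROM yourTable
-- keep rows where any occurrence of c equals '1'
WHERE '1' IN UNNEST(parse_all(querystring, 'c'))
```

With this shape the filter matches a row if any of the repeated values matches; picking only the first or last occurrence instead would be a matter of indexing into the array with SAFE_OFFSET.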

but it doesn't decode any encoded components, so my values will still contain things like %20. Any suggestion on that?

#standardSQL
CREATE TEMPORARY FUNCTION parse(qs STRING, key STRING) AS (
  (SELECT SPLIT(kv, '=')[SAFE_OFFSET(1)] FROM UNNEST(SPLIT(qs, '&')) AS kv WHERE SPLIT(kv, '=')[SAFE_OFFSET(0)] = key )
);
CREATE TEMP FUNCTION decode(str STRING)
RETURNS STRING
LANGUAGE js AS """
  if (str == null) return null;
  try {
    return decodeURIComponent(str);
  } catch (e) {
    return str;
  }
""";
WITH yourTable AS (
  SELECT 1 AS id, 'a=1&c=1&d=a%20b%20c' AS querystring UNION ALL
  SELECT 2, 'c=2&b=3'
)
SELECT 
  id,
  querystring,
  decode(parse(querystring, 'd')) as d
FROM yourTable
WHERE parse(querystring, 'c') = '1'

The result is

id  querystring             d    
--  -------------------     -----
1   a=1&c=1&d=a%20b%20c     a b c
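One thing to watch for: decodeURIComponent leaves "+" untouched, while form-encoded query strings often use "+" for a space. If your data follows that convention (an assumption you would need to check), a variant of the decode function along these lines may help. decode_form is a hypothetical name, and the input literal is made up for illustration.

```sql
#standardSQL
-- Hypothetical variant of decode: treats '+' as an encoded space
CREATE TEMP FUNCTION decode_form(str STRING)
RETURNS STRING
LANGUAGE js AS """
  if (str == null) return null;
  try {
    // Swap '+' for a space before decoding, so a literal plus sent as %2B survives.
    // split/join is used instead of a regex to avoid backslashes inside the SQL string.
    return decodeURIComponent(str.split('+').join(' '));
  } catch (e) {
    // Malformed escape sequence: fall back to the raw value
    return str;
  }
""";
SELECT decode_form('a+b%20c%2Bd') AS decoded
```

Here the "+" and the %20 both become spaces, while %2B decodes to a literal plus sign, so the query should return a b c+d.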