QueryString parsing in BigQuery
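I have a column in BigQuery that contains URL query strings, e.g. a=1&c=1. I'd like to reference some of these parameters in my queries, e.g. filter with something like WHERE querystring.c = 1.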
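My plan is to convert the query string to JSON and then use JSON_EXTRACT. I figured I could write a UDF that converts the query string to JSON, but I can't import the node.js querystring package into my UDF to simplify this.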
Is it possible to import node.js core libraries into a UDF, and if so, how? Alternatively, is there a better way to achieve what I'm trying to do?
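For simple cases, the JSON conversion described in the question can be done with standard ECMAScript built-ins alone. The following is a minimal sketch of logic one could adapt for a `LANGUAGE js` UDF body; the names `queryStringToJson` and `safeDecode` are hypothetical. It assumes `&`-separated `key=value` pairs, splits each pair on the first `=` only, treats `+` as a space (as HTML forms do), and lets the last occurrence win when a key repeats.

```javascript
// Hypothetical helper: turns "a=1&c=1" into the JSON string {"a":"1","c":"1"}
// using only standard ECMAScript built-ins (no Node.js modules).
function queryStringToJson(qs) {
  if (qs == null) return null;
  const result = Object.create(null); // null prototype, so a key like "__proto__" is stored as data
  for (const pair of qs.split("&")) {
    if (pair === "") continue; // skip empty segments such as in "a=1&&c=1"
    const eq = pair.indexOf("=");
    // Split on the first "=" only, so values may themselves contain "="
    const rawKey = eq === -1 ? pair : pair.slice(0, eq);
    const rawValue = eq === -1 ? "" : pair.slice(eq + 1);
    result[safeDecode(rawKey)] = safeDecode(rawValue); // last occurrence wins
  }
  return JSON.stringify(result);
}

// Decodes "+" (form-encoded space) and percent-escapes;
// returns the original input unchanged if it contains a malformed escape.
function safeDecode(s) {
  try {
    return decodeURIComponent(s.replace(/\+/g, " "));
  } catch (e) {
    return s;
  }
}

console.log(queryStringToJson("a=1&c=1")); // {"a":"1","c":"1"}
```

The resulting JSON string could then be queried in SQL with a function such as `JSON_EXTRACT_SCALAR(json, '$.c')`.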
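I think so, yes. JS UDFs are expensive resource-wise and have some limitations, while SQL UDFs are cheaper. If you want, you can wrap the approach below in a SQL UDF, but at least it gives you an idea of the "alternative" approach.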
For BigQuery Standard SQL:
#standardSQL
WITH yourTable AS (
SELECT 1 AS id, 'a=1&c=1' AS querystring UNION ALL
SELECT 2, 'c=2&b=3'
)
SELECT
id,
querystring,
SPLIT(kv, '=')[SAFE_OFFSET(0)] AS key,
SPLIT(kv, '=')[SAFE_OFFSET(1)] AS value
FROM yourTable, UNNEST(SPLIT(querystring, '&')) AS kv
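To make the semantics of `SPLIT(kv, '=')[SAFE_OFFSET(n)]` explicit, here is a small JavaScript model of the query above (the name `splitPairs` is illustrative only). `SAFE_OFFSET` returns NULL instead of raising an error when the index is out of range, so a segment without `=` gets a NULL value. Note also that a value containing `=` is cut at its first `=`, because only element 1 of the split is taken.

```javascript
// Illustrative model of:
//   UNNEST(SPLIT(querystring, '&')) AS kv
//   SPLIT(kv, '=')[SAFE_OFFSET(0)] AS key, SPLIT(kv, '=')[SAFE_OFFSET(1)] AS value
function splitPairs(querystring) {
  return querystring.split("&").map((kv) => {
    const parts = kv.split("=");
    // SAFE_OFFSET yields NULL (here: null) when the index is out of range
    const at = (i) => (i < parts.length ? parts[i] : null);
    return { key: at(0), value: at(1) };
  });
}

console.log(JSON.stringify(splitPairs("a=1&c=1")));
// [{"key":"a","value":"1"},{"key":"c","value":"1"}]
```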
The above lets you "extract" all key-value pairs, as shown below:
id  querystring  key  value
--  -----------  ---  -----
2   c=2&b=3      b    3
1   a=1&c=1      c    1
1   a=1&c=1      a    1
2   c=2&b=3      c    2
So now you can use them in a WHERE clause, like below:
#standardSQL
WITH yourTable AS (
SELECT 1 AS id, 'a=1&c=1' AS querystring UNION ALL
SELECT 2, 'c=2&b=3'
)
SELECT
id,
querystring
FROM yourTable, UNNEST(SPLIT(querystring, '&')) AS kv
WHERE SPLIT(kv, '=')[SAFE_OFFSET(0)] = 'c'
AND SPLIT(kv, '=')[SAFE_OFFSET(1)] = '1'
This gives the following result:
id  querystring
--  -----------
1   a=1&c=1
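Because this filter joins each row with its own unnested pairs, a row is returned once per matching pair, not once per row. A JavaScript model of that join (the name `filterRows` is hypothetical) makes the effect visible: a query string such as `c=1&c=1` would produce its row twice, so if repeated keys can occur, `SELECT DISTINCT` or an `EXISTS` subquery may be preferable in SQL.

```javascript
// Illustrative model of:
//   FROM yourTable, UNNEST(SPLIT(querystring, '&')) AS kv
//   WHERE SPLIT(kv, '=')[SAFE_OFFSET(0)] = key AND SPLIT(kv, '=')[SAFE_OFFSET(1)] = value
function filterRows(rows, key, value) {
  const out = [];
  for (const row of rows) {
    for (const kv of row.querystring.split("&")) {
      const parts = kv.split("=");
      // The comma join emits one output row per matching kv element
      if (parts[0] === key && parts.length > 1 && parts[1] === value) {
        out.push({ id: row.id, querystring: row.querystring });
      }
    }
  }
  return out;
}

const yourTable = [
  { id: 1, querystring: "a=1&c=1" },
  { id: 2, querystring: "c=2&b=3" },
];
console.log(filterRows(yourTable, "c", "1").map((r) => r.id)); // [ 1 ]
```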
Note: this is just a quick, abstract illustration of the approach; I hope you can adjust/adapt it to your specific case.
Below is the above example converted to use a SQL UDF:
#standardSQL
CREATE TEMPORARY FUNCTION parse(qs STRING, key STRING) AS (
  -- Returns the value for `key` from a query string like 'a=1&c=1', or NULL if the key is absent
  (SELECT SPLIT(kv, '=')[SAFE_OFFSET(1)]
   FROM UNNEST(SPLIT(qs, '&')) AS kv
   WHERE SPLIT(kv, '=')[SAFE_OFFSET(0)] = key)
);
WITH yourTable AS (
SELECT 1 AS id, 'a=1&c=1' AS querystring UNION ALL
SELECT 2, 'c=2&b=3'
)
SELECT
id,
querystring
FROM yourTable
WHERE parse(querystring, 'c') = '1'
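The `parse` UDF relies on scalar-subquery semantics: no matching pair gives NULL, exactly one gives its value, and more than one makes BigQuery fail the query at run time. The following JavaScript sketch (hypothetical name `parseParam`) models those three cases so the behavior on repeated keys is explicit.

```javascript
// Illustrative model of the scalar subquery inside parse(qs, key)
function parseParam(qs, key) {
  if (qs == null) return null; // SPLIT(NULL, '&') is NULL, so the subquery is empty
  const matches = qs
    .split("&")
    .map((kv) => kv.split("="))
    .filter((parts) => parts[0] === key)
    .map((parts) => (parts.length > 1 ? parts[1] : null)); // SAFE_OFFSET(1)
  if (matches.length === 0) return null; // empty scalar subquery -> NULL
  if (matches.length > 1) {
    // BigQuery raises an error when a scalar subquery returns more than one row
    throw new Error("Scalar subquery produced more than one element");
  }
  return matches[0];
}

console.log(parseParam("a=1&c=1", "c")); // 1
console.log(parseParam("c=2&b=3", "a")); // null
```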
Note: query strings usually don't repeat keys, so duplicates aren't handled here, but that's easy to add if needed :o)
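If repeated keys do occur (for example `tag=x&tag=y`), one option is to return every value instead of a single one; in SQL this would roughly correspond to wrapping the subquery in `ARRAY(...)` so the UDF returns an array. A JavaScript sketch of that variant, with the hypothetical name `parseAll`:

```javascript
// Hypothetical variant of parse(qs, key) that keeps every value of a repeated key
function parseAll(qs, key) {
  if (qs == null) return null;
  return qs
    .split("&")
    .map((kv) => kv.split("="))
    // Pairs without "=" are skipped, since BigQuery arrays cannot hold NULL elements
    .filter((parts) => parts[0] === key && parts.length > 1)
    .map((parts) => parts[1]);
}

console.log(parseAll("tag=x&id=7&tag=y", "tag")); // [ 'x', 'y' ]
```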
but it doesn't decode any encoded components, so my values will still contain things like %20. Any suggestion on that?
#standardSQL
CREATE TEMPORARY FUNCTION parse(qs STRING, key STRING) AS (
  -- Returns the (still percent-encoded) value for `key`, or NULL if the key is absent
  (SELECT SPLIT(kv, '=')[SAFE_OFFSET(1)]
   FROM UNNEST(SPLIT(qs, '&')) AS kv
   WHERE SPLIT(kv, '=')[SAFE_OFFSET(0)] = key)
);
CREATE TEMP FUNCTION decode(str STRING)
RETURNS STRING
LANGUAGE js AS """
if (str == null) return null;
try {
return decodeURIComponent(str);
} catch (e) {
return str;
}
""";
WITH yourTable AS (
SELECT 1 AS id, 'a=1&c=1&d=a%20b%20c' AS querystring UNION ALL
SELECT 2, 'c=2&b=3'
)
SELECT
id,
querystring,
decode(parse(querystring, 'd')) as d
FROM yourTable
WHERE parse(querystring, 'c') = '1'
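One caveat on `decode`: `decodeURIComponent` only handles percent-escapes, so form-encoded query strings that use `+` for a space (e.g. `d=a+b+c`) come back unchanged. If your data may contain that encoding, a possible adjustment is to replace `+` with a space before decoding, as in this sketch (the name `decodeFormComponent` is illustrative; inside the UDF only the function body would be used).

```javascript
// Same fallback behavior as the decode UDF, plus "+" -> space
// (application/x-www-form-urlencoded style)
function decodeFormComponent(str) {
  if (str == null) return null;
  try {
    // Replace "+" first, so an encoded plus sign (%2B) still decodes to "+"
    return decodeURIComponent(str.replace(/\+/g, " "));
  } catch (e) {
    return str; // malformed escape such as "%E0%A4%A": return the raw value
  }
}

console.log(decodeFormComponent("a+b+c"));     // a b c
console.log(decodeFormComponent("1%2B1%3D2")); // 1+1=2
```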
Result:
id querystring d
-- ------------------- -----
1 a=1&c=1&d=a%20b%20c a b c