Unable to translate BigQuery legacy SQL to standard SQL for HAVING LEFT(...)
I would like to use BigQuery Standard SQL for a query like this one:
SELECT package, COUNT(*) count
FROM (
SELECT REGEXP_EXTRACT(line, r' ([a-z0-9\._]*)\.') package, id
FROM (
SELECT SPLIT(content, '\n') line, id
FROM [github-groovy-files:github.contents]
WHERE content CONTAINS 'import'
HAVING LEFT(line, 6)='import' )
GROUP BY package, id
)
GROUP BY 1
ORDER BY count DESC
LIMIT 30;
I can't get past something like this (it works, but I can't group or count on it):
with lines as
(SELECT SPLIT(c.content, '\n') line, c.id as id
FROM `<dataset>.contents` c, `<dataset>.files` f
WHERE c.id = f.id AND f.path LIKE '%.groovy')
select
array(select REGEXP_REPLACE(l, r'import |;', '') AS class from unnest(line) as l where l like 'import %') imports, id
from lines;
LEFT() is not in Standard SQL, and there doesn't seem to be a function that will accept an array type.
LEFT() is not in Standard SQL ...
In BigQuery Standard SQL you can use SUBSTR(value, position[, length]) instead of legacy SQL's LEFT. Positions are 1-based, so LEFT(line, 6) becomes SUBSTR(line, 1, 6).
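As a small illustration of that substitution, the sketch below applies the prefix test to two made-up lines. The sample strings and the column aliases are assumptions for the demo, not data from the dataset. STARTS_WITH is another Standard SQL function that expresses the same check.

```sql
-- Hypothetical sample lines; in the real query they come from SPLIT(content, '\n').
SELECT
  line,
  SUBSTR(line, 1, 6) = 'import' AS via_substr,     -- stand-in for legacy LEFT(line, 6)
  STARTS_WITH(line, 'import') AS via_starts_with   -- same prefix test, spelled differently
FROM UNNEST(['import groovy.json.JsonSlurper', 'class Foo {}']) AS line;
```

Both boolean columns should agree on every row, since they test the same six-character prefix.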
... and there doesn't seem to be a function that will accept an array type.
There are plenty of array-related functions, as well as functions that accept an array as an argument, for example UNNEST()
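For instance, UNNEST turns the array returned by SPLIT into one row per element, which is what makes it possible to filter and group on individual lines. A minimal sketch, using an inline table with invented content in place of the real contents table:

```sql
-- `sample` is a made-up stand-in for the contents table.
WITH sample AS (
  SELECT 1 AS id, 'import groovy.json.JsonSlurper\nclass Foo {}' AS content
)
SELECT id, line
FROM sample,
  UNNEST(SPLIT(content, '\n')) AS line;  -- one output row per line of content
```

Because each array element becomes its own row, a plain WHERE clause can then filter on line, with no HAVING needed.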
I would like to use BigQuery Standard SQL for a query like this one:
Below is the equivalent query in BigQuery Standard SQL
SELECT package, COUNT(*) count
FROM (
SELECT REGEXP_EXTRACT(line, r' ([a-z0-9\._]*)\.') package, id
FROM (
SELECT line, id
FROM `github-groovy-files.github.contents`,
UNNEST(SPLIT(content, '\n')) line
WHERE SUBSTR(line, 1, 6)='import'
)
GROUP BY package, id
)
GROUP BY 1
ORDER BY count DESC
LIMIT 30
You can use WHERE line LIKE 'import%'
instead of WHERE SUBSTR(line, 1, 6)='import'
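Also note that this query can be written in many ways. In my example above I focused on "translating" your query from legacy to standard SQL while preserving the core structure and approach of the original query.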
But if you want to rewrite it using the power of Standard SQL, you will end up with something like the following
SELECT REGEXP_EXTRACT(line, r' ([a-z0-9\._]*)\.') package, COUNT(DISTINCT id) count
FROM `github-groovy-files.github.contents`,
UNNEST(SPLIT(content, '\n')) line
WHERE line LIKE 'import%'
GROUP BY 1
ORDER BY count DESC
LIMIT 30
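To try this shape of query without access to the dataset, here is a sketch that runs the same logic over an inline table. The `sample` CTE, its two rows, and the file_count alias are invented for illustration only.

```sql
-- `sample` is invented test data standing in for github-groovy-files.github.contents.
WITH sample AS (
  SELECT 1 AS id, 'import groovy.json.JsonSlurper\nimport groovy.json.JsonOutput\nclass A {}' AS content
  UNION ALL
  SELECT 2, 'import groovy.json.JsonSlurper\nimport java.util.List'
)
SELECT
  REGEXP_EXTRACT(line, r' ([a-z0-9\._]*)\.') AS package,
  COUNT(DISTINCT id) AS file_count  -- counts files, not import lines
FROM sample,
  UNNEST(SPLIT(content, '\n')) AS line
WHERE line LIKE 'import%'
GROUP BY 1
ORDER BY file_count DESC;
```

With this sample, groovy.json is counted for two files even though it appears on three import lines, and java.util is counted for one. COUNT(DISTINCT id) plays the role of the inner GROUP BY package, id in the translated version.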