BigQuery Error – UNIQUE_HEAP requires an int32 argument
Using legacy SQL, I am trying to use COUNT(DISTINCT field, n) in Google BigQuery, but I get the following error:
UNIQUE_HEAP requires an int32 argument which is greater than 0 (error code: invalidQuery)
Here is the query I used:
SELECT
hits.page.pagePath AS Page,
COUNT(DISTINCT CONCAT(fullVisitorId, INTEGER(visitId)), 1e6) AS UniquePageviews,
COUNT(DISTINCT fullVisitorId, 1e6) as Users
FROM
[xxxxxxxx.ga_sessions_20170101]
GROUP BY
Page
ORDER BY
UniquePageviews DESC
LIMIT
20
BigQuery does not even show the line number of the error, so I am not sure which line is causing it.
What could be the cause of the above error?
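Don't use 1e6 in COUNT(DISTINCT). 1e6 is a floating-point literal, while the error message says the argument must be an int32 greater than 0. Instead, use an actual INTEGER value such as 1000000 for the second parameter 'N' (the default is 1000), or use EXACT_COUNT_DISTINCT().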
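As a minimal sketch of that fix, here is the query from the question with only the second argument changed from the float literal 1e6 to the integer literal 1000000. Everything else, including the placeholder table name, is copied as-is from the question and has not been run against a real dataset.

```sql
-- Legacy SQL: the threshold n must be an integer literal, not 1e6.
SELECT
  hits.page.pagePath AS Page,
  COUNT(DISTINCT CONCAT(fullVisitorId, INTEGER(visitId)), 1000000) AS UniquePageviews,
  COUNT(DISTINCT fullVisitorId, 1000000) AS Users
FROM
  [xxxxxxxx.ga_sessions_20170101]
GROUP BY
  Page
ORDER BY
  UniquePageviews DESC
LIMIT
  20
```

If exact counts are needed regardless of cardinality, each COUNT(DISTINCT expr, 1000000) could instead be written as EXACT_COUNT_DISTINCT(expr), which takes no threshold argument.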
EXACT_COUNT_DISTINCT() documentation
If you require greater accuracy from COUNT(DISTINCT), you can specify
a second parameter, n, which gives the threshold below which exact
results are guaranteed. By default, n is 1000, but if you give a
larger n, you will get exact results for COUNT(DISTINCT) up to that
value of n. However, giving larger values of n will reduce scalability
of this operator and may substantially increase query execution time
or cause the query to fail.
To compute the exact number of distinct values, use
EXACT_COUNT_DISTINCT. Or, for a more scalable approach, consider using
GROUP EACH BY on the relevant field(s) and then applying COUNT(*). The
GROUP EACH BY approach is more scalable but might incur a slight
up-front performance penalty.
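The GROUP EACH BY approach mentioned in the quoted documentation might look roughly like the following for the Users column. This is an untested sketch: the inner query reduces the data to one row per (Page, fullVisitorId) pair, and the outer query counts those rows per page. The table name is the placeholder from the question.

```sql
-- Legacy SQL sketch: distinct users per page via GROUP EACH BY + COUNT(*).
SELECT
  Page,
  COUNT(*) AS Users
FROM (
  -- One row per distinct (Page, fullVisitorId) combination.
  SELECT
    hits.page.pagePath AS Page,
    fullVisitorId
  FROM
    [xxxxxxxx.ga_sessions_20170101]
  GROUP EACH BY
    Page,
    fullVisitorId
)
GROUP BY
  Page
ORDER BY
  Users DESC
LIMIT
  20
```

Each distinct measure (for example UniquePageviews) would need its own subquery of this shape, so this form is mainly worthwhile when the single-query COUNT(DISTINCT) with a large n is too slow or fails.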