BigQuery: "Clustering encountered a key that is longer than"

I get this error when clustering my Wikipedia pageviews tables:

Clustering encountered a key that is longer than the maximum allowed limit of 1024 bytes.

Context: https://medium.com/google-cloud/bigquery-optimized-cluster-your-tables-65e2f684594b

I'm clustering by:

CREATE TABLE `fh-bigquery.wikipedia_v3.pageviews_2017`
PARTITION BY DATE(datehour)
CLUSTER BY wiki, title
...

When clustering a table, BigQuery limits keys to 1 KB (1024 bytes, as the error message says).

For the example tables, you can work around this by changing the insert code so that it truncates any entries that are too long.

For example, instead of:

INSERT INTO `fh-bigquery.wikipedia_v3.pageviews_2018` (datehour, wiki, title, views)
SELECT datehour, wiki, title, views

truncate the potentially long titles:

INSERT INTO `fh-bigquery.wikipedia_v3.pageviews_2018` (datehour, wiki, title, views)
SELECT datehour, wiki, SUBSTR(title, 1, 300) AS title, views
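As a rough local illustration (Python, not BigQuery SQL), the sketch below assumes that SUBSTR() on a STRING counts characters the same way Python slicing counts code points, while the 1024-byte limit is measured in UTF-8 bytes. The sample titles are made up. It shows why a 300-character cut is not always under 1024 bytes.

```python
MAX_KEY_BYTES = 1024  # limit quoted in the clustering error message


def truncate_chars(title: str, n: int = 300) -> str:
    """Keep the first n characters, analogous to SUBSTR(title, 1, n)."""
    return title[:n]


def utf8_len(s: str) -> int:
    """UTF-8 byte length, analogous to BYTE_LENGTH(s)."""
    return len(s.encode("utf-8"))


# Hypothetical titles made of 1-, 3- and 4-byte UTF-8 characters.
ascii_title = "a" * 1000
cjk_title = "維" * 1000
emoji_title = "\U0001F600" * 1000

for t in (ascii_title, cjk_title, emoji_title):
    cut = truncate_chars(t)
    # Every cut is 300 characters, but the byte size depends on the script.
    print(len(cut), utf8_len(cut), utf8_len(cut) <= MAX_KEY_BYTES)
```

Every truncated title is 300 characters long, but the emoji title still comes to 1200 bytes, which would exceed the limit.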

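If you still get the error, note that SUBSTR() counts characters, not bytes, so some strings (malformed ones in particular) can be longer in bytes than SUBSTR() suggests. Filter those out: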

WHERE BYTE_LENGTH(title) < 300
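The following Python sketch contrasts the filter above (dropping rows, mirroring WHERE BYTE_LENGTH(title) < 300) with an alternative the answer does not use: truncating to a byte budget without splitting a multibyte character. truncate_utf8 is a hypothetical helper for cleaning data client-side before loading. It is not a BigQuery function, and the sample titles are invented.

```python
def byte_length(s: str) -> int:
    """UTF-8 byte length, analogous to BigQuery's BYTE_LENGTH()."""
    return len(s.encode("utf-8"))


def truncate_utf8(s: str, max_bytes: int) -> str:
    """Hypothetical helper: longest prefix of s whose UTF-8 form fits in max_bytes."""
    encoded = s.encode("utf-8")
    if len(encoded) <= max_bytes:
        return s
    # Cutting a valid UTF-8 byte string can only leave an incomplete character
    # at the very end; errors="ignore" drops those trailing partial bytes.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


titles = ["Main_Page", "維" * 200, "\U0001F600" * 400]

# The answer's approach: drop rows, as WHERE BYTE_LENGTH(title) < 300 would.
kept = [t for t in titles if byte_length(t) < 300]

# Alternative (assumption, not from the answer): keep every row, cut by bytes.
truncated = [truncate_utf8(t, 300) for t in titles]

print(kept)
print([byte_length(t) for t in truncated])
```

Filtering keeps only "Main_Page", while byte-based truncation keeps all three rows at 9, 300 and 300 bytes. Dropping rows loses data but keeps the SQL simple. Truncating preserves the rows but means some titles no longer match their originals.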