Google BigQuery unable to process larger result set: "Response too large to return" or "Resources exceeded during query execution"

google-bigquery

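I am currently working with a large table (~105M records) in my C# application.

When I query the table with an 'Order by' or 'Order Each by' clause, I get a "Resources exceeded during query execution" error.
If I remove the 'Order by' or 'Order Each by' clause, I get a "Response too large to return" error instead.

Here are sample queries for the two scenarios (I am using the Wikipedia public table):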

SELECT Id, Title, COUNT(*) FROM [publicdata:samples.wikipedia] GROUP EACH BY Id, Title ORDER BY Id, Title DESC
SELECT Id, Title, COUNT(*) FROM [publicdata:samples.wikipedia] GROUP EACH BY Id, Title

Here are my questions:

What is the maximum size of a BigQuery response?
How do we select all the records in a query request, rather than using the 'Export Method'?

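1. What is the maximum size of a BigQuery response?

As mentioned in the quota policy for queries, the maximum response size is 10 GB compressed (unlimited when returning large query results).

2. How do we select all the records in a query request, rather than using the 'Export Method'?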

If you plan to run a query that might return larger results, you can set allowLargeResults to true in your job configuration.

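Queries that return large results take longer to execute, even if the result set is small, and are subject to additional limitations:

You must specify a destination table.
You can't specify a top-level ORDER BY, TOP, or LIMIT clause. Doing so negates the benefit of using allowLargeResults, because the query output can no longer be computed in parallel.
Window functions can return large query results only if used in conjunction with a PARTITION BY clause.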
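
To make the configuration concrete, here is a minimal sketch of a request body for the REST jobs.insert call with allowLargeResults enabled. It only builds the JSON and does not send it. The project, dataset, and table names are placeholders I made up, and the helper function is hypothetical; the field names under configuration.query are the ones the answer refers to.

```python
import json


def build_large_result_job(project_id, dataset_id, table_id, sql):
    """Build a jobs.insert request body for a query that may return a large result.

    Hypothetical helper: it only assembles the JSON body, it does not call BigQuery.
    """
    return {
        "configuration": {
            "query": {
                "query": sql,
                # The sample queries use legacy SQL syntax ([project:dataset.table]).
                "useLegacySql": True,
                # Lift the response size limit; requires a destination table.
                "allowLargeResults": True,
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
                # Assumption: overwrite the destination table on each run.
                "writeDisposition": "WRITE_TRUNCATE",
            }
        }
    }


# Placeholder identifiers; note there is no top-level ORDER BY, TOP, or LIMIT.
job_body = build_large_result_job(
    "my-project",
    "my_dataset",
    "wikipedia_counts",
    "SELECT Id, Title, COUNT(*) FROM [publicdata:samples.wikipedia] "
    "GROUP EACH BY Id, Title",
)
print(json.dumps(job_body, indent=2))
```

Once the job finishes, the rows sit in the destination table and can be read back page by page instead of arriving in a single response.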

Read more about how to page through the results here, and also read the BigQuery Analytics book, starting at page 200, where it explains how Jobs::getQueryResults works together with the maxResults parameter and its blocking mode.
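
As a rough illustration of the paging pattern, the sketch below loops until the response no longer carries a pageToken. The fetch_page callable is hypothetical and stands in for whatever client code wraps jobs.getQueryResults with maxResults and pageToken; a fake in-memory fetcher is used so the loop can be run without credentials. Real responses encode each row as nested "f"/"v" fields, which is simplified to plain values here.

```python
def iter_result_rows(fetch_page, max_results=1000):
    """Yield every row of a finished or running query job, one page at a time.

    fetch_page is a hypothetical callable taking max_results and page_token and
    returning a dict shaped like a getQueryResults response.
    """
    page_token = None
    while True:
        response = fetch_page(max_results=max_results, page_token=page_token)
        if not response.get("jobComplete", True):
            # The job is still running; ask again with the same token.
            continue
        for row in response.get("rows", []):
            yield row
        page_token = response.get("pageToken")
        if not page_token:
            # No token means this was the last page.
            break


def make_fake_fetcher(all_rows):
    """Build an in-memory stand-in for the API call, for demonstration only."""
    def fetch_page(max_results, page_token):
        start = int(page_token) if page_token else 0
        end = start + max_results
        response = {"jobComplete": True, "rows": all_rows[start:end]}
        if end < len(all_rows):
            response["pageToken"] = str(end)
        return response
    return fetch_page


rows = list(iter_result_rows(make_fake_fetcher(list(range(25))), max_results=10))
print(len(rows))
```

A smaller maxResults keeps each response light at the cost of more round trips.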

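Update:

Query Result Size Limitations

When you run a normal query in BigQuery, the response size is limited to 10 GB of compressed data. Sometimes, it is hard to know what 10 GB of compressed data means. Does it get compressed 2x? 10x? The results are compressed within their respective columns, which means the compression ratio tends to be very good. For example, if you have one column that is the name of a country, there will likely be only a few different values. When you have only a few distinct values, there isn't a lot of unique information, and the column will generally compress well. If you return encrypted blobs of data, they will likely not compress well because they will be mostly random. (This is explained on page 220 of the book linked above.)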
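
To get a feel for the difference, here is a small experiment that compresses a low-cardinality "country" column and a column of random bytes of the same size. It uses zlib purely as a stand-in; it is not the compression BigQuery applies, and the country names and row count are made up for the example.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Return original size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data))


# A column with only a few distinct values, repeated many times.
countries = ["France", "Germany", "Japan", "Brazil"]
country_column = "\n".join(
    countries[i % len(countries)] for i in range(100_000)
).encode("utf-8")

# Random bytes of the same length, standing in for encrypted blobs.
random_column = os.urandom(len(country_column))

country_ratio = compression_ratio(country_column)
random_ratio = compression_ratio(random_column)

print(f"country column ratio: {country_ratio:.1f}x")
print(f"random column ratio:  {random_ratio:.2f}x")
```

The country column shrinks by a large factor while the random column barely changes, so the same 10 GB compressed limit can correspond to very different amounts of raw data.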
