"Flatten" a table's repeated field(s) and aggregate

Question

While I'm used to PostgreSQL and many other SQL dialects, this one has me a bit stumped:

I have a BigQuery table that looks like this:

orders
|- orderId
|- orderStatus
|- orderLines
   |- sku
   |- price_per_item
   |- quantity

In legacy SQL I would have done:

select orderLines.sku, sum(orderLines.price_per_item * quantity)
from flatten(orders, orderLines.sku) o
where orderStatus = 'valid'

But "flatten" doesn't work in standard SQL.

So I can do this instead:

select array(select sku FROM UNNEST(orderLines)) sku, array(select price_per_item from unnest(orderLines)) revenue
from orders

But now I can't sum it up, i.e.:

select array(select sku FROM UNNEST(orderLines)) sku, sum(array(select price_per_item from unnest(orderLines))) revenue
from orders
group by sku
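Why this attempt can't aggregate becomes clearer when you look at the shape of the intermediate result. The ARRAY(SELECT ... FROM UNNEST(orderLines)) subqueries run once per order. So the output still has one row per order, and each column holds a whole array rather than a single value. SUM needs one number per row, and there is no per-sku row to group on. Below is a rough Python model of that shape, using hypothetical sample data. It mirrors the query's logic only; it is not how BigQuery evaluates it.

```python
# Hypothetical sample rows: each order holds its lines as a list of dicts,
# mirroring the repeated orderLines field.
orders = [
    {"orderId": 1, "orderStatus": "valid",
     "orderLines": [{"sku": "A", "price_per_item": 2.0, "quantity": 3},
                    {"sku": "B", "price_per_item": 5.0, "quantity": 1}]},
    {"orderId": 2, "orderStatus": "valid",
     "orderLines": [{"sku": "A", "price_per_item": 2.0, "quantity": 1}]},
]

# Rough equivalent of:
#   select array(select sku FROM UNNEST(orderLines)) sku,
#          array(select price_per_item from unnest(orderLines)) revenue
#   from orders
# The subqueries are evaluated per order, so the result keeps one row per
# order and every column is a list, not a scalar that SUM could add up.
rows = [
    {"sku": [line["sku"] for line in o["orderLines"]],
     "revenue": [line["price_per_item"] for line in o["orderLines"]]}
    for o in orders
]

for row in rows:
    print(row)
# {'sku': ['A', 'B'], 'revenue': [2.0, 5.0]}
# {'sku': ['A'], 'revenue': [2.0]}
```

Note also that the two arrays are built independently, so quantity has already been dropped by this point.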

I tried using a WITH clause to pre-create the table, but the result was the same.

What is the correct way to do this? And why does it seem so unnecessarily verbose?

I'm a bit annoyed at having to use legacy SQL, because I'm also using a function in a join that only works in standard SQL.

Answer 1

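If you are familiar with arrays in PostgreSQL, you have probably used the UNNEST operator before. Here you need it to join the array with the table itself, which flattens the repeated field: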

select orderLine.sku, sum(orderLine.price_per_item * quantity)
from orders, UNNEST(orderLines) AS orderLine
where orderStatus = 'valid'
GROUP BY sku
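The comma join with UNNEST can be modelled in plain Python. This is only a sketch of the row semantics, not how BigQuery executes the query. The sample data is hypothetical and includes a cancelled order and an order with no lines, so that the WHERE filter and the empty-array case are both visible.

```python
from collections import defaultdict

# Hypothetical sample data mirroring the orders table; orderLines is a list
# of dicts standing in for the repeated STRUCT field.
orders = [
    {"orderId": 1, "orderStatus": "valid",
     "orderLines": [{"sku": "A", "price_per_item": 2.0, "quantity": 3},
                    {"sku": "B", "price_per_item": 5.0, "quantity": 1}]},
    {"orderId": 2, "orderStatus": "valid",
     "orderLines": [{"sku": "A", "price_per_item": 2.0, "quantity": 1}]},
    {"orderId": 3, "orderStatus": "cancelled",
     "orderLines": [{"sku": "B", "price_per_item": 5.0, "quantity": 4}]},
    {"orderId": 4, "orderStatus": "valid", "orderLines": []},
]

# from orders, UNNEST(orderLines) AS orderLine
# The comma is a cross join correlated with the current order: each line
# becomes its own row, paired with its parent order's columns. An order
# with an empty orderLines array produces no rows at all.
flattened = [(order, line) for order in orders for line in order["orderLines"]]

# where orderStatus = 'valid'
valid = [(order, line) for order, line in flattened
         if order["orderStatus"] == "valid"]

# GROUP BY sku, with sum(orderLine.price_per_item * quantity) per group.
revenue = defaultdict(float)
for _, line in valid:
    revenue[line["sku"]] += line["price_per_item"] * line["quantity"]

print(len(flattened))  # 4 flattened rows (order 4 contributes none)
print(dict(revenue))   # {'A': 8.0, 'B': 5.0}
```

Once every order line is its own row, the aggregation is an ordinary GROUP BY. That is why UNNEST belongs in the FROM clause, before SUM runs, rather than inside the aggregate.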

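(I added the GROUP BY since it looked like it was missing.) For more information about arrays, including examples that use UNNEST, see the documentation. If you are used to using legacy SQL in BigQuery, there is a migration guide that describes the differences between legacy and standard SQL in BigQuery. It covers flattening as well as other topics.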
