Cube Operators on Impala

Question

While benchmarking Impala against PrestoDB, we noticed that building pivot tables in Impala is very difficult, because it has no CUBE operator like Presto does. Here are two examples from Presto:

The CUBE operator generates all possible grouping sets (i.e. a power set) for a given set of columns. For example, the query:

SELECT origin_state, destination_state, sum(package_weight)
FROM shipping
GROUP BY CUBE (origin_state, destination_state);

is equivalent to:

SELECT origin_state, destination_state, sum(package_weight)
FROM shipping
GROUP BY GROUPING SETS (
 (origin_state, destination_state),
 (origin_state),
 (destination_state),
 ());

Another example is the ROLLUP operator. The full documentation is here: https://prestodb.io/docs/current/sql/select.html

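This is not just syntactic sugar, because Presto performs a single table scan for the whole query. With this operator you can build a pivot table in one request, whereas Impala needs to run 2-3 queries.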
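To illustrate the multi-query form, here is a rough sketch of the CUBE example written without grouping sets: one aggregate per grouping set, combined with UNION ALL. It assumes the two state columns are STRING (hence the casts), and `total_weight` is just an illustrative alias. Each branch reads `shipping` on its own, which is where the extra scans would come from.

```sql
-- One SELECT per grouping set; NULL marks a column that was rolled up.
-- Assumption: origin_state and destination_state are STRING columns.
SELECT origin_state, destination_state, SUM(package_weight) AS total_weight
FROM shipping
GROUP BY origin_state, destination_state
UNION ALL
SELECT origin_state, CAST(NULL AS STRING), SUM(package_weight)
FROM shipping
GROUP BY origin_state
UNION ALL
SELECT CAST(NULL AS STRING), destination_state, SUM(package_weight)
FROM shipping
GROUP BY destination_state
UNION ALL
-- Grand total: the empty grouping set ().
SELECT CAST(NULL AS STRING), CAST(NULL AS STRING), SUM(package_weight)
FROM shipping;
```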

Is there a way to do this in Impala with one query/table scan instead of 3? Otherwise, performance becomes very poor when building any kind of pivot table.

Answer 1

We can use Impala window functions, but you will get 3 columns instead of a single-column output.

SELECT origin_state,
        destination_state,
        SUM(package_weight) OVER (PARTITION BY origin_state, destination_state) AS pkgwgrbyorganddest,
        SUM(package_weight) OVER (PARTITION BY origin_state) AS pkgwgrbyorg,
        SUM(package_weight) OVER (PARTITION BY destination_state) AS pkgwgrbydest
 FROM shipping;
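If the result needs the same shape as the CUBE output (a single aggregate column, one row per group), another possible approach is to cross join `shipping` with a tiny inline table that lists the grouping sets, then group on the masked columns. This is only a sketch: the flag names `by_origin` and `by_dest` and the alias `total_weight` are made up for illustration, and I have not benchmarked it. It references `shipping` once, but every row is multiplied by four before aggregation, so it is worth checking the plan with EXPLAIN.

```sql
SELECT
  -- Mask a column to NULL when the current grouping set does not include it.
  CASE WHEN g.by_origin = 1 THEN s.origin_state END      AS origin_state,
  CASE WHEN g.by_dest   = 1 THEN s.destination_state END AS destination_state,
  SUM(s.package_weight)                                  AS total_weight
FROM shipping s
CROSS JOIN (
  -- One row per grouping set of CUBE (origin_state, destination_state).
  SELECT 1 AS by_origin, 1 AS by_dest   -- (origin_state, destination_state)
  UNION ALL SELECT 1, 0                 -- (origin_state)
  UNION ALL SELECT 0, 1                 -- (destination_state)
  UNION ALL SELECT 0, 0                 -- ()
) g
GROUP BY
  -- Grouping on the flags keeps subtotal rows apart from rows whose
  -- state value is genuinely NULL in the data.
  g.by_origin,
  g.by_dest,
  CASE WHEN g.by_origin = 1 THEN s.origin_state END,
  CASE WHEN g.by_dest   = 1 THEN s.destination_state END;
```

For ROLLUP (origin_state, destination_state), the same sketch should work with the (0, 1) row removed from the inline table.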

Tags: performance, hadoop, cloudera, presto, impala