Cube Operators on Impala
While benchmarking Impala against PrestoDB, we noticed that building pivot tables in Impala is very difficult, because it has no CUBE operator like the one in Presto. Here are two examples from Presto:
The CUBE operator generates all possible grouping sets (i.e. a power set) for a given set of columns. For example, the query:
SELECT origin_state, destination_state, sum(package_weight)
FROM shipping
GROUP BY CUBE (origin_state, destination_state);
is equivalent to:
SELECT origin_state, destination_state, sum(package_weight)
FROM shipping
GROUP BY GROUPING SETS (
(origin_state, destination_state),
(origin_state),
(destination_state),
());
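Another example is the ROLLUP operator. The full documentation is here: https://prestodb.io/docs/current/sql/select.html
This is not just syntactic sugar, because Presto performs a single table scan for the whole query. With this operator you can build a pivot table in one request, whereas Impala needs to run 2-3 queries.
Is there a way to do this in Impala with one query/table scan instead of 3? Otherwise performance becomes terrible when building any kind of pivot table.
We can use Impala window functions, but you will get 3 output columns instead of a single one.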
SELECT origin_state,
destination_state,
SUM(package_weight) OVER (PARTITION BY origin_state, destination_state) AS pkgwgrbyorganddest,
SUM(package_weight) OVER (PARTITION BY origin_state) AS pkgwgrbyorg,
SUM(package_weight) OVER (PARTITION BY destination_state) AS pkgwgrbydest
FROM shipping;
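If a single output column is needed, one possible workaround is to cross join shipping with a tiny four-row table of grouping flags and NULL out the columns that a given grouping set drops. The sketch below is only an illustration: it runs the SQL against an in-memory SQLite database from Python (not against Impala), the sample rows and the names grouping_flags, keep_origin, keep_dest, total_weight and cube_totals are made up for the example, and whether Impala plans this as one scan of shipping, or runs it faster than separate queries, is an assumption that would need to be checked with EXPLAIN.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the queries above.
ROWS = [
    ("California", "New York", 10),
    ("California", "Texas", 5),
    ("New York", "Texas", 7),
    ("New York", "New York", 3),
]

# Emulates GROUP BY CUBE (origin_state, destination_state): every shipping row
# is paired with four flag rows, one per grouping set, and a column is replaced
# by NULL whenever its flag is 0. Grouping by the flags as well keeps the
# grouping sets apart even if the data itself contains NULLs.
CUBE_SQL = """
WITH grouping_flags AS (
    SELECT 1 AS keep_origin, 1 AS keep_dest UNION ALL
    SELECT 1, 0 UNION ALL
    SELECT 0, 1 UNION ALL
    SELECT 0, 0
)
SELECT CASE WHEN g.keep_origin = 1 THEN s.origin_state END AS origin_state,
       CASE WHEN g.keep_dest = 1 THEN s.destination_state END AS destination_state,
       SUM(s.package_weight) AS total_weight
FROM shipping AS s
CROSS JOIN grouping_flags AS g
GROUP BY g.keep_origin,
         g.keep_dest,
         CASE WHEN g.keep_origin = 1 THEN s.origin_state END,
         CASE WHEN g.keep_dest = 1 THEN s.destination_state END
"""


def cube_totals(rows):
    """Return {(origin_state, destination_state): total_weight} for all grouping sets."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE shipping ("
            "origin_state TEXT, destination_state TEXT, package_weight INTEGER)"
        )
        conn.executemany("INSERT INTO shipping VALUES (?, ?, ?)", rows)
        result = conn.execute(CUBE_SQL).fetchall()
    finally:
        conn.close()
    # None in a key position means that column was rolled up.
    return {(origin, dest): total for origin, dest, total in result}


if __name__ == "__main__":
    for key, total in cube_totals(ROWS).items():
        print(key, total)
```

The trade-off is that every input row is multiplied by the number of grouping sets (4 here, 2^n for n cubed columns) before aggregation, so this only makes sense for a small number of columns. Unlike the window-function query, it returns one row per group rather than one row per input row.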