Partition By 2 Columns in BigQuery

Say I want to rank the top 3 months by revenue for each unit, along with the revenue amount. I can do the following without any problem:

SELECT *
FROM (SELECT Flat,
             EXTRACT(YEAR FROM pay_day) AS Year,
             EXTRACT(MONTH FROM pay_day) AS Month,
             RANK() OVER (PARTITION BY Flat ORDER BY SUM(USD_amt) DESC) AS rank,
             SUM(USD_amt) AS Revenue
      FROM `finances.reservations` AS f
      GROUP BY Flat, Year, Month)
WHERE rank <= 3 AND Flat IS NOT NULL
ORDER BY Flat, Year, Rank, Month ASC

Result:

   Flat Year Month Rank Revenue
    1   2019    12  1   3281
    1   2019    4   2   3031
    1   2020    7   3   3019
    1A  2020    11  1   1805
    1A  2020    9   2   1178
    1A  2020    5   3   1166
    2   2020    1   1   3419
    2   2020    7   2   2644
    2   2021    1   3   2460
    3   2019    10  3   2466
    3   2020    6   1   2558
    3   2020    1   2   2530
    4   2020    7   1   0

But now say I want the top 3 months per unit per year. I thought I would only need to add the year to the partition, as follows:

RANK() OVER(PARTITION BY Flat, EXTRACT(YEAR FROM pay_day) ORDER BY SUM(USD_amt) DESC) AS rank

I would expect the result to look like this:

   Flat Year Month Rank Revenue
    1   2019    12  1   3281
    1   2019    4   2   3031
    1   2019    1   3   2031
    1   2020    4   1   3031
    1   2020    9   2   3001
    1   2020    7   3   2919

But this results in the error "PARTITION BY expression references column pay_day which is neither grouped nor aggregated at [4:50]". What am I doing wrong?

The table schema is as follows:

Field name            | Type    | Mode
----------------------+---------+------------
Flat                  | STRING  | NULLABLE
pay_day               | DATE    | NULLABLE
nights                | INTEGER | NULLABLE
check_in              | DATE    | NULLABLE
check_out             | DATE    | NULLABLE
nights__in_month_     | STRING  | NULLABLE
nights_outside_month_ | STRING  | NULLABLE
cleaning              | INTEGER | NULLABLE
currency              | STRING  | NULLABLE
USD_amt               | INTEGER | NULLABLE
EANR                  | STRING  | NULLABLE
name                  | STRING  | NULLABLE
people                | INTEGER | NULLABLE
country               | STRING  | NULLABLE
reservation_no_       | STRING  | NULLABLE
payment_processor     | STRING  | NULLABLE
Check_in_day          | STRING  | NULLABLE
Cleaner               | STRING  | NULLABLE
Review                | STRING  | NULLABLE

I think you can reference the column alias in BigQuery:

SELECT *
FROM (SELECT Flat, EXTRACT(YEAR FROM pay_day) AS Year,
             EXTRACT(MONTH FROM pay_day) AS Month,
             RANK() OVER (PARTITION BY Flat, Year ORDER BY SUM(USD_amt) DESC) AS rank, 
             SUM(USD_amt) AS Revenue
      FROM `finances.reservations` AS f
      GROUP BY Flat, Year, Month
     ) ym
WHERE rank <= 3 AND Flat IS NOT NULL
ORDER BY Flat, Year, Rank, Month ASC

If not, you can use:

RANK() OVER (PARTITION BY Flat, EXTRACT(YEAR FROM MIN(pay_day))
             ORDER BY SUM(USD_amt) DESC
            ) AS rank, 

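Every date within a GROUP BY group falls in the same year, so the year of the minimum date is the group's year and this is effectively the same thing. I realize this is how I usually handle this problem.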
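
If neither form is accepted, another option (a different technique from the two above) is to aggregate first and rank afterwards. The window function then sees Year as an ordinary column rather than an expression over the ungrouped pay_day. This is a minimal sketch assuming the same `finances.reservations` table; the CTE name `monthly` is made up for illustration, and the query has not been run against the real data.

```sql
-- Step 1: aggregate revenue per flat, year and month.
WITH monthly AS (
  SELECT Flat,
         EXTRACT(YEAR FROM pay_day) AS Year,
         EXTRACT(MONTH FROM pay_day) AS Month,
         SUM(USD_amt) AS Revenue
  FROM `finances.reservations`
  WHERE Flat IS NOT NULL
  GROUP BY Flat, Year, Month
)
-- Step 2: rank the months within each (Flat, Year) partition.
-- Year and Revenue are plain columns of the CTE at this point.
SELECT *
FROM (SELECT Flat, Year, Month,
             RANK() OVER (PARTITION BY Flat, Year ORDER BY Revenue DESC) AS rank,
             Revenue
      FROM monthly)
WHERE rank <= 3
ORDER BY Flat, Year, rank, Month
```

Filtering out NULL flats inside the CTE should not change the ranking, because rows with a NULL Flat would form their own partition anyway.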