SQL - partition by one column and some field types
My table is much bigger, but cut short it looks like this:
+--------+---+----------+--------+------------+---
|distance|qtt|deliver_by| store  |deliver_time| ...
+--------+---+----------+--------+------------+---
|      11|  1| pa       | store_a|        1111|
|     123|  2| pa       | store_a|        1112|
|      33|  3| pb       | store_a|        1113|
|      33|  2| pa       | store_b|        2221|
|      44|  2| pb       | store_b|        2222|
|       5|  2| pc       | store_b|        2223|
|       5|  2| pc       | store_b|        2224|
|       6|  5| pb       | store_c|        3331|
|       7|  5| pb       | store_c|        3332|
+--------+---+----------+--------+------------+---
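There are several stores, but only 3 possible deliverers (deliver_by: pa, pb and pc), each delivering products at specific times. Consider deliver_time to be a timestamp.
I want to select the whole table and add 6 new columns: the min and max deliver_time per store for each deliver_by.
A store can be served by any of the 3 deliverers (pa, pb, pc), but not necessarily by all of them.
I can almost get the right result with the query below. The problem is that when deliver_by pX does not exist for a store, I do not get null but the min/max of the other deliveries to that store.
I would really like to use partition by, so I wrote this to add the new min/max columns: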
select
min(deliver_time) over (partition by store, deliver_by='pa') as min_time_sd_pa
, max(deliver_time) over (partition by store, deliver_by='pa') as max_time_sd_pa
, min(deliver_time) over (partition by store, deliver_by='pb') as min_time_sd_pb
, max(deliver_time) over (partition by store, deliver_by='pb') as max_time_sd_pb
, min(deliver_time) over (partition by store, deliver_by='pc') as min_time_sd_pc
, max(deliver_time) over (partition by store, deliver_by='pc') as max_time_sd_pc
, distance, qtt, ....
from mytable
The correct output would be:
min_time_sd_pa|max_time_sd_pa|min_time_sd_pb|max_time_sd_pb|min_time_sd_pc|max_time_sd_pc|distance|qtt|deliver_by| store  |deliver_time
--------------+--------------+--------------+--------------+--------------+--------------+--------+---+----------+--------+------------
          1111|          1112|          1113|          1113|          null|          null|      11|  1| pa       | store_a|        1111
          1111|          1112|          1113|          1113|          null|          null|     123|  2| pa       | store_a|        1112
          1111|          1112|          1113|          1113|          null|          null|      33|  3| pb       | store_a|        1113
          2221|          2221|          2222|          2222|          2223|          2224|      33|  2| pa       | store_b|        2221
          2221|          2221|          2222|          2222|          2223|          2224|      44|  2| pb       | store_b|        2222
          2221|          2221|          2222|          2222|          2223|          2224|       5|  2| pc       | store_b|        2223
          2221|          2221|          2222|          2222|          2223|          2224|       5|  2| pc       | store_b|        2224
          null|          null|          3331|          3332|          null|          null|       6|  5| pb       | store_c|        3331
          null|          null|          3331|          3332|          null|          null|       7|  5| pb       | store_c|        3332
--------------+--------------+--------------+--------------+--------------+--------------+--------+---+----------+--------+------------
What am I missing in my select min(..) over.. statement, or how can I achieve this result in the simplest way?
I am using Hive QL, but I guess this is generic across most SQL DBMSs.
Thanks
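You can use a case expression inside min and max, and partition by store only. Rows for other deliverers evaluate to null, which the aggregates ignore, so a store with no pX deliveries gets null in the pX columns.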
select
min(case when deliver_by='pa' then deliver_time end) over (partition by store) as min_time_sd_pa
,max(case when deliver_by='pa' then deliver_time end) over (partition by store) as max_time_sd_pa
,min(case when deliver_by='pb' then deliver_time end) over (partition by store) as min_time_sd_pb
,max(case when deliver_by='pb' then deliver_time end) over (partition by store) as max_time_sd_pb
,min(case when deliver_by='pc' then deliver_time end) over (partition by store) as min_time_sd_pc
,max(case when deliver_by='pc' then deliver_time end) over (partition by store) as max_time_sd_pc
,m.*
from mytable m
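To see the difference on a small scale, here is a minimal sketch that runs both approaches side by side for pa only. The inline CTE named deliveries and the column min_by_flag are made up for illustration; the rows are a subset of the question's sample. It assumes an engine that accepts a select without from inside a CTE and a boolean expression in partition by (the question suggests Hive does), so the syntax may need adjusting elsewhere.

```sql
-- Hypothetical inline sample: the store_a and store_c rows from the question.
with deliveries as (
    select 'store_a' as store, 'pa' as deliver_by, 1111 as deliver_time
    union all select 'store_a', 'pa', 1112
    union all select 'store_a', 'pb', 1113
    union all select 'store_c', 'pb', 3331
    union all select 'store_c', 'pb', 3332
)
select
    d.store
  , d.deliver_by
  , d.deliver_time
    -- case without else returns null for non-'pa' rows; min/max skip nulls,
    -- so a store with no 'pa' rows ends up with null
  , min(case when d.deliver_by = 'pa' then d.deliver_time end)
        over (partition by d.store) as min_time_sd_pa
  , max(case when d.deliver_by = 'pa' then d.deliver_time end)
        over (partition by d.store) as max_time_sd_pa
    -- the question's approach, for comparison: the boolean splits each store
    -- into a 'pa' group and a 'not pa' group, and each row only sees its own group
  , min(d.deliver_time)
        over (partition by d.store, d.deliver_by = 'pa') as min_by_flag
from deliveries d
order by d.store, d.deliver_time
```

On this sample, min_time_sd_pa and max_time_sd_pa should come out as 1111 and 1112 on all three store_a rows and as null on both store_c rows. min_by_flag, by contrast, is 1113 on the store_a pb row and 3331 on the store_c rows, which is the non-null value the question was running into.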