Hive - 根据收集时间提取第一条和最后一条记录的数据
Hive - pull the data for first and last records based on collection time
我有一个配置单元 table,我们每天都在其中获取带有收集时间(unix 时间戳)的数据
+--------+-----------------+------+----------+
| ticket | collection_time | type | day |
+--------+-----------------+------+----------+
| t1 | 123 | auto | 20170302 |
| t2 | 234 | req | 20170302 |
| t3 | 345 | req | 20170302 |
| t4 | 678 | auto | 20170303 |
| t5 | 111 | req | 20170301 |
| t6 | 222 | auto | 20170301 |
| t7 | 333 | auto | 20170301 |
| t8 | 444 | req | 20170301 |
+--------+-----------------+------+----------+
我想查询一天的总票数以及一天中第一次取票和最后一次取票的票数(取票时间和类型)。请提出使用配置单元查询解决此问题的任何方法。
+---------------+---------------------+----------------+--------------------+---------------+----------+
| count(ticket) | first_tkt_coll_time | first_tkt_type | last_tkt_coll_time | last_tkt_type | day |
+---------------+---------------------+----------------+--------------------+---------------+----------+
| 3 | 123 | auto | 345 | req20170302 | |
| 1 | 678 | auto | 678 | auto | 20170303 |
| 4 | 111 | req | 444 | req | 20170301 |
+---------------+---------------------+----------------+--------------------+---------------+----------+
我们还需要在同一输出集中输入第二天的计数(票)。
+---------+-------------+-----------------+------+-----------------+------+-----+
| cnt_day | cnt_nxt_day | collection_time | type | collection_time | type | day |
+---------+-------------+-----------------+------+-----------------+------+-----+
| 4 | 3| 111 | req | 444 | req | 20170301 |
| 3 | 1| 123 | auto | 345 | req | 20170302 |
| 1 | - | 678 | auto | 678 | auto | 20170303 |
+---------+-------------+-----------------+------+-----------------+------+-----+
提前谢谢你:)
select cnt
,cnt_next_day
,min_val.collection_time
,min_val.type
,max_val.collection_time
,max_val.type
,day
from (select count(*) as cnt
,lead (count(*)) over (order by day) as cnt_next_day
,min(named_struct('collection_time',collection_time,'type',type)) as min_val
,max(named_struct('collection_time',collection_time,'type',type)) as max_val
,day
from mytable
group by day
) t
+-----+--------------+-----------------+------+-----------------+------+----------+
| cnt | cnt_next_day | collection_time | type | collection_time | type | day |
+-----+--------------+-----------------+------+-----------------+------+----------+
| 4 | 3 | 111 | req | 444 | req | 20170301 |
| 3 | 1 | 123 | auto | 345 | req | 20170302 |
| 1 | NULL | 678 | auto | 678 | auto | 20170303 |
+-----+--------------+-----------------+------+-----------------+------+----------+
我有一个配置单元 table,我们每天都在其中获取带有收集时间(unix 时间戳)的数据
+--------+-----------------+------+----------+
| ticket | collection_time | type | day |
+--------+-----------------+------+----------+
| t1 | 123 | auto | 20170302 |
| t2 | 234 | req | 20170302 |
| t3 | 345 | req | 20170302 |
| t4 | 678 | auto | 20170303 |
| t5 | 111 | req | 20170301 |
| t6 | 222 | auto | 20170301 |
| t7 | 333 | auto | 20170301 |
| t8 | 444 | req | 20170301 |
+--------+-----------------+------+----------+
我想查询一天的总票数以及一天中第一次取票和最后一次取票的票数(取票时间和类型)。请提出使用配置单元查询解决此问题的任何方法。
+---------------+---------------------+----------------+--------------------+---------------+----------+
| count(ticket) | first_tkt_coll_time | first_tkt_type | last_tkt_coll_time | last_tkt_type | day |
+---------------+---------------------+----------------+--------------------+---------------+----------+
| 3 | 123 | auto | 345 | req20170302 | |
| 1 | 678 | auto | 678 | auto | 20170303 |
| 4 | 111 | req | 444 | req | 20170301 |
+---------------+---------------------+----------------+--------------------+---------------+----------+
我们还需要在同一输出集中输入第二天的计数(票)。
+---------+-------------+-----------------+------+-----------------+------+-----+
| cnt_day | cnt_nxt_day | collection_time | type | collection_time | type | day |
+---------+-------------+-----------------+------+-----------------+------+-----+
| 4 | 3| 111 | req | 444 | req | 20170301 |
| 3 | 1| 123 | auto | 345 | req | 20170302 |
| 1 | - | 678 | auto | 678 | auto | 20170303 |
+---------+-------------+-----------------+------+-----------------+------+-----+
提前谢谢你:)
select cnt
,cnt_next_day
,min_val.collection_time
,min_val.type
,max_val.collection_time
,max_val.type
,day
from (select count(*) as cnt
,lead (count(*)) over (order by day) as cnt_next_day
,min(named_struct('collection_time',collection_time,'type',type)) as min_val
,max(named_struct('collection_time',collection_time,'type',type)) as max_val
,day
from mytable
group by day
) t
+-----+--------------+-----------------+------+-----------------+------+----------+
| cnt | cnt_next_day | collection_time | type | collection_time | type | day |
+-----+--------------+-----------------+------+-----------------+------+----------+
| 4 | 3 | 111 | req | 444 | req | 20170301 |
| 3 | 1 | 123 | auto | 345 | req | 20170302 |
| 1 | NULL | 678 | auto | 678 | auto | 20170303 |
+-----+--------------+-----------------+------+-----------------+------+----------+