HIVE, SELECT UNION
I'm just getting started with big data. How would I combine (UNION) two SELECT ... FROM table_name GROUP BY queries in Hive?
+--------+------------------+---------+
| rating | date_upd         | version |
+--------+------------------+---------+
|      3 | 2021-07-01 12:13 | 2.1.9   |
|      5 | 2021-07-01 10:39 | 2.2.6   |
|      4 | 2021-07-02 10:24 | 2.2.7   |
|      5 | 2021-07-02 05:37 | 3.2.4   |
|      1 | 2021-07-02 21:40 | 3.2.5   |
+--------+------------------+---------+
SELECT substr('date_upd',1,10) as 'day',
count(*) cnt
FROM tbl_one
GROUP BY
substr(date_upd,1,10);
SELECT substr('date_upd',1,7) as 'month',
count(*) cnt
FROM table_name
GROUP BY
substr('date_upd',1,7);
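To answer the title question directly, the two aggregates can also be stacked with UNION ALL, adding a label column so the daily and monthly rows can be told apart. The sketch below is a local check under stated assumptions: it runs the SQL through Python's built-in sqlite3 module as a stand-in for Hive, on the sample rows above, using the hypothetical table name table_name and the hypothetical helper daily_and_monthly_counts. The SQL text is intended to be plain HiveQL as well, but verify it against your own Hive version.

```python
import sqlite3

# Sample rows copied from the question's table.
ROWS = [
    (3, "2021-07-01 12:13", "2.1.9"),
    (5, "2021-07-01 10:39", "2.2.6"),
    (4, "2021-07-02 10:24", "2.2.7"),
    (5, "2021-07-02 05:37", "3.2.4"),
    (1, "2021-07-02 21:40", "3.2.5"),
]

# Daily and monthly counts stacked with UNION ALL. The column name is left
# unquoted so substr() reads the column, not a string literal.
UNION_QUERY = """
SELECT 'day' AS period_type, substr(date_upd, 1, 10) AS period, count(*) AS cnt
FROM table_name
GROUP BY substr(date_upd, 1, 10)
UNION ALL
SELECT 'month' AS period_type, substr(date_upd, 1, 7) AS period, count(*) AS cnt
FROM table_name
GROUP BY substr(date_upd, 1, 7)
"""


def daily_and_monthly_counts(rows):
    """Run UNION_QUERY against an in-memory SQLite copy of the rows (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE table_name (rating INT, date_upd TEXT, version TEXT)")
        conn.executemany("INSERT INTO table_name VALUES (?, ?, ?)", rows)
        # UNION ALL does not guarantee row order, so sort the result in Python.
        return sorted(conn.execute(UNION_QUERY).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    for row in daily_and_monthly_counts(ROWS):
        print(row)
```

UNION ALL keeps every row from both branches without a deduplication step, which is what you want here because a day row and a month row can never be duplicates of each other.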
If your date_upd is a timestamp, you can use extract to get the day, month and year.
SELECT extract(day from current_timestamp()) ;
SELECT extract(month from current_timestamp());
SELECT extract(year from current_timestamp()) ;
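If your date_upd is a string, you can keep the logic you are using. But I think your substr is wrong: writing 'date_upd' in single quotes makes it a string literal rather than a reference to the column, so it should be substr(date_upd,1,10) without the quotes.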
SELECT extract(day from from_unixtime(unix_timestamp('01/01/2021','MM/dd/yyyy'))) day_part;
...
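A quick way to see the quoting problem, again using SQLite through Python's sqlite3 module as a stand-in for Hive (both treat single-quoted text as a string literal): substr on the quoted name returns part of the text date_upd itself, while substr on the bare column returns the date prefix. The table name and the substr_demo helper are hypothetical.

```python
import sqlite3


def substr_demo():
    """Compare substr() on a quoted name (a string literal) with the bare column."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE table_name (date_upd TEXT)")
        conn.execute("INSERT INTO table_name VALUES ('2021-07-01 12:13')")
        return conn.execute(
            "SELECT substr('date_upd', 1, 10),"  # literal text, not the column
            " substr(date_upd, 1, 10),"           # first 10 chars of the column (day)
            " substr(date_upd, 1, 7)"             # first 7 chars of the column (month)
            " FROM table_name"
        ).fetchone()
    finally:
        conn.close()


if __name__ == "__main__":
    print(substr_demo())  # ('date_upd', '2021-07-01', '2021-07')
```

Because the quoted form yields the same constant on every row, grouping by it collapses the whole table into a single group.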
If you want the monthly count and the daily count in the same query, use the analytic count: count(*) over(...).
Demo:
with your_data as ( --Your data example, use real table instead of this CTE
select stack(5, --number of tuples to produce
3, '2021-07-01 12:13', '2.1.9',
5,'2021-07-01 10:39','2.2.6',
4,'2021-07-02 10:24','2.2.7',
5,'2021-07-02 05:37','3.2.4',
1,'2021-07-02 21:40','3.2.5'
) as (rating, date_upd, version)
)
select rating, date_upd, version,
substr(date_upd,1,10) as dt,
substr(date_upd,1,7) as mnth,
count(*) over(partition by substr(date_upd,1,10)) as day_cnt ,
count(*) over(partition by substr(date_upd,1,7)) as mnth_cnt
from your_data --Use your table instead of CTE
Result:
rating  date_upd          version  dt          mnth     day_cnt  mnth_cnt
4       2021-07-02 10:24  2.2.7    2021-07-02  2021-07  3        5
5       2021-07-02 05:37  3.2.4    2021-07-02  2021-07  3        5
1       2021-07-02 21:40  3.2.5    2021-07-02  2021-07  3        5
3       2021-07-01 12:13  2.1.9    2021-07-01  2021-07  2        5
5       2021-07-01 10:39  2.2.6    2021-07-01  2021-07  2        5
Just replace the your_data CTE with your real table name.
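To check the windowed counts locally without a Hive cluster, the same query can be run through Python's sqlite3 module as a stand-in. This is a sketch under assumptions: SQLite supports count(*) over(partition by ...) only from version 3.25 onward (recent Python builds bundle a newer version), and your_data is a plain table here instead of the stack() CTE, because stack() is a Hive function that SQLite does not have. The window_counts helper is hypothetical.

```python
import sqlite3

# Same five sample rows as the demo's CTE.
ROWS = [
    (3, "2021-07-01 12:13", "2.1.9"),
    (5, "2021-07-01 10:39", "2.2.6"),
    (4, "2021-07-02 10:24", "2.2.7"),
    (5, "2021-07-02 05:37", "3.2.4"),
    (1, "2021-07-02 21:40", "3.2.5"),
]

# Each row keeps its own columns and gains the size of its day and month partitions.
WINDOW_QUERY = """
SELECT rating, date_upd, version,
       substr(date_upd, 1, 10) AS dt,
       substr(date_upd, 1, 7)  AS mnth,
       count(*) OVER (PARTITION BY substr(date_upd, 1, 10)) AS day_cnt,
       count(*) OVER (PARTITION BY substr(date_upd, 1, 7))  AS mnth_cnt
FROM your_data
ORDER BY date_upd
"""


def window_counts(rows):
    """Run WINDOW_QUERY against an in-memory SQLite table (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE your_data (rating INT, date_upd TEXT, version TEXT)")
        conn.executemany("INSERT INTO your_data VALUES (?, ?, ?)", rows)
        return conn.execute(WINDOW_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in window_counts(ROWS):
        print(row)
```

The ORDER BY is added only to make the output deterministic; the Hive demo has none, so the row order in its result above is arbitrary. Unlike the UNION ALL approach, the window version keeps every source row, which is useful when you need the counts next to the original columns.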