HIVE, SELECT UNION

I'm just getting started with big data. What would the union of two SELECT ... table_name GROUP BY queries look like in Hive?

+--------+------------------+---------+
| rating |    date_upd      | version |
+--------+------------------+---------+
| 3      | 2021-07-01 12:13 | 2.1.9   |
| 5      | 2021-07-01 10:39 | 2.2.6   |
| 4      | 2021-07-02 10:24 | 2.2.7   |
| 5      | 2021-07-02 05:37 | 3.2.4   |
| 1      | 2021-07-02 21:40 | 3.2.5   |
+--------+------------------+---------+


SELECT substr('date_upd',1,10) as 'day',
       count(*) cnt 
FROM tbl_one 
GROUP BY
       substr(date_upd,1,10);


SELECT substr('date_upd',1,7) as 'month',
       count(*) cnt 
FROM table_name 
GROUP BY
      substr('date_upd',1,7);

If your date_upd is a timestamp, you can use extract to get the day, month, and year.

SELECT  extract(day from current_timestamp())  ;
SELECT  extract(month from current_timestamp());  
SELECT  extract(year from current_timestamp()) ; 
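
To tie this back to the counts in the question, here is a minimal sketch of a monthly count built on extract. It assumes date_upd is a TIMESTAMP column, that table_name stands in for your real table, and that your Hive version supports extract (as far as I know it was added in Hive 2.2.0). The aliases yr and mnth are placeholders.

```sql
-- Sketch only: assumes date_upd is a TIMESTAMP and table_name is your table.
-- Grouping by year and month keeps the same month of different years apart.
SELECT extract(year  from date_upd) AS yr,
       extract(month from date_upd) AS mnth,
       count(*)                     AS cnt
FROM table_name
GROUP BY extract(year  from date_upd),
         extract(month from date_upd);
```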

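If your date_upd is a string, you can keep the logic you are already using. But I think your substr is wrong: substr('date_upd', 1, 10) operates on the string literal 'date_upd', not on the column, so drop the quotes around the column name.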

SELECT  extract(day from from_unixtime(unix_timestamp('01/01/2021','MM/dd/yyyy'))) day_part;
...
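
For the string case, the two queries from the question could be written as follows. This is a sketch under the assumption that date_upd is a string formatted as yyyy-MM-dd HH:mm and that both counts come from the same table (table_name here). The aliases dt and mnth are my own choice, written without quotes.

```sql
-- Daily counts: date_upd is referenced as a column, not as the literal 'date_upd'.
SELECT substr(date_upd, 1, 10) AS dt,
       count(*)                AS cnt
FROM table_name
GROUP BY substr(date_upd, 1, 10);

-- Monthly counts: the first 7 characters are yyyy-MM.
SELECT substr(date_upd, 1, 7) AS mnth,
       count(*)               AS cnt
FROM table_name
GROUP BY substr(date_upd, 1, 7);
```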

If you want both the monthly and the daily counts in the same query, use the analytic count: count(*) over(...)

Demo:

with your_data as ( --Your data example, use real table instead of this CTE 
select stack(5, --number of tuples to produce
3, '2021-07-01 12:13', '2.1.9',
5,'2021-07-01 10:39','2.2.6',
4,'2021-07-02 10:24','2.2.7',
5,'2021-07-02 05:37','3.2.4',
1,'2021-07-02 21:40','3.2.5'
) as (rating, date_upd, version)
)

select  rating, date_upd, version,
        substr(date_upd,1,10) as dt,
        substr(date_upd,1,7) as mnth,  
        count(*) over(partition by substr(date_upd,1,10)) as day_cnt ,
        count(*) over(partition by substr(date_upd,1,7)) as mnth_cnt
   from your_data --Use your table instead of CTE

Result:

rating  date_upd           version  dt          mnth     day_cnt    mnth_cnt    
4       2021-07-02 10:24   2.2.7    2021-07-02  2021-07  3          5
5       2021-07-02 05:37   3.2.4    2021-07-02  2021-07  3          5
1       2021-07-02 21:40   3.2.5    2021-07-02  2021-07  3          5
3       2021-07-01 12:13   2.1.9    2021-07-01  2021-07  2          5
5       2021-07-01 10:39   2.2.6    2021-07-01  2021-07  2          5

Just replace the your_data CTE with your real table name.
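
If you would rather have the literal union of the two aggregated result sets (one row per day plus one row per month) instead of per-row counts, a UNION ALL of the two GROUP BY queries is one way to do it. This is a sketch, assuming date_upd is a string and table_name is your table. The period_type and period columns are hypothetical names I added so the two halves line up and can be told apart.

```sql
-- Sketch only: both branches must return the same number of columns
-- with compatible types, hence the shared (period_type, period, cnt) shape.
SELECT 'day'                   AS period_type,
       substr(date_upd, 1, 10) AS period,
       count(*)                AS cnt
FROM table_name
GROUP BY substr(date_upd, 1, 10)

UNION ALL

SELECT 'month'                 AS period_type,
       substr(date_upd, 1, 7)  AS period,
       count(*)                AS cnt
FROM table_name
GROUP BY substr(date_upd, 1, 7);
```

On the five sample rows this should give a count of 2 for 2021-07-01, 3 for 2021-07-02, and 5 for 2021-07, matching the day_cnt and mnth_cnt values above. The row order is not guaranteed unless you add an ORDER BY.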