Count distinct by boolean value
Is there a better (prettier, more idiomatic, or even more efficient) way to do the following?
Objective: count the distinct values of one column, split by another boolean column.
Sample data:
| id | metadata_streaming_date | cols_exist |
| --- | ----------------------- | ---------- |
| 1 | 2022-02-20 | true |
| 1 | 2022-02-20 | true |
| 2 | 2022-02-20 | true |
| 2 | 2022-02-20 | true |
| 3 | 2022-02-20 | false |
| 1 | 2022-02-19 | true |
| 2 | 2022-02-19 | false |
| 3 | 2022-02-19 | false |
| 4 | 2022-02-19 | false |
| 4 | 2022-02-19 | false |
The expected result is a count distinct of id, grouped by metadata_streaming_date, split into wanted (where cols_exist = false) and overall (all rows for each id on that date).
Expected result table:
| metadata_streaming_date | wanted | overall |
| ----------------------- | -------| --------|
| 2022-02-20 | 1 | 3 |
| 2022-02-19 | 3 | 4 |
I can achieve it with two subqueries and an inner join on metadata_streaming_date:
select
t1.metadata_streaming_date,
overall,
wanted,
wanted / overall as perc
from
(
select
metadata_streaming_date,
count(distinct id) as overall
from
non_needed_fields_view
where
metadata_streaming_date >= '2022-02-19'
group by
metadata_streaming_date
) as t1
inner join (
select
metadata_streaming_date,
count(distinct id) as wanted
from
non_needed_fields_view
where
cols_exist is false
and metadata_streaming_date >= '2022-02-19'
group by
metadata_streaming_date
) as t2 on t1.metadata_streaming_date = t2.metadata_streaming_date
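One thing to watch in the join version: `wanted / overall as perc` is integer division in some engines (SQLite and PostgreSQL both truncate, so 1 / 3 yields 0); casting one operand to a floating type keeps the fraction. A minimal sketch of the pitfall using Python's bundled sqlite3:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Both operands are integers, so SQLite truncates the quotient to 0.
truncated = con.execute("SELECT 1 / 3").fetchone()[0]

# Casting one operand to REAL keeps the fractional part.
fractional = con.execute("SELECT CAST(1 AS REAL) / 3").fetchone()[0]

print(truncated, fractional)  # 0 0.3333333333333333
```

Spark SQL's `/` operator already returns a double for integer inputs, so this mainly matters if you run the query on SQLite or PostgreSQL.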
You can try conditional aggregation with DISTINCT, putting your logic inside a CASE WHEN expression:
SELECT metadata_streaming_date,
       COUNT(DISTINCT CASE WHEN cols_exist = false THEN id END) AS wanted,
       COUNT(DISTINCT id) AS overall
FROM non_needed_fields_view
WHERE metadata_streaming_date >= '2022-02-19'
GROUP BY metadata_streaming_date
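The trick is that the CASE expression yields NULL when the condition fails, and COUNT(DISTINCT ...) ignores NULLs. To check the behavior against the sample data, here is a hedged sketch that replays it in Python's sqlite3; SQLite has no true BOOLEAN type, so cols_exist is stored as 0/1 and the predicate becomes `cols_exist = 0`:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE non_needed_fields_view"
    " (id INT, metadata_streaming_date TEXT, cols_exist INT)"
)
# Sample data from the question; true/false mapped to 1/0.
rows = [
    (1, "2022-02-20", 1), (1, "2022-02-20", 1),
    (2, "2022-02-20", 1), (2, "2022-02-20", 1),
    (3, "2022-02-20", 0),
    (1, "2022-02-19", 1), (2, "2022-02-19", 0),
    (3, "2022-02-19", 0), (4, "2022-02-19", 0),
    (4, "2022-02-19", 0),
]
con.executemany("INSERT INTO non_needed_fields_view VALUES (?, ?, ?)", rows)

# CASE yields NULL for non-matching rows, and COUNT(DISTINCT ...) skips NULLs.
result = con.execute("""
    SELECT metadata_streaming_date,
           COUNT(DISTINCT CASE WHEN cols_exist = 0 THEN id END) AS wanted,
           COUNT(DISTINCT id) AS overall
    FROM non_needed_fields_view
    WHERE metadata_streaming_date >= '2022-02-19'
    GROUP BY metadata_streaming_date
    ORDER BY metadata_streaming_date
""").fetchall()
print(result)  # [('2022-02-19', 3, 4), ('2022-02-20', 1, 3)]
```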
- Aggregate functions have a neat FILTER syntax, currently supported by several RDBMS / SQL engines, including Spark SQL, PostgreSQL & SQLite. As far as I know, it is part of the ISO SQL standard.
- The ISO syntax for date literals in SQL is DATE 'yyyy-MM-dd'
select metadata_streaming_date
,count(distinct id) filter (where cols_exist = false) as wanted
,count(distinct id) as overall
from non_needed_fields_view
where metadata_streaming_date >= date '2022-02-19'
group by metadata_streaming_date
+-----------------------+------+-------+
|metadata_streaming_date|wanted|overall|
+-----------------------+------+-------+
| 2022-02-19| 3| 4|
| 2022-02-20| 1| 3|
+-----------------------+------+-------+
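The FILTER version can be exercised the same way. SQLite supports FILTER on aggregates since 3.30 (bundled with recent Python builds), but it stores dates as text, so the DATE 'yyyy-MM-dd' literal is replaced by a plain string in this sketch:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE non_needed_fields_view"
    " (id INT, metadata_streaming_date TEXT, cols_exist INT)"
)
# Sample data from the question; true/false mapped to 1/0.
con.executemany(
    "INSERT INTO non_needed_fields_view VALUES (?, ?, ?)",
    [(1, "2022-02-20", 1), (1, "2022-02-20", 1), (2, "2022-02-20", 1),
     (2, "2022-02-20", 1), (3, "2022-02-20", 0), (1, "2022-02-19", 1),
     (2, "2022-02-19", 0), (3, "2022-02-19", 0), (4, "2022-02-19", 0),
     (4, "2022-02-19", 0)],
)

# FILTER restricts which rows feed the first aggregate, leaving the
# second COUNT(DISTINCT id) to see every row in the group.
result = con.execute("""
    SELECT metadata_streaming_date,
           COUNT(DISTINCT id) FILTER (WHERE cols_exist = 0) AS wanted,
           COUNT(DISTINCT id) AS overall
    FROM non_needed_fields_view
    WHERE metadata_streaming_date >= '2022-02-19'
    GROUP BY metadata_streaming_date
    ORDER BY metadata_streaming_date
""").fetchall()
print(result)  # [('2022-02-19', 3, 4), ('2022-02-20', 1, 3)]
```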