How to implement tidyr's complete() in SQL?
Using a dummy example, I need to complete a data set that contains implicit missing values. In R this is trivial with the complete function from tidyr.
library(tidyr)
df <- data.frame(Borough = c('Brooklyn', 'Brooklyn', 'Queens'),
                 Crime = c('Robbery', 'Homicide', 'Drug'),
                 Count = c(1, 2, 1))
> df
   Borough    Crime Count
1 Brooklyn  Robbery     1
2 Brooklyn Homicide     2
3   Queens     Drug     1
# Complete implicit missing values
> complete(df, Borough, Crime, fill = list(Count = 0))
   Borough    Crime Count
1 Brooklyn     Drug     0
2 Brooklyn Homicide     2
3 Brooklyn  Robbery     1
4   Queens     Drug     1
5   Queens Homicide     0
6   Queens  Robbery     0
However, if the real data is very large and stored in an Oracle SQL table, how can the same thing be done with a SQL query?
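Cross join the distinct boroughs with the distinct crimes to get every combination, then left join the original table. Combinations with no matching row come back as NULL, which coalesce turns into a count of 0.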
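-- Build every borough/crime combination, then attach the existing counts.
select b.borough, c.crime, coalesce(t.count, 0) as count  -- 0 where no row existed
from (select distinct borough from tbl) b
cross join (select distinct crime from tbl) c
left join tbl t on t.borough = b.borough and t.crime = c.crime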
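To check the query against the dummy data without an Oracle instance, here is a minimal sketch that runs it in an in-memory SQLite database from Python. SQLite is only standing in for Oracle here, and the helper name complete_counts, the column types, and the ORDER BY clause (added so the rows come back in the same order as the tidyr output) are my assumptions, not part of the original answer.

```python
import sqlite3

# Dummy data copied from the question's data frame.
rows = [
    ("Brooklyn", "Robbery", 1),
    ("Brooklyn", "Homicide", 2),
    ("Queens", "Drug", 1),
]

# Same cross join + left join as in the answer; ORDER BY is an addition
# so that the result order is deterministic.
QUERY = """
SELECT b.borough, c.crime, COALESCE(t.count, 0) AS count
FROM (SELECT DISTINCT borough FROM tbl) b
CROSS JOIN (SELECT DISTINCT crime FROM tbl) c
LEFT JOIN tbl t ON t.borough = b.borough AND t.crime = c.crime
ORDER BY b.borough, c.crime
"""


def complete_counts(data):
    """Load data into a throwaway SQLite table and return the completed rows."""
    conn = sqlite3.connect(":memory:")
    try:
        # Assumed schema: table name tbl as in the answer, types guessed.
        conn.execute("CREATE TABLE tbl (borough TEXT, crime TEXT, count INTEGER)")
        conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", data)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


result = complete_counts(rows)
for row in result:
    print(row)
```

Note that the query assumes at most one row per borough/crime pair. If the real table can hold several rows for the same pair, the left join returns one output row per match, so the counts would need to be aggregated first.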