SQLDF merge in R: counting NA values between two dates
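I am trying to merge two datasets in R using sqldf. For each row of dat1, I compute the mean of the dat2 values whose date falls between that row's two dates. I would also like to count the number of NA values in dat2 between those two dates.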
dat3= sqldf("select a.ID, avg(b.mean_pm25) as avg_pm
from dat1 a
left join dat2 b
on a.ZIP=b.ZIP and (b.pm_date between a.startdate and a.enddate)
group by a.ID")
Using the built-in data frame BOD, add some NAs to provide test data and then count them:
library(sqldf)
BOD$Time[3:4] <- NA # test data
sqldf("select sum(Time is null) as no_of_na from BOD")
##   no_of_na
## 1        2
Just add conditional aggregation to the query. Two versions are shown below, with and without a CASE expression.
select d1.ID
     , avg(d2.mean_pm25) as avg_pm
     , sum(d2.mean_pm25 IS NULL) as count_pm_nas
     , sum(case
             when d2.mean_pm25 IS NULL
             then 1
             else 0
           end) as count_pm_nas_alternative
from dat1 d1
left join dat2 d2
  on d1.ZIP = d2.ZIP
  and (d2.pm_date between d1.startdate and d1.enddate)
group by d1.ID
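To see how the conditional aggregation behaves, here is a minimal sketch on toy data. The data values, the ISO-string date format, and the extra columns n_matched and count_pm_nas_matched are my own assumptions for illustration and are not part of the question. It also shows one thing to watch with a left join: a dat1 row with no match in dat2 still produces one all-NULL row, which the plain IS NULL count includes.

```r
library(sqldf)

# Toy data invented for illustration; column names follow the question.
# Assumption: dates are ISO "YYYY-MM-DD" strings, so BETWEEN compares
# them correctly as text in SQLite (sqldf's default backend).
dat1 <- data.frame(
  ID        = 1:3,
  ZIP       = c("10001", "10002", "10003"),
  startdate = "2020-01-01",
  enddate   = "2020-01-03",
  stringsAsFactors = FALSE
)
dat2 <- data.frame(
  ZIP       = c("10001", "10001", "10001", "10001", "10002", "10002"),
  pm_date   = c("2020-01-01", "2020-01-02", "2020-01-03", "2020-01-05",
                "2020-01-01", "2020-01-02"),
  mean_pm25 = c(10, NA, 14, 99, NA, NA),   # R NA becomes SQL NULL
  stringsAsFactors = FALSE
)

dat3 <- sqldf("
  select d1.ID
       , avg(d2.mean_pm25) as avg_pm
       , count(d2.ZIP) as n_matched
       , sum(d2.mean_pm25 is null) as count_pm_nas
       , sum((d2.ZIP is not null) and (d2.mean_pm25 is null))
           as count_pm_nas_matched
  from dat1 d1
  left join dat2 d2
    on d1.ZIP = d2.ZIP
   and d2.pm_date between d1.startdate and d1.enddate
  group by d1.ID
  order by d1.ID
")
dat3
```

With this toy data, ID 3 has no dat2 rows at all, so n_matched is 0 and count_pm_nas_matched is 0, while count_pm_nas is 1 because of the placeholder row that the left join generates. Which count is appropriate depends on whether "no measurement on record" should be treated as a missing value.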
Also, for SQL query best practices, consider Bad Habits to Kick : Using table aliases like (a, b, c) or (t1, t2, t3).