SQLDF merge in R: counting NA values between two dates

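I am trying to merge two datasets in R using sqldf. For each ID in dat1, I compute the average of the dat2 values whose dates fall between that row's start date and end date. I would also like to count the number of NA values in dat2 between those two dates.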

dat3= sqldf("select a.ID, avg(b.mean_pm25) as avg_pm
                from dat1 a
                left join dat2 b
                on a.ZIP=b.ZIP and (b.pm_date between a.startdate and a.enddate)
                group by a.ID")

Add some NAs to the built-in data frame BOD to create test data, then count them:

library(sqldf)
BOD$Time[3:4] <- NA  # test data

sqldf("select sum(Time is null) as no_of_na from BOD")
##   no_of_na
## 1        2
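An equivalent way to get the same count, sketched under the assumption that sqldf is installed: count(column) skips NULLs while count(*) counts every row, so their difference is the number of NAs. This uses only standard SQL aggregates and the same BOD test data as above.

```r
library(sqldf)

BOD$Time[3:4] <- NA  # same test data as above (a modified copy of BOD)

# count(*) counts all rows; count(Time) counts only non-NULL Time values,
# so the difference is the number of NA values in Time.
res <- sqldf("select count(*) - count(Time) as no_of_na from BOD")
res
```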

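Just add conditional aggregation to the query. Two versions are shown below, first without and then with a CASE statement. The first works because SQLite (the default sqldf backend) evaluates IS NULL to 1 or 0; the CASE form also works in databases that do not sum boolean expressions.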

select d1.ID
       , avg(d2.mean_pm25) as avg_pm
       , sum(d2.mean_pm25 IS NULL) as count_pm_nas
       , sum(case 
                  when d2.mean_pm25 IS NULL
                  then 1
                  else 0
             end) as count_pm_nas_alternative
from dat1 d1
left join dat2 d2
     on d1.ZIP = d2.ZIP 
    and (d2.pm_date between d1.startdate and d1.enddate)
group by d1.ID
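A self-contained sketch of the conditional-aggregation query run through sqldf. The data frames dat1 and dat2 below are small hypothetical examples (the question does not show its data); only the column names ID, ZIP, startdate, enddate, pm_date and mean_pm25 come from the query. It also illustrates a LEFT JOIN pitfall: an ID with no matching rows in dat2 still yields one row of NULLs, so sum(d2.mean_pm25 IS NULL) reports one NA for it. Requiring a matched row (d2.ZIP IS NOT NULL) avoids that.

```r
library(sqldf)

# Hypothetical exposure windows, one per ID (not the asker's data).
dat1 <- data.frame(
  ID        = c(1, 2, 3),
  ZIP       = c("10001", "10001", "20002"),
  startdate = as.Date(c("2020-01-01", "2020-01-03", "2020-01-01")),
  enddate   = as.Date(c("2020-01-04", "2020-01-05", "2020-01-02"))
)

# Hypothetical daily PM2.5 readings for one ZIP; NA marks a missing value.
dat2 <- data.frame(
  ZIP       = "10001",
  pm_date   = as.Date("2020-01-01") + 0:4,
  mean_pm25 = c(10, NA, 14, NA, 20)
)

# avg() ignores NULLs; naive_na also counts the NULL row that the LEFT JOIN
# creates for an unmatched ID, while count_pm_nas only counts matched rows.
res <- sqldf("
  select d1.ID,
         avg(d2.mean_pm25)                                as avg_pm,
         sum(d2.mean_pm25 is null)                        as naive_na,
         sum(d2.ZIP is not null and d2.mean_pm25 is null) as count_pm_nas
  from dat1 d1
  left join dat2 d2
    on d1.ZIP = d2.ZIP
   and d2.pm_date between d1.startdate and d1.enddate
  group by d1.ID
  order by d1.ID")
res
```

With this data, ID 1 gets avg_pm 12 with 2 NAs and ID 2 gets 17 with 1 NA. ID 3 has no readings for its ZIP, so avg_pm is NA, naive_na is 1 and count_pm_nas is 0.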

Also, for best practice in writing SQL queries, see the article "Bad Habits to Kick : Using table aliases like (a, b, c) or (t1, t2, t3)".