How to get the average temperature for N days prior to a specific date in R?

Question

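I have two datasets: one (A) contains the daily temperature, and the other (B) contains individual IDs and dates of birth (dob). I need the average temperature over the 3 days before each individual's birth. For example, if individual 1 was born on 02/20/2021, I need the average temperature from 02/17/2021 to 02/19/2021. Is there a way to do this in R so that my output is ind | dob | avg_temp? Here is some sample data (my real data has many more days and individuals):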

> temp <- c(26,27,28,30,32,27,28,29)
> date <- as.Date(c('02-15-2021', '02-16-2021', '02-17-2021', '02-18-2021', '02-19-2021', '02-20-2021', '02-21-2021',
+ '02-22-2021'), "%m-%d-%Y")
> A <- data.frame(date, temp)
> id <- c(1,2,3,4,5,6,7,8,9,10)
> dob <- as.Date(c('02-18-2021', '02-17-2021', '02-20-2021', '02-23-2021', '02-25-2021', '02-23-2021', '02-17-2021',
+                  '02-25-2021', '02-25-2021', '02-23-2021'), "%m-%d-%Y")
> B <- data.frame(id, dob)

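If a date does not have a full 3 days before it, the average should use the days that are available (2 or 1), and if no days are available, it should return 0 as the average.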

Can anyone help me do this in R? As mentioned above, my dataset is very large, with about 37,000 IDs and temperatures ranging from 2007 to 2021.

Thanks in advance.

Answer 1

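Here is one approach. To avoid repeating calculations, we first get the vector of unique birth dates, compute the average once per date, and then merge the results back, since several individuals share the same date of birth. The function itself is fairly simple: take the rows of A that fall in the three days before the birth date, calculate the mean, and return a data.frame so that the result is easy to merge into B.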

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
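myfunc <- function(x){
  # First day of the 3-day window that ends the day before x
  three_days <- as.Date(x - ddays(3))
  
  # Keep only the rows of A inside the window [x - 3, x)
  prior <- A[A$date < x & A$date >= three_days , ]
  avg_temp <- mean(prior$temp)
  # One-row data.frame keyed by dob, ready to merge into B
  dat <- data.frame(dob = x, avg_temp = avg_temp)
  return(dat)
}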

dobs <- unique(B$dob)

avg_temps <- lapply(dobs, myfunc)
avg_temps <- do.call(rbind, avg_temps)

B <- merge(B, avg_temps, by = "dob")

B
#>           dob id avg_temp
#> 1  2021-02-17  2     26.5
#> 2  2021-02-17  7     26.5
#> 3  2021-02-18  1     27.0
#> 4  2021-02-20  3     30.0
#> 5  2021-02-23  4     28.0
#> 6  2021-02-23  6     28.0
#> 7  2021-02-23 10     28.0
#> 8  2021-02-25  5     29.0
#> 9  2021-02-25  8     29.0
#> 10 2021-02-25  9     29.0
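
None of the birth dates in the sample data has an empty window, so the output above never exercises the "return 0 when no days are available" case from the question. Because mean() of a zero-length vector is NaN, that case needs an explicit fallback. The following is a minimal base R sketch of one way to add it and to make the window length a parameter. The names avg_before and n_days are made up for illustration and are not part of the original answer.

```r
# Sample temperature data from the question
A <- data.frame(
  date = as.Date("2021-02-15") + 0:7,
  temp = c(26, 27, 28, 30, 32, 27, 28, 29)
)

# avg_before() and n_days are hypothetical names used for this sketch
avg_before <- function(x, n_days = 3) {
  # Rows of A in the n_days days strictly before x
  prior <- A[A$date < x & A$date >= x - n_days, ]
  # mean() of an empty vector is NaN, so fall back to 0 explicitly
  avg_temp <- if (nrow(prior) > 0) mean(prior$temp) else 0
  data.frame(dob = x, avg_temp = avg_temp)
}

avg_before(as.Date("2021-02-20"))  # full window: temps 28, 30, 32
avg_before(as.Date("2021-02-17"))  # partial window: temps 26, 27
avg_before(as.Date("2021-02-15"))  # no earlier days, so avg_temp is 0
```

It can be dropped into the lapply() call in place of myfunc, for example lapply(dobs, avg_before, n_days = 3).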

Created on 2022-02-02 by the reprex package (v2.0.1)
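
Since the question mentions about 37,000 IDs and temperatures from 2007 to 2021, a fully vectorized variant may be worth considering. The sketch below is an alternative technique, not the method used in the answer above: it uses cumulative sums together with base R's findInterval() to get every window sum without a loop. It assumes temp has no missing values, and n_days is a hypothetical parameter name.

```r
# Sample data from the question
A <- data.frame(
  date = as.Date("2021-02-15") + 0:7,
  temp = c(26, 27, 28, 30, 32, 27, 28, 29)
)
B <- data.frame(
  id  = 1:10,
  dob = as.Date(c("2021-02-18", "2021-02-17", "2021-02-20", "2021-02-23",
                  "2021-02-25", "2021-02-23", "2021-02-17", "2021-02-25",
                  "2021-02-25", "2021-02-23"))
)

n_days <- 3                            # hypothetical parameter: window length
A  <- A[order(A$date), ]               # findInterval() needs sorted dates
cs <- c(0, cumsum(A$temp))             # cs[k + 1] is the sum of the first k temps
d  <- as.numeric(A$date)               # dates as day counts
b  <- as.numeric(B$dob)

hi <- findInterval(b - 1, d)           # number of rows with date <= dob - 1
lo <- findInterval(b - n_days - 1, d)  # number of rows with date <  dob - n_days
n_obs <- hi - lo                       # days actually available in each window

# Window sum divided by available days, or 0 when the window is empty
B$avg_temp <- ifelse(n_obs > 0, (cs[hi + 1] - cs[lo + 1]) / n_obs, 0)
B
```

On the sample data this gives the same averages as the merge-based approach, in the original row order of B, and gaps in the temperature dates are handled because the divisor is the number of rows actually found in each window.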
