Filter an R data frame conditioned on multiple column values and a date-time calculation

Question

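I have a data frame of camera-trap detection data with more than 50,000 rows. I want to identify and/or remove observations of one species that occur within a specified time period of another species' observation at the same camera station.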

Here is an example of my data frame:

   Species    StationID    DateTime
1  Human      A            2013-05-20 10:00:00
2  Dog        A            2013-05-20 10:09:00
3  Dog        A            2013-05-21 10:40:00
4  Puma       B            2013-05-21 15:59:00
5  Dog        B            2013-05-23 10:05:00
6  Human      B            2013-05-23 10:10:00
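For anyone who wants to reproduce the example, the table above can be built as a data frame like this. It is a minimal sketch that assumes `DateTime` should be stored as `POSIXct` (the answer below subtracts seconds from it). The question does not state a time zone, so UTC is an assumption.

```r
# Example camera-trap detections from the question
df <- data.frame(
  Species   = c("Human", "Dog", "Dog", "Puma", "Dog", "Human"),
  StationID = c("A", "A", "A", "B", "B", "B"),
  DateTime  = as.POSIXct(c("2013-05-20 10:00:00", "2013-05-20 10:09:00",
                           "2013-05-21 10:40:00", "2013-05-21 15:59:00",
                           "2013-05-23 10:05:00", "2013-05-23 10:10:00"),
                         tz = "UTC"),  # time zone is an assumption
  stringsAsFactors = FALSE
)
str(df)
```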

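If I want to identify/remove all dogs detected within 10 minutes on either side of a human detection at the same camera station, I would expect the following data to be returned: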

   Species    StationID    DateTime
1  Human      A            2013-05-20 10:00:00
2  Dog        A            2013-05-21 10:40:00
3  Puma       B            2013-05-21 15:59:00
4  Human      B            2013-05-23 10:10:00

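Among other things, I tried splitting the human and dog detections into separate data frames and, for the dog observations, creating lower and upper date-time columns based on the desired tolerance of + or - 10 minutes. I then used fuzzy_left_join as shown below. It conditions on StationID correctly, but it does not return the correct detections for the specified DateTime operations.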

Dog_HumanDF <- fuzzy_left_join(DogDF, HumanDF, 
                                  by = c("StationID" = "StationID",
                                         "DateTimeDogLower" = "DateTimeHuman", 
                                         "DateTimeDogUpper" = "DateTimeHuman"),  
                                  match_fun = list(`==`, `<=` , `>=`))
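The join conditions above amount to "a dog detection lies within 600 seconds of a human detection at the same station". For comparison only (this is not a fix for the fuzzyjoin call), the same conditions can be sketched in base R with `merge()` on `StationID` followed by a row filter. The helper `dogs_near_humans` and the columns `row`, `DateTimeDog` and `DateTimeHuman` are hypothetical names, and `DateTime` is assumed to be `POSIXct`. Note that this still builds intermediate data frames and pairs every dog with every human at its station, so memory use grows with that product.

```r
df <- data.frame(
  Species   = c("Human", "Dog", "Dog", "Puma", "Dog", "Human"),
  StationID = c("A", "A", "A", "B", "B", "B"),
  DateTime  = as.POSIXct(c("2013-05-20 10:00:00", "2013-05-20 10:09:00",
                           "2013-05-21 10:40:00", "2013-05-21 15:59:00",
                           "2013-05-23 10:05:00", "2013-05-23 10:10:00"),
                         tz = "UTC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: row numbers of dog detections within `window` seconds
# of any human detection at the same station
dogs_near_humans <- function(df, window = 600) {
  is_dog   <- df$Species == "Dog"
  is_human <- df$Species == "Human"
  dogs   <- data.frame(row = which(is_dog),
                       StationID = df$StationID[is_dog],
                       DateTimeDog = df$DateTime[is_dog],
                       stringsAsFactors = FALSE)
  humans <- data.frame(StationID = df$StationID[is_human],
                       DateTimeHuman = df$DateTime[is_human],
                       stringsAsFactors = FALSE)
  # Equi-join on StationID: every dog paired with every human at its station
  pairs <- merge(dogs, humans, by = "StationID")
  # Same interval test as the fuzzy join: lower <= human time <= upper
  hit <- pairs$DateTimeDog - window <= pairs$DateTimeHuman &
         pairs$DateTimeDog + window >= pairs$DateTimeHuman
  sort(unique(pairs$row[hit]))
}

bad_rows <- dogs_near_humans(df)
# Indexing with %in% avoids dropping every row when bad_rows is empty
kept <- df[!seq_len(nrow(df)) %in% bad_rows, ]
kept
```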

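I have searched extensively for similar questions and solutions but could not find anything suited to my purpose. I would prefer a solution that does not require generating separate data frames, as fuzzy_join does. Any help is greatly appreciated!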

Answer 1

The following solution, provided by Barrett Wolfe on another platform, works perfectly! I hope it is useful to other users as well.

no_bad_dogs <- function(df) {
  # df needs columns Species, StationID and DateTime (POSIXct, so 600 = 10 min)
  stations <- unique(df$StationID)
  output <- vector("list", length(stations))
  for (i in seq_along(stations)) {
    # Work on one camera station at a time
    stat_df <- df[df$StationID == stations[i], ]
    if ("Human" %in% stat_df$Species && "Dog" %in% stat_df$Species) {
      human_times <- stat_df$DateTime[stat_df$Species == "Human"]
      dog_index <- which(stat_df$Species == "Dog")
      # TRUE for each dog detection within 600 s of any human detection
      bad_in_dog_index <- sapply(stat_df$DateTime[dog_index], function(x) {
        any(x >= human_times - 600 & x <= human_times + 600)
      })
      if (any(bad_in_dog_index)) {
        # Drop only the flagged dog rows
        output[[i]] <- stat_df[-dog_index[bad_in_dog_index], ]
      } else {
        output[[i]] <- stat_df
      }
    } else {
      # Station has no humans or no dogs: keep all of its rows
      output[[i]] <- stat_df
    }
  }
  # Rows come back grouped by station
  do.call("rbind", output)
}
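The function above rebuilds the data frame one station at a time and returns rows grouped by station. A possible alternative, sketched here under the same assumptions (a `POSIXct` `DateTime` and a 600-second window), computes one logical flag aligned with the original rows. The same flag can then either identify or remove the dogs, and the original row order is kept. The helper `flag_near` and its `target`, `trigger` and `window` arguments are hypothetical names; it compares each target row only with the trigger times at its own station, grouped once with `split()`.

```r
df <- data.frame(
  Species   = c("Human", "Dog", "Dog", "Puma", "Dog", "Human"),
  StationID = c("A", "A", "A", "B", "B", "B"),
  DateTime  = as.POSIXct(c("2013-05-20 10:00:00", "2013-05-20 10:09:00",
                           "2013-05-21 10:40:00", "2013-05-21 15:59:00",
                           "2013-05-23 10:05:00", "2013-05-23 10:10:00"),
                         tz = "UTC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for rows of the `target` species detected within
# `window` seconds of a `trigger` species detection at the same station
flag_near <- function(df, target = "Dog", trigger = "Human", window = 600) {
  is_trigger <- df$Species == trigger
  # Trigger times grouped by station; split() keeps the POSIXct class
  trigger_times <- split(df$DateTime[is_trigger],
                         as.character(df$StationID[is_trigger]))
  flag <- logical(nrow(df))
  for (i in which(df$Species == target)) {
    ref <- trigger_times[[as.character(df$StationID[i])]]  # NULL if none
    if (!is.null(ref)) {
      gap <- abs(as.numeric(difftime(ref, df$DateTime[i], units = "secs")))
      flag[i] <- any(gap <= window)
    }
  }
  flag
}

near <- flag_near(df)
df[near, ]   # identify the dogs near humans
df[!near, ]  # remove them
```

Because `target` and `trigger` are arguments, the same sketch would cover other species pairs, in line with the "one species near another" framing of the question.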
