Lubridate: Statement returns TRUE/FALSE on its own, but incurs "missing value where TRUE/FALSE needed" error when in function

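Let me start by saying that I am fully aware that similar questions have been answered before, but after hours of reading and troubleshooting I believe I have a unique problem. I apologize if I have missed something. The answers to the highly upvoted similar questions point to NAs in the data, but as explained below, I don't appear to have any and don't know where they could be coming from.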

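I am running a for loop in R 4.1.2 using the lubridate, readr, and dplyr packages, trying to flag as invalid any data taken by individuals before they passed a reliability test. Tests are unique to specific groups, so a person may be reliable for one group, many groups, all groups, or none. The function I wrote is meant to take a data frame "x" and, for each individual observer, check whether the data points are valid against a data frame "key" with columns for the observer (observer), the date the test was passed (begin_valid), and the group they are now valid for (group_valid). The key may have multiple rows per observer if they have passed multiple tests. I use tools from the lubridate package to create POSIXct values for the dates, which can be used in arithmetic and compared with each other. The user can set y = "remove" to remove invalid data, or leave it unset to flag the invalid data and keep it. Here is the code: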

invalidata <- function(x, y){
  library(lubridate)
  library(readr)
  library(dplyr)
  x$valid <- rep(1, length(rownames(x)))
  alts <- 0
  key <- read_csv("updatable csv file")
  key$begin_valid <- parse_date_time(key$begin_valid, c("mdy", "dmy", "ydm", "mdy"), tz= "Africa/Lubumbashi")
  for(i in unique(x$observer)){
    subkey <- subset(key, key$observer == i)
    subx <- subset(x, x$observer == i)
    if(is.na(subkey$begin_valid) == TRUE || is.na(subkey$group_valid) == TRUE){ #if reliable for nothing, remove
      x[x$observer == i]$valid <- 0
      print("removed completely unreliable")
    }else{
      for(j in rownames(subx)){
        if(subx$group[j] %in% subkey$group_valid == FALSE && "All" %in% subkey$group_valid == FALSE){ #if not reliable for specific group or all groups, remove
          x$valid[j] <- 0
          print("removed unreliable for group")
        } 
        if(subx$group[j] %in% subkey$group_valid){ #remove if before reliability date for group
          if(subx$date[j] < subset(subkey, subkey$group_valid == subx$group[j])$begin_valid){
            x$valid[j] <- 0
            print("removed pre-reliability")
          }
        } else{ #remove if not reliable for specific group, and before reliability date for all
          if(subx$date[j] < subset(subkey, subkey$group_valid == "All")$begin_valid){
            x$valid[j] <- 0
            print("removed pre-reliability")
          }
        }
      }
    }
  }
  if(y == "remove"){ #remove all invalid data and validity column
    x <- subset(x, x$valid == 1)
    x <- select(x, -valid)
  }
  return(x)}

My problem is the line

if(subx$date[j] < subset(subkey, subkey$group_valid == "All")$begin_valid)

which returns the error:

Error in if (subx$date[j] < subset(subkey, subkey$group_valid == "All")$begin_valid) { : missing value where TRUE/FALSE needed

However, when I run the code inside the parentheses

subx$date[j] < subset(subkey, subkey$group_valid == "All")$begin_valid

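outside the context of the loop, I get the relevant TRUE or FALSE value. I have checked all dates for NULL or NA values, and all data with NAs is handled in the preceding step of the code: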

if(is.na(subkey$begin_valid) == TRUE || is.na(subkey$group_valid) == TRUE){}
else{ #code at issue }

This very similar line gives no problems:

if(subx$date[j] < subset(subkey, subkey$group_valid == subx$group[j])$begin_valid){

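My best guess is that something is wrong with the date format. I know this error is usually a symptom of NULLs or NAs floating around in the data, but for the life of me I can't figure out where they could be coming from. The dates in "x" have already been parsed and contain no NAs or NULLs. I haven't included the data because it is proprietary, but I can provide mock data if people are interested or think it necessary. Thanks in advance for reading through and for any thoughts or troubleshooting suggestions!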

MRE:

dput output of x:

structure(list(date = structure(c(1486764000, 1486764000, 1486850400, 
1486936800, 1487023200, 1487109600, 1487109600, 1487196000, 1487196000, 
1487368800, 1487368800, 1487368800, 1487368800, 1487368800, 1487368800, 
1487455200, 1487455200, 1487455200, 1487541600, 1487887200), class = c("POSIXct", 
"POSIXt"), tzone = "Africa/Lubumbashi"), time = structure(c(23734, 
53419, 41352, 33034, 24220, 34812, 35624, 27949, 27950, 49192, 
49286, 49392, 49401, 62719, 62725, 26046, 26047, 27246, 46611, 
61228), class = c("hms", "difftime"), units = "secs"), observer = c("MA", 
"LE", "VI", "VI", "MI", "MA", "MA", "ME", "VI", "BA", "MA", "BA", 
"MA", "ME", "MI", "MA", "BA", "MI", "BA", "MA"), group = c("EKK", 
"EKK", "KKL", "EKK", "KKL", "KKL", "KKL", "EKK", "EKK", "EKK", 
"EKK", "EKK", "EKK", "KKL", "KKL", "EKK", "EKK", "KKL", "EKK", 
"KKL")), row.names = c(NA, -20L), spec = structure(list(cols = list(
    date = structure(list(), class = c("collector_character", 
    "collector")), time = structure(list(format = ""), class = c("collector_time", 
    "collector")), observer = structure(list(), class = c("collector_character", 
    "collector")), group = structure(list(), class = c("collector_character", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), delim = ","), class = "col_spec"), problems = <pointer: 0x000001f6f2f7af70>, class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"))

For the key:

structure(list(observer = c("BA", "MI", "VI", "ME", "DA", "OK", 
"FR", "MA", "LA", "DE", "JD", "JD", "JD", "BR", "DA", "DA", "PA", 
"PA", "JA", "JE", "DI", "JP", "LE", "MR", "NG", "TR", "TE"), 
    begin_valid = c("8/12/2016", "12/21/2019", "8/11/2016", "8/11/2016", 
    "12/11/2019", "12/17/2019", "12/11/2019", "11/2/2016", "1/11/2020", 
    "12/12/2019", "12/16/2019", "12/16/2019", "11/22/2020", "6/19/2021", 
    "11/26/2020", "11/26/2020", "7/25/2021", "7/25/2021", NA, 
    NA, NA, NA, NA, NA, NA, NA, NA), group_valid = c("All", "All", 
    "All", "All", "All", "All", "FKK", "All", "FKK", "FKK", "EKK", 
    "KKL", "All", "EKK", "EKK", "KKL", "EKK", "KKL", NA, NA, 
    NA, NA, NA, NA, NA, NA, NA), subgroup = c(NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, "S", NA, NA, NA, "S", NA, "N", 
    NA, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, -27L
), spec = structure(list(cols = list(observer = structure(list(), class = c("collector_character", 
"collector")), begin_valid = structure(list(), class = c("collector_character", 
"collector")), group_valid = structure(list(), class = c("collector_character", 
"collector")), subgroup = structure(list(), class = c("collector_character", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), delim = ","), class = "col_spec"), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"))

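There are two errors in this code:

  • Because rownames(.) returns character strings, you cannot use subx$group[j]. Two options:

    1. Preferred: use for (j in seq_len(nrow(subx))), and all the references work as written.
    2. Keep for(j in rownames(subx)), but change every subx$ reference to something like subx[j,"group"].
  • x[x$observer == i]$valid is incorrect code; change it to x$valid[x$observer == i].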
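The two fixes can be sketched on toy data. This is only an illustration: the small data frame and the `seen` vector below are hypothetical stand-ins, not the asker's data or full function.

```r
# Hypothetical toy data standing in for the asker's x
x <- data.frame(
  observer = c("MA", "LE", "MA"),
  group    = c("EKK", "EKK", "KKL"),
  valid    = 1,
  stringsAsFactors = FALSE
)

# Fix for the second bullet: subset the valid column by a row condition.
# x[x$observer == i] with a single index would select columns, not rows.
i <- "LE"
x$valid[x$observer == i] <- 0

# Fix for the first bullet: loop over integer positions, not row names
subx <- subset(x, x$observer == "MA")
seen <- character(0)
for (j in seq_len(nrow(subx))) {
  seen <- c(seen, subx$group[j])  # integer j indexes the column by position
}

print(x$valid)
print(seen)
```

Looping with seq_len(nrow(subx)) also behaves sensibly when subx has zero rows, because the loop body is then skipped.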

After these two changes, your code runs without error and, on this example, prints "removed pre-reliability" to the console four times.

When troubleshooting, do not treat subx$group[1] and subx$group["1"] as interchangeable: they are very different, and the latter (as expected) yields NA.
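A minimal sketch of that difference, using a hypothetical two-row data frame in place of subx. A character index looks for an element named "1", and an unnamed column vector has no such element, so the result is NA. Feeding that NA into a comparison inside if() reproduces the error from the question.

```r
# Hypothetical stand-in for subx
subx <- data.frame(group = c("EKK", "KKL"), stringsAsFactors = FALSE)

by_position <- subx$group[1]    # integer index: the first element
by_name     <- subx$group["1"]  # character index: an element named "1" (none exists)

print(by_position)
print(is.na(by_name))

# %in% treats the NA as "no match" and returns FALSE rather than NA,
# which is likely why the %in% checks in the loop did not fail
print(by_name %in% c("EKK", "KKL"))

# A comparison with NA is NA, and if() cannot branch on NA
msg <- tryCatch(
  if (by_name == "EKK") "match" else "no match",
  error = function(e) conditionMessage(e)
)
print(msg)
```

The same reasoning applies to subx$date[j] when j is a row name: the date comes back as NA, and the < comparison inside if() fails.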