Lubridate: Statement returns TRUE/FALSE on its own, but incurs "missing value where TRUE/FALSE needed" error when in function
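First, let me say that I'm fully aware that similar questions have been answered before, but after hours of reading and troubleshooting I believe I've run into a unique problem. I apologize if I've missed something. The answers to highly upvoted similar questions point to NAs in the data, but, as explained below, I don't appear to have any, and I don't know where they could be coming from.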
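I'm running a for loop in R 4.1.2 with the lubridate, readr and dplyr packages, trying to flag data collected by individuals before they passed a reliability test as invalid. Tests are specific to particular groups, so a person may be reliable for one group, several groups, all groups, or none. The function I wrote takes a data frame "x" and, for each individual observer, checks whether each data point is valid against a data frame "key" that has a column of observers (observer), the date the test was passed (begin_valid), and the group they are now valid for (group_valid). An observer can have several rows in the key if they passed more than one test. I use tools from the lubridate package to create POSIXct values for the dates, which support arithmetic and can be compared with one another. Users can set y = "remove" to delete invalid data, or pass any other value to flag and keep it. Here is the code: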
invalidata <- function(x, y){
  library(lubridate)
  library(readr)
  library(dplyr)
  x$valid <- rep(1, length(rownames(x)))
  alts <- 0
  key <- read_csv("updatable csv file")
  key$begin_valid <- parse_date_time(key$begin_valid, c("mdy", "dmy", "ydm", "mdy"), tz= "Africa/Lubumbashi")
  for(i in unique(x$observer)){
    subkey <- subset(key, key$observer == i)
    subx <- subset(x, x$observer == i)
    if(is.na(subkey$begin_valid) == TRUE || is.na(subkey$group_valid) == TRUE){ #if reliable for nothing, remove
      x[x$observer == i]$valid <- 0
      print("removed completely unreliable")
    }else{
      for(j in rownames(subx)){
        if(subx$group[j] %in% subkey$group_valid == FALSE && "All" %in% subkey$group_valid == FALSE){ #if not reliable for specific group or all groups, remove
          x$valid[j] <- 0
          print("removed unreliable for group")
        }
        if(subx$group[j] %in% subkey$group_valid){ #remove if before reliability date for group
          if(subx$date[j] < subset(subkey, subkey$group_valid == subx$group[j])$begin_valid){
            x$valid[j] <- 0
            print("removed pre-reliability")
          }
        } else{ #remove if not reliable for specific group, and before reliability date for all
          if(subx$date[j] < subset(subkey, subkey$group_valid == "All")$begin_valid){
            x$valid[j] <- 0
            print("removed pre-reliability")
          }
        }
      }
    }
  }
  if(y == "remove"){ #remove all invalid data and validity column
    x <- subset(x, x$valid == 1)
    x <- select(x, -valid)
  }
  return(x)
}
My problem is the line
if(subx$date[j] < subset(subkey, subkey$group_valid == "All")$begin_valid)
which returns the error:
Error in if (subx$date[j] < subset(subkey, subkey$group_valid == "All")$begin_valid) { :
  missing value where TRUE/FALSE needed
However, when I run the code inside the parentheses
subx$date[j] < subset(subkey, subkey$group_valid == "All")$begin_valid
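outside the loop, I get the expected TRUE or FALSE values. I have checked all dates for NULL or NA values, and an earlier step of the code already handles all rows with NAs:
if(is.na(subkey$begin_valid) == TRUE || is.na(subkey$group_valid) == TRUE){
  # flag as completely unreliable
} else {
  # code at issue
}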
This very similar line causes no problem:
if(subx$date[j] < subset(subkey, subkey$group_valid == subx$group[j])$begin_valid){
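My best guess is that something may be wrong with the date format? I know this error is usually a symptom of NULLs or NAs floating around in the data, but for the life of me I can't figure out where they could be coming from. The dates in "x" have already been parsed and contain no NAs or NULLs. I haven't included the data because it is proprietary, but I can provide mock data if people are interested or think it's necessary. Thanks in advance for reading through, and for any thoughts or troubleshooting suggestions!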
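For reference, the error message itself can be reproduced in a few lines of base R: any comparison in which one side is NA evaluates to NA, and if() refuses an NA condition. The sketch below uses "UTC" instead of "Africa/Lubumbashi" only so that it does not depend on the system's time-zone database; the variable names are illustrative.

```r
# Comparing a missing POSIXct value with a real one gives NA, not TRUE/FALSE.
pre_date <- as.POSIXct(NA, tz = "UTC")
start <- as.POSIXct("2019-12-21", tz = "UTC")
cond <- pre_date < start
print(cond)  # NA

# if() needs a single TRUE or FALSE, so an NA condition raises the same
# "missing value where TRUE/FALSE needed" error; tryCatch() captures it
# so the script can continue.
res <- tryCatch(if (cond) "before" else "after", error = function(e) e)
print(conditionMessage(res))
```

So the error does not require NAs in the stored data: an NA produced while indexing inside the loop is enough.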
MRE:
dput output for x:
structure(list(date = structure(c(1486764000, 1486764000, 1486850400,
1486936800, 1487023200, 1487109600, 1487109600, 1487196000, 1487196000,
1487368800, 1487368800, 1487368800, 1487368800, 1487368800, 1487368800,
1487455200, 1487455200, 1487455200, 1487541600, 1487887200), class = c("POSIXct",
"POSIXt"), tzone = "Africa/Lubumbashi"), time = structure(c(23734,
53419, 41352, 33034, 24220, 34812, 35624, 27949, 27950, 49192,
49286, 49392, 49401, 62719, 62725, 26046, 26047, 27246, 46611,
61228), class = c("hms", "difftime"), units = "secs"), observer = c("MA",
"LE", "VI", "VI", "MI", "MA", "MA", "ME", "VI", "BA", "MA", "BA",
"MA", "ME", "MI", "MA", "BA", "MI", "BA", "MA"), group = c("EKK",
"EKK", "KKL", "EKK", "KKL", "KKL", "KKL", "EKK", "EKK", "EKK",
"EKK", "EKK", "EKK", "KKL", "KKL", "EKK", "EKK", "KKL", "EKK",
"KKL")), row.names = c(NA, -20L), spec = structure(list(cols = list(
date = structure(list(), class = c("collector_character",
"collector")), time = structure(list(format = ""), class = c("collector_time",
"collector")), observer = structure(list(), class = c("collector_character",
"collector")), group = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"))
For key:
structure(list(observer = c("BA", "MI", "VI", "ME", "DA", "OK",
"FR", "MA", "LA", "DE", "JD", "JD", "JD", "BR", "DA", "DA", "PA",
"PA", "JA", "JE", "DI", "JP", "LE", "MR", "NG", "TR", "TE"),
begin_valid = c("8/12/2016", "12/21/2019", "8/11/2016", "8/11/2016",
"12/11/2019", "12/17/2019", "12/11/2019", "11/2/2016", "1/11/2020",
"12/12/2019", "12/16/2019", "12/16/2019", "11/22/2020", "6/19/2021",
"11/26/2020", "11/26/2020", "7/25/2021", "7/25/2021", NA,
NA, NA, NA, NA, NA, NA, NA, NA), group_valid = c("All", "All",
"All", "All", "All", "All", "FKK", "All", "FKK", "FKK", "EKK",
"KKL", "All", "EKK", "EKK", "KKL", "EKK", "KKL", NA, NA,
NA, NA, NA, NA, NA, NA, NA), subgroup = c(NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, "S", NA, NA, NA, "S", NA, "N",
NA, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, -27L
), spec = structure(list(cols = list(observer = structure(list(), class = c("collector_character",
"collector")), begin_valid = structure(list(), class = c("collector_character",
"collector")), group_valid = structure(list(), class = c("collector_character",
"collector")), subgroup = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"))
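There are two errors in this code:
1. Because rownames(.) returns strings, you cannot use subx$group[j]. Two options:
   - Preferred: use for (j in seq_len(nrow(subx))), and all of the subx$... references work as written.
   - Keep for(j in rownames(subx)), but change all subx$ references to something like subx[j,"group"].
2. x[x$observer == i]$valid is incorrect code; change it to x$valid[x$observer == i].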
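Fix 2 can be illustrated with a tiny hypothetical data frame (only the column names mirror the question). With a single index inside [ ], a data frame is subset by columns, so a row-wise logical vector such as x$observer == i does not select rows there; indexing the column vector does.

```r
# Hypothetical toy data; only the column names mirror the question.
x <- data.frame(observer = c("MA", "LE", "MA"), valid = c(1, 1, 1))

# A single index in [ ] selects columns of a data frame, never rows:
# this logical vector keeps the second column ("valid") and all three rows.
cols_only <- x[c(FALSE, TRUE)]
print(dim(cols_only))  # 3 1

# Row-wise flagging has to index the column vector instead.
x$valid[x$observer == "LE"] <- 0
print(x$valid)  # 1 0 1
```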
After these two changes, your code runs without error and, with this example, prints "removed pre-reliability" to the console four times.
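Putting both changes together, one possible base-R version of the loop is sketched below. It is a sketch under assumptions, not a drop-in replacement: the name flag_invalid() is hypothetical, begin_valid is assumed to be parsed to POSIXct already (reading and parsing the CSV is left out), and the toy rows are hand-made in the shape of the MRE, in UTC. Note that once j counts positions inside subx, x$valid[j] refers to row j of the whole of x rather than to the observer's own row, so the sketch keeps the observer's row positions in x (rows) and writes to x$valid[rows[j]].

```r
# Sketch only: flag_invalid() is a hypothetical name. It assumes x has columns
# observer, group and date (POSIXct), and key has observer, group_valid and
# begin_valid already parsed to POSIXct.
flag_invalid <- function(x, key) {
  x$valid <- 1
  for (i in unique(x$observer)) {
    subkey <- key[key$observer %in% i, , drop = FALSE]
    rows <- which(x$observer == i)        # where observer i's rows sit in x
    subx <- x[rows, , drop = FALSE]
    # No usable key entry at all: flag every row for this observer.
    if (nrow(subkey) == 0 || all(is.na(subkey$begin_valid)) ||
        all(is.na(subkey$group_valid))) {
      x$valid[x$observer == i] <- 0       # fix 2: index the column vector
      next
    }
    for (j in seq_len(nrow(subx))) {      # fix 1: integer positions, not rownames()
      if (subx$group[j] %in% subkey$group_valid) {
        start <- subkey$begin_valid[subkey$group_valid %in% subx$group[j]]
      } else if ("All" %in% subkey$group_valid) {
        start <- subkey$begin_valid[subkey$group_valid %in% "All"]
      } else {
        x$valid[rows[j]] <- 0             # not reliable for this group
        next
      }
      # Flag only if the date precedes every matching start date; isTRUE()
      # turns a leftover NA comparison into "not flagged" instead of an error.
      if (isTRUE(all(subx$date[j] < start))) x$valid[rows[j]] <- 0
    }
  }
  x
}

# Toy rows modeled on the MRE (dates in UTC for simplicity).
x <- data.frame(
  date = as.POSIXct(c("2017-02-11", "2017-02-11", "2017-02-14", "2017-02-18"), tz = "UTC"),
  observer = c("MA", "LE", "MI", "BA"),
  group = c("EKK", "EKK", "KKL", "EKK")
)
key <- data.frame(
  observer = c("BA", "MI", "MA", "LE"),
  begin_valid = as.POSIXct(c("2016-08-12", "2019-12-21", "2016-11-02", NA), tz = "UTC"),
  group_valid = c("All", "All", "All", NA)
)
res <- flag_invalid(x, key)
print(res$valid)  # 1 0 0 1
```

Using all() means that when an observer has several key rows for the same group, the earliest start date governs; any() would make the latest one govern instead. Which is right depends on how the reliability tests are meant to work.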
When troubleshooting, don't mix up subx$group[1] and subx$group["1"]: they are very different, and the latter (as expected) yields NA.
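The difference between subx$group[1] and subx$group["1"] can be checked directly. In this hypothetical base-R example, a logical row subset of a data.frame keeps its original row names ("1" and "3"); a tibble, as in the question, renumbers them as "1", "2", and so on. In both cases rownames() returns character strings, and a character index looks an element up by name rather than by position. This also explains why the "very similar line" in the question never fails: with subx$group["1"] being NA, NA %in% subkey$group_valid is FALSE for this key, so execution always falls through to the "All" branch, where subx$date["1"] is NA as well.

```r
df <- data.frame(group = c("EKK", "KKL", "EKK"))
sub <- df[df$group == "EKK", , drop = FALSE]

rn <- rownames(sub)
print(rn)               # "1" "3": character strings, not positions
print(sub$group[1])     # "EKK": integer index, the first element
print(sub$group["1"])   # NA: looks for an element *named* "1"; the vector has no names

# Looping over integer positions avoids the problem entirely.
for (j in seq_len(nrow(sub))) print(sub$group[j])
```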