How to determine data point that gives error?

Question

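I have code in R that goes through a data.frame one row at a time and, if a specific set of conditions is met, changes the value of one of the variables in the data.frame. In pseudocode: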

for(i in 1:nrow(data)) {

 if (conditions on data[i,]) { change value } else {do nothing}

}

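When the code is run, it stops at some point and throws the following error message: Error in if (condition : missing value where TRUE/FALSE needed

I understand the error message to mean that, at some point, when the condition in the if statement is evaluated, the result is NA instead of TRUE or FALSE.

However, when I try the condition in R using the value of i that is "stored" in R (which I assume is the row of the data set that threw the error), I get TRUE as the answer. Am I right in understanding that the value of i lets me identify which row of the data frame throws the error? If not, should I look for some other way to determine which row of the data set is causing the error?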

Answer 1

I think the answer is "yes":

 print(i) ## Error: doesn't exist yet
 for (i in 1:10) {
     if (i==4) stop("simulated error")
 }
 print(i)  ## 4

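The try() function is also useful. Here we define a function f that simulates an error, then use try() so that the loop can run all the way through. Instead of stopping when we hit an error, we fill in a value that stands for an error code (in this case 10000). (We could also make the error behavior a no-op, i.e. simply move on to the next iteration of the loop; in that case an NA would be left in the error position.)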

 f <- function(x) {
     if (x==4) stop("simulated error")
     return(x)
 }
 results <- rep(NA,10)
 for (i in 1:10) {
     res <- try(f(i))
     ## try() returns an object of class "try-error" when f(i) fails
     if (inherits(res, "try-error")) {
         results[i] <- 10000
     } else {
         results[i] <- res
     }
 }
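
A variant of the same idea, sketched with tryCatch(): rather than storing an error code, it records which iterations failed, so the offending indices can be inspected afterwards. The vector name failed is purely illustrative, and f is the same simulated function as above.

```r
# Simulated failure on one input, as in the example above
f <- function(x) {
    if (x == 4) stop("simulated error")
    x
}

results <- rep(NA_real_, 10)
failed <- integer(0)   # hypothetical name: indices of iterations that errored

for (i in 1:10) {
    results[i] <- tryCatch(
        f(i),
        error = function(e) {
            # Remember the index; <<- updates 'failed' in the enclosing environment
            failed <<- c(failed, i)
            NA_real_   # leave NA in the error position
        }
    )
}

failed  ## 4
```

After the loop, results has an NA at every position listed in failed, so the two can be used together to pull out the problem inputs.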

Answer 2

As long as your for loop is not inside a function, i will equal the last value it reached before the error occurred. So after your error:

 data[i, ]

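should give you the pathological row.

If you are running inside a function, i should die with the function because of scoping rules. In that case, I would modify your code to print out each row (or i) until it dies: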

 for(i in 1:nrow(data)) {
   print(i) #or print(data[i, ])
   if (conditions on data[i,]) { change value } else {do nothing}

}
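
An alternative to printing every row, sketched under assumptions: evaluate the condition first, and stop with a message that names the row if the result is NA. The function name process_rows, the column x, and the condition x > 2 are placeholders for the real ones.

```r
# Hypothetical example: 'process_rows', column 'x' and the condition are placeholders
process_rows <- function(data) {
    for (i in seq_len(nrow(data))) {
        cond <- data$x[i] > 2          # stands in for the real conditions
        if (is.na(cond)) {
            # Fail with a message that identifies the offending row
            stop("condition is NA at row ", i)
        }
        if (cond) data$x[i] <- 0       # change value
    }
    data
}

d <- data.frame(x = c(1, 5, NA, 3))

# Capture the error message instead of halting the script
msg <- tryCatch(process_rows(d), error = function(e) conditionMessage(e))
msg
```

Because the row index travels in the error message, this works even though i itself disappears when the function exits.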

Answer 3

1) Replacing values

Wouldn't it be better to use replace?

Here are some examples: replace function examples

In your case:

 replace(df$column, your_condition, value)
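
As a rough worked example of the call above, assuming a made-up data frame and a made-up rule (cap values above 10): guarding the condition with !is.na() keeps it TRUE or FALSE for every row, so missing values are simply left alone.

```r
# Hypothetical data: cap values above 10, leaving NA untouched
df <- data.frame(column = c(5, NA, 12, 8))

# !is.na(...) keeps the condition free of NA, so it is TRUE or FALSE for every row
cond <- !is.na(df$column) & df$column > 10

# replace() returns a modified copy, so assign the result back
df$column <- replace(df$column, cond, 10)
df$column  ## 5 NA 10 8
```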

2) Filtering

If you are sure that your data contains only TRUEs/FALSEs or NAs, you can:

a) Subset the rows with NA in a specific column

df[(is.na(df$column)), ]

b) Filter things out using filter from dplyr

library("dplyr")
filter(df, is.na(column)) # filter NAs in dplyr you don't have to use $ to specify column
filter(df, !is.na(column) & column!="FALSE") # filter everything other than NA and FALSE
filter(df, column!="TRUE" & column!="FALSE") # careful with that, won't return NAs

3) Selecting row numbers

Finally, when you need the specific row numbers where NAs occur, use which

which(is.na(df$column)) # row numbers with NAs
which(df$column!="TRUE") # row numbers other than TRUEs
which(df$column!="TRUE" & df$column!="FALSE") # again, won't return NAs
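
Applied to the original question, the same tools can locate the offending rows without running the loop at all: evaluate the loop's condition for every row at once and ask which results are NA. This is only a sketch; the columns a and b and the condition itself are made up for illustration.

```r
# Hypothetical data frame and condition standing in for the real ones
df <- data.frame(a = c(1, 4, NA, 7), b = c(2, NA, 3, 1))

cond <- df$a > 3 & df$b < 5     # vectorized: one result per row
bad_rows <- which(is.na(cond))  # rows where if (cond) would fail
bad_rows         ## 2 3
df[bad_rows, ]   # inspect the rows that produce NA
```

Note that row 1 is not flagged: FALSE & anything is FALSE, so a row only shows up here when the condition as a whole cannot be decided.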
