How to identify columns that have no NAs in a sequence of consecutive rows?

Here is some sample data:

    df1 <- read.table(text = "Date  Client1 Client2 Client3
                  01.01.2019    0   0   2
                  01.02.2019    0   0   3
                  01.03.2019    0   0   4
                  01.04.2019    0   0   4
                  01.05.2019    0   0   4
                  01.06.2019    1   0   4
                  01.07.2019    0   0   0
                  01.08.2019    0   0   1
                  01.09.2019    0   0   0
                  01.10.2019    0   3   0
                  01.11.2019    0   0   2
                  01.12.2019    2   0   0
                  01.01.2020    3   4   3
                  01.02.2020    4   0   3
                  01.03.2020    5   0   0
                  01.04.2020    5   0   0
                  ", header = TRUE)
df1[df1 == 0] <- NA

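The question is how to get, for each column, a logical flag indicating whether it contains a run of 5 or more consecutive rows without NA. Expected result: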

Client1 TRUE
Client2 FALSE
Client3 TRUE

I would use the rle() function to compute the run lengths of !is.na(). For example, using your definition of df1:

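# One row per client column; Group is filled in by the loop below
df2 <- data.frame(Name = character(3), Group = character(3),
                  stringsAsFactors = FALSE)

for (i in 1:3) {
  runs <- rle(!is.na(df1[, i + 1]))  # column 1 is Date, so clients start at i + 1
  good <- which(runs$values)         # indices of the non-NA runs
  lens <- runs$lengths
  n <- length(lens)                  # index of the last run
  # "Stable": the column ends in a non-NA run
  # "Was_Stable": otherwise, some non-NA run has length >= 5
  df2$Group[i] <- if (n %in% good) "Stable"
                  else if (max(lens[good], 0) >= 5) "Was_Stable"
                  else "Not_Stable"
  df2$Name[i] <- names(df1)[i + 1]
}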
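
To make the role of rle() more concrete, here is a minimal sketch on a single toy vector. The vector `x` is made up for illustration and is not taken from df1; it only shows what `rle(!is.na(x))` returns and how the non-NA run lengths can be pulled out.

```r
# Toy vector standing in for one client column (values are made up)
x <- c(NA, 2, 3, 4, NA, NA, 1, 1, 1, 1, 1)

runs <- rle(!is.na(x))
runs$values   # FALSE  TRUE FALSE  TRUE
runs$lengths  # 1 3 2 5

# Keep only the lengths of the non-NA runs
non_na_runs <- runs$lengths[runs$values]
non_na_runs            # 3 5
any(non_na_runs >= 5)  # TRUE
```

Each element of `runs$values` says whether a run is non-NA (TRUE) or NA (FALSE), and the matching element of `runs$lengths` says how long that run is.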

You can loop over the columns with sapply and check with any:

sapply(df1[-1], function(x) any(with(rle(!is.na(x)), values & lengths >= 5)))

# Client1 Client2 Client3 
#   TRUE   FALSE    TRUE 
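
If the threshold of 5 might change, the same check can be wrapped in a small function. This is only a sketch: the helper name `has_non_na_run`, its `min_run` argument, and the `toy` data frame are hypothetical and introduced here for illustration.

```r
# Hypothetical helper: TRUE if x contains at least `min_run`
# consecutive non-NA values.
has_non_na_run <- function(x, min_run = 5) {
  runs <- rle(!is.na(x))
  any(runs$values & runs$lengths >= min_run)
}

# Small made-up data frame for illustration
toy <- data.frame(
  a = c(1, 2, 3, 4, 5, NA),
  b = c(1, NA, 2, NA, 3, NA),
  c = c(NA, 1, 2, 3, NA, 4)
)

sapply(toy, has_non_na_run)               # a TRUE, b FALSE, c FALSE
sapply(toy, has_non_na_run, min_run = 3)  # a TRUE, b FALSE, c TRUE
```

Extra arguments to sapply are passed on to the function, so the run length can be changed without rewriting the anonymous function.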

Similarly, you can use rle like this:

# add a column with no NAs as an example
df1 <- cbind(df1, dummy = 1:NROW(df1)) 

# find columns with five or more non-NA values in a row
is_num <- vapply(df1, is.numeric, TRUE) # assume we only look at numerics?
res <- setNames(rep(TRUE, NCOL(df1)), colnames(df1))
res[is_num] <- vapply(df1[is_num], function(x){
  o <- rle(!is.na(x))
  any(o$lengths[o$values] > 4)
}, TRUE)
res
#R> Date Client1 Client2 Client3   dummy 
#R> TRUE    TRUE   FALSE    TRUE    TRUE

I expect this to be fast. If you don't care about the other columns, you can do this:

is_num <- vapply(df1, is.numeric, TRUE)
vapply(df1[is_num], function(x){
  o <- rle(!is.na(x))
  any(o$lengths[o$values] > 4)
}, TRUE)
#R> Client1 Client2 Client3 
#R>    TRUE   FALSE    TRUE

In hindsight, I realize this is only a small variation of another answer. Combining that approach with my solution gives the following:

vapply(df1[vapply(df1, is.numeric, TRUE)], function(x)
  with(rle(!is.na(x)), any(lengths[values] > 4)), TRUE)
#R> Client1 Client2 Client3 
#R>    TRUE   FALSE    TRUE
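
If the length of the longest non-NA run is also of interest, rather than just a TRUE/FALSE flag, a variation along the following lines may help. It is a sketch under assumptions: `longest_non_na_run` and the `toy` data frame are hypothetical names used only for illustration.

```r
# Hypothetical helper: length of the longest run of consecutive non-NA values.
longest_non_na_run <- function(x) {
  runs <- rle(!is.na(x))
  # c(0L, ...) guards against columns that are entirely NA (no TRUE runs)
  max(c(0L, runs$lengths[runs$values]))
}

# Small made-up data frame for illustration
toy <- data.frame(
  a = c(1, 2, 3, 4, 5, NA),
  b = c(NA, NA, NA, NA, NA, NA),
  c = c(NA, 1, 2, 3, NA, 4)
)

run_len <- vapply(toy, longest_non_na_run, integer(1))
run_len        # a = 5, b = 0, c = 3
run_len >= 5   # a TRUE, b FALSE, c FALSE
```

Comparing `run_len` against a threshold gives the same kind of logical vector as the solutions above, while keeping the actual run lengths available for inspection.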