How to identify columns that have no NAs in a sequence of consecutive rows?
Here is some sample data:
df1 <- read.table(text = "Date Client1 Client2 Client3
01.01.2019 0 0 2
01.02.2019 0 0 3
01.03.2019 0 0 4
01.04.2019 0 0 4
01.05.2019 0 0 4
01.06.2019 1 0 4
01.07.2019 0 0 0
01.08.2019 0 0 1
01.09.2019 0 0 0
01.10.2019 0 3 0
01.11.2019 0 0 2
01.12.2019 2 0 0
01.01.2020 3 4 3
01.02.2020 4 0 3
01.03.2020 5 0 0
01.04.2020 5 0 0
", header = TRUE)
df1[df1 == 0] <- NA
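The question is how to get, for each column, a logical index indicating whether the column contains a run of 5 or more consecutive rows without NA. Expected result: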
Client1 TRUE
Client2 FALSE
Client3 TRUE
I would use the rle() function to compute the run lengths of !is.na(). For example, using your definition of df1:
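df2 <- data.frame(Name = character(3), Group = character(3),
                  stringsAsFactors = FALSE)  # keep columns as character in R < 4.0
for (i in 1:3) {
  runs <- rle(!is.na(df1[, i + 1]))  # runs of non-NA (TRUE) and NA (FALSE) values
  good <- which(runs$values == TRUE) # positions of the non-NA runs
  runs <- runs$lengths
  n <- length(runs)
  df2$Group[i] <- if (n %in% good) "Stable"                   # column ends in a non-NA run
                  else if (max(runs[good]) >= 5) "Was_Stable" # had a non-NA run of 5 or more
                  else "Not_Stable"
  df2$Name[i] <- names(df1)[i + 1]
}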
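To make the role of rle() more concrete, here is a minimal sketch on a made-up vector. The vector x is only an illustration and is not taken from df1:

```r
# x is an illustrative vector, not part of df1
x <- c(NA, 1, 2, 3, NA, NA, 4)

# rle() collapses consecutive equal values into (value, length) pairs
runs <- rle(!is.na(x))
runs$values   # FALSE TRUE FALSE TRUE: NA run, non-NA run, NA run, non-NA run
runs$lengths  # 1 3 2 1: length of each run

# length of the longest run of consecutive non-NA values
longest <- max(runs$lengths[runs$values])
longest       # 3
```

Indexing runs$lengths by runs$values keeps only the non-NA runs, which is the step the answers here rely on.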
You can use sapply to iterate over the columns and check with any:
sapply(df1[-1], function(x) any(with(rle(!is.na(x)), values & lengths >= 5)))
# Client1 Client2 Client3
# TRUE FALSE TRUE
Similarly, you can use rle like this:
# add a column with no NAs as an example
df1 <- cbind(df1, dummy = 1:NROW(df1))
# find columns with five or more non-NA values in a row
is_num <- vapply(df1, is.numeric, TRUE) # assume we only look at numerics?
res <- setNames(rep(TRUE, NCOL(df1)), colnames(df1))
res[is_num] <- vapply(df1[is_num], function(x){
o <- rle(!is.na(x))
any(o$lengths[o$values] > 4)
}, TRUE)
res
#R> Date Client1 Client2 Client3 dummy
#R> TRUE TRUE FALSE TRUE TRUE
I expect this to be fast. If you do not care about the other columns, you can do this:
is_num <- vapply(df1, is.numeric, TRUE)
vapply(df1[is_num], function(x){
o <- rle(!is.na(x))
any(o$lengths[o$values] > 4)
}, TRUE)
#R> Client1 Client2 Client3
#R> TRUE FALSE TRUE
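In hindsight, I realize this is only a small variation on the sapply answer above. Combining that approach with my solution gives the following: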
vapply(df1[vapply(df1, is.numeric, TRUE)], function(x)
with(rle(!is.na(x)), any(lengths[values] > 4)), TRUE)
#R> Client1 Client2 Client3
#R> TRUE FALSE TRUE
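If the minimum run length needs to vary, the same idea can be wrapped in a small helper. This is only a sketch: the function name has_complete_run, its min_len argument, and the toy data frame are hypothetical and do not come from the answers above.

```r
# has_complete_run() and min_len are hypothetical names used for illustration
has_complete_run <- function(x, min_len = 5) {
  runs <- rle(!is.na(x))
  # keep only the runs of non-NA values and test their lengths
  any(runs$lengths[runs$values] >= min_len)
}

# small made-up data frame, independent of df1
toy <- data.frame(a = c(1, 2, 3, 4, 5, NA),
                  b = c(1, NA, 2, NA, 3, NA),
                  c = c(NA, NA, NA, NA, NA, NA))

res <- vapply(toy, has_complete_run, logical(1))
res
```

A column that is entirely NA has no non-NA runs, so any() receives an empty vector and returns FALSE, which seems like the sensible result here.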