Delete multiple rows based on some constraints

I am using R and trying to delete rows from a data frame based on some constraints. Suppose I have

dat <- data.frame(Cs=c("c1","c2","c3","c4","c5","c6"),  
  R1=sample(c("Y","N"),6,replace=TRUE), R2=sample(c("Y","N"),6,replace=TRUE),
  R3=sample(c("Y","N"),6,replace=TRUE), R4=sample(c("Y","N"),6,replace=TRUE),
  R5=sample(c("Y","N"),6,replace=TRUE), R6=sample(c("Y","N"),6,replace=TRUE))

I want to delete all rows that have "N" in any of some given columns (e.g. R1, R3, R4). For a single column, I found this solution in "delete row for certain constrains":

d <- dat[dat[,"R1"]!="N",]

This works well. But if I pass several columns, as in

d <- dat[dat[,c("R1","R3","R4")]!="N",]

I get a lot of extra rows full of NA. What am I doing wrong?

You can use

dat[rowSums(dat[, c("R1","R3","R4")] == "N") == 0, , drop=FALSE]
#  Cs R1 R2 R3 R4 R5 R6
#5 c5  Y  Y  Y  Y  Y  Y

Or, if you prefer less typing:

dat[!rowSums(dat[c('R1','R3','R4')]=='N'),]

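This first tests whether each "cell" of the "R1", "R3" and "R4" columns of your data equals "N", then counts the TRUE values in each row. If a row contains no "N", the sum is 0 and the row is kept. I added drop=FALSE to keep the result a data.frame.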
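To make the steps above concrete, and to show where the NA rows in the question most likely come from, here is a minimal sketch on a small fixed data frame. The name `toy` and its values are made up for illustration so that the result does not depend on `sample()`.

```r
# Hypothetical 4-row data frame with fixed values (not the OP's data)
toy <- data.frame(
  Cs = c("c1", "c2", "c3", "c4"),
  R1 = c("Y", "N", "Y", "Y"),
  R2 = c("N", "N", "Y", "Y"),
  R3 = c("Y", "Y", "N", "Y"),
  R4 = c("Y", "Y", "Y", "Y"),
  stringsAsFactors = FALSE
)
cols <- c("R1", "R3", "R4")

# Step 1: comparing a data frame with a value gives a logical matrix (4 x 3)
is_n <- toy[, cols] == "N"

# Step 2: count the "N" cells per row
n_count <- rowSums(is_n)

# Step 3: keep the rows whose count is zero
kept <- toy[n_count == 0, , drop = FALSE]

# The attempt from the question: the 4 x 3 logical matrix is used as a
# row index, so it is treated as a logical vector of length 12. Every
# TRUE at a position beyond the 4 real rows selects a row that does not
# exist, which shows up as a row of NA.
not_n <- toy[, cols] != "N"
bad <- toy[not_n, ]
```

Here `kept` contains only rows c1 and c4, while `bad` has one row per TRUE cell in `not_n`, most of them filled with NA.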

Note added after the OP's comment:

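If you subset just one column of a data.frame with the dat[, 'R1'] form and do not specify drop=FALSE, the default behavior of [.data.frame is to coerce the resulting 1-column data.frame to an atomic vector. rowSums will then not work on that vector. To avoid this, change your code to: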

dat[!rowSums(dat[,'R1', drop=FALSE]=='N'), ] 
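The difference is easy to check directly. The following is a small sketch with a made-up data frame (`toy` is a hypothetical name, not the OP's data) comparing the three ways of selecting a single column:

```r
# Hypothetical 3-row data frame with fixed values
toy <- data.frame(
  Cs = c("c1", "c2", "c3"),
  R1 = c("Y", "N", "Y"),
  stringsAsFactors = FALSE
)

vec <- toy[, "R1"]                # dropped to a plain character vector
df1 <- toy[, "R1", drop = FALSE]  # stays a one-column data.frame
df2 <- toy["R1"]                  # list-style indexing also keeps a data.frame

# rowSums() needs at least two dimensions, so the vector version fails
failed <- inherits(try(rowSums(vec == "N"), silent = TRUE), "try-error")

# With drop = FALSE the one-column case works like the multi-column case
kept <- toy[!rowSums(df1 == "N"), ]
```

In this sketch `failed` is TRUE and `kept` holds rows c1 and c3.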

Sample data:

set.seed(5) 
dat <- data.frame(Cs=c("c1","c2","c3","c4","c5","c6"),  
                  R1=sample(c("Y","N"),6,replace=TRUE), R2=sample(c("Y","N"),6,replace=TRUE),
                  R3=sample(c("Y","N"),6,replace=TRUE), R4=sample(c("Y","N"),6,replace=TRUE),
                  R5=sample(c("Y","N"),6,replace=TRUE), R6=sample(c("Y","N"),6,replace=TRUE))

You can build a logical 'keep' vector with one value per row:

keep <- apply(dat[,c("R1","R3","R4")],
                  MARGIN=1,
                  FUN=function(x){all(x!='N')})
res <- dat[keep,]

> res
  Cs R1 R2 R3 R4 R5 R6
1 c1  Y  Y  Y  Y  Y  Y
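As a quick sanity check, the apply() approach and a rowSums() based filter should select the same rows. A minimal sketch on a made-up data frame (`toy` and its values are assumptions for illustration only):

```r
# Hypothetical data frame with fixed values
toy <- data.frame(
  Cs = c("c1", "c2", "c3", "c4"),
  R1 = c("Y", "N", "Y", "Y"),
  R3 = c("Y", "Y", "N", "Y"),
  R4 = c("Y", "Y", "Y", "Y"),
  stringsAsFactors = FALSE
)
cols <- c("R1", "R3", "R4")

# Row-wise test: TRUE when none of the selected columns is "N"
keep_apply <- apply(toy[, cols], MARGIN = 1,
                    FUN = function(x) all(x != "N"))

# The same condition expressed as a count of "N" cells per row
keep_sums <- rowSums(toy[, cols] == "N") == 0

same <- all(keep_apply == keep_sums)
res <- toy[keep_apply, ]
```

Both logical vectors agree here, and `res` contains rows c1 and c4. Note that apply() first converts the selected columns to a matrix, which is harmless in this case because all of them hold text.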

Data (seed used: 1234):

dat <- structure(list(Cs = structure(1:6, .Label = c("c1", "c2", "c3", 
"c4", "c5", "c6"), class = "factor"), R1 = structure(c(2L, 1L, 
1L, 1L, 1L, 1L), .Label = c("N", "Y"), class = "factor"), R2 = structure(c(2L, 
2L, 1L, 1L, 1L, 1L), .Label = c("N", "Y"), class = "factor"), 
    R3 = structure(c(2L, 1L, 2L, 1L, 2L, 2L), .Label = c("N", 
    "Y"), class = "factor"), R4 = structure(c(1L, 1L, 1L, 1L, 
    1L, 1L), .Label = "Y", class = "factor"), R5 = structure(c(2L, 
    1L, 1L, 1L, 1L, 2L), .Label = c("N", "Y"), class = "factor"), 
    R6 = structure(c(2L, 2L, 2L, 1L, 2L, 1L), .Label = c("N", 
    "Y"), class = "factor")), .Names = c("Cs", "R1", "R2", "R3", 
"R4", "R5", "R6"), row.names = c(NA, -6L), class = "data.frame")