Delete multiple rows based on some constraints
I am using R and trying to delete some rows from a data frame based on certain constraints. Suppose I have
dat <- data.frame(Cs=c("c1","c2","c3","c4","c5","c6"),
R1=sample(c("Y","N"),6,replace=TRUE), R2=sample(c("Y","N"),6,replace=TRUE),
R3=sample(c("Y","N"),6,replace=TRUE), R4=sample(c("Y","N"),6,replace=TRUE),
R5=sample(c("Y","N"),6,replace=TRUE), R6=sample(c("Y","N"),6,replace=TRUE))
I want to delete all rows that have "N" in certain given columns (for example R1, R3, R4). For a single column, I found this solution in "delete row for certain constrains":
d <- dat[dat[,"R1"]!="N",]
That works fine. But if I pass several columns, as in
d <- dat[dat[,c("R1","R3","R4")]!="N",]
I get a lot of extra rows filled with NA. What am I doing wrong?
You can use
dat[rowSums(dat[, c("R1","R3","R4")] == "N") == 0, , drop=FALSE]
# Cs R1 R2 R3 R4 R5 R6
#5 c5 Y Y Y Y Y Y
Or, if you prefer less typing:
dat[!rowSums(dat[c('R1','R3','R4')]=='N'),]
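This first tests whether each "cell" in columns "R1", "R3" and "R4" of your data equals "N", then sums the TRUE values in each row. If a row contains no "N", its sum is 0 and the row is kept. I added drop=FALSE to keep the result as a data.frame.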
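To make the steps above concrete, and to show where the NA rows in the question come from, here is a minimal sketch on a small fixed data frame. The values and the names is_n, n_count, good and bad are made up for illustration and are not from the original post.

```r
# Small fixed data frame (hypothetical values, chosen so the result is deterministic)
dat <- data.frame(Cs = c("c1", "c2", "c3"),
                  R1 = c("Y", "N", "Y"),
                  R3 = c("Y", "Y", "Y"),
                  R4 = c("Y", "Y", "N"),
                  stringsAsFactors = FALSE)

# Step 1: element-wise comparison gives a logical matrix, one row per data row
is_n <- dat[, c("R1", "R3", "R4")] == "N"

# Step 2: count the "N" cells in each row
n_count <- rowSums(is_n)

# Step 3: keep only the rows whose count is zero
good <- dat[n_count == 0, , drop = FALSE]

# The attempt from the question uses the whole logical matrix as a row index.
# It is treated as one long logical vector (3 rows x 3 columns = 9 values),
# so TRUE values at positions beyond nrow(dat) ask for rows that do not exist
# and come back filled with NA.
bad <- dat[dat[, c("R1", "R3", "R4")] != "N", ]
```

Here good should contain only the c1 row, while bad has more rows than dat itself. The row index has to be one logical value per row, which is what rowSums reduces the matrix to.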
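Note added after the OP's comment:
If you subset only 1 column of a data.frame with dat[, 'R1'] and do not specify drop=FALSE, the default behavior of [.data.frame is to coerce the resulting 1-column data.frame to an atomic vector. rowSums will then not work on that vector, because it needs an object with at least two dimensions. To avoid this, change your code to: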
dat[!rowSums(dat[,'R1', drop=FALSE]=='N'), ]
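The difference between the indexing forms can be checked directly. This is a small sketch with made-up data; the names v, d1, d2 and res are only for illustration.

```r
dat <- data.frame(Cs = c("c1", "c2", "c3"),
                  R1 = c("Y", "N", "Y"),
                  stringsAsFactors = FALSE)

v  <- dat[, "R1"]                 # matrix-style indexing, default drop: plain vector
d1 <- dat[, "R1", drop = FALSE]   # still a one-column data.frame
d2 <- dat["R1"]                   # list-style indexing also keeps the data.frame

is.data.frame(v)    # FALSE
is.data.frame(d1)   # TRUE
is.data.frame(d2)   # TRUE

# rowSums() needs at least two dimensions, so only the data.frame forms work
res <- dat[!rowSums(d1 == "N"), ]
```

This is also why the shorter dat[c('R1','R3','R4')] form above needs no drop argument: list-style indexing with a single bracket and no comma returns a data.frame even for one column.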
Sample data:
set.seed(5)
dat <- data.frame(Cs=c("c1","c2","c3","c4","c5","c6"),
R1=sample(c("Y","N"),6,replace=TRUE), R2=sample(c("Y","N"),6,replace=TRUE),
R3=sample(c("Y","N"),6,replace=TRUE), R4=sample(c("Y","N"),6,replace=TRUE),
R5=sample(c("Y","N"),6,replace=TRUE), R6=sample(c("Y","N"),6,replace=TRUE))
You can create a 'keep' variable holding one logical value per row:
keep <- apply(dat[,c("R1","R3","R4")],
MARGIN=1,
FUN=function(x){all(x!='N')})
res <- dat[keep,]
> res
Cs R1 R2 R3 R4 R5 R6
1 c1 Y Y Y Y Y Y
Data:
Seed used: 1234
dat <- structure(list(Cs = structure(1:6, .Label = c("c1", "c2", "c3",
"c4", "c5", "c6"), class = "factor"), R1 = structure(c(2L, 1L,
1L, 1L, 1L, 1L), .Label = c("N", "Y"), class = "factor"), R2 = structure(c(2L,
2L, 1L, 1L, 1L, 1L), .Label = c("N", "Y"), class = "factor"),
R3 = structure(c(2L, 1L, 2L, 1L, 2L, 2L), .Label = c("N",
"Y"), class = "factor"), R4 = structure(c(1L, 1L, 1L, 1L,
1L, 1L), .Label = "Y", class = "factor"), R5 = structure(c(2L,
1L, 1L, 1L, 1L, 2L), .Label = c("N", "Y"), class = "factor"),
R6 = structure(c(2L, 2L, 2L, 1L, 2L, 1L), .Label = c("N",
"Y"), class = "factor")), .Names = c("Cs", "R1", "R2", "R3",
"R4", "R5", "R6"), row.names = c(NA, -6L), class = "data.frame")
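If the same filter is needed in several places, either approach can be wrapped in a small function. The sketch below is one possible way to do that: drop_rows_with is a hypothetical helper name, not something from base R or from the answers, and the data frame repeats only the R1, R3 and R4 values from the dput output above, as character columns.

```r
# Hypothetical helper: drop every row that has `value` in any of `cols`
drop_rows_with <- function(df, cols, value = "N") {
  hits <- df[, cols, drop = FALSE] == value   # logical matrix, one row per data row
  df[rowSums(hits) == 0, , drop = FALSE]      # keep rows with zero matches
}

dat <- data.frame(Cs = c("c1", "c2", "c3", "c4", "c5", "c6"),
                  R1 = c("Y", "N", "N", "N", "N", "N"),
                  R3 = c("Y", "N", "Y", "N", "Y", "Y"),
                  R4 = c("Y", "Y", "Y", "Y", "Y", "Y"),
                  stringsAsFactors = FALSE)

# apply-based version from the answer, for comparison
keep <- apply(dat[, c("R1", "R3", "R4")], 1, function(x) all(x != "N"))
res_apply <- dat[keep, ]

# rowSums-based helper
res_helper <- drop_rows_with(dat, c("R1", "R3", "R4"))
```

Both results should contain only the c1 row. The helper uses drop = FALSE when selecting columns so that it also works when cols names a single column.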