Filter rows that have at least two of a set of particular values

I have a data frame like this.

df
     Languages          Order   Machine    Company
[1]    W,X,Y,Z,H,I       D         D          B
[2]    W,X               B         A          G
[3]    W,I               E         B          A
[4]    H,I               B         C          B
[5]    W                 G         G          C

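I want to count the rows whose Languages value contains at least 2 of the 3 values W, H, I.

The result should be 3, because rows 1, 3, and 4 each contain at least 2 of the 3 values W, H, I.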

You can use strsplit on df$Languages and intersect each piece with W, H, I. Then take the lengths of the result and use sum to count the rows where that length is greater than 1.

# lapply always returns a list, so lengths() gives the match count per row
sum(lengths(lapply(strsplit(df$Languages, ",", TRUE), intersect, c("W","H","I"))) > 1)
#[1] 3
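
The one-liner above can be unpacked into its steps. This is only an illustrative sketch: the names langs, targets, parts, hits, and n_hits are made up for the example, and langs is copied from the Languages column in the question.

```r
# Sample values copied from the question's Languages column
langs <- c("W,X,Y,Z,H,I", "W,X", "W,I", "H,I", "W")
targets <- c("W", "H", "I")

# Step 1: split each string on literal commas into a list of character vectors
parts <- strsplit(langs, ",", fixed = TRUE)

# Step 2: keep only the target values that occur in each row
hits <- lapply(parts, intersect, targets)

# Step 3: number of distinct targets found per row
n_hits <- lengths(hits)
n_hits
#[1] 3 1 2 2 1

# Step 4: count the rows with at least two targets
sum(n_hits > 1)
#[1] 3
```

Because intersect drops duplicates, a row such as "W,W" would count as one match, not two.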

Alternatively, you can use:

sum(sapply(strsplit(df$Languages, ','), function(x) 
           sum(c("W","H","I") %in% x) >= 2))
#[1] 3
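
If the matching rows themselves are wanted and not just the count, the same test can be kept as a logical vector and used for indexing. A minimal sketch, assuming a cut-down copy of the question's data; the names targets and keep are hypothetical, and vapply is swapped in for sapply only to guarantee a logical result.

```r
# Minimal copy of the question's data (only the column used here)
df <- data.frame(Languages = c("W,X,Y,Z,H,I", "W,X", "W,I", "H,I", "W"),
                 stringsAsFactors = FALSE)
targets <- c("W", "H", "I")

# TRUE for rows holding at least two of the targets
keep <- vapply(strsplit(df$Languages, ",", fixed = TRUE),
               function(x) sum(targets %in% x) >= 2,
               logical(1))

# Row numbers that satisfy the condition
which(keep)
#[1] 1 3 4

# Subset the data frame to the matching rows
df[keep, , drop = FALSE]
```

sum(keep) gives back the count of 3 from the answer above.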

Data

df<- structure(list(Languages = c("W,X,Y,Z,H,I", "W,X", "W,I", "H,I", 
"W"), Order = c("D", "B", "E", "B", "G"), Machine = c("D", "A", 
"B", "C", "G"), Company = c("B", "G", "A", "B", "C")), 
class = "data.frame", row.names = c(NA, -5L))

A tidyverse approach, which returns the matching rows (wrap the result in nrow() to get the count):

library(tidyverse)  # attaches dplyr (filter), purrr (map_int), stringr (str_split)

df %>% filter(map_int(str_split(Languages, ','), ~ sum(.x %in% c('W', 'H', 'I'))) >= 2)

    Languages Order Machine Company
1 W,X,Y,Z,H,I     D       D       B
2         W,I     E       B       A
3         H,I     B       C       B