Check whether any value from a list exists in multiple columns of an R data.table
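Question
I have a dataset containing many variables of the same type that can hold the same values. I would like to check whether at least one value from a given list appears in any of these variables.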
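Example
Suppose we have a dataset with 3 variables of factor type: DAS1, DAS2, DAS3.
The possible values for these variables are c("0", "1", "x", "y") (note that I am not trying to distinguish numbers from letters; treat every value as a character).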
library(data.table)
start <- data.table::data.table(DAS1 = c("0","1","x","0","1","0","1"),
DAS2 = c("x","y","0","0","x","1","0"),
DAS3 = c("1","1","y","1","x","y","0"))
My objective is to find which rows contain at least one observation of the value "x" or "y".
result <- data.table::data.table(DAS1 = c("0","1","x","0","1","0","1"),
DAS2 = c("x","y","0","0","x","1","0"),
DAS3 = c("1","1","y","1","x","y","0"),
xy = c(T,T,T,F,T,T,F))
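Condition
I would really like to do this with the data.table package rather than dplyr, because I mostly use data.table and prefer not to switch between the two packages unless necessary.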
Answer from @lovalery
start[, xy := apply(start[,c("DAS1", "DAS2", "DAS3")],1, function(x) any(x %in% c("x", "y")))][]
You can do it like this:
Reprex
- Code
library(data.table)
start[, xy := apply(start,1, function(x) any(x == "x" | x == "y"))][]
- Output
#> DAS1 DAS2 DAS3 xy
#> 1: 0 x 1 TRUE
#> 2: 1 y 1 TRUE
#> 3: x 0 y TRUE
#> 4: 0 0 1 FALSE
#> 5: 1 x x TRUE
#> 6: 0 1 y TRUE
#> 7: 1 0 0 FALSE
Created on 2022-03-04 by the reprex package (v2.0.1)
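Since `apply()` converts the table to a matrix and evaluates the function once per row, a column-wise variant may be worth considering on larger tables. The following is only a minimal sketch, not part of the answer above: the names `cols` and `targets` are introduced here for illustration. It restricts the check to the DAS columns through `.SDcols`, so a previously created `xy` column is not scanned.

```r
library(data.table)

start <- data.table(DAS1 = c("0","1","x","0","1","0","1"),
                    DAS2 = c("x","y","0","0","x","1","0"),
                    DAS3 = c("1","1","y","1","x","y","0"))

# Hypothetical helper names, introduced only for this sketch
cols    <- c("DAS1", "DAS2", "DAS3")  # columns to scan
targets <- c("x", "y")                # values to look for

# For each column, test membership in `targets` (one logical vector per column),
# then combine the columns element-wise with OR
start[, xy := Reduce(`|`, lapply(.SD, `%in%`, targets)), .SDcols = cols][]
```

`%in%` also accepts factor columns and returns FALSE rather than NA for missing values, so the result stays a plain TRUE/FALSE vector.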
You can try this
> start[, xy := rowSums((.SD == "x") + (.SD == "y")) > 0][]
DAS1 DAS2 DAS3 xy
1: 0 x 1 TRUE
2: 1 y 1 TRUE
3: x 0 y TRUE
4: 0 0 1 FALSE
5: 1 x x TRUE
6: 0 1 y TRUE
7: 1 0 0 FALSE
or
> start[, xy := rowSums(Reduce(`+`, Map(`==`, c("x", "y"), list(.SD)))) > 0][]
DAS1 DAS2 DAS3 xy
1: 0 x 1 TRUE
2: 1 y 1 TRUE
3: x 0 y TRUE
4: 0 0 1 FALSE
5: 1 x x TRUE
6: 0 1 y TRUE
7: 1 0 0 FALSE
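If the columns can contain missing values, `rowSums()` returns NA for any row that holds an NA. A hedged variation on the `rowSums()` idea, assuming that NA should count as "not a match", passes `na.rm = TRUE` and names the columns explicitly through `.SDcols`. The table `dt` below is made-up data used only for illustration.

```r
library(data.table)

# Made-up example data with missing values (not from the question)
dt <- data.table(DAS1 = c("0", NA, "x"),
                 DAS2 = c("x", "1", NA),
                 DAS3 = c("1", "0", "0"))

# Comparing .SD with a string yields a logical matrix; NA cells stay NA
# unless the other comparison is TRUE, and na.rm = TRUE drops them from the count
dt[, xy := rowSums(.SD == "x" | .SD == "y", na.rm = TRUE) > 0,
   .SDcols = c("DAS1", "DAS2", "DAS3")][]
```

Without `na.rm = TRUE`, the second and third rows of this table would come out as NA instead of FALSE and TRUE.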