How to subset by distinct rows in a data frame or matrix?

Question

Suppose I have the following matrix:

matrix(c(1,1,2,1,2,3,2,1,3,2,2,1),ncol=3)

Result:

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    1    3    2
[3,]    2    2    2
[4,]    1    1    1

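How can I filter/subset this matrix based on whether each row contains duplicate values? For example, in this case I only want to keep rows 1 and 2.

Any ideas would be greatly appreciated!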

Answer 1

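m <- matrix(c(1,1,2,1,2,3,2,1,3,2,2,1), ncol = 3)  # the matrix from the question
# TRUE for rows in which no value is repeated
indx <- apply(m, 1, function(x) !any(duplicated(x)))
m[indx, ]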
#     [,1] [,2] [,3]
#[1,]    1    2    3
#[2,]    1    3    2

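The second one is just for fun. Follow the logic and you will see why it works: a row has no duplicates exactly when unique() leaves its length unchanged.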

indx2 <- apply(m, 1, function(x) length(unique(x)) == length(x))
m[indx2,]
#     [,1] [,2] [,3]
#[1,]    1    2    3
#[2,]    1    3    2
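
To see why the two checks agree, it can help to run them on single rows. This is only a minimal sketch; `row_ok` and `row_dup` are hypothetical names, with values taken from rows 1 and 3 of the matrix in the question.

```r
# Hypothetical example rows (rows 1 and 3 of the question's matrix)
row_ok  <- c(1, 2, 3)   # all values distinct
row_dup <- c(2, 2, 2)   # repeated values

duplicated(row_ok)      # FALSE FALSE FALSE
duplicated(row_dup)     # FALSE  TRUE  TRUE

# No element flagged as a duplicate <=> unique() drops nothing
!any(duplicated(row_ok))                    # TRUE
length(unique(row_ok)) == length(row_ok)    # TRUE
!any(duplicated(row_dup))                   # FALSE
length(unique(row_dup)) == length(row_dup)  # FALSE
```

One thing to watch: if only a single row passes the test, `m[indx, ]` collapses to a plain vector. Writing `m[indx, , drop = FALSE]` keeps the matrix shape.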

Answer 2

Here is my approach. Using the anyDuplicated function is slightly shorter and should be faster.

mat <- matrix(c(1,1,2,1,2,3,2,1,3,2,2,1), ncol = 3)  # the matrix from the question
mat[!apply(mat, 1, anyDuplicated), ]
#     [,1] [,2] [,3]
#[1,]    1    2    3
#[2,]    1    3    2
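
The `!` works here because `anyDuplicated` returns 0 for a vector with no repeats and otherwise the index of the first repeated element, and `!` turns 0 into TRUE and any other number into FALSE. The sketch below spells that out, assuming the question's matrix is stored in `mat`; `first_dup`, `keep` and `df` are hypothetical names used only for illustration.

```r
mat <- matrix(c(1,1,2,1,2,3,2,1,3,2,2,1), ncol = 3)

# Index of the first repeated value in each row, or 0 if there is none
first_dup <- apply(mat, 1, anyDuplicated)
first_dup            # 0 0 2 2

# Writing the comparison out selects the same rows as !apply(...)
keep <- first_dup == 0
mat[keep, , drop = FALSE]

# apply() coerces a data frame to a matrix first, so the same idea
# carries over when all columns share one type
df <- as.data.frame(mat)
df[apply(df, 1, anyDuplicated) == 0, ]
```

The explicit `== 0` is a matter of taste; it selects the same rows but makes the intent easier to read.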

Answer 3

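Try this (I suspect it will be faster than any of the apply approaches). Note that it only removes rows whose values are all identical, which is enough for the example; a row such as 1, 1, 2 would be kept.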

mat[rowSums(mat == mat[, 1]) != ncol(mat), ]
# ---with your object---
#     [,1] [,2] [,3]
#[1,]    1    2    3
#[2,]    1    3    2
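
To handle rows where only some values repeat while still avoiding apply, one option is to compare every pair of columns. This is a sketch rather than a benchmarked solution; `mat2`, `pairs` and `has_dup` are hypothetical names, and the extra row 1, 1, 2 is added purely for illustration.

```r
# Matrix from the question plus one hypothetical row with a partial repeat
mat2 <- rbind(matrix(c(1,1,2,1,2,3,2,1,3,2,2,1), ncol = 3), c(1, 1, 2))

# The all-equal test keeps the last row even though 1 appears twice
rowSums(mat2 == mat2[, 1]) != ncol(mat2)   # TRUE TRUE FALSE FALSE TRUE

# Compare every pair of columns; any match means the row has a repeat
pairs   <- combn(ncol(mat2), 2)
has_dup <- rowSums(mat2[, pairs[1, ], drop = FALSE] ==
                   mat2[, pairs[2, ], drop = FALSE]) > 0
mat2[!has_dup, , drop = FALSE]
```

The number of column pairs grows quadratically with the number of columns, so this is likely to pay off only for matrices with many rows and few columns.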
