Multiple duplicates (2 times, 3 times,...) in R
After searching for a while, I believe this question has not been answered yet. Suppose I have the following vector:
v <- c("a", "b", "b", "c","c","c", "d", "d", "d", "d")
How can I find the values that are duplicated more than once
(should be "c","c","c", "d", "d", "d", "d")
and more than twice
(should be "d", "d", "d", "d")?
The function duplicated(v) only tells me which values are duplicated, not how many times.
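To make the limitation concrete, here is a minimal sketch using the same vector. It only shows what duplicated() marks: every occurrence after the first, regardless of how often the value repeats.

```r
v <- c("a", "b", "b", "c", "c", "c", "d", "d", "d", "d")

# duplicated() flags each element that has already appeared earlier in v
dup_flags <- duplicated(v)

# Subsetting keeps the later occurrences only, so "b" (seen twice) is
# included and there is no way to filter by repeat count
later_occurrences <- v[dup_flags]
later_occurrences
# [1] "b" "c" "c" "d" "d" "d"
```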
You can generate a table() and then check which elements of v are part of the relevant subset of the table, e.g.
R> v <- c("a", "b", "b", "c","c","c", "d", "d", "d", "d")
R> tab <- table(v)
R> tab
v
a b c d
1 2 3 4
R> v[v %in% names(tab[tab > 2])]
[1] "c" "c" "c" "d" "d" "d" "d"
R> v[v %in% names(tab[tab > 3])]
[1] "d" "d" "d" "d"
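If this is needed more than once, the table() idea above could be wrapped in a small function. This is only a sketch: the name keep_frequent and its min_count argument are made up for illustration, and min_count is the total number of occurrences a value must exceed (so 2 corresponds to "duplicated more than once").

```r
# Hypothetical helper: keep the elements of x whose value occurs
# more than min_count times in total
keep_frequent <- function(x, min_count) {
  tab <- table(x)                           # occurrences per distinct value
  frequent <- names(tab[tab > min_count])   # values above the threshold
  x[x %in% frequent]                        # keep original order and repeats
}

v <- c("a", "b", "b", "c", "c", "c", "d", "d", "d", "d")
more_than_2 <- keep_frequent(v, 2)
more_than_3 <- keep_frequent(v, 3)
more_than_2
# [1] "c" "c" "c" "d" "d" "d" "d"
more_than_3
# [1] "d" "d" "d" "d"
```

When no value passes the threshold, the function returns an empty vector rather than raising an error.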
I would use ave to write a simple function, like this:
myFun <- function(vector, thresh) {
  ind <- ave(rep(1, length(vector)), vector, FUN = length)
  vector[ind > thresh + 1] ## added "+1" to match your terminology
}
Applied here to "v":
myFun(v, 1)
# [1] "c" "c" "c" "d" "d" "d" "d"
myFun(v, 2)
# [1] "d" "d" "d" "d"
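To see why this works, it may help to look at the intermediate vector that ave builds inside myFun. A minimal sketch on the same data (the variable name counts is just for illustration):

```r
v <- c("a", "b", "b", "c", "c", "c", "d", "d", "d", "d")

# ave() splits the vector of ones by the values of v, applies length()
# to each group, and writes the group size back to every member
counts <- ave(rep(1, length(v)), v, FUN = length)
counts
# [1] 1 2 2 3 3 3 4 4 4 4

# Comparing against the threshold then gives a per-element filter
v[counts > 2]
# [1] "c" "c" "c" "d" "d" "d" "d"
```

Because counts lines up element by element with v, the comparison can be used directly as a subsetting index without a lookup step.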
Of course, there's always "data.table":
library(data.table)
as.data.table(v)[, N := .N, by = v][N > 1 + 1]$v
# [1] "c" "c" "c" "d" "d" "d" "d"
as.data.table(v)[, N := .N, by = v][N > 2 + 1]$v
# [1] "d" "d" "d" "d"