Matching two lists of unequal length
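I am trying to match the values in 2 lists, but only where the variable name is the same in both lists. I would like the result to be a list with the length of the longer list, filled with the matches.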
jac <- structure(list(s1 = "a", s2 = c("b", "c", "d"), s3 = 5),
.Names = c("s1", "s2", "s3"))
larger <- structure(list(s1 = structure(c(1L, 1L, 1L), .Label = "a", class = "factor"),
s2 = structure(c(2L, 1L, 3L), .Label = c("b", "c", "d"), class = "factor"),
s3 = c(1, 2, 7)), .Names = c("s1", "s2", "s3"), row.names = c(NA, -3L), class = "data.frame")
I am using mapply(FUN = pmatch, jac, larger), which gives the correct totals, but not in the format I would like, shown below:
s1 s2 s3 s1result s2result s3result
a c 1 1 2 NA
a b 2 1 1 NA
a d 7 1 3 NA
However, I don't think pmatch will ensure that the names match in all situations, so I wrote a function, which I am still having trouble with:
prodMatch <- function(jac,larger){
for(i in 1:nrow(larger)){
if(names(jac)[i] %in% names(larger[i])){
r[i] <- jac %in% larger[i]
r
}
}
}
Can anyone help?
Another dataset, where the length of one is not a multiple of the other:
larger2 <-
structure(list(s1 = structure(c(1L, 1L, 1L), class = "factor", .Label = "a"),
s2 = structure(c(1L, 1L, 1L), class = "factor", .Label = "c"),
s3 = c(1, 2, 7), s4 = c(8, 9, 10)), .Names = c("s1", "s2",
"s3", "s4"), row.names = c(NA, -3L), class = "data.frame")
mapply returns a list of matched indices, which you can convert to a data frame simply with as.data.frame:
as.data.frame(mapply(match, jac, larger))
# s1 s2 s3
# 1 1 2 NA
# 2 1 1 NA
# 3 1 3 NA
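To make the intermediate step visible, here is a minimal sketch that rebuilds the question's data with plain list() and data.frame() calls (an equivalent construction assumed for brevity, not the original structure() code) and inspects what mapply returns before the conversion:

```r
# Data from the question, rebuilt with list() / data.frame()
jac <- list(s1 = "a", s2 = c("b", "c", "d"), s3 = 5)
larger <- data.frame(s1 = factor(c("a", "a", "a")),
                     s2 = factor(c("c", "b", "d")),
                     s3 = c(1, 2, 7))

# The per-column results have lengths 1, 3 and 1, so mapply cannot
# simplify them to a matrix and returns a named list instead
res <- mapply(match, jac, larger)
str(res)

# as.data.frame recycles the length-1 entries up to the longest length (3)
out <- as.data.frame(res)
dim(out)
```

The recycling only works when every result length evenly divides the longest one; otherwise as.data.frame stops with an error about differing numbers of rows.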
Combining it with larger via cbind gives the result you expected:
cbind(larger,
setNames(as.data.frame(mapply(match, jac, larger)),
paste(names(jac), "result", sep = "")))
# s1 s2 s3 s1result s2result s3result
#1 a c 1 1 2 NA
#2 a b 2 1 1 NA
#3 a d 7 1 3 NA
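Update: To handle the case where the names of the two lists don't match, we can loop over larger and its names at the same time and extract the corresponding elements from jac as follows: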
as.data.frame(
mapply(function(col, name) {
m <- match(jac[[name]], col)
if(length(m) == 0) NA else m # if the name doesn't exist in jac return NA as well
}, larger, names(larger)))
# s1 s2 s3
#1 1 2 NA
#2 1 1 NA
#3 1 3 NA
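As a rough check of the name handling, the same approach can be run against larger2 from the question, which has an s4 column that jac lacks. The data is rebuilt here with list() and data.frame() (an assumed equivalent of the original structure() calls), and res2 is just an illustrative name:

```r
jac <- list(s1 = "a", s2 = c("b", "c", "d"), s3 = 5)
larger2 <- data.frame(s1 = factor(c("a", "a", "a")),
                      s2 = factor(c("c", "c", "c")),
                      s3 = c(1, 2, 7),
                      s4 = c(8, 9, 10))

res2 <- as.data.frame(
  mapply(function(col, name) {
    # jac[["s4"]] is NULL, so match() returns integer(0) for that column
    m <- match(jac[[name]], col)
    if (length(m) == 0) NA else m
  }, larger2, names(larger2)))
res2
```

The s4 column comes back as all NA because there is nothing in jac to match it against. The result still relies on as.data.frame recycling, so it would fail if some element of jac had a length that does not evenly divide the longest result.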