R: populate data.frame within function in mapply

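A data.frame df1 is queried (fuzzy matching) against another data.frame df2 with agrep. By iterating over its output (a list named matches that holds the row numbers of the respective matches in df2), df1 is populated with the associated values from df2. The goal is to do this with a function passed to mapply; however, in all my attempts df1 remains unchanged.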

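In a for loop, the code works as expected and populates df1 with the associated variables from df2. Nevertheless, I would be keen to know how to solve this with a function passed to mapply.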

First, the two data.frames:

df1 <- structure(list(Species = c("Alisma plantago-aquatica", "Alnus glutinosa",
                                  "Carex davalliana", "Carex echinata",
                                  "Carex elata"),
                      CheckPoint = c(NA, NA, NA, NA, NA),
                      L = c(NA, NA, NA, NA, NA),
                      R = c(NA, NA, NA, NA, NA),
                      K = c(NA, NA, NA, NA, NA)),
                 row.names = c(NA, 5L), class = "data.frame")

df2 <- structure(list(Species = c("Alisma gramineum", "Alisma lanceolatum",
                                  "Alisma plantago-aquatica", "Alnus glutinosa",
                                  "Alnus incana", "Alnus viridis",
                                  "Carex davalliana", "Carex depauperata",
                                  "Carex diandra", "Carex digitata",
                                  "Carex dioica", "Carex distans",
                                  "Carex disticha", "Carex echinata",
                                  "Carex elata"),
                      L = c(7L, 7L, 7L, 5L, 6L, 7L, 9L, 4L, 8L, 3L, 9L, 9L, 8L,
                            8L, 8L),
                      R = c(7L, 7L, 5L, 5L, 4L, 3L, 4L, 7L, 6L, NA, 4L, 6L, 6L,
                            NA, NA),
                      K = c(6L, 2L, NA, 3L, 5L, 4L, 4L, 2L, 7L, 4L, NA, 3L, NA,
                            3L, 2L)),
                 row.names = seq(1:15), class = "data.frame")

Then, the fuzzy matching by Species:

matches <- lapply(df1$Species, agrep, x = df2$Species, value = FALSE,
                  max.distance = list(deletions = 0,
                                      insertions = 1,
                                      substitutions = 1))
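Both the loop and any direct assignment rely on every species having exactly one hit in df2, so it may be worth checking the shape of the match list first. A minimal sketch, using a hypothetical hand-written list example_matches in place of the real agrep output:

```r
# Hypothetical stand-in for the agrep() result: one integer vector per species.
example_matches <- list(3L, 4L, integer(0), c(14L, 15L))

# lengths() gives the number of hits per list element.
n_hits <- lengths(example_matches)

# Positions without a match, or with several candidates, need a closer look.
no_match  <- which(n_hits == 0L)
ambiguous <- which(n_hits > 1L)
```

If no_match and ambiguous are both empty, the list can safely be flattened into a plain vector of row numbers.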

Populating df1 with the values from df2 works as expected:

for (i in seq_len(nrow(df1))) {
  df1[i, 2:5] <- df2[matches[[i]], ]
}

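In contrast, my approach with mapply does return the correct values, albeit as a list of separate pieces that is never written to df1. No combination (with or without return(df1), writing the result to another variable, or desperately toggling SIMPLIFY and USE.NAMES) produced the desired result.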

populatedf1 <- function(matches, index){
    df1[index, 2:5] <- df2[matches, ]
  #return(df1)
  }

mapply(populatedf1, matches, seq_along(matches), SIMPLIFY = FALSE,
              USE.NAMES = FALSE)

It would be great if someone knew the solution or could point me in the right direction, thanks! :)

Actually, if you replace lapply with sapply (which returns a vector rather than a list), you can assign directly.

matches <- sapply(df1$Species, agrep, x = df2$Species, value = FALSE,
                  max.distance = list(deletions = 0,
                                      insertions = 1,
                                      substitutions = 1))

df1[, 2:5] <- df2[matches,]
df1

#                   Species               CheckPoint L  R  K
#1 Alisma plantago-aquatica Alisma plantago-aquatica 7  5 NA
#2          Alnus glutinosa          Alnus glutinosa 5  5  3
#3         Carex davalliana         Carex davalliana 9  4  4
#4           Carex echinata           Carex echinata 8 NA  3
#5              Carex elata              Carex elata 8 NA  2
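Note that sapply only simplifies to a vector when every species has exactly one match; otherwise it returns a list and the direct assignment no longer lines up. A more defensive variant could look like the sketch below, where single_match, example_matches and lookup are hypothetical names used for illustration, not part of the original code:

```r
# Hypothetical helper: keep the row index only if there is exactly one hit.
single_match <- function(hits) {
  if (length(hits) == 1L) hits else NA_integer_
}

# Hypothetical agrep()-style output: one hit, no hit, two hits.
example_matches <- list(3L, integer(0), c(14L, 15L))

# vapply() always returns an integer vector of the same length as its input.
idx <- vapply(example_matches, single_match, integer(1))

# Indexing a data frame with NA yields a row of NA values,
# so unmatched or ambiguous species simply stay empty.
lookup <- data.frame(Species = c("a", "b", "c"), L = 1:3,
                     stringsAsFactors = FALSE)
picked <- lookup[idx, ]
```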

As for your approach, you can use Map (or mapply with SIMPLIFY = FALSE), combine the resulting list of data frames into a single data frame using do.call and rbind, and then assign.

df1[, 2:5] <- do.call(rbind, Map(populatedf1, matches, seq_along(matches)))
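The reason df1 never changed in the original attempt is most likely R's scoping: an assignment such as df1[index, 2:5] <- ... inside a function modifies a copy that is local to that call and discarded on return. The Map call still works because the value of an assignment expression is its right-hand side, so each call returns the matched row. The sketch below makes both points explicit; target, lookup, example_matches, try_in_place and get_match_row are hypothetical names on toy data, not the objects from the question:

```r
# Toy data (hypothetical, much smaller than df1/df2).
lookup <- data.frame(Species = c("a", "b", "c"), L = c(7L, 5L, 9L),
                     stringsAsFactors = FALSE)
target <- data.frame(Species = c("c", "a"), CheckPoint = NA, L = NA,
                     stringsAsFactors = FALSE)
example_matches <- list(3L, 1L)

# Assigning inside a function only changes a local copy of `target`.
try_in_place <- function(index, hits) {
  target[index, 2:3] <- lookup[hits, ]
}
try_in_place(1L, 3L)
unchanged <- all(is.na(target$L))  # the global `target` is still empty

# Return the matched rows instead, then bind and assign them in one step.
get_match_row <- function(hits) lookup[hits, ]
rows <- Map(get_match_row, example_matches)
target[, 2:3] <- do.call(rbind, rows)
```

Returning values and assigning once at the top level avoids reaching for <<-, which would work but makes the function depend on a global variable.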