R: populate data.frame within function in mapply
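A data.frame df1 is queried (fuzzy matched) against another data.frame df2 using agrep. By iterating over its output (a list named matches that holds the row numbers of the respective matches in df2), df1 is populated with the associated values from df2.
The goal is a function passed to mapply; however, in all my attempts df1 remains unchanged.
In a for loop the code works as expected and populates df1 with the affiliated variables from df2. Nevertheless, I would be keen to know how to solve this with a function passed to mapply.
First, the two data.frames: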
df1 <- structure(list(Species = c("Alisma plantago-aquatica", "Alnus glutinosa",
                                  "Carex davalliana", "Carex echinata",
                                  "Carex elata"),
                      CheckPoint = c(NA, NA, NA, NA, NA),
                      L = c(NA, NA, NA, NA, NA),
                      R = c(NA, NA, NA, NA, NA),
                      K = c(NA, NA, NA, NA, NA)),
                 row.names = c(NA, 5L), class = "data.frame")
df2 <- structure(list(Species = c("Alisma gramineum", "Alisma lanceolatum",
                                  "Alisma plantago-aquatica", "Alnus glutinosa",
                                  "Alnus incana", "Alnus viridis",
                                  "Carex davalliana", "Carex depauperata",
                                  "Carex diandra", "Carex digitata",
                                  "Carex dioica", "Carex distans",
                                  "Carex disticha", "Carex echinata",
                                  "Carex elata"),
                      L = c(7L, 7L, 7L, 5L, 6L, 7L, 9L, 4L, 8L, 3L, 9L, 9L, 8L,
                            8L, 8L),
                      R = c(7L, 7L, 5L, 5L, 4L, 3L, 4L, 7L, 6L, NA, 4L, 6L, 6L,
                            NA, NA),
                      K = c(6L, 2L, NA, 3L, 5L, 4L, 4L, 2L, 7L, 4L, NA, 3L, NA,
                            3L, 2L)),
                 row.names = 1:15, class = "data.frame")
Then, fuzzy matching by Species:
matches <- lapply(df1$Species, agrep, x = df2$Species, value = FALSE,
                  max.distance = c(deletions = 0,
                                   insertions = 1,
                                   substitutions = 1))
Populating df1 with the values from df2 works as expected:
for (i in seq_len(nrow(df1))) {
  df1[i, 2:5] <- df2[matches[[i]], ]
}
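In contrast, my approach with mapply does return the correct values, though only as a list of separate values that is never written to df1. No combination (with or without return(df1), writing the result to another variable, or desperately toggling SIMPLIFY or USE.NAMES) produced the desired result.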
populatedf1 <- function(matches, index){
  df1[index, 2:5] <- df2[matches, ]
  #return(df1)
}
mapply(populatedf1, matches, seq_along(matches), SIMPLIFY = FALSE,
       USE.NAMES = FALSE)
If someone knows a solution or can point me in the right direction, that would be great. Thanks! :)
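Actually, if you replace lapply with sapply (which returns a vector instead of a list when every species has exactly one match), you can assign directly.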
matches <- sapply(df1$Species, agrep, x = df2$Species, value = FALSE,
                  max.distance = c(deletions = 0,
                                   insertions = 1,
                                   substitutions = 1))
df1[, 2:5] <- df2[matches,]
df1
#                   Species               CheckPoint L  R  K
#1 Alisma plantago-aquatica Alisma plantago-aquatica 7  5 NA
#2          Alnus glutinosa          Alnus glutinosa 5  5  3
#3         Carex davalliana         Carex davalliana 9  4  4
#4           Carex echinata           Carex echinata 8 NA  3
#5              Carex elata              Carex elata 8 NA  2
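One caveat with the direct assignment: it relies on agrep returning exactly one index per species, since sapply only simplifies to a plain vector when every element has length one. Below is a minimal sketch of a guard, assuming you would rather get NA than a ragged list when a species has no match or several candidates. The helper pick_single and the toy_matches list are made up for illustration and are not part of the original post.

```r
# Hypothetical helper (not from the original post): keep an index only when
# the fuzzy match is unambiguous, otherwise return NA.
pick_single <- function(m) if (length(m) == 1L) m else NA_integer_

# Toy stand-in for the list returned by lapply(..., agrep, ...):
# one unique hit, no hit, and two candidate hits.
toy_matches <- list(3L, integer(0), c(1L, 2L))

# vapply() enforces that every result is a single integer.
idx <- vapply(toy_matches, pick_single, integer(1))
idx
```

Indexing df2 with such a vector gives all-NA rows for the unresolved species, so those rows are easy to spot and review by hand after the assignment.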
As for your approach, you can use Map or mapply with SIMPLIFY = FALSE, combine the resulting list of data frames into a single data frame with do.call and rbind, and then assign.
df1[, 2:5] <- do.call(rbind, Map(populatedf1, matches, seq_along(matches)))
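As for why df1 stayed unchanged in the original attempt: an R function modifies a local copy, so the assignment to df1 inside populatedf1 never reaches the global df1. The call only returns the value of the right-hand side, which is what Map collects. Here is a small self-contained sketch of that behaviour, using made-up toy data frames (a, b) and a hypothetical function fill_row rather than the data from the question:

```r
# Toy data, made up for illustration.
a <- data.frame(id = 1:2, val = c(NA, NA))
b <- data.frame(val = c(10, 20))

# Same shape as populatedf1(): assigns into `a`, but only into a local copy.
fill_row <- function(match, index) {
  a[index, "val"] <- b[match, "val"]
}

# Row 1 of `a` is paired with row 2 of `b`, and row 2 with row 1.
res <- Map(fill_row, 2:1, 1:2)

# The global `a` is untouched by the calls ...
anyNA(a$val)

# ... but each call returned the assigned value, so the results can be
# collected and assigned once, outside the function.
a$val <- unlist(res)
a$val
```

The same idea carries over to the one-liner above: the function only has to produce the rows, and the single assignment happens in the calling environment.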